Enable Semantic Search

Prerequisites

OpenSearch as your search backend (Elasticsearch is not supported)
An embedding provider: OpenAI or AWS Bedrock (external APIs), or DJL to run HuggingFace models locally.
Network access from the OpenMetadata server to the embedding provider API (unless using DJL)

Overview

Semantic Search enhances OpenMetadata’s search capabilities by using vector embeddings to understand the meaning behind queries, rather than relying solely on keyword matching. This means users and AI agents can search using natural language — for example, “tables with customer demographics and purchase history” — and get meaningful results even if those exact words don’t appear in the metadata.

Semantic Search is currently supported only with OpenSearch as the search backend.

Semantic Search also powers the Semantic Search MCP tool, enabling AI assistants connected via the Model Context Protocol to perform natural language queries against your metadata catalog.

How It Works

Text Construction

For each entity, a structured text representation is constructed from its metadata — including name, description, entity type, tags, glossary terms, owners, and other relevant fields.
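As a rough illustration of this step, the sketch below flattens a few metadata fields into a single string. The function name `build_embedding_text`, the dictionary keys, and the "label: value" layout are assumptions made for the example; the actual text format used by the platform is not described here.

```python
from typing import Any


def build_embedding_text(entity: dict[str, Any]) -> str:
    """Flatten selected metadata fields into one string for embedding.

    Field names and layout are illustrative assumptions, not the
    format OpenMetadata uses internally.
    """
    parts = []
    # Single-valued fields: emitted only when present and non-empty.
    for label, key in [("name", "name"), ("type", "entityType"), ("description", "description")]:
        value = entity.get(key)
        if value:
            parts.append(f"{label}: {value}")
    # List-valued fields: joined with commas.
    for label, key in [("tags", "tags"), ("glossary terms", "glossaryTerms"), ("owners", "owners")]:
        values = entity.get(key) or []
        if values:
            parts.append(f"{label}: {', '.join(values)}")
    return "\n".join(parts)


example = {
    "name": "customer_orders",
    "entityType": "table",
    "description": "Orders joined with customer demographics.",
    "tags": ["PII.Sensitive"],
    "owners": ["admin"],
}
text = build_embedding_text(example)
```

Fields that are missing or empty are skipped, so sparse entities still produce a compact text.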

Embedding Generation & Vector Indexing

The text is sent to the configured embedding provider to generate a numerical vector (embedding), which is stored in a dedicated OpenSearch index, vector_search_index, that uses the HNSW algorithm with cosine similarity. At query time, the search text is embedded the same way and a KNN (k-nearest neighbors) similarity search finds the most relevant results.
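To make the similarity search concrete, here is a minimal sketch of cosine-similarity KNN over a toy in-memory index. It is an exact brute-force scan, not HNSW (which is an approximate graph-based index), and the vectors and entity names are invented for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
toy_index = {
    "table.customers": [0.9, 0.1, 0.0],
    "table.orders": [0.7, 0.7, 0.0],
    "dashboard.latency": [0.0, 0.1, 0.9],
}
top = knn([1.0, 0.0, 0.0], toy_index, k=2)
```

Here `top` ranks `table.customers` first and `table.orders` second, because their vectors point most nearly in the same direction as the query vector.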

Automatic Lifecycle Management

Embeddings follow the same lifecycle as the entities themselves. When entities are created, updated, deleted, or restored, their embeddings are automatically kept in sync using the same indexing strategies the platform already uses for search. No manual intervention is required after initial setup.

Supported Entity Types

table, glossary, glossaryTerm, chart, dashboard, dashboardDataModel, database, databaseSchema, dataProduct, pipeline, mlmodel, metric, apiEndpoint, apiCollection, page, storedProcedure, searchIndex, topic

Configuration

Semantic Search is configured in openmetadata.yaml under the elasticsearch.naturalLanguageSearch section. All settings can be overridden with environment variables.

Enable Semantic Search

Environment Variable	Default	Description
`SEMANTIC_SEARCH_ENABLED`	`false`	Master switch to enable semantic search
`EMBEDDING_PROVIDER`	`bedrock`	Embedding provider to use: `openai`, `bedrock`, or `djl`

elasticsearch:
  naturalLanguageSearch:
    semanticSearchEnabled: ${SEMANTIC_SEARCH_ENABLED:-false}
    embeddingProvider: ${EMBEDDING_PROVIDER:-bedrock}
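The `${NAME:-default}` placeholders above fall back to the default when the environment variable is not set. The sketch below imitates that behaviour for a single line of text. The `resolve` helper is hypothetical and only handles the simple case of an absent variable; the server's own substitution logic may treat edge cases differently.

```python
import re

# Matches ${NAME:-default}; NAME is a shell-style identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def resolve(template: str, env: dict[str, str]) -> str:
    """Replace each placeholder with env[NAME], or its default if NAME is absent."""
    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return env.get(name, default)

    return _PLACEHOLDER.sub(replace, template)


line = "embeddingProvider: ${EMBEDDING_PROVIDER:-bedrock}"
with_default = resolve(line, {})
with_override = resolve(line, {"EMBEDDING_PROVIDER": "openai"})
```

With no variable set, the line resolves to `embeddingProvider: bedrock`; with `EMBEDDING_PROVIDER=openai`, it resolves to `embeddingProvider: openai`.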

Embedding Providers

Choose one of the following embedding providers and configure it accordingly.

OpenAI

Supports both OpenAI and Azure OpenAI endpoints.

Environment Variable	Default	Description
`OPENAI_API_KEY`	`""`	Your OpenAI API key
`OPENAI_API_ENDPOINT`	`""`	API endpoint. For Azure, use `https://your-resource.openai.azure.com`
`OPENAI_DEPLOYMENT_NAME`	`""`	Deployment name (required for Azure OpenAI)
`OPENAI_API_VERSION`	`2024-02-01`	API version (Azure OpenAI)
`OPENAI_EMBEDDING_MODEL_ID`	`text-embedding-3-small`	Embedding model to use
`OPENAI_EMBEDDING_DIMENSION`	`1536`	Embedding vector dimension

elasticsearch:
  naturalLanguageSearch:
    semanticSearchEnabled: true
    embeddingProvider: openai
    openai:
      apiKey: ${OPENAI_API_KEY:-""}
      endpoint: ${OPENAI_API_ENDPOINT:-""}
      deploymentName: ${OPENAI_DEPLOYMENT_NAME:-""}
      apiVersion: ${OPENAI_API_VERSION:-"2024-02-01"}
      embeddingModelId: ${OPENAI_EMBEDDING_MODEL_ID:-"text-embedding-3-small"}
      embeddingDimension: ${OPENAI_EMBEDDING_DIMENSION:-1536}

AWS Bedrock

Uses AWS Bedrock for embedding generation.

Environment Variable	Default	Description
`AWS_REGION`	`""`	AWS region
`AWS_ACCESS_KEY_ID`	`""`	AWS access key
`AWS_SECRET_ACCESS_KEY`	`""`	AWS secret access key
`AWS_BEDROCK_EMBED_MODEL_ID`	`""`	Bedrock embedding model ID
`AWS_BEDROCK_EMBEDDING_DIMENSION`	`""`	Embedding vector dimension

elasticsearch:
  naturalLanguageSearch:
    semanticSearchEnabled: true
    embeddingProvider: bedrock
    bedrock:
      awsConfig:
        region: ${AWS_REGION:-""}
        accessKeyId: ${AWS_ACCESS_KEY_ID:-""}
        secretAccessKey: ${AWS_SECRET_ACCESS_KEY:-""}
      embeddingModelId: ${AWS_BEDROCK_EMBED_MODEL_ID:-""}
      embeddingDimension: ${AWS_BEDROCK_EMBEDDING_DIMENSION:-""}

DJL

Uses Deep Java Library (DJL) to run embedding models locally. No external API calls are required.

DJL downloads the HuggingFace model and runs it directly on your server, so resource requirements depend on the model you choose. If you are resource constrained, use an external provider. The default model is small and suited to development and testing; for other workloads, choose a model that fits your use case.

Environment Variable	Default	Description
`DJL_EMBEDDING_MODEL`	`ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2`	HuggingFace model identifier

The embedding dimension is auto-detected from the model at startup. The default model all-MiniLM-L6-v2 produces 384-dimensional vectors.

elasticsearch:
  naturalLanguageSearch:
    semanticSearchEnabled: true
    embeddingProvider: djl
    djl:
      embeddingModel: ${DJL_EMBEDDING_MODEL:-"ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2"}

Docker Deployment

To enable Semantic Search in a Docker deployment, set the required environment variables in your docker-compose override or .env file:

environment:
  SEMANTIC_SEARCH_ENABLED: "true"
  EMBEDDING_PROVIDER: "openai"
  OPENAI_API_KEY: "sk-..."
  OPENAI_EMBEDDING_MODEL_ID: "text-embedding-3-small"
  OPENAI_EMBEDDING_DIMENSION: "1536"

Kubernetes Deployment

For Kubernetes deployments using the OpenMetadata Helm chart, add the environment variables to your values.yaml:

openmetadata:
  config:
    extraEnvs:
      - name: SEMANTIC_SEARCH_ENABLED
        value: "true"
      - name: EMBEDDING_PROVIDER
        value: "openai"
      - name: OPENAI_API_KEY
        valueFrom:
          secretKeyRef:
            name: openmetadata-secrets
            key: openai-api-key
      - name: OPENAI_EMBEDDING_MODEL_ID
        value: "text-embedding-3-small"
      - name: OPENAI_EMBEDDING_DIMENSION
        value: "1536"

Store sensitive values like API keys in Kubernetes Secrets and reference them with secretKeyRef rather than hardcoding them in values.yaml.

Validating the Configuration

After configuring your embedding provider, you can verify that everything is set up correctly by navigating to Settings > Preferences > Health in the OpenMetadata UI. This page shows the status of the embedding provider connection and will flag any misconfiguration.

Generating Embeddings

Once Semantic Search is enabled, embeddings are generated and kept in sync automatically as entities are created or updated. To generate embeddings for all existing entities, run a Reindex from the OpenMetadata UI (Settings > Applications > Search Indexing). Every Reindex operation computes embeddings taking a fingerprint into account — if the text representation of an entity has not changed since its last embedding, the embedding is not recomputed. This avoids unnecessary calls to the embedding provider and makes re-indexing efficient even for large catalogs.
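The fingerprint check can be pictured as follows. This is a sketch under assumptions: SHA-256 and the in-memory `stored` dictionary are stand-ins chosen for the example, since the hash function and storage the platform actually uses are not specified here.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable digest of an entity's text representation (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_embedding(entity_id: str, text: str, stored: dict[str, str]) -> bool:
    """Return True when the text changed since the last run, recording the new fingerprint."""
    current = fingerprint(text)
    if stored.get(entity_id) == current:
        return False  # Unchanged text: skip the call to the embedding provider.
    stored[entity_id] = current
    return True


stored_fingerprints: dict[str, str] = {}
first_run = needs_embedding("table.orders", "name: orders", stored_fingerprints)
second_run = needs_embedding("table.orders", "name: orders", stored_fingerprints)
after_edit = needs_embedding("table.orders", "name: orders\ntags: PII", stored_fingerprints)
```

Only the first run and the run after the edit would trigger an embedding call; the unchanged second run is skipped.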

API Reference

Semantic Search exposes a REST API endpoint for vector queries:

POST `/api/v1/search/vector/query`

Performs a semantic search against the vector index.

Request Body:

{
  "query": "customer demographics purchase history",
  "filters": {
    "entityType": ["table"],
    "owners": ["admin"],
    "tags": ["PII.Sensitive"],
    "domains": ["Marketing"],
    "tier": ["Tier.Tier1"],
    "serviceType": ["Postgres"]
  },
  "size": 10,
  "k": 1000,
  "threshold": 0.0
}

Parameter	Type	Default	Description
`query`	string	(required)	Natural language search text
`filters`	map	`{}`	Filter map by entity type, owners, tags, domains, tier, service type, certification, or custom properties
`size`	int	`10`	Number of distinct entities to return (max 100)
`k`	int	`500`	KNN parameter — number of nearest neighbors to consider (max 10,000)
`threshold`	double	`0.0`	Minimum similarity score to include in results
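A request like the one above could be assembled and sent from a client roughly as follows. The helper names are hypothetical, and the Bearer-token `Authorization` header is an assumption about how your deployment authenticates API calls. The response is returned as parsed JSON without assuming its shape.

```python
import json
import urllib.request


def build_vector_query(query, filters=None, size=10, k=500, threshold=0.0):
    """Build the request body, enforcing the documented maximums for size and k."""
    if not query:
        raise ValueError("query is required")
    if size > 100:
        raise ValueError("size must not exceed 100")
    if k > 10_000:
        raise ValueError("k must not exceed 10,000")
    return {
        "query": query,
        "filters": filters or {},
        "size": size,
        "k": k,
        "threshold": threshold,
    }


def post_vector_query(base_url: str, token: str, body: dict) -> dict:
    """POST the body to the vector query endpoint (auth scheme is an assumption)."""
    request = urllib.request.Request(
        url=f"{base_url}/api/v1/search/vector/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_vector_query(
    "customer demographics purchase history",
    filters={"entityType": ["table"], "tags": ["PII.Sensitive"]},
)
```

Validating `size` and `k` on the client side surfaces limit violations before a network round trip.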

Results are deduplicated by parent entity, so you will receive at most `size` distinct entities even if an entity has multiple text chunks.
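One way to picture this deduplication is the sketch below, which keeps the best-scoring chunk per parent entity and then truncates to `size`. The `parentId` and `score` field names are hypothetical, and the server's actual strategy may differ.

```python
def dedupe_by_parent(hits: list[dict], size: int) -> list[dict]:
    """Keep the highest-scoring chunk per parent entity, then return the top `size`."""
    best: dict[str, dict] = {}
    for hit in hits:
        parent = hit["parentId"]
        if parent not in best or hit["score"] > best[parent]["score"]:
            best[parent] = hit
    ranked = sorted(best.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:size]


# Two chunks belong to the same table; only the better one should survive.
chunk_hits = [
    {"parentId": "table.orders", "score": 0.71},
    {"parentId": "table.customers", "score": 0.64},
    {"parentId": "table.orders", "score": 0.88},
]
entities = dedupe_by_parent(chunk_hits, size=10)
```

This also suggests why `k` is typically much larger than `size`: several of the nearest neighbors may be chunks of the same entity.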

Troubleshooting

Semantic Search returns no results

Verify that SEMANTIC_SEARCH_ENABLED is set to true and the server has been restarted.
Confirm that OpenSearch is your search backend (Elasticsearch is not supported).
Check that the vector_search_index exists in OpenSearch.
Run a Reindex to generate embeddings for existing entities.
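The index check in the list above can be scripted. The sketch below sends a `HEAD` request for the index to OpenSearch, which answers 200 when the index exists and 404 when it does not. The URL is a placeholder, and the example assumes a cluster reachable without authentication; the `opener` parameter exists only so the function can be exercised without a live cluster.

```python
import urllib.error
import urllib.request


def index_exists(opensearch_url: str, index: str = "vector_search_index",
                 opener=urllib.request.urlopen) -> bool:
    """Return True if OpenSearch reports that the index exists."""
    request = urllib.request.Request(f"{opensearch_url}/{index}", method="HEAD")
    try:
        with opener(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False  # Index is missing; a Reindex may be needed.
        raise  # Other errors (401, 500, ...) are surfaced to the caller.


# Example call against a placeholder local cluster:
# index_exists("http://localhost:9200")
```

If the function returns False, enabling Semantic Search and running a Reindex is the next step to try.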

Embedding generation fails

Verify network connectivity from the OpenMetadata server to your embedding provider.
Check that API keys and credentials are correct.
Review the OpenMetadata server logs for detailed error messages.
