Enable Semantic Search
Prerequisites
- OpenSearch as your search backend (Elasticsearch is not supported)
- An embedding provider: OpenAI, AWS Bedrock, or DJL (for HuggingFace models).
- Network access from the OpenMetadata server to the embedding provider API (unless using DJL)
Overview
Semantic Search enhances OpenMetadata’s search capabilities by using vector embeddings to understand the meaning behind queries, rather than relying solely on keyword matching. This means users and AI agents can search using natural language (for example, “tables with customer demographics and purchase history”) and get meaningful results even if those exact words don’t appear in the metadata.
Semantic Search is currently supported only with OpenSearch as the search backend.
How It Works
Text Construction
For each entity, a structured text representation is constructed from its metadata — including name, description,
entity type, tags, glossary terms, owners, and other relevant fields.
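As a rough illustration of this step, the sketch below flattens a few metadata fields into one labelled text block. The function name, the field keys (name, entityType, glossaryTerms, and so on) and the "label: value" layout are assumptions made for the example; the actual template used by OpenMetadata is not described here.

```python
from typing import Any, Mapping


def build_embedding_text(entity: Mapping[str, Any]) -> str:
    """Flatten selected metadata fields into one labelled text block.

    Hypothetical layout: one "label: value" line per populated field.
    """
    parts = [
        f"name: {entity.get('name', '')}",
        f"type: {entity.get('entityType', '')}",
    ]
    description = entity.get("description")
    if description:
        parts.append(f"description: {description}")
    # List-valued fields are joined so each label appears at most once;
    # empty or missing fields are skipped entirely.
    for label in ("tags", "glossaryTerms", "owners"):
        values = entity.get(label) or []
        if values:
            parts.append(f"{label}: {', '.join(values)}")
    return "\n".join(parts)


example = {
    "name": "customer_orders",
    "entityType": "table",
    "description": "One row per order, joined to customer demographics.",
    "tags": ["PII.Sensitive", "Tier.Tier1"],
    "owners": ["data-platform"],
}
text = build_embedding_text(example)
print(text)
```

Skipping empty fields keeps the text short, so the embedding is driven by metadata that is actually present on the entity.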
Embedding Generation & Vector Indexing
The text is sent to the configured embedding provider to generate a numerical vector (embedding), which is stored
in a dedicated OpenSearch vector_search_index using the HNSW algorithm with cosine similarity. At query time,
the search text is also embedded and a KNN (K-Nearest Neighbor) similarity search finds the most relevant results.
Automatic Lifecycle Management
Embeddings follow the same lifecycle as the entities themselves. When entities are created, updated, deleted, or
restored, their embeddings are automatically kept in sync using the same indexing strategies the platform already
uses for search. No manual intervention is required after initial setup.
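To make the query-time step from Embedding Generation & Vector Indexing more concrete, here is a small sketch of cosine-similarity nearest-neighbor search. It is an exact brute-force scan over a toy in-memory index, not the approximate HNSW search that OpenSearch performs, and the entity identifiers and three-dimensional vectors are made up for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k entries most similar to the query, best first."""
    scored = [(eid, cosine_similarity(query, vec)) for eid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones have hundreds or thousands of dimensions.
toy_index = {
    "table.customers": [0.9, 0.1, 0.0],
    "table.orders": [0.7, 0.7, 0.0],
    "dashboard.cpu_usage": [0.0, 0.1, 0.9],
}
results = knn([1.0, 0.2, 0.0], toy_index, k=2)
print(results)
```

HNSW trades a small amount of recall for speed by walking a layered graph instead of scoring every vector, which is what makes this kind of search practical on large catalogs.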
Supported Entity Types
table, glossary, glossaryTerm, chart, dashboard, dashboardDataModel, database, databaseSchema,
dataProduct, pipeline, mlmodel, metric, apiEndpoint, apiCollection, page, storedProcedure,
searchIndex, topic
Configuration
Semantic Search is configured in openmetadata.yaml under the elasticsearch.naturalLanguageSearch section.
All settings can be overridden with environment variables.
Enable Semantic Search
| Environment Variable | Default | Description |
|---|---|---|
| SEMANTIC_SEARCH_ENABLED | false | Master switch to enable semantic search |
| EMBEDDING_PROVIDER | bedrock | Embedding provider to use: openai, bedrock, or djl |
Embedding Providers
Choose one of the following embedding providers and configure it accordingly.
- OpenAI
- AWS Bedrock
- DJL
The OpenAI provider supports both OpenAI and Azure OpenAI endpoints.
| Environment Variable | Default | Description |
|---|---|---|
| OPENAI_API_KEY | "" | Your OpenAI API key |
| OPENAI_API_ENDPOINT | "" | API endpoint. For Azure, use https://your-resource.openai.azure.com |
| OPENAI_DEPLOYMENT_NAME | "" | Deployment name (required for Azure OpenAI) |
| OPENAI_API_VERSION | 2024-02-01 | API version (Azure OpenAI) |
| OPENAI_EMBEDDING_MODEL_ID | text-embedding-3-small | Embedding model to use |
| OPENAI_EMBEDDING_DIMENSION | 1536 | Embedding vector dimension |
Docker Deployment
To enable Semantic Search in a Docker deployment, set the required environment variables in your docker-compose override
or .env file.
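A minimal sketch of what such a file might contain, written as a small Python helper that renders .env lines. The variable names and defaults come from the tables above; the helper itself (render_env) and the API key placeholder are illustrative only, and the sketch assumes the OpenAI provider.

```python
from typing import Dict

# Provider values listed in the EMBEDDING_PROVIDER description above.
SUPPORTED_PROVIDERS = {"openai", "bedrock", "djl"}


def render_env(settings: Dict[str, str]) -> str:
    """Render settings as KEY=value lines, rejecting unknown providers."""
    provider = settings.get("EMBEDDING_PROVIDER", "")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported EMBEDDING_PROVIDER: {provider!r}")
    return "\n".join(f"{key}={value}" for key, value in settings.items())


openai_settings = {
    "SEMANTIC_SEARCH_ENABLED": "true",
    "EMBEDDING_PROVIDER": "openai",
    "OPENAI_API_KEY": "<your-api-key>",  # placeholder, never commit real keys
    "OPENAI_EMBEDDING_MODEL_ID": "text-embedding-3-small",
    "OPENAI_EMBEDDING_DIMENSION": "1536",
}
env_text = render_env(openai_settings)
print(env_text)
```

The printed lines can be pasted into a .env file, or listed under the environment key of the server service in a docker-compose override.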
Kubernetes Deployment
For Kubernetes deployments using the OpenMetadata Helm chart, add the environment variables to your values.yaml.
Validating the Configuration
After configuring your embedding provider, you can verify that everything is set up correctly by navigating to Settings > Preferences > Health in the OpenMetadata UI. This page shows the status of the embedding provider
connection and will flag any misconfiguration.
Generating Embeddings
Once Semantic Search is enabled, embeddings are generated and kept in sync automatically as entities are created or updated. To generate embeddings for all existing entities, run a Reindex from the OpenMetadata UI (Settings > Applications > Search Indexing).
Each Reindex operation uses a fingerprint of the entity's text representation: if the text has not changed since
the last embedding was computed, the embedding is not recomputed. This avoids unnecessary calls to the
embedding provider and makes re-indexing efficient even for large catalogs.
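The fingerprint check can be sketched as follows. This is an assumption-laden illustration rather than the actual implementation: SHA-256 as the fingerprint function, the in-memory store, and the fake_embed stand-in for the embedding provider are all invented for the example.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# entity id -> (fingerprint of the embedded text, embedding vector)
Store = Dict[str, Tuple[str, List[float]]]


def fingerprint(text: str) -> str:
    """Stable digest of the text representation (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reindex(
    texts: Dict[str, str],
    store: Store,
    embed: Callable[[str], List[float]],
) -> int:
    """Embed only entities whose text changed; return the number of provider calls."""
    calls = 0
    for entity_id, text in texts.items():
        fp = fingerprint(text)
        cached = store.get(entity_id)
        if cached is not None and cached[0] == fp:
            continue  # text unchanged since the last embedding, skip the call
        store[entity_id] = (fp, embed(text))
        calls += 1
    return calls


def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding provider call."""
    return [float(len(text))]


store: Store = {}
first = reindex({"a": "orders table", "b": "customers table"}, store, fake_embed)
second = reindex({"a": "orders table", "b": "customers table v2"}, store, fake_embed)
print(first, second)
```

In this run the first pass embeds both entities and the second pass embeds only the entity whose text changed.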
API Reference
Semantic Search exposes a REST API endpoint for vector queries:
POST /api/v1/search/vector/query
Performs a semantic search against the vector index.
Request Body:
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | (required) | Natural language search text |
| filters | map | {} | Filters on entity type, owners, tags, domains, tier, service type, certification, or custom properties |
| size | int | 10 | Number of distinct entities to return (max 100) |
| k | int | 500 | KNN parameter: number of nearest neighbors to consider (max 10,000) |
| threshold | double | 0.0 | Minimum similarity score to include in results |
The response contains up to size distinct entities, even if an entity has
multiple text chunks.
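As a hedged sketch of calling this endpoint, the helper below builds the POST request with Python's standard library and enforces the documented maximums before anything is sent. The base URL, the Bearer token header, the filter key entityType with a list value, and the lower bound of 1 for size and k are assumptions for illustration; check them against your deployment.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_vector_query(
    base_url: str,
    token: str,
    query: str,
    filters: Optional[Dict[str, Any]] = None,
    size: int = 10,
    k: int = 500,
    threshold: float = 0.0,
) -> urllib.request.Request:
    """Build (but do not send) a request for POST /api/v1/search/vector/query."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    if not 1 <= k <= 10_000:
        raise ValueError("k must be between 1 and 10,000")
    body = {
        "query": query,
        "filters": filters or {},
        "size": size,
        "k": k,
        "threshold": threshold,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/search/vector/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
        method="POST",
    )


request = build_vector_query(
    "http://localhost:8585",  # placeholder server address
    "<jwt-token>",  # placeholder credential
    "tables with customer demographics and purchase history",
    filters={"entityType": ["table"]},  # hypothetical filter shape
    size=5,
)
# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Validating size and k on the client side gives a clear error message instead of a rejected request when a caller exceeds the documented limits.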
Troubleshooting
Semantic Search returns no results
- Verify that SEMANTIC_SEARCH_ENABLED is set to true and the server has been restarted.
- Confirm that OpenSearch is your search backend (Elasticsearch is not supported).
- Check that the vector_search_index exists in OpenSearch.
- Run a Reindex to generate embeddings for existing entities.
Embedding generation fails
- Verify network connectivity from the OpenMetadata server to your embedding provider.
- Check that API keys and credentials are correct.
- Review the OpenMetadata server logs for detailed error messages.