Semantic Search MCP Tool
Overview
The Semantic Search tool extends OpenMetadata’s MCP Server with the ability to perform natural language vector searches against your metadata catalog. Instead of relying on exact keyword matches, this tool uses vector embeddings to understand the semantic meaning of queries, returning results that are conceptually relevant even if they don’t contain the exact search terms. For example, a query like “tables storing customer purchase behavior” can surface tables named order_transactions or buyer_activity that are semantically related but don’t share the same keywords.
Semantic Search requires OpenSearch as the search backend and must be enabled in your deployment
configuration before it can be used through MCP.
How It Works
When an AI assistant calls the Semantic Search tool through MCP:
- The natural language query is sent to the OpenMetadata server’s vector search endpoint (/api/v1/search/vector/query).
- The query text is converted into a vector embedding using the configured embedding provider (OpenAI, AWS Bedrock, or DJL).
- OpenSearch performs a KNN (K-Nearest Neighbor) similarity search against pre-computed entity embeddings.
- Results are deduplicated by entity and returned to the AI assistant with metadata including entity type, fully qualified name, owners, tags, and similarity score.
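The similarity search and deduplication steps above can be illustrated with a small, self-contained Python sketch. This is not OpenMetadata's or OpenSearch's implementation: it is a brute-force scan over toy three-dimensional vectors, and the function names, entity names, and embedding values are all hypothetical. It assumes an entity may have several embedded chunks and keeps only the best-scoring chunk per entity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k most similar entities, one entry per entity.

    `chunks` holds (fully_qualified_name, embedding) pairs; an entity can
    appear more than once, so only its highest score is kept.
    """
    best: Dict[str, float] = {}
    for fqn, embedding in chunks:
        score = cosine_similarity(query_vec, embedding)
        if fqn not in best or score > best[fqn]:
            best[fqn] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical pre-computed embeddings (real ones have hundreds of dimensions).
toy_chunks = [
    ("sales.order_transactions", [0.9, 0.1, 0.0]),
    ("sales.order_transactions", [0.8, 0.2, 0.1]),
    ("crm.buyer_activity", [0.7, 0.3, 0.0]),
    ("hr.employee_payroll", [0.0, 0.1, 0.9]),
]
# Hypothetical embedding of "tables storing customer purchase behavior".
toy_query = [1.0, 0.0, 0.0]

results = knn_search(toy_query, toy_chunks, k=2)
for fqn, score in results:
    print(f"{fqn}: {score:.3f}")
```

With this toy data, the two purchase-related tables rank above the payroll table even though none of the names share a keyword with the query, which is the behavior described in the Overview. A production search engine would not scan every vector like this sketch does.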
Prerequisites
- OpenMetadata v1.12.x or later
- Semantic Search enabled and configured in your deployment
- The MCP Application installed in OpenMetadata
- A Personal Access Token for authentication
Using the Semantic Search Tool
The Semantic Search tool is automatically available through the MCP Server when Semantic Search is enabled in your deployment. AI assistants connected via MCP can call it directly.
Tool: search_metadata (with Semantic Search)
When Semantic Search is enabled, the existing search_metadata MCP tool is enhanced with vector search capabilities.
The AI agent’s natural language query is used for semantic similarity matching, providing more accurate and contextually
relevant results.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Natural language search text |
| entity_type | string | No | Filter by entity type (table, dashboard, pipeline, etc.) |
| limit | integer | No | Maximum number of results to return (default: 10) |
| fields | string | No | Comma-separated additional fields to include |
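As a rough illustration of how these parameters travel over MCP, the sketch below builds a JSON-RPC 2.0 tools/call message for search_metadata. The envelope (jsonrpc, id, method, params.name, params.arguments) follows the general MCP tool-call shape, and the argument names come from the table above. The helper name build_search_request is hypothetical, and the exact payload your MCP client sends may differ.

```python
import json
from typing import Any, Dict, Optional


def build_search_request(
    query: str,
    entity_type: Optional[str] = None,
    limit: Optional[int] = None,
    fields: Optional[str] = None,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build an MCP tools/call message for the search_metadata tool.

    Only `query` is required; optional arguments are omitted when not set,
    so the server applies its own defaults (for example, limit = 10).
    """
    if not query.strip():
        raise ValueError("query must be a non-empty string")

    arguments: Dict[str, Any] = {"query": query}
    if entity_type is not None:
        arguments["entity_type"] = entity_type
    if limit is not None:
        arguments["limit"] = limit
    if fields is not None:
        arguments["fields"] = fields

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_metadata", "arguments": arguments},
    }


request = build_search_request(
    "tables storing customer purchase behavior",
    entity_type="table",
    limit=5,
)
print(json.dumps(request, indent=2))
```

In practice the AI assistant's MCP client assembles this message for you; the sketch is only meant to show which parts of a call map to the parameters in the table.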
Example Prompts
Once your MCP client is connected and Semantic Search is enabled, try these natural language prompts with your AI assistant:
Find related data assets:
Sample MCP Request
Under the hood, when an AI assistant calls the semantic search tool, it sends an MCP request like:
Sample Response
Keyword Search vs Semantic Search
| Feature | Keyword Search | Semantic Search |
|---|---|---|
| Matching | Exact keyword and text matching | Meaning-based similarity matching |
| Query style | Specific keywords and filters | Natural language questions |
| Results | Documents containing the search terms | Conceptually related documents |
| Search backend | OpenSearch or Elasticsearch | OpenSearch only |
| Configuration | Available by default | Requires enabling and an embedding provider |
Troubleshooting
The AI assistant doesn’t use Semantic Search
- Verify that SEMANTIC_SEARCH_ENABLED is set to true in your OpenMetadata deployment.
- Ensure entities have been embedded by running a Reindex from the OpenMetadata UI. See the deployment guide for details.
Results seem irrelevant
- Try rephrasing your query with more descriptive language.
- The quality of results depends on the embedding model. Consider using a higher-quality model like text-embedding-3-large if using OpenAI.
- Ensure entity descriptions and metadata are well-populated; richer metadata produces better embeddings.
No results returned
- Check that the vector_search_index exists in OpenSearch and contains documents.
- Verify the embedding provider is correctly configured and accessible.
- Review the OpenMetadata server logs for errors related to vector search.
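One way to run the first check is to ask OpenSearch how many documents the index holds through its Count API. The sketch below is a minimal example that assumes an unauthenticated cluster reachable at http://localhost:9200; that address is a placeholder, and a secured cluster would additionally need credentials and TLS settings that are not shown. The helper names are hypothetical.

```python
import json
import urllib.request


def count_url(base_url: str, index: str = "vector_search_index") -> str:
    """Build the Count API URL for an index."""
    return f"{base_url.rstrip('/')}/{index}/_count"


def parse_count(body: str) -> int:
    """Extract the document count from a Count API JSON response."""
    return int(json.loads(body)["count"])


def index_document_count(
    base_url: str,
    index: str = "vector_search_index",
    timeout: float = 10.0,
) -> int:
    """Query the cluster and return the number of documents in the index.

    urllib raises urllib.error.HTTPError if the request fails, for example
    with a 404 status when the index does not exist.
    """
    with urllib.request.urlopen(count_url(base_url, index), timeout=timeout) as resp:
        return parse_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder address; replace with your own OpenSearch endpoint.
    total = index_document_count("http://localhost:9200")
    print(f"vector_search_index contains {total} documents")
```

A count of zero, or a 404 error, suggests that entities have not been embedded yet and that a Reindex is needed.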