Latest Release 🎉
You can find the GitHub release here.
What’s New
MCP Services
MCP (Model Context Protocol) is now a first-class service category with service entities, server entities, execution logs, test-connection support, REST resources, and UI pages.
- Usage analytics expose summary, history, tool breakdown, user breakdown, and current-user usage
- MCP OAuth now supports SAML SSO authentication
- Client secrets are not issued to public clients
- `get_entity_details` now surfaces custom properties in responses
Knowledge Graph and RDF
Requires Apache Jena. Run the RDF Knowledge Graph Index App after upgrade for first-time users.
- Distributed RDF indexing with state tables for jobs, partitions, locks, and server stats
- Glossary membership scoping, relation cleanup, distributed mode, and compaction
- Revamped graph with custom nodes, relation details, and distributed indexing status
Search Index Performance and Live Indexing
- Tunable settings: refresh interval, replica count, translog durability, sync interval, and per-entity overrides
- Per-stage reindex timing metrics for reader, process, sink, and vector stages
- Live indexing retries on failure with a dead-letter queue for failed items
- Search results can be exported to CSV from the Explore page under Tools
Ontology Explorer
New first-class governance page at `/governance/ontology` with graph filters, layout controls, side-panel entity details, and export controls.
Typed Glossary Term Relations
New relation types: `relatedTo`, `synonym`, `antonym`, `broader`, `narrower`, `partOf`, `hasPart`, `calculatedFrom`, `usedToCalculate`, `seeAlso`
- New governance settings page to manage relation types
- Relation badges, filters, and graph views throughout Glossary UI
- Concept mappings for external IRIs and SKOS-style relation types
- APIs for relation usage counts, asset counts, batch fetch, add/remove, and relation graph
Data Marketplace
- New sidebar and routes at `/data-marketplace`, `/data-marketplace/domains`, and `/data-marketplace/data-products`
- Customizable landing page with widgets for domains, data products, announcements, and search
AI and Hybrid Search
- Google Gemini embedding provider with configurable dimensions and endpoint override
- OpenAI NLQ: `modelId`, request timeouts, max tokens, and temperature now configurable
- Hybrid search tuning: keyword/semantic weights, RRF settings, semantic score threshold, highlight fragment size
- `textToLLMContext` and vector body-text extension hooks
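For readers unfamiliar with RRF (reciprocal rank fusion), the idea behind the keyword/semantic weights can be sketched as follows. This is a generic illustration of weighted RRF, not the release's implementation: the function name, parameter names, and the default rank constant of 60 are assumptions chosen for the example.

```python
from collections import defaultdict


def weighted_rrf(keyword_ids, semantic_ids, keyword_weight=0.5,
                 semantic_weight=0.5, rank_constant=60):
    """Fuse two ranked ID lists with weighted reciprocal rank fusion.

    All names and defaults here are hypothetical; they are not the
    product's configuration keys.
    """
    scores = defaultdict(float)
    for weight, ranked in ((keyword_weight, keyword_ids),
                           (semantic_weight, semantic_ids)):
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes weight / (k + rank) for every hit it returns.
            scores[doc_id] += weight / (rank_constant + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fused = weighted_rrf(
    ["orders", "customers", "payments"],   # keyword ranking
    ["customers", "invoices", "orders"],   # semantic ranking
)
print(fused)
```

Items that rank well in both lists rise to the top, and shifting weight toward one list biases the fused order toward that retriever.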
Data Quality and Profiler
- Dynamic and static sampling via `profileSampleConfig`
- Explicit metrics selection per profiler run
- Top-dimension controls for dimensional test cases
- Bulk add and select-all for logical and bundle test suites
- Dashboard widgets and filters: data products, certification, incident status, tiers, entity health
- Storage auto-classification for containers with language-aware recognizer selection
- Deterministic MySQL median behavior
Governance and Workflows
- Data-contract references across data assets and service entities
- Workflow triggers extended: data product, data contract, glossary terms, input ports, output ports
- Approval tasks show proposed changes with clickable entity links, domain stamped on creation
- Self-approval prevention for workflow change requests
- New Archived entity status
New Connectors
| Connector | Type | Highlights |
|---|---|---|
| Microsoft Fabric | Database & Pipeline | Lineage, usage, and profiler |
| Google Drive | Storage | Ingestion connector and example workflow |
| Pub/Sub | Messaging | Test-connection support |
| QuestDB | Database | — |
| IOMETE | Database | — |
| SAP SuccessFactors | Database | — |
| SAP S/4HANA | Dashboard | — |
| Matillion Data Cloud | Pipeline | — |
| Airflow 3.x | Pipeline | API-based connector; constraints upgraded to 3.2.1 |
Connector Improvements
- Snowflake: opt-in `ACCESS_HISTORY` lineage path; queries chunked by day to avoid timeouts
- Unity Catalog: incremental metadata extraction, only fetching changed entities since last run
- SSRS: report-to-dataset lineage
- Metabase: chart-level lineage extraction
- OpenLineage: Glue, Kusto, Cosmos DB naming; symlinks facet for Iceberg; pipeline node for single-sided lineage
- Storage: compressed archive ingestion (ZIP, tar, gzip) in S3, ADLS, GCS; Redis caching for container ancestors
- MySQL: `queryHistoryTable` option; GCP Cloud SQL IAM support
- Athena: `catalogId` for S3 Tables and cross-account Glue
- Oracle: `preserveIdentifierCase` and `useDBATable` options
- S3, ADLS, GCS: profiling capability flags; REST connector S3 and SSL config
Platform, Cache, and Operability
- Read-bundle prefetch and cache warmup for tags, certifications, relationships, containers, and ancestors
- Redis: cache metrics, distributed warmup, per-command timeout defaulting to 300 ms
- Deadlock retry handling and reduced write deadlocks
- JSON log format via `LOG_FORMAT=json`, streamable logs, non-blocking handlers
- QoS request admission enabled by default via `QOS_*` settings
- CSP nonce handling and web security headers: COEP, CORP, COOP
- Regenerate-bot-tokens for JWT key rotation
- `db-tune` ops subcommand and production RDS runbook
- Diagnostics v2 framework; legacy `ExecutionTimeTracker` removed
Columns as Independent Entities
Columns are now indexed as independent entities. They appear in asset counts and are the default entity shown in Explore when selecting a database service.
Upgrade Notes and Breaking Changes
Connector and Ingestion Changes
- Iceberg connector removed: services migrated to `CustomDatabase`, pipelines hard-deleted. Update any YAML or automation referencing `serviceType: Iceberg`
- Databricks/Unity Catalog scheme changed from `databricks+connector` to `databricks`. Stored configs are migrated; external YAMLs must be updated manually
- Profiler sampling changed to `profileSampleConfig`. Old fields `profileSample`, `profileSampleType`, `samplingMethodType`, and `computeMetrics` are removed
- `randomizedSample` defaults now explicitly `false` in migrated configs
- Python ingestion targets 3.10, 3.11, 3.12. Key deps: SQLAlchemy 2.x, pandas 2.1.x, pyodbc 5.3.x, Airflow 3.2.1, Databricks SQLAlchemy 2.x
- Storage manifest `partitionColumns` uses a smaller partition-column shape
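Externally managed YAML is not migrated automatically, so it can help to scan it before upgrading. The sketch below walks an already-parsed config (a plain dict, for example the result of a YAML loader) and flags the removed profiler fields, the `serviceType: Iceberg` reference, and the old `databricks+connector` scheme value. The function name and the sample config layout are hypothetical; only the field names and values come from the notes above.

```python
REMOVED_PROFILER_FIELDS = {
    "profileSample", "profileSampleType", "samplingMethodType", "computeMetrics",
}


def find_upgrade_issues(node, path="$"):
    """Recursively collect upgrade warnings from a parsed config structure."""
    issues = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in REMOVED_PROFILER_FIELDS:
                issues.append(f"{child}: removed field, use profileSampleConfig")
            elif key == "serviceType" and value == "Iceberg":
                issues.append(f"{child}: Iceberg connector was removed")
            elif value == "databricks+connector":
                # Matched by value only; the key holding the scheme is not assumed.
                issues.append(f"{child}: scheme is now 'databricks'")
            issues.extend(find_upgrade_issues(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            issues.extend(find_upgrade_issues(item, f"{path}[{index}]"))
    return issues


# Hypothetical config layout, used only to exercise the scanner.
config = {
    "source": {
        "serviceType": "Iceberg",
        "sourceConfig": {"config": {"profileSample": 50,
                                    "profileSampleType": "PERCENTAGE"}},
    }
}
issues = find_upgrade_issues(config)
for issue in issues:
    print(issue)
```

The scanner only reports locations; how each removed field maps into `profileSampleConfig` should be taken from the official schema rather than inferred from this example.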
API and Schema Changes
- Feed APIs no longer accept `from` in `createThread` or `createPost`; remove it from client payloads
- Search payloads removed the `semanticSearch` boolean
- Application schemas renamed `preview` to `enabled` with inverted meaning; custom app manifests must use `enabled`
- Webhook moved from `secretKey` to `authType` object (no auth / bearer / OAuth2)
- Custom property names must start alphanumeric and cannot contain `/` or `~`
- Glossary `relatedTerms` changed to typed `TermRelation` objects; existing data migrated to `relatedTo`
- `entity_relationship` primary key now includes `relationType`
- Logical-suite add endpoint deprecated; use `PUT /api/v1/dataQuality/testCases/logicalTestCases/bulk`
- Bulk Assets `dryRun` now enforced for tag, glossary, dataProduct, and team removes
- New Archived entity status; update any hard-coded status enums
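The custom property naming rule can be pre-checked on the client before calling the API. This is a minimal sketch of the rule as stated above, assuming "alphanumeric" means ASCII letters and digits; the server may apply additional constraints (length limits, other reserved characters) that are not covered here, and the function name is hypothetical.

```python
import re

# First character must be an ASCII letter or digit (assumption); the rest may be
# anything except "/" and "~".
_CUSTOM_PROPERTY_NAME = re.compile(r"[A-Za-z0-9][^/~]*")


def is_valid_custom_property_name(name: str) -> bool:
    """Return True if the name satisfies the two rules from the release notes."""
    return _CUSTOM_PROPERTY_NAME.fullmatch(name) is not None


for candidate in ["data.owner", "9lives", "_hidden", "cost/center", "a~b", ""]:
    print(repr(candidate), is_valid_custom_property_name(candidate))
```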
Operational Notes
- Postgres `fqnHash` `text_pattern_ops` indexes added or replaced; runbook included in the migration file if the build is interrupted
- New tables for MCP services, servers, executions, RDF indexing jobs, partitions, locks, and server stats
- `SERVER_CHANGE_LOG` historical gaps backfilled; missing entries caused data-insights timeline holes
- Profiler pipeline cleanup force-executed on upgrade to clear stuck pre-1.13 state
- `LOG_FORMAT=json` now supported; review any custom Dropwizard logging config
- QoS admission enabled by default; check `QOS_*` settings if adjustment needed
- Redis per-command timeout defaults to 300 ms; tune for slow Redis deployments
Bug Fixes
Search and Reindexing
- Fixed nested children causing Elasticsearch/OpenSearch mapping-depth failures
- Fixed stale file-extension aggregation on v1.13.0 upgrade causing 500 errors on file search
- Fixed stale flattened-children highlight field on v1.13.0 upgrade causing 500 errors on container search
- Fixed `search_after` silently dropping entities when sort value contains a comma
- Fixed query, worksheet, and file reindexing missing relationship fields
- Fixed search-index alias resolution for entity-specific and OpenSearch cluster prefixes
- Fixed batch-prefetch of upstream lineage leaking Hikari connections during bulk reindex
- Fixed soft-delete propagation to time-series child aliases
- Fixed clean reindex jobs incorrectly marked failed when only warnings existed
- Fixed text-field sorting and aggregation `.keyword` resolution
- Fixed user index searches on nested owners queries
- Fixed advanced-search Contains and Not Contains operators for description field
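The `search_after` fix above is an instance of a common delimiter bug. The notes do not say how the cursor was serialized, so the following is only an illustration of that class of problem, assuming a comma-joined cursor; all function names are hypothetical.

```python
import json


def encode_naive(sort_values):
    """Join sort values with commas (breaks when a value contains a comma)."""
    return ",".join(str(value) for value in sort_values)


def decode_naive(cursor):
    return cursor.split(",")


def encode_json(sort_values):
    """Serialize sort values as a JSON array so value boundaries survive."""
    return json.dumps(sort_values)


def decode_json(cursor):
    return json.loads(cursor)


sort_values = ["Sales, EMEA", "table-42"]

naive = decode_naive(encode_naive(sort_values))
safe = decode_json(encode_json(sort_values))

print(naive)  # three parts: the comma inside the first value split it in two
print(safe)   # the original two values
```

With the naive round trip the cursor no longer matches the sort key of the last hit, so the next page can start at the wrong position and skip results.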
Glossary, Tags, and Governance
- Fixed glossary relation rendering for multiple relation types between the same term pair
- Fixed related-term tooltip sanitization and relation badge colors and icons
- Fixed tag rename and relationship cache invalidation
- Fixed `TagLabel` server fields lost when saving tags
- Fixed certification tags leaking into regular tags and missing `appliedBy` audit trail
- Fixed soft-deleted users appearing in experts and reviewers selectors
- Fixed hyperlink workflow rules and Tags/Tier field ambiguity
Data Quality and Profiler
- Fixed test-case suite search membership preservation
- Fixed tier and certification filter queries in Data Quality dashboard
- Fixed incident manager status and severity chip behavior
- Fixed `TableColumnCountToBeBetween` API responses
- Fixed column profile percentages showing 0% for zero proportions
- Fixed `tableCustomSQLQuery` ignoring `computePassedFailedRowCount` flag
- Fixed orphan test cases breaking search indexing
- Fixed sample randomization at 100% sample
Ingestion and Connectors
- Fixed single bad table aborting entire schema ingestion run
- Fixed Snowflake and OpenMetadata socket waits causing silent hangs
- Fixed Power BI lineage buffer flushing, TSQL `Sql.Database` parsing, and workspace cache scope
- Fixed Databricks nested column descriptions and SQLAlchemy 2.x compatibility
- Fixed Databricks and Unity Catalog valueless tags being silently dropped
- Fixed Datalake JSON columns typed as string for empty dict values
- Fixed MySQL profiler median query quoting and deterministic behavior
- Fixed Redshift interval, numeric, and timestamp precision parsing, view definition, IAM auth, and LISTAGG errors
- Fixed Oracle, MSSQL, Athena, and Redshift profiler under SQLAlchemy 2.0
- Fixed dbt column tags, snapshot model patching, compiled-only test results, and test entity links
- Fixed SQL Server temporal-table period columns classified as PII
- Fixed SQLAlchemy engine resource leak on multi-database source iteration
- Fixed ADLS object counts scoped to configured sub-path
- Fixed PII recognizer selection based on configured language
- Fixed runtime spaCy model loading for non-root containers
UI and UX
- Fixed unknown service categories returning 404
- Fixed Explore page column icon display, search term warnings, and text overflow
- Fixed lineage edge misalignment, edge hover, temporary lineage table nodes, and service nodes
- Fixed table constraints UI and cluster-key constraint display and editing
- Fixed dotted custom-property names display
- Fixed custom relation badge color handling and overlapping badges
- Fixed activity feed, task notification refresh, and approval task rendering
- Fixed MSAL and SAML token renewal and Safari SSO session loss
- Fixed copy-to-clipboard in non-secure contexts
- Fixed charts not deleted when parent dashboard or service is deleted
- Fixed `column.extension` values silently dropped on entity creation
Security and Dependencies
- AWS SDK pinned to 2.41.30 — clears CloudFront CVE
- Airflow upgraded to 3.2.1 — clears 7 CVEs
- gnutls, libcap, openssh, and rsync CVEs closed in ingestion Docker images
- Test-connection workflow triggers now require authorization
- Python ingestion: explicit `jsonify` at route level to break XSS taint chain
- Axios, dompurify, follow-redirects, and related UI CVE fixes
- Jetty and pac4j upgraded for Java-side CVEs