Profiler Module: Execution & Extension
This page covers the execution internals and extension points. For the architecture overview and metrics system, see Profiler Module | Architecture & Metrics System.
Profiler Execution Call Chain
Threading Model
The profiler uses a thread pool to parallelize metric computation across columns. Each worker thread uses its own QueryRunner instance (created by _create_thread_safe_runner()) with a dedicated database session. The thread count scales dynamically with the number of tasks, with a minimum of 5 and a maximum of 20 threads.
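The threading model above can be sketched with the standard library. This is a minimal illustration, not OpenMetadata's code: `FakeRunner`, `get_thread_runner()`, `compute_all()` and the clamping rule in `pick_worker_count()` are hypothetical, and the exact formula the profiler uses to scale between 5 and 20 threads is an assumption here. The point it shows is one runner (and therefore one session) per worker thread, stored in thread-local storage.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

MIN_WORKERS = 5   # lower bound quoted in the text
MAX_WORKERS = 20  # upper bound quoted in the text


def pick_worker_count(num_tasks: int) -> int:
    """Assumed scaling rule: one thread per task, clamped to [5, 20]."""
    return max(MIN_WORKERS, min(MAX_WORKERS, num_tasks))


class FakeRunner:
    """Hypothetical stand-in for a QueryRunner bound to its own DB session."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()  # thread that created this runner

    def run(self, column: str) -> Dict[str, object]:
        # A real runner would execute SQL here; this one just reports its owner.
        return {"column": column, "runner_thread": self.owner}


_local = threading.local()


def get_thread_runner() -> FakeRunner:
    """Lazily create exactly one runner per worker thread; never share it."""
    if not hasattr(_local, "runner"):
        _local.runner = FakeRunner()
    return _local.runner


def compute_all(columns: List[str]) -> List[Dict[str, object]]:
    with ThreadPoolExecutor(max_workers=pick_worker_count(len(columns))) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(lambda col: get_thread_runner().run(col), columns))
```

Keeping the runner in thread-local storage lets each worker reuse one session for every column it handles while guaranteeing no session is used by two threads at once, which matters because SQLAlchemy sessions are not designed for concurrent access.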
Metric Filtering
Not all metrics apply to all columns. MetricFilter (processor/metric_filter.py) selects metrics based on:
- Column data type: numeric metrics skip string columns (using the classifiers in orm/registry.py, such as is_quantifiable() and is_concatenable())
- Global profiler config: an admin can enable or disable specific metrics
- Table-level config: per-table metric overrides
- Database service type: some metrics only work on specific databases
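The four filtering criteria above can be combined in a small function. This is a hedged sketch rather than MetricFilter's actual logic: `MetricSpec`, `filter_metrics()` and the type-name sets are hypothetical, the string-based `is_quantifiable()`/`is_concatenable()` only mimic the real classifiers (which inspect SQLAlchemy types), and the rule that a table-level list replaces the global one is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical type-name sets; the real classifiers inspect SQLAlchemy types.
NUMERIC_TYPES = {"INT", "BIGINT", "FLOAT", "DECIMAL"}
STRING_TYPES = {"VARCHAR", "TEXT", "STRING"}


def is_quantifiable(col_type: str) -> bool:
    return col_type.upper() in NUMERIC_TYPES


def is_concatenable(col_type: str) -> bool:
    return col_type.upper() in STRING_TYPES


@dataclass
class MetricSpec:
    name: str
    needs_numeric: bool = False          # e.g. mean, stddev
    needs_string: bool = False           # e.g. max length
    services: Optional[Set[str]] = None  # None means "any service"


def filter_metrics(
    metrics: List[MetricSpec],
    col_type: str,
    service: str,
    global_enabled: Optional[Set[str]] = None,
    table_enabled: Optional[Set[str]] = None,
) -> List[str]:
    # Assumption: a table-level list, when present, replaces the global one.
    allowed = table_enabled if table_enabled is not None else global_enabled
    selected = []
    for m in metrics:
        if allowed is not None and m.name not in allowed:
            continue  # disabled by configuration
        if m.needs_numeric and not is_quantifiable(col_type):
            continue  # numeric metric on a non-numeric column
        if m.needs_string and not is_concatenable(col_type):
            continue  # string metric on a non-string column
        if m.services is not None and service not in m.services:
            continue  # not supported by this database
        selected.append(m.name)
    return selected
```

Ordering the checks from configuration to column type to database keeps each rule independent, so adding a new criterion is one more `continue` branch.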
ORM Layer
The orm/ directory bridges SQLAlchemy types with OpenMetadata's type system:
- registry.py: PythonDialects maps service types to SQLAlchemy dialects; CustomTypes handles special types (UUID, BYTES, ARRAY)
- converter/: database-specific converters (BigQuery, Snowflake, etc.) that map SQLAlchemy column types to OpenMetadata column types
- functions/: custom SQL functions used by metrics (e.g., SumFn, LenFn) that handle dialect differences
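The idea behind dialect-aware functions such as LenFn can be shown without a database: register one renderer per dialect and fall back to a default. This dependency-free sketch is illustrative only; `sql_function()`, `render()` and the registry are hypothetical, and OpenMetadata's real SumFn/LenFn are not reproduced here. In SQLAlchemy itself, the idiomatic tool for this is the `sqlalchemy.ext.compiler` extension (subclass `FunctionElement` and register per-dialect handlers with `@compiles`).

```python
from typing import Callable, Dict

# Hypothetical registry: function name -> {dialect -> renderer}.
_RENDERERS: Dict[str, Dict[str, Callable[[str], str]]] = {}


def sql_function(name: str, dialect: str = "default"):
    """Register how `name` is rendered for one dialect."""
    def decorator(render_fn: Callable[[str], str]) -> Callable[[str], str]:
        _RENDERERS.setdefault(name, {})[dialect] = render_fn
        return render_fn
    return decorator


@sql_function("len")
def _len_default(arg: str) -> str:
    return f"LENGTH({arg})"  # e.g. SQLite, PostgreSQL


@sql_function("len", dialect="mssql")
def _len_mssql(arg: str) -> str:
    return f"LEN({arg})"  # SQL Server spells it LEN


def render(name: str, arg: str, dialect: str) -> str:
    """Use the dialect-specific renderer if one exists, else the default."""
    variants = _RENDERERS[name]
    return variants.get(dialect, variants["default"])(arg)
```

Metric code can then ask for "len" without knowing which database it is running against.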
Database Backend Pluggability
Each database can override the default profiler behavior with its own implementation under profiler/interface/sqlalchemy/{dialect}/.
Profiler Configuration
The profiler is configured via ProfilerProcessorConfig (api/models.py).
Adding a New Metric
Choose the metric type
Decide which base class fits your metric:
| If your metric… | Extend |
|---|---|
| Is a SQL aggregate (one value per column) | StaticMetric |
| Needs a full query (multiple rows) | QueryMetric |
| Is derived from other metrics (no DB query) | ComposedMetric |
| Needs both a query and prior metric results | HybridMetric |
| Uses window functions (percentiles) | WindowMetric |
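The difference between a StaticMetric and a ComposedMetric can be sketched with the standard library's sqlite3. This is a simplified stand-in, not the profiler's base classes: here `fn()` returns a raw SQL string instead of a SQLAlchemy expression, the class names mirror built-in metrics only loosely, and defining the null ratio as nulls over all rows is an assumption. Static metrics hit the database; the composed metric only reads results already collected.

```python
import sqlite3


class StaticMetric:
    """Stand-in base: fn() returns one SQL aggregate for a column."""
    name = ""

    def __init__(self, column: str) -> None:
        self.column = column

    def fn(self) -> str:
        raise NotImplementedError


class ComposedMetric:
    """Stand-in base: fn() derives a value from prior results, no query."""
    name = ""

    def fn(self, results: dict):
        raise NotImplementedError


class ValuesCount(StaticMetric):
    name = "valuesCount"

    def fn(self) -> str:
        return f"COUNT({self.column})"  # COUNT(col) ignores NULLs


class NullCount(StaticMetric):
    name = "nullCount"

    def fn(self) -> str:
        return f"SUM(CASE WHEN {self.column} IS NULL THEN 1 ELSE 0 END)"


class NullRatio(ComposedMetric):
    name = "nullRatio"

    def fn(self, results: dict):
        nulls = results["nullCount"] or 0  # SUM over zero rows is NULL
        total = nulls + (results["valuesCount"] or 0)
        return nulls / total if total else None


def profile_column(conn: sqlite3.Connection, table: str, column: str) -> dict:
    statics = [ValuesCount(column), NullCount(column)]
    # This sketch batches every static metric into one SELECT
    # (identifiers are interpolated directly: trusted input only).
    select_list = ", ".join(m.fn() for m in statics)
    row = conn.execute(f"SELECT {select_list} FROM {table}").fetchone()
    results = {m.name: value for m, value in zip(statics, row)}
    # Composed metrics run afterwards, using only the collected results.
    for composed in (NullRatio(),):
        results[composed.name] = composed.fn(results)
    return results
```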
Implement the metric class
Create a file in the appropriate metrics/ subdirectory. For a StaticMetric, implement fn() returning a SQLAlchemy expression. For a ComposedMetric, implement fn() using prior results.
Pandas / DataFrame Support
For non-SQL sources (data lakes, files), the profiler uses a Pandas-based interface with an accumulator pattern (metrics/pandas_metric_protocol.py).
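The accumulator pattern keeps a small, fixed-size state and folds each chunk of data into it, so the full dataset never needs to be in memory at once. The sketch below is an assumption-laden illustration: the method names `create_accumulator`, `update_accumulator` and `aggregate_accumulator` and the helper `run()` are hypothetical and may not match the real protocol, and plain Python lists stand in for DataFrame chunks so the example runs without pandas.

```python
from typing import Iterable, List, Optional, Tuple

Acc = Tuple[float, int]  # (running sum, non-null count)


class MeanAccumulator:
    """Hypothetical mean metric written in accumulator style."""

    def create_accumulator(self) -> Acc:
        return (0.0, 0)

    def update_accumulator(self, acc: Acc, chunk: Iterable[Optional[float]]) -> Acc:
        total, count = acc
        for value in chunk:
            if value is not None:  # skip nulls, as SQL AVG does
                total += value
                count += 1
        return (total, count)

    def aggregate_accumulator(self, acc: Acc) -> Optional[float]:
        total, count = acc
        return total / count if count else None


def run(metric: MeanAccumulator, chunks: Iterable[List[Optional[float]]]) -> Optional[float]:
    acc = metric.create_accumulator()
    for chunk in chunks:  # in practice, one DataFrame chunk at a time
        acc = metric.update_accumulator(acc, chunk)
    return metric.aggregate_accumulator(acc)
```

With real DataFrame chunks, the update step would typically use `Series.sum()` and `Series.count()`, both of which skip missing values by default.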
Key Design Patterns
| Pattern | Where | Why |
|---|---|---|
| Registry | Metrics enum, SystemMetricsRegistry, TypeRegistry | Central catalog with callable instantiation |
| Thread Pool | SQAProfilerInterface.get_all_metrics() | Parallelize metric computation across columns |
| Decorator | @add_props(table=table) | Dynamically attach table/column context to metric instances |
| Accumulator | PandasComputation protocol | Streaming computation for large DataFrames |
| Factory | ProfilerInterface.create(), ProfilerSourceFactory | Database-specific implementations via config |
| Strategy | Different _compute_*_metrics() methods dispatched by type | Each metric type has its own execution strategy |
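The Factory and Strategy rows can be illustrated together: a registry maps a service type to an interface class, and each class supplies its own SQL for a metric. Everything here is hypothetical (`BaseProfilerInterface`, `register()`, `create()` and `percentile_sql()` are not the real `ProfilerInterface.create()` API); only the two SQL spellings are real syntax, `PERCENTILE_CONT ... WITHIN GROUP` as used by PostgreSQL and Snowflake and `APPROX_QUANTILES` from BigQuery.

```python
from typing import Callable, Dict, Type


class BaseProfilerInterface:
    """Hypothetical default used when no database-specific class exists."""

    def percentile_sql(self, column: str, q: float) -> str:
        return f"PERCENTILE_CONT({q}) WITHIN GROUP (ORDER BY {column})"


_REGISTRY: Dict[str, Type[BaseProfilerInterface]] = {}


def register(service_type: str) -> Callable[[type], type]:
    """Class decorator that records a database-specific implementation."""
    def decorator(cls: type) -> type:
        _REGISTRY[service_type] = cls
        return cls
    return decorator


@register("bigquery")
class BigQueryProfilerInterface(BaseProfilerInterface):
    def percentile_sql(self, column: str, q: float) -> str:
        # round() avoids float artifacts such as 0.29 * 100 == 28.999...
        return f"APPROX_QUANTILES({column}, 100)[OFFSET({round(q * 100)})]"


def create(service_type: str) -> BaseProfilerInterface:
    """Factory: return the registered implementation, else the default."""
    return _REGISTRY.get(service_type, BaseProfilerInterface)()
```

Callers only ever see `BaseProfilerInterface`, so adding a database means registering one subclass that overrides the methods it needs.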
Key Files Quick Reference
| What you want to do | Start here |
|---|---|
| Understand the workflow pipeline | workflow/profiler.py |
| See how tables are fetched and filtered | profiler/source/metadata.py |
| Read the core profiler orchestration | profiler/processor/core.py → Profiler class |
| See how metrics execute in threads | profiler/interface/sqlalchemy/profiler_interface.py → get_all_metrics() |
| Read metric base classes | profiler/metrics/core.py |
| Browse all built-in metrics | profiler/metrics/registry.py |
| See a static metric implementation | profiler/metrics/static/mean.py |
| See a composed metric | profiler/metrics/composed/null_ratio.py |
| Understand column type filtering | profiler/orm/registry.py → is_quantifiable(), etc. |
| See how results are published | ingestion/sink/metadata_rest.py → write_profiler_response() |
| Add database-specific overrides | profiler/interface/sqlalchemy/{dialect}/ |