Kubernetes Native Orchestrator
Starting with OpenMetadata 1.12, you can run ingestion pipelines directly on native Kubernetes, eliminating the need for Apache Airflow. This is ideal for organizations that:
- Already run workloads on Kubernetes and prefer native solutions
- Don’t need the full feature set of Apache Airflow
Orchestration Modes
The Kubernetes orchestrator supports two modes for running ingestion pipelines.
Option 1: OMJob Operator (Recommended)
Uses custom Kubernetes CRDs (OMJob and CronOMJob) managed by the OpenMetadata operator.
| Resource | Description |
|---|---|
| CronOMJob | Scheduled pipelines - runs on a cron schedule |
| OMJob | On-demand pipelines - one-off execution when triggered |
Benefits:
- Exit Handler Guarantee: Even if the ingestion pod crashes (OOMKilled, node failure, etc.), the operator ensures pipeline status is always reported back to OpenMetadata
- Failure Diagnostics: Automatically collects detailed error context from pod logs and events when pipelines fail
- Pod Lifecycle Monitoring: The operator watches pod events and updates pipeline status in real-time
Requires:
- Elevated permissions to install Custom Resource Definitions (CRDs)
- The OMJob Operator deployment running in your cluster
Option 2: Native Kubernetes Jobs
Uses standard Kubernetes resources (Job and CronJob) without any custom CRDs.
| Resource | Description |
|---|---|
| CronJob | Scheduled pipelines - runs on a cron schedule |
| Job | On-demand pipelines - one-off execution when triggered |
Benefits:
- No CRD installation required - uses only built-in Kubernetes resources
- Works in environments with restricted permissions
- Simpler setup
Limitations:
- No guaranteed exit handler - if a pod is killed unexpectedly, status updates may not reach OpenMetadata
- No automatic failure diagnostics
Features
Native K8s Integration
Pipelines run as standard Kubernetes Jobs, making them easy to monitor with existing K8s tooling.
Automatic Status Updates
Pipeline status is automatically reported back to OpenMetadata, including success/failure details.
Failure Diagnostics
When pipelines fail, detailed diagnostics are collected from pod logs and events. (OMJob Operator only)
Resource Control
Configure CPU, memory, node selectors, and security contexts for ingestion pods.
Setup Option 1: OMJob Operator (Recommended)
This setup uses custom CRDs for guaranteed exit handler execution and failure diagnostics.
Prerequisites
- OpenMetadata deployed on Kubernetes (Helm chart recommended)
- Permissions to install CRDs in your cluster
- Ingestion image accessible from your cluster (docker.getcollate.io/openmetadata/ingestion-base)
Helm Values Configuration
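A sketch of the relevant Helm values. The type: "k8s" switch is the one referenced in the migration section below; the nesting under pipelineServiceClientConfig and the operator-mode keys are assumptions — check the values reference shipped with your chart version.

```yaml
openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      # Switch the pipeline service client from Airflow to Kubernetes
      type: "k8s"
      # Hypothetical keys — exact names may differ by chart version
      k8s:
        useOperator: true          # select the OMJob Operator mode
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base"
```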
Required RBAC Permissions
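A sketch of the extra rules for the OMJob custom resources. The API group shown here is an assumption — verify the real group and resource plurals against the installed CRDs (kubectl get crd | grep omjob):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: openmetadata-omjob-manager
  namespace: openmetadata              # assumed namespace
rules:
  - apiGroups: ["openmetadata.io"]     # assumed CRD API group
    resources: ["omjobs", "cronomjobs"]
    verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
```

Bind this Role to the OpenMetadata service account with a matching RoleBinding, alongside the Job/CronJob permissions described for Option 2.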
When using the OMJob Operator, additional permissions are needed for the custom resources.
Setup Option 2: Native Kubernetes Jobs
This setup uses standard Kubernetes Jobs and CronJobs without any custom CRDs.
Prerequisites
- OpenMetadata deployed on Kubernetes (Helm chart recommended)
- RBAC permissions for the OpenMetadata service account to manage Jobs, CronJobs, ConfigMaps, and Secrets
- Ingestion image accessible from your cluster (docker.getcollate.io/openmetadata/ingestion-base)
Helm Values Configuration
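The same type: "k8s" switch applies; no operator installation is needed in this mode. As above, the key names other than type are assumptions — consult your chart's values reference.

```yaml
openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      type: "k8s"
      # Hypothetical keys — exact names may differ by chart version
      k8s:
        useOperator: false         # plain Jobs/CronJobs, no CRDs
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base"
```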
Required RBAC Permissions
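A sketch of the namespaced rules covering the resources this mode manages (Jobs, CronJobs, ConfigMaps, Secrets, plus pod log access for the UI log viewer). The Role name and namespace are assumptions:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: openmetadata-pipeline-manager
  namespace: openmetadata            # assumed namespace
rules:
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["create", "get", "list", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]  # needed to surface pipeline logs in the UI
    verbs: ["get", "list", "watch"]
```

Bind the Role to the OpenMetadata service account with a RoleBinding in the same namespace.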
Validating the Setup
1. Check Service Health
Navigate to Settings → Preferences → Health in the OpenMetadata UI to verify the Kubernetes pipeline client is properly configured and can connect to the Kubernetes API.
2. Deploy a Test Pipeline
Create a simple metadata ingestion pipeline from the OpenMetadata UI. The pipeline should:
- Show “Deployed” status
- Display the Kubernetes Job/CronJob name
3. Check Kubernetes Resources
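You can confirm the resources exist with kubectl. The namespace is an assumption — use the one OpenMetadata deploys pipelines into; the omjobs/cronomjobs resource names depend on the installed CRDs:

```shell
# Option 2 (native resources)
kubectl get cronjobs,jobs -n openmetadata
kubectl get pods -n openmetadata -w      # watch ingestion pods as they run

# Option 1 (OMJob Operator) — check the custom resources instead
kubectl get omjobs,cronomjobs -n openmetadata
```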
Pipeline Logs
Pipeline logs are retrieved directly from Kubernetes pod logs. OpenMetadata implements log pagination for large log files, splitting them into ~1MB chunks for efficient retrieval. To view logs:
- Navigate to Settings → Services → Agents
- Select your pipeline
- Click on Logs to view them directly in the OpenMetadata UI
Troubleshooting
Pipeline stuck in “Queued” state
If the pipeline cannot start and remains in “Queued” state, check whether the pod can be scheduled. Common causes:
- Image pull errors (check imagePullSecrets)
- Insufficient cluster resources (increase CPU/memory limits or add nodes)
- Node selector constraints
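The scheduling failure reason usually appears in the pod description and namespace events (namespace assumed):

```shell
kubectl describe pod <ingestion-pod-name> -n openmetadata
kubectl get events -n openmetadata --sort-by=.lastTimestamp
```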
Permission Denied Errors
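A quick way to confirm whether the service account can manage pipeline resources — the service-account and namespace names here are assumptions:

```shell
kubectl auth can-i create jobs -n openmetadata \
  --as=system:serviceaccount:openmetadata:openmetadata
kubectl auth can-i create cronjobs -n openmetadata \
  --as=system:serviceaccount:openmetadata:openmetadata
```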
If you see RBAC-related errors, verify that the OpenMetadata service account has the permissions listed in the RBAC sections above.
Ingestion Pod Crashes (OOMKilled)
Increase the memory limits for the ingestion pods in your Helm values.
CronJob Not Triggering
Check CronJob status and events. Common causes:
- Invalid cron expression
- startingDeadlineSeconds too short
- Concurrency policy blocking execution
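The CronJob description shows the schedule, last schedule time, and concurrency policy; events capture skipped or failed runs (namespace assumed):

```shell
kubectl describe cronjob <pipeline-cronjob-name> -n openmetadata
kubectl get events -n openmetadata \
  --field-selector involvedObject.kind=CronJob
```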
Migrating from Airflow
If you’re migrating from Airflow to the Kubernetes orchestrator:
1. Stop existing Airflow-managed pipelines - Disable or delete pipelines managed by Airflow
2. Update Helm values - Switch type: "airflow" to type: "k8s"
3. Redeploy OpenMetadata - Apply the new Helm configuration
4. Re-deploy pipelines - Navigate to each pipeline and click “Deploy” to create the Kubernetes resources
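Step 3 is a standard Helm upgrade; the release and namespace names below are assumptions — match your installation:

```shell
helm repo update
helm upgrade openmetadata open-metadata/openmetadata \
  -n openmetadata -f values.yaml
```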
Comparison: Airflow vs Kubernetes Orchestrator
| Feature | Airflow | K8s Native | K8s with OMJob Operator |
|---|---|---|---|
| Infrastructure | Requires Airflow deployment | Uses existing K8s cluster | Uses existing K8s cluster |
| CRD Installation | N/A | Not required | Required |
| Exit Handler Guarantee | ✅ Airflow handles | ❌ Best effort | ✅ Guaranteed |
| Failure Diagnostics | ❌ | ❌ | ✅ |
| UI for DAGs | ✅ Airflow UI | OpenMetadata UI | OpenMetadata UI |
| Resource efficiency | Always running | Jobs on-demand | Jobs on-demand |
| K8s-native monitoring | Extra setup | ✅ Native | ✅ Native |