# Kubernetes Native Orchestrator

> Run ingestion pipelines using native Kubernetes Jobs and CronJobs without requiring Apache Airflow.

Starting with OpenMetadata 1.12, you can run ingestion pipelines directly on Kubernetes as native Jobs and CronJobs,
without deploying Apache Airflow. This approach suits organizations that:

* Already run workloads on Kubernetes and prefer native solutions
* Don't need the full feature set of Apache Airflow

## Orchestration Modes

The Kubernetes orchestrator supports two modes for running ingestion pipelines:

### Option 1: OMJob Operator (Recommended)

Uses custom resources (`OMJob` and `CronOMJob`) that are defined by CRDs and managed by the OpenMetadata operator.

| Resource      | Description                                            |
| ------------- | ------------------------------------------------------ |
| **CronOMJob** | Scheduled pipelines - runs on a cron schedule          |
| **OMJob**     | On-demand pipelines - one-off execution when triggered |

<Tip>
  **Recommended for production.** The OMJob Operator provides guaranteed exit handler execution and failure diagnostics.
</Tip>

**Advantages:**

* **Exit Handler Guarantee**: Even if the ingestion pod crashes (OOMKilled, node failure, etc.), the operator ensures pipeline status is always reported back to OpenMetadata
* **Failure Diagnostics**: Automatically collects detailed error context from pod logs and events when pipelines fail
* **Pod Lifecycle Monitoring**: The operator watches pod events and updates pipeline status in real time

**Requirements:**

* Elevated permissions to install Custom Resource Definitions (CRDs), which are cluster-scoped resources
* The OMJob Operator deployment running in your cluster

### Option 2: Native Kubernetes Jobs

Uses standard Kubernetes resources (`Job` and `CronJob`) without any CRDs.

| Resource    | Description                                            |
| ----------- | ------------------------------------------------------ |
| **CronJob** | Scheduled pipelines - runs on a cron schedule          |
| **Job**     | On-demand pipelines - one-off execution when triggered |

**Advantages:**

* No CRD installation required - uses only built-in Kubernetes resources
* Works in environments with restricted permissions
* Simpler setup

**Limitations:**

* No guaranteed exit handler - if a pod is killed unexpectedly, status updates may not reach OpenMetadata
* No automatic failure diagnostics

## Features

<CardGroup cols={2}>
  <Card title="Native K8s Integration" icon="cube">
    Pipelines run as standard Kubernetes Jobs, making them easy to monitor with existing K8s tooling.
  </Card>

  <Card title="Automatic Status Updates" icon="rotate">
    Pipeline status is automatically reported back to OpenMetadata, including success/failure details.
  </Card>

  <Card title="Failure Diagnostics" icon="bug">
    When pipelines fail, detailed diagnostics are collected from pod logs and events. **(OMJob Operator only)**
  </Card>

  <Card title="Resource Control" icon="sliders">
    Configure CPU, memory, node selectors, and security contexts for ingestion pods.
  </Card>
</CardGroup>

***

## Setup Option 1: OMJob Operator (Recommended)

This setup uses the OMJob Operator and its CRDs to provide guaranteed exit handler execution and failure diagnostics.

### Prerequisites

1. **OpenMetadata deployed on Kubernetes** (Helm chart recommended)
2. **Permissions to install CRDs** in your cluster
3. **Ingestion image** accessible from your cluster (`docker.getcollate.io/openmetadata/ingestion-base`)

### Helm Values Configuration

```yaml theme={null}
# Enable the OMJob Operator
omjobOperator:
  enabled: true
  image:
    repository: docker.getcollate.io/openmetadata/omjob-operator
    tag: "1.12.0"
    pullPolicy: IfNotPresent
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "500m"
      memory: "256Mi"

openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      type: "k8s"
      metadataApiEndpoint: http://openmetadata:8585/api

      k8s:
        # Use the OMJob Operator
        useOMJobOperator: true
        
        # Container image for ingestion jobs
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base:1.12.0"
        imagePullPolicy: "IfNotPresent"
        imagePullSecrets: ""
        
        # Service account for ingestion jobs
        serviceAccountName: "openmetadata-ingestion"
        
        # Job lifecycle settings
        ttlSecondsAfterFinished: 86400  # Keep completed jobs for 24 hours
        activeDeadlineSeconds: 7200      # Max 2 hour runtime
        backoffLimit: 3                  # Retry up to 3 times
        
        # Job history
        successfulJobsHistoryLimit: 3
        failedJobsHistoryLimit: 3
        
        # Pod security context
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          fsGroup: 1000
          runAsNonRoot: true
        
        # Resource limits
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "500m"
            memory: "1Gi"
        
        # Enable failure diagnostics (only works with OMJob Operator)
        enableFailureDiagnostics: true
        
        # RBAC - set to false if managed externally
        rbac:
          enabled: true
```

### Required RBAC Permissions

When using the OMJob Operator, additional permissions are needed for the custom resources:

```yaml theme={null}
rules:
  # Pod management for pipeline jobs and diagnostics
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # ConfigMaps for pipeline configuration
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Secrets for pipeline credentials
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Events for diagnostics
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list"]
  # Jobs and CronJobs management
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # OMJob CRDs
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["omjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["omjobs/status"]
    verbs: ["get", "patch"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["cronomjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  - apiGroups: ["pipelines.openmetadata.org"]
    resources: ["cronomjobs/status"]
    verbs: ["get", "patch"]
```

***

## Setup Option 2: Native Kubernetes Jobs

This setup uses standard Kubernetes Jobs and CronJobs without any CRDs.

### Prerequisites

1. **OpenMetadata deployed on Kubernetes** (Helm chart recommended)
2. **RBAC permissions** for the OpenMetadata service account to manage Jobs, CronJobs, ConfigMaps, and Secrets
3. **Ingestion image** accessible from your cluster (`docker.getcollate.io/openmetadata/ingestion-base`)

### Helm Values Configuration

```yaml theme={null}
openmetadata:
  config:
    pipelineServiceClientConfig:
      enabled: true
      type: "k8s"
      metadataApiEndpoint: http://openmetadata:8585/api

      k8s:
        # Disable the OMJob Operator (false is the default)
        useOMJobOperator: false
        
        # Container image for ingestion jobs
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base:1.12.0"
        imagePullPolicy: "IfNotPresent"
        imagePullSecrets: ""
        
        # Service account for ingestion jobs
        serviceAccountName: "openmetadata-ingestion"
        
        # Job lifecycle settings
        ttlSecondsAfterFinished: 86400
        activeDeadlineSeconds: 7200
        backoffLimit: 3
        
        # Job history
        successfulJobsHistoryLimit: 3
        failedJobsHistoryLimit: 3
        
        # Pod security context
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          fsGroup: 1000
          runAsNonRoot: true
        
        # Resource limits
        resources:
          limits:
            cpu: "2"
            memory: "4Gi"
          requests:
            cpu: "500m"
            memory: "1Gi"
        
        # RBAC - set to false if managed externally
        rbac:
          enabled: true
```
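To make the lifecycle settings above more concrete, the following sketch shows where each value would sit in a standard `batch/v1` `CronJob`. The field names are part of the Kubernetes API. The resource name, namespace, schedule, and container name are hypothetical, and the one-to-one mapping from the Helm values to these fields is an assumption, not a description of the manifest OpenMetadata actually generates.

```yaml
# Illustrative only: a hand-written CronJob, not the manifest OpenMetadata generates.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-ingestion-pipeline      # hypothetical name
  namespace: openmetadata               # hypothetical namespace
spec:
  schedule: "0 2 * * *"                 # hypothetical schedule: daily at 02:00
  successfulJobsHistoryLimit: 3         # successfulJobsHistoryLimit
  failedJobsHistoryLimit: 3             # failedJobsHistoryLimit
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 86400    # finished Jobs are removed after 24 hours
      activeDeadlineSeconds: 7200       # the Job is terminated after 2 hours
      backoffLimit: 3                   # up to 3 retries before the Job is marked failed
      template:
        spec:
          serviceAccountName: openmetadata-ingestion
          restartPolicy: Never
          securityContext:              # pod-level security context
            runAsUser: 1000
            runAsGroup: 1000
            fsGroup: 1000
            runAsNonRoot: true
          containers:
            - name: ingestion           # hypothetical container name
              image: docker.getcollate.io/openmetadata/ingestion-base:1.12.0
              imagePullPolicy: IfNotPresent
              resources:
                limits:
                  cpu: "2"
                  memory: "4Gi"
                requests:
                  cpu: "500m"
                  memory: "1Gi"
```

Because these are ordinary Kubernetes fields, existing tooling such as `kubectl get cronjobs` and `kubectl describe job` can be used to inspect them on a running cluster.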

### Required RBAC Permissions

```yaml theme={null}
rules:
  # Pod management for pipeline jobs
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # ConfigMaps for pipeline configuration
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Secrets for pipeline credentials
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Events for diagnostics
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list"]
  # Jobs and CronJobs management
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
```
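If you set `rbac.enabled: false` and manage RBAC externally, the rules above need to be wrapped in a `Role` and bound to the service account that OpenMetadata runs as. The sketch below shows one way to do that with standard `rbac.authorization.k8s.io/v1` resources. The role name, namespace, and the `openmetadata` service account name are hypothetical placeholders, and only one rule is repeated for brevity.

```yaml
# Illustrative only: names and namespace are placeholders, adjust to your deployment.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: openmetadata-pipeline-manager   # hypothetical name
  namespace: openmetadata               # hypothetical namespace
rules:
  # Jobs and CronJobs management
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
  # Add the remaining rules from the list above here.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: openmetadata-pipeline-manager   # hypothetical name
  namespace: openmetadata
subjects:
  - kind: ServiceAccount
    name: openmetadata                  # hypothetical: the service account of the OpenMetadata server
    namespace: openmetadata
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: openmetadata-pipeline-manager
```

A namespaced `Role` is used here on the assumption that ingestion jobs run in a single namespace. If pipelines run across namespaces, a `ClusterRole` with a `ClusterRoleBinding` would be the equivalent.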

***

<Info>
  For validating your setup, viewing pipeline logs, troubleshooting, and migrating from Airflow, see the [Operations & Troubleshooting](/v1.12.x/deployment/ingestion/kubernetes/troubleshooting) guide.
</Info>
