OpenMetadata Deployment on Azure Kubernetes Service Cluster
OpenMetadata can be deployed on Azure Kubernetes Service. This guide covers both the recommended Kubernetes orchestrator (new in 1.12) and the alternative Airflow-based orchestrator.
Prerequisites
Azure Services for Database and Search Engine as Elastic Cloud
It is recommended to use Azure SQL and Elastic Cloud on Azure for Production Deployments.
We support:
- Azure SQL (MySQL) engine version 8 or higher
- Azure SQL (PostgreSQL) engine version 12 or higher
- Elastic Cloud (ElasticSearch version 8.11.4)
We recommend:
- Azure SQL deployed with multi-zone availability and sized for a production workload
- An Elastic Cloud environment spanning multiple zones, with a minimum of 2 nodes
Step 1 - Create an AKS Cluster
If you are deploying on a new cluster, pass the EnableAzureDiskFileCSIDriver=true custom header to enable the Container Storage Interface (CSI) storage drivers.
az aks create --resource-group MyResourceGroup \
  --name MyAKSClusterName \
  --nodepool-name agentpool \
  --outbound-type loadbalancer \
  --location YourPreferredLocation \
  --generate-ssh-keys \
  --enable-addons monitoring \
  --aks-custom-headers EnableAzureDiskFileCSIDriver=true
For an existing cluster, enable the CSI storage drivers with:
az aks update -n MyAKSCluster -g MyResourceGroup --enable-disk-driver --enable-file-driver
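Before moving on, it can help to confirm that both CSI drivers are registered with the cluster. The following is a minimal sketch, assuming kubectl already points at the AKS cluster and that the drivers register under the names disk.csi.azure.com and file.csi.azure.com. The function name check_csi_drivers is hypothetical.

```shell
# Sketch: report whether the Azure Disk and Azure File CSI drivers are registered.
# Assumes kubectl already targets the AKS cluster.
check_csi_drivers() {
  missing=0
  for driver in disk.csi.azure.com file.csi.azure.com; do
    if kubectl get csidriver "$driver" >/dev/null 2>&1; then
      echo "ok: $driver"
    else
      echo "missing: $driver"
      missing=1
    fi
  done
  # Non-zero exit status if at least one driver was not found
  return "$missing"
}
```

Calling check_csi_drivers prints one line per driver and returns a non-zero status if either is missing, so it can gate the next steps in a script.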
Step 2 - Create a Namespace (optional)
kubectl create namespace openmetadata
Step 3 - Add the OpenMetadata Helm Repository
helm repo add open-metadata https://helm.open-metadata.org/
helm repo update
Kubernetes Orchestrator Configuration (Recommended)
Starting with OpenMetadata 1.12, we recommend using the Kubernetes native orchestrator for running ingestion pipelines. This eliminates the need for Apache Airflow and simplifies your deployment.
The Kubernetes orchestrator runs ingestion pipelines as native K8s Jobs and CronJobs. For full documentation on features, configuration options, and troubleshooting, see the Kubernetes Orchestrator Guide.
The recommended OMJob Operator approach requires installing Custom Resource Definitions (CRDs), which needs elevated cluster permissions. If your cluster policies don’t allow CRDs, you can disable the operator by setting useOMJobOperator: false and omjobOperator.enabled: false in your values file to use native K8s Jobs instead.
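If CRDs are not allowed on your cluster, the two settings can be kept in a small override file. The following is a sketch only: the file name no-operator-values.yaml is hypothetical, and the key nesting is an assumption based on the values layout used in this guide.

```shell
# Sketch: write a values override that disables the OMJob Operator so that
# ingestion runs as native Kubernetes Jobs. The key nesting is an assumption
# taken from the values layout used in this guide.
cat > no-operator-values.yaml <<'EOF'
openmetadata:
  config:
    pipelineServiceClientConfig:
      k8s:
        useOMJobOperator: false
omjobOperator:
  enabled: false
EOF
```

Helm gives priority to the right-most --values file, so passing --values no-operator-values.yaml after your main values file on helm install or helm upgrade applies the override.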
Create Kubernetes Secrets
Create the required secrets for your database and search engine:
# Database secret (for MySQL)
kubectl create secret generic mysql-secrets \
--namespace openmetadata \
--from-literal=openmetadata-mysql-password=<YOUR_AZURE_SQL_PASSWORD>
# ElasticSearch secret
kubectl create secret generic elasticsearch-secrets \
--namespace openmetadata \
--from-literal=openmetadata-elasticsearch-password=<YOUR_ELASTIC_CLOUD_PASSWORD>
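If you prefer to manage secrets declaratively, the same secret can be expressed as a manifest. The following is a minimal sketch, not the only way to do this. The helper name secret_manifest is hypothetical, and it assumes the base64 utility is available on your machine.

```shell
# Sketch: print a Secret manifest equivalent to `kubectl create secret generic`.
# Arguments: <secret-name> <namespace> <key> <value>
secret_manifest() {
  # Secret data values must be base64-encoded; strip any line wrapping
  encoded="$(printf '%s' "$4" | base64 | tr -d '\n')"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $1
  namespace: $2
type: Opaque
data:
  $3: $encoded
EOF
}
```

For example, secret_manifest mysql-secrets openmetadata openmetadata-mysql-password "$DB_PASSWORD" | kubectl apply -f - creates the database secret from an environment variable (DB_PASSWORD is a placeholder name) instead of a literal typed on the command line.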
Create your openmetadata-values.yaml with the following configuration:
# openmetadata-values.yaml
openmetadata:
  config:
    # Search engine configuration
    elasticsearch:
      host: <ELASTIC_CLOUD_ENDPOINT_WITHOUT_HTTPS>
      searchType: elasticsearch
      port: 443
      scheme: https
      connectionTimeoutSecs: 5
      socketTimeoutSecs: 60
      keepAliveTimeoutSecs: 600
      batchSize: 10
      auth:
        enabled: true
        username: <ELASTIC_CLOUD_USERNAME>
        password:
          secretRef: elasticsearch-secrets
          secretKey: openmetadata-elasticsearch-password
    # Database configuration
    database:
      host: <AZURE_SQL_ENDPOINT>
      port: 3306
      driverClass: com.mysql.cj.jdbc.Driver
      dbScheme: mysql
      dbUseSSL: true
      databaseName: <AZURE_SQL_DATABASE_NAME>
      auth:
        username: <AZURE_SQL_DATABASE_USERNAME>
        password:
          secretRef: mysql-secrets
          secretKey: openmetadata-mysql-password
    # Kubernetes Orchestrator configuration
    pipelineServiceClientConfig:
      enabled: true
      type: "k8s"
      metadataApiEndpoint: http://openmetadata.openmetadata.svc.cluster.local:8585/api
      k8s:
        ingestionImage: "docker.getcollate.io/openmetadata/ingestion-base:1.12.0"
        useOMJobOperator: true
# Enable the OMJob Operator (recommended for production)
omjobOperator:
  enabled: true
  image:
    repository: docker.getcollate.io/openmetadata/omjob-operator
    tag: "1.12.0"
image:
  tag: "1.12.0"
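The metadataApiEndpoint value follows the Kubernetes service DNS pattern <service>.<namespace>.svc.cluster.local, so it changes if you install into a different namespace. The following is a small sketch for deriving it. The helper name om_api_endpoint is hypothetical, and the default port 8585 is taken from the value used in this guide.

```shell
# Sketch: build the in-cluster OpenMetadata API URL used as metadataApiEndpoint.
# Arguments: <service-name> <namespace> [port]; port defaults to 8585.
om_api_endpoint() {
  echo "http://$1.$2.svc.cluster.local:${3:-8585}/api"
}
```

For example, om_api_endpoint openmetadata openmetadata prints the endpoint used in the values file for the openmetadata namespace.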
For advanced configuration options such as resource limits, job lifecycle settings, failure diagnostics, RBAC, and security contexts, see the Kubernetes Orchestrator Guide.
To use PostgreSQL as the database, replace the database values with the configuration below. The secret name and key shown here are examples; create a matching Kubernetes secret in the same way as the MySQL secret above.
database:
  host: <AZURE_SQL_ENDPOINT>
  port: 5432
  driverClass: org.postgresql.Driver
  dbScheme: postgresql
  dbUseSSL: true
  databaseName: <AZURE_SQL_DATABASE_NAME>
  auth:
    username: <AZURE_SQL_DATABASE_USERNAME>
    password:
      secretRef: postgresql-secret
      secretKey: postgresql-password
# Install OpenMetadata (no dependencies chart needed with K8s orchestrator)
helm install openmetadata open-metadata/openmetadata \
--namespace openmetadata \
--values openmetadata-values.yaml
With the Kubernetes orchestrator, you don’t need to deploy the openmetadata-dependencies chart that includes Airflow. This significantly simplifies your deployment.
Verify the Deployment
# Check pods are running
kubectl get pods -n openmetadata
# Check the K8s orchestrator health in OpenMetadata UI
# Navigate to Settings → Preferences → Health
kubectl port-forward service/openmetadata 8585:8585 -n openmetadata
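Instead of polling the pod list by hand, you can block until the server has rolled out. This is a sketch under assumptions: the Deployment name openmetadata is inferred from the service name used in this guide, and the function name wait_for_openmetadata is hypothetical.

```shell
# Sketch: block until the OpenMetadata server Deployment has rolled out.
# Arguments: [namespace] [timeout]
# The Deployment name "openmetadata" is an assumption.
wait_for_openmetadata() {
  kubectl rollout status deployment/openmetadata \
    --namespace "${1:-openmetadata}" \
    --timeout "${2:-300s}"
}
```

Run wait_for_openmetadata before starting the port-forward. It returns a non-zero status if the rollout does not complete within the timeout.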
Using Airflow Orchestrator (Alternative)
If you prefer to use Apache Airflow as the orchestrator (e.g., for existing Airflow investments or complex DAG requirements), follow the configuration below.
Using Airflow requires additional infrastructure: persistent volumes with ReadWriteMany access, the openmetadata-dependencies Helm chart, and more complex configuration.
Create Persistent Volumes
The OpenMetadata Helm chart depends on Airflow, and Airflow expects a persistent disk that supports ReadWriteMany (the volume can be mounted as read-write by many nodes). The Azure CSI storage drivers enabled earlier support provisioning disks in ReadWriteMany mode.
# logs_dags_pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: openmetadata-dependencies-dags-pvc
  namespace: openmetadata
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: azurefile-csi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: openmetadata-dependencies-logs-pvc
  namespace: openmetadata
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: azurefile-csi
Create the volume claims by applying the manifest:
kubectl apply -f logs_dags_pvc.yaml
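After applying the manifest, you may want to check that both claims were provisioned. The following is a minimal sketch, assuming the claim names from logs_dags_pvc.yaml. The function name airflow_pvc_phases is hypothetical.

```shell
# Sketch: print the phase (for example Pending or Bound) of each Airflow PVC.
# Arguments: [namespace]
airflow_pvc_phases() {
  for pvc in openmetadata-dependencies-dags-pvc openmetadata-dependencies-logs-pvc; do
    phase="$(kubectl get pvc "$pvc" --namespace "${1:-openmetadata}" -o jsonpath='{.status.phase}')"
    echo "$pvc: ${phase:-unknown}"
  done
}
```

Both claims should report Bound before you continue with the permissions job.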
Change Owner and Update Permission for Persistent Volumes
Airflow pods run as a non-root user and lack write access to the persistent volumes. To fix this, create a Job manifest, permissions_pod.yaml, that runs a pod mounting both persistent volume claims and changes the owner of the mounted folders /airflow-dags and /airflow-logs to user ID 50000, the default Linux user ID of Airflow pods.
# permissions_pod.yaml
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    run: my-permission-pod
  name: my-permission-pod
  namespace: openmetadata
spec:
  template:
    spec:
      containers:
        - image: busybox
          name: my-permission-pod
          volumeMounts:
            - name: airflow-dags
              mountPath: /airflow-dags
            - name: airflow-logs
              mountPath: /airflow-logs
          # Run both commands in a single shell invocation so that chmod is executed too
          command: ["/bin/sh", "-c", "chown -R 50000 /airflow-dags /airflow-logs && chmod -R a+rwx /airflow-dags"]
      restartPolicy: Never
      volumes:
        - name: airflow-logs
          persistentVolumeClaim:
            claimName: openmetadata-dependencies-logs-pvc
        - name: airflow-dags
          persistentVolumeClaim:
            claimName: openmetadata-dependencies-dags-pvc
Start the job by applying the manifest:
kubectl apply -f permissions_pod.yaml
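The ownership change must finish before Airflow starts, so it is worth waiting for the job to complete. The following is a sketch, assuming the job name my-permission-pod from permissions_pod.yaml. The function name wait_for_permission_job is hypothetical.

```shell
# Sketch: wait for the permissions Job to finish, then show its logs.
# Arguments: [namespace] [timeout]
wait_for_permission_job() {
  kubectl wait --for=condition=complete job/my-permission-pod \
    --namespace "${1:-openmetadata}" --timeout "${2:-120s}" &&
  kubectl logs job/my-permission-pod --namespace "${1:-openmetadata}"
}
```

If the wait times out, inspecting the job with kubectl describe job my-permission-pod -n openmetadata is a reasonable next step.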
Create Airflow Secrets
kubectl create secret generic airflow-secrets \
--namespace openmetadata \
--from-literal=openmetadata-airflow-password=<AdminPassword>
For production deployments that connect to an external PostgreSQL database, also create a secret for its password:
kubectl create secret generic postgresql-secret \
--namespace openmetadata \
--from-literal=postgresql-password=<MyPGDBPassword>
Create values-dependencies.yaml to configure Airflow with persistent volumes:
# values-dependencies.yaml
airflow:
  airflow:
    extraVolumeMounts:
      - mountPath: /airflow-logs
        name: aks-airflow-logs
      - mountPath: /airflow-dags/dags
        name: aks-airflow-dags
    extraVolumes:
      - name: aks-airflow-logs
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-logs-pvc
      - name: aks-airflow-dags
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-dags-pvc
    config:
      AIRFLOW__OPENMETADATA_AIRFLOW_APIS__DAG_GENERATED_CONFIGS: "/airflow-dags/dags"
  dags:
    path: /airflow-dags/dags
    persistence:
      enabled: false
  logs:
    path: /airflow-logs
    persistence:
      enabled: false
  externalDatabase:
    type: postgres # default mysql
    host: Host_db_address
    database: Airflow_metastore_dbname
    user: db_userName
    port: 5432
    dbUseSSL: true
    passwordSecret: postgresql-secret
    passwordSecretKey: postgresql-password
Install the dependencies:
helm install openmetadata-dependencies open-metadata/openmetadata-dependencies \
--values values-dependencies.yaml \
--namespace openmetadata \
--set mysql.enabled=false
It takes a few minutes for all the pods to be set up and running:
kubectl get pods -n openmetadata
Create openmetadata-values.yaml for Airflow-based deployment:
# openmetadata-values.yaml
global:
  pipelineServiceClientConfig:
    apiEndpoint: http://openmetadata-dependencies-web.openmetadata.svc.cluster.local:8080
    metadataApiEndpoint: http://openmetadata.openmetadata.svc.cluster.local:8585/api
openmetadata:
  config:
    elasticsearch:
      host: <ELASTIC_CLOUD_ENDPOINT_WITHOUT_HTTPS>
      searchType: elasticsearch
      port: 443
      scheme: https
      auth:
        enabled: true
        username: <ELASTIC_CLOUD_USERNAME>
        password:
          secretRef: elasticsearch-secrets
          secretKey: openmetadata-elasticsearch-password
    database:
      host: <AZURE_SQL_ENDPOINT>
      port: 5432
      driverClass: org.postgresql.Driver
      dbScheme: postgresql
      databaseName: openmetadata_db
      auth:
        username: <DB_USERNAME>
        password:
          secretRef: postgresql-secret
          secretKey: postgresql-password
image:
  tag: "1.12.0"
helm install openmetadata open-metadata/openmetadata \
--values openmetadata-values.yaml \
--namespace openmetadata
Troubleshooting
Troubleshooting Airflow
JSONDecodeError: Unterminated string starting
If you are using Airflow with Azure Blob Storage as the PersistentVolume, as explained in Storage class using blobfuse,
you may encounter the following error after a few days:
{dagbag.py:346} ERROR - Failed to import: /airflow-dags/dags/...py
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 3552
Moreover, the Executor pods may keep using stale files. This behaviour is caused by the cache settings in the
configuration recommended by that documentation:
- -o allow_other
- --file-cache-timeout-in-seconds=120
- --use-attr-cache=true
- --cancel-list-on-mount-seconds=10 # prevent billing charges on mounting
- -o attr_timeout=120
- -o entry_timeout=120
- -o negative_timeout=120
- --log-level=LOG_WARNING # LOG_WARNING, LOG_INFO, LOG_DEBUG
- --cache-size-mb=1000 # Default will be 80% of available memory, eviction will happen beyond that.
Disabling the cache will help here. In this case it won’t have any negative impact, since the .py and .json
files are small enough and not heavily used.
The same configuration without cache:
- -o direct_io
- --file-cache-timeout-in-seconds=0
- --use-attr-cache=false
- --cancel-list-on-mount-seconds=10
- -o attr_timeout=0
- -o entry_timeout=0
- -o negative_timeout=0
- --log-level=LOG_WARNING
- --cache-size-mb=0
You can find more information about this error here, and similar
discussions here and here.
FAQs
Java Memory Heap Issue
If your OpenMetadata pods are not in a ready state and the pod logs show errors like the following:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "AsyncAppender-Worker-async-file-appender"
Exception in thread "pool-5-thread-1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "AsyncAppender-Worker-async-file-appender" java.lang.OutOfMemoryError: Java heap space
Exception in thread "dw-46" java.lang.OutOfMemoryError: Java heap space
Exception in thread "AsyncAppender-Worker-async-console-appender" java.lang.OutOfMemoryError: Java heap space
This happens because the default JVM heap size (1 GiB) is not enough for your workload. To resolve the issue, add the following environment variable to your custom OpenMetadata Helm values:
extraEnvs:
  - name: OPENMETADATA_HEAP_OPTS
    value: "-Xmx2G -Xms2G"
The -Xmx flag sets the maximum size of the JVM memory allocation pool (heap), while -Xms sets its initial size.
Apply the change by upgrading the Helm release with helm upgrade --install openmetadata open-metadata/openmetadata --values <values.yml> --namespace <namespaceName>, replacing <values.yml> and <namespaceName> with your values file name and the namespace where OpenMetadata is deployed.
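If you adjust the heap size more than once, a small helper can generate the override for a given size. This is a sketch only: the function name heap_values is hypothetical, and it sets the maximum and initial heap to the same value, as in the example above.

```shell
# Sketch: print an extraEnvs override that sets the JVM heap to <size> GiB.
# Arguments: <heap-size-in-GiB>
heap_values() {
  cat <<EOF
extraEnvs:
  - name: OPENMETADATA_HEAP_OPTS
    value: "-Xmx${1}G -Xms${1}G"
EOF
}
```

For example, heap_values 4 > heap-values.yaml writes a 4 GiB override that you can pass to helm upgrade with an additional --values flag. Keep the heap size below the memory limit of the OpenMetadata container.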
PostgreSQL Issue permission denied to create extension “pgcrypto”
If you see the following error when using PostgreSQL as the database backend for OpenMetadata:
Message: ERROR: permission denied to create extension "pgcrypto"
Hint: Must be superuser to create this extension.
This means the database user does not have sufficient privileges. To resolve the issue, grant the required permissions to the PostgreSQL user:
GRANT USAGE ON SCHEMA schema_name TO <openmetadata_psql_user>;
GRANT CREATE ON EXTENSION pgcrypto TO <openmetadata_psql_user>;
In the above commands, replace schema_name with the target schema and <openmetadata_psql_user> with the SQL user that the OpenMetadata application uses to connect to the PostgreSQL database.
OpenMetadata Helm charts use the official Docker images published on DockerHub.
A typical reason to extend the server image is to install organization certificates for connecting to in-house systems.
1. Create a Dockerfile based on the OpenMetadata server image
For example -
FROM docker.open-metadata.org/openmetadata/server:x.y.z
WORKDIR /home/
COPY <my-organization-certs> .
RUN update-ca-certificates
where docker.open-metadata.org/openmetadata/server:x.y.z needs to point to the same version of the OpenMetadata server, for example docker.open-metadata.org/openmetadata/server:1.3.1.
This image needs to be built and published to the container registry of your choice.
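Building and publishing the image can be scripted. The following is a minimal sketch, assuming Docker is installed and you are already logged in to your registry. The function name build_and_push_image is hypothetical, and the registry in the usage note is a placeholder.

```shell
# Sketch: build a custom image from a Dockerfile and push it to a registry.
# Arguments: <registry/repository> <tag> [build-context]; context defaults to "."
build_and_push_image() {
  image="$1:$2"
  # Push only if the build succeeded
  docker build --tag "$image" "${3:-.}" &&
  docker push "$image"
}
```

For example, build_and_push_image registry.example.com/openmetadata/server 1.12.0 builds the Dockerfile in the current directory and pushes it. The repository and tag you pass here are the values to set under image in your Helm values.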
2. Update your openmetadata helm values
The OpenMetadata application is installed as part of the openmetadata Helm chart. In this step, update your custom Helm values file to point to the image created in the previous step. For example, create a Helm values file named values.yaml with the following contents -
...
image:
  repository: <your repository>
  # Overrides the image tag whose default is the chart appVersion.
  tag: <your tag>
...
3. Install / Upgrade your helm release
Upgrade/Install your openmetadata helm charts with the below single command:
helm upgrade --install openmetadata open-metadata/openmetadata --values values.yaml
One reason to use a custom ingestion image is that you have developed your own custom connectors.
You can find a complete working example of this here. After you have your code ready, follow the steps below.
1. Create a Dockerfile based on the OpenMetadata ingestion image
For example -
FROM docker.open-metadata.org/openmetadata/ingestion:x.y.z
USER airflow
# Let's use the home directory of airflow user
WORKDIR /home/airflow
# Install our custom connector
COPY <your_package> <your_package>
COPY setup.py .
RUN pip install --no-deps .
where docker.open-metadata.org/openmetadata/ingestion:x.y.z needs to point to the same version of the OpenMetadata server, for example docker.open-metadata.org/openmetadata/ingestion:1.3.1.
This image needs to be built and published to the container registry of your choice.
2. Update your openmetadata-dependencies helm values
The ingestion container (the one shipping Airflow) is installed by the openmetadata-dependencies Helm chart. In this step, use
a custom values YAML file to point to the image created in the previous step. You can create a file named values.deps.yaml with the
following contents:
airflow:
  airflow:
    image:
      repository: <your repository> # by default, openmetadata/ingestion
      tag: <your tag> # by default, the version you are deploying, e.g., 1.1.0
      pullPolicy: "IfNotPresent"
3. Install / Upgrade helm release
Upgrade/Install your openmetadata-dependencies helm charts with the below single command:
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values values.deps.yaml
If you are using MySQL and ElasticSearch externally, you would want to disable the local installation of mysql and elasticsearch while installing OpenMetadata Dependencies Helm Chart. You can disable the MySQL and ElasticSearch Helm Dependencies by setting enabled: false value for each dependency. Below is the command to set helm values from Helm CLI -
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --set mysql.enabled=false --set elasticsearch.enabled=false
Alternatively, you can create a custom YAML file named values.deps.yaml to disable the installation of MySQL and Elasticsearch.
mysql:
  enabled: false
  ...
elasticsearch:
  enabled: false
  ...
...
How to configure an external database like PostgreSQL with OpenMetadata Helm Charts?
OpenMetadata supports PostgreSQL as one of its database dependencies. By default, the OpenMetadata Helm charts do not include PostgreSQL as a database dependency. To configure the Helm charts with an external database like PostgreSQL, follow the guide below to change the Helm values, then upgrade or install the OpenMetadata Helm charts with them.
Upgrade Airflow Helm Dependencies Helm Charts to connect to External Database like PostgreSQL
We ship airflow-helm as one of OpenMetadata Dependencies with default values to connect to MySQL Database as part of externalDatabase configurations.
You can find more information on setting the externalDatabase as part of helm values here.
With OpenMetadata Dependencies Helm Charts, your helm values would look something like below -
...
airflow:
  externalDatabase:
    type: postgres
    host: <postgresql_endpoint>
    port: 5432
    database: <airflow_database_name>
    user: <airflow_database_login_user>
    passwordSecret: airflow-postgresql-secrets
    passwordSecretKey: airflow-postgresql-password
...
For the above code, it is assumed you are creating a kubernetes secret for storing Airflow Database login Credentials. A sample command to create the secret will be kubectl create secret generic airflow-postgresql-secrets --from-literal=airflow-postgresql-password=<password>.
Upgrade OpenMetadata Helm Charts to connect to External Database like PostgreSQL
Update the openmetadata.config.database.* helm values for OpenMetadata Application to connect to External Database like PostgreSQL.
With OpenMetadata Helm Charts, your helm values would look something like below -
openmetadata:
  config:
    ...
    database:
      host: <postgresql_endpoint>
      port: 5432
      driverClass: org.postgresql.Driver
      dbScheme: postgresql
      dbUseSSL: true
      databaseName: <openmetadata_database_name>
      auth:
        username: <database_login_user>
        password:
          secretRef: openmetadata-postgresql-secrets
          secretKey: openmetadata-postgresql-password
For the above code, it is assumed you are creating a kubernetes secret for storing OpenMetadata Database login Credentials. A sample command to create the secret will be kubectl create secret generic openmetadata-postgresql-secrets --from-literal=openmetadata-postgresql-password=<password>.
Once you make the above changes to your helm values, run the below command to install/upgrade helm charts -
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values <path-to-values-file> --namespace <kubernetes_namespace>
helm upgrade --install openmetadata open-metadata/openmetadata --values <path-to-values-file> --namespace <kubernetes_namespace>
Our OpenMetadata Dependencies Helm chart internally depends on three sub-charts: MySQL, ElasticSearch, and Airflow.
To customize the deployment of any of these dependencies, refer to the Helm values documentation of the corresponding sub-chart.
By default, the OpenMetadata Dependencies Helm chart provides an initial generic customization of these Helm values to get you started quickly. You can refer to the openmetadata-dependencies Helm chart default values here.