Upgrade from 0.11 to 0.12

Upgrading from 0.11 to 0.12 can be done directly on your instances. This page lists a few general details you should take into consideration when running the upgrade.

In 0.11, the environment variables used to connect to the database were:

  1. MYSQL_USER
  2. MYSQL_USER_PASSWORD
  3. MYSQL_HOST
  4. MYSQL_PORT
  5. MYSQL_DATABASE

In the 0.12.0 release, these environment variables have been renamed to:

  1. DB_USER
  2. DB_USER_PASSWORD
  3. DB_HOST
  4. DB_PORT
  5. OM_DATABASE

This affects all bare metal and Docker instances that configure a custom database through the environment variables above.

Kubernetes deployments, however, are not affected by this change.
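
For bare metal or Docker setups that keep these values in an env file or in the shell environment, the rename can be sketched as a simple mapping. This is a minimal illustration rather than an official migration tool: the helper name `migrate_db_env` is hypothetical, and only the variable names come from the two lists above.

```python
# Old 0.11 names mapped to their 0.12.0 replacements (taken from the lists above).
RENAMED_DB_VARS = {
    "MYSQL_USER": "DB_USER",
    "MYSQL_USER_PASSWORD": "DB_USER_PASSWORD",
    "MYSQL_HOST": "DB_HOST",
    "MYSQL_PORT": "DB_PORT",
    "MYSQL_DATABASE": "OM_DATABASE",
}


def migrate_db_env(env):
    """Return a copy of `env` with the 0.11 database variables renamed.

    Hypothetical helper: if both the old and the new name are present,
    the value already stored under the new name is kept.
    """
    migrated = dict(env)
    for old_name, new_name in RENAMED_DB_VARS.items():
        if old_name in migrated:
            value = migrated.pop(old_name)
            migrated.setdefault(new_name, value)
    return migrated
```

Calling `migrate_db_env(dict(os.environ))` (after `import os`) would give a view of the current environment under the new names without modifying the process environment itself.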

On 0.11, the Profiler Workflow handled two things:

  • Computing metrics on the data
  • Running the configured Data Quality Tests

There has been a major overhaul: the UI has greatly improved and now shows all historical data, and the internals have changed as well. Main topics to consider:

  1. Tests now run with the Test Suite workflow and cannot be configured in the Profiler Workflow
  2. Any past test data will be cleaned up during the upgrade to 0.12.0, as the internal data storage has been improved
  3. The Profiler Ingestion Pipelines will be cleaned up during the upgrade to 0.12.0 as well
  4. You will see broken profiler DAGs in Airflow; you can simply delete these DAGs

From 0.12, OpenMetadata supports ingesting the tests and the test results from your dbt project.

  • Along with the manifest.json and catalog.json files, we now provide an option to ingest the run_results.json file generated by the dbt run and to ingest the test results from it.
  • The run_results.json file path is an optional field for local and HTTP dbt configs. The test results will be ingested from this file if it is provided.
  • For other dbt config types, the file will be picked up from the respective source if it is available.

On top of the information above, the fqnFilterPattern has been replaced by the same patterns we use for ingestion: databaseFilterPattern, schemaFilterPattern and tableFilterPattern.

In the processor you can now configure:

  • profileSample to specify the % of the table to run the profiling on
  • columnConfig.profileQuery as a query to use to sample the data of the table
  • columnConfig.excludeColumns and columnConfig.includeColumns to mark which columns to skip.
  • In columnConfig.includeColumns we can also specify a list of metrics to run from our supported metrics.
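
To make the options above more concrete, here is a rough sketch of how such processor settings might look when written as a Python dictionary. Only the field names profileSample, columnConfig, profileQuery, excludeColumns and includeColumns come from the list above. The exact nesting, the `columnName` and `metrics` keys, the metric names and all sample values are assumptions for illustration, so check them against the profiler documentation before using them.

```python
# Hypothetical profiler settings for a single table.
# Field names come from the list above; nesting and values are assumed.
table_profiler_config = {
    # Percentage of the table to run the profiling on.
    "profileSample": 50,
    "columnConfig": {
        # Query used to sample the data of the table (made-up example).
        "profileQuery": "SELECT * FROM orders WHERE created_at > '2022-01-01'",
        # Columns to skip entirely (made-up column name).
        "excludeColumns": ["internal_notes"],
        # Columns to include, optionally with the metrics to compute.
        # The entry shape below is an assumption, not the documented schema.
        "includeColumns": [
            {"columnName": "amount", "metrics": ["MEAN", "MAX"]},
        ],
    },
}
```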

In OpenMetadata 0.12 we have migrated the metrics computation to multithreading. This migration reduced metrics computation time by 70%.

Snowflake users may experience a circular import error. This is a known issue with snowflake-connector-python. If you experience this error, we recommend that you either 1) run the ingestion workflow in a Python 3.8 environment or 2) set ThreadCount to 1 if you cannot manage your environment. You can find more information in the profiler settings documentation.

The Airflow version from the Ingestion container image has been upgraded to 2.3.3.

Note that this is now the version used to run the Airflow metadata extraction. This has an impact, for example, when ingesting status from Airflow 2.1.4 (see issue https://github.com/open-metadata/OpenMetadata/issues/7228).

Moreover, the authentication mechanism that Airflow exposes for the custom plugins has changed. This required us to fully update how we were handling the managed APIs, both on the plugin side and the OpenMetadata API (which is the one sending the authentication).

To continue working with your own Airflow linked to the OpenMetadata UI for ingestion management, we recommend migrating to Airflow 2.3.3.

If the Airflow instance you use to prepare the ingestion from the UI is stuck on version 2.1.4 and cannot be upgraded, but you still want to use OpenMetadata 0.12, reach out to us.

Note
Upgrading Airflow from 2.1.4 to 2.3 requires a few steps. If you are using your Airflow instance only to run OpenMetadata workflows, we recommend that you simply drop the Airflow database: connect to the database engine where your Airflow database exists, make a backup of the database, drop it and recreate it.

If you would like to keep the existing data in your Airflow instance, or simply cannot drop your Airflow database, you will need to follow the steps below:

  1. Make a backup of your database (this will come in handy if the migration fails and you need to perform any kind of restore)
  2. Upgrade your Airflow instance from 2.1.x to 2.2.x
  3. Before upgrading to the 2.3.x version, perform the steps described in the Airflow documentation to make sure the character set/collation uses the correct encoding, as this has been changing across MySQL versions.
  4. Once the above has been performed, you should be able to upgrade to Airflow 2.3.x

  • Oracle: In 0.11.x and previous releases, we used the cx_Oracle driver to extract metadata from Oracle databases. The drawback of this driver was that it required Oracle Client libraries to be installed on the host machine in order to run the ingestion. With the 0.12 release, we use the python-oracledb driver, which is an upgraded version of cx_Oracle. In Thin mode, python-oracledb does not need Oracle Client libraries.

  • Azure SQL & MSSQL: Azure SQL and MSSQL with the pyodbc scheme require an ODBC driver to be installed. With the 0.12 release, we ship the ODBC Driver 18 for SQL Server out of the box in our ingestion Docker image.

  • DynamoDB
    • Removed: database
  • DeltaLake
    • Removed: connectionOptions and supportsProfiler
  • Looker
    • Renamed username to clientId and password to clientSecret to align on the internals required for the metadata extraction.
    • Removed: env
  • Oracle
    • Removed: databaseSchema and oracleServiceName from the root.
    • Added: oracleConnectionType which will either contain oracleServiceName or databaseSchema. This will reduce confusion on setting up the connection.
  • Athena
    • Removed: hostPort
  • Databricks
    • Removed: username and password
  • dbt Config
    • Added: dbtRunResultsFilePath and dbtRunResultsHttpPath, where the path of the run_results.json file can be passed to get the test results data from dbt.
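
As an illustration of the Looker and Oracle changes listed above, the sketch below rewrites a connection configuration held in a plain Python dictionary. The function names and the flat dictionary layout are hypothetical. Only the field names come from the list, and the rule that oracleServiceName takes precedence when both old Oracle fields are present is an arbitrary choice made for this example.

```python
def migrate_looker_connection(conn):
    """Rename username/password to clientId/clientSecret and drop env."""
    migrated = dict(conn)
    if "username" in migrated:
        migrated["clientId"] = migrated.pop("username")
    if "password" in migrated:
        migrated["clientSecret"] = migrated.pop("password")
    migrated.pop("env", None)  # env was removed in 0.12
    return migrated


def migrate_oracle_connection(conn):
    """Move oracleServiceName or databaseSchema under oracleConnectionType."""
    migrated = dict(conn)
    service_name = migrated.pop("oracleServiceName", None)
    schema = migrated.pop("databaseSchema", None)
    # oracleConnectionType holds one of the two values; if both were set,
    # this sketch keeps the service name (an assumption, not documented).
    if service_name is not None:
        migrated["oracleConnectionType"] = {"oracleServiceName": service_name}
    elif schema is not None:
        migrated["oracleConnectionType"] = {"databaseSchema": schema}
    return migrated
```

Both helpers return a new dictionary and leave the input untouched, which makes it easy to compare the configuration before and after the change.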

In 0.12.1

  • DeltaLake:
    • Updated the structure of the connection to better reflect the possible options.
      • Removed metastoreHostPort and metastoreFilePath, which are now embedded inside metastoreConnection
      • Added metastoreDb as an option to be passed inside metastoreConnection
    • You can find more information about the current structure in the DeltaLake connector documentation
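
The DeltaLake restructuring can be pictured with a small sketch that nests the two removed root fields inside metastoreConnection. The helper name and the dictionary representation are hypothetical, and the sample values are made up. Only the field names come from the list above.

```python
def migrate_deltalake_connection(conn):
    """Move metastoreHostPort / metastoreFilePath under metastoreConnection."""
    migrated = dict(conn)
    # Start from any metastoreConnection that is already present.
    metastore = dict(migrated.get("metastoreConnection", {}))
    for field in ("metastoreHostPort", "metastoreFilePath"):
        if field in migrated:
            metastore[field] = migrated.pop(field)
    if metastore:
        migrated["metastoreConnection"] = metastore
    return migrated
```

For example, a configuration with `metastoreHostPort` at the root ends up with that value stored under `metastoreConnection`, while unrelated fields are left as they were.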

We have stopped updating the service connection parameters when running the ingestion workflow from the CLI. If the service already exists, the connection parameters are retrieved from the server. Therefore, the connection parameters of a service can only be updated from the OpenMetadata UI.

In the 0.12.1 version, the AIRFLOW_AUTH_PROVIDER and OM_AUTH_AIRFLOW_{AUTH_PROVIDER} parameters are no longer needed to configure how the ingestion is performed from Airflow when the OpenMetadata server is secured. This can be configured directly from the UI through the Bots configuration in the settings page. For more information, visit the section of each SSO configuration in the Enable Security chapter.

Note that the ingestion-bot bot is created (or updated if it already exists) as a system bot that cannot be deleted, and the credentials used for this bot, if they did not exist before, will be the ones present in the OpenMetadata configuration. Otherwise, a JWT Token will be generated to be the default authentication mechanism of the ingestion-bot.

OpenMetadata 0.12.3 is a stable release. Please check the release notes.

If you are upgrading a production deployment, this is the recommended version to upgrade to.

OpenMetadata release 0.12.x introduces the following breaking changes:

In openmetadata.yaml, all class names have been updated from org.openmetadata.catalog.* to org.openmetadata.service.*.

  • If you are using a previous version of the openmetadata.yaml config file with a bare metal installation, make sure to migrate all these values to match the new openmetadata.yaml configuration.
  • If you are using a Docker installation with a custom env file, update all the environment variables from org.openmetadata.catalog.* to org.openmetadata.service.*.
  • If you are running OpenMetadata on Kubernetes with Helm charts, make sure to update global.authorizer.className and global.authorizer.containerRequestFilter to the corresponding org.openmetadata.service.* values in your custom OpenMetadata Helm chart values file.
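
Since the change is a plain package rename, it can be applied to the text of a config file, env file or Helm values file with a single substitution. The sketch below is one way to do that. The function name is hypothetical, and you should review the result before deploying it, since this only rewrites the package prefix and does not validate that the resulting class exists.

```python
import re

# Matches the old package prefix wherever it starts a dotted class name.
_OLD_PACKAGE = re.compile(r"\borg\.openmetadata\.catalog\.")


def migrate_class_names(text):
    """Replace org.openmetadata.catalog.* with org.openmetadata.service.*."""
    return _OLD_PACKAGE.sub("org.openmetadata.service.", text)
```

The function works on any string, so the same call covers openmetadata.yaml, a Docker env file, or a Helm values file once its contents have been read into memory.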

Starting with the 0.12.1 release, we have consolidated the openmetadata/airflow and openmetadata/ingestion Docker images into a single openmetadata/ingestion Docker image, which is used with both the Docker Compose installation and the Kubernetes Helm chart installation. This image is based on the apache-airflow 2.3.3 image with Python 3.9.9, and it is a rootless Docker image for enhanced security.

  • Docker installations are not affected by this change.

  • This is a breaking change if you are using a custom openmetadata-dependencies Kubernetes Helm chart values file. You will need to manually update the Airflow image and tag to openmetadata/ingestion:0.12.3

If you are extending the openmetadata/airflow Docker image with the 0.12.1 release, you can safely replace it with the openmetadata/ingestion:0.12.3 Docker image.

Starting with the 0.12.1 release, we have deprecated and removed no-auth as an authentication mechanism in OpenMetadata.

The default authentication mechanism is now basic authentication. You can log in to the OpenMetadata UI with the default credentials.

Starting with the 0.12.1 release, the OpenMetadata installation provides a default configuration that enables JWT tokens for the OpenMetadata instance.

If you want to set up a production OpenMetadata instance, we recommend following the Enable JWT Tokens guide to set up and configure your own JWT token configuration.