Upgrade from 0.11 to 0.12
Upgrading from 0.11 to 0.12 can be done directly on your instances. This page lists a few general details you should take into consideration when running the upgrade.
Highlights
Database Connection Environment Variables
In 0.11, the environment variables used to connect to the database were:
- `MYSQL_USER`
- `MYSQL_USER_PASSWORD`
- `MYSQL_HOST`
- `MYSQL_PORT`
- `MYSQL_DATABASE`

These environment variables have changed in the 0.12.0 release:
- `DB_USER`
- `DB_USER_PASSWORD`
- `DB_HOST`
- `DB_PORT`
- `OM_DATABASE`
This affects all bare metal and Docker instances that configure a custom database using the above environment variable values. Kubernetes deployments are not affected by this change.
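For example, in a Docker Compose setup with a custom external database, the environment section of the server service would now use the new variable names. This is a hedged sketch: the service name, host, port and credential values below are placeholders, only the variable names come from the list above.

```yaml
# Hypothetical docker-compose override; values are placeholders,
# only the renamed variable names matter.
services:
  openmetadata-server:
    environment:
      DB_USER: openmetadata_user
      DB_USER_PASSWORD: openmetadata_password
      DB_HOST: mysql
      DB_PORT: 3306
      OM_DATABASE: openmetadata_db
```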
Data Profiler and Data Quality Tests
In 0.11, the Profiler Workflow handled two things:
- Computing metrics on the data
- Running the configured Data Quality Tests
There has been a major overhaul: not only has the UI greatly improved, now showing all historical data, but the internals have been revamped as well. Main topics to consider:
- Tests now run with the Test Suite workflow and cannot be configured in the Profiler Workflow (see the sketch after this list)
- Any past test data will be cleaned up during the upgrade to 0.12.0, as the internal data storage has been improved
- The Profiler Ingestion Pipelines will be cleaned up during the upgrade to 0.12.0 as well
- You will see broken profiler DAGs in Airflow; you can simply delete these DAGs
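As a reference, here is a minimal sketch of what a 0.12 Test Suite workflow definition can look like. The `serviceName`, `hostPort` and auth settings are placeholders, and the exact fields may vary with your setup; check the Test Suite documentation for the authoritative schema.

```yaml
# Minimal Test Suite workflow sketch (placeholder values throughout).
source:
  type: TestSuite
  serviceName: my_test_suite
  sourceConfig:
    config:
      type: TestSuite
processor:
  type: orm-test-runner
  config: {}
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata   # security config depends on your setup
```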
dbt Tests Integration
From 0.12, OpenMetadata supports ingesting the tests and the test results from your dbt project.
- Along with the `manifest.json` and `catalog.json` files, we now provide an option to ingest the `run_results.json` file generated from the dbt run and ingest the test results from it (see the sketch after this list).
- The field to enter the `run_results.json` file path is optional in the case of local and http dbt configs. The test results will be ingested from this file if it is provided.
- For other dbt configs, the file will be picked from their respective sources if it is available.
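For a local dbt config, the sketch below shows where the new `run_results.json` path can sit next to the existing files. The surrounding field names (`dbtConfigSource`, `dbtCatalogFilePath`, `dbtManifestFilePath`) reflect the 0.12 schema as we understand it, and the paths are placeholders.

```yaml
# Sketch of a local dbt source config with test results ingestion.
sourceConfig:
  config:
    dbtConfigSource:
      dbtCatalogFilePath: /path/to/catalog.json
      dbtManifestFilePath: /path/to/manifest.json
      # Optional: test results will be ingested from here if provided
      dbtRunResultsFilePath: /path/to/run_results.json
```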
Profiler Workflow Updates
On top of the information above, the `fqnFilterPattern` has been converted into the same patterns we use for ingestion: `databaseFilterPattern`, `schemaFilterPattern` and `tableFilterPattern`.

In the `processor` you can now configure the following (see the sketch after this list):
- `profileSample` to specify the % of the table to run the profiling on
- `columnConfig.profileQuery` as a query to use to sample the data of the table
- `columnConfig.excludeColumns` and `columnConfig.includeColumns` to mark which columns to skip
- In `columnConfig.includeColumns` we can also specify a list of `metrics` to run from our supported metrics
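Putting these options together, a hedged sketch of a 0.12 profiler workflow fragment could look like the following. The filter values, table and column names are placeholders, and the exact nesting of `columnConfig` may differ slightly in your version, so verify against the profiler documentation.

```yaml
# Profiler workflow sketch: new filter patterns plus processor options.
source:
  sourceConfig:
    config:
      type: Profiler
      databaseFilterPattern:
        includes: [dev]
      schemaFilterPattern:
        includes: [public]
      tableFilterPattern:
        includes: [orders.*]
processor:
  type: orm-profiler
  config:
    profileSample: 85            # profile 85% of the table
    columnConfig:
      profileQuery: "SELECT * FROM public.orders TABLESAMPLE SYSTEM (10)"
      excludeColumns: ["internal_id"]
      includeColumns:
        - columnName: amount
          metrics: ["MEAN", "MAX", "MIN"]
```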
Profiler Multithreading for Snowflake users
In OpenMetadata 0.12 we have migrated the metrics computation to multithreading. This migration reduced metrics computation time by 70%.
Snowflake users may experience a circular import error. This is a known issue with `snowflake-connector-python`. If you experience such an error, we recommend either 1) running the ingestion workflow in a Python 3.8 environment, or 2) if you can't manage your environment, setting `threadCount` to 1 (see the sketch below). You can find more information on the profiler settings here.
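Disabling multithreading is a single value in the workflow. Treat this as a sketch: where exactly `threadCount` lives (source config vs. processor config) may depend on your exact 0.12.x version.

```yaml
# Fall back to single-threaded metrics computation for Snowflake.
source:
  sourceConfig:
    config:
      type: Profiler
      threadCount: 1
```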
Airflow Version
The Airflow version from the ingestion container image has been upgraded to `2.3.3`.

Note that this is now the version that will be used to run the Airflow metadata extraction. This had an impact, for example, when ingesting status from Airflow 2.1.4 (see issue https://github.com/open-metadata/OpenMetadata/issues/7228).
Moreover, the authentication mechanism that Airflow exposes for the custom plugins has changed. This required us to fully update how we were handling the managed APIs, both on the plugin side and the OpenMetadata API (which is the one sending the authentication).
To continue working with your own Airflow linked to the OpenMetadata UI for ingestion management, we recommend migrating to Airflow 2.3.3.
If you are using your own Airflow to prepare the ingestion from the UI, it is stuck on version 2.1.4 and you cannot upgrade it, but you want to use OM 0.12, reach out to us.
Note
Upgrading Airflow from 2.1.4 to 2.3 requires a few steps. If you are using your Airflow instance only to run OpenMetadata workflows, we recommend simply dropping the Airflow database: connect to the database engine where your Airflow database exists, make a backup of your database, then drop and recreate it.
If you would like to keep the existing data in your airflow instance or simply cannot drop your airflow database you will need to follow the steps below:
- Make a backup of your database (this will come in handy if the migration fails and you need to perform any kind of restore)
- Upgrade your Airflow instance from 2.1.x to 2.2.x
- Before upgrading to the 2.3.x version, perform the steps described on the Airflow documentation page [here] to make sure the character set/collation uses the correct encoding; this has been changing across MySQL versions.
- Once the above has been performed, you should be able to upgrade to Airflow 2.3.x
Connector Improvements
Oracle: In `0.11.x` and previous releases, we were using the Cx_Oracle driver to extract the metadata from Oracle. The drawback of using this driver was that it required Oracle Client libraries to be installed on the host machine in order to run the ingestion. With the `0.12` release, we will be using the python-oracledb driver, which is an upgraded version of `Cx_Oracle`. `python-oracledb` with `Thin` mode does not need Oracle Client libraries.

Azure SQL & MSSQL: Azure SQL & MSSQL with the pyodbc scheme require the ODBC driver to be installed. With the `0.12` release we are shipping the `ODBC Driver 18 for SQL Server` out of the box in our ingestion Docker image.
Service Connection Updates
- DynamoDB
  - Removed: `database`
- Deltalake
  - Removed: `connectionOptions` and `supportsProfiler`
- Looker
  - Renamed `username` to `clientId` and `password` to `clientSecret` to align on the internals required for the metadata extraction.
  - Removed: `env`
- Oracle
  - Removed: `databaseSchema` and `oracleServiceName` from the root.
  - Added: `oracleConnectionType`, which will contain either `oracleServiceName` or `databaseSchema`. This will reduce confusion when setting up the connection (see the sketch after this list).
- Athena
  - Removed: `hostPort`
- Databricks
  - Removed: `username` and `password`
- dbt Config
  - Added: `dbtRunResultsFilePath` and `dbtRunResultsHttpPath`, where the path of the `run_results.json` file can be passed to get the test results data from dbt.

In 0.12.1

- DeltaLake:
  - Updated the structure of the connection to better reflect the possible options.
    - Removed `metastoreHostPort` and `metastoreFilePath`, which are now embedded inside `metastoreConnection`
    - Added `metastoreDb` as an option to be passed inside `metastoreConnection`
  - You can find more information about the current structure here
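As an illustration of the Oracle change above, here is a hedged sketch of the new connection shape. All values are placeholders; the only point is that `oracleConnectionType` now wraps either `oracleServiceName` or `databaseSchema` instead of having them at the root.

```yaml
# New Oracle connection shape (placeholder values).
serviceConnection:
  config:
    type: Oracle
    hostPort: oracle-host:1521
    username: om_user
    password: om_password
    oracleConnectionType:
      oracleServiceName: ORCLPDB1   # or instead: databaseSchema: my_schema
```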
Ingestion from CLI
We have stopped updating the service connection parameters when running the ingestion workflow from the CLI. The connection parameters will be retrieved from the server if the service already exists. Therefore, the connection parameters of a service can only be updated from the OpenMetadata UI.
Bots configuration
In the 0.12.1 version, the `AIRFLOW_AUTH_PROVIDER` and `OM_AUTH_AIRFLOW_{AUTH_PROVIDER}` parameters are no longer needed to configure how the ingestion is performed from Airflow when our OpenMetadata server is secured. This can be achieved directly from the UI through the Bots configuration in the settings page. For more information, visit the section of each SSO configuration in the Enable Security chapter.

Note that the `ingestion-bot` bot is created (or updated if it already exists) as a system bot that cannot be deleted. The credentials used for this bot, if they did not exist before, will be the ones present in the OpenMetadata configuration; otherwise, a JWT token will be generated to be the default authentication mechanism of the `ingestion-bot`.
0.12.3 - Stable release
OpenMetadata 0.12.3 is a stable release. Please check the release notes. If you are upgrading production, this is the recommended version to upgrade to.
Breaking Changes from 0.12.x Stable Release
OpenMetadata release 0.12.x introduces the breaking changes below.
Change of OpenMetadata Service Namespace
Under `openmetadata.yaml`, all the class names are updated from `org.openmetadata.catalog.*` to `org.openmetadata.service.*`.
- If you are using a previous version of the `openmetadata.yaml` config file with a bare metal installation, make sure to migrate all these values as per the new `openmetadata.yaml` configuration. Check the example code snippet below.
- If you are using a Docker installation with your custom env file, update all the environment variables from `org.openmetadata.catalog.*` to `org.openmetadata.service.*`.
- If you are running OpenMetadata on Kubernetes with Helm charts, make sure to update `global.authorizer.className` and `global.authorizer.containerRequestFilter` with the below values in your custom OpenMetadata Helm chart values file.
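For example, the authorizer section of a bare metal `openmetadata.yaml` and the equivalent Helm values would reference the new namespace as follows. This sketch assumes you use the default authorizer and JWT filter classes; if you run a custom authorizer, substitute your own class names under the new `org.openmetadata.service.*` namespace.

```yaml
# openmetadata.yaml (bare metal)
authorizerConfiguration:
  className: "org.openmetadata.service.security.DefaultAuthorizer"
  containerRequestFilter: "org.openmetadata.service.security.JwtFilter"

# values.yaml (custom OpenMetadata Helm chart values)
global:
  authorizer:
    className: "org.openmetadata.service.security.DefaultAuthorizer"
    containerRequestFilter: "org.openmetadata.service.security.JwtFilter"
```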
Centralising of openmetadata/ingestion and openmetadata/airflow docker images
Starting with the 0.12.1 release, we have centralized the openmetadata/airflow and openmetadata/ingestion Docker images into the openmetadata/ingestion Docker image, which will be used with both the Docker Compose installation and the Kubernetes Helm chart installation. This Docker image is based on the apache-airflow 2.3.3 image with Python 3.9.9, and it is a rootless Docker image for enhanced security.

There is no change or effect for the Docker installation.
This is a breaking change if you are using a custom openmetadata-dependencies Kubernetes Helm chart values file. You will need to manually update the Airflow image and tag to `openmetadata/ingestion:0.12.3`.

If you are extending the openmetadata/airflow Docker image with the 0.12.1 release, you can safely replace it with the openmetadata/ingestion:0.12.3 Docker image.
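A sketch of the override in a custom openmetadata-dependencies values file is shown below. The exact key path depends on how the chart wraps the community Airflow chart in your version, so verify the nesting against your existing values file.

```yaml
# Custom openmetadata-dependencies values: point Airflow at the ingestion image.
airflow:
  airflow:
    image:
      repository: openmetadata/ingestion
      tag: "0.12.3"
      pullPolicy: IfNotPresent
```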
Basic Authentication enabled by default
We have deprecated and removed `no-auth` as the authentication mechanism starting with the 0.12.1 release of OpenMetadata.
The default authentication mechanism will be basic authentication. You can log in to the OpenMetadata UI with the default credentials below:
- Username: `admin`
- Password: `admin`
Enabled JWT Token Configuration by default
Starting with the 0.12.1 release, the OpenMetadata installation provides a default configuration that enables JWT token configuration for the OpenMetadata instance.

If you want to set up a production OpenMetadata instance, it is recommended to follow the enable JWT tokens guide to set up and configure your own JWT token configuration.