Backup & Restore Metadata
Introduction
The goal of OpenMetadata is to enable company-wide collaboration around metadata. The more we use it, the more value this brings to the table, which means that keeping the metadata safe can become a critical activity for our Disaster Recovery practices.
While there are cloud services that feature automatic snapshots and replication, the metadata CLI now allows all users to perform backups regardless of the underlying infrastructure.
Requirements
The backup CLI requires openmetadata-ingestion version 0.13.1 or higher.
However, we recommend always using the latest openmetadata-ingestion version so that you get all the improvements shipped in the CLI.
Installation
The CLI comes bundled in the base openmetadata-ingestion Python package. You can install it with:
pip install openmetadata-ingestion
One of the backup features is to upload the generated backup to cloud storage (currently S3 and Azure Blob). To use this, install the package with the backup plugin instead:
pip install "openmetadata-ingestion[backup]"
Requirements & Considerations
This is a custom utility. As almost all tables contain GENERATED columns, directly using mysqldump is not an option out of the box, as it would require some further cleaning steps to get the data right.
Instead, we have created a utility that dumps only the necessary data.
The requirement for running the process is that the target database should have the Flyway migrations executed.
The backup utility will provide an SQL file which will do two things:
- TRUNCATE the OpenMetadata tables
- INSERT the data that has been saved
You can then run the script's statements to restore the data.
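To make the shape of that file concrete, here is a minimal Python sketch that renders a TRUNCATE followed by INSERT statements for a single table. The table name, column name, and quoting helper are hypothetical, and this is not the utility's actual implementation. It only illustrates the two-step structure described above, and it lists only a regular column because GENERATED columns cannot be written to directly.

```python
from typing import Iterable, Sequence


def sql_literal(value: object) -> str:
    """Render a Python value as a SQL literal (illustrative quoting only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any single quotes so the string stays inside its literal.
    # A real dump must follow the target database's full escaping rules.
    return "'" + str(value).replace("'", "''") + "'"


def render_dump(table: str, columns: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Build the two-part script: empty the table, then re-insert the saved rows."""
    lines = [f"TRUNCATE TABLE {table};"]
    column_list = ", ".join(columns)
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        lines.append(f"INSERT INTO {table} ({column_list}) VALUES ({values});")
    return "\n".join(lines)


# Hypothetical table with one non-generated column.
script = render_dump("example_entity", ["json"], [('{"name": "orders"}',), ("it's",)])
print(script)
```

Because the script starts by truncating, running it against a database that already holds data replaces that data entirely.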
Make sure that the migrations have been run correctly (find out how here).
Also, make sure that the target database does not already have any OpenMetadata data, or if it does, that you are OK replacing it with whatever comes from the SQL script.
Backup CLI
After the installation, we can take a look at the different options to run the CLI:
If using MySQL:
If using Postgres, add the -s parameter specifying the schema:
Database Connection
There are four required parameters, the minimum we need to access the database service and run the backup: host, user, password and the database to point to. Note that the user should have at least read access to the database. By default, we'll try to connect through port 3306, but this can be overridden with the --port option.
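As a rough illustration of how those values fit together, the following Python sketch groups the four required parameters with the default port and builds a connection URL from them. The class name, the placeholder credentials, and the SQLAlchemy-style URL scheme are all assumptions made for this example; the CLI may assemble its connection differently.

```python
from dataclasses import dataclass
from urllib.parse import quote_plus


@dataclass(frozen=True)
class BackupConnection:
    """Hypothetical container for the four required values plus the port."""

    host: str
    user: str
    password: str
    database: str
    port: int = 3306  # default from the text; override it as you would with --port

    def url(self, scheme: str = "mysql+pymysql") -> str:
        # Percent-encode credentials so characters like '@' or ':' do not break the URL.
        return (
            f"{scheme}://{quote_plus(self.user)}:{quote_plus(self.password)}"
            f"@{self.host}:{self.port}/{self.database}"
        )


# Placeholder values, not real credentials.
conn = BackupConnection("localhost", "openmetadata_user", "p@ss:word", "openmetadata_db")
print(conn.url())
```

Encoding the password matters in practice: an unescaped '@' in a password would otherwise be read as the start of the host name.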
Output
The CLI will create a dump file named like openmetadata_YYYYmmddHHMM_backup.sql. This helps us identify the date each backup was generated. We can also specify an output path via --output, which will be created if it does not exist.
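The naming pattern above can be reproduced with a few lines of Python, which is handy when scripting around the dumps (for example, to find the most recent one). This is a sketch under the assumption that the timestamp is minute-resolution local time formatted as YYYYmmddHHMM; the function names are hypothetical and not part of the CLI.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def backup_path(output_dir: str, now: datetime) -> Path:
    """Return <output_dir>/openmetadata_YYYYmmddHHMM_backup.sql, creating the directory."""
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)  # mirrors "created if it does not exist"
    return directory / f"openmetadata_{now.strftime('%Y%m%d%H%M')}_backup.sql"


def backup_timestamp(path: Path) -> datetime:
    """Recover the generation time from a dump file name."""
    stamp = path.name.split("_")[1]
    return datetime.strptime(stamp, "%Y%m%d%H%M")


with tempfile.TemporaryDirectory() as tmp:
    path = backup_path(f"{tmp}/dir1/dir2", datetime(2022, 9, 30, 17, 15))
    created = path.parent.is_dir()
    print(path.name)  # openmetadata_202209301715_backup.sql
```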
Uploading to S3
To run this, make sure to have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set as environment variables, with permissions to the bucket that you'd like to point to. Afterwards, we can just use --upload <endpoint> <bucket> <key> to have the CLI upload the file. In this case, you'll get both the local dump file and the one in the cloud.
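Before launching an upload from a script, it can help to fail early when the credentials are missing. The sketch below checks for the two variables named above; the helper name is hypothetical and the check is not something the CLI is known to perform itself.

```python
import os
from typing import List, Mapping

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """List the required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
missing = missing_aws_vars({"AWS_ACCESS_KEY_ID": "example-key"})
print(missing)  # ['AWS_SECRET_ACCESS_KEY']
```

Calling missing_aws_vars() with no argument inspects the current process environment, so a wrapper script could abort with a clear message before the backup starts.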
Uploading to Azure Blob
To run this, make sure to have the Azure CLI configured with permissions to the Blob that you'd like to point to. Afterwards, we can just use --upload <account_url> <container> <folder> to have the CLI upload the file. In this case, you'll get both the local dump file and the one in the cloud.
Connection Options and Arguments
You can pass any required connection options or arguments to the MySQL connection via -o <opt1>, -o <opt2> [...] or -a <arg1>, -a <arg2> [...].
Trying it out
We can run a local test by preparing some containers:
- sh docker/run_local_docker.sh to start the docker compose service.
- docker run -p 9000:9000 -p 9001:9001 minio/minio server /data --console-address ":9001" to start minio, an S3-compatible object store.
- Connect to http://localhost:9001 to reach the minio console and create a bucket called my-bucket.
- Finally, we just need to prepare the environment variables as:
An example S3 CLI call looks like this:
And we'll get the following output:
If we now head to the minio console and check the my-bucket bucket, we'll see our SQL dump in there.

An example Azure Blob CLI call looks like this:
And we'll get the following output:
Restore Metadata
Make sure that when restoring metadata, your OpenMetadata server is NOT RUNNING. Otherwise, there could be clashes on cached values and specific elements that the server creates at start-time and that will be present on the restore SQL file as well.
Introduction
The SQL file generated by the backup CLI can be restored using the restore CLI.
Requirements
The restore CLI requires openmetadata-ingestion version 0.12.1 or higher.
Restore CLI
After the installation, we can take a look at the different options to run the CLI:
Output
When the restore completes, the CLI prints a message such as: Backup restored from openmetadata_202209301715_backup.sql
Trying it out
An example CLI call looks like this:
And we'll get the following output: