We are now going to present a high-level Python API as a type-safe and gentle wrapper for the OpenMetadata backend.
The Python SDK is part of the openmetadata-ingestion base package. You can install it from PyPI.
Make sure to use the same openmetadata-ingestion version as your server version. For example, if you have the OpenMetadata server at version 0.13.0, you will need to install:
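As a small sketch of the version-matching rule, the helper below builds a pip requirement string from a server version. The function name is hypothetical, and the use of pip's compatible-release operator (`~=`) is an assumption about how you may want to pin; an exact `==` pin would work as well.

```python
def ingestion_requirement(server_version: str) -> str:
    """Build a pip requirement that follows the given server version.

    Hypothetical helper, not part of the SDK.
    """
    version = server_version.strip()
    parts = version.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"Expected MAJOR.MINOR.PATCH, got {server_version!r}")
    # "~=0.13.0" accepts any 0.13.x release, but not 0.14.0.
    return f"openmetadata-ingestion~={version}"


print("pip install", ingestion_requirement("0.13.0"))
```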
In the OpenMetadata Design, we have been dissecting the internals of OpenMetadata. The main conclusion here is twofold:
- Everything is handled via the API, and
- Data structures (Entity definitions) are at the heart of the solution.
This means that whenever we need to interact with the metadata system or develop a new connector or logic, we have to make sure that we pass the proper inputs and handle the types of outputs.
Let's suppose that we have our local OpenMetadata server running at http://localhost:8585. We can play with it using simple httpie commands, and if we just want to take a look at the Entity instances we have lying around, that might be enough.
However, let's imagine that we want to create or update an ML Model Entity with a PUT. To do so, we need to make sure that we are providing a proper JSON, covering all the attributes and types required by the Entity definition.
If we needed to repeat this process with a full-fledged model that is built ad hoc and updated during the CI/CD process, we would just be adding a hard-to-maintain, error-prone requirement to our production deployment pipelines.
The same would happen if, inside the actual OpenMetadata code, there was not a way to easily interact with the API and make sure that we send proper data and can safely process the outputs.
As OpenMetadata is a data-centric solution, we need to make sure we have the right ingredients at all times. That is why we have developed a high-level Python API, using pydantic models automatically generated from the JSON Schemas.
Note: If you are using a published version of the Ingestion Framework, you are already good to go, as we package the code with the metadata.generated module. If you are developing a new feature, you can get more information here.
This API wrapper helps developers and consumers in:
- Validating data during development, with specific error messages at runtime,
- Receiving typed responses to ease further processing.
Thanks to the recursive model setting of pydantic, the example above can be rewritten using only Python classes, which lets us get help from IDEs and the Python interpreter. We can rewrite the previous JSON as:
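A minimal sketch of that idea follows. It uses standard-library dataclasses as a stand-in for the generated pydantic models so that it runs without any dependency. The class names, field names and values are illustrative assumptions, not the generated Entity definitions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Feature:
    """Hypothetical nested model."""
    name: str
    dataType: str


@dataclass
class CreateModel:
    """Hypothetical request model that contains other models."""
    name: str
    algorithm: str
    features: List[Feature] = field(default_factory=list)


request = CreateModel(
    name="churn_model",
    algorithm="xgboost",
    features=[Feature(name="age", dataType="numerical")],
)

# asdict() recurses into nested dataclasses, much like pydantic does when
# exporting a model that is composed of other models.
payload = json.dumps(asdict(request), indent=2)
print(payload)
```

With the real pydantic models, a missing or mistyped attribute is reported when the object is built, instead of surfacing later as a rejected request.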
Now that we know how to directly use the pydantic models, we can start showcasing the solution. This module has been built with two main principles in mind:
- Reusability: We should be able to support existing and new entities with minimum effort,
- Extensibility: However, we are aware that not all Entities are the same. Some of them may require specific functionalities or slight variations (such as Location), so it should be easy to identify those special methods and create new ones when needed.
To this end, we have the main class OpenMetadata (source) based on Python's TypeVar. Thanks to this we can exploit the complete power of the pydantic models, having methods with Type Parameters that know how to respond to each Entity.
At the same time, we have the Mixins (source) module, with special extensions to some Entities.
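The pattern can be sketched with the standard library alone. The code below is not the SDK's implementation: MetadataClient, LocationMixin and the in-memory store are hypothetical stand-ins, and the Entities are plain dataclasses. It only illustrates how a TypeVar lets a single method return the right type for each Entity, and how a mixin adds behaviour for one specific Entity.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


@dataclass
class Table:
    name: str


@dataclass
class Location:
    name: str
    path: str


class LocationMixin:
    """Extra behaviour that only makes sense for the Location Entity."""

    def get_location_path(self, name: str) -> Optional[str]:
        location = self.get_by_name(Location, name)
        return location.path if location is not None else None


class MetadataClient(LocationMixin):
    """Hypothetical client backed by a dict instead of a REST API."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[type, str], object] = {}

    def create_or_update(self, data: T) -> T:
        # The return type follows the argument type.
        self._store[(type(data), getattr(data, "name"))] = data
        return data

    def get_by_name(self, entity: Type[T], name: str) -> Optional[T]:
        # Passing the class tells both the client and the type checker
        # which Entity we expect to get back.
        found = self._store.get((entity, name))
        return found if isinstance(found, entity) else None


client = MetadataClient()
client.create_or_update(Table(name="orders"))
client.create_or_update(Location(name="raw", path="s3://bucket/raw"))

print(client.get_by_name(Table, "orders"))
print(client.get_by_name(Location, "orders"))
print(client.get_location_path("raw"))
```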
Let's use the Python API to create, update and delete a Table Entity. Choosing the Table is a nice starter, as its attributes define the following hierarchy:
This will help us showcase how we can reuse the same syntax with the three different Entities.
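To make the hierarchy concrete, here is a small sketch that builds fully qualified names (FQNs) from the names at each level. It assumes that an FQN is formed by joining the names from the service down to the Table with dots, and it includes the Schema level mentioned later in this guide. The helper names and the example names are hypothetical, and names that themselves contain dots are rejected to keep the sketch simple.

```python
from typing import Sequence

# Levels of the hierarchy, from the top down.
LEVELS = ("DatabaseService", "Database", "Schema", "Table")


def build_fqn(names: Sequence[str]) -> str:
    """Join the names of each level into a dotted FQN (hypothetical helper)."""
    if not 1 <= len(names) <= len(LEVELS):
        raise ValueError(f"Expected between 1 and {len(LEVELS)} names")
    if any(not name or "." in name for name in names):
        raise ValueError("Names must be non-empty and must not contain dots")
    return ".".join(names)


def level_of(fqn: str) -> str:
    """Return which level of the hierarchy an FQN points at."""
    depth = len(fqn.split("."))
    if depth > len(LEVELS):
        raise ValueError(f"FQN is deeper than the hierarchy: {fqn!r}")
    return LEVELS[depth - 1]


table_fqn = build_fqn(["test-service-table", "test-db", "test-schema", "test"])
print(table_fqn, "->", level_of(table_fqn))
```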
OpenMetadata is the class holding the connection to the API and handling the requests. We can instantiate this by passing the proper configuration to reach the server API:
For local development, we can get a JWT token for the ingestion bot as described here and use that when we specify the
jwtToken. For a real-world deployment, we can also use different authentication methods and specify other settings of the connection (such as
The OpenMetadataConnection is defined as a JSON Schema as well. You can check the definition here
From this point onwards, we will interact with the API by using
An interesting validation we can already make at this point is verifying that the service is reachable and healthy. To do so, we can validate the
Bool output from:
Following the hierarchy, we need to start by defining a DatabaseService. This will be the system hosting our
Database, which will contain the
Recall how we have mainly two types of models:
- Entity definitions, such as
- API definitions, useful when running a
As we are just creating Entities right now, we'll stick to the pydantic models with the API definitions.
Let's imagine that we are defining a MySQL service:
Note how we can use String definitions for the attributes, as well as specific types when possible, such as serviceType=DatabaseServiceType.Mysql. The less information we need to hardcode, the better.
Another important point here is that the connection definitions are centralized as JSON Schemas. Here you can find the root of all of them.
We can review the information that will be passed to the API by visiting the JSON definition of the class we just instantiated. As all these models are powered by pydantic, this conversion is transparent to us:
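The sketch below illustrates both points with the standard library only: a typed enum instead of a hardcoded string, and a JSON view of the request object. ServiceType, CreateService and to_json are hypothetical stand-ins for the generated pydantic classes and their built-in JSON export.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class ServiceType(Enum):
    """Hypothetical stand-in for the generated service type enum."""
    Mysql = "Mysql"
    Postgres = "Postgres"


@dataclass
class CreateService:
    name: str
    serviceType: ServiceType


def to_json(request: CreateService) -> str:
    # Enum members are not JSON-serializable by default, so export their value.
    return json.dumps(asdict(request), default=lambda member: member.value)


create_service = CreateService(name="test-service-table", serviceType=ServiceType.Mysql)
print(to_json(create_service))

# A typo in a plain string would go unnoticed until the server rejects it.
# With an enum, the mistake surfaces immediately.
try:
    ServiceType("MySql")
except ValueError as err:
    print(f"Rejected: {err}")
```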
Executing the actual creation is easy! As our create_service variable already holds the proper datatype, there is a single line to execute:
Moreover, running a create_or_update returns the Entity type, so we can explore its attributes easily:
We can now repeat the process to create a
Database Entity. However, if we review the definition of the
Note how the only non-optional fields are
service. The type of service, however, is FullyQualifiedEntityName. This is expected, as there we need to pass the information of an existing Entity. In our case, the fullyQualifiedName of the DatabaseService we just created.
In the case of the owner field, repeating the exercise and reviewing the required fields to instantiate an EntityReference, we notice that we need to pass an id: uuid.UUID and type: str. There we need to specify the type of an
Querying by name
We actually saw the id when printing the service_entity JSON. However, let's imagine that we had not, and the only information we have about the DatabaseService is its name.
To retrieve the id, we should then ask the metadata to find our Entity by its FQN:
We have just used the get_by_name method. This is the same method that we will use for any Entity, which is why we need to provide the entity field as an argument. Again, instead of relying on error-prone handwritten parameters, we can just pass the pydantic model we expect to get back. In our case, a
With the addition of the Schema Entity in 0.10, we now also need to create a Schema, which will be the one containing the Tables. As this entity is a link between other entities, an Entity Reference will be required too.
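As an illustration of how such a link can be expressed, the sketch below builds a reference to an existing Database and uses it in a Schema creation request. All classes are standard-library stand-ins, and the "database" type string and the example names are assumptions. The only details taken from the text are that a reference carries an id of type uuid.UUID and a type of type str.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EntityReference:
    """Stand-in reference: which Entity (id) and of what kind (type)."""
    id: uuid.UUID
    type: str


@dataclass
class Database:
    name: str
    # In a real deployment the id is assigned by the server.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class CreateSchema:
    name: str
    database: EntityReference


database = Database(name="test-db")

create_schema = CreateSchema(
    name="test-schema",
    database=EntityReference(id=database.id, type="database"),
)
print(create_schema)
```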
Now that we have all the preparations ready, we can just reuse the same steps to create the
Let's now update the Table by adding an owner. This will require us to create a User, and then update the Table with it. Afterwards, we will validate that the information has been properly stored.
First, make sure that no owner has been set during the creation:
Now, create a
Update our instance of create_table to add the owner field (we need to use the Create class as we'll run a PUT), and update the Entity:
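The update flow can be sketched as follows, with an in-memory dict playing the role of the server. CreateTable, put and the plain-string owner are simplifying assumptions made for this sketch; in the steps described here the owner refers to the User that was just created. The point is that sending the same Create request again, now with an owner, updates the existing Entity instead of adding a second one.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class CreateTable:
    """Hypothetical request model; the owner is a plain string here."""
    name: str
    owner: Optional[str] = None


# Stand-in for the server: one stored request per table name.
store: Dict[str, CreateTable] = {}


def put(request: CreateTable) -> CreateTable:
    """Create the table if it is new, otherwise overwrite it (PUT semantics)."""
    store[request.name] = request
    return request


create_table = CreateTable(name="test")
put(create_table)
print(store["test"].owner)

# replace() returns a copy of the request with the owner filled in.
create_table = replace(create_table, owner="random-user")
updated_table = put(create_table)
print(updated_table.owner, len(store))
```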
If we did not save the updated_table_entity variable, we would need to query it to review the owner field. We can run the get_by_name using the proper FQN definition for
When querying an Entity we might not find it! The Entity might not exist, or there might be an error in the
In those cases, the get method won't fail, but will return None instead. Note that the signature of the get methods is Optional[T], so make sure to validate that there is data coming back!
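A minimal sketch of that guard is shown below. The get_by_name function here is a local stand-in backed by a dict, not the SDK method, and the Table class and FQN are illustrative. Only the Optional return type mirrors the behaviour described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Table:
    fqn: str
    owner: Optional[str] = None


_TABLES: Dict[str, Table] = {
    "test-service-table.test-db.test-schema.test": Table(
        fqn="test-service-table.test-db.test-schema.test",
        owner="random-user",
    ),
}


def get_by_name(fqn: str) -> Optional[Table]:
    """Stand-in lookup: returns None instead of raising when nothing matches."""
    return _TABLES.get(fqn)


def owner_of(fqn: str) -> str:
    table = get_by_name(fqn)
    if table is None:
        # Handle the missing Entity explicitly before touching its attributes.
        return "not found"
    return table.owner or "no owner"


print(owner_of("test-service-table.test-db.test-schema.test"))
print(owner_of("test-service-table.test-db.test-schema.typo"))
```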
Finally, we can clean up the Table by running the
We could directly clean up the service itself with a Hard and Recursive delete. Note that this is ok for this test, but beware when working with production data!
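To show what a recursive delete implies, the sketch below walks a small in-memory hierarchy and removes a node together with everything under it. The delete function and the dict-based store are hypothetical. The sketch does not model the difference between hard and soft deletion, and it is not the SDK's delete method.

```python
from typing import Dict, List

# Each FQN maps to the FQNs of its direct children.
store: Dict[str, List[str]] = {
    "svc": ["svc.db"],
    "svc.db": ["svc.db.schema"],
    "svc.db.schema": ["svc.db.schema.table"],
    "svc.db.schema.table": [],
}


def delete(entities: Dict[str, List[str]], fqn: str, recursive: bool = False) -> List[str]:
    """Delete fqn and, if requested, its descendants. Returns the deleted FQNs."""
    children = entities.get(fqn)
    if children is None:
        return []
    if children and not recursive:
        raise ValueError(f"{fqn} still has children; pass recursive=True")
    deleted: List[str] = []
    for child in list(children):
        deleted.extend(delete(entities, child, recursive=True))
    del entities[fqn]
    # Drop the dangling reference from whichever parent pointed at this node.
    for siblings in entities.values():
        if fqn in siblings:
            siblings.remove(fqn)
    deleted.append(fqn)
    return deleted


try:
    delete(store, "svc")
except ValueError as err:
    print(f"Refused: {err}")

print(delete(store, "svc", recursive=True))
```

Children are removed before their parent, which is why a single call on the service is enough to clear the whole test hierarchy.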