Understand Code Layout

Use this document as a quick start guide to begin developing in OpenMetadata. Below, we address the following topics:

  1. Schema (Metadata Models)
  2. APIs
  3. System and Components

Schema (Metadata Models)

OpenMetadata takes a schema-first approach to model metadata. We define entities, types, API requests, and relationships between entities. We define the OpenMetadata schema using the JSON Schema vocabulary.
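To make the schema-first idea concrete, here is a minimal sketch of a JSON Schema definition and a check against it. The `ExampleTable` schema and the `check_entity` helper are hypothetical and heavily simplified; they are not the actual OpenMetadata definitions, and the check covers only `required` and basic `type` rules rather than the full JSON Schema vocabulary.

```python
from typing import Any, Dict, List

# Hypothetical, simplified entity schema (an assumption for illustration,
# not a file from openmetadata-spec).
TABLE_SCHEMA: Dict[str, Any] = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "ExampleTable",
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "description": {"type": "string"},
    },
    "required": ["id", "name"],
}

# Maps a few JSON Schema type names to Python types.
_JSON_TYPES = {"string": str, "object": dict, "array": list, "boolean": bool}


def check_entity(entity: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the entity passed."""
    problems = []
    # Every field listed under "required" must be present.
    for field in schema.get("required", []):
        if field not in entity:
            problems.append(f"missing required field: {field}")
    # Present fields must match the declared simple type, if one is known.
    for field, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if field in entity and expected and not isinstance(entity[field], expected):
            problems.append(f"wrong type for field: {field}")
    return problems
```

Because the schema is the single source of truth, the same definition can drive generated classes in several languages, which is the design choice the rest of this page builds on.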

We convert models defined using JSON Schema to Plain Old Java Objects (POJOs) using the jsonschema2pojo-maven-plugin, as configured in pom.xml. You can find the generated POJOs under OpenMetadata/openmetadata-service/target/generated-sources/jsonschema2pojo.

You can locate defined entities in the directory OpenMetadata/openmetadata-spec/src/main/resources/json/schema/entity. Currently, OpenMetadata supports the following entities:

  • data
  • feed
  • policies
  • services
  • tags
  • teams

All OpenMetadata supported types are defined under OpenMetadata/openmetadata-spec/src/main/resources/json/schema/type.

The API request objects are defined under OpenMetadata/openmetadata-spec/src/main/resources/json/schema/api.

APIs

OpenMetadata uses the Dropwizard Java framework to build REST APIs. You can locate defined APIs in the directory OpenMetadata/openmetadata-service/src/main/java/org/openmetadata/service/resources. OpenMetadata uses Swagger to generate API documentation following OpenAPI standards.

System and Components

Overview of the OpenMetadata components and high-level interactions.

OpenMetadata captures changes to entities as events and stores them in the OpenMetadata server database. OpenMetadata also indexes change events in Elasticsearch to make them searchable.

The event handlers are defined under OpenMetadata/openmetadata-service/src/main/java/org/openmetadata/service/events and are applied globally to any outgoing response using the ContainerResponseFilter.
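The pattern described above, where every outgoing response passes through a set of globally registered handlers, can be sketched as follows. This is a conceptual Python illustration only: the real handlers are Java classes, and `ChangeEvent`, `build_change_event`, and `ResponseFilterChain` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ChangeEvent:
    event_type: str  # "created" or "updated" (hypothetical labels)
    entity_type: str
    entity_id: str
    changed_fields: List[str] = field(default_factory=list)


def build_change_event(
    entity_type: str,
    before: Optional[Dict[str, Any]],
    after: Dict[str, Any],
) -> Optional[ChangeEvent]:
    """Compare two versions of an entity and describe what changed."""
    if before is None:
        return ChangeEvent("created", entity_type, after["id"], sorted(after))
    changed = sorted(
        key for key in set(before) | set(after) if before.get(key) != after.get(key)
    )
    if not changed:
        return None  # nothing changed, so there is no event to publish
    return ChangeEvent("updated", entity_type, after["id"], changed)


class ResponseFilterChain:
    """Passes each change event to every registered handler, in order."""

    def __init__(self) -> None:
        self.handlers: List[Callable[[ChangeEvent], None]] = []

    def register(self, handler: Callable[[ChangeEvent], None]) -> None:
        self.handlers.append(handler)

    def filter(self, event: Optional[ChangeEvent]) -> None:
        if event is None:
            return
        for handler in self.handlers:
            handler(event)
```

In this sketch one handler could persist the event to the database and another could forward it to the search index; registering both on the same chain keeps the API resources unaware of either.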

OpenMetadata uses MySQL or Postgres for the metadata catalog. The catalog code is located in the directory OpenMetadata/openmetadata-service/src/main/java/org/openmetadata/service/jdbi3.

The database entity tables are created with the script OpenMetadata/bootstrap/openmetadata-ops.sh. Flyway is used for managing the database table versions.

OpenMetadata uses Elasticsearch to store entity change events and make them searchable through search indexes. The class OpenMetadata/openmetadata-service/src/main/java/org/openmetadata/service/elasticsearch/ElasticSearchEventPublisher.java is responsible for capturing the change events and updating Elasticsearch.

Elasticsearch indices are created when the OpenMetadata/ingestion/pipelines/metadata_to_es.json ingestion connector is run.

OpenMetadata uses Google OAuth for authentication. All incoming requests are filtered by validating the JWT token using the Google OAuth provider. Access control is provided by Authorizer.
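As a rough illustration of what validating a JWT involves, the sketch below checks a token's signature and expiry using only the Python standard library. It assumes HS256 with a shared secret purely to stay self-contained; tokens issued by Google are signed with RS256 and are verified against Google's published public keys, and the server's own filter is written in Java. The function names here are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: Dict[str, Any], secret: bytes) -> str:
    """Build a signed token; included only so the example can be exercised."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def validate_hs256(token: str, secret: bytes, now: Optional[float] = None) -> Dict[str, Any]:
    """Return the claims if the signature and expiry are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise ValueError("token expired")
    return claims
```

A request filter would typically run a check like this before the request reaches any resource, then hand the resulting claims to the authorizer to decide what the caller may do.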

See the configuration file OpenMetadata/conf/openmetadata.yaml for the authentication and authorization configurations.

Ingestion is a simple Python framework to ingest metadata from external sources into OpenMetadata.

Connectors

OpenMetadata defines and uses a set of components called Connectors for metadata ingestion. Each data service requires its own connector. See the documentation on how to build a connector for details on developing connectors for new services.

The base components of the ingestion framework are defined in the following files:

  1. Workflow OpenMetadata/ingestion/src/metadata/ingestion/api/workflow.py
  2. Source OpenMetadata/ingestion/src/metadata/ingestion/api/source.py
  3. Processor OpenMetadata/ingestion/src/metadata/ingestion/api/processor.py
  4. Sink OpenMetadata/ingestion/src/metadata/ingestion/api/sink.py
  5. Stage OpenMetadata/ingestion/src/metadata/ingestion/api/stage.py
  6. BulkSink OpenMetadata/ingestion/src/metadata/ingestion/api/bulk_sink.py

Workflow is a simple orchestration job that runs Source, Processor, Sink, Stage and BulkSink based on the configurations present under OpenMetadata/ingestion/examples/workflows.
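The orchestration described above can be sketched as a small pipeline. This is a simplified illustration under stated assumptions: the class and method names (`ListSource`, `records`, `process`, `write`, `run_workflow`) are hypothetical and do not mirror the interfaces in the api directory, and Stage and BulkSink are left out to keep the example short.

```python
from typing import Any, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


class ListSource:
    """Hypothetical source that yields records from an in-memory list."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records = list(records)

    def records(self) -> Iterator[Record]:
        yield from self._records


class UppercaseNameProcessor:
    """Hypothetical processor that normalizes the `name` field."""

    def process(self, record: Record) -> Record:
        return {**record, "name": record["name"].upper()}


class InMemorySink:
    """Hypothetical sink that collects records instead of calling an API."""

    def __init__(self) -> None:
        self.written: List[Record] = []

    def write(self, record: Record) -> None:
        self.written.append(record)


def run_workflow(source: ListSource, processor: UppercaseNameProcessor, sink: InMemorySink) -> int:
    """Pull each record from the source, transform it, and write it out."""
    count = 0
    for record in source.records():
        sink.write(processor.process(record))
        count += 1
    return count
```

Keeping each step behind a narrow interface is what lets a workflow configuration swap one source or sink for another without touching the orchestration loop.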

Several popular connectors have already been developed. You can find them under:

  1. Source → OpenMetadata/ingestion/src/metadata/ingestion/source
  2. Processor → OpenMetadata/ingestion/src/metadata/ingestion/processor
  3. Sink → OpenMetadata/ingestion/src/metadata/ingestion/sink
  4. Stage → OpenMetadata/ingestion/src/metadata/ingestion/stage
  5. BulkSink → OpenMetadata/ingestion/src/metadata/ingestion/bulksink

Airflow

For simplicity, OpenMetadata ingests metadata from external sources using a pull-based model. OpenMetadata uses Apache Airflow to orchestrate ingestion workflows.

See the directory OpenMetadata/ingestion/examples/airflow/dags for reference DAG definitions.

JsonSchema python typings

You can generate Python types for OpenMetadata models defined in JSON Schema by running the make generate command from the Makefile. The generated files are located in the directory OpenMetadata/ingestion/src/metadata/generated.