Build a Connector

This design doc walks through developing a connector for OpenMetadata.

Ingestion is a simple Python framework for ingesting metadata from various sources.

Please look at our framework APIs.

A workflow is a simple orchestration job that runs its components in order.

A workflow consists of a Source and a Sink. It also supports optional Processor, Stage, and BulkSink components.

Workflow execution happens in a serial fashion.

  1. The workflow runs the source component first. The source retrieves a record from an external source and emits it downstream.
  2. If a processor component is configured, the workflow sends the record to the processor next.
  3. Multiple processor components can be attached to the workflow. The workflow passes the record to each processor in the order in which they are configured.
  4. Once the last processor has finished, the workflow sends the modified record to the sink.
  5. The above steps are repeated for each record emitted from the source.
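The serial execution described above can be sketched in a few lines of Python. This is a minimal illustration and not the framework's actual API: `source`, `add_owner`, `upper_name`, and `run_workflow` are hypothetical names, and records are assumed to be plain dictionaries.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict  # assumption: a record is a plain dictionary


def source() -> Iterator[Record]:
    """Hypothetical source: emits one record at a time."""
    for name in ("orders", "customers"):
        yield {"name": name}


def add_owner(record: Record) -> Record:
    """Hypothetical processor: attaches an owner to the record."""
    return {**record, "owner": "data-team"}


def upper_name(record: Record) -> Record:
    """Hypothetical processor: upper-cases the record name."""
    return {**record, "name": record["name"].upper()}


def run_workflow(
    records: Iterable[Record],
    processors: List[Callable[[Record], Record]],
    sink: Callable[[Record], None],
) -> None:
    """Run every record through the processors in order, then the sink."""
    for record in records:              # step 5: repeat for each record
        for process in processors:      # steps 2-3: processors in configured order
            record = process(record)
        sink(record)                    # step 4: modified record goes to the sink


sunk: List[Record] = []                 # stand-in sink: collect records in a list
run_workflow(source(), [add_owner, upper_name], sunk.append)
print(sunk)
```

Because the source is a generator, each record travels through the whole chain before the next one is retrieved, which matches the record-by-record flow in the list above.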

When records need to be aggregated, use the stage to write them to a file or another store. Then pass the file written by the stage to the bulk sink, which publishes the result to external services such as OpenMetadata or Elasticsearch.
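As a rough sketch of that pattern, the example below stages records in a JSON-lines file and lets a bulk sink aggregate them afterwards. The function names, the file format, and the usage-count aggregation are all assumptions made for illustration; "publishing" is reduced to returning the aggregated result.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Dict, Iterable


def stage(records: Iterable[dict], path: str) -> None:
    """Hypothetical stage: write every record to a JSON-lines file."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def bulk_sink(path: str) -> Dict[str, int]:
    """Hypothetical bulk sink: read the staged file and aggregate it.

    A real bulk sink would publish the result to an external service;
    here it simply returns the per-table counts.
    """
    counts: Counter = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            counts[json.loads(line)["table"]] += 1
    return dict(counts)


usage_events = [
    {"table": "orders", "user": "ana"},
    {"table": "orders", "user": "bo"},
    {"table": "customers", "user": "ana"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    staged_file = os.path.join(tmp_dir, "staged.jsonl")
    stage(usage_events, staged_file)        # first pass: persist all records
    table_usage = bulk_sink(staged_file)    # second pass: aggregate and publish

print(table_usage)
```

The aggregation only becomes possible once all records are available, which is why the work is split between a stage that persists records and a bulk sink that consumes them in one pass.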

Each Step derives from the same generic definition, so we always need to implement these methods:

  • create to initialize the actual step.
  • close in case there's any connection that needs to be terminated.
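To make the role of these two methods concrete, here is a simplified, hypothetical stand-in for such a generic step. It is not the framework's actual class: the real signature of `create` may take additional arguments, and `InMemorySource` exists only for this illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class Step(ABC):
    """Simplified, hypothetical stand-in for a generic workflow step."""

    @classmethod
    @abstractmethod
    def create(cls, config: dict) -> "Step":
        """Build and initialize the step from its configuration."""

    @abstractmethod
    def close(self) -> None:
        """Release any connection or resource the step holds."""


class InMemorySource(Step):
    """Hypothetical source whose 'connection' is just a flag."""

    def __init__(self, tables: List[str]) -> None:
        self.tables = list(tables)
        self.connected = True  # pretend a connection was opened

    @classmethod
    def create(cls, config: dict) -> "InMemorySource":
        # Initialize the step from its configuration dictionary.
        return cls(config.get("tables", []))

    def close(self) -> None:
        # Terminate the pretend connection.
        self.connected = False


step = InMemorySource.create({"tables": ["orders"]})
step.close()
```

Declaring both methods abstract means a subclass that forgets either one cannot be instantiated, so every step in a workflow can be created and closed in the same way.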

On top of this, you can find further notes on each specific step in the links below:

Read more about workflow management here.