Build a Connector

This design doc walks through developing a connector for OpenMetadata.

Ingestion is a simple Python framework for ingesting metadata from various sources.

Please look at our framework APIs.

A workflow is a simple orchestration job that runs the components in order.

A workflow consists of a Source and a Sink. It also provides support for a Stage and a BulkSink.

Workflow execution happens in a serial fashion.

  1. The Workflow runs the source component first. The source retrieves a record from external sources and emits the record downstream.
  2. If the processor component is configured, the workflow sends the record to the processor next.
  3. There can be multiple processor components attached to the workflow. The workflow passes a record to each processor in the order they are configured.
  4. Once all processors have finished, the workflow sends the modified record to the sink.
  5. The above steps are repeated for each record emitted from the source.
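The serial execution described above can be sketched as a simple loop. This is a minimal sketch with illustrative class and method names (`next_record`, `process`, `write_record`), not the framework's actual API:

```python
# Minimal sketch of serial workflow execution. Class and method names
# here are illustrative assumptions, not OpenMetadata's actual classes.
class Workflow:
    def __init__(self, source, processors, sink):
        self.source = source
        self.processors = processors  # zero or more, run in configured order
        self.sink = sink

    def execute(self):
        # 1. The source emits records one at a time.
        for record in self.source.next_record():
            # 2-3. Each configured processor transforms the record, in order.
            for processor in self.processors:
                record = processor.process(record)
            # 4. The (possibly modified) record goes to the sink.
            self.sink.write_record(record)
```

The loop makes the serial contract explicit: a record fully traverses source, processors, and sink before the next record is pulled.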

In cases where we need aggregation over the records, we can use a Stage to write them to a file or another store. The Bulk Sink then reads the file written by the Stage and publishes the records to external services such as OpenMetadata or Elasticsearch.
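The stage-then-bulk-sink pattern can be sketched as follows. The class and method names (`FileStage`, `FileBulkSink`, `stage_record`, `write_records`) are illustrative assumptions, and the publish step is stubbed out:

```python
import json

# Illustrative sketch of the stage -> bulk sink pattern; names are
# assumptions, not the framework's actual API.
class FileStage:
    """Buffers records to a file so they can be published in one batch."""
    def __init__(self, path):
        self.path = path
        self._file = open(path, "w")

    def stage_record(self, record):
        # One JSON record per line, appended as the workflow runs.
        self._file.write(json.dumps(record) + "\n")

    def close(self):
        self._file.close()

class FileBulkSink:
    """Reads everything the stage wrote and publishes it in one batch."""
    def __init__(self, path):
        self.path = path

    def write_records(self):
        with open(self.path) as staged:
            batch = [json.loads(line) for line in staged]
        # In a real connector this batch would be pushed to OpenMetadata
        # or Elasticsearch via their APIs; here we just return it.
        return batch
```

Splitting the work this way lets the workflow aggregate records that arrive one at a time into a single bulk publish at the end.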

**Source**: The connector to an external system; it outputs records for downstream components to process.

**Sink**: Receives the records emitted by the source, one at a time.

**Stage**: Can be used to store records or to aggregate the work done by a processor.

**BulkSink**: Can be used to bulk-update the records generated in a workflow.
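The four components above can be sketched as abstract interfaces. The method names are assumptions for illustration and will differ from the framework's actual classes:

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable

# Illustrative sketch of the four component interfaces; method names
# are assumptions, not the framework's actual signatures.
class Source(ABC):
    """Connects to an external system and emits records downstream."""
    @abstractmethod
    def next_record(self) -> Iterable[Any]: ...

class Sink(ABC):
    """Receives records emitted by the source, one at a time."""
    @abstractmethod
    def write_record(self, record: Any) -> None: ...

class Stage(ABC):
    """Stores records so work can be aggregated before publishing."""
    @abstractmethod
    def stage_record(self, record: Any) -> None: ...

class BulkSink(ABC):
    """Publishes the staged records to an external service in bulk."""
    @abstractmethod
    def write_records(self) -> None: ...
```

A new connector would subclass `Source` (and optionally the others) and plug the subclass into a workflow configuration.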

Still have questions?

You can take a look at our Q&A or reach out to us in Slack.
