Scaffold a Connector

The metadata scaffold-connector command generates all the boilerplate files for a new connector: JSON Schema, test connection definition, Python source files, and a CONNECTOR_CONTEXT.md that any AI agent can use to implement the connector.

Prerequisites

Set up the development environment first:
cd OpenMetadata
python3.11 -m venv env
source env/bin/activate
make install_dev generate

Interactive Mode

Run the scaffold tool with no arguments to enter interactive mode:
source env/bin/activate
metadata scaffold-connector
The tool walks you through a series of prompts:
| Prompt | What It Controls |
| --- | --- |
| Connector name | Directory name, class names, schema file name |
| Service type | Base class, directory, test patterns |
| Connection type | Database only: sqlalchemy, rest_api, or sdk_client |
| Auth types | Which auth $ref schemas to include |
| Capabilities | Which extra files to generate (lineage, usage, profiler) |
| Docs URL | API/SDK documentation (included in AI context) |
| SDK package | Python package name (included in AI context) |
| API endpoints | Key endpoints (included in AI context) |
| Implementation notes | Auth quirks, pagination, rate limits (included in AI context) |
| Docker image | If available, included in AI context for integration tests |
| Container port | Port to expose from the Docker container |

Non-Interactive Mode

Pass all options as flags for scripted or CI use:
metadata scaffold-connector \
    --name clickhouse \
    --service-type database \
    --connection-type sqlalchemy \
    --scheme "clickhousedb+connect" \
    --auth-types basic \
    --capabilities metadata lineage usage profiler \
    --docs-url "https://clickhouse.com/docs/en/interfaces/http" \
    --sdk-package "clickhouse-connect" \
    --docker-image "clickhouse/clickhouse-server:latest" \
    --docker-port 8123
Only --name and --service-type are required. All other flags have sensible defaults.
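In a CI script it can be convenient to build this invocation programmatically. The sketch below only assembles the argv list from the documented flags; the helper function and the commented-out subprocess call are illustrative, not part of the tool:

```python
import subprocess  # only needed if you actually run the command

def scaffold_args(name: str, service_type: str, **flags: str) -> list:
    """Build the argv list for `metadata scaffold-connector`.

    Only --name and --service-type are required; every extra keyword
    becomes a --flag-name value pair (underscores map to dashes).
    """
    args = ["metadata", "scaffold-connector",
            "--name", name, "--service-type", service_type]
    for flag, value in flags.items():
        args += ["--" + flag.replace("_", "-"), value]
    return args

argv = scaffold_args(
    "clickhouse", "database",
    connection_type="sqlalchemy",
    scheme="clickhousedb+connect",
)
# In CI you would then run: subprocess.run(argv, check=True)
```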

What Gets Generated

JSON Schema (Single Source of Truth)

openmetadata-spec/src/main/resources/json/schema/entity/services/connections/{service_type}/{name}Connection.json
This file drives code generation for Python Pydantic models, Java models, TypeScript types, and UI forms. The scaffold generates it with correct $ref patterns for auth, SSL, filters, and capability flags.
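To make the $ref pattern concrete, here is a hypothetical, heavily trimmed shape of a generated connection schema, written as a Python dict for illustration. The field names and $ref target are assumptions, not the exact OpenMetadata schema:

```python
# Illustrative shape of a {name}Connection.json file (NOT the real schema):
# shared fragments such as SSL config are referenced via $ref rather than
# duplicated, which is what keeps the generated models consistent.
connection_schema = {
    "title": "ClickhouseConnection",
    "type": "object",
    "properties": {
        "scheme": {"enum": ["clickhousedb+connect"]},
        "username": {"type": "string"},
        "sslConfig": {"$ref": "../common/sslConfig.json"},  # hypothetical path
    },
    "required": ["username"],
}

# Fields pulled in by reference instead of being defined inline:
ref_fields = [k for k, v in connection_schema["properties"].items()
              if "$ref" in v]
```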

Test Connection Definition

openmetadata-service/src/main/resources/json/data/testConnections/{service_type}/{name}.json
Defines the steps for testing a connection (e.g., CheckAccess, GetDatabases). Step names must match the test_fn dictionary in connection.py.
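The coupling between the two files can be sketched as follows. The step names come from the example above; the function bodies and the exact shape of `test_fn` in the generated connection.py are illustrative:

```python
# Steps declared in testConnections/{service_type}/{name}.json:
declared_steps = ["CheckAccess", "GetDatabases"]

# connection.py must expose a matching test_fn mapping (bodies are toy stubs;
# the real ones would run a query or list databases against the engine).
def check_access():
    return True

def get_databases():
    return ["default"]

test_fn = {
    "CheckAccess": check_access,
    "GetDatabases": get_databases,
}

# A name mismatch here is the usual cause of a failing test connection:
missing = [step for step in declared_steps if step not in test_fn]
```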

Python Source Files

For SQLAlchemy database connectors, the scaffold generates concrete, nearly-complete templates:
ingestion/src/metadata/ingestion/source/database/{name}/
├── __init__.py          # Empty module marker
├── connection.py        # BaseConnection[Config, Engine] subclass
├── metadata.py          # CommonDbSourceService subclass
├── service_spec.py      # DefaultDatabaseSpec registration
├── queries.py           # SQL query templates
├── lineage.py           # LineageSource mixin (if lineage selected)
├── usage.py             # UsageSource mixin (if usage selected)
├── query_parser.py      # QueryParserSource (if lineage or usage)
└── CONNECTOR_CONTEXT.md # AI implementation brief
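As an example of what queries.py typically holds, here is a hypothetical template module: plain SQL strings with format placeholders that metadata.py fills in at runtime. The query text and constant names are illustrative, not what the scaffold actually emits:

```python
# Hypothetical queries.py contents: SQL templates for the metadata source.
GET_DATABASES = "SHOW DATABASES"

GET_TABLE_COMMENTS = """
SELECT name, comment
FROM system.tables
WHERE database = '{database_name}'
"""

# The source formats a template per database during ingestion:
query = GET_TABLE_COMMENTS.format(database_name="default")
```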
For all other connector types (dashboard, pipeline, messaging, non-SQLAlchemy database, etc.), the scaffold generates skeleton files:
ingestion/src/metadata/ingestion/source/{service_type}/{name}/
├── __init__.py          # Empty module marker
├── connection.py        # Skeleton → points to CONNECTOR_CONTEXT.md
├── metadata.py          # Skeleton → points to CONNECTOR_CONTEXT.md
├── service_spec.py      # Skeleton → points to CONNECTOR_CONTEXT.md
├── client.py            # Skeleton → points to CONNECTOR_CONTEXT.md
└── CONNECTOR_CONTEXT.md # AI implementation brief
Each skeleton file contains a pointer to the reference connector and CONNECTOR_CONTEXT.md for implementation guidance.

CONNECTOR_CONTEXT.md

This is the key file for AI-assisted development. It contains:
  • Connector profile (name, type, capabilities, auth)
  • Source documentation you provided (API docs, SDK package, endpoints, notes)
  • Complete file list with what to implement in each
  • Reference connector path for copying patterns
  • Registration checklist (exact files and changes needed)
  • Validation checklist

Service Types

| Type | Connection Types | Reference |
| --- | --- | --- |
| database | sqlalchemy, rest_api, sdk_client | mysql/ (SQLAlchemy), salesforce/ (REST) |
| dashboard | rest_api, sdk_client | metabase/ |
| pipeline | rest_api, sdk_client | airflow/ |
| messaging | rest_api, sdk_client | kafka/ |
| mlmodel | rest_api, sdk_client | mlflow/ |
| storage | rest_api, sdk_client | s3/ |
| search | rest_api, sdk_client | elasticsearch/ |
| api | rest_api, sdk_client | rest/ |

Examples

Database with SQLAlchemy

metadata scaffold-connector \
    --name my_olap_db \
    --service-type database \
    --connection-type sqlalchemy \
    --scheme "myolap+pyodbc" \
    --default-port 10000 \
    --auth-types basic iam \
    --capabilities metadata lineage usage profiler data_diff

Dashboard with REST API

metadata scaffold-connector \
    --name my_bi_tool \
    --service-type dashboard \
    --auth-types token \
    --docs-url "https://docs.example.com/api/v1" \
    --api-endpoints "GET /dashboards, GET /charts, GET /datasources" \
    --docs-notes "Uses cursor-based pagination. Rate limit: 100 req/min."

Pipeline with SDK

metadata scaffold-connector \
    --name my_orchestrator \
    --service-type pipeline \
    --connection-type sdk_client \
    --auth-types token \
    --sdk-package "my-orchestrator-sdk" \
    --docker-image "myorch/server:latest" \
    --docker-port 8080

Next Steps

After scaffolding, follow the Build with AI guide to implement the connector using your preferred AI tool. Or continue manually with the existing guides:
  1. Define JSON Schema (already done by the scaffold)
  2. Develop Ingestion Code
  3. Apply UI Changes