Scaffold a Connector

The metadata scaffold-connector command generates all the boilerplate files for a new connector: JSON Schema, test connection definition, Python source files, and a CONNECTOR_CONTEXT.md that any AI agent can use to implement the connector.

Prerequisites

Set up the development environment first:
cd OpenMetadata
python3.11 -m venv env
source env/bin/activate
make install_dev generate

Interactive Mode

Run the scaffold tool with no arguments to enter interactive mode:
source env/bin/activate
metadata scaffold-connector
The tool walks you through a series of prompts:
| Prompt | What It Controls |
| --- | --- |
| Connector name | Directory name, class names, schema file name |
| Service type | Base class, directory, test patterns |
| Connection type | Database only: sqlalchemy, rest_api, or sdk_client |
| Auth types | Which auth $ref schemas to include |
| Capabilities | Which extra files to generate (lineage, usage, profiler) |
| Docs URL | API/SDK documentation (included in AI context) |
| SDK package | Python package name (included in AI context) |
| API endpoints | Key endpoints (included in AI context) |
| Implementation notes | Auth quirks, pagination, rate limits (included in AI context) |
| Docker image | If available, included in AI context for integration tests |
| Container port | Port to expose from the Docker container |

Non-Interactive Mode

Pass all options as flags for scripted or CI use:
metadata scaffold-connector \
    --name clickhouse \
    --service-type database \
    --connection-type sqlalchemy \
    --scheme "clickhousedb+connect" \
    --auth-types basic \
    --capabilities metadata lineage usage profiler \
    --docs-url "https://clickhouse.com/docs/en/interfaces/http" \
    --sdk-package "clickhouse-connect" \
    --docker-image "clickhouse/clickhouse-server:latest" \
    --docker-port 8123
Only --name and --service-type are required. All other flags have sensible defaults.
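In a CI script it can be convenient to build this invocation programmatically. The sketch below only assembles the argv list from the documented flags; the helper function and the commented-out subprocess call are illustrative, not part of the tool:

```python
import subprocess  # only needed if you actually run the command

def scaffold_args(name: str, service_type: str, **flags: str) -> list:
    """Build the argv list for `metadata scaffold-connector`.

    Only --name and --service-type are required; every extra keyword
    becomes a --flag-name value pair (underscores map to dashes).
    """
    args = ["metadata", "scaffold-connector",
            "--name", name, "--service-type", service_type]
    for flag, value in flags.items():
        args += ["--" + flag.replace("_", "-"), value]
    return args

argv = scaffold_args(
    "clickhouse", "database",
    connection_type="sqlalchemy",
    scheme="clickhousedb+connect",
)
# In CI you would then run: subprocess.run(argv, check=True)
```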

What Gets Generated

JSON Schema (Single Source of Truth)

openmetadata-spec/src/main/resources/json/schema/entity/services/connections/{service_type}/{name}Connection.json
This file drives code generation for Python Pydantic models, Java models, TypeScript types, and UI forms. The scaffold generates it with correct $ref patterns for auth, SSL, filters, and capability flags.
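To make the $ref pattern concrete, here is a hypothetical, heavily trimmed shape of a generated connection schema, written as a Python dict for illustration. The field names and $ref target are assumptions, not the exact OpenMetadata schema:

```python
# Illustrative shape of a {name}Connection.json file (NOT the real schema):
# shared fragments such as SSL config are referenced via $ref rather than
# duplicated, which is what keeps the generated models consistent.
connection_schema = {
    "title": "ClickhouseConnection",
    "type": "object",
    "properties": {
        "scheme": {"enum": ["clickhousedb+connect"]},
        "username": {"type": "string"},
        "sslConfig": {"$ref": "../common/sslConfig.json"},  # hypothetical path
    },
    "required": ["username"],
}

# Fields pulled in by reference instead of being defined inline:
ref_fields = [k for k, v in connection_schema["properties"].items()
              if "$ref" in v]
```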

Test Connection Definition

openmetadata-service/src/main/resources/json/data/testConnections/{service_type}/{name}.json
Defines the steps for testing a connection (e.g., CheckAccess, GetDatabases). Step names must match the test_fn dictionary in connection.py.
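The coupling between the two files can be sketched as follows. The step names come from the example above; the function bodies and the exact shape of `test_fn` in the generated connection.py are illustrative:

```python
# Steps declared in testConnections/{service_type}/{name}.json:
declared_steps = ["CheckAccess", "GetDatabases"]

# connection.py must expose a matching test_fn mapping (bodies are toy stubs;
# the real ones would run a query or list databases against the engine).
def check_access():
    return True

def get_databases():
    return ["default"]

test_fn = {
    "CheckAccess": check_access,
    "GetDatabases": get_databases,
}

# A name mismatch here is the usual cause of a failing test connection:
missing = [step for step in declared_steps if step not in test_fn]
```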

Python Source Files

For SQLAlchemy database connectors, the scaffold generates concrete, nearly-complete templates:
ingestion/src/metadata/ingestion/source/database/{name}/
├── __init__.py          # Empty module marker
├── connection.py        # BaseConnection[Config, Engine] subclass
├── metadata.py          # CommonDbSourceService subclass
├── service_spec.py      # DefaultDatabaseSpec registration
├── queries.py           # SQL query templates
├── lineage.py           # LineageSource mixin (if lineage selected)
├── usage.py             # UsageSource mixin (if usage selected)
├── query_parser.py      # QueryParserSource (if lineage or usage)
└── CONNECTOR_CONTEXT.md # AI implementation brief
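As an example of what queries.py typically holds, here is a hypothetical template module: plain SQL strings with format placeholders that metadata.py fills in at runtime. The query text and constant names are illustrative, not what the scaffold actually emits:

```python
# Hypothetical queries.py contents: SQL templates for the metadata source.
GET_DATABASES = "SHOW DATABASES"

GET_TABLE_COMMENTS = """
SELECT name, comment
FROM system.tables
WHERE database = '{database_name}'
"""

# The source formats a template per database during ingestion:
query = GET_TABLE_COMMENTS.format(database_name="default")
```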
For all other connector types (dashboard, pipeline, messaging, non-SQLAlchemy database, etc.), the scaffold generates skeleton files:
ingestion/src/metadata/ingestion/source/{service_type}/{name}/
├── __init__.py          # Empty module marker
├── connection.py        # Skeleton → points to CONNECTOR_CONTEXT.md
├── metadata.py          # Skeleton → points to CONNECTOR_CONTEXT.md
├── service_spec.py      # Skeleton → points to CONNECTOR_CONTEXT.md
├── client.py            # Skeleton → points to CONNECTOR_CONTEXT.md
└── CONNECTOR_CONTEXT.md # AI implementation brief
Each skeleton file contains a pointer to the reference connector and CONNECTOR_CONTEXT.md for implementation guidance.

CONNECTOR_CONTEXT.md

This is the key file for AI-assisted development. It contains:
  • Connector profile (name, type, capabilities, auth)
  • Source documentation you provided (API docs, SDK package, endpoints, notes)
  • Complete file list with what to implement in each
  • Reference connector path for copying patterns
  • Registration checklist (exact files and changes needed)
  • Validation checklist

Service Types

| Type | Connection Types | Reference |
| --- | --- | --- |
| database | sqlalchemy, rest_api, sdk_client | mysql/ (SQLAlchemy), salesforce/ (REST) |
| dashboard | rest_api, sdk_client | metabase/ |
| pipeline | rest_api, sdk_client | airflow/ |
| messaging | rest_api, sdk_client | kafka/ |
| mlmodel | rest_api, sdk_client | mlflow/ |
| storage | rest_api, sdk_client | s3/ |
| search | rest_api, sdk_client | elasticsearch/ |
| api | rest_api, sdk_client | rest/ |

Examples

Database with SQLAlchemy

metadata scaffold-connector \
    --name my_olap_db \
    --service-type database \
    --connection-type sqlalchemy \
    --scheme "myolap+pyodbc" \
    --default-port 10000 \
    --auth-types basic iam \
    --capabilities metadata lineage usage profiler data_diff

Dashboard with REST API

metadata scaffold-connector \
    --name my_bi_tool \
    --service-type dashboard \
    --auth-types token \
    --docs-url "https://docs.example.com/api/v1" \
    --api-endpoints "GET /dashboards, GET /charts, GET /datasources" \
    --docs-notes "Uses cursor-based pagination. Rate limit: 100 req/min."

Pipeline with SDK

metadata scaffold-connector \
    --name my_orchestrator \
    --service-type pipeline \
    --connection-type sdk_client \
    --auth-types token \
    --sdk-package "my-orchestrator-sdk" \
    --docker-image "myorch/server:latest" \
    --docker-port 8080

Next Steps

After scaffolding, follow the Build with AI guide to implement the connector using your preferred AI tool. Or continue manually with the existing guides:
  1. Define JSON Schema (already done by the scaffold)
  2. Develop Ingestion Code
  3. Apply UI Changes