SFTP

BETA
In this section, we provide guides and references to use the SFTP connector. Configure and schedule SFTP metadata workflows from the OpenMetadata UI.

Requirements

Python Requirements

We support Python versions 3.9 to 3.11.
To run the SFTP ingestion, you will need to install:
pip3 install "openmetadata-ingestion[sftp]"

Metadata Ingestion

All connectors are defined as JSON Schemas, and here you can find the structure to create a connection to an SFTP server. To create and run a Metadata Ingestion workflow, we will follow the steps to build a YAML configuration that can connect to the source, process the Entities if needed, and reach the OpenMetadata server. The workflow is modeled around the following JSON Schema.

1. Define the YAML Config

This is a sample config for SFTP:

Configuration Examples

Basic Authentication (Username/Password)

source:
  type: sftp
  serviceName: sftp_prod
  serviceConnection:
    config:
      type: Sftp
      host: sftp.example.com
      port: 22
      authType:
        username: sftp_user
        password: your_secure_password
      rootDirectories:
        - /data/exports
        - /data/reports

SSH Private Key Authentication

source:
  type: sftp
  serviceName: sftp_prod
  serviceConnection:
    config:
      type: Sftp
      host: sftp.example.com
      port: 22
      authType:
        username: sftp_user
        privateKey: |
          -----BEGIN OPENSSH PRIVATE KEY-----
          <your_private_key_content>
          -----END OPENSSH PRIVATE KEY-----
        # privateKeyPassphrase: your_passphrase  # only if key is encrypted
      rootDirectories:
        - /data

Configuration with File Filtering and Sample Data

source:
  type: sftp
  serviceName: sftp_analytics
  serviceConnection:
    config:
      type: Sftp
      host: sftp.example.com
      port: 22
      authType:
        username: sftp_user
        password: your_secure_password
      rootDirectories:
        - /data/analytics
      directoryFilterPattern:
        includes:
          - /data/analytics/sales/.*
          - /data/analytics/marketing/.*
        excludes:
          - /data/analytics/temp/.*
          - /data/analytics/archive/.*
      fileFilterPattern:
        includes:
          - .*\.csv
          - .*\.tsv
        excludes:
          - .*_backup\.csv
      structuredDataFilesOnly: true
      extractSampleData: true

2. Run with the CLI

First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:
metadata ingest -c <path-to-yaml>
Note that this recipe is the same from connector to connector. By updating the YAML configuration, you can extract metadata from different sources.
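The samples above show only the source block. A runnable workflow file also needs sink and workflowConfig sections telling the framework where to push the extracted metadata; the snippet below follows the standard OpenMetadata pattern, with the hostPort and jwtToken values as placeholders for your own deployment:

```yaml
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: <your_jwt_token>
```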