SFTP

BETA
In this section, we provide guides and references to use the SFTP connector. Configure and schedule SFTP metadata workflows from the OpenMetadata UI.

Requirements

Python Requirements

We support Python versions 3.9 to 3.11.
To run the SFTP ingestion, you will need to install:
pip3 install "openmetadata-ingestion[sftp]"

Metadata Ingestion

All connectors are defined as JSON Schemas, and here you can find the structure to create a connection to an SFTP server. To create and run a Metadata Ingestion workflow, we will follow the steps to build a YAML configuration that can connect to the source, process the Entities if needed, and reach the OpenMetadata server. The workflow is modeled around the following JSON Schema.

1. Define the YAML Config

This is a sample config for SFTP:

Configuration Examples

Basic Authentication (Username/Password)

source:
  type: sftp
  serviceName: sftp_prod
  serviceConnection:
    config:
      type: Sftp
      host: sftp.example.com
      port: 22
      authType:
        username: sftp_user
        password: your_secure_password
      rootDirectories:
        - /data/exports
        - /data/reports

SSH Private Key Authentication

source:
  type: sftp
  serviceName: sftp_prod
  serviceConnection:
    config:
      type: Sftp
      host: sftp.example.com
      port: 22
      authType:
        username: sftp_user
        privateKey: |
          -----BEGIN OPENSSH PRIVATE KEY-----
          <your_private_key_content>
          -----END OPENSSH PRIVATE KEY-----
        # privateKeyPassphrase: your_passphrase  # only if key is encrypted
      rootDirectories:
        - /data

Configuration with File Filtering and Sample Data

source:
  type: sftp
  serviceName: sftp_analytics
  serviceConnection:
    config:
      type: Sftp
      host: sftp.example.com
      port: 22
      authType:
        username: sftp_user
        password: your_secure_password
      rootDirectories:
        - /data/analytics
      directoryFilterPattern:
        includes:
          - /data/analytics/sales/.*
          - /data/analytics/marketing/.*
        excludes:
          - /data/analytics/temp/.*
          - /data/analytics/archive/.*
      fileFilterPattern:
        includes:
          - .*\.csv
          - .*\.tsv
        excludes:
          - .*_backup\.csv
      structuredDataFilesOnly: true
      extractSampleData: true

2. Run with the CLI

First, we will need to save the YAML file. Afterward, and with all requirements installed, we can run:
metadata ingest -c <path-to-yaml>
Note that this recipe is the same from connector to connector. By updating the YAML configuration, you can extract metadata from different sources.
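The samples above show only the source block. A runnable workflow file also needs sink and workflowConfig sections telling the framework where to push the extracted metadata; the snippet below follows the standard OpenMetadata pattern, with the hostPort and jwtToken values as placeholders for your own deployment:

```yaml
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: <your_jwt_token>
```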