# dbt Artifact Storage - AWS S3 Configuration | OpenMetadata

> Complete guide to configuring AWS S3 for dbt artifact storage with OpenMetadata. Includes Airflow DAG examples, IAM policies, and setup instructions.

This guide walks you through configuring **AWS S3** as the artifact storage layer for the dbt Core and OpenMetadata integration. Once set up, your scheduler uploads dbt artifacts to S3 after each run, and OpenMetadata reads them from there for metadata extraction and lineage tracking.

## Prerequisites Checklist

| Requirement          | Details                                                | How to Verify                 |
| -------------------- | ------------------------------------------------------ | ----------------------------- |
| **AWS Account**      | With permissions to create S3 buckets and IAM policies | `aws sts get-caller-identity` |
| **AWS CLI**          | Installed and configured                               | `aws --version`               |
| **dbt Project**      | Existing dbt project                                   | `dbt debug`                   |
| **Orchestration**    | Airflow or similar scheduler                           | Access to DAG configuration   |
| **Database Service** | Data warehouse already ingested                        | Check Settings → Services     |

## Step 1: AWS S3 Setup

### 1.1 Create S3 Bucket

```bash theme={null}
# Set your variables
export AWS_REGION="us-east-1"
export BUCKET_NAME="your-company-dbt-artifacts"

# Create the bucket
aws s3 mb s3://${BUCKET_NAME} --region ${AWS_REGION}

# Verify bucket creation
aws s3 ls | grep ${BUCKET_NAME}
```

**Expected output:**

```
2026-02-10 10:30:00 your-company-dbt-artifacts
```

### 1.2 Create IAM Policy for dbt (Write Access)

Your Airflow/dbt environment needs permission to **write** to S3.

Save this as `dbt-s3-write-policy.json`:

```json theme={null}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDBTArtifactUpload",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::your-company-dbt-artifacts/dbt-artifacts/*"
        },
        {
            "Sid": "AllowBucketListing",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::your-company-dbt-artifacts"
        }
    ]
}
```
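If your bucket name or prefix differs from the example, the policy document can be generated rather than hand-edited. The sketch below is one way to do that. `build_write_policy` is a hypothetical helper (not part of any AWS or OpenMetadata tooling), and it simply reproduces the two statements shown above.

```python
import json


def build_write_policy(bucket: str, prefix: str) -> dict:
    """Return the write policy shown above for the given bucket and prefix."""
    prefix = prefix.strip("/")  # tolerate "/dbt-artifacts/" style input
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowDBTArtifactUpload",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                # Object-level actions apply to keys under the prefix
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Sid": "AllowBucketListing",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # Bucket-level actions apply to the bucket ARN itself
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    policy = build_write_policy("your-company-dbt-artifacts", "dbt-artifacts")
    print(json.dumps(policy, indent=4))
```

Redirecting the script's output to `dbt-s3-write-policy.json` gives a file that can be passed to `aws iam create-policy`.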

Create and attach the policy:

```bash theme={null}
# Create the IAM policy
aws iam create-policy \
    --policy-name dbt-s3-write-policy \
    --policy-document file://dbt-s3-write-policy.json

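# Attach to your Airflow/ECS role
export AIRFLOW_ROLE_NAME="your-airflow-task-role"
# Look up the account ID instead of hard-coding it in the policy ARN
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"

aws iam attach-role-policy \
    --role-name "${AIRFLOW_ROLE_NAME}" \
    --policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/dbt-s3-write-policy"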
```

### 1.3 Create IAM Policy for OpenMetadata (Read Access)

OpenMetadata needs permission to **read** from S3.

Save this as `collate-s3-read-policy.json`:

```json theme={null}
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowOpenMetadataRead",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::your-company-dbt-artifacts",
                "arn:aws:s3:::your-company-dbt-artifacts/dbt-artifacts/*"
            ]
        }
    ]
}
```

Create the policy:

```bash theme={null}
# Create the policy
aws iam create-policy \
    --policy-name collate-s3-read-policy \
    --policy-document file://collate-s3-read-policy.json

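# Attach it to the role OpenMetadata runs as (the role name below is a placeholder),
# or attach it to a dedicated IAM user and create access keys for that user
export OPENMETADATA_ROLE_NAME="your-openmetadata-role"

aws iam attach-role-policy \
    --role-name "${OPENMETADATA_ROLE_NAME}" \
    --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/collate-s3-read-policy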
```

### 1.4 Verify S3 Access

```bash theme={null}
# Create test file
echo "test" > /tmp/test.txt

# Upload it
aws s3 cp /tmp/test.txt s3://${BUCKET_NAME}/dbt-artifacts/test.txt

# Verify it exists
aws s3 ls s3://${BUCKET_NAME}/dbt-artifacts/

# Clean up (needs s3:DeleteObject, which the write policy in 1.2 does not grant)
aws s3 rm s3://${BUCKET_NAME}/dbt-artifacts/test.txt
rm /tmp/test.txt
```
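The same round trip can be scripted in Python, which is handy for running the check from inside the Airflow environment. The sketch below assumes a boto3-style S3 client is passed in. `verify_s3_round_trip` and the `access-check.txt` key are hypothetical names chosen for illustration. As with the CLI version, the cleanup step needs `s3:DeleteObject`, so run it with credentials that have that permission.

```python
def verify_s3_round_trip(s3_client, bucket: str, prefix: str) -> bool:
    """Upload, list, and delete a small test object.

    Returns True if the uploaded object shows up in the listing.
    """
    key = f"{prefix.strip('/')}/access-check.txt"

    # Write test: needs s3:PutObject on the prefix
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"test")
    try:
        # Read test: needs s3:ListBucket on the bucket
        response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key)
        listed = [obj["Key"] for obj in response.get("Contents", [])]
        return key in listed
    finally:
        # Always remove the test object, even if listing raised
        s3_client.delete_object(Bucket=bucket, Key=key)


# Usage with real credentials (assumes boto3 is installed):
#   import boto3
#   ok = verify_s3_round_trip(
#       boto3.client("s3"), "your-company-dbt-artifacts", "dbt-artifacts"
#   )
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real bucket.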

## Step 2: Upload Artifacts from dbt

### 2.1 Understanding dbt Artifacts

OpenMetadata requires these dbt-generated files:

| File               | Generated By                          | Required?   | What It Contains                              |
| ------------------ | ------------------------------------- | ----------- | --------------------------------------------- |
| `manifest.json`    | `dbt run`, `dbt compile`, `dbt build` | **YES**     | Models, sources, lineage, descriptions, tests |
| `catalog.json`     | `dbt docs generate`                   | Recommended | Column names, types, descriptions             |
| `run_results.json` | `dbt run`, `dbt test`, `dbt build`    | Optional    | Test pass/fail results, timing                |

**Generate all artifacts:**

```bash theme={null}
dbt run          # Generates manifest.json
dbt test         # Updates run_results.json
dbt docs generate # Generates catalog.json
```
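Before wiring up the upload, it can help to confirm which of these files actually landed in dbt's output directory. The following is a minimal sketch, assuming the default `target` directory. `check_artifacts` and the `ARTIFACTS` list are hypothetical names that mirror the table above.

```python
from pathlib import Path

# (filename, required) pairs, mirroring the artifact table above
ARTIFACTS = [
    ("manifest.json", True),
    ("catalog.json", False),
    ("run_results.json", False),
]


def check_artifacts(target_dir: str) -> dict:
    """Report which artifacts exist and which required ones are missing."""
    target = Path(target_dir)
    present = [name for name, _ in ARTIFACTS if (target / name).is_file()]
    missing_required = [
        name for name, required in ARTIFACTS
        if required and name not in present
    ]
    return {"present": present, "missing_required": missing_required}


if __name__ == "__main__":
    # "target" is dbt's default output directory, relative to the project root
    print(check_artifacts("target"))
```

A non-empty `missing_required` list means the pipeline should stop before uploading, since OpenMetadata cannot ingest without `manifest.json`.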

### 2.2 Complete Airflow DAG Example

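The following DAG runs dbt, then uploads the resulting artifacts to S3. It is written for Airflow 2.x; adjust the paths, bucket name, and schedule to match your environment.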

**Save as `dbt_with_collate.py` in your Airflow DAGs folder:**

```python theme={null}
"""
dbt + OpenMetadata Integration DAG (S3 Method)

This DAG:
1. Runs dbt models
2. Runs dbt tests
3. Generates dbt documentation (catalog.json)
4. Uploads all artifacts to S3

No OpenMetadata packages need to be installed in this Airflow environment;
OpenMetadata pulls the artifacts from S3 independently.
"""

import os
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup


# =============================================================================
# CONFIGURATION
# =============================================================================

# dbt Configuration
DBT_PROJECT_DIR = os.getenv("DBT_PROJECT_DIR", "/opt/airflow/dbt/my_project")
DBT_PROFILES_DIR = os.getenv("DBT_PROFILES_DIR", "/opt/airflow/dbt")

# S3 Configuration
S3_BUCKET = os.getenv("S3_BUCKET", "your-company-dbt-artifacts")
S3_PREFIX = os.getenv("S3_PREFIX", "dbt-artifacts")
AWS_REGION = os.getenv("AWS_DEFAULT_REGION", "us-east-1")

# =============================================================================
# DAG DEFAULT ARGUMENTS
# =============================================================================

default_args = {
    "owner": "data-engineering",
    "depends_on_past": False,
    "email": ["data-team@yourcompany.com"],
    "email_on_failure": True,
    "email_on_retry": False,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(hours=2),
}

# =============================================================================
# PYTHON FUNCTIONS
# =============================================================================

def upload_artifacts_to_s3(**context):
    """
    Upload dbt artifacts to S3.

    Uses boto3 (AWS SDK) which is typically available in Airflow.
    If not: pip install boto3
    """
    import boto3
    from botocore.exceptions import ClientError

    s3_client = boto3.client("s3", region_name=AWS_REGION)
    target_dir = os.path.join(DBT_PROJECT_DIR, "target")

    # Files to upload
    artifacts = [
        ("manifest.json", True),      # Required
        ("catalog.json", False),      # Optional but recommended
        ("run_results.json", False),  # Optional
        ("sources.json", False),      # Optional
    ]

    uploaded = []
    skipped = []
    failed = []

    for filename, required in artifacts:
        local_path = os.path.join(target_dir, filename)
        s3_key = f"{S3_PREFIX}/{filename}"

        if not os.path.exists(local_path):
            if required:
                raise FileNotFoundError(
                    f"Required artifact not found: {local_path}\n"
                    f"Make sure 'dbt run' completed successfully."
                )
            print(f"⊘ Skipping {filename} (not found - optional)")
            skipped.append(filename)
            continue

        try:
            s3_client.upload_file(local_path, S3_BUCKET, s3_key)
            uploaded.append(filename)
            print(f"✓ Uploaded {filename} to s3://{S3_BUCKET}/{s3_key}")
        except (ClientError, boto3.exceptions.S3UploadFailedError) as e:
            # upload_file wraps S3 errors in S3UploadFailedError, so catch both
            error_msg = f"✗ Failed to upload {filename}: {e}"
            print(error_msg)
            if required:
                raise RuntimeError(error_msg) from e
            failed.append(filename)

    # Log summary
    print(f"\n{'='*50}")
    print("Upload Summary:")
    print(f"  Uploaded: {', '.join(uploaded) or 'None'}")
    print(f"  Skipped:  {', '.join(skipped) or 'None'}")
    print(f"  Failed:   {', '.join(failed) or 'None'}")
    print(f"  S3 Location: s3://{S3_BUCKET}/{S3_PREFIX}/")
    print(f"{'='*50}")

    return {"uploaded": uploaded, "bucket": S3_BUCKET, "prefix": S3_PREFIX}


# =============================================================================
# DAG DEFINITION
# =============================================================================

with DAG(
    dag_id="dbt_with_collate",
    default_args=default_args,
    description="Run dbt models and sync metadata to OpenMetadata via S3",
    schedule_interval="0 6 * * *",  # Daily at 6 AM UTC; Airflow 2.4+ prefers schedule=
    start_date=datetime(2024, 1, 1),
    catchup=False,
    max_active_runs=1,
    tags=["dbt", "collate", "data-pipeline"],
) as dag:

    # Task Group: dbt Execution
    with TaskGroup(group_id="dbt_execution") as dbt_tasks:

        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt run --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt test --profiles-dir {DBT_PROFILES_DIR}
            """,
            trigger_rule="all_done",  # Run even if dbt_run fails
        )

        dbt_docs = BashOperator(
            task_id="dbt_docs_generate",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt docs generate --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_run >> dbt_test >> dbt_docs

    # Upload to S3
    upload_to_s3 = PythonOperator(
        task_id="upload_artifacts_to_s3",
        python_callable=upload_artifacts_to_s3,
    )

    # DAG Dependencies
    dbt_tasks >> upload_to_s3
```
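The upload task only checks that each file exists, so a truncated or malformed `manifest.json` would still be uploaded. One way to guard against that is to parse the file first and log a short summary. The sketch below is an optional addition, not part of the DAG above. `summarize_manifest` is a hypothetical helper, and it relies only on the top-level `metadata` and `nodes` keys and the per-node `resource_type` field of the dbt manifest.

```python
import json
from collections import Counter


def summarize_manifest(path: str) -> dict:
    """Parse manifest.json and count its nodes by resource type.

    Raises json.JSONDecodeError if the file is not valid JSON.
    """
    with open(path, encoding="utf-8") as fh:
        manifest = json.load(fh)

    nodes = manifest.get("nodes", {})
    counts = Counter(
        node.get("resource_type", "unknown") for node in nodes.values()
    )
    return {
        "dbt_version": manifest.get("metadata", {}).get("dbt_version"),
        "node_counts": dict(counts),
    }


# Example call, e.g. at the start of the upload function:
#   summary = summarize_manifest("/opt/airflow/dbt/my_project/target/manifest.json")
#   print(summary)
```

A summary with zero models is a reasonable signal to fail the task rather than overwrite a good manifest in S3.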

### 2.3 Verify DAG Deployment

```bash theme={null}
# Check DAG is visible in Airflow
airflow dags list | grep dbt

# Trigger manual run
airflow dags trigger dbt_with_collate

# Check S3 after DAG completes
aws s3 ls s3://your-company-dbt-artifacts/dbt-artifacts/
```

**Expected S3 output:**

```
2026-02-10 10:30:00   5242880 manifest.json
2026-02-10 10:30:01   1048576 catalog.json
2026-02-10 10:30:01    102400 run_results.json
```

## Step 3: Configure OpenMetadata

### 3.1 Create the dbt Ingestion Pipeline

1. Go to **Settings → Services → Database Services**
2. Click on your database service (e.g., "production-snowflake")
3. Go to the **Ingestion** tab
4. Click **Add Ingestion**
5. Select **dbt** from the dropdown

**Configure dbt Source (S3):**

| Field                        | Value                        | Notes                        |
| ---------------------------- | ---------------------------- | ---------------------------- |
| **dbt Configuration Source** | `S3`                         | Select from dropdown         |
| **S3 Bucket Name**           | `your-company-dbt-artifacts` | Your bucket name             |
| **S3 Object Prefix**         | `dbt-artifacts`              | Folder path (no leading `/`) |
| **AWS Region**               | `us-east-1`                  | Your region                  |
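
The prefix entered here has to match the prefix the upload step writes to. S3 keys are literal strings, so `/dbt-artifacts/manifest.json` and `dbt-artifacts/manifest.json` are different objects. As a rough illustration, the sketch below normalizes a prefix before building a key; `build_s3_key` is a hypothetical helper, not an OpenMetadata or boto3 function.

```python
def build_s3_key(prefix: str, filename: str) -> str:
    """Join a prefix and filename into a key with no leading or doubled slashes."""
    # Drop empty segments produced by leading, trailing, or repeated "/"
    parts = [segment for segment in prefix.split("/") if segment]
    return "/".join(parts + [filename])


if __name__ == "__main__":
    for raw_prefix in ("dbt-artifacts", "/dbt-artifacts/", ""):
        print(repr(raw_prefix), "->", build_s3_key(raw_prefix, "manifest.json"))
```

Using the same normalization on the upload side and when filling in **S3 Object Prefix** avoids the path mismatch described under Troubleshooting.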

**AWS Credentials (choose one):**

**Option A: Using Access Keys**

| Field                     | Value          |
| ------------------------- | -------------- |
| **AWS Access Key ID**     | `AKIA...`      |
| **AWS Secret Access Key** | `wJalrXUtn...` |

**Option B: Using IAM Role** (if OpenMetadata runs on AWS)

| Field                     | Value         |
| ------------------------- | ------------- |
| **AWS Access Key ID**     | *Leave empty* |
| **AWS Secret Access Key** | *Leave empty* |

**Configure dbt Options:**

| Field                   | Recommended Value |
| ----------------------- | ----------------- |
| **Update Descriptions** | `Enabled`         |
| **Update Owners**       | `Enabled`         |
| **Include Tags**        | `Enabled`         |
| **Classification Name** | `dbtTags`         |

**Test & Deploy:**

1. Click **Test Connection**
2. If successful, click **Deploy**
3. Click **Run** to trigger immediately

## Verification

After running the full pipeline, verify:

| Check                   | How to Verify                             | Expected Result                    |
| ----------------------- | ----------------------------------------- | ---------------------------------- |
| **S3 artifacts exist**  | `aws s3 ls s3://your-company-dbt-artifacts/dbt-artifacts/` | `manifest.json` and `catalog.json` listed |
| **Ingestion completed** | OpenMetadata UI → Service → Ingestion tab | Green status, no errors            |
| **Lineage appears**     | Click on a dbt model → Lineage tab        | Upstream/downstream connections    |
| **Descriptions synced** | Click on a table → Schema tab             | Column descriptions visible        |
| **Tags appear**         | Click on a table → Tags section           | dbt tags shown                     |

## Troubleshooting

| Issue                  | Symptom                         | Cause                                | Solution                                                 |
| ---------------------- | ------------------------------- | ------------------------------------ | -------------------------------------------------------- |
| **Access Denied**      | "403 Forbidden" error           | IAM permissions insufficient         | Verify IAM policy has `s3:GetObject` and `s3:ListBucket` |
| **Manifest not found** | "dbtManifestFilePath not found" | S3 path incorrect                    | Check `dbtObjectPrefix` matches your S3 structure        |
| **No lineage**         | Tables exist but no lineage     | Database metadata not ingested first | Run database metadata ingestion before dbt ingestion     |
| **Stale data**         | Old lineage/descriptions        | Old artifacts in S3                  | Verify dbt DAG uploads fresh artifacts                   |
| **Missing columns**    | No column descriptions          | Missing catalog.json                 | Ensure `dbt docs generate` runs and uploads              |
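
For the stale data case, the `LastModified` timestamps that S3 returns for each object show whether the DAG is still refreshing the artifacts. The sketch below assumes it is given the response dictionary from a boto3 `list_objects_v2` call, in which `LastModified` is a timezone-aware `datetime`. `find_stale_artifacts` and the 24-hour threshold in the usage comment are illustrative choices, not OpenMetadata settings.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def find_stale_artifacts(
    list_response: dict,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> list:
    """Return the keys whose LastModified is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [
        obj["Key"]
        for obj in list_response.get("Contents", [])
        if now - obj["LastModified"] > max_age
    ]


# Usage with real credentials (assumes boto3 is installed):
#   import boto3
#   response = boto3.client("s3").list_objects_v2(
#       Bucket="your-company-dbt-artifacts", Prefix="dbt-artifacts/"
#   )
#   print(find_stale_artifacts(response, timedelta(hours=24)))
```

For a daily DAG, any key older than roughly one schedule interval suggests the upload task is failing or writing to a different prefix.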

## Next Steps

* [Configure dbt Workflow](/v1.12.x/connectors/database/dbt/configure-dbt-workflow)
* [Auto Ingest dbt Core](/v1.12.x/connectors/database/dbt/auto-ingest-dbt-core)
* [dbt Troubleshooting](/v1.12.x/connectors/database/dbt/dbt-troubleshooting)

<Note>
  See other storage options: [GCS](/v1.12.x/connectors/database/dbt/storage-gcs-guide) | [Azure](/v1.12.x/connectors/database/dbt/storage-azure-guide) | [HTTP](/v1.12.x/connectors/database/dbt/storage-http-guide) | [Local](/v1.12.x/connectors/database/dbt/storage-local-guide) | [dbt Cloud](/v1.12.x/connectors/database/dbt/dbt-cloud-api-guide)
</Note>
