# dbt Artifact Storage - Google Cloud Storage Configuration | OpenMetadata

> Complete guide to configuring Google Cloud Storage for dbt artifact storage with OpenMetadata. Includes Cloud Composer DAG, service accounts, and setup.

This guide walks you through configuring **Google Cloud Storage (GCS)** as the artifact storage layer for the dbt Core + OpenMetadata integration. It suits deployments that already run on Google Cloud Platform, such as Cloud Composer or Airflow on GKE.

## Prerequisites Checklist

| Requirement          | Details                                | How to Verify               |
| -------------------- | -------------------------------------- | --------------------------- |
| **GCP Account**      | With permissions to create GCS buckets | `gcloud auth list`          |
| **gcloud CLI**       | Installed and configured               | `gcloud --version`          |
| **dbt Project**      | Existing dbt project                   | `dbt debug`                 |
| **Orchestration**    | Cloud Composer or Airflow              | Access to DAG configuration |
| **Database Service** | Data warehouse already ingested        | Check Settings → Services   |

## Step 1: GCS Setup

### 1.1 Create GCS Bucket

```bash theme={null}
# Set your variables
export GCP_PROJECT="your-gcp-project-id"
export BUCKET_NAME="your-company-dbt-artifacts"
export REGION="us-central1"

# Set active project
gcloud config set project ${GCP_PROJECT}

# Create the bucket
gsutil mb -p ${GCP_PROJECT} -c STANDARD -l ${REGION} gs://${BUCKET_NAME}

# Verify bucket creation
gsutil ls | grep ${BUCKET_NAME}
```

**Expected output:**

```
gs://your-company-dbt-artifacts/
```

### 1.2 Create Service Account for dbt (Write Access)

Your dbt environment needs permission to **write** to GCS.

```bash theme={null}
# Create service account for dbt
gcloud iam service-accounts create dbt-artifacts-writer \
    --display-name="dbt Artifacts Writer" \
    --project=${GCP_PROJECT}

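# Grant Storage Object Admin on this bucket only.
# Storage Object Creator cannot overwrite existing objects, and every dbt run
# replaces manifest.json at the same path.
gsutil iam ch \
    serviceAccount:dbt-artifacts-writer@${GCP_PROJECT}.iam.gserviceaccount.com:roles/storage.objectAdmin \
    gs://${BUCKET_NAME}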

# Create and download service account key
gcloud iam service-accounts keys create ~/dbt-sa-key.json \
    --iam-account=dbt-artifacts-writer@${GCP_PROJECT}.iam.gserviceaccount.com

echo "✓ Service account key saved to: ~/dbt-sa-key.json"
```

<Warning>
  Store the service account key securely. Never commit it to version control.
</Warning>

### 1.3 Create Service Account for OpenMetadata (Read Access)

OpenMetadata needs permission to **read** from GCS.

```bash theme={null}
# Create service account for OpenMetadata
gcloud iam service-accounts create collate-dbt-reader \
    --display-name="OpenMetadata dbt Reader" \
    --project=${GCP_PROJECT}

# Grant Storage Object Viewer role (read-only)
gsutil iam ch \
    serviceAccount:collate-dbt-reader@${GCP_PROJECT}.iam.gserviceaccount.com:roles/storage.objectViewer \
    gs://${BUCKET_NAME}

# Create and download service account key
gcloud iam service-accounts keys create ~/collate-sa-key.json \
    --iam-account=collate-dbt-reader@${GCP_PROJECT}.iam.gserviceaccount.com

echo "✓ Service account key saved to: ~/collate-sa-key.json"
```

### 1.4 (Alternative) Use Workload Identity on GKE

If running on GKE, use Workload Identity instead of service account keys:

```bash theme={null}
# Create GCP Service Account
gcloud iam service-accounts create dbt-workload-identity \
    --project=${GCP_PROJECT}

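# Grant bucket access (Object Admin, so repeated runs can overwrite artifacts)
gsutil iam ch \
    serviceAccount:dbt-workload-identity@${GCP_PROJECT}.iam.gserviceaccount.com:roles/storage.objectAdmin \
    gs://${BUCKET_NAME}

# Bind Kubernetes Service Account to GCP Service Account
# (replace "namespace" and "k8s-sa-name" with your own values)
gcloud iam service-accounts add-iam-policy-binding \
    dbt-workload-identity@${GCP_PROJECT}.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:${GCP_PROJECT}.svc.id.goog[namespace/k8s-sa-name]"

# Annotate the Kubernetes Service Account so pods using it act as the GCP Service Account
kubectl annotate serviceaccount k8s-sa-name \
    --namespace namespace \
    iam.gke.io/gcp-service-account=dbt-workload-identity@${GCP_PROJECT}.iam.gserviceaccount.com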
```

### 1.5 Verify GCS Access

```bash theme={null}
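# Authenticate gsutil as the dbt service account. gsutil reads gcloud's
# active credentials, not GOOGLE_APPLICATION_CREDENTIALS.
gcloud auth activate-service-account --key-file="${HOME}/dbt-sa-key.json"

# Create test file
echo "test" > /tmp/test.txt

# Upload to GCS
gsutil cp /tmp/test.txt gs://${BUCKET_NAME}/dbt/test.txt

# Verify it exists (needs storage.objects.list on the bucket)
gsutil ls gs://${BUCKET_NAME}/dbt/

# Clean up (the remote delete needs storage.objects.delete)
gsutil rm gs://${BUCKET_NAME}/dbt/test.txt
rm /tmp/test.txt

# Switch back to your own account when finished
gcloud config set account "your-user@example.com"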
```
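The same round trip can be run from Python, which is closer to what the DAG does at upload time. This is a minimal sketch, not part of OpenMetadata: the function name `roundtrip_check` is hypothetical, and `bucket` is assumed to behave like a `google.cloud.storage.Bucket` (for example `storage.Client().bucket("your-company-dbt-artifacts")`). The probe needs create, get, and delete permissions, so it fails for credentials that can only create objects.

```python
import uuid


def roundtrip_check(bucket, prefix="dbt"):
    """Upload, read back, and delete a small probe object.

    `bucket` is assumed to expose blob(name), as google.cloud.storage.Bucket does.
    Returns True when the content read back matches what was written.
    """
    # Use a unique name so the probe never collides with a real artifact.
    name = f"{prefix}/access-check-{uuid.uuid4().hex}.txt"
    blob = bucket.blob(name)
    blob.upload_from_string("test")  # needs storage.objects.create
    try:
        return blob.download_as_bytes() == b"test"  # needs storage.objects.get
    finally:
        blob.delete()  # needs storage.objects.delete
```

Passing the bucket in as an argument keeps the check independent of how credentials are resolved (key file, Workload Identity, or Application Default Credentials).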

## Step 2: Upload Artifacts from dbt

### 2.1 Understanding dbt Artifacts

OpenMetadata reads these dbt-generated files. Only `manifest.json` is required:

| File               | Generated By                          | Required?   | What It Contains                              |
| ------------------ | ------------------------------------- | ----------- | --------------------------------------------- |
| `manifest.json`    | `dbt run`, `dbt compile`, `dbt build` | **YES**     | Models, sources, lineage, descriptions, tests |
| `catalog.json`     | `dbt docs generate`                   | Recommended | Column names, types, descriptions             |
| `run_results.json` | `dbt run`, `dbt test`, `dbt build`    | Optional    | Test pass/fail results, timing                |

**Generate all artifacts:**

```bash theme={null}
dbt run           # Generates manifest.json
dbt test          # Updates run_results.json
dbt docs generate # Generates catalog.json
```
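Before uploading, it can help to confirm which artifacts dbt actually wrote to `target/`. The following is a small stdlib-only sketch; the function name `check_artifacts` and its `target_dir` argument are illustrative assumptions, not part of dbt or OpenMetadata.

```python
from pathlib import Path

# Artifact names from the table above; True marks the one OpenMetadata requires.
ARTIFACTS = {
    "manifest.json": True,
    "catalog.json": False,
    "run_results.json": False,
}


def check_artifacts(target_dir):
    """Return (present, missing_optional); raise if a required file is absent."""
    target = Path(target_dir)
    present, missing_optional = [], []
    for name, required in ARTIFACTS.items():
        if (target / name).is_file():
            present.append(name)
        elif required:
            raise FileNotFoundError(f"Required artifact not found: {target / name}")
        else:
            missing_optional.append(name)
    return present, missing_optional
```

Calling `check_artifacts("target")` from the dbt project directory returns the files that are ready to upload and the optional ones that are still missing.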

### 2.2 Complete Cloud Composer DAG

This is a **complete, working DAG** for Cloud Composer or GKE-based Airflow.

**Save as `dbt_with_gcs.py` in your Cloud Composer DAGs folder:**

```python theme={null}
"""
dbt + OpenMetadata Integration DAG (GCS Method)

This DAG:
1. Runs dbt models
2. Runs dbt tests
3. Generates dbt documentation (catalog.json)
4. Uploads all artifacts to Google Cloud Storage

Perfect for Cloud Composer or GKE deployments.
"""

import os
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup


# =============================================================================
# CONFIGURATION
# =============================================================================

# dbt Configuration
DBT_PROJECT_DIR = os.getenv("DBT_PROJECT_DIR", "/home/airflow/gcs/dbt/my_project")
DBT_PROFILES_DIR = os.getenv("DBT_PROFILES_DIR", "/home/airflow/gcs/dbt")

# GCS Configuration
GCS_BUCKET = os.getenv("GCS_BUCKET", "your-company-dbt-artifacts")
GCS_PREFIX = os.getenv("GCS_PREFIX", "dbt")
GCP_PROJECT = os.getenv("GCP_PROJECT", "your-gcp-project")

# Service Account (if not using Workload Identity)
GOOGLE_APPLICATION_CREDENTIALS = os.getenv(
    "GOOGLE_APPLICATION_CREDENTIALS",
    "/home/airflow/gcs/dbt/dbt-sa-key.json"
)

# =============================================================================
# DAG DEFAULT ARGUMENTS
# =============================================================================

default_args = {
    "owner": "data-engineering",
    "depends_on_past": False,
    "email": ["data-team@yourcompany.com"],
    "email_on_failure": True,
    "email_on_retry": False,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(hours=2),
}

# =============================================================================
# PYTHON FUNCTIONS
# =============================================================================

def upload_artifacts_to_gcs(**context):
    """
    Upload dbt artifacts to Google Cloud Storage.

    Uses google-cloud-storage library (pre-installed in Cloud Composer).
    For self-hosted: pip install google-cloud-storage
    """
    from google.cloud import storage

    # Initialize GCS client
    if os.path.exists(GOOGLE_APPLICATION_CREDENTIALS):
        client = storage.Client.from_service_account_json(
            GOOGLE_APPLICATION_CREDENTIALS
        )
    else:
        # Use default credentials (Workload Identity or ADC)
        client = storage.Client(project=GCP_PROJECT)

    bucket = client.bucket(GCS_BUCKET)
    target_dir = os.path.join(DBT_PROJECT_DIR, "target")

    # Files to upload
    artifacts = [
        ("manifest.json", True),      # Required
        ("catalog.json", False),      # Optional but recommended
        ("run_results.json", False),  # Optional
        ("sources.json", False),      # Optional
    ]

    uploaded = []
    failed = []   # optional files whose upload raised an error
    skipped = []  # optional files that were not found locally

    for filename, required in artifacts:
        local_path = os.path.join(target_dir, filename)
        gcs_path = f"{GCS_PREFIX}/{filename}"

        if os.path.exists(local_path):
            try:
                blob = bucket.blob(gcs_path)
                blob.upload_from_filename(local_path)
                uploaded.append(filename)
                print(f"✓ Uploaded {filename} to gs://{GCS_BUCKET}/{gcs_path}")
            except Exception as e:
                error_msg = f"✗ Failed to upload {filename}: {e}"
                print(error_msg)
                if required:
                    # Chain the original error so the traceback is preserved
                    raise RuntimeError(error_msg) from e
                failed.append(filename)
        else:
            if required:
                raise FileNotFoundError(
                    f"Required artifact not found: {local_path}\n"
                    f"Make sure 'dbt run' completed successfully."
                )
            else:
                print(f"⊘ Skipping {filename} (not found - optional)")
                skipped.append(filename)

    # Log summary
    print(f"\n{'='*50}")
    print("Upload Summary:")
    print(f"  Uploaded: {', '.join(uploaded) or 'None'}")
    print(f"  Failed:   {', '.join(failed) or 'None'}")
    print(f"  Skipped:  {', '.join(skipped) or 'None'}")
    print(f"  GCS Location: gs://{GCS_BUCKET}/{GCS_PREFIX}/")
    print(f"{'='*50}")

    return {
        "uploaded": uploaded,
        "failed": failed,
        "skipped": skipped,
        "bucket": GCS_BUCKET,
        "prefix": GCS_PREFIX,
    }


# =============================================================================
# DAG DEFINITION
# =============================================================================

with DAG(
    dag_id="dbt_with_gcs",
    default_args=default_args,
    description="Run dbt models and sync metadata to OpenMetadata via GCS",
    schedule_interval="0 6 * * *",  # Daily at 6 AM UTC
    start_date=datetime(2024, 1, 1),
    catchup=False,
    max_active_runs=1,
    tags=["dbt", "collate", "gcs", "data-pipeline"],
) as dag:

    # Task Group: dbt Execution
    with TaskGroup(group_id="dbt_execution") as dbt_tasks:

        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt run --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt test --profiles-dir {DBT_PROFILES_DIR}
            """,
            trigger_rule="all_done",
        )

        dbt_docs = BashOperator(
            task_id="dbt_docs_generate",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt docs generate --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_run >> dbt_test >> dbt_docs

    # Upload to GCS
    upload_to_gcs = PythonOperator(
        task_id="upload_artifacts_to_gcs",
        python_callable=upload_artifacts_to_gcs,
    )

    # DAG Dependencies
    dbt_tasks >> upload_to_gcs
```

### 2.3 Alternative: Simple gsutil Upload

For simpler setups, use `gsutil` directly in a BashOperator:

```python theme={null}
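upload_with_gsutil = BashOperator(
    task_id="upload_to_gcs",
    # manifest.json is required, so its upload must succeed;
    # the optional files are copied only if they exist.
    bash_command=f"""
        cd {DBT_PROJECT_DIR}/target && \
        gsutil cp manifest.json gs://{GCS_BUCKET}/{GCS_PREFIX}/ && \
        for f in catalog.json run_results.json; do \
            if [ -f "$f" ]; then gsutil cp "$f" gs://{GCS_BUCKET}/{GCS_PREFIX}/; fi; \
        done
    """,
)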
```

### 2.4 Verify DAG Deployment

```bash theme={null}
# For Cloud Composer - upload DAG
gcloud composer environments storage dags import \
    --environment your-composer-env \
    --location us-central1 \
    --source dbt_with_gcs.py

# Check GCS after DAG completes
gsutil ls gs://your-company-dbt-artifacts/dbt/
```

**Expected output:**

```
gs://your-company-dbt-artifacts/dbt/manifest.json
gs://your-company-dbt-artifacts/dbt/catalog.json
gs://your-company-dbt-artifacts/dbt/run_results.json
```
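When this check runs in a script rather than by eye, the listing can be compared against the expected file names. A minimal sketch, assuming the `gsutil ls` output has been captured as text; the helper name `missing_artifacts` is hypothetical.

```python
def missing_artifacts(
    listing,
    expected=("manifest.json", "catalog.json", "run_results.json"),
):
    """Return the expected file names that do not appear in `gsutil ls` output."""
    # Each non-empty line is a gs:// URL; keep only the final path segment.
    found = {
        line.strip().rsplit("/", 1)[-1]
        for line in listing.splitlines()
        if line.strip()
    }
    return [name for name in expected if name not in found]
```

An empty result means every expected artifact is present; only a missing `manifest.json` blocks ingestion, since the other two files are optional.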

## Step 3: Configure OpenMetadata

### Configuration

1. Go to **Settings → Services → Database Services**
2. Click on your database service (e.g., "production-bigquery")
3. Go to the **Ingestion** tab
4. Click **Add Ingestion**
5. Select **dbt** from the dropdown

**Configure dbt Source (GCS):**

| Field                        | Value                        | Notes                        |
| ---------------------------- | ---------------------------- | ---------------------------- |
| **dbt Configuration Source** | `GCS`                        | Select from dropdown         |
| **GCS Bucket Name**          | `your-company-dbt-artifacts` | Your bucket name             |
| **GCS Object Prefix**        | `dbt`                        | Folder path (no leading `/`) |
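
GCS object names are flat strings, so the prefix entered here has to match the start of the object names exactly: `dbt` matches `dbt/manifest.json`, while `/dbt` does not. The sketch below shows one way to normalize a prefix before building object names; the helper `artifact_object_name` is hypothetical and only illustrates the naming rule.

```python
def artifact_object_name(prefix, filename):
    """Join a prefix and a file name into a GCS object name."""
    # Leading and trailing slashes would otherwise become part of the object name.
    cleaned = prefix.strip("/")
    return f"{cleaned}/{filename}" if cleaned else filename
```

With this helper, `artifact_object_name("/dbt/", "manifest.json")` and `artifact_object_name("dbt", "manifest.json")` both give `dbt/manifest.json`, the same name the DAG in Step 2 writes.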

**GCP Credentials:**

Upload the OpenMetadata service account key JSON:

1. Click **Upload Credentials**
2. Select `~/collate-sa-key.json`
3. Or paste the JSON content directly

**Configure dbt Options:**

| Field                   | Recommended Value |
| ----------------------- | ----------------- |
| **Update Descriptions** | `Enabled`         |
| **Update Owners**       | `Enabled`         |
| **Include Tags**        | `Enabled`         |
| **Classification Name** | `dbtTags`         |

**Test & Deploy:**

1. Click **Test Connection**
2. If successful, click **Deploy**
3. Click **Run** to trigger immediately

## Verification

After running the full pipeline, verify:

| Check                   | How to Verify                             | Expected Result                    |
| ----------------------- | ----------------------------------------- | ---------------------------------- |
| **GCS artifacts exist** | `gsutil ls gs://bucket/dbt/`              | manifest.json, catalog.json listed |
| **Ingestion completed** | OpenMetadata UI → Service → Ingestion tab | Green status, no errors            |
| **Lineage appears**     | Click on a dbt model → Lineage tab        | Upstream/downstream connections    |
| **Descriptions synced** | Click on a table → Schema tab             | Column descriptions visible        |
| **Tags appear**         | Click on a table → Tags section           | dbt tags shown                     |

## Cloud Composer Specific Setup

### Upload Service Account Key to Composer

```bash theme={null}
export COMPOSER_ENV="your-composer-env"
export COMPOSER_LOCATION="us-central1"

# Get Composer bucket
COMPOSER_BUCKET=$(gcloud composer environments describe ${COMPOSER_ENV} \
    --location ${COMPOSER_LOCATION} \
    --format="get(config.dagGcsPrefix)" | sed 's|/dags||')

# Upload service account key
gsutil cp ~/dbt-sa-key.json ${COMPOSER_BUCKET}/dbt/dbt-sa-key.json
```

### Set Environment Variables

```bash theme={null}
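# Build the comma-separated list as one shell word; a backslash-newline
# followed by indentation would split it into separate arguments.
ENV_VARS="DBT_PROJECT_DIR=/home/airflow/gcs/dbt/my_project"
ENV_VARS+=",GCS_BUCKET=your-company-dbt-artifacts"
ENV_VARS+=",GCS_PREFIX=dbt"
ENV_VARS+=",GOOGLE_APPLICATION_CREDENTIALS=/home/airflow/gcs/dbt/dbt-sa-key.json"

# Cloud Composer reserves some variable names. If the update is rejected,
# check the reserved list, rename the variable, and update the DAG to match.
gcloud composer environments update ${COMPOSER_ENV} \
    --location ${COMPOSER_LOCATION} \
    --update-env-variables "${ENV_VARS}"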
```

## Troubleshooting

| Issue                   | Symptom                  | Cause                         | Solution                                               |
| ----------------------- | ------------------------ | ----------------------------- | ------------------------------------------------------ |
| **Access Denied**       | "403 Forbidden" error    | Insufficient permissions      | Verify the reader service account has `roles/storage.objectViewer` on the bucket |
| **Bucket Not Found**    | "404 Not Found"          | Bucket name incorrect         | Check bucket name matches actual bucket                |
| **Invalid Credentials** | "Authentication failed"  | Wrong service account key     | Verify JSON key is for correct project and SA          |
| **No objects found**    | Artifacts not appearing  | Wrong prefix or upload failed | Check `GCS_PREFIX` matches upload path                 |
| **Stale data**          | Old lineage/descriptions | Old artifacts in GCS          | Verify dbt DAG uploads fresh artifacts                 |
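
For the stale-data case, the age of an artifact can be read from the artifact itself. This is a hedged sketch that assumes the `metadata.generated_at` timestamp dbt writes into `manifest.json`; the function name `manifest_age` is hypothetical.

```python
import json
from datetime import datetime, timezone


def manifest_age(manifest_text, now=None):
    """Return how long ago the manifest was generated, as a timedelta."""
    stamp = json.loads(manifest_text)["metadata"]["generated_at"]
    # Replace a trailing "Z" so datetime.fromisoformat accepts the value
    # on Python versions older than 3.11.
    generated = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if generated.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        generated = generated.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - generated
```

Comparing the result with the DAG schedule (daily in the example above) shows whether the file in GCS comes from the latest run or from an older one.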

## Next Steps

* [Configure dbt Workflow](/v1.12.x/connectors/database/dbt/configure-dbt-workflow)
* [Auto Ingest dbt Core](/v1.12.x/connectors/database/dbt/auto-ingest-dbt-core)
* [dbt Troubleshooting](/v1.12.x/connectors/database/dbt/dbt-troubleshooting)

<Note>
  See other storage options: [S3](/v1.12.x/connectors/database/dbt/storage-s3-guide) | [Azure](/v1.12.x/connectors/database/dbt/storage-azure-guide) | [HTTP](/v1.12.x/connectors/database/dbt/storage-http-guide) | [Local](/v1.12.x/connectors/database/dbt/storage-local-guide) | [dbt Cloud](/v1.12.x/connectors/database/dbt/dbt-cloud-api-guide)
</Note>
