# dbt Artifact Storage: Azure Blob Storage Configuration

> Complete guide to configuring Azure Blob Storage for dbt artifact storage with OpenMetadata. Includes managed identity, DAG, and setup instructions.

This guide walks you through configuring **Azure Blob Storage** as the artifact storage layer for the dbt Core and OpenMetadata integration. It is aimed at deployments that run on Microsoft Azure.

## Prerequisites Checklist

| Requirement          | Details                                     | How to Verify                    |
| -------------------- | ------------------------------------------- | -------------------------------- |
| **Azure Account**    | With permissions to create Storage Accounts | `az account show`                |
| **Azure CLI**        | Installed and configured                    | `az --version`                   |
| **dbt Project**      | Existing dbt project                        | `dbt debug`                      |
| **Orchestration**    | Airflow or ADF                              | Access to pipeline configuration |
| **Database Service** | Data warehouse already ingested             | Check Settings → Services        |

## Step 1: Azure Blob Storage Setup

### 1.1 Create Resource Group and Storage Account

```bash theme={null}
# Set your variables
export RESOURCE_GROUP="dbt-metadata-rg"
export LOCATION="eastus"
export STORAGE_ACCOUNT="dbtartifacts${RANDOM}"  # Must be globally unique
export CONTAINER_NAME="dbt-artifacts"

# Login to Azure
az login

# Create resource group
az group create \
    --name ${RESOURCE_GROUP} \
    --location ${LOCATION}

# Create storage account
az storage account create \
    --name ${STORAGE_ACCOUNT} \
    --resource-group ${RESOURCE_GROUP} \
    --location ${LOCATION} \
    --sku Standard_LRS \
    --kind StorageV2

# Verify creation
az storage account show \
    --name ${STORAGE_ACCOUNT} \
    --resource-group ${RESOURCE_GROUP} \
    --query "name" -o tsv
```

**Expected output** (your numeric suffix will differ):

```
dbtartifacts12345
```

### 1.2 Create Blob Container

```bash theme={null}
# Get storage account key
export STORAGE_KEY=$(az storage account keys list \
    --resource-group ${RESOURCE_GROUP} \
    --account-name ${STORAGE_ACCOUNT} \
    --query '[0].value' -o tsv)

# Create container
az storage container create \
    --name ${CONTAINER_NAME} \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY}

# Verify container
az storage container show \
    --name ${CONTAINER_NAME} \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY}
```
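
Both names above are constrained by Azure, and a rejected name is a common reason for `az storage account create` or `az storage container create` to fail. The following is a minimal sketch that checks names before you run the commands. The function names are hypothetical, and the rules encoded here (3-24 lowercase letters and digits for accounts; 3-63 lowercase letters, digits, and single hyphens for containers) reflect Azure's published naming rules at the time of writing, so confirm them against the current Azure documentation.

```python
import re

# Storage account: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")

# Container: lowercase letters, digits, and hyphens; must start and end
# with a letter or digit; hyphens may not be consecutive.
_CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the storage account rules above."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Return True if the name satisfies the container rules above (3-63 chars)."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.fullmatch(name) is not None
```

A valid name can still be rejected if another Azure customer already uses it, because storage account names are globally unique.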

### 1.3 Configure Access (Choose One Option)

**Option A: Using Storage Account Key** (Simplest)

```bash theme={null}
# Save the storage key (provides full access)
echo "Storage Account: ${STORAGE_ACCOUNT}"
echo "Storage Key: ${STORAGE_KEY}"

# Or get connection string
az storage account show-connection-string \
    --name ${STORAGE_ACCOUNT} \
    --resource-group ${RESOURCE_GROUP} \
    --query connectionString -o tsv
```

**Option B: Using SAS Token** (Read-only for OpenMetadata)

```bash theme={null}
# Create SAS token with read and list permissions (valid for 1 year)
# GNU date syntax; on macOS/BSD use: date -u -v+1y '+%Y-%m-%dT%H:%MZ'
export END_DATE=$(date -u -d "1 year" '+%Y-%m-%dT%H:%MZ')

az storage container generate-sas \
    --name ${CONTAINER_NAME} \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY} \
    --permissions rl \
    --expiry ${END_DATE} \
    --https-only \
    -o tsv
```
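
The `date -d` syntax used for `END_DATE` is specific to GNU date. If you generate the expiry from a script that has to run on several platforms, a small Python helper can produce the same timestamp format. This is a sketch: `sas_expiry` is a hypothetical helper name, and it adds a fixed number of days rather than a calendar year.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sas_expiry(days: int = 365, now: Optional[datetime] = None) -> str:
    """Return a UTC expiry timestamp in the form YYYY-MM-DDTHH:MMZ.

    `now` should be timezone-aware; it defaults to the current UTC time.
    """
    now = now or datetime.now(timezone.utc)
    expiry = now.astimezone(timezone.utc) + timedelta(days=days)
    return expiry.strftime("%Y-%m-%dT%H:%MZ")


if __name__ == "__main__":
    # Pass the printed value to --expiry in `az storage container generate-sas`.
    print(sas_expiry())
```

Shorter lifetimes limit the damage if a token leaks, at the cost of rotating the token in OpenMetadata more often.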

**Option C: Using Managed Identity** (Recommended for AKS)

```bash theme={null}
# Enable managed identity on AKS
az aks update \
    --resource-group ${RESOURCE_GROUP} \
    --name your-aks-cluster \
    --enable-managed-identity

# Get the object (principal) ID of the kubelet managed identity
export PRINCIPAL_ID=$(az aks show \
    --resource-group ${RESOURCE_GROUP} \
    --name your-aks-cluster \
    --query identityProfile.kubeletidentity.objectId -o tsv)

# Assign Storage Blob Data Contributor role (write access for dbt)
az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee-object-id ${PRINCIPAL_ID} \
    --assignee-principal-type ServicePrincipal \
    --scope "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Storage/storageAccounts/${STORAGE_ACCOUNT}"
```

### 1.4 Verify Blob Storage Access

```bash theme={null}
# Create test file
echo "test" > /tmp/test.txt

# Upload
az storage blob upload \
    --container-name ${CONTAINER_NAME} \
    --name test.txt \
    --file /tmp/test.txt \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY}

# List blobs
az storage blob list \
    --container-name ${CONTAINER_NAME} \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY} \
    --output table

# Clean up
az storage blob delete \
    --container-name ${CONTAINER_NAME} \
    --name test.txt \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY}
rm /tmp/test.txt
```

## Step 2: Upload Artifacts from dbt

### 2.1 Understanding dbt Artifacts

OpenMetadata reads the following dbt-generated files. Only `manifest.json` is required:

| File               | Generated By                          | Required?   | What It Contains                              |
| ------------------ | ------------------------------------- | ----------- | --------------------------------------------- |
| `manifest.json`    | `dbt run`, `dbt compile`, `dbt build` | **YES**     | Models, sources, lineage, descriptions, tests |
| `catalog.json`     | `dbt docs generate`                   | Recommended | Column names, types, descriptions             |
| `run_results.json` | `dbt run`, `dbt test`, `dbt build`    | Optional    | Test pass/fail results, timing                |

**Generate all artifacts:**

```bash theme={null}
dbt run           # Generates manifest.json
dbt test          # Updates run_results.json
dbt docs generate # Generates catalog.json
```
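
Before uploading, it can help to confirm that the expected files exist in the dbt `target/` directory and parse as JSON. The sketch below mirrors the table above; `check_artifacts` and the status strings are hypothetical names chosen for illustration, and the check is only a sanity test, not a validation against the dbt artifact schemas.

```python
import json
from pathlib import Path

# (filename, required) pairs, mirroring the artifact table above.
ARTIFACTS = [
    ("manifest.json", True),
    ("catalog.json", False),
    ("run_results.json", False),
]


def check_artifacts(target_dir: str) -> dict:
    """Map each artifact name to 'ok', 'missing', or 'invalid'.

    Raises FileNotFoundError if a required artifact is absent.
    """
    status = {}
    for filename, required in ARTIFACTS:
        path = Path(target_dir) / filename
        if not path.is_file():
            if required:
                raise FileNotFoundError(f"Required artifact not found: {path}")
            status[filename] = "missing"
            continue
        try:
            # Parse the file to catch truncated or partially written output.
            json.loads(path.read_text(encoding="utf-8"))
            status[filename] = "ok"
        except json.JSONDecodeError:
            status[filename] = "invalid"
    return status
```

For example, `check_artifacts("/opt/airflow/dbt/my_project/target")` could run as a step ahead of the upload so that a half-written file is caught before it reaches the container.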

### 2.2 Complete Airflow DAG

The DAG below runs dbt and then uploads the resulting artifacts to Azure Blob Storage. Adjust the paths, schedule, and credentials to match your environment.

**Save as `dbt_with_azure.py` in your Airflow DAGs folder:**

```python theme={null}
"""
dbt + OpenMetadata Integration DAG (Azure Blob Method)

This DAG:
1. Runs dbt models
2. Runs dbt tests
3. Generates dbt documentation (catalog.json)
4. Uploads all artifacts to Azure Blob Storage

Perfect for AKS, Azure VMs, or Container Instances.
"""

import os
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup


# =============================================================================
# CONFIGURATION
# =============================================================================

# dbt Configuration
DBT_PROJECT_DIR = os.getenv("DBT_PROJECT_DIR", "/opt/airflow/dbt/my_project")
DBT_PROFILES_DIR = os.getenv("DBT_PROFILES_DIR", "/opt/airflow/dbt")

# Azure Blob Storage Configuration
AZURE_STORAGE_ACCOUNT = os.getenv("AZURE_STORAGE_ACCOUNT", "dbtartifacts12345")
AZURE_CONTAINER_NAME = os.getenv("AZURE_CONTAINER_NAME", "dbt-artifacts")
AZURE_STORAGE_KEY = os.getenv("AZURE_STORAGE_KEY", "")
AZURE_CONNECTION_STRING = os.getenv("AZURE_STORAGE_CONNECTION_STRING", "")

# =============================================================================
# DAG DEFAULT ARGUMENTS
# =============================================================================

default_args = {
    "owner": "data-engineering",
    "depends_on_past": False,
    "email": ["data-team@yourcompany.com"],
    "email_on_failure": True,
    "email_on_retry": False,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(hours=2),
}

# =============================================================================
# PYTHON FUNCTIONS
# =============================================================================

def upload_artifacts_to_azure(**context):
    """
    Upload dbt artifacts to Azure Blob Storage.

    Uses azure-storage-blob library.
    Install with: pip install azure-storage-blob
    """
    from azure.storage.blob import BlobServiceClient

    target_dir = os.path.join(DBT_PROJECT_DIR, "target")

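    # Initialize Azure Blob Service Client
    if AZURE_CONNECTION_STRING:
        blob_service_client = BlobServiceClient.from_connection_string(
            AZURE_CONNECTION_STRING
        )
    elif AZURE_STORAGE_KEY:
        account_url = f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net"
        blob_service_client = BlobServiceClient(
            account_url=account_url,
            credential=AZURE_STORAGE_KEY
        )
    else:
        # Fail fast with a clear message instead of an opaque auth error later
        raise ValueError(
            "Set AZURE_STORAGE_CONNECTION_STRING or AZURE_STORAGE_KEY "
            "before running the upload task."
        )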

    container_client = blob_service_client.get_container_client(AZURE_CONTAINER_NAME)

    # Files to upload
    artifacts = [
        ("manifest.json", True),      # Required
        ("catalog.json", False),      # Optional but recommended
        ("run_results.json", False),  # Optional
        ("sources.json", False),      # Optional
    ]

    uploaded = []
    failed = []
    skipped = []

    for filename, required in artifacts:
        local_path = os.path.join(target_dir, filename)

        if os.path.exists(local_path):
            try:
                blob_client = container_client.get_blob_client(filename)
                with open(local_path, "rb") as data:
                    blob_client.upload_blob(data, overwrite=True)

                uploaded.append(filename)
                print(f"✓ Uploaded {filename} to Azure Blob Storage")
            except Exception as e:
                error_msg = f"✗ Failed to upload {filename}: {e}"
                print(error_msg)
                if required:
                    raise RuntimeError(error_msg) from e
                failed.append(filename)
        else:
            if required:
                raise FileNotFoundError(
                    f"Required artifact not found: {local_path}\n"
                    f"Make sure 'dbt run' completed successfully."
                )
            skipped.append(filename)
            print(f"⊘ Skipping {filename} (not found - optional)")

    # Log summary (failed = upload error, skipped = optional file not found)
    print(f"\n{'='*50}")
    print("Upload Summary:")
    print(f"  Uploaded: {', '.join(uploaded) or 'None'}")
    print(f"  Failed:   {', '.join(failed) or 'None'}")
    print(f"  Skipped:  {', '.join(skipped) or 'None'}")
    print(f"  Azure Location: {AZURE_STORAGE_ACCOUNT}/{AZURE_CONTAINER_NAME}/")
    print(f"{'='*50}")

    return {
        "uploaded": uploaded,
        "storage_account": AZURE_STORAGE_ACCOUNT,
        "container": AZURE_CONTAINER_NAME
    }


# =============================================================================
# DAG DEFINITION
# =============================================================================

with DAG(
    dag_id="dbt_with_azure",
    default_args=default_args,
    description="Run dbt models and sync metadata to OpenMetadata via Azure Blob",
    schedule_interval="0 6 * * *",  # Daily at 6 AM UTC (on Airflow 2.4+, prefer schedule=)
    start_date=datetime(2024, 1, 1),
    catchup=False,
    max_active_runs=1,
    tags=["dbt", "openmetadata", "azure", "data-pipeline"],
) as dag:

    # Task Group: dbt Execution
    with TaskGroup(group_id="dbt_execution") as dbt_tasks:

        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt run --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt test --profiles-dir {DBT_PROFILES_DIR}
            """,
            trigger_rule="all_done",
        )

        dbt_docs = BashOperator(
            task_id="dbt_docs_generate",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt docs generate --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_run >> dbt_test >> dbt_docs

    # Upload to Azure Blob
    upload_to_azure = PythonOperator(
        task_id="upload_artifacts_to_azure",
        python_callable=upload_artifacts_to_azure,
    )

    # DAG Dependencies
    dbt_tasks >> upload_to_azure
```
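
The upload function in the DAG authenticates with a connection string or an account key. If you chose Option C (managed identity) in Step 1.3, the client could instead be built with `DefaultAzureCredential`. The following is a sketch under stated assumptions: the `azure-identity` package is installed alongside `azure-storage-blob`, the worker runs with an identity that holds a blob data role on the storage account, and `choose_auth_method` and `build_blob_service_client` are hypothetical helper names.

```python
def choose_auth_method(connection_string: str = "", account_key: str = "") -> str:
    """Pick an auth method: connection string first, then key, then identity."""
    if connection_string:
        return "connection_string"
    if account_key:
        return "account_key"
    return "managed_identity"


def build_blob_service_client(account_name: str,
                              connection_string: str = "",
                              account_key: str = ""):
    """Create a BlobServiceClient for whichever credential is configured."""
    # Imported lazily so the DAG file still parses where the SDK is absent.
    from azure.storage.blob import BlobServiceClient

    method = choose_auth_method(connection_string, account_key)
    if method == "connection_string":
        return BlobServiceClient.from_connection_string(connection_string)

    account_url = f"https://{account_name}.blob.core.windows.net"
    if method == "account_key":
        return BlobServiceClient(account_url=account_url, credential=account_key)

    # Assumption: azure-identity is installed and an identity is attached
    # to the pod or VM that runs the Airflow worker.
    from azure.identity import DefaultAzureCredential
    return BlobServiceClient(account_url=account_url,
                             credential=DefaultAzureCredential())
```

If more than one user-assigned identity is attached to the node, `DefaultAzureCredential` may need the `AZURE_CLIENT_ID` environment variable to select the right one.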

### 2.3 Alternative: Azure CLI Upload

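For simpler setups, use the Azure CLI directly. This task reuses the configuration variables from the DAG above and assumes that `az` is installed on the Airflow worker with credentials available to it, for example through the `AZURE_STORAGE_KEY` or `AZURE_STORAGE_CONNECTION_STRING` environment variable.

```python theme={null}
# A failed upload fails the task, so stale artifacts are not silently kept
upload_with_az_cli = BashOperator(
    task_id="upload_to_azure",
    bash_command=f"""
        cd {DBT_PROJECT_DIR}/target && \
        az storage blob upload-batch \
            --account-name {AZURE_STORAGE_ACCOUNT} \
            --destination {AZURE_CONTAINER_NAME} \
            --source . \
            --pattern "*.json" \
            --overwrite
    """,
)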
```

## Step 3: Configure OpenMetadata

### Configuration

1. Go to **Settings → Services → Database Services**
2. Click on your database service (e.g., "production-synapse")
3. Go to the **Ingestion** tab
4. Click **Add Ingestion**
5. Select **dbt** from the dropdown

**Configure dbt Source (Azure):**

| Field                        | Value               | Notes                         |
| ---------------------------- | ------------------- | ----------------------------- |
| **dbt Configuration Source** | `Azure`             | Select from dropdown          |
| **Azure Account Name**       | `dbtartifacts12345` | Your storage account name     |
| **Azure Container Name**     | `dbt-artifacts`     | Your container name           |
| **Azure Blob Prefix**        | *(empty)*           | Leave empty or specify folder |
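
The DAG in Step 2 uploads each artifact to the container root, which corresponds to an empty prefix. If you set a prefix here, the upload has to write blobs under the same folder. The sketch below shows one way to compose the blob name; `blob_name` is a hypothetical helper, and it assumes the prefix is interpreted as a folder path inside the container.

```python
import posixpath


def blob_name(prefix: str, filename: str) -> str:
    """Join an optional folder prefix and a filename into a blob name."""
    # Blob names use forward slashes and should not start with one.
    prefix = prefix.strip("/")
    return posixpath.join(prefix, filename) if prefix else filename
```

In the DAG, this would mean calling `container_client.get_blob_client(blob_name(prefix, filename))` instead of passing `filename` alone.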

**Azure Credentials (choose one):**

**Option A: Using Account Key**

| Field                 | Value       | Notes               |
| --------------------- | ----------- | ------------------- |
| **Azure Account Key** | `abc123...` | Storage account key |

**Option B: Using Connection String**

| Field                       | Value                                            | Notes                  |
| --------------------------- | ------------------------------------------------ | ---------------------- |
| **Azure Connection String** | `DefaultEndpointsProtocol=https;AccountName=...` | Full connection string |
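
A connection string is a list of `Key=Value` pairs separated by semicolons. When the connection test fails, a quick check is whether the `AccountName` inside the string matches the storage account you entered. The following is a minimal sketch with a hypothetical helper name; it does not validate the key itself.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dict.

    Only the first '=' in each pair is treated as the separator, because
    base64-encoded account keys usually end in '=' padding.
    """
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError("Malformed connection string segment (no '=')")
        parts[key] = value
    return parts
```

For example, `parse_connection_string(conn)["AccountName"]` can be compared against the **Azure Account Name** field before you click **Test Connection**.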

**Configure dbt Options:**

| Field                   | Recommended Value |
| ----------------------- | ----------------- |
| **Update Descriptions** | `Enabled`         |
| **Update Owners**       | `Enabled`         |
| **Include Tags**        | `Enabled`         |
| **Classification Name** | `dbtTags`         |

**Test & Deploy:**

1. Click **Test Connection**
2. If successful, click **Deploy**
3. Click **Run** to trigger immediately

## Verification

After running the full pipeline, verify:

| Check                   | How to Verify                             | Expected Result                    |
| ----------------------- | ----------------------------------------- | ---------------------------------- |
| **Azure blobs exist**   | `az storage blob list --container-name ${CONTAINER_NAME} --account-name ${STORAGE_ACCOUNT} -o table` | manifest.json, catalog.json listed |
| **Ingestion completed** | OpenMetadata UI → Service → Ingestion tab | Green status, no errors            |
| **Lineage appears**     | Click on a dbt model → Lineage tab        | Upstream/downstream connections    |
| **Descriptions synced** | Click on a table → Schema tab             | Column descriptions visible        |
| **Tags appear**         | Click on a table → Tags section           | dbt tags shown                     |
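
To know what to expect in the UI, it can help to count what the uploaded `manifest.json` actually contains. The sketch below relies on the manifest's top-level `nodes` mapping, in which each entry carries a `resource_type`; `summarize_manifest` is a hypothetical helper name.

```python
import json
from collections import Counter


def summarize_manifest(manifest: dict) -> Counter:
    """Count manifest nodes by resource_type (for example model, test, seed)."""
    nodes = manifest.get("nodes", {})
    return Counter(node.get("resource_type", "unknown") for node in nodes.values())


def summarize_manifest_file(path: str) -> Counter:
    """Load a manifest.json from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_manifest(json.load(handle))
```

If the summary reports zero models, missing lineage in OpenMetadata is more likely a dbt problem than an ingestion problem.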

## Troubleshooting

| Issue                   | Symptom                  | Cause                       | Solution                                            |
| ----------------------- | ------------------------ | --------------------------- | --------------------------------------------------- |
| **Access Denied**       | "403 Forbidden" error    | Insufficient permissions    | Verify storage account key or SAS token is correct  |
| **Container Not Found** | "404 Not Found"          | Container name incorrect    | Check container name matches actual container       |
| **Invalid Credentials** | "Authentication failed"  | Wrong credentials           | Verify account key, connection string, or SAS token |
| **No blobs found**      | Artifacts not appearing  | Wrong upload path or failed upload | Check the container contents and verify the upload succeeded |
| **Stale data**          | Old lineage/descriptions | Old artifacts in blob       | Verify dbt DAG uploads fresh artifacts              |
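
For the stale-data case, comparing each blob's last-modified time with the DAG schedule shows whether fresh artifacts are arriving. The following is a sketch: `stale_blobs` is a hypothetical helper that takes `(name, last_modified)` pairs with timezone-aware timestamps, so it can be tested without any Azure connection.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple


def stale_blobs(blobs: Iterable[Tuple[str, datetime]],
                max_age: timedelta,
                now: Optional[datetime] = None) -> List[str]:
    """Return the names of blobs last modified more than max_age ago."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, modified in blobs if now - modified > max_age)


# With the azure-storage-blob SDK, the pairs could be produced like this
# (container_client as created in the upload function of the DAG):
#   pairs = ((b.name, b.last_modified) for b in container_client.list_blobs())
#   print(stale_blobs(pairs, max_age=timedelta(hours=26)))
```

A threshold slightly longer than the schedule interval, such as 26 hours for a daily DAG, leaves room for normal run-time variation.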

## Next Steps

* [Configure dbt Workflow](/v1.12.x/connectors/database/dbt/configure-dbt-workflow)
* [Auto Ingest dbt Core](/v1.12.x/connectors/database/dbt/auto-ingest-dbt-core)
* [dbt Troubleshooting](/v1.12.x/connectors/database/dbt/dbt-troubleshooting)

<Note>
  See other storage options: [S3](/v1.12.x/connectors/database/dbt/storage-s3-guide) | [GCS](/v1.12.x/connectors/database/dbt/storage-gcs-guide) | [HTTP](/v1.12.x/connectors/database/dbt/storage-http-guide) | [Local](/v1.12.x/connectors/database/dbt/storage-local-guide) | [dbt Cloud](/v1.12.x/connectors/database/dbt/dbt-cloud-api-guide)
</Note>
