dbt Core

An overview of the dbt Core integration with Secoda

Getting started with dbt Core

dbt is a secondary integration that adds metadata onto your data warehouse or relational database tables. Before connecting dbt, make sure to connect a data warehouse or relational database first, such as Snowflake, BigQuery, Postgres, or Redshift.

There are several options for connecting dbt Core with Secoda:

  1. (Recommended) Connect an AWS, GCP, or Azure storage bucket/container

  2. Upload a manifest.json and run_results.json through the UI

  3. Upload a manifest.json and run_results.json through the Secoda API

Option 1 – Storage bucket (container)

This option is recommended because it ensures that Secoda always has the latest manifest.json and run_results.json files from dbt Core. Secoda will sync only these files from the bucket.
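
dbt writes manifest.json and run_results.json to its target/ directory on every run, and your pipeline is responsible for copying them into the bucket that Secoda reads. As a minimal sketch for the S3 case (the bucket name is a placeholder and credentials are assumed to come from the standard boto3 lookup chain):

import boto3

# Placeholder bucket; credentials resolve via the usual boto3 chain.
BUCKET = "your-dbt-artifacts-bucket"

s3 = boto3.client("s3")

# Copy the two artifacts dbt produced in its target/ directory.
for artifact in ("manifest.json", "run_results.json"):
    s3.upload_file(f"target/{artifact}", BUCKET, artifact)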

1a. Connect an AWS S3 bucket

You can connect to the AWS S3 bucket using an AWS IAM user, or AWS Roles.

AWS IAM User
  1. Create a new AWS IAM user and ensure that Access Key - Programmatic access is checked. Once you create the user, save the Access Key ID and Secret Access Key that are generated for the user.

  2. Attach the following policy to the user. Make sure to change <your-bucket-name>.

{
    "Statement": [
        {
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetObjectAcl",
                "s3:DeleteObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<your-bucket-name>",
                "arn:aws:s3:::<your-bucket-name>/*"
            ]
        }
    ],
    "Version": "2012-10-17"
}
  3. Connect your S3 bucket to Secoda

    • Navigate to https://app.secoda.co/integrations/new and click dbt Core

    • Choose the Access Key tab and add the credentials from AWS (Region, Bucket Name, Access Key ID, Secret Access Key)

    • Test the Connection - if successful, you'll be prompted to run your initial sync
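
If you prefer to script this setup, a rough boto3 equivalent of the steps above might look like the following (the user and policy names are placeholders; the policy document is the one shown above):

import boto3
import json

iam = boto3.client("iam")

# Placeholder names for illustration.
USER_NAME = "secoda-dbt-core"
POLICY_NAME = "secoda-dbt-core-bucket-access"
BUCKET = "your-bucket-name"

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:PutObject", "s3:PutObjectAcl", "s3:ListBucket",
            "s3:GetObject", "s3:GetObjectAcl", "s3:DeleteObject",
        ],
        "Resource": [
            f"arn:aws:s3:::{BUCKET}",
            f"arn:aws:s3:::{BUCKET}/*",
        ],
    }],
}

# Create the user, attach the inline policy, and mint programmatic keys.
iam.create_user(UserName=USER_NAME)
iam.put_user_policy(
    UserName=USER_NAME,
    PolicyName=POLICY_NAME,
    PolicyDocument=json.dumps(policy),
)
keys = iam.create_access_key(UserName=USER_NAME)["AccessKey"]
print(keys["AccessKeyId"], keys["SecretAccessKey"])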

AWS Roles
  1. Create a new AWS IAM role. In the Select type of trusted entity page, click Another AWS account and add the following account ID: 482836992928.

  2. Click Require External ID, and copy the randomly generated value shown on the dbt Core connection page in Secoda.

  3. Attach the following policy to the role. Make sure to change <your-bucket-name>.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket-name>",
                "arn:aws:s3:::<your-bucket-name/*"
            ]
        }
    ]
}
  4. Once the role is created, you'll receive an Amazon Resource Name (ARN) for the role.

  5. Connect your S3 bucket to Secoda

    • Navigate to https://app.secoda.co/integrations/new and click dbt Core

    • Choose the Role tab and add the credentials from AWS (Role ARN, Region, Bucket Name)

    • Test the Connection - if successful, you'll be prompted to run your initial sync

1b. Connect a GCS S3-compatible bucket

  1. Log in to the GCP cloud console.

  2. Create a service account.

  3. Grant the service account access from the Bucket page as “Storage Object Viewer”.

  4. Turn on interoperability on the bucket and generate HMAC keys for a service account with read access to the bucket. Both settings are found under Cloud Storage > Settings > Interoperability in the GCP console.

  5. Set up CORS. GCP requires this to be done via the CLI, for example:

gsutil cors set cors.json gs://bucket-name

cors.json

[
  {
    "origin": ["*"],
    "method": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
    "responseHeader": ["Content-Type"],
    "maxAgeSeconds": 3600
  }
]
  6. Save the HMAC keys to be used in the connection form:

    • Access Key Id

    • Secret

    • Region – the bucket's GCP region

    • S3 Endpoint – must be set to https://storage.googleapis.com

  7. Connect your S3 bucket to Secoda

    • Navigate to https://app.secoda.co/integrations/new and click dbt Core

    • Choose the Access Key tab and add the HMAC keys saved above to the relevant fields

    • Test the Connection - if successful, you'll be prompted to run your initial sync
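
Since the GCS bucket is accessed through its S3-compatible interoperability endpoint, you can sanity-check the HMAC credentials with any S3 client before connecting Secoda. A minimal sketch using boto3 (bucket name and keys are placeholders):

import boto3

# Placeholders: substitute your HMAC key pair and bucket name.
s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.googleapis.com",
    aws_access_key_id="YOUR_HMAC_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_HMAC_SECRET",
)

# Listing the bucket confirms the keys grant read access.
for obj in s3.list_objects_v2(Bucket="your-bucket-name").get("Contents", []):
    print(obj["Key"])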

1c. Connect an Azure Blob Storage container

  1. Go to portal.azure.com and then click Storage accounts.

  2. Copy the name of the desired storage account. Enter that in the integration form.

  3. Click on your storage account and under Security + networking select Access keys.

  4. Copy the Connection string and add it to your integration form.

  5. Test the connection.
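
To populate the container, your dbt job can push the artifacts with the azure-storage-blob package. A rough sketch (the connection string and container name are placeholders):

from azure.storage.blob import BlobServiceClient

# Placeholders for illustration.
CONNECTION_STRING = "YOUR_STORAGE_ACCOUNT_CONNECTION_STRING"
CONTAINER = "dbt-artifacts"

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)

# Upload both dbt artifacts from the local target/ directory.
for artifact in ("manifest.json", "run_results.json"):
    blob = service.get_blob_client(container=CONTAINER, blob=artifact)
    with open(f"target/{artifact}", "rb") as f:
        blob.upload_blob(f, overwrite=True)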

Option 2 – Upload a single manifest.json

The dbt manifest file contains complete information about how tables are transformed and how they are connected in terms of data lineage. It details the model to table relationships, providing a complete and accurate lineage view.

This is a one-time sync with your manifest.json and run_results.json files. You can upload them following these steps:

  1. Choose the File Upload tab and select your manifest.json and run_results.json files using the file selector

  2. Test the Connection - if successful, you'll be prompted to run your initial sync

Option 3 – Using the API

The API provides endpoints to upload your manifest.json and run_results.json files. This is convenient if you run dbt with Airflow, because you can upload the artifacts at the end of a dbt run. Follow these instructions to upload your files via the API:

  1. Create a blank dbt Core integration by going to https://app.secoda.co/integrations/new, selecting the "dbt Core" integration, and clicking "Test Connection". Then run the initial extraction. This extraction will fail, but that's intended.

  2. Return to https://app.secoda.co/integrations and click on the dbt Core integration that was just created. Save the ID, which is contained in the URL.

  3. Use the endpoints below to upload your files.

Endpoints ->

3a. Two separate calls (One for Manifest, One for Run Results)

manifest.json: https://api.secoda.co/integration/dbt/manifest/

run_results.json: https://api.secoda.co/integration/dbt/run_result/

Method -> POST

NOTE -> This will automatically trigger an extraction to run on the integration you created

Sample Request for Manifest file (Python) ->

import requests

headers = {
    "Authorization": "Bearer <Your Key>"
}

# Upload manifest.json; this also triggers an extraction on the integration.
response = requests.post(
    "https://api.secoda.co/integration/dbt/manifest/",
    files={"manifest_file": open("manifest.json", "rb")},
    data={"integration": "Your Integration ID"},
    headers=headers,
)
print(response.json())

Sample Request for Run Results file (Python) ->

import requests

headers = {
    "Authorization": "Bearer <Your Key>"
}

# Upload run_results.json for the same integration.
response = requests.post(
    "https://api.secoda.co/integration/dbt/run_result/",
    files={"run_result_file": open("run_results.json", "rb")},
    data={"integration": "Your Integration ID"},
    headers=headers,
)
print(response.json())
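
If you orchestrate dbt with Airflow, the two calls above fit naturally at the end of your DAG. A rough sketch assuming Airflow 2.x (the DAG ID, task wiring, and credential handling are placeholders, not Secoda-provided helpers):

from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

SECODA_API_KEY = "YOUR_API_KEY"          # placeholder
INTEGRATION_ID = "YOUR_INTEGRATION_ID"   # placeholder


def upload_dbt_artifacts():
    headers = {"Authorization": f"Bearer {SECODA_API_KEY}"}
    # Post each artifact to its endpoint; the manifest upload also
    # triggers an extraction, as noted above.
    for endpoint, field, path in [
        ("manifest", "manifest_file", "target/manifest.json"),
        ("run_result", "run_result_file", "target/run_results.json"),
    ]:
        with open(path, "rb") as f:
            requests.post(
                f"https://api.secoda.co/integration/dbt/{endpoint}/",
                files={field: f},
                data={"integration": INTEGRATION_ID},
                headers=headers,
            ).raise_for_status()


with DAG("dbt_with_secoda", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    # Wire this after your dbt run task, e.g. dbt_run >> push_to_secoda
    push_to_secoda = PythonOperator(
        task_id="push_artifacts_to_secoda",
        python_callable=upload_dbt_artifacts,
    )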

3b. One call to upload both Manifest and Run Results

1. Get your dbt Core Integration ID

  • Get the integration ID from the integration page URL

    • For example, if the url is https://app.secoda.co/integrations/f7d68db5-9dbc-4880-b6cd-ec363c1f7d6b/syncs, the integration id would be f7d68db5-9dbc-4880-b6cd-ec363c1f7d6b

  • Or get the integration ID programmatically via a GET request to the /integration/integrations/ endpoint and parse the list for your dbt Core integration, as sketched below
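
A minimal sketch of the programmatic lookup (the response shape and the matching condition are assumptions; inspect the actual payload to pick the right field):

import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder key

response = requests.get(
    "https://api.secoda.co/integration/integrations/",
    headers=headers,
)
response.raise_for_status()

# Assumption: the endpoint returns paginated results with a field that
# identifies the integration type; adjust to the payload you observe.
for integration in response.json().get("results", []):
    if "dbt" in str(integration.get("type", "")).lower():
        print(integration["id"], integration.get("name"))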

2. Upload manifest.json and run_results.json

Endpoint (inserting the integration_id from Step 1): https://api.secoda.co/integration/dbt/{integration_id}/upload_artifacts/

Method -> POST

Expected Response -> 200

Sample Request for uploading your files (Python, note the TODOs) ->

import requests

# TODO: replace YOUR_INTEGRATION_ID with your integration_id
integration_id = "YOUR_INTEGRATION_ID"

url = f"https://api.secoda.co/integration/dbt/{integration_id}/upload_artifacts/"

# TODO: replace YOUR_PATH with your path to run_results and manifest
files = {
    "run_results": open("YOUR_PATH/run_results.json", "rb"),
    "manifest": open("YOUR_PATH/manifest.json", "rb")
}

# TODO: replace YOUR_API_KEY with your Secoda API key
headers = {
  "Authorization": "Bearer YOUR_API_KEY"
}

response = requests.request("POST", url, headers=headers, files=files)

print(response)

3. Trigger an Integration Sync

Endpoint (inserting the integration_id from Step 1): https://api.secoda.co/integration/dbt/{integration_id}/trigger/

Method -> POST

Expected Response -> 200

Sample Request for triggering a sync (Python, note the TODOs) ->

import requests

# TODO: replace YOUR_INTEGRATION_ID with your integration_id
integration_id = "YOUR_INTEGRATION_ID"

url = f"https://api.secoda.co/integration/dbt/{integration_id}/trigger/"

# TODO: replace YOUR_API_KEY with your Secoda API key
headers = {
  "Authorization": "Bearer YOUR_API_KEY"
}

response = requests.request("POST", url, headers=headers)

print(response)

4. After a sync has been triggered, you can monitor its status in the UI
