Databricks

An overview of the Databricks integration with Secoda

Getting Started with Databricks

There are three steps to get started using Databricks with Secoda:

Create an access token
Connect Databricks to Secoda
Whitelist Secoda IP Address

Create an access token

In your Databricks console, go to User Settings and generate a new access token. Save the value; you will use it to connect Databricks to Secoda in the second step.

To ingest query history and popularity, the token must have admin privileges.
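Before entering the token in Secoda, it can be useful to confirm that it authenticates against your workspace. The following is a minimal sketch, not part of the Secoda setup itself: it assumes the Databricks REST endpoint /api/2.0/sql/warehouses and a response shaped like {"warehouses": [...]}, and the function names are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request


def build_warehouses_request(host: str, token: str) -> urllib.request.Request:
    """Build a GET request for the SQL Warehouses list endpoint (assumed path)."""
    # Accept the host with or without a scheme or trailing slash.
    host = host.removeprefix("https://").rstrip("/")
    return urllib.request.Request(
        f"https://{host}/api/2.0/sql/warehouses",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_warehouse_names(host: str, token: str) -> list:
    """Send the request and return the names of warehouses visible to the token."""
    request = build_warehouses_request(host, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Assumption: the response has a top-level "warehouses" list.
    return [w.get("name") for w in payload.get("warehouses", [])]
```

Calling list_warehouse_names with your workspace host and token raises an HTTP error if the token is rejected. Which warehouses are returned depends on the permissions of the user that owns the token.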

Grant Secoda Access

For each warehouse you plan to connect to Secoda, the credentials must have Can monitor permissions (set via SQL Warehouses > [My Warehouse] > Permissions).

Can use may be selected instead, but it does not allow any warehouse-level query history to be accessed. Can view does not provide sufficient permissions.

For each catalog you want to connect to Secoda, the credentials must have the following permissions:

USE_CATALOG
USE_SCHEMA
BROWSE
SELECT
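The catalog permissions above can be granted with Unity Catalog SQL. The sketch below only builds the statements as strings so they can be reviewed before being run in a Databricks SQL editor. It assumes the underscore names listed above correspond to the space-separated SQL privileges (USE CATALOG, USE SCHEMA), and the catalog and principal values are placeholders.

```python
def build_grant_statements(catalog: str, principal: str) -> list:
    """Return one GRANT statement per privilege Secoda needs on a catalog."""
    # Assumption: USE_CATALOG / USE_SCHEMA map to these SQL privilege names.
    privileges = ["USE CATALOG", "USE SCHEMA", "BROWSE", "SELECT"]
    return [
        f"GRANT {privilege} ON CATALOG `{catalog}` TO `{principal}`;"
        for privilege in privileges
    ]


if __name__ == "__main__":
    # Placeholder catalog and principal; replace with your own values.
    for statement in build_grant_statements("main", "secoda@example.com"):
        print(statement)
```

Granting at the catalog level means the privileges are inherited by the schemas and tables inside that catalog. Repeat for each catalog you want to connect.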

Connect Databricks to Secoda

Go to https://app.secoda.co/integrations/new and select the Databricks integration.

Enter the following credentials:

Host: The URL of your Databricks workspace, e.g., dbc-dc31b5a2-597d.cloud.databricks.com
Databricks Workspace Id: The numerical ID of your workspace, located in the URL of your Databricks instance after the "/?o=": https://<instance_id>.cloud.databricks.com/?o=<workspace_id>
Access Token: The access token you generated in the first step
Warehouse ID (Recommended) or Cluster ID: The resource that SQL queries will run on. For the optimal experience, use a Databricks serverless SQL warehouse.
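The Host and Workspace Id values can both be read from the workspace URL in your browser. As an illustration, this short sketch splits such a URL into the two fields. The helper name is hypothetical and the workspace ID in the usage example is made up.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def parse_workspace_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (host, workspace_id) from a Databricks workspace URL."""
    parsed = urlparse(url)
    # The workspace ID is the value of the "o" query parameter, if present.
    values = parse_qs(parsed.query).get("o", [])
    workspace_id = values[0] if values else None
    return parsed.netloc, workspace_id


if __name__ == "__main__":
    # Made-up workspace ID, for illustration only.
    host, workspace_id = parse_workspace_url(
        "https://dbc-dc31b5a2-597d.cloud.databricks.com/?o=1234567890"
    )
    print(host)
    print(workspace_id)
```

If the URL has no "o" parameter, the helper returns None for the workspace ID rather than guessing.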

To ingest table and column level lineage using Databricks Unity Catalog, a Warehouse ID must be specified.

After entering the information in Secoda, click "Test Connection". Once the connection is successful, you can Submit and run the initial extraction.

Whitelist Secoda IP Address

If your Databricks instance is behind a firewall, you'll have to whitelist Secoda's IP address to allow for metadata extractions.

FAQs

What cloud providers are supported?

Databricks is supported on the major cloud providers, including AWS, GCP, and Azure.

