Secoda Docs
Get Started
  • Getting Started with Secoda
    • Secoda as an Admin
      • Deployment options
      • Sign in options
      • Settings
      • Connect your data
        • Define Service Accounts
        • Choose which schemas to extract
      • Customize the workspace
      • Populate Questions with FAQs
      • Invite your teammates
        • Joining & Navigating between Multiple Workspaces
      • Onboard new users
        • Onboarding email templates
        • Onboarding Homepage template
        • Training session guide
      • User engagement and adoption
        • Tips & Tricks to share with new users
    • Secoda as an Editor
    • Secoda as a Viewer
      • Introduction guide
      • Requesting changes in Secoda
  • Best practices
    • Setting up your workspace
    • Integrating Secoda into existing workflows
    • Documentation best practices
    • Glossary best practices
    • Data governance
    • Data quality
    • Clean up your data
    • Tool migrations using Secoda
    • Slack <> Questions workflow
    • Defining resources workflow
    • Streamline data access: Private and public teams workflow
    • Exposing Secoda to external clients
  • Resource Management
    • Editing Properties
      • AI Description Editor
      • Bulk Editing
      • Propagation
      • Templates
    • Resource Sidesheet
    • Assigning Owners
    • Custom Properties
    • Tags
      • Custom Tags
      • PII Identifier
      • Verified Identifier
    • Import and Export Resources
    • Related Resources
  • User Management
    • Roles
    • Teams
    • Groups
  • Integrations
    • Integration Settings
    • Data Warehouses
      • BigQuery
        • BigQuery Metadata Extracted
      • Databricks
        • Databricks Metadata Extracted
      • Redshift
        • Redshift Metadata Extracted
      • Snowflake
        • Snowflake Metadata Extracted
        • Snowflake Costs
        • Snowflake Native App
      • Apache Hive
        • Apache Hive Metadata Extracted
      • Azure Synapse
        • Azure Synapse Metadata Extracted
      • MotherDuck
        • MotherDuck Metadata Extracted
      • ClickHouse
        • ClickHouse Metadata Extracted
    • Databases
      • Druid
        • Druid Metadata Extracted
      • MySQL
        • MySQL Metadata Extracted
      • Microsoft SQL Server
        • Page
        • Microsoft SQL Server Metadata Extracted
      • Oracle
        • Oracle Metadata Extracted
      • Salesforce
        • Salesforce Metadata Extracted
      • Postgres
        • Postgres Metadata Extracted
      • MongoDB
        • MongoDB Metadata Extracted
      • Azure Cosmos DB
        • Azure Cosmos DB Metadata Extracted
      • SingleStore
        • SingleStore Metadata Extracted
      • DynamoDB
        • DynamoDB Metadata Extracted
    • Data Visualization Tools
      • Amplitude
        • Amplitude Metadata Extracted
      • Looker
        • Looker Metadata Extracted
      • Looker Studio
        • Looker Studio Metadata Extracted
      • Metabase
        • Metabase Metadata Extracted
      • Mixpanel
        • Mixpanel Metadata Extracted
      • Mode
        • Mode Metadata Extracted
      • Power BI
        • Power BI Metadata Extracted
      • QuickSight
        • QuickSight Metadata Extracted
      • Retool
        • Retool Metadata Extracted
      • Redash
        • Redash Metadata Extracted
      • Sigma
        • Sigma Metadata Extracted
      • Tableau
        • Tableau Metadata Extracted
      • ThoughtSpot
        • ThoughtSpot Metadata Extracted
      • Cluvio
        • Cluvio Metadata Extracted
      • Hashboard
        • Hashboard Metadata Extracted
      • Lightdash
        • Lightdash Metadata Extracted
      • Preset
        • Preset Metadata Extracted
      • Superset
        • Superset Metadata Extracted
      • SQL Server Reporting Services
        • SQL Server Reporting Services Metadata Extracted
      • Hex
        • Hex Metadata Extracted
      • Omni
        • Omni Metadata Extracted
    • Data Pipeline Tools
      • Census
        • Census Metadata Extracted
      • Stitch
        • Stitch Metadata Extracted
      • Airflow
        • Airflow Metadata Extracted
      • Dagster
        • Dagster Metadata Extracted
      • Fivetran
        • Fivetran Metadata Extracted
      • Glue
        • Glue Metadata Extracted
      • Hightouch
        • Hightouch Metadata Extracted
      • Apache Kafka
        • Apache Kafka Metadata Extracted
      • Confluent Cloud
        • Confluent Cloud Metadata Extracted
      • Polytomic
        • Polytomic Metadata Extracted
      • Matillion
        • Matillion Metadata Extracted
      • Airbyte
        • Airbyte Extracted Metadata
      • Informatica
        • Informatica Metadata Extracted
      • Azure Data Factory
        • Azure Data Factory Metadata Extracted
    • Data Transformation Tools
      • dbt
        • dbt Cloud
          • dbt Cloud Metadata Extracted
        • dbt Core
          • dbt Core Metadata Extracted
      • Coalesce
        • Coalesce Metadata Extracted
    • Data Quality Tools
      • Cyera
      • Dataplex
        • Dataplex Metadata Extracted
      • Great Expectations
        • Great Expectations Metadata Extracted
      • Monte Carlo
        • Monte Carlo Metadata Extracted
    • Data Lakes
      • Google Cloud Storage
        • GCS Metadata Extracted
      • AWS S3
        • S3 Metadata Extracted
    • Query Engines
      • Trino
        • Trino Metadata Extracted
    • Custom Integrations
      • File Upload
        • CSV File Format
        • JSONL File Format
        • Maintain your Resources
      • Marketplace
        • Secoda SDK
        • Upload and Connect your Marketplace Integration
        • Publish the Integration
        • Example Integrations
      • Secoda Fields Explained
    • Security
      • Connecting via Reverse SSH Tunnel
      • Connecting via SSH Tunnel
      • Connecting via VPC Peering
      • Connecting via AWS Cross Account Role
      • Connecting via AWS PrivateLink
        • Snowflake via AWS PrivateLink
        • AWS Service via AWS PrivateLink
      • Recommendations to Improve SSH Tunnel Concurrency on SSH Bastion
    • Push Metadata to Source
  • Extensions
    • Chrome
    • Confluence
      • Confluence Metadata Extracted
      • Confluence best practices
    • Git
    • GitHub
    • Jira
      • Jira Metadata Extracted
    • Linear
    • Microsoft Teams
    • PagerDuty
    • Slack
      • Slack user guide
  • Features
    • Access Requests
    • Activity Log
    • Analytics
    • Announcements
    • Audit Log
    • Automations
      • Automations Use Cases
    • Archive
    • Bookmarks
    • Catalog
    • Collections
    • Column Profiling
    • Data Previews
    • Data Quality Score
    • Documents
      • Comments
      • Embeddings
    • Filters
    • Glossary
    • Homepage
    • Inbox
    • Lineage
      • Manual Lineage
    • Metrics
    • Monitors
      • Monitoring Use Cases
    • Notifications
    • Policies
    • Popularity
    • Publishing
    • Queries
      • Query Blocks
        • Chart Blocks
      • Extracted Queries
    • Questions
    • Search
    • Secoda AI
      • Secoda AI User Guide
      • Secoda AI Use Cases
      • Secoda AI Security FAQs
      • Prompts
    • Sharing
    • Views
  • Enterprise
    • SAML
      • Okta SAML
      • OneLogin SAML
      • Microsoft Azure AD SAML
      • Google SAML
      • SCIM
      • SAML Attributes
    • Self-Hosted
      • Additional Resources
        • Additional Environment Variables
          • PowerBI OAuth Application (on-premise)
          • Google OAuth Application (on-premise)
          • Github Application (on-premise)
          • OpenAI API Key Creation (on-premise)
          • AWS Bucket with Access Keys (on-premise)
        • TLS/SSL (Docker compose)
        • Automatic Updates (Docker compose)
        • Backups (Docker compose)
        • Outbound Connections
      • Self-Hosted Changelog
    • SIEM
      • Google Chronicle
  • API
    • Get Started
    • Authentication
    • Example Workflows
    • API Reference
      • Getting Started
      • Helpful Information
      • Audit Logs
      • Charts
      • Collections
      • Columns
      • Custom Properties
      • Dashboards
      • Databases
      • Documents
      • Events
      • Groups
      • Integrations
      • Lineage
      • Monitors
      • Resources
      • Schemas
      • Tables
      • Tags
      • Teams
      • Users
      • Questions
      • Queries
  • FAQ
  • Policies
    • Terms of Use
    • Secoda AI Terms
    • Master Subscription Agreement
    • Privacy Policy
    • Security Policy
    • Accessibility Statement
    • Data Processing Agreement
    • Subprocessors
    • Service Level Agreement
    • Bug Bounty Program
  • System Status
  • Changelog
Powered by GitBook
On this page
  • Getting Started with Glue
  • Set up Glue Access
  • Update your Lake Formation Permissions
  • Connect the AWS Glue integration to Secoda

Was this helpful?

  1. Integrations
  2. Data Pipeline Tools

Glue

An overview of the AWS Glue integration with Secoda

Last updated 2 months ago

Was this helpful?

Getting Started with Glue

The AWS Glue integration will pull metadata from your AWS Glue Catalog and the associated lineage information from the data sources for Glue. To connect AWS Glue to Secoda you'll need to create an IAM user that has permission to get Glue objects. Follow the instructions below to set up that user.

The following steps are taken to connect AWS Glue to Secoda

  1. Set up Glue Access

    a. Option 1: Create a Secoda Glue User b. Option 2: Create a Secoda Glue Role

  2. Update the Lake Formation permissions

  3. Connect the AWS Glue integration to Secoda

Set up Glue Access

Option 1: Create a Secoda Glue User

Log in to your and then to the

Create a new IAM user by clicking "Add users"

Select "Access key - Programmatic access" under the "Select AWS access type" section and click "Next"

In the permissions section, select "Attach existing policies directly" and then click "Create policy".

Select the "JSON" option and paste in the following policy. Make sure to replace <aws_region> and <aws_account_id> with the proper values. Then create the policy and return to the previous page for the IAM user creation.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "glue:GetDataflowGraph",
                "glue:GetJobs",
                "glue:GetJobRuns",
                "glue:GetTable",
                "glue:GetDatabases",
                "glue:GetTables",
                "glue:SearchTables",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:PutObject",
                "s3:ListMultipartUploadParts",
                "athena:GetWorkGroup",
                "athena:StartQueryExecution",
                "athena:StopQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:ListWorkGroups",
                "athena:ListDataCatalogs",
                "athena:ListQueryExecutions",
                "s3:ListMultipartUploadParts",
                "sts:AssumeRole"
            ],
            "Resource": [
                "arn:aws:athena:*:<athena_id>:workgroup/primary",
                "arn:aws:s3:::<athena_output_bucket>",
                "arn:aws:s3:::<athena_output_bucket>/*",
                "arn:aws:glue:<region>:<glue_id>:*",
                "arn:aws:s3:::<glue_bucket>",
                "arn:aws:s3:::<glue_bucket>/*"
            ]
        }
    ]
}

Refresh the policy list and search for your newly created policy. Select that policy and then create the user.

Option 2: Create a Secoda Glue Role

Prior to creating the AWS IAM role, go to the Secoda application > Integrations > Create Integration > Glue. Copy the "External ID" that's prefilled in the integration form. This will be used when creating your IAM role.

Select the JSON tab and enter in the following policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "glue:GetDataflowGraph",
                "glue:GetJobs",
                "glue:GetTable",
                "glue:GetDatabases",
                "glue:SearchTables",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:PutObject",
                "s3:ListMultipartUploadParts",
                "athena:GetWorkGroup",
                "athena:StartQueryExecution",
                "athena:StopQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:ListWorkGroups",
                "athena:ListDataCatalogs",
                "athena:ListQueryExecutions",
                "s3:ListMultipartUploadParts",
                "sts:AssumeRole"
            ],
            "Resource": [
                "arn:aws:athena:*:<athena_id>:workgroup/primary",
                "arn:aws:s3:::<athena_output_bucket>",
                "arn:aws:s3:::<athena_output_bucket>/*",
                "arn:aws:glue:<region>:<glue_id>:*",
                "arn:aws:s3:::<glue_bucket>",
                "arn:aws:s3:::<glue_bucket>/*"
            ]
        }
    ]
}

Name the policy secoda-glue-policy or a name of your choice.

Next, go to the Roles section and create a new IAM role by clicking "Create role".

Select the "AWS account" option, then select "Another AWS account", enter the account number 482836992928 , click "Require external ID" and copy in the value from the first step. Then click "Next".

Add the secoda-glue-policy to the role, and then copy the Role ARN.

Update your Lake Formation Permissions

Go to AWS Lake Formation > Permissions > Data Lake Permissions and ensure that the Secoda Glue user has SELECT permission on all the necessary tables.

Connect the AWS Glue integration to Secoda

Return to https://app.secoda.co/integrations and select the AWS Glue integration. If using a AWS IAM User, copy the Access Key ID and Secret Access Key that is generated for the user. If using the Role, input the role ARN. Next input your region, access key ID and secret access key and click "Test Connection". After the connection is established click "Run initial extraction" to begin the process of syncing your Glue Data Catalog.

Log in to your and then to the then go to the Policies section and click "Create policy".

Glue Metadata Extracted
AWS console
IAM management console
AWS console
IAM management console