Example Integrations

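Both examples share the same shape: subclass SecodaIntegration from the Secoda SDK, implement extract(), declare resources (and, optionally, lineage) as you discover them, and call start(). The sketch below distills that skeleton using only calls that appear in the examples that follow; the class name, entity values, and the api_key credential are illustrative.

import os

from secodadk import SecodaIntegration, Resource


class MyIntegration(SecodaIntegration):
    def extract(self):
        # Fetch metadata from the source system here (self.http_get and
        # self.http_post are available for authenticated requests), then
        # declare each entity you find.
        self.declare_resource(
            Resource(
                entity_type="dashboard",             # Secoda entity type
                databuilder_id="dashboard.example",  # stable, unique identifier
                title="Example dashboard",
            )
        )


if __name__ == "__main__":
    # "api_key" is an illustrative credential name; pass whatever your source needs
    MyIntegration(credentials={"api_key": os.environ.get("API_KEY")}).start()
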
Example 1: Cluvio
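
This integration signs in to the Cluvio API with an email and password, declares a single default dashboard group, then declares every dashboard under that group along with lineage linking each dashboard back to it.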

import os

from secodadk import SecodaIntegration, Resource, DeclaredLineage, InternalResource

BASE_URL = "https://api.cluvio.com"


class FakeCluvioIntegration(SecodaIntegration):
    def extract(self):
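        # Sign in with the email/password credentials to obtain an API token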
        token = (
            self.http_post(
                f"{BASE_URL}/users/sign_in",
                json={
                    "user": {
                        "email": self.credentials.get("email"),
                        "password": self.credentials.get("password"),
                    }
                },
                headers={"Accept": "application/json"},
            )
            .json()
            .get("token")
        )

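        # Reuse the token in the headers of every subsequent request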
        auth = dict(
            headers={"token": token},
            verify=True,
        )

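        # Declare a single default group to act as the parent of every dashboard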
        group_databuilder_id = "dashboard_group.default"
        self.declare_resource(
            Resource(
                entity_type="dashboard_group",
                databuilder_id=group_databuilder_id,
                title="default",
            )
        )

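        # Fetch all dashboards and declare each one under the default group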
        for dashboard in (
            self.http_get(
                f"{BASE_URL}/dashboards/",
                **auth,
            )
            .json()
            .get("data", [])
        ):
            dashboard_databuilder_id = f"dashboard.{dashboard.get('id')}"

            self.declare_resource(
                Resource(
                    entity_type="dashboard",
                    databuilder_id=dashboard_databuilder_id,
                    title=dashboard.get("attributes", {}).get("name"),
                    description=dashboard.get("attributes", {}).get("description"),
                    parent_databuilder_id=group_databuilder_id,
                    external_updated_at=dashboard.get("attributes", {}).get(
                        "updated_at"
                    ),
                    product="cluvio",
                    native_type="dashboard",
                )
            )

            # Record lineage from the dashboard back to its parent group
            self.declare_lineage(
                DeclaredLineage(
                    from_identifier=InternalResource(
                        databuilder_id=dashboard_databuilder_id
                    ),
                    to_identifier=InternalResource(databuilder_id=group_databuilder_id),
                )
            )


if __name__ == "__main__":
    FakeCluvioIntegration(
        credentials={
            "email": os.environ.get("EMAIL"),
            "password": os.environ.get("PASSWORD"),
        }
    ).start()
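
To run it locally, export EMAIL and PASSWORD for a Cluvio account before starting the script. See Upload and Connect your Marketplace Integration for connecting the finished integration to your workspace.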

Example 2: Google Cloud Storage
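
This integration crawls a Google Cloud Storage bucket and maps its folder hierarchy onto catalog entities: top-level folders become databases, sub-folders become schemas, and CSV files become tables, with each header field declared as a column. It also declares lineage from each table to a matching external table.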

import os
import csv
import json
from io import StringIO
from secodadk import (
    SecodaIntegration,
    Resource,
    DeclaredLineage,
    InternalResource,
    ExternalTable,
)
from google.cloud import storage


class GoogleCloudStorageIntegration(SecodaIntegration):
    def extract(self):
        # Build a GCS client from the service account key supplied in credentials
        json_key = json.loads(self.credentials.get("json_file"))
        storage_client = storage.Client.from_service_account_info(json_key)

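        # List every object in the bucket; path depth determines the entity type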
        BUCKET_NAME = self.credentials.get("bucket_name")
        bucket = storage_client.get_bucket(BUCKET_NAME)
        blobs = list(bucket.list_blobs())

        for blob in blobs:
            # Sample blob names:
            # 2023/
            # 2023/montreal/
            # 2023/montreal/montreal_sales_data.csv

            parts = [part for part in blob.name.split('/') if part]  # non-empty path segments
            if len(parts) == 1:
                ## Database
                self.declare_resource(
                    Resource(
                        title=parts[0],
                        entity_type="database",
                        native_type="folder",
                        databuilder_id=f"{BUCKET_NAME}.{parts[0]}",
                    )
                )
            elif len(parts) == 2:
                ## Schema
                self.declare_resource(
                    Resource(
                        title=parts[1],
                        entity_type="schema",
                        native_type="sub-folder",
                        database=parts[0],
                        databuilder_id=f"{BUCKET_NAME}.{parts[0]}.{parts[1]}"
                    )
                )
            elif len(parts) == 3:
                ## Table
                table_name = parts[2][:-4]  # strip the ".csv" extension
                self.declare_resource(
                    Resource(
                        title=table_name, 
                        entity_type="table",
                        native_type="csv",
                        database=parts[0],
                        schema=parts[1],
                        databuilder_id=f"{BUCKET_NAME}.{parts[0]}.{parts[1]}.{table_name}",
                        description=f"CSV data from pop up sales. Can be found here: {blob.public_url}"
                    )
                )

                # The external dataset name mirrors the bucket name, with dashes
                # replaced so it is a valid identifier
                dataset = BUCKET_NAME.replace("-", "_")
                self.declare_lineage(
                    DeclaredLineage(
                        from_identifier=InternalResource(
                            databuilder_id=f"{BUCKET_NAME}.{parts[0]}.{parts[1]}.{table_name}"
                        ),
                        to_identifier=ExternalTable(
                            database="secoda-web",
                            schema=dataset,
                            table=table_name,
                        ),
                    )
                )

                # Download the CSV and read only its header row for column names
                csv_string = blob.download_as_text(encoding='utf-8')
                csv_data = csv.reader(StringIO(csv_string), delimiter=',')
                header = next(csv_data)

                for field in header:
                    ## Column
                    self.declare_resource(
                        Resource(
                            title=field,
                            entity_type="column",
                            native_type="field",
                            database=parts[0],
                            schema=parts[1],
                            table=table_name,
                            parent_databuilder_id=f"{BUCKET_NAME}.{parts[0]}.{parts[1]}.{table_name}",
                            databuilder_id=f"{BUCKET_NAME}.{parts[0]}.{parts[1]}.{table_name}.{field}"
                        )
                    )

if __name__ == "__main__":
    GoogleCloudStorageIntegration(
        credentials={
            "json_file": os.environ.get("json_file"),
            "bucket_name": os.environ.get("bucket_name")
        }
    ).start()
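
To run it, set the json_file environment variable to the contents of a service account key with read access to the bucket, and bucket_name to the name of the bucket to crawl.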

See the webinar reviewing this example here.