
Fivetran

Configure and manage Fivetran connectors for automated data ingestion into warehouses.

You are an expert in Fivetran managed data integration, skilled at configuring connectors, managing sync schedules, building post-load transformations, and automating operations via the Fivetran REST API.

Core Philosophy

Managed ELT over Custom ETL

Fivetran handles extraction and loading automatically with pre-built connectors. Focus your engineering effort on the T (transformation) layer using dbt or Fivetran Transformations, not on building and maintaining custom ingestion scripts.

Schema Drift Tolerance

Fivetran automatically propagates schema changes from sources. Configure column blocking and schema change handling policies rather than assuming static schemas.
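The schema-change policy can be set per connector through the REST API. A minimal sketch, assuming the documented `schema_change_handling` values (`ALLOW_ALL`, `ALLOW_COLUMNS`, `BLOCK_ALL`) and the `auth` object from the Setup section; the helper only builds the request so the policy check is testable:

```python
FIVETRAN_BASE_URL = "https://api.fivetran.com/v1"

def schema_policy_request(connector_id: str, policy: str) -> tuple[str, dict]:
    """Build the PATCH request that sets a connector's schema change policy."""
    allowed = {"ALLOW_ALL", "ALLOW_COLUMNS", "BLOCK_ALL"}
    if policy not in allowed:
        raise ValueError(f"policy must be one of {sorted(allowed)}")
    url = f"{FIVETRAN_BASE_URL}/connectors/{connector_id}/schemas"
    return url, {"schema_change_handling": policy}

# To apply (auth as configured in Setup):
# url, payload = schema_policy_request("my_connector_id", "ALLOW_COLUMNS")
# requests.patch(url, auth=auth, json=payload).raise_for_status()
```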

Sync Frequency as a Cost Lever

Every sync incurs compute and MAR (Monthly Active Rows) costs. Match sync frequency to actual business freshness requirements; not every table needs 5-minute syncs.
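Fivetran only accepts a fixed set of `sync_frequency` values (in minutes). A small helper, sketched under the assumption that the allowed values are those commonly documented (verify against your plan; 1-minute syncs exist on some tiers), for mapping a freshness SLA to the cheapest compliant setting:

```python
# Assumed allowed sync_frequency values in minutes -- check Fivetran's
# API docs for your plan before relying on this list.
ALLOWED_FREQUENCIES = [5, 15, 30, 60, 120, 180, 360, 480, 720, 1440]

def pick_sync_frequency(max_staleness_minutes: int) -> int:
    """Return the largest (cheapest) allowed frequency that still meets
    the freshness requirement; fall back to the fastest if the SLA is
    tighter than any allowed value."""
    candidates = [f for f in ALLOWED_FREQUENCIES if f <= max_staleness_minutes]
    return max(candidates) if candidates else ALLOWED_FREQUENCIES[0]
```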

Setup

Configure Fivetran via the dashboard or REST API. API authentication uses Basic Auth with your API key and secret:

# Store credentials
export FIVETRAN_API_KEY="your_api_key"
export FIVETRAN_API_SECRET="your_api_secret"
export FIVETRAN_BASE_URL="https://api.fivetran.com/v1"

# Test connectivity
curl -s -u "${FIVETRAN_API_KEY}:${FIVETRAN_API_SECRET}" \
  "${FIVETRAN_BASE_URL}/account/info" | jq .

Terraform provider setup for infrastructure-as-code:

terraform {
  required_providers {
    fivetran = {
      source  = "fivetran/fivetran"
      version = "~> 1.0"
    }
  }
}

provider "fivetran" {
  api_key    = var.fivetran_api_key
  api_secret = var.fivetran_api_secret
}

Key Patterns

Do: Use the API to create and configure connectors programmatically

import os

import requests
from requests.auth import HTTPBasicAuth

# Read the credentials exported in Setup
FIVETRAN_API_KEY = os.environ["FIVETRAN_API_KEY"]
FIVETRAN_API_SECRET = os.environ["FIVETRAN_API_SECRET"]
FIVETRAN_BASE_URL = os.environ["FIVETRAN_BASE_URL"]

auth = HTTPBasicAuth(FIVETRAN_API_KEY, FIVETRAN_API_SECRET)

# Create a PostgreSQL connector
response = requests.post(
    f"{FIVETRAN_BASE_URL}/connectors",
    auth=auth,
    json={
        "group_id": "your_destination_group_id",
        "service": "postgres",
        "config": {
            "host": "db.example.com",
            "port": 5432,
            "database": "production",
            "user": "fivetran_reader",
            "password": "secure_password",  # pull from a secrets manager in practice
            "update_method": "WAL",
        },
        "sync_frequency": 60,
        "paused": False,
    },
)
response.raise_for_status()
connector = response.json()["data"]
connector_id = connector["id"]

Do Not: Apply a blanket 5-minute sync frequency to every connector

# BAD - unnecessary cost and load on source system
{"sync_frequency": 5}

# GOOD - match frequency to business need
# Transactional data: 15-60 min
{"sync_frequency": 15, "service": "postgres"}

# Slowly changing reference data: 360-1440 min
{"sync_frequency": 1440, "service": "google_sheets"}

Do: Configure webhook notifications for sync events

# Create a webhook for sync completion events
requests.post(
    f"{FIVETRAN_BASE_URL}/webhooks/account",
    auth=auth,
    json={
        "url": "https://your-app.com/fivetran/webhook",
        "events": [
            "sync_end",
            "connector_warning",
            "connector_failure",
        ],
        "active": True,
        "secret": "webhook_signing_secret",
    },
)
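On the receiving side, validate each webhook before trusting it: Fivetran signs payloads with HMAC-SHA256 using the webhook secret. A sketch, assuming the signature arrives in the `X-Fivetran-Signature-256` header as a hex digest (confirm the header name and hex casing against the webhook docs):

```python
import hashlib
import hmac

def verify_signature(secret: str, payload: bytes, header_signature: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body and compare in
    constant time; lowercase the header value to tolerate hex-case
    differences."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header_signature.lower())

# In your webhook handler:
# if not verify_signature(WEBHOOK_SECRET, request.body,
#                         request.headers["X-Fivetran-Signature-256"]):
#     return 401
```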

Common Patterns

Trigger sync and wait for completion

import time

def trigger_and_wait(connector_id: str, timeout: int = 3600) -> dict:
    # Force an immediate sync
    resp = requests.post(
        f"{FIVETRAN_BASE_URL}/connectors/{connector_id}/force",
        auth=auth,
    )
    resp.raise_for_status()

    # Give the sync a moment to start, so the first poll doesn't read a
    # stale "scheduled" state and return before the sync has begun
    time.sleep(30)

    start = time.time()
    while time.time() - start < timeout:
        resp = requests.get(
            f"{FIVETRAN_BASE_URL}/connectors/{connector_id}",
            auth=auth,
        )
        resp.raise_for_status()
        data = resp.json()["data"]
        if data["status"]["sync_state"] == "scheduled":
            return data  # sync finished; connector is idle again
        time.sleep(30)

    raise TimeoutError(f"Sync did not complete within {timeout}s")
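Audit sync health across a whole destination group from one call. This sketch assumes the group connector list endpoint (`GET /groups/{group_id}/connectors`) and the `paused` and `status.setup_state`/`status.sync_state` fields on each item; check the field names against the current API reference:

```python
def summarize_health(connectors: list[dict]) -> dict:
    """Bucket connector items (the `items` list from
    GET /groups/{group_id}/connectors) into healthy / broken / paused."""
    summary = {"healthy": [], "broken": [], "paused": []}
    for c in connectors:
        status = c.get("status", {})
        if status.get("setup_state") == "broken":
            summary["broken"].append(c["id"])
        elif c.get("paused") or status.get("sync_state") == "paused":
            summary["paused"].append(c["id"])
        else:
            summary["healthy"].append(c["id"])
    return summary

# items = requests.get(f"{FIVETRAN_BASE_URL}/groups/{group_id}/connectors",
#                      auth=auth).json()["data"]["items"]
# print(summarize_health(items))
```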

Block sensitive columns from syncing

# Modify schema to exclude PII columns
requests.patch(
    f"{FIVETRAN_BASE_URL}/connectors/{connector_id}/schemas",
    auth=auth,
    json={
        "schemas": {
            "public": {
                "tables": {
                    "users": {
                        "columns": {
                            "ssn": {"enabled": False},
                            "email": {"hashed": True},
                        }
                    }
                }
            }
        }
    },
)

Manage connectors with Terraform

resource "fivetran_connector" "salesforce" {
  group_id       = fivetran_destination.warehouse.id
  service        = "salesforce"
  sync_frequency = 60
  paused         = false

  config {
    domain     = "mycompany"
    is_sandbox = false
  }
}

resource "fivetran_connector_schema_config" "salesforce_schema" {
  connector_id         = fivetran_connector.salesforce.id
  schema_change_handling = "ALLOW_ALL"

  schema {
    name    = "salesforce"
    enabled = true

    table {
      name    = "Account"
      enabled = true
    }
    table {
      name    = "Opportunity"
      enabled = true
    }
  }
}

Integrate with Airflow for orchestration

from airflow.providers.fivetran.operators.fivetran import FivetranOperator
from airflow.providers.fivetran.sensors.fivetran import FivetranSensor

trigger_sync = FivetranOperator(
    task_id="trigger_fivetran_sync",
    fivetran_conn_id="fivetran_default",
    connector_id="{{ var.value.fivetran_connector_id }}",
)

wait_for_sync = FivetranSensor(
    task_id="wait_for_sync",
    fivetran_conn_id="fivetran_default",
    connector_id="{{ var.value.fivetran_connector_id }}",
    poke_interval=60,
)

# dbt_run is assumed to be defined elsewhere in the DAG
# (e.g., a task that runs your dbt models after the load)
trigger_sync >> wait_for_sync >> dbt_run

Anti-Patterns

  • Replicating entire databases when only a few tables are needed: Wastes MAR budget and destination storage; selectively enable only required schemas and tables
  • Ignoring connector warnings and schema change notifications: Unaddressed warnings escalate to sync failures; route alerts to Slack or PagerDuty
  • Using Fivetran transformations for complex multi-table joins: Fivetran SQL transformations are meant for lightweight post-load cleanup; use dbt for complex modeling
  • Storing Fivetran API credentials in application code: Use a secrets manager; API keys have full account access including billing and connector deletion

When to Use

  • Ingesting data from SaaS applications (Salesforce, HubSpot, Stripe) into a warehouse without building custom connectors
  • Replicating production databases (PostgreSQL, MySQL) via CDC (WAL/binlog) for analytics
  • Setting up a reliable ELT pipeline where managed infrastructure and automatic schema migration reduce maintenance
  • Auditing data freshness and sync health across dozens of data sources from a single control plane
  • Rapidly onboarding new data sources where time-to-value matters more than ingestion customization
