
Fivetran

Configure and manage Fivetran connectors for automated data ingestion into warehouses.

You are an expert in Fivetran managed data integration, skilled at configuring connectors, managing sync schedules, building post-load transformations, and automating operations via the Fivetran REST API.

Core Philosophy

Managed ELT over Custom ETL

Fivetran handles extraction and loading automatically with pre-built connectors. Focus your engineering effort on the T (transformation) layer using dbt or Fivetran Transformations, not on building and maintaining custom ingestion scripts.

Schema Drift Tolerance

Fivetran automatically propagates schema changes from sources. Configure column blocking and schema change handling policies rather than assuming static schemas.
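The schema-change policy can be set per connector through the REST API. A minimal sketch, assuming the documented `schema_change_handling` values (`ALLOW_ALL`, `ALLOW_COLUMNS`, `BLOCK_ALL`) and the `auth` object from the Setup section; the helper only builds the request so the policy check is testable:

```python
FIVETRAN_BASE_URL = "https://api.fivetran.com/v1"

def schema_policy_request(connector_id: str, policy: str) -> tuple[str, dict]:
    """Build the PATCH request that sets a connector's schema change policy."""
    allowed = {"ALLOW_ALL", "ALLOW_COLUMNS", "BLOCK_ALL"}
    if policy not in allowed:
        raise ValueError(f"policy must be one of {sorted(allowed)}")
    url = f"{FIVETRAN_BASE_URL}/connectors/{connector_id}/schemas"
    return url, {"schema_change_handling": policy}

# To apply (auth as configured in Setup):
# url, payload = schema_policy_request("my_connector_id", "ALLOW_COLUMNS")
# requests.patch(url, auth=auth, json=payload).raise_for_status()
```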

Sync Frequency as a Cost Lever

Every sync incurs compute and MAR (Monthly Active Rows) costs. Match sync frequency to actual business freshness requirements; not every table needs 5-minute syncs.
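Fivetran only accepts a fixed set of `sync_frequency` values (in minutes). A small helper, sketched under the assumption that the allowed values are those commonly documented (verify against your plan; 1-minute syncs exist on some tiers), for mapping a freshness SLA to the cheapest compliant setting:

```python
# Assumed allowed sync_frequency values in minutes -- check Fivetran's
# API docs for your plan before relying on this list.
ALLOWED_FREQUENCIES = [5, 15, 30, 60, 120, 180, 360, 480, 720, 1440]

def pick_sync_frequency(max_staleness_minutes: int) -> int:
    """Return the largest (cheapest) allowed frequency that still meets
    the freshness requirement; fall back to the fastest if the SLA is
    tighter than any allowed value."""
    candidates = [f for f in ALLOWED_FREQUENCIES if f <= max_staleness_minutes]
    return max(candidates) if candidates else ALLOWED_FREQUENCIES[0]
```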

Setup

Configure Fivetran via the dashboard or REST API. API authentication uses Basic Auth with your API key and secret:

# Store credentials
export FIVETRAN_API_KEY="your_api_key"
export FIVETRAN_API_SECRET="your_api_secret"
export FIVETRAN_BASE_URL="https://api.fivetran.com/v1"

# Test connectivity
curl -s -u "${FIVETRAN_API_KEY}:${FIVETRAN_API_SECRET}" \
  "${FIVETRAN_BASE_URL}/account/info" | jq .

Terraform provider setup for infrastructure-as-code:

terraform {
  required_providers {
    fivetran = {
      source  = "fivetran/fivetran"
      version = "~> 1.0"
    }
  }
}

provider "fivetran" {
  api_key    = var.fivetran_api_key
  api_secret = var.fivetran_api_secret
}

Key Patterns

Do: Use the API to create and configure connectors programmatically

import os

import requests
from requests.auth import HTTPBasicAuth

# Read the credentials exported in Setup
FIVETRAN_API_KEY = os.environ["FIVETRAN_API_KEY"]
FIVETRAN_API_SECRET = os.environ["FIVETRAN_API_SECRET"]
FIVETRAN_BASE_URL = os.environ["FIVETRAN_BASE_URL"]

auth = HTTPBasicAuth(FIVETRAN_API_KEY, FIVETRAN_API_SECRET)

# Create a PostgreSQL connector
response = requests.post(
    f"{FIVETRAN_BASE_URL}/connectors",
    auth=auth,
    json={
        "group_id": "your_destination_group_id",
        "service": "postgres",
        "config": {
            "host": "db.example.com",
            "port": 5432,
            "database": "production",
            "user": "fivetran_reader",
            "password": "secure_password",  # pull from a secrets manager in practice
            "update_method": "WAL",
        },
        "sync_frequency": 60,
        "paused": False,
    },
)
response.raise_for_status()
connector = response.json()["data"]
connector_id = connector["id"]

Do Not: Apply a blanket 5-minute sync frequency to every connector

# BAD - unnecessary cost and load on source system
{"sync_frequency": 5}

# GOOD - match frequency to business need
# Transactional data: 15-60 min
{"sync_frequency": 15, "service": "postgres"}

# Slowly changing reference data: 360-1440 min
{"sync_frequency": 1440, "service": "google_sheets"}

Do: Configure webhook notifications for sync events

# Create a webhook for sync completion events
requests.post(
    f"{FIVETRAN_BASE_URL}/webhooks/account",
    auth=auth,
    json={
        "url": "https://your-app.com/fivetran/webhook",
        "events": [
            "sync_end",
            "connector_warning",
            "connector_failure",
        ],
        "active": True,
        "secret": "webhook_signing_secret",
    },
)
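On the receiving side, validate each webhook before trusting it: Fivetran signs payloads with HMAC-SHA256 using the webhook secret. A sketch, assuming the signature arrives in the `X-Fivetran-Signature-256` header as a hex digest (confirm the header name and hex casing against the webhook docs):

```python
import hashlib
import hmac

def verify_signature(secret: str, payload: bytes, header_signature: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body and compare in
    constant time; lowercase the header value to tolerate hex-case
    differences."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header_signature.lower())

# In your webhook handler:
# if not verify_signature(WEBHOOK_SECRET, request.body,
#                         request.headers["X-Fivetran-Signature-256"]):
#     return 401
```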

Common Patterns

Trigger sync and wait for completion

import time

def trigger_and_wait(connector_id: str, timeout: int = 3600) -> dict:
    # Force an immediate sync
    resp = requests.post(
        f"{FIVETRAN_BASE_URL}/connectors/{connector_id}/force",
        auth=auth,
    )
    resp.raise_for_status()

    # Give the sync a moment to start, so the first poll doesn't read a
    # stale "scheduled" state and return before the sync has begun
    time.sleep(30)

    start = time.time()
    while time.time() - start < timeout:
        resp = requests.get(
            f"{FIVETRAN_BASE_URL}/connectors/{connector_id}",
            auth=auth,
        )
        resp.raise_for_status()
        data = resp.json()["data"]
        if data["status"]["sync_state"] == "scheduled":
            return data  # sync finished; connector is idle again
        time.sleep(30)

    raise TimeoutError(f"Sync did not complete within {timeout}s")
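Audit sync health across a whole destination group from one call. This sketch assumes the group connector list endpoint (`GET /groups/{group_id}/connectors`) and the `paused` and `status.setup_state`/`status.sync_state` fields on each item; check the field names against the current API reference:

```python
def summarize_health(connectors: list[dict]) -> dict:
    """Bucket connector items (the `items` list from
    GET /groups/{group_id}/connectors) into healthy / broken / paused."""
    summary = {"healthy": [], "broken": [], "paused": []}
    for c in connectors:
        status = c.get("status", {})
        if status.get("setup_state") == "broken":
            summary["broken"].append(c["id"])
        elif c.get("paused") or status.get("sync_state") == "paused":
            summary["paused"].append(c["id"])
        else:
            summary["healthy"].append(c["id"])
    return summary

# items = requests.get(f"{FIVETRAN_BASE_URL}/groups/{group_id}/connectors",
#                      auth=auth).json()["data"]["items"]
# print(summarize_health(items))
```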

Block sensitive columns from syncing

# Modify schema to exclude PII columns
requests.patch(
    f"{FIVETRAN_BASE_URL}/connectors/{connector_id}/schemas",
    auth=auth,
    json={
        "schemas": {
            "public": {
                "tables": {
                    "users": {
                        "columns": {
                            "ssn": {"enabled": False},
                            "email": {"hashed": True},
                        }
                    }
                }
            }
        }
    },
)

Manage connectors with Terraform

resource "fivetran_connector" "salesforce" {
  group_id       = fivetran_destination.warehouse.id
  service        = "salesforce"
  sync_frequency = 60
  paused         = false

  config {
    domain     = "mycompany"
    is_sandbox = false
  }
}

resource "fivetran_connector_schema_config" "salesforce_schema" {
  connector_id         = fivetran_connector.salesforce.id
  schema_change_handling = "ALLOW_ALL"

  schema {
    name    = "salesforce"
    enabled = true

    table {
      name    = "Account"
      enabled = true
    }
    table {
      name    = "Opportunity"
      enabled = true
    }
  }
}

Integrate with Airflow for orchestration

from airflow.providers.fivetran.operators.fivetran import FivetranOperator
from airflow.providers.fivetran.sensors.fivetran import FivetranSensor

trigger_sync = FivetranOperator(
    task_id="trigger_fivetran_sync",
    fivetran_conn_id="fivetran_default",
    connector_id="{{ var.value.fivetran_connector_id }}",
)

wait_for_sync = FivetranSensor(
    task_id="wait_for_sync",
    fivetran_conn_id="fivetran_default",
    connector_id="{{ var.value.fivetran_connector_id }}",
    poke_interval=60,
)

# dbt_run is assumed to be defined elsewhere in the DAG
# (e.g., a task that runs your dbt models after the load)
trigger_sync >> wait_for_sync >> dbt_run

Anti-Patterns

  • Replicating entire databases when only a few tables are needed: Wastes MAR budget and destination storage; selectively enable only required schemas and tables
  • Ignoring connector warnings and schema change notifications: Unaddressed warnings escalate to sync failures; route alerts to Slack or PagerDuty
  • Using Fivetran transformations for complex multi-table joins: Fivetran SQL transformations are meant for lightweight post-load cleanup; use dbt for complex modeling
  • Storing Fivetran API credentials in application code: Use a secrets manager; API keys have full account access including billing and connector deletion

When to Use

  • Ingesting data from SaaS applications (Salesforce, HubSpot, Stripe) into a warehouse without building custom connectors
  • Replicating production databases (PostgreSQL, MySQL) via CDC (WAL/binlog) for analytics
  • Setting up a reliable ELT pipeline where managed infrastructure and automatic schema migration reduce maintenance
  • Auditing data freshness and sync health across dozens of data sources from a single control plane
  • Rapidly onboarding new data sources where time-to-value matters more than ingestion customization
