Fivetran
Configure and manage Fivetran connectors for automated data ingestion into warehouses.
You are an expert in Fivetran managed data integration, skilled at configuring connectors, managing sync schedules, building post-load transformations, and automating operations via the Fivetran REST API.
Core Philosophy
Managed ELT over Custom ETL
Fivetran handles extraction and loading automatically with pre-built connectors. Focus your engineering effort on the T (transformation) layer using dbt or Fivetran Transformations, not on building and maintaining custom ingestion scripts.
Schema Drift Tolerance
Fivetran automatically propagates schema changes from sources. Configure column blocking and schema change handling policies rather than assuming static schemas.
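The schema change policy can also be set through the API. A minimal sketch, assuming the `schema_change_handling` field on the `/connectors/{id}/schemas` endpoint accepts the same values as the Terraform provider (`ALLOW_ALL`, `ALLOW_COLUMNS`, `BLOCK_ALL`); verify the exact accepted values against the current REST API reference:

```python
import requests
from requests.auth import HTTPBasicAuth

# Policies as exposed by the Fivetran Terraform provider; treat as an
# assumption and check the current API docs.
SCHEMA_POLICIES = {"ALLOW_ALL", "ALLOW_COLUMNS", "BLOCK_ALL"}


def schema_policy_payload(policy: str) -> dict:
    """Build the PATCH body for /connectors/{id}/schemas, rejecting typos early."""
    if policy not in SCHEMA_POLICIES:
        raise ValueError(f"unknown schema_change_handling policy: {policy}")
    return {"schema_change_handling": policy}


def set_schema_policy(base_url: str, auth: HTTPBasicAuth, connector_id: str, policy: str) -> None:
    # Apply the policy to one connector's schema config
    resp = requests.patch(
        f"{base_url}/connectors/{connector_id}/schemas",
        auth=auth,
        json=schema_policy_payload(policy),
    )
    resp.raise_for_status()
```

`ALLOW_COLUMNS` is a common middle ground: new columns on existing tables flow through, but new tables stay blocked until explicitly enabled.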
Sync Frequency as a Cost Lever
Every sync incurs compute and MAR (Monthly Active Rows) costs. Match sync frequency to actual business freshness requirements; not every table needs 5-minute syncs.
Setup
Configure Fivetran via the dashboard or REST API. API authentication uses Basic Auth with your API key and secret:
# Store credentials
export FIVETRAN_API_KEY="your_api_key"
export FIVETRAN_API_SECRET="your_api_secret"
export FIVETRAN_BASE_URL="https://api.fivetran.com/v1"
# Test connectivity
curl -s -u "${FIVETRAN_API_KEY}:${FIVETRAN_API_SECRET}" \
"${FIVETRAN_BASE_URL}/account/info" | jq .
Terraform provider setup for infrastructure-as-code:
terraform {
required_providers {
fivetran = {
source = "fivetran/fivetran"
version = "~> 1.0"
}
}
}
provider "fivetran" {
api_key = var.fivetran_api_key
api_secret = var.fivetran_api_secret
}
Key Patterns
Do: Use the API to create and configure connectors programmatically
import os

import requests
from requests.auth import HTTPBasicAuth

# Read credentials from the environment (set in the Setup step above)
FIVETRAN_API_KEY = os.environ["FIVETRAN_API_KEY"]
FIVETRAN_API_SECRET = os.environ["FIVETRAN_API_SECRET"]
FIVETRAN_BASE_URL = os.environ["FIVETRAN_BASE_URL"]

auth = HTTPBasicAuth(FIVETRAN_API_KEY, FIVETRAN_API_SECRET)
# Create a PostgreSQL connector
response = requests.post(
f"{FIVETRAN_BASE_URL}/connectors",
auth=auth,
json={
"group_id": "your_destination_group_id",
"service": "postgres",
"config": {
"host": "db.example.com",
"port": 5432,
"database": "production",
"user": "fivetran_reader",
"password": "secure_password",
"update_method": "WAL",
},
"sync_frequency": 60,
"paused": False,
},
)
response.raise_for_status()
connector = response.json()["data"]
connector_id = connector["id"]
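Creating a connector does not guarantee it can actually reach the source. The API exposes a setup-test endpoint (`POST /connectors/{id}/test` per the Fivetran REST API reference; verify against the current version) that re-runs connectivity and permission checks. A sketch:

```python
import requests
from requests.auth import HTTPBasicAuth


def failed_setup_tests(setup_tests: list) -> list:
    """Filter the setup_tests array from the API response down to non-passing tests."""
    return [t for t in setup_tests if t.get("status") != "PASSED"]


def run_setup_tests(base_url: str, auth: HTTPBasicAuth, connector_id: str) -> list:
    # Re-run the connector's setup tests (host reachability, credentials, etc.)
    # and return only the failures
    resp = requests.post(f"{base_url}/connectors/{connector_id}/test", auth=auth)
    resp.raise_for_status()
    return failed_setup_tests(resp.json()["data"].get("setup_tests", []))
```

Failing this check right after creation is much cheaper to debug than a broken first sync hours later.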
Do Not: Blindly apply a 5-minute sync frequency to every connector
# BAD - unnecessary cost and load on source system
{"sync_frequency": 5}
# GOOD - match frequency to business need
# Transactional data: 15-60 min
{"sync_frequency": 15, "service": "postgres"}
# Slowly changing reference data: 360-1440 min
{"sync_frequency": 1440, "service": "google_sheets"}
Do: Configure webhook notifications for sync events
# Create a webhook for sync completion events
requests.post(
f"{FIVETRAN_BASE_URL}/webhooks/account",
auth=auth,
json={
"url": "https://your-app.com/fivetran/webhook",
"events": [
"sync_end",
"connector_warning",
"connector_failure",
],
"active": True,
"secret": "webhook_signing_secret",
},
)
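Webhook payloads should be verified before acting on them. Fivetran signs deliveries with an HMAC-SHA256 of the raw request body using the webhook `secret`, delivered in the `X-Fivetran-Signature-256` header; the exact header name and digest casing are stated here as assumptions, so check the webhooks documentation. A sketch:

```python
import hashlib
import hmac


def verify_fivetran_signature(payload: bytes, secret: str, signature: str) -> bool:
    """Compare the HMAC-SHA256 hex digest of the raw request body against the
    signature header. Compared case-insensitively since the header's digest
    casing may differ from hexdigest() output (an assumption)."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking a match prefix through timing
    return hmac.compare_digest(expected.lower(), signature.lower())
```

Always verify against the raw bytes of the body, not a re-serialized JSON object, since key order and whitespace changes would alter the digest.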
Common Patterns
Trigger sync and wait for completion
import time

def trigger_and_wait(connector_id: str, timeout: int = 3600) -> dict:
    # Force an immediate sync
    requests.post(
        f"{FIVETRAN_BASE_URL}/connectors/{connector_id}/force",
        auth=auth,
    ).raise_for_status()
    # Give the sync a moment to leave the "scheduled" state; otherwise the
    # first poll can observe the pre-sync state and return early
    time.sleep(15)
    start = time.time()
    while time.time() - start < timeout:
        resp = requests.get(
            f"{FIVETRAN_BASE_URL}/connectors/{connector_id}",
            auth=auth,
        )
        resp.raise_for_status()
        data = resp.json()["data"]
        # sync_state returns to "scheduled" once the sync finishes
        if data["status"]["sync_state"] == "scheduled":
            return data
        time.sleep(30)
    raise TimeoutError(f"Sync did not complete within {timeout}s")
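A sync that returns to `scheduled` has finished, but not necessarily succeeded. The connector details also carry `succeeded_at` and `failed_at` timestamps (field names per my reading of the connector details response; treat as an assumption) that distinguish the outcomes. A sketch:

```python
from datetime import datetime


def last_sync_succeeded(connector: dict) -> bool:
    """True if the most recent completed sync succeeded, judged by comparing
    the succeeded_at and failed_at timestamps from the connector details.
    Assumes at least one sync has completed."""
    def parse(ts):
        if not ts:
            return datetime.min
        # Timestamps arrive as ISO-8601 with a trailing Z; normalize to naive UTC
        return datetime.fromisoformat(ts.replace("Z", "+00:00")).replace(tzinfo=None)

    return parse(connector.get("succeeded_at")) >= parse(connector.get("failed_at"))
```

Combining this with `trigger_and_wait` above turns "the sync ended" into "the sync ended and the data landed".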
Block sensitive columns from syncing
# Modify schema to exclude PII columns
requests.patch(
f"{FIVETRAN_BASE_URL}/connectors/{connector_id}/schemas",
auth=auth,
json={
"schemas": {
"public": {
"tables": {
"users": {
"columns": {
"ssn": {"enabled": False},
"email": {"hashed": True},
}
}
}
}
}
},
)
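After patching, it is worth reading the schema config back to confirm the block took effect (`GET /connectors/{id}/schemas`). A sketch that walks the response tree and lists every disabled column, assuming the same nested `schemas → tables → columns` shape used in the PATCH above:

```python
import requests
from requests.auth import HTTPBasicAuth


def disabled_columns(schemas: dict) -> list:
    """Walk the schemas tree from the API response and collect columns with
    enabled == False, as schema.table.column paths."""
    found = []
    for schema_name, schema in schemas.items():
        for table_name, table in schema.get("tables", {}).items():
            for column_name, column in table.get("columns", {}).items():
                if column.get("enabled", True) is False:
                    found.append(f"{schema_name}.{table_name}.{column_name}")
    return found


def audit_blocked_columns(base_url: str, auth: HTTPBasicAuth, connector_id: str) -> list:
    # Fetch the live schema config and report which columns are excluded
    resp = requests.get(f"{base_url}/connectors/{connector_id}/schemas", auth=auth)
    resp.raise_for_status()
    return disabled_columns(resp.json()["data"]["schemas"])
```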
Manage connectors with Terraform
resource "fivetran_connector" "salesforce" {
group_id = fivetran_destination.warehouse.id
service = "salesforce"
sync_frequency = 60
paused = false
config {
domain = "mycompany"
is_sandbox = false
}
}
resource "fivetran_connector_schema_config" "salesforce_schema" {
connector_id = fivetran_connector.salesforce.id
schema_change_handling = "ALLOW_ALL"
schema {
name = "salesforce"
enabled = true
table {
name = "Account"
enabled = true
}
table {
name = "Opportunity"
enabled = true
}
}
}
Integrate with Airflow for orchestration
from airflow.providers.fivetran.operators.fivetran import FivetranOperator
from airflow.providers.fivetran.sensors.fivetran import FivetranSensor
trigger_sync = FivetranOperator(
task_id="trigger_fivetran_sync",
fivetran_conn_id="fivetran_default",
connector_id="{{ var.value.fivetran_connector_id }}",
)
wait_for_sync = FivetranSensor(
task_id="wait_for_sync",
fivetran_conn_id="fivetran_default",
connector_id="{{ var.value.fivetran_connector_id }}",
poke_interval=60,
)
# dbt_run is a downstream transformation task assumed to be defined elsewhere in the DAG
trigger_sync >> wait_for_sync >> dbt_run
Anti-Patterns
- Replicating entire databases when only a few tables are needed: Wastes MAR budget and destination storage; selectively enable only required schemas and tables
- Ignoring connector warnings and schema change notifications: Unaddressed warnings escalate to sync failures; route alerts to Slack or PagerDuty
- Using Fivetran transformations for complex multi-table joins: Fivetran SQL transformations are meant for lightweight post-load cleanup; use dbt for complex modeling
- Storing Fivetran API credentials in application code: Use a secrets manager; API keys have full account access including billing and connector deletion
When to Use
- Ingesting data from SaaS applications (Salesforce, HubSpot, Stripe) into a warehouse without building custom connectors
- Replicating production databases (PostgreSQL, MySQL) via CDC (WAL/binlog) for analytics
- Setting up a reliable ELT pipeline where managed infrastructure and automatic schema migration reduce maintenance
- Auditing data freshness and sync health across dozens of data sources from a single control plane
- Rapidly onboarding new data sources where time-to-value matters more than ingestion customization