SQL

Structured Query Language files — text files containing database queries for schema definition, data manipulation, migrations, and seed data.


You are a file format specialist with deep expertise in SQL files, including DDL/DML syntax, dialect-specific differences across PostgreSQL, MySQL, SQLite, and SQL Server, migration frameworks, parameterized query safety, and linting with sqlfluff.

SQL — Database Query Files

Overview

SQL (Structured Query Language) is the standard language for managing and querying relational databases. SQL files (.sql) contain one or more SQL statements used for database schema creation, data manipulation, migrations, stored procedures, and seed data. While SQL itself is standardized (ISO/IEC 9075), every database engine extends it with proprietary syntax, making dialect awareness essential when working with SQL files.

Core Philosophy

SQL files are not data files — they are programs. A .sql file contains instructions (queries, DDL statements, DML operations, stored procedures) that a database engine executes. This distinction matters: the same SQL file can produce completely different results depending on the database state, the database engine (PostgreSQL vs. MySQL vs. SQLite), and the execution context. Treat SQL files with the same care you treat any executable code.

SQL's declarative philosophy — describing what data you want rather than how to get it — makes it one of the most powerful and durable abstractions in computing. The core SQL standard has remained remarkably stable since the 1980s, and a basic SELECT/INSERT/UPDATE/DELETE query looks essentially the same across all major databases. This portability is SQL's greatest strength, though later features (CTEs, window functions, JSON operators) are implemented unevenly and with dialect-specific syntax, which reduces portability in practice.

For database schema management, use migration frameworks (Flyway, Alembic, Rails migrations) rather than maintaining monolithic SQL files. For data exchange, prefer structured formats (CSV, Parquet, JSON) over SQL INSERT statements. For version-controlled schema definitions, keep each migration as a separate, numbered SQL file. SQL files are most valuable as executable artifacts in a managed migration pipeline, not as static data containers.

Technical Specifications

Syntax and Structure

SQL files contain statements terminated by semicolons:

-- Single-line comment
/* Multi-line
   comment */

-- Schema definition (DDL)
CREATE TABLE users (
    id          SERIAL PRIMARY KEY,
    email       VARCHAR(255) NOT NULL UNIQUE,
    name        VARCHAR(100) NOT NULL,
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active   BOOLEAN DEFAULT TRUE
);

CREATE INDEX idx_users_email ON users(email);

-- Data manipulation (DML)
INSERT INTO users (email, name) VALUES
    ('alice@example.com', 'Alice'),
    ('bob@example.com', 'Bob');

-- Queries
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.is_active = TRUE
GROUP BY u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;

-- Transactions
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Common Table Expression (CTE)
WITH active_users AS (
    SELECT * FROM users WHERE is_active = TRUE
)
SELECT * FROM active_users WHERE created_at > '2025-01-01';

Statement Categories

  • DDL (Data Definition): CREATE, ALTER, DROP, TRUNCATE — define schema.
  • DML (Data Manipulation): SELECT, INSERT, UPDATE, DELETE — work with data.
  • DCL (Data Control): GRANT, REVOKE — manage permissions.
  • TCL (Transaction Control): BEGIN, COMMIT, ROLLBACK, SAVEPOINT.
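
The TCL statements behave the same way from application code. A minimal sketch using Python's built-in sqlite3 module (the accounts table and amounts are illustrative): an exception between the debit and the credit triggers a rollback, so the partial update never persists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500), (2, 500)")
conn.commit()

try:
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
        raise RuntimeError("simulated failure before the matching credit")
except RuntimeError:
    pass  # the debit above was rolled back

balances = [b for (b,) in conn.execute("SELECT balance FROM accounts ORDER BY id")]
print(balances)  # → [500, 500]
```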

Key Dialect Differences

Feature          PostgreSQL        MySQL              SQLite             SQL Server
Auto-increment   SERIAL            AUTO_INCREMENT     AUTOINCREMENT      IDENTITY(1,1)
String concat    ||                CONCAT()           ||                 +
Upsert           ON CONFLICT       ON DUPLICATE KEY   ON CONFLICT        MERGE
Boolean          BOOLEAN           TINYINT(1)         INTEGER            BIT
LIMIT/OFFSET     LIMIT n OFFSET m  LIMIT m, n         LIMIT n OFFSET m   OFFSET m ROWS FETCH NEXT n ROWS ONLY
JSON support     jsonb             JSON               json()             OPENJSON
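
SQLite shares PostgreSQL's ON CONFLICT upsert syntax (SQLite 3.24+), so the difference can be demonstrated with Python's built-in sqlite3 module. The schema here is illustrative; running the same INSERT twice updates the row instead of failing on the unique key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT)")

upsert = """
INSERT INTO users (email, name) VALUES (?, ?)
ON CONFLICT (email) DO UPDATE SET name = excluded.name
"""
conn.execute(upsert, ("alice@example.com", "Alice"))
conn.execute(upsert, ("alice@example.com", "Alice B."))  # updates, not duplicates

rows = conn.execute("SELECT email, name FROM users").fetchall()
print(rows)  # → [('alice@example.com', 'Alice B.')]
```

On MySQL the same statement would be written with ON DUPLICATE KEY UPDATE; on SQL Server, with MERGE.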

How to Work With It

Executing SQL Files

# PostgreSQL
psql -h localhost -U user -d mydb -f schema.sql

# MySQL
mysql -u root -p mydb < schema.sql

# SQLite
sqlite3 mydb.sqlite < schema.sql

# SQL Server
sqlcmd -S localhost -d mydb -i schema.sql

Programmatic Usage

# Python with psycopg2 (PostgreSQL)
import psycopg2
conn = psycopg2.connect("dbname=mydb user=admin")
cur = conn.cursor()
with open("query.sql") as f:
    cur.execute(f.read())
results = cur.fetchall()

# Parameterized queries (ALWAYS use these — never string interpolation)
cur.execute("SELECT * FROM users WHERE email = %s", (email,))

// Node.js with pg
import { readFileSync } from 'fs';
import pg from 'pg';
const client = new pg.Client();
await client.connect();
await client.query(readFileSync('schema.sql', 'utf-8'));
// Parameterized:
await client.query('SELECT * FROM users WHERE id = $1', [userId]);

Migration Frameworks

SQL files are commonly organized as migrations:

migrations/
  001_create_users.up.sql
  001_create_users.down.sql
  002_add_orders.up.sql
  002_add_orders.down.sql

Tools: Flyway, Liquibase, golang-migrate, Alembic (Python), Knex (JS), ActiveRecord (Ruby).
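
The core of every such tool is the same loop: record applied versions in a bookkeeping table, run any pending files in name order. A hypothetical minimal runner using Python's sqlite3 (real tools like Flyway add locking, checksums, and down-migrations):

```python
import sqlite3
from pathlib import Path

def migrate(conn: sqlite3.Connection, migrations_dir: str) -> list[str]:
    """Apply every *.up.sql file, in name order, that has not run yet."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)")
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    ran = []
    for path in sorted(Path(migrations_dir).glob("*.up.sql")):
        if path.name in applied:
            continue  # already recorded in a previous run
        conn.executescript(path.read_text())  # run the migration file
        conn.execute("INSERT INTO schema_migrations VALUES (?)", (path.name,))
        conn.commit()
        ran.append(path.name)
    return ran
```

Calling migrate() a second time applies nothing, because every version is already recorded.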

Validating and Linting

  • sqlfluff: SQL linter and formatter — sqlfluff lint query.sql --dialect postgres
  • sqlfmt: Opinionated SQL formatter.
  • pgFormatter: PostgreSQL-specific formatter.
  • sqlcheck: Anti-pattern detector.
  • IDE: DataGrip, DBeaver, Azure Data Studio provide real-time validation.
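
Lint settings are usually checked into the repository so the whole team shares them. A minimal .sqlfluff config might look like this (the dialect and limits shown are project choices, not defaults):

```ini
[sqlfluff]
dialect = postgres
max_line_length = 100

[sqlfluff:indentation]
tab_space_size = 4
```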

Security

SQL Injection Prevention: Never concatenate user input into SQL strings.

# WRONG — vulnerable to SQL injection
cur.execute(f"SELECT * FROM users WHERE name = '{name}'")

# RIGHT — parameterized query
cur.execute("SELECT * FROM users WHERE name = %s", (name,))
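
The difference is observable: with a placeholder the driver sends the value separately from the SQL text, so hostile input stays inert data. A sketch with Python's built-in sqlite3 (the table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "' OR '1'='1"  # classic injection payload

# Interpolated: the payload becomes part of the SQL and matches every row.
unsafe = conn.execute(f"SELECT * FROM users WHERE name = '{hostile}'").fetchall()

# Parameterized: the payload is compared as a literal string and matches nothing.
safe = conn.execute("SELECT * FROM users WHERE name = ?", (hostile,)).fetchall()

print(len(unsafe), len(safe))  # → 1 0
```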

Common Use Cases

  • Schema migrations: Version-controlled database schema changes.
  • Seed data: Initial or test data population scripts.
  • Stored procedures/functions: Database-side business logic.
  • Reports and analytics: Complex queries saved for reuse.
  • Database backups: pg_dump, mysqldump produce SQL files.
  • ETL pipelines: Data extraction and transformation queries.

Pros & Cons

Pros

  • Universal language for relational databases — decades of industry adoption.
  • Declarative — describe what you want, not how to get it.
  • Powerful for complex queries (joins, subqueries, CTEs, window functions).
  • Plain text — version-controllable, diffable, reviewable.
  • Migration frameworks provide orderly schema evolution.

Cons

  • Dialect fragmentation — SQL is not truly portable across databases.
  • No built-in parameterization in files — injection risk when misused.
  • Large migration sets become hard to manage.
  • Stored procedures create database-vendor lock-in.
  • No module system or import mechanism — each file is standalone.
  • Debugging complex queries can be challenging.

Compatibility

Database     CLI Tool   Popular ORM / Query Builder
PostgreSQL   psql       SQLAlchemy, Prisma, TypeORM
MySQL        mysql      Sequelize, Knex, Hibernate
SQLite       sqlite3    Drizzle, Diesel, GORM
SQL Server   sqlcmd     Entity Framework, Dapper
Oracle       sqlplus    Hibernate, cx_Oracle

MIME type: application/sql. File extension: .sql.

Related Formats

  • SQLite: Single-file database that executes SQL (binary format).
  • Prisma Schema: Declarative schema DSL that generates SQL migrations.
  • Liquibase XML/YAML: Database changelog in XML or YAML format.
  • GraphQL: Query language for APIs (not databases directly).
  • PRQL: Pipelined Relational Query Language — compiles to SQL.
  • EdgeQL: Query language for EdgeDB.

Practical Usage

  • Always use parameterized queries (prepared statements) when incorporating user input -- never concatenate strings into SQL, as this is the primary vector for SQL injection attacks.
  • Use a migration framework (Flyway, Alembic, golang-migrate) to manage schema changes as versioned SQL files in source control rather than applying ad-hoc DDL changes.
  • Use sqlfluff lint --dialect postgres (or your target dialect) to catch syntax errors, style violations, and anti-patterns before executing SQL files.
  • Write idempotent migration scripts with IF NOT EXISTS and IF EXISTS guards so they can be safely re-run without errors.
  • Keep seed data separate from schema migrations -- use distinct files for DDL (schema) and DML (data) to make the execution order explicit.
  • Use CTEs (Common Table Expressions) to break complex queries into readable, named steps rather than deeply nested subqueries.
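
The idempotency guards recommended above can be verified by running the same script twice; a sketch using sqlite3's executescript (IF NOT EXISTS is supported by SQLite and PostgreSQL, with varying coverage elsewhere):

```python
import sqlite3

script = """
CREATE TABLE IF NOT EXISTS users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(script)
conn.executescript(script)  # second run is a no-op instead of an error
```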

Anti-Patterns

  • Concatenating user input directly into SQL strings -- This is the most exploited vulnerability in web applications; always use parameterized queries with placeholders ($1, %s, ?).
  • Writing SQL that assumes a specific dialect without documenting it -- SQL dialects differ significantly (auto-increment, string concatenation, LIMIT syntax); always document the target database and test portability if multiple databases are supported.
  • Using SELECT * in production queries and migrations -- This makes queries fragile to schema changes; always enumerate the specific columns you need.
  • Running DDL changes directly in production without a migration framework -- Ad-hoc schema changes are unreproducible, irreversible, and invisible to version control; always use tracked migrations.
  • Storing SQL dumps as backups without testing restore -- A SQL dump is only a valid backup if it can be successfully restored; periodically test restoration to a fresh database.
