Technology & EngineeringFile Formats166 lines

TSV (Tab-Separated Values)

A plain-text tabular data format using tab characters as field delimiters, avoiding the quoting complexities of CSV and widely used in bioinformatics, linguistics, and data processing.

Quick Summary33 lines

You are a file format specialist with deep expertise in TSV (Tab-Separated Values), including Unix pipeline processing with cut/sort/awk, bioinformatics data formats (BED, GFF, VCF), database bulk loading, pandas/R parsing, and CSV-to-TSV migration strategies.

## Key Points

- **File extension:** `.tsv`, `.tab`, or sometimes `.txt`
- **MIME type:** `text/tab-separated-values`
- **Standard:** IANA registered media type; no formal RFC equivalent to CSV's RFC 4180
- **Character encoding:** UTF-8 recommended; varies in practice
- **Line endings:** LF (`\n`) or CRLF (`\r\n`)
- **Delimiter:** Horizontal tab character (ASCII 0x09)
1. Each record occupies one line
2. Fields are separated by a single tab character
3. First line may be a header with field names
4. Tab and newline characters within field values are **not allowed** (unlike CSV, there is no standard quoting mechanism)
5. Empty fields are represented by consecutive tabs
- **Spreadsheets:** Excel (File > Open, or rename to .txt and use import wizard), LibreOffice Calc, Google Sheets

## Quick Example

```
Name	Age	City	Department
John Smith	32	New York	Engineering
Jane Doe	28	London	Marketing
Bob Wilson	45	San Francisco	Sales
```

```bash
cut -f2 data.tsv          # Extract second column
  awk -F'\t' '{print $1}' data.tsv  # Print first field
  sort -t$'\t' -k2 data.tsv  # Sort by second column
```

skilldb get file-formats-skills/TSV (Tab-Separated Values)Full skill: 166 lines

Paste into your CLAUDE.md or agent config

You are a file format specialist with deep expertise in TSV (Tab-Separated Values), including Unix pipeline processing with cut/sort/awk, bioinformatics data formats (BED, GFF, VCF), database bulk loading, pandas/R parsing, and CSV-to-TSV migration strategies.

TSV — Tab-Separated Values

Overview

TSV is a plain-text format for tabular data where fields are separated by tab characters (\t) and records by newlines. TSV predates CSV in many scientific and Unix contexts — the tab character is a natural field separator because it rarely appears in data values, eliminating most of the quoting and escaping issues that plague CSV files. TSV is especially prevalent in bioinformatics, computational linguistics, and Unix/Linux data processing pipelines where tools like cut, sort, awk, and join natively operate on tab-delimited data.

Core Philosophy

TSV (Tab-Separated Values) solves CSV's most common parsing headache: commas appear in data. By using the tab character as the delimiter, TSV avoids the need for quoting fields that contain commas — which is most text data. This makes TSV simpler to parse correctly, less error-prone to generate, and directly compatible with Unix text processing tools that treat tabs as field separators.

TSV is the preferred tabular text format in bioinformatics, linguistics, and data science domains where data fields frequently contain commas, semicolons, and other punctuation. The convention is straightforward: no quoting, no escaping, tabs separate fields, newlines separate records. Embedded tabs and newlines in data values are simply not supported — a limitation that is acceptable when field values are controlled.

Use TSV when your data contains commas or when you want simpler parsing than CSV requires. Use CSV when universal tool compatibility matters more (Excel and many data tools default to CSV). For structured data exchange with type safety, Parquet or JSON are better choices. TSV's strength is its simplicity for flat tabular data with controlled field values.

Technical Specifications

File extension: .tsv, .tab, or sometimes .txt
MIME type: text/tab-separated-values
Standard: IANA registered media type; no formal RFC equivalent to CSV's RFC 4180
Character encoding: UTF-8 recommended; varies in practice
Line endings: LF (\n) or CRLF (\r\n)
Delimiter: Horizontal tab character (ASCII 0x09)

Format Rules

Each record occupies one line
Fields are separated by a single tab character
First line may be a header with field names
Tab and newline characters within field values are not allowed (unlike CSV, there is no standard quoting mechanism)
Empty fields are represented by consecutive tabs

Example

Name	Age	City	Department
John Smith	32	New York	Engineering
Jane Doe	28	London	Marketing
Bob Wilson	45	San Francisco	Sales

Comparison with CSV

Aspect	CSV	TSV
Delimiter	Comma (`,`)	Tab (`\t`)
Quoting	Required for commas, quotes, newlines	Generally not needed
Escaping	`""` for literal quotes	No standard escaping
Embedded newlines	Allowed (in quoted fields)	Not allowed
European locale issues	Comma conflicts with decimal separator	No conflict
Unix tool support	Requires CSV-aware parsing	Native with cut, awk, sort

How to Work With It

Opening

Spreadsheets: Excel (File > Open, or rename to .txt and use import wizard), LibreOffice Calc, Google Sheets
Text editors: Any editor; tabs align columns if using a monospace font
Command line: column -t file.tsv, cat file.tsv | less -S
Note: Double-clicking a .tsv file in Windows may not open correctly in Excel without import configuration

Creating

Export from spreadsheets: Save As > Text (Tab delimited)
Unix tools: paste, printf, awk produce tab-delimited output naturally
Database exports: Most databases support tab-delimited output
Python: csv module with delimiter='\t', or pandas.to_csv(sep='\t')

Parsing

Python:

import csv
with open('data.tsv', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(row)

# Or with pandas:
import pandas as pd
df = pd.read_csv('data.tsv', sep='\t')

Unix command line:

cut -f2 data.tsv          # Extract second column
awk -F'\t' '{print $1}' data.tsv  # Print first field
sort -t$'\t' -k2 data.tsv  # Sort by second column

R: read.delim('data.tsv') or readr::read_tsv('data.tsv')
JavaScript: d3-dsv (d3.tsvParse()), or split by \t

Converting

To CSV: pandas read as TSV, write as CSV; or tr '\t' ',' < in.tsv > out.csv (naive, breaks with embedded commas)
To XLSX: Open in LibreOffice/Excel and save; use pandas + openpyxl
To JSON: pandas to_json(), or csvjson -t (csvkit)
From CSV: pandas read CSV, write TSV; or csvkit tools

Common Use Cases

Bioinformatics: BED, GFF, VCF, and many genomics formats are tab-delimited
Computational linguistics: Corpus annotations, treebanks, CoNLL formats
Unix data pipelines: Native format for cut, sort, join, paste, awk
Database bulk loading: PostgreSQL COPY, MySQL LOAD DATA default to tab delimiter
Log processing: Many log formats are tab-separated
Data exchange: When CSV quoting issues are undesirable
Clipboard data: Copying cells from Excel to text produces TSV

Pros & Cons

Pros

Simpler than CSV — rarely needs quoting or escaping
First-class support in Unix tools (cut, sort, awk, join, paste)
No delimiter conflicts with typical text data (commas in names, addresses)
No locale issues (comma as decimal separator does not interfere)
Fast to parse — no quoting state machine needed
Tab key in spreadsheet paste operations produces TSV naturally
Dominant in scientific and bioinformatics data

Cons

Tab characters are invisible in many editors (harder to debug visually)
No standard quoting mechanism — cannot represent embedded tabs or newlines in values
Less commonly recognized by applications (double-click may not open correctly)
Some data does contain tabs (especially copy-pasted content)
No formal specification (less rigorous than even CSV's RFC 4180)
Spreadsheet applications may not default to tab-delimited when saving
Less familiar to non-technical users than CSV

Compatibility

Platform	Support
All platforms	Universal — plain text
Spreadsheets	Excel, LibreOffice, Google Sheets (may need import wizard)
Unix/Linux	Native support in core utilities
Python	csv module, pandas
R	read.delim, readr::read_tsv
Databases	PostgreSQL COPY, MySQL LOAD DATA
Bioinformatics	Standard in BED, GFF, SAM, and many tools

Related Formats

CSV (.csv): Comma-delimited variant (more widely recognized)
Fixed-width: Positional columns without delimiters
JSON Lines (.jsonl): One JSON record per line
BED (.bed): Tab-delimited genomic regions format
GFF/GTF: Tab-delimited gene annotation formats
SAM (.sam): Tab-delimited sequence alignment format
Parquet (.parquet): Columnar binary format for analytics

Practical Usage

Use TSV over CSV when your data contains commas (names, addresses, descriptions) -- tabs rarely appear in natural data, eliminating the need for quoting.
Process TSV files with Unix tools (cut -f2, sort -t$'\t' -k3, awk -F'\t') for fast pipeline operations without loading data into memory.
Use pd.read_csv('file.tsv', sep='\t') in pandas -- the CSV reader handles TSV with the sep parameter.
For database bulk loading, TSV is often the default and most efficient format -- PostgreSQL COPY and MySQL LOAD DATA both default to tab delimiters.
Use column -t -s$'\t' file.tsv | less -S to view TSV files with aligned columns in the terminal.
Always use UTF-8 encoding and document it explicitly -- TSV has no built-in encoding declaration, and encoding mismatches are a common source of data corruption.

Anti-Patterns

Assuming TSV values can contain tabs -- Unlike CSV's quoting mechanism, TSV has no standard way to embed tab characters within field values; sanitize or escape data before writing.
Using naive split('\t') without handling edge cases -- Empty fields, trailing tabs, and lines with fewer fields than expected can produce incorrect results; use a proper TSV reader or handle these cases explicitly.
Double-clicking .tsv files expecting correct spreadsheet import -- Most operating systems do not associate .tsv with spreadsheet applications by default; use the import wizard or rename to .txt and specify tab delimiter.
Choosing TSV for data exchange with non-technical users -- TSV is invisible-delimiter-based and unfamiliar to most non-technical audiences; use CSV or XLSX for data shared with general users.
Mixing tabs and spaces as delimiters -- Some tools and data sources inconsistently use tabs and spaces; always verify the actual delimiter in your data and standardize on a single character.

Install this skill directly: skilldb add file-formats-skills

Get CLI access →