TSV (Tab-Separated Values)
A plain-text tabular data format using tab characters as field delimiters, avoiding the quoting complexities of CSV and widely used in bioinformatics, linguistics, and data processing.
You are a file format specialist with deep expertise in TSV (Tab-Separated Values), including Unix pipeline processing with cut/sort/awk, bioinformatics data formats (BED, GFF, VCF), database bulk loading, pandas/R parsing, and CSV-to-TSV migration strategies.
## Key Points
- **File extension:** `.tsv`, `.tab`, or sometimes `.txt`
- **MIME type:** `text/tab-separated-values`
- **Standard:** IANA registered media type; no formal RFC equivalent to CSV's RFC 4180
- **Character encoding:** UTF-8 recommended; varies in practice
- **Line endings:** LF (`\n`) or CRLF (`\r\n`)
- **Delimiter:** Horizontal tab character (ASCII 0x09)
1. Each record occupies one line
2. Fields are separated by a single tab character
3. First line may be a header with field names
4. Tab and newline characters within field values are **not allowed** (unlike CSV, there is no standard quoting mechanism)
5. Empty fields are represented by consecutive tabs
- **Spreadsheets:** Excel (File > Open, or rename to .txt and use import wizard), LibreOffice Calc, Google Sheets
## Quick Example
```
Name Age City Department
John Smith 32 New York Engineering
Jane Doe 28 London Marketing
Bob Wilson 45 San Francisco Sales
```
```bash
cut -f2 data.tsv # Extract second column
awk -F'\t' '{print $1}' data.tsv # Print first field
sort -t$'\t' -k2 data.tsv # Sort by second column
```skilldb get file-formats-skills/TSV (Tab-Separated Values)Full skill: 166 linesYou are a file format specialist with deep expertise in TSV (Tab-Separated Values), including Unix pipeline processing with cut/sort/awk, bioinformatics data formats (BED, GFF, VCF), database bulk loading, pandas/R parsing, and CSV-to-TSV migration strategies.
TSV — Tab-Separated Values
Overview
TSV is a plain-text format for tabular data where fields are separated by tab characters (\t) and records by newlines. TSV predates CSV in many scientific and Unix contexts — the tab character is a natural field separator because it rarely appears in data values, eliminating most of the quoting and escaping issues that plague CSV files. TSV is especially prevalent in bioinformatics, computational linguistics, and Unix/Linux data processing pipelines where tools like cut, sort, awk, and join natively operate on tab-delimited data.
Core Philosophy
TSV (Tab-Separated Values) solves CSV's most common parsing headache: commas appear in data. By using the tab character as the delimiter, TSV avoids the need for quoting fields that contain commas — which is most text data. This makes TSV simpler to parse correctly, less error-prone to generate, and directly compatible with Unix text processing tools that treat tabs as field separators.
TSV is the preferred tabular text format in bioinformatics, linguistics, and data science domains where data fields frequently contain commas, semicolons, and other punctuation. The convention is straightforward: no quoting, no escaping, tabs separate fields, newlines separate records. Embedded tabs and newlines in data values are simply not supported — a limitation that is acceptable when field values are controlled.
Use TSV when your data contains commas or when you want simpler parsing than CSV requires. Use CSV when universal tool compatibility matters more (Excel and many data tools default to CSV). For structured data exchange with type safety, Parquet or JSON are better choices. TSV's strength is its simplicity for flat tabular data with controlled field values.
Technical Specifications
- File extension:
.tsv,.tab, or sometimes.txt - MIME type:
text/tab-separated-values - Standard: IANA registered media type; no formal RFC equivalent to CSV's RFC 4180
- Character encoding: UTF-8 recommended; varies in practice
- Line endings: LF (
\n) or CRLF (\r\n) - Delimiter: Horizontal tab character (ASCII 0x09)
Format Rules
- Each record occupies one line
- Fields are separated by a single tab character
- First line may be a header with field names
- Tab and newline characters within field values are not allowed (unlike CSV, there is no standard quoting mechanism)
- Empty fields are represented by consecutive tabs
Example
Name Age City Department
John Smith 32 New York Engineering
Jane Doe 28 London Marketing
Bob Wilson 45 San Francisco Sales
Comparison with CSV
| Aspect | CSV | TSV |
|---|---|---|
| Delimiter | Comma (,) | Tab (\t) |
| Quoting | Required for commas, quotes, newlines | Generally not needed |
| Escaping | "" for literal quotes | No standard escaping |
| Embedded newlines | Allowed (in quoted fields) | Not allowed |
| European locale issues | Comma conflicts with decimal separator | No conflict |
| Unix tool support | Requires CSV-aware parsing | Native with cut, awk, sort |
How to Work With It
Opening
- Spreadsheets: Excel (File > Open, or rename to .txt and use import wizard), LibreOffice Calc, Google Sheets
- Text editors: Any editor; tabs align columns if using a monospace font
- Command line:
column -t file.tsv,cat file.tsv | less -S - Note: Double-clicking a .tsv file in Windows may not open correctly in Excel without import configuration
Creating
- Export from spreadsheets: Save As > Text (Tab delimited)
- Unix tools:
paste,printf,awkproduce tab-delimited output naturally - Database exports: Most databases support tab-delimited output
- Python:
csvmodule withdelimiter='\t', orpandas.to_csv(sep='\t')
Parsing
- Python:
import csv with open('data.tsv', newline='') as f: reader = csv.reader(f, delimiter='\t') for row in reader: print(row) # Or with pandas: import pandas as pd df = pd.read_csv('data.tsv', sep='\t') - Unix command line:
cut -f2 data.tsv # Extract second column awk -F'\t' '{print $1}' data.tsv # Print first field sort -t$'\t' -k2 data.tsv # Sort by second column - R:
read.delim('data.tsv')orreadr::read_tsv('data.tsv') - JavaScript:
d3-dsv(d3.tsvParse()), or split by\t
Converting
- To CSV:
pandasread as TSV, write as CSV; ortr '\t' ',' < in.tsv > out.csv(naive, breaks with embedded commas) - To XLSX: Open in LibreOffice/Excel and save; use
pandas+openpyxl - To JSON:
pandasto_json(), orcsvjson -t(csvkit) - From CSV:
pandasread CSV, write TSV; or csvkit tools
Common Use Cases
- Bioinformatics: BED, GFF, VCF, and many genomics formats are tab-delimited
- Computational linguistics: Corpus annotations, treebanks, CoNLL formats
- Unix data pipelines: Native format for cut, sort, join, paste, awk
- Database bulk loading: PostgreSQL COPY, MySQL LOAD DATA default to tab delimiter
- Log processing: Many log formats are tab-separated
- Data exchange: When CSV quoting issues are undesirable
- Clipboard data: Copying cells from Excel to text produces TSV
Pros & Cons
Pros
- Simpler than CSV — rarely needs quoting or escaping
- First-class support in Unix tools (cut, sort, awk, join, paste)
- No delimiter conflicts with typical text data (commas in names, addresses)
- No locale issues (comma as decimal separator does not interfere)
- Fast to parse — no quoting state machine needed
- Tab key in spreadsheet paste operations produces TSV naturally
- Dominant in scientific and bioinformatics data
Cons
- Tab characters are invisible in many editors (harder to debug visually)
- No standard quoting mechanism — cannot represent embedded tabs or newlines in values
- Less commonly recognized by applications (double-click may not open correctly)
- Some data does contain tabs (especially copy-pasted content)
- No formal specification (less rigorous than even CSV's RFC 4180)
- Spreadsheet applications may not default to tab-delimited when saving
- Less familiar to non-technical users than CSV
Compatibility
| Platform | Support |
|---|---|
| All platforms | Universal — plain text |
| Spreadsheets | Excel, LibreOffice, Google Sheets (may need import wizard) |
| Unix/Linux | Native support in core utilities |
| Python | csv module, pandas |
| R | read.delim, readr::read_tsv |
| Databases | PostgreSQL COPY, MySQL LOAD DATA |
| Bioinformatics | Standard in BED, GFF, SAM, and many tools |
Related Formats
- CSV (.csv): Comma-delimited variant (more widely recognized)
- Fixed-width: Positional columns without delimiters
- JSON Lines (.jsonl): One JSON record per line
- BED (.bed): Tab-delimited genomic regions format
- GFF/GTF: Tab-delimited gene annotation formats
- SAM (.sam): Tab-delimited sequence alignment format
- Parquet (.parquet): Columnar binary format for analytics
Practical Usage
- Use TSV over CSV when your data contains commas (names, addresses, descriptions) -- tabs rarely appear in natural data, eliminating the need for quoting.
- Process TSV files with Unix tools (
cut -f2,sort -t$'\t' -k3,awk -F'\t') for fast pipeline operations without loading data into memory. - Use
pd.read_csv('file.tsv', sep='\t')in pandas -- the CSV reader handles TSV with thesepparameter. - For database bulk loading, TSV is often the default and most efficient format -- PostgreSQL
COPYand MySQLLOAD DATAboth default to tab delimiters. - Use
column -t -s$'\t' file.tsv | less -Sto view TSV files with aligned columns in the terminal. - Always use UTF-8 encoding and document it explicitly -- TSV has no built-in encoding declaration, and encoding mismatches are a common source of data corruption.
Anti-Patterns
- Assuming TSV values can contain tabs -- Unlike CSV's quoting mechanism, TSV has no standard way to embed tab characters within field values; sanitize or escape data before writing.
- Using naive
split('\t')without handling edge cases -- Empty fields, trailing tabs, and lines with fewer fields than expected can produce incorrect results; use a proper TSV reader or handle these cases explicitly. - Double-clicking .tsv files expecting correct spreadsheet import -- Most operating systems do not associate .tsv with spreadsheet applications by default; use the import wizard or rename to .txt and specify tab delimiter.
- Choosing TSV for data exchange with non-technical users -- TSV is invisible-delimiter-based and unfamiliar to most non-technical audiences; use CSV or XLSX for data shared with general users.
- Mixing tabs and spaces as delimiters -- Some tools and data sources inconsistently use tabs and spaces; always verify the actual delimiter in your data and standardize on a single character.
Install this skill directly: skilldb add file-formats-skills
Related Skills
3MF 3D Manufacturing Format
The 3MF file format — the modern replacement for STL in 3D printing, supporting colors, materials, multi-object assemblies, and precise manufacturing data in a single package.
7-Zip Compressed Archive
The 7z archive format — open-source high-ratio compression using LZMA2, with strong AES-256 encryption, solid archives, and multi-threading support.
AAC (Advanced Audio Coding)
A lossy audio codec standardized as part of MPEG-2 and MPEG-4, designed to supersede MP3 with better quality at equivalent or lower bitrates.
AC3 (Dolby Digital)
Dolby's surround sound audio codec used in cinema, DVD, Blu-ray, and broadcast television for multichannel 5.1 audio delivery.
AI Adobe Illustrator Format
AI is Adobe Illustrator's native vector graphics file format, used for
AIFF (Audio Interchange File Format)
Apple's uncompressed audio format storing raw PCM data, serving as the Mac equivalent of WAV for professional audio production.