
TSV (Tab-Separated Values)

A plain-text tabular data format using tab characters as field delimiters, avoiding the quoting complexities of CSV and widely used in bioinformatics, linguistics, and data processing.

Quick Summary
You are a file format specialist with deep expertise in TSV (Tab-Separated Values), including Unix pipeline processing with cut/sort/awk, bioinformatics data formats (BED, GFF, VCF), database bulk loading, pandas/R parsing, and CSV-to-TSV migration strategies.

## Key Points

- **File extension:** `.tsv`, `.tab`, or sometimes `.txt`
- **MIME type:** `text/tab-separated-values`
- **Standard:** IANA registered media type; no formal RFC equivalent to CSV's RFC 4180
- **Character encoding:** UTF-8 recommended; varies in practice
- **Line endings:** LF (`\n`) or CRLF (`\r\n`)
- **Delimiter:** Horizontal tab character (ASCII 0x09)
- **Format rules:**
  1. Each record occupies one line
  2. Fields are separated by a single tab character
  3. First line may be a header with field names
  4. Tab and newline characters within field values are **not allowed** (unlike CSV, there is no standard quoting mechanism)
  5. Empty fields are represented by consecutive tabs
- **Spreadsheets:** Excel (File > Open, or rename to .txt and use import wizard), LibreOffice Calc, Google Sheets

## Quick Example

```
Name	Age	City	Department
John Smith	32	New York	Engineering
Jane Doe	28	London	Marketing
Bob Wilson	45	San Francisco	Sales
```

```bash
cut -f2 data.tsv                    # Extract second column
awk -F'\t' '{print $1}' data.tsv    # Print first field
sort -t$'\t' -k2,2 data.tsv         # Sort by second column only
```


TSV — Tab-Separated Values

Overview

TSV is a plain-text format for tabular data where fields are separated by tab characters (\t) and records by newlines. Tab-delimited text has a long history in scientific and Unix contexts: the tab character is a natural field separator because it rarely appears in data values, which avoids most of the quoting and escaping issues that plague CSV files. TSV is especially prevalent in bioinformatics, computational linguistics, and Unix/Linux data processing pipelines, where cut and paste use the tab as their default delimiter and sort, awk, and join accept it through a single option (-t or -F).

Core Philosophy

TSV (Tab-Separated Values) solves CSV's most common parsing headache: commas that appear inside the data. By using the tab character as the delimiter, TSV avoids the need to quote fields that contain commas, which are common in free-text data. This makes TSV simpler to parse correctly, less error-prone to generate, and directly compatible with Unix text processing tools that can use the tab as a field separator.

TSV is the preferred tabular text format in bioinformatics, linguistics, and data science domains where data fields frequently contain commas, semicolons, and other punctuation. The convention is straightforward: no quoting, no escaping, tabs separate fields, newlines separate records. Embedded tabs and newlines in data values are simply not supported — a limitation that is acceptable when field values are controlled.

Use TSV when your data contains commas or when you want simpler parsing than CSV requires. Use CSV when universal tool compatibility matters more (Excel and many data tools default to CSV). For structured data exchange with type safety, Parquet or JSON are better choices. TSV's strength is its simplicity for flat tabular data with controlled field values.

Technical Specifications

  • File extension: .tsv, .tab, or sometimes .txt
  • MIME type: text/tab-separated-values
  • Standard: IANA registered media type; no formal RFC equivalent to CSV's RFC 4180
  • Character encoding: UTF-8 recommended; varies in practice
  • Line endings: LF (\n) or CRLF (\r\n)
  • Delimiter: Horizontal tab character (ASCII 0x09)

Format Rules

  1. Each record occupies one line
  2. Fields are separated by a single tab character
  3. First line may be a header with field names
  4. Tab and newline characters within field values are not allowed (unlike CSV, there is no standard quoting mechanism)
  5. Empty fields are represented by consecutive tabs
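
The rules above can be checked mechanically. The following is a minimal sketch, not a reference validator: `validate_tsv` is a hypothetical helper, and it additionally assumes that every record should have the same number of fields as the first line, which the rules imply but do not state.

```python
import io

def validate_tsv(stream):
    """Yield (line_number, message) for each line whose field count differs from line 1."""
    expected = None
    for number, line in enumerate(stream, start=1):
        # Strip only the record terminator (LF or CRLF); keep interior text intact.
        record = line.rstrip("\n").rstrip("\r")
        fields = record.split("\t")
        if expected is None:
            expected = len(fields)  # the first line sets the column count
        elif len(fields) != expected:
            yield number, f"expected {expected} fields, found {len(fields)}"

sample = io.StringIO(
    "Name\tAge\tCity\n"
    "John Smith\t32\tNew York\n"
    "Jane Doe\t\tLondon\n"   # empty Age field: two consecutive tabs, still 3 fields
    "Bob Wilson\t45\n"       # one field short
)
problems = list(validate_tsv(sample))
print(problems)
```

Only the last line is reported; the line with the empty field passes because consecutive tabs still produce three fields.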

Example

Name	Age	City	Department
John Smith	32	New York	Engineering
Jane Doe	28	London	Marketing
Bob Wilson	45	San Francisco	Sales

Comparison with CSV

| Aspect | CSV | TSV |
| --- | --- | --- |
| Delimiter | Comma (,) | Tab (\t) |
| Quoting | Required for commas, quotes, newlines | Generally not needed |
| Escaping | "" for literal quotes | No standard escaping |
| Embedded newlines | Allowed (in quoted fields) | Not allowed |
| European locale issues | Comma conflicts with decimal separator | No conflict |
| Unix tool support | Requires CSV-aware parsing | Native with cut, awk, sort |
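
To make the quoting difference concrete, here is a small sketch that serializes the same record both ways. The sample values are invented for illustration; the CSV line is produced by Python's standard csv module with its default settings.

```python
import csv
import io

row = ["Smith, John", 'He said "hi"', "1,5"]

# CSV: the csv module quotes fields containing commas and doubles embedded quotes.
csv_buffer = io.StringIO()
csv.writer(csv_buffer, lineterminator="\n").writerow(row)
csv_line = csv_buffer.getvalue().rstrip("\n")

# TSV: none of these fields contain a tab or newline, so a plain join is enough.
tsv_line = "\t".join(row)

print(csv_line)
print(tsv_line)
```

The TSV line carries the values verbatim, while the CSV line needs quotes around all three fields.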

How to Work With It

Opening

  • Spreadsheets: Excel (File > Open, or rename to .txt and use import wizard), LibreOffice Calc, Google Sheets
  • Text editors: Any editor; tabs align columns if using a monospace font
  • Command line: column -t -s$'\t' file.tsv (the -s option stops column from also splitting on spaces), or less -S file.tsv
  • Note: Double-clicking a .tsv file in Windows may not open correctly in Excel without import configuration

Creating

  • Export from spreadsheets: Save As > Text (Tab delimited)
  • Unix tools: paste, printf, awk produce tab-delimited output naturally
  • Database exports: Most databases support tab-delimited output
  • Python: csv module with delimiter='\t', or pandas DataFrame.to_csv(sep='\t')
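
As a hedged illustration of generating TSV by hand, the sketch below writes a header and rows while replacing any tab, carriage return, or newline inside a value with a space. `clean_field` and `write_tsv` are hypothetical helpers, and replacing with a space is just one possible policy; rejecting such values is equally valid.

```python
import io

def clean_field(value):
    """Render a value as text, replacing each tab, CR, and LF with a space."""
    text = "" if value is None else str(value)
    return text.replace("\t", " ").replace("\r", " ").replace("\n", " ")

def write_tsv(stream, header, rows):
    """Write one header line and one line per row, tab-separated, LF-terminated."""
    stream.write("\t".join(header) + "\n")
    for row in rows:
        stream.write("\t".join(clean_field(v) for v in row) + "\n")

buffer = io.StringIO()
write_tsv(
    buffer,
    ["Name", "Age", "Note"],
    [
        ["John Smith", 32, "likes\ttabs"],
        ["Jane Doe", None, "line one\nline two"],  # None becomes an empty field
    ],
)
output = buffer.getvalue()
print(output, end="")
```

When writing to a real file, opening it with `open(path, "w", encoding="utf-8", newline="")` keeps the LF endings written here from being translated on Windows.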

Parsing

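  • Python:
    import csv

    # newline='' lets the csv module handle line endings itself;
    # QUOTE_NONE keeps literal double quotes in fields instead of treating them as CSV quoting.
    with open('data.tsv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row in reader:
            print(row)  # each row is a list of strings
    
    # Or with pandas (column types are inferred; pass dtype=str to keep raw text):
    import pandas as pd
    df = pd.read_csv('data.tsv', sep='\t', quoting=csv.QUOTE_NONE)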
    
  • Unix command line:
    cut -f2 data.tsv          # Extract second column
    awk -F'\t' '{print $1}' data.tsv  # Print first field
    sort -t$'\t' -k2,2 data.tsv  # Sort by second column only
    
  • R: read.delim('data.tsv') or readr::read_tsv('data.tsv')
  • JavaScript: d3-dsv (d3.tsvParse()), or split by \t

Converting

  • To CSV: pandas read as TSV, write as CSV; or tr '\t' ',' < in.tsv > out.csv (naive, breaks with embedded commas)
  • To XLSX: Open in LibreOffice/Excel and save; use pandas + openpyxl
  • To JSON: pandas to_json(), or csvjson -t (csvkit)
  • From CSV: pandas read CSV, write TSV; or csvkit tools
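
For conversions without pandas, a standard-library sketch might look like the following. `read_rows`, `tsv_to_csv`, and `tsv_to_json` are hypothetical helper names; the CSV writer adds quoting where needed, which is what the naive tr approach misses.

```python
import csv
import io
import json

def read_rows(tsv_text):
    """Yield each record as a list of fields, splitting on LF/CRLF and tabs only."""
    for line in io.StringIO(tsv_text):
        yield line.rstrip("\r\n").split("\t")

def tsv_to_csv(tsv_text):
    """Return CSV text; fields containing commas or quotes are quoted by csv.writer."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(read_rows(tsv_text))
    return out.getvalue()

def tsv_to_json(tsv_text):
    """Return a JSON array of objects keyed by the header line."""
    rows = read_rows(tsv_text)
    header = next(rows)
    return json.dumps([dict(zip(header, row)) for row in rows])

sample = "Name\tCity\nSmith, John\tNew York\n"
csv_text = tsv_to_csv(sample)
json_text = tsv_to_json(sample)
print(csv_text, end="")
print(json_text)
```

All values stay strings in the JSON output; converting numbers or dates is left to the caller.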

Common Use Cases

  • Bioinformatics: BED, GFF, VCF, and many genomics formats are tab-delimited
  • Computational linguistics: Corpus annotations, treebanks, CoNLL formats
  • Unix data pipelines: Native format for cut, sort, join, paste, awk
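  • Database bulk loading: PostgreSQL COPY (text format) and MySQL LOAD DATA default to a tab delimiter; both also interpret backslash escapes and \N for NULL, so their dumps are not always plain TSV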
  • Log processing: Many log formats are tab-separated
  • Data exchange: When CSV quoting issues are undesirable
  • Clipboard data: Copying cells from Excel to text produces TSV
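
As one example from the bioinformatics case, the first three columns of a BED line are the chromosome, a 0-based start, and an exclusive end. The sketch below reads just those columns; `Interval` and `parse_bed` are hypothetical names, the sample records are invented, and real BED files have optional columns and header conventions that this does not cover.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def parse_bed(lines):
    """Yield an Interval per data line, skipping blank, comment, track, and browser lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield Interval(chrom, int(start), int(end))

sample = [
    "track name=example\n",
    "chr1\t100\t250\tfeatureA\n",
    "chr2\t0\t1000\tfeatureB\n",
]
intervals = list(parse_bed(sample))
lengths = [iv.end - iv.start for iv in intervals]
print(intervals, lengths)
```

Because the end coordinate is exclusive, each feature's length is simply end minus start.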

Pros & Cons

Pros

  • Simpler than CSV — rarely needs quoting or escaping
  • First-class support in Unix tools (cut, sort, awk, join, paste)
  • No delimiter conflicts with typical text data (commas in names, addresses)
  • No locale issues (comma as decimal separator does not interfere)
  • Fast to parse — no quoting state machine needed
  • Copying cells from a spreadsheet puts tab-separated text on the clipboard, so pasted data is already TSV
  • Dominant in scientific and bioinformatics data

Cons

  • Tab characters are invisible in many editors (harder to debug visually)
  • No standard quoting mechanism — cannot represent embedded tabs or newlines in values
  • Less commonly recognized by applications (double-click may not open correctly)
  • Some data does contain tabs (especially copy-pasted content)
  • No formal specification (less rigorous than even CSV's RFC 4180)
  • Spreadsheet applications may not default to tab-delimited when saving
  • Less familiar to non-technical users than CSV

Compatibility

| Platform | Support |
| --- | --- |
| All platforms | Universal — plain text |
| Spreadsheets | Excel, LibreOffice, Google Sheets (may need import wizard) |
| Unix/Linux | Native support in core utilities |
| Python | csv module, pandas |
| R | read.delim, readr::read_tsv |
| Databases | PostgreSQL COPY, MySQL LOAD DATA |
| Bioinformatics | Standard in BED, GFF, SAM, and many tools |

Related Formats

  • CSV (.csv): Comma-delimited variant (more widely recognized)
  • Fixed-width: Positional columns without delimiters
  • JSON Lines (.jsonl): One JSON record per line
  • BED (.bed): Tab-delimited genomic regions format
  • GFF/GTF: Tab-delimited gene annotation formats
  • SAM (.sam): Tab-delimited sequence alignment format
  • Parquet (.parquet): Columnar binary format for analytics

Practical Usage

  • Use TSV over CSV when your data contains commas (names, addresses, descriptions) -- tabs rarely appear in natural data, eliminating the need for quoting.
  • Process TSV files with Unix tools (cut -f2, sort -t$'\t' -k3,3, awk -F'\t') for fast pipeline operations -- cut and awk stream line by line, and sort spills to temporary files rather than needing the whole input in memory.
  • Use pd.read_csv('file.tsv', sep='\t') in pandas -- the CSV reader handles TSV with the sep parameter.
  • For database bulk loading, TSV is often the default and most efficient format -- PostgreSQL COPY and MySQL LOAD DATA both default to tab delimiters.
  • Use column -t -s$'\t' file.tsv | less -S to view TSV files with aligned columns in the terminal.
  • Always use UTF-8 encoding and document it explicitly -- TSV has no built-in encoding declaration, and encoding mismatches are a common source of data corruption.
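
The streaming and explicit-encoding advice can be combined in plain Python. This is a sketch under assumptions: `iter_rows` is a hypothetical helper, and the in-memory byte string stands in for a file opened in binary mode.

```python
import io

def iter_rows(stream):
    """Yield one list of fields per line without loading the whole input."""
    for line in stream:
        yield line.rstrip("\r\n").split("\t")

# Bytes as they would sit on disk; decoding is stated explicitly as UTF-8.
raw = "Name\tAge\tCity\nZoë\t32\tZürich\nJane Doe\t28\tLondon\n".encode("utf-8")
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")

rows = iter_rows(text)
header = next(rows)          # first line holds the field names
age_index = header.index("Age")

names = []
total_age = 0
for row in rows:             # single pass, one record in memory at a time
    names.append(row[0])
    total_age += int(row[age_index])

print(header, names, total_age)
```

For a real file, `open('file.tsv', encoding='utf-8', newline='')` provides the same kind of text stream.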

Anti-Patterns

  • Assuming TSV values can contain tabs -- Unlike CSV's quoting mechanism, TSV has no standard way to embed tab characters within field values; sanitize or escape data before writing.
  • Using naive split('\t') without handling edge cases -- Empty fields, trailing tabs, and lines with fewer fields than expected can produce incorrect results; use a proper TSV reader or handle these cases explicitly.
  • Double-clicking .tsv files expecting correct spreadsheet import -- Most operating systems do not associate .tsv with spreadsheet applications by default; use the import wizard or rename to .txt and specify tab delimiter.
  • Choosing TSV for data exchange with non-technical users -- Tabs are invisible on screen and the format is unfamiliar to most non-technical audiences; use CSV or XLSX for data shared with general users.
  • Mixing tabs and spaces as delimiters -- Some tools and data sources inconsistently use tabs and spaces; always verify the actual delimiter in your data and standardize on a single character.
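
Where values may legitimately contain tabs or newlines, one option is a backslash-escaping convention applied before writing and reversed after reading. This is an assumption for illustration, not part of any TSV standard, and both sides of the exchange must agree on it; `escape_field` and `unescape_field` are hypothetical helpers.

```python
# Mapping between raw characters and their two-character escaped forms.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}

def escape_field(text):
    """Replace backslash, tab, LF, and CR with backslash escapes."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)

def unescape_field(text):
    """Reverse escape_field; unknown escape sequences are kept verbatim."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # empty string if the backslash is the last character
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)

value = "a\tb\nc\\d"
escaped = escape_field(value)
restored = unescape_field(escaped)
print(escaped)
```

The escaped form contains no literal tab or newline, so it is safe to place in a field, and unescaping restores the original value.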
