
TXT (Plain Text Files)

The simplest document format — unformatted text encoded in character sets like UTF-8, ASCII, or Latin-1, readable by virtually every computing device and application.

Quick Summary

You are a file format specialist with deep expertise in plain text files, including character encoding (UTF-8, ASCII, Latin-1, Windows-1252), line ending conventions (LF, CRLF), BOM handling, encoding detection and conversion with iconv, and Unix text processing tools.

## Key Points

- **File extension:** `.txt` (also `.text`, `.log`, `.cfg`, and many others)
- **MIME type:** `text/plain`
- **Magic bytes:** None (no signature); identified by extension or content heuristics
- **Character encodings:**
- **ASCII:** 7-bit, 128 characters (US English only)
- **UTF-8:** Variable-width Unicode, backward-compatible with ASCII; dominant encoding today
- **UTF-16:** Variable-width Unicode (one or two 16-bit code units per character); common on Windows internally
- **ISO 8859-1 (Latin-1):** 8-bit Western European
- **Windows-1252:** Microsoft's Latin-1 superset
- **Line endings:**
- `\n` (LF) — Unix/Linux/macOS
- `\r\n` (CRLF) — Windows

## Quick Example

```python
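def process(line):  # placeholder handler; replace with real logic
    print(line.rstrip("\n"))

with open("file.txt", "r", encoding="utf-8") as f:
    for line in f:  # streams one line at a time
        process(line)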
```

```bash
file -bi document.txt          # Guess MIME type and charset (Linux; on macOS use file -bI)
iconv -f CP1252 -t UTF-8 in.txt > out.txt  # Convert encoding
dos2unix file.txt              # Fix Windows line endings on Unix
```



TXT — Plain Text Files

Overview

Plain text files are the most fundamental digital document format. A TXT file is a sequence of bytes that represent characters, with no embedded formatting, metadata, or structure beyond the characters themselves. The meaning of those bytes depends entirely on the character encoding used. Plain text is the universal baseline of computing: configuration files, source code, logs, and data interchange all build on it. Despite (or because of) its simplicity, TXT remains indispensable.

Core Philosophy

Plain text is the most fundamental and durable file format in computing. A text file created in 1970 is as readable today as it was then — no special software, no format migrations, no compatibility concerns. This permanence is plain text's deepest value: when all other formats have been superseded, plain text endures.

Plain text carries no formatting, no metadata, no structure beyond the characters it contains. This absence of complexity is both its greatest strength and its primary limitation. Plain text is universally compatible, trivially searchable, version-control friendly, and immune to format obsolescence. It is also incapable of expressing emphasis, layout, hyperlinks, or any visual structure beyond whitespace and punctuation.

Use plain text for content that should outlast any particular application: configuration files, log output, data interchange, notes, scripts, and source code. When you need formatting, graduate to Markdown (lightweight), HTML (web), or a document format (DOCX, PDF) — but recognize that each step away from plain text adds complexity, tooling dependencies, and potential for format obsolescence.

Technical Specifications

File extension: .txt (also .text, .log, .cfg, and many others)
MIME type: text/plain
Magic bytes: None (no signature); identified by extension or content heuristics
Character encodings:
- ASCII: 7-bit, 128 characters (US English only)
- UTF-8: Variable-width Unicode, backward-compatible with ASCII; dominant encoding today
- UTF-16: Variable-width Unicode (one or two 16-bit code units per character); common on Windows internally
- ISO 8859-1 (Latin-1): 8-bit Western European
- Windows-1252: Microsoft's Latin-1 superset
Line endings:
- \n (LF) — Unix/Linux/macOS
- \r\n (CRLF) — Windows
- \r (CR) — Classic Mac OS (pre-2001)
BOM (Byte Order Mark): UTF-8 files may optionally start with EF BB BF; UTF-16 files start with FF FE (little-endian) or FE FF (big-endian)
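
As a rough illustration of the BOM signatures listed above, the following sketch checks the first bytes of a file's content against them. The function name `sniff_bom` is hypothetical, and the sketch covers only the BOMs mentioned here; a UTF-32 little-endian BOM also begins with FF FE and would be misreported as UTF-16.

```python
import codecs

# BOM signatures from the list above, mapped to Python codec names.
# Both codecs consume the BOM while decoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),    # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16"),   # FF FE
    (codecs.BOM_UTF16_BE, "utf-16"),   # FE FF
]


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    return None
```

A `None` result does not mean the data is ASCII or UTF-8; it only means no BOM was found, which is the normal case for UTF-8 files on Unix.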

Encoding Detection

There is no fully reliable way to determine encoding from the file alone, because many byte sequences are valid in several encodings. Heuristic detection tools exist (chardet for Python, the file command on Unix, enca), but they return a best guess rather than a certainty. Best practice: always use UTF-8 and declare the encoding when the format allows it.
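
One common heuristic, shown here only as a sketch, is to try strict UTF-8 first and fall back to a legacy encoding when decoding fails. The function name `decode_best_effort` and the choice of Windows-1252 as the fallback are assumptions for illustration; the right fallback depends on where the files came from.

```python
def decode_best_effort(data: bytes, fallback: str = "cp1252"):
    """Return (text, encoding_used) using a UTF-8-first heuristic."""
    try:
        # Strict UTF-8 rejects most byte sequences produced by other
        # encodings, so success is strong (not conclusive) evidence.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback. errors="replace" substitutes U+FFFD for the
        # few bytes that Windows-1252 leaves undefined.
        return data.decode(fallback, errors="replace"), fallback
```

The fallback branch cannot tell Windows-1252 apart from other 8-bit encodings, so the returned encoding name is a guess in that case.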

How to Work With It

Opening

Every operating system and virtually every application can open plain text:

Windows: Notepad, Notepad++, VS Code
macOS: TextEdit (plain text mode), BBEdit, VS Code
Linux: nano, vim, gedit, Kate, VS Code
Command line: cat, less, more, head, tail

Creating

Any text editor
Command line: echo "text" > file.txt or redirect output
Programmatically: Every programming language has native file I/O for text

Parsing

Text files are parsed line by line in virtually every language:

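def process(line):  # placeholder handler; replace with real logic
    print(line.rstrip("\n"))

with open("file.txt", "r", encoding="utf-8") as f:
    for line in f:  # streams one line at a time
        process(line)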

Converting

To PDF: Pandoc, print-to-PDF, or enscript + ps2pdf
To HTML: Wrap in <pre> tags, or use Pandoc
To DOCX: Pandoc, or open in Word and save
From other formats: Most conversion tools can output plain text
Encoding conversion: iconv -f LATIN1 -t UTF-8 input.txt > output.txt
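
Where iconv is not available, the encoding conversion above can be approximated in Python. This is a minimal sketch; the function name `transcode` and its parameters are hypothetical, and the source encoding must be known or guessed beforehand.

```python
def transcode(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Re-encode a text file, streaming it line by line."""
    # newline="" disables newline translation on both sides, so the
    # original line endings pass through unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)
```

For example, `transcode("input.txt", "output.txt", "latin-1")` would mirror the iconv command shown above, raising `UnicodeDecodeError` if the input does not match the stated source encoding.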

Detecting and Fixing Encoding Issues

file -bi document.txt          # Guess MIME type and charset (Linux; on macOS use file -bI)
iconv -f CP1252 -t UTF-8 in.txt > out.txt  # Convert encoding
dos2unix file.txt              # Fix Windows line endings on Unix
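
On systems without dos2unix, line endings can be normalized with a few byte replacements. The sketch below uses hypothetical helper names and works on raw bytes, so it is only suitable for ASCII-compatible encodings such as UTF-8 or Latin-1, not UTF-16. Converting lone CR characters as well is a choice made in this sketch.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so a CRLF pair does not turn into two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert any line ending style to CRLF."""
    # Normalize to LF first so existing CRLF pairs are not doubled.
    return to_lf(data).replace(b"\n", b"\r\n")
```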

Common Use Cases

Source code (technically plain text with language-specific extensions)
Configuration files (.conf, .ini, .env, .cfg)
Log files
Data interchange (CSV, TSV, JSON, XML are all plain text)
READMEs and documentation
Notes and quick drafts
Scripts and automation
Interprocess communication (pipes, stdin/stdout)

Pros & Cons

Pros

Universally readable — no special software required
Future-proof — plain text from 1970 is still readable today
Tiny file sizes with zero overhead
Version-control friendly (line-based diff and merge work well)
Very small attack surface: no macros or embedded active content (the text itself may still contain scripts or terminal escape sequences)
Can be processed with standard Unix tools (grep, sed, awk, sort)
Encoding and line endings are the only variables; there is no complex structure to break

Cons

No formatting (no bold, italic, fonts, colors, or layout)
No embedded images or media
Encoding ambiguity can cause mojibake (garbled characters)
Line ending differences cause cross-platform friction
No metadata (title, author, dates) without external conventions
No structure enforcement — content is completely freeform
Large text files can be slow to open in basic editors
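
The mojibake mentioned in the list above is easy to reproduce. This short example encodes a string as UTF-8 and then decodes the bytes with the wrong codec (Windows-1252), which is the typical way the problem arises.

```python
original = "café"

# Encode as UTF-8, then decode with the wrong codec.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # cafÃ©

# Reversing the wrong step restores the text, provided no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

The two-byte UTF-8 sequence for "é" (C3 A9) is read as two separate Windows-1252 characters, which is why one accented letter turns into two symbols.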

Compatibility

| Platform | Support |
|---|---|
| Windows | Notepad (built-in), every editor |
| macOS | TextEdit, every editor |
| Linux | Every editor, cat/less/vim |
| Web | Browsers display inline |
| Mobile | Every platform has text viewers |
| Embedded/IoT | Universal support |

Plain text is the most compatible file format in existence.

Related Formats

Markdown (.md): Adds lightweight formatting conventions to plain text
CSV (.csv): Plain text with comma-delimited structure
JSON (.json): Structured data in plain text
XML (.xml): Markup in plain text
RTF (.rtf): Text-based format with formatting control words
ANSI text (.ans): Plain text with terminal color escape codes

Practical Usage

Always use UTF-8 for new text files -- it is the universal standard encoding that supports all languages while remaining backward-compatible with ASCII.
Use file -bi document.txt on Linux (file -bI on macOS) to guess the encoding of unknown text files before processing.
Use iconv -f SOURCE_ENCODING -t UTF-8 to convert legacy encodings to UTF-8 in automated pipelines.
Use dos2unix and unix2dos to convert line endings when sharing files between Windows and Unix systems, or configure Git with core.autocrlf to handle this automatically.
Add a UTF-8 BOM (EF BB BF) only when required by specific applications (some Windows tools expect it); otherwise, omit the BOM as it can cause issues with Unix tools and web content.
Use .editorconfig files to standardize encoding (UTF-8), line endings (LF), and trailing whitespace behavior across development teams.
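
In code, the UTF-8, LF, and BOM recommendations above might look like the following sketch. Both helper names are hypothetical.

```python
def write_text_lf(path, text):
    """Write UTF-8 without a BOM, keeping LF line endings on every platform."""
    # newline="\n" stops Python from translating "\n" to the platform
    # default (CRLF on Windows) when writing.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


def read_text_any_bom(path):
    """Read UTF-8 text whether or not the file starts with a BOM."""
    # "utf-8-sig" strips a leading BOM if present and otherwise
    # behaves like plain UTF-8.
    with open(path, "r", encoding="utf-8-sig") as f:
        return f.read()
```

Reading with `utf-8-sig` is a tolerant default for input from mixed sources, while writing with plain `utf-8` keeps the BOM out of new files.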

Anti-Patterns

Assuming all text files are UTF-8 -- Legacy files, Windows exports, and files from different locales may use Latin-1, Windows-1252, Shift-JIS, or other encodings; always detect or declare the encoding.
Ignoring line ending differences in cross-platform projects -- Mixing LF and CRLF in the same repository causes spurious diffs, merge conflicts, and can break shell scripts; standardize with .gitattributes or .editorconfig.
Using locale-dependent default encoding in code -- Some runtimes use the system locale's encoding by default (Python 3's open() without an encoding argument has historically done so, as did Java before version 18); always specify encoding='utf-8' explicitly when opening files.
Processing large text files by loading them entirely into memory -- Use line-by-line streaming (for line in file) for large log files and datasets rather than file.read().
Storing structured data in unstructured plain text -- If your text file has fields, records, and types, use CSV, TSV, JSON, or a database instead of inventing custom delimiters and parsing logic.
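
The streaming advice above can be sketched as a function that scans a large file while holding only one line in memory at a time. The name `count_matching_lines` and its parameters are hypothetical.

```python
def count_matching_lines(path, needle, encoding="utf-8"):
    """Count lines containing needle without loading the whole file."""
    count = 0
    # errors="replace" keeps the scan going if a few bytes are not valid
    # in the stated encoding, which is an assumption that suits log files.
    with open(path, "r", encoding=encoding, errors="replace") as f:
        for line in f:  # the file object yields one line at a time
            if needle in line:
                count += 1
    return count
```

Compared with `file.read()`, memory use here depends on the longest line rather than on the total file size.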
