
TXT (Plain Text Files)

The simplest document format — unformatted text encoded in character sets like UTF-8, ASCII, or Latin-1, readable by virtually every computing device and application.

Quick Summary
You are a file format specialist with deep expertise in plain text files, including character encoding (UTF-8, ASCII, Latin-1, Windows-1252), line ending conventions (LF, CRLF), BOM handling, encoding detection and conversion with iconv, and Unix text processing tools.

## Key Points

- **File extension:** `.txt` (also `.text`, `.log`, `.cfg`, and many others)
- **MIME type:** `text/plain`
- **Magic bytes:** None (no signature); identified by extension or content heuristics
- **Character encodings:**
  - **ASCII:** 7-bit, 128 characters (US English only)
  - **UTF-8:** Variable-width Unicode, backward-compatible with ASCII; dominant encoding today
  - **UTF-16:** Variable-width Unicode (2 or 4 bytes per code point); used internally by Windows
  - **ISO 8859-1 (Latin-1):** 8-bit Western European
  - **Windows-1252:** Microsoft's Latin-1 superset
- **Line endings:**
  - `\n` (LF): Unix/Linux/macOS
  - `\r\n` (CRLF): Windows

## Quick Example

```python
with open("file.txt", "r", encoding="utf-8") as f:
    for line in f:
        process(line)
```

```bash
file -bi document.txt          # Detect encoding (Linux; use -bI on macOS)
iconv -f CP1252 -t UTF-8 in.txt > out.txt  # Convert encoding
dos2unix file.txt              # Fix Windows line endings on Unix
```

TXT — Plain Text Files

Overview

Plain text files are the most fundamental digital document format. A TXT file contains a sequence of characters with no embedded formatting, metadata, or structure beyond the characters themselves. On disk those characters are stored as bytes, and the meaning of the bytes depends entirely on the character encoding used. Plain text is the universal baseline of computing: configuration files, source code, logs, and data interchange all build on it. Despite (or because of) its simplicity, TXT remains indispensable.

Core Philosophy

Plain text is the most fundamental and durable file format in computing. An ASCII text file created in 1970 is as readable today as it was then: no special software and no format migrations are needed, provided the encoding is known. This permanence is plain text's deepest value: when all other formats have been superseded, plain text endures.

Plain text carries no formatting, no metadata, no structure beyond the characters it contains. This absence of complexity is both its greatest strength and its primary limitation. Plain text is universally compatible, trivially searchable, version-control friendly, and immune to format obsolescence. It is also incapable of expressing emphasis, layout, hyperlinks, or any visual structure beyond whitespace and punctuation.

Use plain text for content that should outlast any particular application: configuration files, log output, data interchange, notes, scripts, and source code. When you need formatting, graduate to Markdown (lightweight), HTML (web), or a document format (DOCX, PDF) — but recognize that each step away from plain text adds complexity, tooling dependencies, and potential for format obsolescence.

Technical Specifications

  • File extension: .txt (also .text, .log, .cfg, and many others)
  • MIME type: text/plain
  • Magic bytes: None (no signature); identified by extension or content heuristics
  • Character encodings:
    • ASCII: 7-bit, 128 characters (US English only)
    • UTF-8: Variable-width Unicode, backward-compatible with ASCII; dominant encoding today
    • UTF-16: Variable-width Unicode (2 or 4 bytes per code point); used internally by Windows
    • ISO 8859-1 (Latin-1): 8-bit Western European
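    • Windows-1252: Microsoft's extension of Latin-1 that assigns printable characters (such as curly quotes and the euro sign) to the 0x80-0x9F range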
  • Line endings:
    • \n (LF) — Unix/Linux/macOS
    • \r\n (CRLF) — Windows
    • \r (CR) — Classic Mac OS (pre-2001)
  • BOM (Byte Order Mark): UTF-8 files may optionally start with EF BB BF; UTF-16 files typically start with FF FE (little-endian) or FE FF (big-endian)
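
The BOM signatures listed above can be checked by reading the first few bytes of a file. The following is a minimal sketch using constants from Python's standard `codecs` module. The helper name `sniff_bom` is hypothetical, and a missing BOM says nothing about the actual encoding.

```python
import codecs

# BOM signatures for the encodings listed above.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would typically skip the BOM before decoding, for example `data[length:].decode(encoding)`. UTF-32 is deliberately left out of this sketch; its little-endian BOM begins with the same FF FE bytes, so code that must handle UTF-32 would need to test for it first.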

Encoding Detection

A plain text file does not record its own encoding, so there is no fully reliable way to determine it from the bytes alone; a BOM, when present, is the one strong hint. Heuristic detection tools exist (chardet for Python, the file command on Unix, enca), but ambiguity is inherent. Best practice: always use UTF-8 and declare the encoding when the format allows it.
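
Because detection is heuristic, one common approach is to try strict UTF-8 first and fall back to a legacy encoding only when decoding fails. The sketch below assumes Windows-1252 is the most likely legacy encoding for the data at hand; that choice and the helper name `decode_best_effort` are assumptions for illustration, not a general rule.

```python
def decode_best_effort(data: bytes, fallback: str = "cp1252"):
    """Decode bytes as UTF-8 if valid, otherwise with a fallback encoding.

    Returns (text, encoding_used).
    """
    try:
        # "utf-8-sig" accepts UTF-8 with or without a BOM and strips the BOM if present.
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # errors="replace" substitutes U+FFFD for the few byte values
        # that cp1252 leaves undefined instead of raising.
        return data.decode(fallback, errors="replace"), fallback
```

Trying UTF-8 first tends to work because text in a legacy 8-bit encoding that contains non-ASCII characters is rarely valid UTF-8 by accident. The fallback result is still a guess and should be treated as such.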

How to Work With It

Opening

Every operating system and virtually every application can open plain text:

  • Windows: Notepad, Notepad++, VS Code
  • macOS: TextEdit (plain text mode), BBEdit, VS Code
  • Linux: nano, vim, gedit, Kate, VS Code
  • Command line: cat, less, more, head, tail

Creating

  • Any text editor
  • Command line: echo "text" > file.txt or redirect output
  • Programmatically: Every programming language has native file I/O for text

Parsing

Text files are parsed line by line in virtually every language:

with open("file.txt", "r", encoding="utf-8") as f:
    for line in f:
        process(line)

Converting

  • To PDF: Pandoc, print-to-PDF, or enscript + ps2pdf
  • To HTML: Wrap in <pre> tags, or use Pandoc
  • To DOCX: Pandoc, or open in Word and save
  • From other formats: Most conversion tools can output plain text
  • Encoding conversion: iconv -f LATIN1 -t UTF-8 input.txt > output.txt
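
Where iconv is not available, the same kind of encoding conversion can be sketched in Python. This is an illustrative equivalent of the iconv command above rather than a replacement for it; the function name `convert_encoding` and its default encodings are hypothetical.

```python
def convert_encoding(src_path, dst_path, src_encoding="latin-1", dst_encoding="utf-8"):
    """Re-encode a text file in chunks so large files are not loaded whole."""
    # newline="" disables newline translation, so line endings pass through unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            chunk = src.read(64 * 1024)  # up to 64Ki characters per read
            if not chunk:
                break
            dst.write(chunk)
```

A call such as `convert_encoding("input.txt", "output.txt")` mirrors the command-line example. Decoding is strict by default, so a wrong `src_encoding` guess may raise `UnicodeDecodeError` instead of silently producing garbled output.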

Detecting and Fixing Encoding Issues

file -bi document.txt          # Detect encoding (Linux; use -bI on macOS)
iconv -f CP1252 -t UTF-8 in.txt > out.txt  # Convert encoding
dos2unix file.txt              # Fix Windows line endings on Unix
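
Where dos2unix is not installed, line endings can be normalized directly on the raw bytes. A minimal sketch follows; `normalize_newlines` is a hypothetical helper, and it assumes an ASCII-compatible encoding such as UTF-8 or Windows-1252 (it would corrupt UTF-16 data).

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so the CR pass does not turn one CRLF into two LFs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Working on bytes rather than decoded text means the encoding does not need to be known exactly, as long as it is ASCII-compatible. The second replacement also covers the lone CR endings of Classic Mac OS files.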

Common Use Cases

  • Source code (technically plain text with language-specific extensions)
  • Configuration files (.conf, .ini, .env, .cfg)
  • Log files
  • Data interchange (CSV, TSV, JSON, XML are all plain text)
  • READMEs and documentation
  • Notes and quick drafts
  • Scripts and automation
  • Interprocess communication (pipes, stdin/stdout)

Pros & Cons

Pros

  • Universally readable — no special software required
  • Future-proof — plain text from 1970 is still readable today
  • Tiny file sizes with zero overhead
  • Version-control friendly (diff, merge work perfectly)
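  • Minimal attack surface (no macros, no embedded objects), although text that is later executed or interpreted, such as scripts or terminal escape sequences, can still be harmful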
  • Can be processed with standard Unix tools (grep, sed, awk, sort)
  • Encoding and line endings are the only variables; there is no complex structure to break

Cons

  • No formatting (no bold, italic, fonts, colors, or layout)
  • No embedded images or media
  • Encoding ambiguity can cause mojibake (garbled characters)
  • Line ending differences cause cross-platform friction
  • No metadata (title, author, dates) without external conventions
  • No structure enforcement — content is completely freeform
  • Large text files can be slow to open in basic editors

Compatibility

| Platform | Support |
| --- | --- |
| Windows | Notepad (built-in), every editor |
| macOS | TextEdit, every editor |
| Linux | Every editor, cat/less/vim |
| Web | Browsers display inline |
| Mobile | Every platform has text viewers |
| Embedded/IoT | Universal support |

Plain text is the most compatible file format in existence.

Related Formats

  • Markdown (.md): Adds lightweight formatting conventions to plain text
  • CSV (.csv): Plain text with comma-delimited structure
  • JSON (.json): Structured data in plain text
  • XML (.xml): Markup in plain text
  • RTF (.rtf): Text-based format with formatting control words
  • ANSI text (.ans): Plain text with terminal color escape codes

Practical Usage

  • Always use UTF-8 for new text files -- it is the universal standard encoding that supports all languages while remaining backward-compatible with ASCII.
  • Use file -bi document.txt on Linux (file -bI on macOS) to detect the encoding of unknown text files before processing.
  • Use iconv -f SOURCE_ENCODING -t UTF-8 to convert legacy encodings to UTF-8 in automated pipelines.
  • Use dos2unix and unix2dos to convert line endings when sharing files between Windows and Unix systems, or configure Git with core.autocrlf to handle this automatically.
  • Add a UTF-8 BOM (EF BB BF) only when required by specific applications (some Windows tools expect it); otherwise, omit the BOM as it can cause issues with Unix tools and web content.
  • Use .editorconfig files to standardize encoding (UTF-8), line endings (LF), and trailing whitespace behavior across development teams.

Anti-Patterns

  • Assuming all text files are UTF-8 -- Legacy files, Windows exports, and files from different locales may use Latin-1, Windows-1252, Shift-JIS, or other encodings; always detect or declare the encoding.
  • Ignoring line ending differences in cross-platform projects -- Mixing LF and CRLF in the same repository causes spurious diffs, merge conflicts, and can break shell scripts; standardize with .gitattributes or .editorconfig.
  • Using locale-dependent default encoding in code -- Some runtimes fall back to the system locale's encoding when none is given (Python 3's open() outside UTF-8 mode, Java before version 18); always specify the encoding explicitly, for example encoding='utf-8' in Python, when opening files.
  • Processing large text files by loading them entirely into memory -- Use line-by-line streaming (for line in file) for large log files and datasets rather than file.read().
  • Storing structured data in unstructured plain text -- If your text file has fields, records, and types, use CSV, TSV, JSON, or a database instead of inventing custom delimiters and parsing logic.
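
The streaming and explicit-encoding advice above can be combined in a few lines. A minimal sketch, where `count_matching_lines` and its `needle` argument are hypothetical placeholders for whatever per-line processing is actually needed:

```python
def count_matching_lines(path, needle, encoding="utf-8"):
    """Count lines containing needle while holding only one line in memory."""
    count = 0
    # An explicit encoding avoids depending on the system locale;
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    with open(path, "r", encoding=encoding, errors="replace") as f:
        for line in f:  # the file object yields one line at a time
            if needle in line:
                count += 1
    return count
```

Whether `errors="replace"` is appropriate depends on the task: it suits log scanning, where a few damaged characters are tolerable, but strict decoding is usually the safer choice when the text will be written back out.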
