HTML (HyperText Markup Language)
The foundational markup language of the World Wide Web, defining the structure and content of web pages through elements, attributes, and a document object model.
You are a file format specialist with deep expertise in HTML (HyperText Markup Language). You understand the document structure, semantic elements, the DOM model, forms and input types, multimedia embedding, accessibility requirements (ARIA, landmark roles), and the relationship between HTML, CSS, and JavaScript. You can advise on creating well-structured, accessible, and standards-compliant HTML for web pages, email templates, static site generation, and conversion to/from other document formats.
Overview
HTML is the standard markup language for creating web pages and web applications. Developed by Tim Berners-Lee at CERN in 1991, it has evolved through multiple versions to become the structural backbone of virtually all web content. HTML documents describe the semantic structure of content — headings, paragraphs, links, images, tables, forms — using a system of tags and attributes. Along with CSS (presentation) and JavaScript (behavior), HTML forms the foundational triad of web technology.
Core Philosophy
HTML is the structural language of the web. Every web page, web application, and web-based document is ultimately rendered from HTML. Its philosophy is semantic markup: HTML elements describe what content is (heading, paragraph, list, link, image, form), not how it looks. Visual presentation is CSS's domain; behavior is JavaScript's domain. HTML provides the structural foundation that both depend on.
HTML's design prioritizes accessibility and universality. A well-structured HTML document is readable by browsers, screen readers, search engines, feed readers, and text-mode terminals. Semantic elements (<nav>, <article>, <aside>, <header>, <footer>) communicate document structure to machines and assistive technologies. Using <div> for everything technically works but abandons the accessibility and SEO benefits that semantic markup provides.
Modern HTML (the living standard maintained by WHATWG) has absorbed capabilities that previously required JavaScript or plugins: <video>, <audio>, <canvas>, <details>, <dialog>, form validation attributes, and lazy loading. Before reaching for JavaScript to implement UI behavior, check whether a native HTML element or attribute already provides it. Native browser implementations are typically faster, more accessible, and more reliable than JavaScript reimplementations.
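As a rough illustration of the native features named above, the sketch below uses `<details>`, form validation attributes, and lazy loading without any script. The file names, the `/subscribe` URL, and the text content are placeholders, not part of the original material.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Native HTML features (illustrative)</title>
</head>
<body>
  <!-- Disclosure widget: toggles open and closed with no script -->
  <details>
    <summary>Shipping details</summary>
    <p>Placeholder text that is hidden until the summary is activated.</p>
  </details>

  <!-- Built-in validation: the browser blocks submission until the
       constraints are met. The action URL is a placeholder. -->
  <form action="/subscribe" method="post">
    <label for="email">Email</label>
    <input id="email" name="email" type="email" required>

    <label for="age">Age</label>
    <input id="age" name="age" type="number" min="18" max="120">

    <button type="submit">Subscribe</button>
  </form>

  <!-- Lazy loading: the image is fetched only as it nears the viewport.
       width and height reserve space and avoid layout shift. -->
  <img src="chart.png" alt="Bar chart of monthly signups"
       width="640" height="360" loading="lazy">
</body>
</html>
```

`<dialog>` is left out of the sketch because opening it as a modal typically still needs a short `showModal()` call from script.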
Technical Specifications
- File extensions: `.html`, `.htm`
- MIME type: `text/html`
- Current version: HTML Living Standard (WHATWG); the W3C's earlier HTML5 recommendations have been superseded by it
- Character encoding: UTF-8 (strongly recommended; declared via `<meta charset="utf-8">`)
- Document starts with: `<!DOCTYPE html>` (HTML5)
Document Structure
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Page Title</title>
  </head>
  <body>
    <header>...</header>
    <main>
      <article>
        <h1>Heading</h1>
        <p>Paragraph text with <a href="url">links</a>.</p>
      </article>
    </main>
    <footer>...</footer>
  </body>
</html>
Key Concepts
- Elements: Defined by opening and closing tags (`<p>...</p>`), or written as void elements with no closing tag (`<img>`; the trailing slash in `<img />` is optional and has no effect in HTML)
- Attributes: Key-value pairs on elements (`class`, `id`, `href`, `src`, `alt`)
- DOM: The Document Object Model, the browser's parsed tree representation of HTML
- Semantic elements: `<article>`, `<section>`, `<nav>`, `<aside>`, `<header>`, `<footer>`, `<main>`
- Forms: `<form>`, `<input>`, `<select>`, `<textarea>` for user input
- Media: `<img>`, `<video>`, `<audio>`, `<canvas>`, `<svg>`
- Tables: `<table>`, `<thead>`, `<tbody>`, `<tr>`, `<th>`, `<td>`
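The form and table elements listed above might be combined as in the fragment below. It is a minimal sketch: the `/search` URL, the field names, and the option values are illustrative assumptions.

```html
<!-- Each control is tied to a visible label through for/id -->
<form action="/search" method="get">
  <label for="q">Search term</label>
  <input id="q" name="q" type="search" required>

  <label for="format">Format</label>
  <select id="format" name="format">
    <option value="html">HTML</option>
    <option value="md">Markdown</option>
  </select>

  <label for="notes">Notes</label>
  <textarea id="notes" name="notes" rows="3"></textarea>

  <button type="submit">Search</button>
</form>

<!-- Tabular data: caption names the table, scope ties headers to cells -->
<table>
  <caption>File extensions by format</caption>
  <thead>
    <tr><th scope="col">Format</th><th scope="col">Extension</th></tr>
  </thead>
  <tbody>
    <tr><th scope="row">HTML</th><td>.html, .htm</td></tr>
    <tr><th scope="row">Markdown</th><td>.md</td></tr>
  </tbody>
</table>
```

Explicit labels and `scope` attributes cost little and give assistive technology the associations it needs to announce each field and cell.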
How to Work With It
Opening / Viewing
- Any web browser (Chrome, Firefox, Safari, Edge) — this is what browsers are built for
- Text editors show the source markup
- Browser DevTools (F12) provide live DOM inspection
Creating
- Any text editor (VS Code, Sublime Text, Vim, Notepad++)
- WYSIWYG editors: Dreamweaver, BlueGriffon, WordPress block editor
- Static site generators output HTML from templates + content
- Frameworks: React, Vue, Angular, Svelte generate HTML dynamically
Parsing
- JavaScript (browser): `document.querySelector()`, DOM APIs (native and fast)
- JavaScript (Node.js): `cheerio`, `jsdom`, `htmlparser2`
- Python: `BeautifulSoup` (bs4), `lxml.html`, `html.parser` (stdlib)
- Go: `golang.org/x/net/html`
- Ruby: `Nokogiri`
- Command line: `pup` (jq for HTML), `xmllint --html`
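For the in-browser option, a page can query its own parsed DOM with `document.querySelector()` and `document.querySelectorAll()`. The page below is only a sketch; the element ids and the example.com URLs are made up for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>DOM query sketch</title>
</head>
<body>
  <main>
    <h1>Reading list</h1>
    <ul id="links">
      <li><a href="https://example.com/a">First link</a></li>
      <li><a href="https://example.com/b">Second link</a></li>
    </ul>
    <p id="summary"></p>
  </main>

  <script>
    // querySelector returns the first match; querySelectorAll returns a static NodeList.
    const heading = document.querySelector("main h1");
    const anchors = document.querySelectorAll("#links a[href]");

    // Read each link's resolved URL from the parsed DOM, not from the source text.
    const urls = Array.from(anchors, (a) => a.href);

    // textContent inserts plain text, so nothing here is parsed as markup.
    document.querySelector("#summary").textContent =
      `${heading.textContent}: ${urls.length} links found`;
  </script>
</body>
</html>
```

The script sits at the end of `<body>` so the elements it queries already exist when it runs.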
Converting
- To PDF: Browser print, `wkhtmltopdf`, Puppeteer, WeasyPrint
- To Markdown: `turndown` (JS), `html2text` (Python), Pandoc
- To plain text: `lynx -dump`, `w3m -dump`, `html2text`
- From Markdown: Any Markdown parser, Pandoc, static site generators
- From DOCX: `mammoth` (clean semantic HTML), Pandoc
Validation
- W3C Validator: validator.w3.org
- HTML-validate: npm package for CI/CD
- axe-core: Accessibility validation
Common Use Cases
- Web pages and web applications
- Email templates (HTML email with inline CSS)
- Documentation sites
- Electronic books (EPUB uses XHTML internally)
- Application UIs (Electron, Tauri, PWAs)
- Data display and reporting dashboards
- Single-page applications (React, Vue, Angular)
Pros & Cons
Pros
- Universal rendering in every web browser on every device
- Human-readable markup
- Rich semantic element vocabulary for accessibility
- Enormous ecosystem of tools, frameworks, and libraries
- Living standard with continuous improvements
- Native support for multimedia, forms, and interactivity
- Accessible when properly structured (screen readers, keyboard navigation)
Cons
- Verbose compared to lightweight markup languages
- Presentation requires CSS (HTML alone looks unstyled)
- Inconsistent rendering across email clients (HTML email is notoriously difficult)
- Easy to write invalid or inaccessible HTML
- Complex applications require JavaScript (often a framework), not just HTML
- Not a document exchange format — layout depends on browser and viewport
Compatibility
| Platform | Support |
|---|---|
| All browsers | Universal — Chrome, Firefox, Safari, Edge, Opera |
| All platforms | Windows, macOS, Linux, iOS, Android |
| Email clients | Supported with significant limitations (inline CSS only, limited elements) |
| Ebook | EPUB uses XHTML internally |
| Assistive tech | Screen readers parse HTML semantics |
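The limited-support row above (inline CSS only, limited elements) is the reason HTML email is usually built differently from a web page. The sketch below shows one common shape, a table-based layout with inline styles. The widths, colors, copy, and the example.com link are placeholder assumptions, and real templates still need testing in the target clients.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Order confirmation</title>
</head>
<body style="margin:0; padding:0; background-color:#f4f4f4;">
  <!-- role="presentation" marks the table as layout only for assistive technology -->
  <table role="presentation" width="100%" cellpadding="0" cellspacing="0" border="0">
    <tr>
      <td align="center" style="padding:24px 12px;">
        <!-- Fixed-width inner table acts as the content column -->
        <table role="presentation" width="600" cellpadding="0" cellspacing="0" border="0"
               style="background-color:#ffffff;">
          <tr>
            <td style="padding:24px; font-family:Arial, sans-serif; font-size:16px; line-height:24px; color:#222222;">
              <h1 style="margin:0 0 16px; font-size:22px;">Thanks for your order</h1>
              <p style="margin:0 0 16px;">Your order is being prepared.</p>
              <a href="https://example.com/orders"
                 style="color:#0b5fff; text-decoration:underline;">View order</a>
            </td>
          </tr>
        </table>
      </td>
    </tr>
  </table>
</body>
</html>
```

Layout tables are an email-specific concession; on ordinary web pages, tables remain reserved for tabular data.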
Related Formats
- XHTML: Strict XML-conformant variant of HTML
- XML (.xml): Generic markup language; HTML5 is not XML (but can be served as XHTML)
- SVG (.svg): XML-based vector graphics, embeddable in HTML
- CSS (.css): Companion styling language
- Markdown (.md): Lightweight syntax that converts to HTML
- JSX/TSX: HTML-like syntax in React components
Practical Usage
- Semantic structure first: Start every page with proper semantic elements (`<header>`, `<nav>`, `<main>`, `<article>`, `<section>`, `<footer>`) before adding div wrappers. This improves accessibility, SEO, and maintainability at no cost.
- Meta tags for social sharing: Include Open Graph (`og:title`, `og:description`, `og:image`) and Twitter Card meta tags so that shared links display rich previews on social media platforms.
- Responsive images: Use `<picture>` with `<source>` elements or `<img srcset>` to serve appropriately sized images based on viewport width and device pixel ratio. This dramatically reduces page weight on mobile devices.
- HTML email testing: When building HTML emails, use inline CSS only, table-based layouts for Outlook compatibility, and test across Gmail, Apple Mail, Outlook, and Yahoo. Use tools like Litmus or Email on Acid for cross-client testing.
- Structured data for SEO: Add JSON-LD `<script type="application/ld+json">` blocks with Schema.org markup (Article, Product, FAQ, Event) to help search engines understand and richly display your content.
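The social metadata, responsive image, and structured data practices above could look roughly like this on one page. Everything specific here is an assumption made for illustration: the example.com URLs, the image file names and sizes, the date, and the author name.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Example article</title>

  <!-- Open Graph and Twitter Card metadata for link previews -->
  <meta property="og:title" content="Example article">
  <meta property="og:description" content="A one-sentence summary of the article.">
  <meta property="og:image" content="https://example.com/images/cover-1200.jpg">
  <meta name="twitter:card" content="summary_large_image">

  <!-- Schema.org Article described as JSON-LD (placeholder values) -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example article",
    "image": "https://example.com/images/cover-1200.jpg",
    "datePublished": "2024-01-15",
    "author": { "@type": "Person", "name": "Jane Doe" }
  }
  </script>
</head>
<body>
  <main>
    <article>
      <h1>Example article</h1>
      <!-- The browser uses the first <source> whose type it supports, then
           picks a srcset candidate from the sizes hint and pixel ratio -->
      <picture>
        <source type="image/avif"
                srcset="cover-480.avif 480w, cover-1200.avif 1200w"
                sizes="(max-width: 600px) 100vw, 600px">
        <img src="cover-1200.jpg"
             srcset="cover-480.jpg 480w, cover-1200.jpg 1200w"
             sizes="(max-width: 600px) 100vw, 600px"
             alt="Cover illustration for the example article"
             width="1200" height="675">
      </picture>
    </article>
  </main>
</body>
</html>
```

The plain `<img>` inside `<picture>` doubles as the fallback for browsers that support none of the `<source>` types, so it always carries the `alt` text.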
Anti-Patterns
- Using divs for everything instead of semantic elements: `<div>` and `<span>` carry no semantic meaning. Using `<div class="header">` instead of `<header>` harms accessibility (screen readers cannot identify page regions) and SEO.
- Omitting alt text on images: Every `<img>` must have an `alt` attribute. Decorative images should use `alt=""` (empty). Missing alt text makes images invisible to screen reader users and fails WCAG accessibility requirements.
- Nesting interactive elements: Placing `<a>` inside `<button>` or `<button>` inside `<a>` is invalid HTML and behaves inconsistently across browsers and assistive technologies. Interactive elements must not be nested; restructure the markup instead.
- Using tables for page layout: Tables should only be used for tabular data. Using tables for visual layout breaks on small screens, confuses screen readers, and is extremely difficult to maintain. Use CSS flexbox or grid for layout.
- Inline styles and event handlers throughout the document: Mixing `style="..."` and `onclick="..."` attributes in HTML creates unmaintainable code. Separate concerns: use external CSS files for styling and external JavaScript files for behavior.
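A corrected version of several of the anti-patterns above might look like the fragment below. It is a sketch only; the file names, link targets, and the `app.js` script are hypothetical.

```html
<!-- Semantic landmarks instead of <div class="header"> and similar -->
<header>
  <nav aria-label="Primary">
    <a href="/">Home</a>
    <a href="/docs">Docs</a>
  </nav>
</header>

<main>
  <!-- Informative image: alt describes the content -->
  <img src="team.jpg" alt="Five team members standing outside the office">

  <!-- Decorative image: empty alt so screen readers skip it -->
  <img src="divider.png" alt="">

  <!-- One interactive element per control: a link for navigation,
       a button for an action, never one inside the other -->
  <a href="/pricing">See pricing</a>
  <button type="button" id="save-button">Save draft</button>
</main>

<!-- Behavior is attached from an external file rather than onclick="..." -->
<script src="app.js" defer></script>
```

The `id` on the button gives the external script a hook for attaching its event listener, which keeps the markup free of inline handlers.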