File Types and Extensions: What They Are and How Computers Use Them
A file type defines the internal format and structure of a file’s data. A file extension is the suffix after the final dot in a filename that identifies the file type to the operating system. This guide explains how file extensions work, covers 8 file type categories with format details, and compares image formats by size, quality, and transparency support.
What Is a File Type?
A file type is a specification for how binary data is organized inside a file. The file type determines which software can open the file, how the data is interpreted, and what operations are valid. A JPEG file stores compressed image data using discrete cosine transform (DCT) encoding.
A PNG file stores image data using lossless DEFLATE compression. The same pixels stored in each format produce different byte sequences because the internal structure differs.
Many binary file types can be identified by a “magic number”, a specific byte sequence at the start of the file (the file header). For example, every JPEG begins with the bytes FF D8 FF.
Every PDF begins with %PDF. File analysis tools (such as the Unix file command), security scanners, and many applications read these headers rather than trusting the extension alone, which identifies file types more reliably. Plain-text formats such as .txt or .csv have no magic number.
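A minimal Python sketch of header-based identification follows. The signature table is a small illustrative subset, and the names SIGNATURES, sniff_type, and sniff_file are hypothetical; real tools such as the Unix file command consult much larger signature databases.

```python
from typing import Optional

# A few well-known leading byte sequences ("magic numbers"); illustrative, not exhaustive.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),  # also the start of .docx, .xlsx, and other ZIP-based formats
]


def sniff_type(header: bytes) -> Optional[str]:
    """Return a format name based on the first bytes of a file, or None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first 16 bytes of a file and identify it by its header."""
    with open(path, "rb") as f:
        return sniff_type(f.read(16))


print(sniff_type(b"%PDF-1.7\n"))        # pdf
print(sniff_type(b"\xff\xd8\xff\xe0"))  # jpeg
```

The function never looks at the filename, so a renamed file is still identified by its content.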
What Is a File Extension?
A file extension is the suffix after the final period in a filename (e.g., .docx, .mp4, .exe). Extensions are a convention introduced by early operating systems (CP/M limited extensions to 3 characters; DOS followed). Modern filesystems impose no separate limit on extension length; the extension simply counts toward the overall filename limit, typically 255 characters.

Operating systems use extensions to determine the default program for opening a file. Windows stores extension-to-application mappings in the registry. macOS uses a Uniform Type Identifier (UTI) system. Linux desktop environments use MIME types from the shared-mime-info database.
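Python's pathlib module applies the same final-period rule. In the sketch below, the DEFAULT_PROGRAMS table and the program names in it are hypothetical stand-ins for the Windows registry, macOS UTIs, or shared-mime-info, and lowercasing the extension is an assumption made so that Photo.JPG and photo.jpg match.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical stand-in for an operating system's file-association database.
DEFAULT_PROGRAMS = {
    ".txt": "text-editor",
    ".jpg": "image-viewer",
    ".pdf": "pdf-reader",
    ".gz": "archive-manager",
}


def extension_of(filename: str) -> str:
    """Return the suffix after the final period, lowercased ('' if there is none)."""
    return PurePath(filename).suffix.lower()


def default_program(filename: str) -> Optional[str]:
    """Look up the program associated with a file's extension (None if unmapped)."""
    return DEFAULT_PROGRAMS.get(extension_of(filename))


print(extension_of("backup.tar.gz"))        # .gz  (only the final suffix counts)
print(PurePath("backup.tar.gz").suffixes)   # ['.tar', '.gz']
print(repr(extension_of(".bashrc")))        # ''  (a leading dot marks a hidden file, not an extension)
print(default_program("Photo.JPG"))         # image-viewer
```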
What Are MIME Types?
MIME (Multipurpose Internet Mail Extensions) types are standardized identifiers for file types used in web protocols and email. MIME types are independent of file extensions — they travel in HTTP headers and email headers to declare content type regardless of the filename.
MIME type format: type/subtype. Common examples:
- HTML file: text/html
- JPEG image: image/jpeg
- MP4 video: video/mp4
- JSON data: application/json
- PDF document: application/pdf
A web server that sends a file with the wrong MIME type can cause browsers to mishandle it. For example, browsers refuse to run a JavaScript file served as text/plain when the server also sends the X-Content-Type-Options: nosniff header, so the page's script never executes.
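A server chooses the Content-Type header independently of the filename, but it often derives the value from the extension. The sketch below does that with Python's standard mimetypes module; content_type_for is a hypothetical helper, and falling back to application/octet-stream for unknown extensions is a common convention rather than a requirement.

```python
import mimetypes

# A MimeTypes instance built from Python's bundled defaults, so the results do not
# depend on the host's /etc/mime.types file or Windows registry entries.
_MIME_DB = mimetypes.MimeTypes()


def content_type_for(filename: str) -> str:
    """Choose a Content-Type header value for a file being served over HTTP."""
    mime, _encoding = _MIME_DB.guess_type(filename)
    # Unknown extensions fall back to the generic binary type.
    return mime or "application/octet-stream"


for name in ["index.html", "photo.jpg", "clip.mp4", "data.json", "report.pdf", "blob.xyz123"]:
    print(f"{name}: Content-Type: {content_type_for(name)}")
```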
What Happens When a File Extension Is Changed?
Changing a file extension changes only the filename — it does not alter the internal file data. Renaming image.jpg to image.png does not convert the file to PNG format.
Programs that check the magic number will still identify the file as JPEG. Programs that rely solely on the extension may fail to open it or report errors.
Security risk: malicious executables are sometimes renamed with innocent extensions (.jpg, .txt) to evade casual inspection. Antivirus software reads file headers, not just extensions, to detect such threats.
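A mismatch between extension and content can be detected by comparing the two. This is a simplified sketch with a hypothetical extension_mismatch function and a tiny signature table; production scanners check far more formats and validate more than the first few bytes.

```python
from pathlib import PurePath
from typing import Optional

# Illustrative subset: (magic number, format name, extensions normally used).
KNOWN_HEADERS = [
    (b"\xff\xd8\xff", "jpeg", {".jpg", ".jpeg"}),
    (b"\x89PNG\r\n\x1a\n", "png", {".png"}),
    (b"%PDF", "pdf", {".pdf"}),
    # "MZ" is only two bytes, so a text file that happens to start with "MZ"
    # would be flagged too; real scanners also verify the PE header.
    (b"MZ", "windows-executable", {".exe", ".dll"}),
]


def extension_mismatch(filename: str, header: bytes) -> Optional[str]:
    """Return a warning when the header contradicts the extension, else None."""
    ext = PurePath(filename).suffix.lower()
    for magic, name, expected_exts in KNOWN_HEADERS:
        if header.startswith(magic):
            if ext in expected_exts:
                return None
            return f"{filename}: header says {name}, extension says {ext or '(none)'}"
    return None  # unrecognized header: nothing to compare against


print(extension_mismatch("image.png", b"\xff\xd8\xff\xe0"))
# image.png: header says jpeg, extension says .png
print(extension_mismatch("invoice.txt", b"MZ\x90\x00"))
# invoice.txt: header says windows-executable, extension says .txt
```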
8 File Type Categories with Formats and Sizes
The following 8 categories cover the most common file types:

1. Document Files
Document files store human-readable text and formatting. Common formats:
- .docx (Word): ZIP archive containing XML files. A 10-page document ≈ 50–200 KB.
- .pdf (Portable Document Format): Fixed-layout format; a 10-page document ≈ 100 KB to 5 MB depending on embedded images.
- .txt (Plain text): Raw UTF-8 or ASCII text, no formatting. 1,000 words ≈ 6 KB.
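Because a .docx file is a ZIP archive, Python's zipfile module can list its parts. The sketch builds a small stand-in archive in memory that reuses two part names found in real .docx packages ([Content_Types].xml and word/document.xml); it is not a valid Word document, and list_zip_parts and is_zip_container are hypothetical helpers.

```python
import io
import zipfile
from typing import List


def is_zip_container(data: bytes) -> bool:
    """ZIP-based formats (.zip, .docx, .xlsx) start with a local file header 'PK\\x03\\x04'."""
    return data[:4] == b"PK\x03\x04"


def list_zip_parts(data: bytes) -> List[str]:
    """List the member files inside a ZIP-based document such as a .docx."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


# Build a tiny stand-in archive in memory. It reuses two part names from real
# .docx packages but is NOT a valid Word document.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")
fake_docx = buf.getvalue()

print(is_zip_container(fake_docx))  # True
print(list_zip_parts(fake_docx))    # ['[Content_Types].xml', 'word/document.xml']
```

For a real document, pass the file's bytes instead, e.g. list_zip_parts(open("report.docx", "rb").read()), where report.docx is a placeholder name.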
2. Image Files
Image files store raster or vector graphics. Common formats:
- .jpg/.jpeg: Lossy compression using DCT. A 12-megapixel photo ≈ 3–6 MB.
- .png: Lossless compression, supports transparency (alpha channel). The same photo ≈ 15–30 MB, because lossless compression cannot discard detail.
- .webp: Google’s format; 25–35% smaller than JPEG at equivalent quality, supports transparency.
- .raw (plus vendor formats such as .cr2 and .nef): Unprocessed sensor data from cameras. A 12-megapixel raw file ≈ 18–36 MB.
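The image sizes above follow from simple arithmetic. The sketch below assumes a 12,000,000-pixel image, 8 bits per RGB channel for the decoded bitmap, and, for the raw case, one 12-bit value per pixel with no compression; actual raw bit depths and compression vary by camera, and uncompressed_bytes is a hypothetical helper.

```python
def uncompressed_bytes(pixels: int, channels: int, bits_per_channel: int) -> int:
    """Raw size of an image: pixels x channels x bits per channel, converted to bytes."""
    return pixels * channels * bits_per_channel // 8


MP12 = 12_000_000  # a 12-megapixel image, e.g. 4000 x 3000

rgb = uncompressed_bytes(MP12, channels=3, bits_per_channel=8)
print(rgb / 1e6)              # 36.0  (MB for a 24-bit RGB bitmap)

# Compression ratios implied by a 6 MB and a 3 MB JPEG of the same photo.
print(rgb / 6e6, rgb / 3e6)   # 6.0 12.0

# Assumed raw file: one sensor value per pixel at 12 bits, uncompressed.
print(uncompressed_bytes(MP12, channels=1, bits_per_channel=12) / 1e6)  # 18.0
```

At 36 MB uncompressed, a 3–6 MB JPEG represents roughly 6:1 to 12:1 compression, while a 15–30 MB PNG keeps every pixel and saves far less.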
3. Audio Files
Audio files store sound data in compressed or uncompressed form:
- .mp3: Lossy compression (MPEG-1 Layer III). 4-minute song at 128 kbps ≈ 4 MB.
- .wav: Uncompressed PCM audio. Same song at CD quality (44.1 kHz, 16-bit stereo) ≈ 40 MB.
- .flac: Lossless compression. Same song ≈ 20–25 MB (roughly 50–60% of the WAV size, with identical audio).
- .aac: Lossy compression; designed as the successor to MP3. Same song at 128 kbps ≈ 4 MB (the same size as MP3, since size depends on bitrate and duration), with higher quality than MP3 at that bitrate.
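The audio sizes come from bitrate × duration ÷ 8. The sketch below reproduces them, assuming constant bitrate and ignoring container headers and metadata; it also shows why an AAC file and an MP3 file at the same bitrate and length are the same size. The helper names are hypothetical.

```python
def stream_bytes(bitrate_bps: float, seconds: float) -> float:
    """Size of a constant-bitrate stream: bits per second x seconds / 8."""
    return bitrate_bps * seconds / 8


def pcm_bitrate(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio, as stored in a typical WAV file."""
    return sample_rate * bits_per_sample * channels


SONG = 4 * 60  # a 4-minute song, in seconds

print(stream_bytes(128_000, SONG) / 1e6)                     # 3.84  (MP3 or AAC at 128 kbps)
print(pcm_bitrate(44_100, 16, 2))                            # 1411200  (bits/s, CD audio)
print(stream_bytes(pcm_bitrate(44_100, 16, 2), SONG) / 1e6)  # 42.336  (WAV)
```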
4. Video Files
Video files contain compressed video streams, audio streams, and metadata:
- .mp4: H.264 or H.265 video in MPEG-4 container. 1 hour 1080p ≈ 4–8 GB (H.264).
- .mkv: Matroska container; supports multiple audio and subtitle tracks. File size is set by the codec and bitrate, not the container, so the same streams stored as .mkv or .mp4 are nearly the same size.
- .mov: Apple QuickTime container; used in macOS and iOS workflows.
- .avi: Legacy Microsoft container. It can hold compressed video (such as DivX or Xvid) or uncompressed video; uncompressed standard-definition footage can reach roughly 50–100 GB per hour.
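Containers can also be told apart by their headers. This sketch checks the Matroska EBML signature, the RIFF/AVI header, and the ftyp box used by MP4 and QuickTime files; identify_container is a hypothetical helper, and treating every non-QuickTime ftyp brand as MP4 is a simplification (the same box structure also appears in .m4a, .3gp, and HEIF/AVIF files, and some older .mov files have no ftyp box).

```python
from typing import Optional


def identify_container(header: bytes) -> Optional[str]:
    """Guess a video container from its first 12 bytes (None if unrecognized)."""
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "matroska"  # EBML header, used by .mkv (and .webm)
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"       # RIFF container whose form type is "AVI "
    if header[4:8] == b"ftyp":
        # ISO base media file: brand "qt  " marks QuickTime; other brands treated as MP4.
        return "quicktime" if header[8:12] == b"qt  " else "mp4"
    return None


print(identify_container(b"\x00\x00\x00\x20ftypisom"))  # mp4
print(identify_container(b"\x1a\x45\xdf\xa3\x01\x00"))  # matroska
```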
5. Executable Files
Executable and installer files contain either machine code the OS can run directly or packages that an installer or package manager unpacks:
- .exe: Windows Portable Executable (PE) format. Contains machine code, resources, and metadata.
- .msi: Windows Installer package. Contains installation logic and files in a structured database.
- .dmg: macOS disk image, typically containing an application bundle (.app folder).
- .deb: Debian/Ubuntu package format containing binary files and install scripts.
- .AppImage: Linux self-contained application bundle; runs without installation.
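A Windows PE file begins with a legacy DOS header that starts with MZ and stores, at offset 0x3C, the position of the PE signature (the letters PE followed by two zero bytes). The sketch below checks both, which is enough to recognize an executable regardless of its extension; is_pe_executable is a hypothetical name, and a full parser would also validate the headers that follow.

```python
import struct


def is_pe_executable(data: bytes) -> bool:
    """Check for an 'MZ' DOS stub that points to a valid PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The DOS header stores the PE header offset at 0x3C as a little-endian uint32.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Minimal synthetic buffer: "MZ", padding up to 0x3C, the offset 0x40, then the signature.
stub = b"MZ" + bytes(0x3C - 2) + struct.pack("<I", 0x40) + b"PE\x00\x00"
print(is_pe_executable(stub))                                    # True
print(is_pe_executable(b"just some text, renamed to .exe" * 3))  # False
```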
6. Archive Files
Archive files bundle multiple files into a single compressed container:
- .zip: DEFLATE compression; built into Windows, macOS, and most Linux desktop environments.
- .rar: Proprietary format; better compression ratio than ZIP for most file types. Usually requires third-party software to extract.
- .7z: LZMA/LZMA2 compression; typically 30–50% smaller than ZIP for the same input.
- .tar.gz: TAR archive (no compression) compressed with gzip; standard format for Linux software distribution.
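Python's standard library includes DEFLATE (through zlib; this is the method ZIP normally uses), gzip, and LZMA2 (through the .xz format, not .7z). The sketch below compares them on the same input. It assumes zlib's output is a fair proxy for ZIP even though the zlib wrapper adds a few bytes, and the actual ratios depend entirely on the data; random bytes stand in for already-compressed JPEG or MP4 content. compressed_sizes is a hypothetical helper.

```python
import gzip
import lzma
import os
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Compress the same input with three stdlib codecs and report sizes in bytes."""
    return {
        "original": len(data),
        "deflate (ZIP)": len(zlib.compress(data, 9)),
        "gzip (.gz)": len(gzip.compress(data, compresslevel=9)),
        "lzma2 (.xz)": len(lzma.compress(data, preset=9)),
    }


text = b"File types and extensions. " * 2000
print(compressed_sizes(text))   # repetitive text shrinks dramatically

noise = os.urandom(100_000)     # stands in for already-compressed data
print(compressed_sizes(noise))  # output is about the same size as the input, or slightly larger
```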
7. Code Files
Code files store human-readable source code as plain text:
- .py: Python source code. Interpreted by the Python runtime.
- .js: JavaScript source code. Executed by browsers or Node.js.
- .html: HyperText Markup Language. Rendered by browsers as web pages.
- .css: Cascading Style Sheets. Defines visual styling for HTML documents.
- .java: Java source code. Compiled to bytecode (.class files) executed by the JVM.
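Because code files are plain text, a simple heuristic can separate them from binaries such as executables: text rarely contains NUL bytes and should decode as UTF-8. The looks_like_text function below is a hypothetical sketch; it misclassifies UTF-16 text (which contains NUL bytes), and a sample cut in the middle of a multi-byte character can fail to decode.

```python
def looks_like_text(data: bytes) -> bool:
    """Heuristic: plain-text files contain no NUL bytes and decode as UTF-8.

    UTF-8 decoding also accepts pure ASCII, so .py, .js, .html, .css, and
    .java files written in either encoding pass this check.
    """
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_text(b"def main():\n    pass\n"))  # True
print(looks_like_text(b"MZ\x90\x00\x03\x00"))       # False (start of a Windows executable)
```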
8. Data Files
Data files store structured information for programs to read, process, and exchange:
- .csv: Comma-separated values; plain text table data readable by spreadsheets and databases.
- .json: JavaScript Object Notation; key-value structured data used in APIs and configuration.
- .xml: Extensible Markup Language; hierarchical structured data used in legacy APIs and document formats.
- .sql: Structured Query Language file; contains SQL statements such as schema definitions, data insertions, and queries.
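The difference between CSV and JSON shows up as soon as the data is parsed. This sketch uses Python's standard csv and json modules on the same two made-up records: CSV yields only strings, while JSON preserves numbers.

```python
import csv
import io
import json

# The same two (made-up) records in two of the data formats listed above.
CSV_TEXT = "name,size_kb\nreport.pdf,420\nlogo.png,35\n"
JSON_TEXT = '[{"name": "report.pdf", "size_kb": 420}, {"name": "logo.png", "size_kb": 35}]'

csv_rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
json_rows = json.loads(JSON_TEXT)

print(csv_rows[0])   # {'name': 'report.pdf', 'size_kb': '420'}  (CSV values are strings)
print(json_rows[0])  # {'name': 'report.pdf', 'size_kb': 420}    (JSON keeps the number type)
```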
Image Format Comparison: Size, Quality, and Transparency
| Format | Compression | Typical Size (12 MP photo) | Quality | Transparency | Best Use |
|---|---|---|---|---|---|
| JPEG (.jpg) | Lossy | 3–6 MB | Good (artifacts at low quality) | No | Photos, web images |
| PNG (.png) | Lossless | 15–30 MB | Perfect | Yes (alpha) | Logos, screenshots, graphics |
| WebP (.webp) | Lossy or Lossless | 2–4 MB (lossy) | Good to excellent | Yes | Web images (modern browsers) |
| GIF (.gif) | Lossless (LZW) | 50 KB–2 MB | Limited (256 colors) | Yes (1-bit) | Simple animations |
| AVIF (.avif) | Lossy or Lossless | 1–3 MB (lossy) | Excellent | Yes | Next-gen web images |
| RAW (.raw/.cr2) | None or lossless | 18–36 MB | Maximum (sensor data) | N/A | Professional photography |
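The transparency column can be checked programmatically for PNG files. A PNG starts with an 8-byte signature followed by the IHDR chunk, whose color type field records whether the image has an alpha channel (types 4 and 6). The sketch below parses that chunk; read_png_header is a hypothetical helper, it does not verify the chunk CRC, and it does not detect transparency supplied through a separate tRNS chunk.

```python
import struct
from typing import NamedTuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {0: "grayscale", 2: "RGB", 3: "palette", 4: "grayscale+alpha", 6: "RGBA"}


class PngInfo(NamedTuple):
    width: int
    height: int
    bit_depth: int
    color_type: str
    has_alpha_channel: bool


def read_png_header(data: bytes) -> PngInfo:
    """Parse the IHDR chunk, which must immediately follow the PNG signature."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # After the signature: 4-byte big-endian chunk length, then the 4-byte chunk type.
    length, chunk_type = struct.unpack_from(">I4s", data, 8)
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: IHDR chunk missing")
    # IHDR data: width (4), height (4), bit depth (1), color type (1), ...
    width, height, bit_depth, color_type = struct.unpack_from(">IIBB", data, 16)
    return PngInfo(width, height, bit_depth,
                   COLOR_TYPES.get(color_type, "unknown"),
                   color_type in (4, 6))


# Synthetic header for a 640 x 480 RGBA image (CRC bytes are a zero placeholder).
sample = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
          + struct.pack(">IIBBBBB", 640, 480, 8, 6, 0, 0, 0) + b"\x00" * 4)
print(read_png_header(sample))
# PngInfo(width=640, height=480, bit_depth=8, color_type='RGBA', has_alpha_channel=True)
```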
Key Takeaways
- A file type defines the internal binary structure of a file; a file extension is the filename suffix that hints at the type.
- Operating systems use file extensions to associate files with programs; security tools use magic numbers (file headers) for reliable identification.
- Changing a file extension does not convert the file — the internal data structure remains unchanged.
- MIME types identify file types in HTTP and email headers, independent of the filename extension.
- JPEG uses lossy compression; PNG uses lossless compression; WebP is 25–35% smaller than JPEG at equivalent quality.
- 7z archives are typically 30–50% smaller than ZIP archives for the same input data.
Frequently Asked Questions
What is the difference between a file type and a file extension?
A file type is the actual internal format of the data. A file extension is a filename suffix that signals the type. Extensions can be changed or omitted — the actual file type is determined by the internal header bytes (magic number), not the extension.
What happens if you rename a .exe file to .txt?
The file’s internal data does not change. A text editor attempting to open it will display garbled binary characters. The OS will no longer run it by double-clicking, but the file remains a valid executable and security tools will identify it by its PE header.
Is PNG or JPEG better for web images?
JPEG is better for photographs: 3–6 MB vs PNG’s 15–30 MB at equivalent pixel dimensions. PNG is better for logos and graphics with transparency. WebP is often the best compromise where browsers support it: 25–35% smaller than JPEG, with transparency support.
What is a MIME type?
A MIME type is a standardized string that identifies a file’s format in HTTP and email headers, formatted as type/subtype (e.g., image/jpeg, application/json). It operates independently of the filename extension to ensure correct content handling.
Which archive format compresses files the most?
7z typically achieves the best compression ratio using LZMA2, producing archives 30–50% smaller than ZIP for most file types. RAR also outperforms ZIP. The best format depends on file type — already-compressed files (JPEG, MP4) gain little from any archive format.
Last Thoughts on File Types and Extensions
File types and extensions are the system by which operating systems, browsers, and applications determine how to interpret and process data. Understanding the distinction between the extension (a human-readable label) and the actual file format (defined by internal structure and magic numbers) prevents errors in file handling, security analysis, and cross-platform data exchange. Format selection directly affects storage size, quality, compatibility, and processing speed.


