ASCII vs Unicode: Character Encoding Standards Explained

By Nizam Ud Deen. Last updated: July 8, 2026

Character encoding is the agreed map a computer uses to turn each character – a letter, a digit, a symbol, an emoji – into a number, and then into the bytes it stores. ASCII and Unicode are two such maps. ASCII is the original 1963 standard with 128 characters for English only; Unicode is the universal modern standard with room for over 1.1 million characters covering every language plus emoji. UTF-8 is the way Unicode is usually stored, and it now carries almost the entire web.

In short: ASCII maps 128 characters (English letters, digits, punctuation) using 7 bits. Unicode is one universal set with capacity for 1,114,112 code points across every script and emoji. UTF-8 is the dominant way to store Unicode – variable 1 to 4 bytes, fully ASCII-compatible, and used by about 98% of websites. Use UTF-8 for almost everything; ASCII only lives on inside it.

ASCII characters (7-bit): 128
Unicode code points: 1.1M+
UTF-8 bytes per character: 1–4
Web pages using UTF-8: ~98%

The three names answer three different questions. Here is the plain-language split:

ASCII

The original map. 128 characters, 7 bits, English only – letters, digits, punctuation, and control codes. The base every later standard kept.

Unicode

The universal map. One unique number (a code point) for every character in every language, plus symbols and emoji. Over 1.1 million slots. It is the numbering, not a storage format.

UTF-8

The storage format. The most common way to write Unicode out as bytes – 1 byte for English, up to 4 for emoji – and it keeps old ASCII files valid and unchanged.

What Is ASCII?

ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding from 1963 that defines 128 characters, numbered 0 to 127:

Purpose: it gave US computer makers one shared code, replacing the proprietary schemes that made data transfer between systems unreliable.
Coverage: 95 printable characters (letters, digits, punctuation) plus 33 non-printing control codes such as newline (10), tab (9), and null (0).
How it maps: each character is just a number stored in one byte, with the 8th bit unused.

A few core mappings show the pattern – the letter A is 65, and lowercase a is 97:

Space = 32
Digits 0–9 = 48–57
Uppercase A–Z = 65–90
Lowercase a–z = 97–122
Letter A = 65 (binary 01000001)
Letter a = 97 (binary 01100001)
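
The mappings above can be checked directly. This is a minimal sketch using Python's built-in ord() and chr(); the variable names are only illustrative.

```python
# Look up a few ASCII values with Python's built-in ord() and chr().
for ch in ["A", "a", "0", " "]:
    code = ord(ch)  # character -> number
    print(repr(ch), code, format(code, "08b"), hex(code))

# chr() goes the other way: number -> character.
letter = chr(65)

# Upper and lower case letters sit exactly 32 apart, a single bit.
case_gap = ord("a") - ord("A")
print(letter, case_gap)
```

The 32-step gap between cases is why flipping one bit converts A to a in ASCII.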

What Are ASCII’s Limitations?

ASCII’s core limit is that it only covers English, which broke down as computing went global:

No other scripts: no accented letters (the e-acute, n-tilde, u-umlaut family), and nothing for Chinese, Arabic, Hebrew, Cyrillic, or Japanese.
No emoji and few symbols – 128 slots fill up fast.
Fragmented fixes: the spare 8th bit spawned dozens of incompatible 256-character extensions (ISO 8859-1, Windows-1252), so a file made with one and opened with another showed garbled text (mojibake).
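
A small sketch of how mojibake arises, assuming Python's standard codecs. Windows-1251 (Cyrillic) is not named in the text above; it is used here only as a second 8-bit code page that disagrees with ISO 8859-1 about the same byte.

```python
raw = bytes([0xE9])  # one byte above the 7-bit ASCII range

as_latin1 = raw.decode("latin-1")  # ISO 8859-1 reads it as e-acute
as_cp1251 = raw.decode("cp1251")   # Windows-1251 reads it as a Cyrillic letter

# Print code points rather than glyphs so the output is terminal-safe.
print(f"U+{ord(as_latin1):04X}", f"U+{ord(as_cp1251):04X}")

# Plain ASCII rejects the byte entirely, because it only defines 0 to 127.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

The byte never changes; only the map used to read it does, which is the whole problem with region-specific code pages.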

Why this mattered: By the 1980s, international computing needed one encoding that could hold every writing system at once – not a patchwork of region-specific code pages that corrupted each other’s text.

What Is Unicode?

Unicode is one international standard, first published in 1991 and kept in step with ISO/IEC 10646 (the Universal Coded Character Set), that gives every character a single unique number:

Scale: Unicode 15.1 (2023) defines 149,813 characters across 161 modern and historic scripts, symbols, and emoji.
Code points: each character has one number written as U+ and hex – the letter A is U+0041 (same value as ASCII 65), the euro sign is U+20AC.
Capacity: 17 planes of 65,536 each give room for 1,114,112 code points, so there is space to keep adding for decades. Plane 0, the Basic Multilingual Plane, holds almost every character in everyday modern use; emoji and many historic scripts sit in the higher planes.
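
The plane arithmetic can be sketched in a few lines of Python. The helper name describe() is hypothetical, chosen only for this illustration.

```python
PLANES = 17
PLANE_SIZE = 0x10000  # 65,536 code points per plane
TOTAL_CODE_POINTS = PLANES * PLANE_SIZE

def describe(ch: str) -> str:
    """Return the U+ notation and plane number for one character."""
    cp = ord(ch)
    plane = cp >> 16  # same as cp // 65,536
    return f"U+{cp:04X} plane {plane}"

for ch in ["A", "\u20ac", "\U0001F600"]:  # A, euro sign, grinning face
    print(describe(ch))

print(TOTAL_CODE_POINTS)
```

Shifting right by 16 bits drops the position inside the plane and leaves the plane number, so anything at U+10000 or above lands outside plane 0.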

What Is UTF-8?

UTF-8 is a variable-width way to store Unicode using 1 to 4 bytes per character, and it is the encoding that made Unicode practical:

ASCII-compatible: the first 128 code points are 1 byte and identical to ASCII, so any ASCII file is already a valid UTF-8 file with the same bytes.
Compact for English: common text stays 1 byte per character; only rarer characters cost more.
Dominant: about 98% of websites use UTF-8, which is why it is the default for web pages, source code, and APIs.

The byte count grows with how far the character sits from the ASCII range:

Code points U+0000 to U+007F (ASCII range): 1 byte (identical to ASCII — fully backward compatible)
Code points U+0080 to U+07FF (Latin extended, Greek, Cyrillic, Arabic, Hebrew): 2 bytes
Code points U+0800 to U+FFFF (most Asian scripts, symbols): 3 bytes
Code points U+10000 to U+10FFFF (emoji, historic scripts, supplementary characters): 4 bytes
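
The four ranges above translate into a short rule. This sketch writes that rule by hand as a hypothetical utf8_length() helper and checks it against Python's own UTF-8 encoder.

```python
def utf8_length(code_point: int) -> int:
    """Bytes UTF-8 needs for one code point, following the four ranges."""
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4

samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, grinning face
lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    # The hand-written rule should agree with the real encoder.
    assert len(encoded) == utf8_length(ord(ch))
    lengths.append(len(encoded))
    print(f"U+{ord(ch):04X}", len(encoded), " ".join(f"{b:02X}" for b in encoded))
```

For the letter A, the UTF-8 output is the single byte 41, exactly what an ASCII encoder would produce.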

What Is UTF-16?

UTF-16 is another Unicode encoding that uses 2 or 4 bytes per character, common inside software rather than on the web:

Sizing: characters in the Basic Multilingual Plane take 2 bytes; characters above it take 4 bytes as a surrogate pair.
Where it lives: the native string encoding of the Windows API, Java, and JavaScript.
Trade-offs: it is not ASCII-compatible, and because each unit is 2 bytes the byte order matters, so data is either labelled UTF-16LE or UTF-16BE or starts with a Byte Order Mark. In return, it can be smaller than UTF-8 for text that is mostly Asian scripts.
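
To make the size trade-off concrete, here is a rough comparison in Python. The sample strings and the sizes() helper are assumptions picked for illustration, and the "utf-16-le" codec is used so that no Byte Order Mark is counted.

```python
english = "hello"
japanese = "\u65e5\u672c\u8a9e"  # three kanji, all inside the Basic Multilingual Plane

def sizes(text: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, with no Byte Order Mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

print(sizes(english))   # UTF-16 doubles plain English
print(sizes(japanese))  # UTF-16 is the smaller one here

# The plain "utf-16" codec prepends a 2-byte Byte Order Mark.
with_bom = "A".encode("utf-16")
print(len(with_bom))
```

Each kanji costs 3 bytes in UTF-8 but only 2 in UTF-16, which is where the saving for mostly Asian text comes from.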

What Is UTF-32?

UTF-32 is a fixed-width encoding that uses exactly 4 bytes for every character:

Upside: every character is the same size, so jumping to the Nth character is one calculation with no scanning.
Downside: English text becomes 4 times larger than UTF-8, since even a plain letter takes 4 bytes.
Best for: some internal processing pipelines that need fast random access; it is rare in files or web transfer.
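
The random-access property can be sketched as plain offset arithmetic. The nth_char() helper is hypothetical, and "utf-32-le" is chosen so the buffer carries no Byte Order Mark.

```python
text = "A\u20ac\U0001F600z"       # A, euro sign, grinning face, z
data = text.encode("utf-32-le")   # 4 bytes per code point

def nth_char(buf: bytes, n: int) -> str:
    """Fetch the n-th code point (0-based) by offset arithmetic alone."""
    start = 4 * n
    return buf[start:start + 4].decode("utf-32-le")

print(len(data))                          # 4 characters -> 16 bytes
print(f"U+{ord(nth_char(data, 2)):04X}")  # third character, no scanning needed
```

The same lookup in UTF-8 or UTF-16 would have to walk the bytes from the start, because earlier characters can have different widths.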

Emoji in Unicode

Emoji are ordinary Unicode characters with their own code points, not images bolted on:

Code point: the grinning face is U+1F600, which sits above the Basic Multilingual Plane.
Bytes: in UTF-8 it is 4 bytes (F0 9F 98 80); in UTF-16 it is the surrogate pair D83D DE00.
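Combined emoji: family emoji join several code points with a zero-width joiner (U+200D) to show one glyph, while skin-tone variants append a modifier code point (U+1F3FB to U+1F3FF) to the base emoji. Unicode 15.0 listed 3,664 emoji, and Emoji 15.1 raised the total to 3,782.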
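
The byte values quoted above can be reproduced by hand. This sketch assumes the standard UTF-16 surrogate formula; surrogate_pair() is a hypothetical helper name, and the family sequence is one illustrative zero-width-joiner example.

```python
def surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    offset = code_point - 0x10000     # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

grin = "\U0001F600"  # grinning face
high, low = surrogate_pair(ord(grin))
utf8_hex = " ".join(f"{b:02X}" for b in grin.encode("utf-8"))
print(utf8_hex, f"{high:04X} {low:04X}")

# Man + ZWJ + woman + ZWJ + girl: usually drawn as one family glyph.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
print(len(family), len(family.encode("utf-8")))
```

One visible glyph can therefore be several code points and many more bytes, so counting "characters" depends on which layer is being counted.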

How Big Is English Text in Each Encoding?

For plain English, ASCII and UTF-8 tie at 1 byte per character, while UTF-16 and UTF-32 cost more:

Bytes per English character (lower is better)

ASCII: 1 B
UTF-8: 1 B
UTF-16: 2 B
UTF-32: 4 B

ASCII and UTF-8: 1 byte each, which is why UTF-8 is a free upgrade for English-heavy content.
UTF-16: 2 bytes per character, doubling the size of English text.
UTF-32: 4 bytes per character regardless of the character.
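
These ratios are easy to measure. A minimal sketch, assuming an arbitrary English sample sentence and Python's built-in codecs:

```python
sentence = "The quick brown fox jumps over the lazy dog."

# The "-le" codec variants are used so no Byte Order Mark is added to the count.
codecs_to_try = ["ascii", "utf-8", "utf-16-le", "utf-32-le"]
sizes = {name: len(sentence.encode(name)) for name in codecs_to_try}

for name, size in sizes.items():
    per_char = size // len(sentence)
    print(f"{name:10} {size:4} bytes ({per_char} per character)")
```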

ASCII vs UTF-8 vs UTF-16 vs UTF-32 Comparison

The table lines up all four across age, capacity, byte width, ASCII compatibility, web usage, and typical use:

Property	ASCII	UTF-8	UTF-16	UTF-32
Year introduced	1963	1993	1996	1993 (as UCS-4; the UTF-32 name came later)
Character set size	128	1,114,112	1,114,112	1,114,112
Bytes per character	1 (fixed)	1–4 (variable)	2 or 4 (variable)	4 (fixed)
ASCII compatible	Yes	Yes	No	No
Web usage (2024)	Obsolete	98.2%	Rare	Rare
Primary use	Legacy systems	Web, files, APIs	Windows, Java, JS	Internal processing
English text size	1 byte/char	1 byte/char	2 bytes/char	4 bytes/char

Which to use: Pick UTF-8 for almost everything – web pages, files, source code, and APIs – because it is ASCII-compatible and compact. Reach for UTF-16 only inside platforms that already default to it (Windows, Java, JavaScript), and UTF-32 only when fixed-width random access is worth the extra size. Plain ASCII is legacy and now lives on as the first 128 code points of UTF-8.
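
In practice, "use UTF-8" mostly means naming the encoding explicitly instead of relying on a platform default. A minimal sketch in Python, where the file name note.txt is hypothetical and tempfile is used only so the example cleans up after itself:

```python
import os
import tempfile

text = "caf\u00e9 \u20ac5 \U0001F600"  # e-acute, euro sign, grinning face

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "note.txt")  # hypothetical file name
    # Name the encoding on both write and read.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
    size_on_disk = os.path.getsize(path)

print(round_trip == text, size_on_disk)
```

The text survives the round trip unchanged, and the file size equals the UTF-8 byte count rather than the number of characters.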

Last Thoughts on ASCII vs Unicode

ASCII gave English-language computing its first shared text code, but 128 characters could never serve a global internet. Unicode fixed that with one universal number space for every writing system, and UTF-8 made it practical by staying ASCII-compatible while encoding everything else. In short: think in Unicode, store in UTF-8, and treat ASCII as the compatible core you already get for free.

Key Takeaways:

ASCII is a 7-bit standard covering 128 characters (English only), published in 1963.
Unicode covers 149,813 characters across 161 scripts as of version 15.1 (2023).
UTF-8 uses 1–4 bytes per character, is backward-compatible with ASCII, and is used by 98.2% of websites.
UTF-16 uses 2 or 4 bytes and is the internal encoding of Windows, Java, and JavaScript.
UTF-32 uses a fixed 4 bytes per character — simple but memory-inefficient for English text.
The emoji 😀 = U+1F600 = 4 bytes in UTF-8 (F0 9F 98 80).

Frequently Asked Questions (FAQs)

What is the difference between ASCII and Unicode?

ASCII covers 128 characters (English only) using 7 bits. Unicode covers 149,813 characters (all world scripts) with code points up to 21 bits. UTF-8 is the most common Unicode encoding and is backward-compatible with ASCII.

What character does ASCII value 65 represent?

ASCII value 65 represents the uppercase letter A. In binary it is 01000001. In hexadecimal it is 0x41. Unicode code point U+0041 maps to the same character for full ASCII backward compatibility.

Why does UTF-8 dominate the web?

UTF-8 dominates because it is backward-compatible with ASCII, uses only 1 byte for English characters (minimizing file size), and supports every Unicode character. 98.2% of websites use UTF-8 as of 2024.

What is a Unicode code point?

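A Unicode code point is a unique integer assigned to each character, written as U+ followed by a hex number. Letter A = U+0041. Euro sign = U+20AC. The 😀 emoji = U+1F600. Each assigned code point identifies exactly one character, although what a reader sees as a single character, such as a family emoji, can be a sequence of several code points.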

Can UTF-8 handle all languages?

Yes. UTF-8 encodes all 149,813 Unicode characters using 1–4 bytes. It handles every modern and historic script including Chinese, Arabic, Hindi, Japanese, Korean, and all emoji defined in Unicode 15.1.