Unicode Character Table: Browse Blocks, Find Codepoints & Understand UTF-8 Encoding

What Unicode Is

Unicode is a universal character encoding standard that assigns a unique number, called a codepoint, to every character in every writing system it covers: Latin, Cyrillic, Arabic, CJK, Devanagari, and dozens more, plus symbols, emoji, mathematical notation, and historical scripts. As of Unicode 16.0 (2024), the standard defines 154,998 characters across 168 scripts. UTF-8, the dominant encoding on the web, represents each Unicode codepoint in 1 to 4 bytes.
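
As a quick illustration, the link between a character, its codepoint, and its UTF-8 length can be sketched with Python built-ins. The helper name describe is an assumption made for this example, not part of any library.

```python
def describe(ch):
    """Return the U+XXXX label and the UTF-8 byte count for one character."""
    cp = ord(ch)                        # character -> integer codepoint
    label = f"U+{cp:04X}"               # conventional notation, at least 4 hex digits
    n_bytes = len(ch.encode("utf-8"))   # how many bytes UTF-8 uses for it
    return label, n_bytes


for ch in ["A", "é", "世", "😀"]:
    label, n_bytes = describe(ch)
    print(label, ch, n_bytes)
```

The reverse direction is chr(): chr(0x4E16) gives back '世'.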

How Unicode Is Organized

Unicode divides the codepoint space (U+0000–U+10FFFF, 1,114,112 codepoints in total) into 17 planes, each with 65,536 codepoints:

Plane 0 (BMP — Basic Multilingual Plane): U+0000 to U+FFFF. Contains almost all modern writing systems. Most text fits entirely in the BMP.
Plane 1 (SMP — Supplementary Multilingual Plane): U+10000 to U+1FFFF. Historic scripts, music notation, emoji.
Plane 2 (SIP — Supplementary Ideographic Plane): U+20000 to U+2FFFF. Rare and historic CJK characters.
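Plane 3 (TIP): U+30000 to U+3FFFF. The Tertiary Ideographic Plane, which holds additional rare CJK ideographs (CJK Extensions G and H).
Planes 4–13: U+40000 to U+DFFFF. Unassigned (reserved for future use).
Plane 14 (SSP): U+E0000 to U+EFFFF. The Supplementary Special-purpose Plane, which holds tag characters and supplementary variation selectors.
Planes 15–16: U+F0000 to U+10FFFF. Supplementary Private Use Areas A and B. Unicode guarantees never to assign characters here, so they are free for application-specific use.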
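
Because every plane holds exactly 65,536 (2^16) codepoints, the plane number is simply the codepoint shifted right by 16 bits. A minimal sketch, with plane_of and plane_range as hypothetical helper names chosen for this example:

```python
PLANE_SIZE = 0x10000      # 65,536 codepoints per plane
MAX_CODEPOINT = 0x10FFFF  # last codepoint of plane 16


def plane_of(cp):
    """Return the plane number (0-16) that contains codepoint cp."""
    if not 0 <= cp <= MAX_CODEPOINT:
        raise ValueError(f"not a Unicode codepoint: {cp:#x}")
    return cp >> 16       # same as cp // PLANE_SIZE


def plane_range(plane):
    """Return the (first, last) codepoints of the given plane."""
    if not 0 <= plane <= 16:
        raise ValueError(f"no such plane: {plane}")
    first = plane * PLANE_SIZE
    return first, first + PLANE_SIZE - 1


print(plane_of(ord("A")))    # BMP
print(plane_of(ord("😀")))   # SMP
```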

Major Unicode Blocks

Basic Latin (U+0000–U+007F): ASCII — the first 128 characters.
Latin-1 Supplement (U+0080–U+00FF): Accented letters, punctuation — used by Western European languages.
Cyrillic (U+0400–U+04FF): Russian, Ukrainian, Bulgarian, Serbian, and other languages written in Cyrillic script, including non-Slavic ones such as Kazakh and Mongolian.
CJK Unified Ideographs (U+4E00–U+9FFF): Han ideographs shared by Chinese, Japanese, and Korean; the largest block in the BMP.
Arrows (U+2190–U+21FF): The most common arrow directions and styles: → ← ↑ ↓ ↔ ⇒ ⇐. Further arrows live in separate supplemental blocks.
Mathematical Operators (U+2200–U+22FF): ∀ ∃ ∈ ∉ ∑ ∏ ∫ ∮.
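Emoticons (U+1F600–U+1F64F): Emoji faces and gestures, 😀 through 🙏. Other emoji live in neighboring blocks such as Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF).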
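
Finding the block for a character is a range lookup. The sketch below hard-codes only the seven blocks listed above (using the official block name Emoticons for U+1F600–U+1F64F). The BLOCKS table and the block_of helper are assumptions made for this example; a complete lookup would load every range from the Unicode Character Database instead.

```python
# (first codepoint, last codepoint, block name), limited to the blocks listed above.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2190, 0x21FF, "Arrows"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]


def block_of(ch):
    """Return the block name for a single character, or None if it is not in the table."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


for ch in ["A", "Ж", "→", "∑", "世", "😀"]:
    print(ch, block_of(ch))
```

A linear scan is fine for a handful of ranges; with the full block list, a binary search over the sorted start values would be the more usual choice.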

UTF-8 Encoding: How Codepoints Become Bytes

U+0041 ('A')    → 0x41                    (1 byte)
U+00E9 ('é')    → 0xC3 0xA9               (2 bytes)
U+4E16 ('世')   → 0xE4 0xB8 0x96          (3 bytes)
U+1F600 ('😀')  → 0xF0 0x9F 0x98 0x80     (4 bytes)

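BMP characters (U+0000–U+FFFF) encode in 1 to 3 bytes: 1 byte for U+0000–U+007F, 2 bytes for U+0080–U+07FF, and 3 bytes for U+0800–U+FFFF. Supplementary characters (U+10000 and above) need 4 bytes. Because the ASCII range U+0000–U+007F encodes as the same single byte it has in ASCII, every ASCII file is already valid UTF-8.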
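
To make the byte layout concrete, here is a minimal hand-written encoder, checked against Python's built-in str.encode. The function name utf8_encode is hypothetical and exists only for illustration; real code should call the built-in encoder.

```python
def utf8_encode(cp):
    """Encode one codepoint as UTF-8 bytes (illustrative, not optimized)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Surrogates (U+D800-U+DFFF) are reserved for UTF-16 and invalid in UTF-8.
        raise ValueError(f"cannot encode {cp:#x}")
    if cp <= 0x7F:        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check the four examples above against the built-in encoder.
for cp in (0x41, 0xE9, 0x4E16, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} ->", " ".join(f"0x{b:02X}" for b in encoded))
```

The leading byte's high bits announce the sequence length, and every continuation byte starts with the bits 10, which is what lets a decoder resynchronize in the middle of a stream.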

Browse Unicode Now

Use ToolsVito's Unicode Character Table to browse Unicode by block: Latin, Cyrillic, CJK, symbols, arrows, math, emoji. See the codepoint, UTF-8 and UTF-16 encodings, and HTML entity for every character, all in your browser.