
Unicode Character Table: Browse Blocks, Find Codepoints & Understand UTF-8 Encoding

Browse every Unicode block — Latin, Cyrillic, CJK, symbols, arrows, math, emoji. See codepoints, UTF-8 and UTF-16 encoding, and HTML entities for every character. Essential for internationalization.

ToolsVito Team

What Unicode Is

Unicode is a universal character encoding standard that assigns a unique number (a codepoint, written as U+ followed by hexadecimal digits) to each character across the world's writing systems, including Latin, Cyrillic, Arabic, CJK, and Devanagari, as well as symbols, emoji, mathematical notation, and historical scripts. As of Unicode 16.0 (2024), the standard defines 154,998 characters across 168 scripts. UTF-8, the dominant encoding on the web, represents each Unicode codepoint in 1 to 4 bytes.
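To make the codepoint idea concrete, here is a minimal Python sketch that uses only built-ins (`ord`, `chr`, `str.encode`). The helper name `describe` is ours, chosen for illustration.

```python
def describe(ch: str) -> str:
    """Return U+XXXX notation plus the UTF-8 byte count for one character."""
    cp = ord(ch)                        # character -> codepoint (an integer)
    n_bytes = len(ch.encode("utf-8"))   # bytes UTF-8 needs for this codepoint
    unit = "byte" if n_bytes == 1 else "bytes"
    return f"U+{cp:04X} ({n_bytes} {unit} in UTF-8)"


# "\u00e9" is é written as an escape so it is a single codepoint
for ch in ["A", "\u00e9", "世", "😀"]:
    print(ch, describe(ch))

# chr() is the inverse of ord(): codepoint -> character
assert chr(0x4E16) == "世"
```

Note that `ord` accepts exactly one codepoint, so a letter typed as a base character plus a combining accent would need to be handled one codepoint at a time.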

How Unicode Is Organized

Unicode divides the codepoint space (U+0000–U+10FFFF) into 17 planes, each with 65,536 codepoints, for 1,114,112 codepoints in total:

  • Plane 0 (BMP, Basic Multilingual Plane): U+0000 to U+FFFF. Contains almost all modern writing systems. Most text fits entirely in the BMP.
  • Plane 1 (SMP, Supplementary Multilingual Plane): U+10000 to U+1FFFF. Historic scripts, music notation, emoji.
  • Plane 2 (SIP, Supplementary Ideographic Plane): U+20000 to U+2FFFF. Rare and historic CJK characters.
  • Plane 3 (TIP, Tertiary Ideographic Plane): U+30000 to U+3FFFF. Further rare CJK ideographs (CJK Unified Ideographs Extensions G and H).
  • Planes 4–13: Unassigned (reserved for future use).
  • Plane 14 (SSP, Supplementary Special-purpose Plane): U+E0000 to U+EFFFF. Tag characters and supplementary variation selectors.
  • Planes 15–16 (Supplementary Private Use Areas A and B): Private-use codepoints that Unicode guarantees never to assign, free for application-specific use.
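Because every plane holds exactly 65,536 (2^16) codepoints, the plane number is simply the codepoint shifted right by 16 bits. The sketch below illustrates this; `plane_of`, `plane_label`, and the `PLANE_LABELS` table are hypothetical names, and the table only covers the abbreviations used in the list above.

```python
# Short labels for the planes named in the list above; others fall back to "Plane N".
PLANE_LABELS = {
    0: "BMP",
    1: "SMP",
    2: "SIP",
    14: "SSP",
    15: "Private use",
    16: "Private use",
}


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) that contains codepoint cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode codepoint: {cp:#x}")
    return cp >> 16  # each plane spans 0x10000 codepoints


def plane_label(cp: int) -> str:
    """Return a short label for the plane containing cp."""
    plane = plane_of(cp)
    return PLANE_LABELS.get(plane, f"Plane {plane}")


for ch in ["A", "世", "😀"]:
    print(ch, plane_of(ord(ch)), plane_label(ord(ch)))
```

The low 16 bits (`cp & 0xFFFF`) give the character's offset within its plane.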

Major Unicode Blocks

  • Basic Latin (U+0000–U+007F): ASCII — the first 128 characters.
  • Latin-1 Supplement (U+0080–U+00FF): Accented letters, punctuation — used by Western European languages.
  • Cyrillic (U+0400–U+04FF): Russian, Ukrainian, Bulgarian, Serbian, and other languages written in the Cyrillic script.
  • CJK Unified Ideographs (U+4E00–U+9FFF): Chinese, Japanese, and Korean characters — the largest block in the BMP.
  • Arrows (U+2190–U+21FF): Common arrows in many directions and styles: → ← ↑ ↓ ↔ ⇒ ⇐.
  • Mathematical Operators (U+2200–U+22FF): ∀ ∃ ∈ ∉ ∑ ∏ ∫ ∮.
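  • Emoticons (U+1F600–U+1F64F): Face and gesture emoji, 😀 through 🙏. Other emoji live in neighboring blocks such as Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF).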
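Since blocks are contiguous codepoint ranges, finding a character's block is a range lookup. Python's standard library does not expose block names, so this sketch hard-codes only the ranges from the list above; `BLOCKS` and `block_of` are illustrative names, and a real tool would load the full table from the Unicode Character Database instead.

```python
from typing import Optional

# (first codepoint, last codepoint, block name), limited to the blocks listed above
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2190, 0x21FF, "Arrows"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]


def block_of(ch: str) -> Optional[str]:
    """Return the block name for a single character, or None if not in BLOCKS."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None  # the character belongs to a block this small table omits


for ch in ["A", "Ж", "→", "∑", "世", "😀"]:
    print(ch, block_of(ch))
```

A linear scan is fine for seven entries; with the full block list, a binary search over the sorted start points would be the usual choice.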

UTF-8 Encoding: How Codepoints Become Bytes

U+0041 ('A')    → 0x41                    (1 byte)
U+00E9 ('é')    → 0xC3 0xA9               (2 bytes)
U+4E16 ('世')   → 0xE4 0xB8 0x96          (3 bytes)
U+1F600 ('😀')  → 0xF0 0x9F 0x98 0x80     (4 bytes)

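The byte count depends only on the codepoint's range: U+0000–U+007F takes 1 byte, U+0080–U+07FF takes 2, U+0800–U+FFFF takes 3, and supplementary characters (U+10000 and above) take 4. BMP characters therefore encode in 1–3 bytes. Because the ASCII range (U+0000–U+007F) encodes as the same single byte values that ASCII uses, any valid ASCII text is also valid UTF-8.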
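The examples above can be reproduced by hand. The following sketch spells out the UTF-8 bit layout (a length-marking lead byte followed by continuation bytes of the form 10xxxxxx, each carrying 6 bits) and checks the result against Python's built-in encoder. The function name `utf8_encode` is ours; in practice `str.encode("utf-8")` is what you would call.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one codepoint as UTF-8 by hand, to show the bit layout."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # surrogates are reserved for UTF-16 and are not valid in UTF-8
        raise ValueError(f"cannot encode {cp:#x}")
    if cp <= 0x7F:       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


for cp in (0x41, 0xE9, 0x4E16, 0x1F600):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in encoder
    print(f"U+{cp:04X} ->", " ".join(f"0x{b:02X}" for b in encoded))
```

Decoding reverses the process: the lead byte's high bits say how many continuation bytes follow, which is also why a decoder can resynchronize after a corrupted byte.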

Browse Unicode Now

Use ToolsVito's Unicode Character Table to browse Unicode by block — Latin, Cyrillic, CJK, symbols, arrows, math, emoji. See codepoint, UTF-8 and UTF-16 encoding, and HTML entity for every character. All in your browser.
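For readers who want the same fields in a script, here is a rough Python sketch of how they can be derived for one character. It is not how the ToolsVito tool is implemented; `utf16_units` and `char_info` are hypothetical helpers, and the HTML form shown is the generic hexadecimal numeric character reference rather than a named entity.

```python
from typing import Dict, List


def utf16_units(cp: int) -> List[int]:
    """Return the UTF-16 code units for a codepoint (assumes a valid, non-surrogate cp)."""
    if cp < 0x10000:
        return [cp]                      # BMP: one 16-bit unit
    v = cp - 0x10000                     # 20 bits split across a surrogate pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


def char_info(ch: str) -> Dict[str, str]:
    """Collect codepoint, UTF-8, UTF-16, and HTML reference for one character."""
    cp = ord(ch)
    return {
        "codepoint": f"U+{cp:04X}",
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        "utf16": " ".join(f"{u:04X}" for u in utf16_units(cp)),
        "html": f"&#x{cp:X};",           # numeric character reference
    }


for ch in ["A", "世", "😀"]:
    print(ch, char_info(ch))
```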
