ASCII (/ˈæskiː/ ASS-kee) is an acronym for American Standard Code for Information Interchange. It’s a character encoding standard used in electronic communication to represent text in computers, telecommunications equipment, and other devices. ASCII uses 7 bits per character, which gives 128 code points covering both control characters and printable characters.

The first 32 code points (numbers 0-31 in decimal) and the last one (number 127 in decimal) are reserved for control characters, originally from Teletype models. While most of these control characters are outdated, some, like carriage return, line feed, and tab, are still in common use today.

Of the 128 characters, 95 are printable, including the digits 0-9, lowercase letters (a-z), uppercase letters (A-Z), and various punctuation symbols.
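The split between control and printable characters can be checked with a few lines of Python. This is only a sketch; the helper name is_control is made up for this example and is not part of any standard.

```python
def is_control(code: int) -> bool:
    """True for the 33 control codes: 0-31 and 127 (DEL)."""
    return code < 32 or code == 127

# Split the 128 code points into the two groups described above.
control = [c for c in range(128) if is_control(c)]
printable = [c for c in range(128) if not is_control(c)]

print(len(control), len(printable))          # 33 95
print("".join(chr(c) for c in printable))    # space, punctuation, digits, letters
```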

Note: The standard ASCII encoding is essential for learning programming. It uses 7 bits within one byte. To make use of the 8th bit (the highest bit), extended versions of ASCII were created. An extended ASCII set adds up to 128 more characters, such as special symbols, letters used in other languages, and graphical symbols. However, there is no single “extended ASCII”: different vendors and countries defined different 8-bit character sets, so the meaning of codes 128-255 depends on which set is in use.
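To see why the 8th bit matters, the sketch below decodes the same two bytes under two 8-bit character sets that ship with Python. The choice of "latin-1" and "cp437" is just for illustration; any pair of extended sets would make the same point.

```python
raw = bytes([0x41, 0xE9])   # 'A' plus one byte with the 8th bit set

# A byte is plain ASCII only if its highest bit (0x80) is clear.
print([(b & 0x80) == 0 for b in raw])   # [True, False]

# The same high byte decodes to different characters in different sets.
print(raw.decode("latin-1"))   # ISO 8859-1 (Western European)
print(raw.decode("cp437"))     # original IBM PC code page

# Strict ASCII decoding rejects the high byte outright.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err.reason)
```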

Below is the standard ASCII table. Codes 0-31 and 127 are control characters; all other codes are printable characters.

DEC OCT HEX BIN Symbol HTML Number HTML Name Description
0 000 00 00000000 &#00 Null character
1 001 01 00000001 &#01 Start of Heading
2 002 02 00000010 &#02 Start of Text
3 003 03 00000011 &#03 End of Text
4 004 04 00000100 &#04 End of Transmission
5 005 05 00000101 &#05 Enquiry
6 006 06 00000110 &#06 Acknowledge
7 007 07 00000111 &#07 Bell, Alert
8 010 08 00001000 &#08 Backspace
9 011 09 00001001 &#09 Horizontal Tab
10 012 0A 00001010 &#10 Line Feed
11 013 0B 00001011 &#11 Vertical Tabulation
12 014 0C 00001100 &#12 Form Feed
13 015 0D 00001101 &#13 Carriage Return
14 016 0E 00001110 &#14 Shift Out
15 017 0F 00001111 &#15 Shift In
16 020 10 00010000 &#16 Data Link Escape
17 021 11 00010001 &#17 Device Control One (XON)
18 022 12 00010010 &#18 Device Control Two
19 023 13 00010011 &#19 Device Control Three (XOFF)
20 024 14 00010100 &#20 Device Control Four
21 025 15 00010101 &#21 Negative Acknowledge
22 026 16 00010110 &#22 Synchronous Idle
23 027 17 00010111 &#23 End of Transmission Block
24 030 18 00011000 &#24 Cancel
25 031 19 00011001 &#25 End of medium
26 032 1A 00011010 &#26 Substitute
27 033 1B 00011011 &#27 Escape
28 034 1C 00011100 &#28 File Separator
29 035 1D 00011101 &#29 Group Separator
30 036 1E 00011110 &#30 Record Separator
31 037 1F 00011111 &#31 Unit Separator
32 040 20 00100000 &#32 Space
33 041 21 00100001 ! &#33 &excl Exclamation mark
34 042 22 00100010 " &#34 &quot Double quotes (or speech marks)
35 043 23 00100011 # &#35 &num Number sign
36 044 24 00100100 $ &#36 &dollar Dollar
37 045 25 00100101 % &#37 &percnt Per cent sign
38 046 26 00100110 & &#38 &amp Ampersand
39 047 27 00100111 ' &#39 &apos Single quote
40 050 28 00101000 ( &#40 &lparen Open parenthesis (or open bracket)
41 051 29 00101001 ) &#41 &rparen Close parenthesis (or close bracket)
42 052 2A 00101010 * &#42 &ast Asterisk
43 053 2B 00101011 + &#43 &plus Plus
44 054 2C 00101100 , &#44 &comma Comma
45 055 2D 00101101 - &#45 Hyphen-minus
46 056 2E 00101110 . &#46 &period Period, dot or full stop
47 057 2F 00101111 / &#47 &sol Slash or divide
48 060 30 00110000 0 &#48 Zero
49 061 31 00110001 1 &#49 One
50 062 32 00110010 2 &#50 Two
51 063 33 00110011 3 &#51 Three
52 064 34 00110100 4 &#52 Four
53 065 35 00110101 5 &#53 Five
54 066 36 00110110 6 &#54 Six
55 067 37 00110111 7 &#55 Seven
56 070 38 00111000 8 &#56 Eight
57 071 39 00111001 9 &#57 Nine
58 072 3A 00111010 : &#58 &colon Colon
59 073 3B 00111011 ; &#59 &semi Semicolon
60 074 3C 00111100 < &#60 &lt Less than (or open angled bracket)
61 075 3D 00111101 = &#61 &equals Equals
62 076 3E 00111110 > &#62 &gt Greater than (or close angled bracket)
63 077 3F 00111111 ? &#63 &quest Question mark
64 100 40 01000000 @ &#64 &commat At sign
65 101 41 01000001 A &#65 Uppercase A
66 102 42 01000010 B &#66 Uppercase B
67 103 43 01000011 C &#67 Uppercase C
68 104 44 01000100 D &#68 Uppercase D
69 105 45 01000101 E &#69 Uppercase E
70 106 46 01000110 F &#70 Uppercase F
71 107 47 01000111 G &#71 Uppercase G
72 110 48 01001000 H &#72 Uppercase H
73 111 49 01001001 I &#73 Uppercase I
74 112 4A 01001010 J &#74 Uppercase J
75 113 4B 01001011 K &#75 Uppercase K
76 114 4C 01001100 L &#76 Uppercase L
77 115 4D 01001101 M &#77 Uppercase M
78 116 4E 01001110 N &#78 Uppercase N
79 117 4F 01001111 O &#79 Uppercase O
80 120 50 01010000 P &#80 Uppercase P
81 121 51 01010001 Q &#81 Uppercase Q
82 122 52 01010010 R &#82 Uppercase R
83 123 53 01010011 S &#83 Uppercase S
84 124 54 01010100 T &#84 Uppercase T
85 125 55 01010101 U &#85 Uppercase U
86 126 56 01010110 V &#86 Uppercase V
87 127 57 01010111 W &#87 Uppercase W
88 130 58 01011000 X &#88 Uppercase X
89 131 59 01011001 Y &#89 Uppercase Y
90 132 5A 01011010 Z &#90 Uppercase Z
91 133 5B 01011011 [ &#91 &lsqb Opening bracket
92 134 5C 01011100 \ &#92 &bsol Backslash
93 135 5D 01011101 ] &#93 &rsqb Closing bracket
94 136 5E 01011110 ^ &#94 &Hat Caret - circumflex
95 137 5F 01011111 _ &#95 &lowbar Underscore
96 140 60 01100000 ` &#96 &grave Grave accent
97 141 61 01100001 a &#97 Lowercase a
98 142 62 01100010 b &#98 Lowercase b
99 143 63 01100011 c &#99 Lowercase c
100 144 64 01100100 d &#100 Lowercase d
101 145 65 01100101 e &#101 Lowercase e
102 146 66 01100110 f &#102 Lowercase f
103 147 67 01100111 g &#103 Lowercase g
104 150 68 01101000 h &#104 Lowercase h
105 151 69 01101001 i &#105 Lowercase i
106 152 6A 01101010 j &#106 Lowercase j
107 153 6B 01101011 k &#107 Lowercase k
108 154 6C 01101100 l &#108 Lowercase l
109 155 6D 01101101 m &#109 Lowercase m
110 156 6E 01101110 n &#110 Lowercase n
111 157 6F 01101111 o &#111 Lowercase o
112 160 70 01110000 p &#112 Lowercase p
113 161 71 01110001 q &#113 Lowercase q
114 162 72 01110010 r &#114 Lowercase r
115 163 73 01110011 s &#115 Lowercase s
116 164 74 01110100 t &#116 Lowercase t
117 165 75 01110101 u &#117 Lowercase u
118 166 76 01110110 v &#118 Lowercase v
119 167 77 01110111 w &#119 Lowercase w
120 170 78 01111000 x &#120 Lowercase x
121 171 79 01111001 y &#121 Lowercase y
122 172 7A 01111010 z &#122 Lowercase z
123 173 7B 01111011 { &#123 &lcub Opening brace
124 174 7C 01111100 | &#124 &verbar Vertical bar
125 175 7D 01111101 } &#125 &rcub Closing brace
126 176 7E 01111110 ~ &#126 &tilde Equivalency sign - tilde
127 177 7F 01111111 &#127 Delete
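
The numeric columns of the table are the same value written in four bases, so any row can be regenerated. The helper below is a small sketch (the name table_row is hypothetical) that uses Python format specifiers to produce the DEC, OCT, HEX and BIN columns.

```python
def table_row(code: int) -> str:
    """Return the DEC, OCT, HEX and BIN columns for one ASCII code."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes range from 0 to 127")
    # 03o: 3-digit octal, 02X: 2-digit uppercase hex, 08b: 8-digit binary
    return f"{code} {code:03o} {code:02X} {code:08b}"

print(table_row(10))        # 10 012 0A 00001010
print(table_row(ord("A")))  # 65 101 41 01000001
```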

Explanation of Control Characters

In ASCII encoding, the first 32 characters (from 0 to 31) and the last character (number 127) are invisible (non-displayable), but they all serve special functions. These are called control characters or function codes.

These 33 control characters are mostly related to communication, data storage, and older devices, though some of their meanings have changed in modern computers. Some of these control characters might require a bit of technical knowledge to fully understand, so beginners can skip over the more complex ones and focus on the easier concepts.

Here are some examples of control characters and their functions:

NUL (0)
The NULL character. Originally, it was intended to be a “no operation” (NOP) character, meaning it doesn’t do anything. It can be thought of as an empty placeholder.

In the early days of computing, NUL was used in punched tape systems to reserve a spot for later use—kind of like leaving a blank space in case you want to add something later.

Later on, NUL was adopted in the C programming language to mark the end of a string. If a NUL character appears within a string, it signals the end of that string. This makes it easy to define strings of any length, as long as there’s enough memory, and you can always terminate the string with a \0 (NULL), indicating the end of the string.
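
Python strings are not NUL-terminated, but the standard ctypes module exposes C-style buffers, which makes the behavior easy to observe. The sketch below stores a NUL in the middle of a buffer and shows that a C-style read stops there.

```python
import ctypes

# A buffer holding "hello", a NUL, "world", and the automatic trailing NUL.
buf = ctypes.create_string_buffer(b"hello\0world")

print(buf.raw)     # every byte in the buffer, NULs included
print(buf.value)   # b'hello' (reading stops at the first NUL, as in C)
print(len(buf.raw), len(buf.value))   # 12 5
```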

SOH (1) – Start Of Heading

SOH stands for “Start of Heading,” which is used to mark the beginning of a message or command in communication systems. It was originally defined in the 1963 ASCII standard as “Start of Message,” but later changed to “Start of Heading.”

Today, SOH is still used in some master-slave protocols that run over serial links such as RS-232. In this setup, the master device begins each message with SOH. This helps the slave device resynchronize if there’s an error during data transmission, because it can discard input until the next SOH arrives. Without a clear marker like SOH to indicate the start of each command, resynchronization would be much harder to achieve.

STX (2) and ETX (3) – Start of Text / End of Text

STX stands for “Start of Text” and marks the beginning of the actual data or message. ETX, on the other hand, stands for “End of Text” and signals the end of the data.

When transmitting data through a protocol, the transmission is often divided into “frames.” Each frame includes a header, which contains addressing information—like who the message is for and where it’s going—followed by the actual data you want to send.

STX marks the start of this data, and ETX marks the end. The specific content of the data, however, is defined by the protocol you’re using, not by ASCII.

In short, a frame consists of:

  • SOH (Start of Heading) marks the beginning of the frame header.
  • STX (Start of Text) marks the start of the data.
  • ETX (End of Text) marks the end of the data.
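
The frame layout above can be sketched in Python. The functions build_frame and parse_frame and the sample header are invented for illustration; real protocols add their own rules (checksums, escaping, length fields), and this sketch assumes the header and data never contain SOH, STX or ETX themselves.

```python
SOH, STX, ETX = b"\x01", b"\x02", b"\x03"

def build_frame(header: bytes, data: bytes) -> bytes:
    """Wrap a header and data as SOH header STX data ETX."""
    return SOH + header + STX + data + ETX

def parse_frame(frame: bytes) -> tuple[bytes, bytes]:
    """Split a frame back into (header, data); raise on malformed input."""
    if not (frame.startswith(SOH) and frame.endswith(ETX)):
        raise ValueError("frame must start with SOH and end with ETX")
    header, sep, data = frame[1:-1].partition(STX)
    if not sep:
        raise ValueError("frame has no STX")
    return header, data

frame = build_frame(b"TO:42", b"hello")
print(frame)                # b'\x01TO:42\x02hello\x03'
print(parse_frame(frame))   # (b'TO:42', b'hello')
```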

BEL (7)
The BEL character, also known as “bell,” triggers a sound, usually a beep, to grab attention. It’s part of the ASCII character set and was commonly used in early computing systems to alert users or signal problems. While the beep itself doesn’t come from the sound card or speakers, it’s usually emitted by a small internal speaker or buzzer connected to the motherboard. This sound was often heard when a system encountered errors or when a computer started up. However, modern computers often no longer have internal buzzers, so even if the BEL character is triggered, you might not hear anything.

BS (8)
The BS character stands for Backspace, the character sent by the backspace key on your keyboard. Over time, its role has evolved. Originally, on typewriters and early printers, backspace moved the carriage or print head back one position without erasing anything, so the next character was printed on top of the previous one. This overstriking was used for accents and emphasis. For example, typing “a”, then BS, then “^” printed both marks in the same position, producing something like “â”.

In modern systems, backspace moves the cursor back and also deletes the character before the cursor. This combined behavior is now standard, but the name still reflects its origins in physical typewriters.

HT (9)
The HT character, or Horizontal Tab, moves the cursor to the next tab stop. It’s the character produced by the Tab key on your keyboard, and it helps align text in columns. The width of a tab is not fixed: terminals traditionally place tab stops every eight columns, while many text editors default to four or let you configure the width. HT is a handy tool because it reduces the effort needed for manual spacing and can save storage space, since one tab can replace several spaces.
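
The effect of tab stops can be seen with Python’s built-in str.expandtabs, which replaces each tab with enough spaces to reach the next stop. The sample line is arbitrary.

```python
line = "name\tage\tcity"

# Each tab advances to the next multiple of the tab size.
print(line.expandtabs(8))   # stops at columns 8, 16, ...
print(line.expandtabs(4))   # stops at columns 4, 8, 12, ...

# The tabbed line is shorter than its space-expanded form.
print(len(line), len(line.expandtabs(8)))   # 13 20
```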

LF (10)
LF stands for Line Feed, which means to move the cursor to the next line. In the early days, LF advanced the paper in a teleprinter or printer by one line, while another character, CR (Carriage Return), returned the print head to the beginning of the line.

In Unix-like systems and in the C programming language, LF serves as the “newline” character: a single LF both ends the current line and starts the next one at the left margin, doing the work of a Carriage Return and a Line Feed combined. MS-DOS and Windows, however, use CR followed by LF to represent a new line.

In most modern text editors, you can use just LF or the combined CR/LF, depending on the system.
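
Because both conventions are in circulation, programs often normalize line endings before processing text. The sketch below shows one way to do it; the function names to_lf and to_crlf are made up, and handling a lone CR is an extra assumption that goes beyond the two conventions described above.

```python
def to_lf(text: str) -> str:
    """Convert CR LF pairs (and any lone CR) to a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def to_crlf(text: str) -> str:
    """Convert every line ending to CR LF."""
    return to_lf(text).replace("\n", "\r\n")

mixed = "one\r\ntwo\nthree\r\n"
print(repr(to_lf(mixed)))     # 'one\ntwo\nthree\n'
print(repr(to_crlf(mixed)))   # 'one\r\ntwo\r\nthree\r\n'
print(mixed.splitlines())     # ['one', 'two', 'three']
```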

VT (11)
The VT character stands for Vertical Tab, which, like the horizontal tab, is used for formatting text. It moves the cursor to the next predefined line, creating vertical space. However, VT is not widely used today, as LF (Line Feed) has largely replaced it for most tasks involving line breaks.

FF (12)
Form Feed – Page break. The Form Feed character is used to control printer behavior. When the printer receives it, it advances to the top of the next page.

Different devices may interpret this control character differently. For example, some systems clear the screen, while others display it as ^L or simply move to the next line. In Unix/Linux shells such as Bash and tcsh, pressing Ctrl+L (which generates FF) clears the screen.

CR (13)
Carriage Return – Moves the machine’s print head or cursor to the left margin.

Originally, CR was used to return the print head to the start of a line without moving down. Over time, it became associated with the “Enter” key, indicating the end of input. In modern systems, pressing “Enter” also moves the cursor to the next line. On Unix terminals, the CR generated by the Enter key is usually translated to LF (Line Feed) on input, so programs written in C see a newline rather than a CR.

SO (14) and SI (15)
Shift Out (SO) and Shift In (SI) – Control characters for switching character sets or fonts.

These were introduced in the 1960s as part of the ASCII standard to handle multiple character sets. For example, some 7-bit encodings used SO and SI to switch between Cyrillic and Latin alphabets. On some printers, SO instead selected double-width characters and SI selected compressed (condensed) characters.

DLE (16)
Data Link Escape – Used to interrupt a data stream and signal that the following characters should be handled differently.

DLE is used when control characters are part of the data stream but need to be treated as special. When DLE is detected, the receiving system treats the following characters in a unique way. However, the exact action depends on the system, as the ASCII standard only specifies the interruption without defining the handling process.
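
Since ASCII leaves the handling to the protocol, the following is only one possible convention, often called byte stuffing: the sender puts a DLE in front of any data byte that would otherwise look like a control character, and the receiver takes the byte after a DLE literally. The function names and the set of escaped bytes are assumptions chosen for this sketch.

```python
DLE = 0x10

def stuff(payload: bytes, special: bytes = b"\x02\x03\x10") -> bytes:
    """Prefix every byte listed in `special` (STX, ETX, DLE here) with DLE."""
    out = bytearray()
    for b in payload:
        if b in special:
            out.append(DLE)
        out.append(b)
    return bytes(out)

def unstuff(stuffed: bytes) -> bytes:
    """Drop each escaping DLE and keep the byte that follows it."""
    out = bytearray()
    it = iter(stuffed)
    for b in it:
        if b == DLE:
            b = next(it, None)
            if b is None:
                raise ValueError("dangling DLE at end of data")
        out.append(b)
    return bytes(out)

payload = b"a\x02b\x10c"
print(stuff(payload))                        # b'a\x10\x02b\x10\x10c'
print(unstuff(stuff(payload)) == payload)    # True
```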

DC1 (17)
Device Control 1 – Also known as XON (Transmission ON).

DC1, or XON, is used for software flow control in serial communication. It resumes data transmission after it has been paused by an XOFF signal. For example, if terminal output was paused with Ctrl+S (XOFF), pressing Ctrl+Q (XON) lets the transmission continue.

DC3 (19)
Device Control 3, also known as XOFF (Transmission OFF), pauses data transmission until an XON is received. On a terminal it is typically sent with Ctrl+S.
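
XON/XOFF flow control can be sketched as a toy receiver that asks the sender to pause when its buffer fills up and to resume once it has drained. The Receiver class and its high and low thresholds are invented for this example; real drivers implement the same idea with their own buffer sizes and rules.

```python
XON, XOFF = 0x11, 0x13

class Receiver:
    """Toy receiver that signals XOFF when nearly full and XON when drained."""

    def __init__(self, high: int = 4, low: int = 1):
        self.buffer = []
        self.high, self.low = high, low
        self.paused = False

    def receive(self, byte: int):
        """Store one byte; return XOFF if the sender should pause, else None."""
        self.buffer.append(byte)
        if not self.paused and len(self.buffer) >= self.high:
            self.paused = True
            return XOFF
        return None

    def consume(self):
        """Process one buffered byte; return XON once there is room again."""
        if self.buffer:
            self.buffer.pop(0)
        if self.paused and len(self.buffer) <= self.low:
            self.paused = False
            return XON
        return None

rx = Receiver()
print([rx.receive(b) for b in b"abcd"])      # [None, None, None, 19]
print([rx.consume() for _ in range(3)])      # [None, None, 17]
```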

EM (25)
End of Medium. This indicates that the usable end of a storage medium has been reached, for example when a tape has no room left for further data. It doesn’t necessarily mean the physical end of the medium, but rather the logical end of the recorded data or usable area.

FS (28)
File Separator. This control character is interesting because it shows how computers from the 1960s organized data.

Back then, storage systems were mostly sequential and not random-access like RAM or hard drives we use today. This included devices like punch cards, paper tapes, or magnetic tapes. In that context, a control character like FS was a clever way to separate two different files in a sequential storage system.

GS (29)
Group Separator. This was used to separate different groups of data in a sequential data storage system.

One reason ASCII includes control characters is for organizing data storage. In database design, data is often structured in tables, where each table contains multiple records of the same type. The GS character was used to separate different groups of records in a sequential system, long before Excel spreadsheets existed. In the early days, these were just referred to as “groups.”

RS (30)
Record Separator. This is used to separate individual records within a group or table.

US (31)
Unit Separator. In ASCII, the smallest data item in a database was called a “unit,” which we now refer to as a “field.” The US character separates these units in a sequential data storage environment.

Back then, storage space was limited, so fields were usually given enough space for the largest possible entry, even if they didn’t always need that much. This could waste a lot of space. The US control character helped save storage by allowing fields to have variable lengths.
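
The separators still work for variable-length fields today. The sketch below joins fields with US and records with RS, then splits them apart again. The function names and sample data are made up, and the sketch assumes the field values never contain the separator characters themselves.

```python
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

def encode_group(records: list[list[str]]) -> str:
    """Join fields with US and records with RS; fields may be any length."""
    return RS.join(US.join(fields) for fields in records)

def decode_group(blob: str) -> list[list[str]]:
    """Reverse encode_group: split on RS, then on US."""
    return [record.split(US) for record in blob.split(RS)]

stock = [["apple", "3"], ["kiwi", "12"], ["watermelon", "1"]]
blob = encode_group(stock)

print(repr(blob))           # fields separated by \x1f, records by \x1e
print(decode_group(blob))   # the original list of records
```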

DEL (127)
Delete. The DEL character is interesting because its value (127) is much higher than the other control characters, which range from 0 to 31.

This was due to its use on paper tapes, which typically had 7 holes for encoding data. The value 127 corresponds to the binary number 1111111, meaning all 7 bits are set to 1. When this was used on paper tape, it effectively punched all the holes, deleting any existing data.
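
The bit pattern explains why DEL could erase any character on tape: holes can be added but never removed, and punching all seven positions turns every code into 127. In the sketch below a bitwise OR stands in for punching extra holes; the function name overpunch is made up.

```python
DEL = 0b1111111   # all seven bits set

def overpunch(code: int) -> int:
    """Punch every remaining hole over an existing 7-bit code."""
    return code | DEL

print(DEL)                                              # 127
print(overpunch(ord("A")))                              # 127
print(all(overpunch(c) == DEL for c in range(128)))     # True
```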