ASCII Character Code Reference

[Copyright 1975,1979,1983,2002,2003,2004,2005,2007,2008,2011 Frank Durda IV, All Rights Reserved.
Mirroring of any material on this site in any form is expressly prohibited.
The official web site for this material is:  http://nemesis.lonestar.org
Contact this address for use clearances: clearance at nemesis.lonestar.org
Comments and queries to this address: web_reference at nemesis.lonestar.org]

The United States of America Standard Code for Information Interchange (USASCII, later renamed American Standard Code for Information Interchange, or simply "ASCII") describes a communications system where 7-bit words represent printable symbols and control codes. The original 1963 standard went through numerous revisions before it was formally adopted in 1968 by the American National Standards Institute (ANSI). The ANSI X3.4-1968 ASCII character code assignments are shown in the following table.

           Least Significant Bits
           0     1     2     3     4     5     6     7     8     9     A     B     C     D     E     F
           0000  0001  0010  0011  0100  0101  0110  0111  1000  1001  1010  1011  1100  1101  1110  1111

Most Significant Bits

0 (000)    NUL   SOH   STX   ETX   EOT   ENQ   ACK   BEL   BS    HT    LF    VT    FF    CR    SO    SI
           (0)   (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)  (11)  (12)  (13)  (14)  (15)
           00    01    02    03    04    05    06    07    08    09    0A    0B    0C    0D    0E    0F

1 (001)    DLE   DC1   DC2   DC3   DC4   NAK   SYN   ETB   CAN   EM    SUB   ESC   FS    GS    RS    US
           (16)  (17)  (18)  (19)  (20)  (21)  (22)  (23)  (24)  (25)  (26)  (27)  (28)  (29)  (30)  (31)
           10    11    12    13    14    15    16    17    18    19    1A    1B    1C    1D    1E    1F

2 (010)    SP    !     "     #     $     %     &     '     (     )     *     +     ,     -     .     /
           (32)  (33)  (34)  (35)  (36)  (37)  (38)  (39)  (40)  (41)  (42)  (43)  (44)  (45)  (46)  (47)
           20    21    22    23    24    25    26    27    28    29    2A    2B    2C    2D    2E    2F

3 (011)    0     1     2     3     4     5     6     7     8     9     :     ;     <     =     >     ?
           (48)  (49)  (50)  (51)  (52)  (53)  (54)  (55)  (56)  (57)  (58)  (59)  (60)  (61)  (62)  (63)
           30    31    32    33    34    35    36    37    38    39    3A    3B    3C    3D    3E    3F

4 (100)    @     A     B     C     D     E     F     G     H     I     J     K     L     M     N     O
           (64)  (65)  (66)  (67)  (68)  (69)  (70)  (71)  (72)  (73)  (74)  (75)  (76)  (77)  (78)  (79)
           40    41    42    43    44    45    46    47    48    49    4A    4B    4C    4D    4E    4F

5 (101)    P     Q     R     S     T     U     V     W     X     Y     Z     [     \     ]     ^     _
           (80)  (81)  (82)  (83)  (84)  (85)  (86)  (87)  (88)  (89)  (90)  (91)  (92)  (93)  (94)  (95)
           50    51    52    53    54    55    56    57    58    59    5A    5B    5C    5D    5E    5F

6 (110)    `     a     b     c     d     e     f     g     h     i     j     k     l     m     n     o
           (96)  (97)  (98)  (99)  (100) (101) (102) (103) (104) (105) (106) (107) (108) (109) (110) (111)
           60    61    62    63    64    65    66    67    68    69    6A    6B    6C    6D    6E    6F

7 (111)    p     q     r     s     t     u     v     w     x     y     z     {     |     }     ~     DEL
           (112) (113) (114) (115) (116) (117) (118) (119) (120) (121) (122) (123) (124) (125) (126) (127)
           70    71    72    73    74    75    76    77    78    79    7A    7B    7C    7D    7E    7F

In this table, the code or symbol name is shown on the first line, followed by the decimal value for that code or symbol, followed by the hexadecimal value. The binary value can be computed based on the row and column where the code or symbol resides, or directly from the hexadecimal value. For example, the character "+" has the binary value "010 1011", with "010" taken from the row and "1011" taken from the column. Similarly, the lowercase letter 'p' has the binary value "111 0000".
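The row-and-column rule described above can be sketched in a few lines of Python. This is only an illustration; the function name `split_code` is hypothetical and is not part of any standard or library.

```python
def split_code(ch):
    """Return (row_bits, column_bits) for a single 7-bit ASCII character."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    row = code >> 4        # three most significant bits select the table row
    column = code & 0x0F   # four least significant bits select the table column
    return format(row, "03b"), format(column, "04b")


print(split_code("+"))  # ('010', '1011')
print(split_code("p"))  # ('111', '0000')
```

The two printed results match the "+" and 'p' examples given in the paragraph above.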

The background color for each code or symbol indicates the category that the code resides in. Red indicates control (non-printable) codes. Orange indicates basic punctuation and symbols. Yellow indicates numeric digits. Green indicates the uppercase letters. Blue indicates lowercase letters (part of the extended character set). Purple indicates punctuation and symbols that are in the extended character set. (If color viewing is not available, the following table gives these categories as numeric ranges.)



ASCII Code Divisions and Categories

The ASCII code is divided into three main divisions and six categories as shown in this table:

                                                                           ASCII Range
Division                        Category                                   Decimal     Hexadecimal   Binary
Control (0 to 31 and 127)       Control Characters (non-printable) (Red)   0 to 31     0x00 to 0x1F  0b0000000 to 0b0011111
                                                                           127         0x7F          0b1111111
Basic Printable (32 to 95)      Symbols and Punctuation (Orange)           32 to 47    0x20 to 0x2F  0b0100000 to 0b0101111
                                                                           58 to 64    0x3A to 0x40  0b0111010 to 0b1000000
                                                                           91 to 95    0x5B to 0x5F  0b1011011 to 0b1011111
                                Numbers (Yellow)                           48 to 57    0x30 to 0x39  0b0110000 to 0b0111001
                                Uppercase Letters (Green)                  65 to 90    0x41 to 0x5A  0b1000001 to 0b1011010
Extended Printable (96 to 126)  Lowercase Letters (Blue)                   97 to 122   0x61 to 0x7A  0b1100001 to 0b1111010
                                Extended Symbols and Punctuation (Purple)  96          0x60          0b1100000
                                                                           123 to 126  0x7B to 0x7E  0b1111011 to 0b1111110
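The decimal ranges in the table above translate directly into a range lookup. The following Python sketch is one possible way to express it; the function name `ascii_category` is hypothetical, and the returned strings simply reuse the category names from the table.

```python
def ascii_category(code):
    """Name the category of a 7-bit code using the decimal ranges in the table."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    if code <= 31 or code == 127:
        return "Control Characters"
    if 48 <= code <= 57:
        return "Numbers"
    if 65 <= code <= 90:
        return "Uppercase Letters"
    if 97 <= code <= 122:
        return "Lowercase Letters"
    if code == 96 or code >= 123:
        return "Extended Symbols and Punctuation"
    # What remains is 32 to 47, 58 to 64 and 91 to 95.
    return "Symbols and Punctuation"


print(ascii_category(ord("@")))  # Symbols and Punctuation
print(ascii_category(ord("{")))  # Extended Symbols and Punctuation
```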

The extended printable character set was deliberately arranged so that if a symbol was received in this range and could not be displayed due to limitations of the printing or display device, the symbol in the basic printable range exactly 32 (0x20) positions earlier could be substituted and would provide reasonable results. In such situations, "{" and "}" would be displayed or printed as "[" and "]", while lowercase letters would be displayed or printed in uppercase.
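The substitution described above amounts to subtracting 32 from any code in the extended printable range. Here is a minimal Python sketch of what a limited display device might do; the function name `fold_to_basic` is hypothetical.

```python
def fold_to_basic(text):
    """Replace each extended printable code (96 to 126) with the code 32 positions earlier."""
    return "".join(
        chr(ord(ch) - 32) if 96 <= ord(ch) <= 126 else ch
        for ch in text
    )


print(fold_to_basic("if (x) { y[0] = 'a'; }"))  # IF (X) [ Y[0] = 'A'; ]
```

Codes outside the extended range, including control codes and DEL, pass through unchanged.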



ASCII Control Codes

This section gives the full names and some additional information on the 33 control characters in ASCII. The section is color coded by the original use of the given code, which may not reflect modern use, particularly when control characters are not being used to manage a data transmission.

Codes shown in Yellow are used for physical level synchronous link idle fills, editing, and DCE command functions. Orange is for codes primarily used in Synchronous transmission protocols, such as SDLC. Green codes are used to direct printer (or VDT display) paper and print head (cursor) non-printing movement. Blue codes are peripheral operator alert controls. Red codes divide data in higher level protocols, including some file system and multi-track tape format structures.

It should be mentioned that the operating systems for computers made by the late Digital Equipment Corporation (DEC), particularly the PDP-8, PDP-11, and PDP-10/DECSystem-10/DECSystem-20 systems, had a profound and lasting influence on how users at client terminals employed ASCII control characters to direct operating systems and applications.

Virtually all the control character uses and conventions that were used in the DEC PDP-11 operating systems (including RT-11 and RSTS-11) were copied into the Digital Research CP/M operating system, which itself was later copied by other vendors to create a CP/M-clone that ran on the Intel 8088/8086 processor, an operating system which Microsoft Corporation bought, renamed PC-DOS, and licensed to IBM. (PC-DOS was later renamed again to MS-DOS.)

The earliest versions of UNIX were developed on DEC systems, and the Bell Labs programmers elected to use many of the same character codes that DEC had already defined for various functions in their operating systems.

Finally, additional control code uses from the DECSystem TOPS, TWENEX and ITS (aka "Incompatible Time Sharing") operating systems can be found today in BSD-derived versions of UNIX terminal drivers and applications. One specific duplication of the TOPS environment can be found in the "set filec" mode of BSD csh shell. (I am talking about the real BSD 4.x csh, not the Convex tcsh one commonly renamed as "csh" today. The "set" command with no parameters will show you if it is the real csh or not.)

ASCII Mnemonic  Decimal  Binary     Hexadecimal  Control Key  Full Name
    Notes and Common Uses

NUL             0        0b0000000  0x00         CTRL-@       NULL - No Punch
    Generates an unpunched position on paper tape (except for the traction hole) and was commonly used to create leader and trailer areas.
    Some systems use the character to indicate that a [BREAK] signal has been sent, even though an actual asynchronous modem break has no character code and is represented by a period of continuous spacing, with no marking, that lasts longer than one character transmission time. (Discussed further below.)
    Also used by some systems as an idle transmission or pad character instead of SYN.

SOH             1        0b0000001  0x01         CTRL-A       Start Of Heading

STX             2        0b0000010  0x02         CTRL-B       Start Of Text

ETX             3        0b0000011  0x03         CTRL-C       End Of Text
    Many DEC-derived systems use ETX as an interrupt signal to abort software execution.
    Most UNIX systems use ETX as the default code to generate a SIGINT signal for the foreground process. (Exception: Solaris responds to ETX by sending the SIGINT signal to all processes on the given controlling terminal.)

EOT             4        0b0000100  0x04         CTRL-D       End Of Transmission
    DEC TOPS-10/20 and the UNIX C shell use EOT to display command line options.
    In C language environments with a Standard In (stdin) device, an EOT can indicate that the end of input has been reached.

ENQ             5        0b0000101  0x05         CTRL-E       Enquiry, also known as WRU (Who aRe You), HERE IS, and Answerback
    Some Teletype models would transmit an equipment identification string in response to this code.
    In TOPS-20 environments, applications usually responded to ENQ by displaying the executable version string and other identifying information.

ACK             6        0b0000110  0x06         CTRL-F       Acknowledge

BEL             7        0b0000111  0x07         CTRL-G       Bell
    Audible signal or alert, or a visual indicator on some VDTs.

BS              8        0b0001000  0x08         CTRL-H       Backspace
    The print head or cursor is moved one position to the left. In the case of VDTs, the character in that position may be erased, depending on local settings. If the VDT cursor is already at column 1, the cursor may move to the end of the previous line, depending on local settings.
    In many operating systems and DCE devices, directs that the most recently entered character that has not yet been processed should be erased from the input buffer.
    On some operating systems that are aware that the client is using a hard copy terminal, transmitting this character to the server causes the server to send a sequence of characters that indicate that the previous character has been discarded, but the print head does not actually back over the now-erased character. Deleting the last three characters of the sequence ABCDEF on a DECSystem-20 with a printing terminal would result in ABCDEF\F\E\D\ being printed. If a VDT was being used, the characters DEF would be erased and the cursor positioned just after the "C" character.

HT              9        0b0001001  0x09         CTRL-I       Horizontal Tabulation
    Moves the print head or cursor to the next tab stop, traditionally placed every eight columns. Some electronic printers and VDTs allow tab stops to be programmed. Most Teletype models did not implement horizontal tab stops and would ignore the code entirely.
    Card punch systems skip the current card in response to this code.
    The DEC TOPS-10/20 ESC COMND JSYS completion behavior is partly emulated in the UNIX tcsh shell using the HT code instead of the ESC code. The BSD csh shell uses the traditional ESC code for the original TOPS behavior. (See ESC for more information.)

LF              10       0b0001010  0x0A         CTRL-J       Line Feed (Paper Advance)
    Advances the paper one line or moves the cursor down one line. If a VDT cursor is already at the bottom of the screen, the screen scrolls one line or the cursor wraps to the top, depending on settings.
    UNIX system display routines treat LF as though they had received CR and LF in most situations. However, TCP communication software on UNIX systems running in the default "cooked" mode must use the proper CR/LF sequence to end a given line of ASCII text that is transmitted or received.

VT              11       0b0001011  0x0B         CTRL-K       Vertical Tabulation
    Advances the paper by the number of lines dictated by the control tape or similar mechanism.

FF              12       0b0001100  0x0C         CTRL-L       Form Feed
    Advances the paper to the next page. On some VDTs, clears the screen and/or positions to the top or bottom line.

CR              13       0b0001101  0x0D         CTRL-M       Carriage Return
    Moves the print head or cursor to column 1.
    Early TRS-80 systems performed the actions of a CR and LF when a CR was output by applications.

SO              14       0b0001110  0x0E         CTRL-N       Shift Out
    Selects an alternate font or character set (such as APL) on some equipment.

SI              15       0b0001111  0x0F         CTRL-O       Shift In
    De-selects the alternate font or character set on some equipment. On terminals with APL character sets, this code returns to the normal ASCII character set.

DLE             16       0b0010000  0x10         CTRL-P       Data Link Escape
    Controls access on some DCE equipment by the DTE, such as voice-capable modems. On DEC VAX and some related equipment, halts the main processor when entered from the console.

DC1             17       0b0010001  0x11         CTRL-Q       Device Control 1, also known as X-ON
    Starts the paper reader on some equipment. Seven-bit asynchronous communications frequently use this code for flow control.

DC2             18       0b0010010  0x12         CTRL-R       Device Control 2
    Starts the paper punch or tape recorder on some equipment.

DC3             19       0b0010011  0x13         CTRL-S       Device Control 3, also known as X-OFF
    Stops the paper reader on some equipment. Seven-bit asynchronous communications frequently use this code for flow control.

DC4             20       0b0010100  0x14         CTRL-T       Device Control 4
    Stops the paper punch or tape recorder on some equipment.
    On DEC TOPS-10/20 and some UNIX platforms, causes the display of the current load status of the system or the run status of the foreground process. May also cause the controlling terminal shell to send a SIGINFO signal to a foreground process on modern BSD and BSD-derived UNIX platforms.

NAK             21       0b0010101  0x15         CTRL-U       Negative Acknowledge
    May initiate re-transmission of a frame in Synchronous transmission systems.
    DEC-derived systems use this to discard/erase an entire unprocessed line of command text.

SYN             22       0b0010110  0x16         CTRL-V       Synchronous Idle
    Transmitted to maintain timing on a Synchronous data link when no other data was ready for transmission.

ETB             23       0b0010111  0x17         CTRL-W       End of Transmission Block

CAN             24       0b0011000  0x18         CTRL-X       Cancel
    Early HP operating systems used CAN to signal that an entire line of unprocessed command text was to be discarded.

EM              25       0b0011001  0x19         CTRL-Y       End of Medium

SUB             26       0b0011010  0x1A         CTRL-Z       Substitute
    BSD-derived shells use this code to suspend program execution.
    Some older DEC-derived systems use this character as an end-of-file indicator for text files.

ESC             27       0b0011011  0x1B         CTRL-[       Escape
    For output to displays, ESC is commonly used to begin a sequence of characters that alter terminal behavior. In these display control systems, the characters that immediately follow the ESC character instruct the receiving device to reposition the display cursor or printing position, erase the screen or reposition paper, alter the character sets or display colors to be used from this point forward, or even start or stop peripherals attached to the terminal.
    In the 1970s, numerous display control code systems were developed by the various manufacturers. However, the control code system developed by DEC for the VT50/VT52 display terminals was extremely popular and was emulated by other manufacturers' equipment. The VT50/VT52 display control system was expanded for the DEC VT100, and that command set was largely adopted as an ANSI standard which is widely used today.
    For input from terminals, DEC TOPS-10/20 and the UNIX csh shell use this code to attempt a command line completion or guide word display. (Guide words only in TOPS COMND JSYS calls.)

FS              28       0b0011100  0x1C         CTRL-\       File Separator
    By default, the command shells in UNIX systems treat this as a QUIT signal, and will pass a signal to the current foreground process that it should abort and, if allowed and possible, make a core dump.

GS              29       0b0011101  0x1D         CTRL-]       Group Separator

RS              30       0b0011110  0x1E         CTRL-^       Record Separator

US              31       0b0011111  0x1F         CTRL-_       Unit Separator

DEL             127      0b1111111  0x7F         No Standard  Delete, also known as RUB OUT
    Used on paper tape systems to "erase" a bad punch by over-punching an incorrect byte with all holes. Some systems also used this code to create a leader and trailer sequence for paper and digital tape recordings. Some operating systems from the paper tape and punch card eras ignore this code when received.
    Some operating systems use this as an alternative to the Backspace (BS) code, erasing the most recently received and unprocessed input character.
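The Control Key column in the table follows a regular pattern: for codes 0 to 31, the key shown is the character found 64 (0x40) positions above the control code. The Python sketch below illustrates that pattern as read from the table; the function names `control_code` and `control_key` are hypothetical, and DEL is left out because the table lists no standard key for it.

```python
def control_code(key):
    """Return the code produced by CTRL plus the given key, per the Control Key column."""
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError("key must be one of @, A-Z, [, \\, ], ^ or _")
    return code - 0x40  # equivalent to keeping only the low five bits: code & 0x1F


def control_key(code):
    """Return the 'CTRL-x' label for a control code in the range 0 to 31."""
    if not 0 <= code <= 31:
        raise ValueError("only codes 0 to 31 have a Control Key entry")
    return "CTRL-" + chr(code + 0x40)


print(control_code("C"))  # 3
print(control_key(27))    # CTRL-[
```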




Evolution of ASCII

There have been several versions of the ASCII coding system. There were formal versions in 1963, 1965, 1967 and the ANSI version in 1968. The following list details many of the changes made to the coding system during this period.



Symbols not included in ASCII

There are numerous symbols that do not exist in ASCII but might seem logical to have. Some do exist in other character sets, but these are not part of ASCII.

ASCII has only 94 code combinations that can be used to produce printable characters. Since 52 codes are consumed by the alphabet and another 10 by numeric digits, only 32 codes remain for punctuation and other symbols. Of those 32 codes, five had to hold characters that look similar to the characters in another set of five. For example, '{' and '}' are in one set of five, with '[' and ']' in the other. ASCII was intentionally organized this way so that simpler display devices could be produced that only had to print 63 of the 94 ASCII printable codes and could substitute something "close" when asked to display an ASCII character that the device was incapable of producing, such as using the uppercase letter when the lowercase letter could not be printed.
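The counts in the paragraph above can be checked with a short Python sketch. It assumes that the 94 printable codes are 33 to 126, that is, that the space character (32) is not counted; the variable names are only illustrative.

```python
printable = [chr(c) for c in range(33, 127)]  # assumed range: SP (32) and DEL (127) excluded
letters = [ch for ch in printable if ch.isalpha()]
digits = [ch for ch in printable if ch.isdigit()]
symbols = [ch for ch in printable if not ch.isalnum()]
print(len(printable), len(letters), len(digits), len(symbols))  # 94 52 10 32

# The symbols in the extended range, each paired with the symbol 32 positions earlier.
extended = [ch for ch in symbols if ord(ch) >= 96]
pairs = [(ch, chr(ord(ch) - 32)) for ch in extended]
print(len(pairs))  # 5
```

The five pairs produced this way include '{' with '[' and '}' with ']', the example used in the text.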

Consequently, many symbols that might be desired are just not present in ASCII. For example, to provide all accent marks commonly used in European languages on vowels, as many as twelve codes per vowel would be needed, requiring perhaps sixty codes. The ASCII character coding system just doesn't have the space to include these symbols.

Here is a list of codes that people frequently inquire about that do not exist in ASCII.

ISO-8859 and other character sets provide some of these desired symbols (ISO-8859 assigns its additional symbols to code values above 127, so they do not collide with any of the ASCII codes shown in the table above), while some symbols are impractical to provide. Usually, this is simply due to the limited number of code combinations available, but in some cases, the complexity of the symbol itself and an inability to make it legible in the space provided is an additional reason why the symbol was not included. Asian and some Middle Eastern character sets are a particular problem on this point.

Some display systems also offer special "fonts" that include symbols specific to certain occupations or world regions, but these typically re-use the same numerical code values that ASCII uses for its printable and extended printable characters. Because of the overlap, in order to mix special character codes and normal ASCII characters together, the special font must be activated, the special character selected, and then the special font deactivated. This must be repeated each time a character from the set not currently selected is desired, and in some equipment, only one set can be displayed at a time.

The typical World Wide Web browser is able to display both ASCII and ISO-8859 characters simultaneously because their numerical codes do not overlap. However, since most keyboards can only produce ASCII characters, the display of ISO-8859 characters is achieved by using HTML escape codes that are entered using ASCII codes. For example, in HTML, the ASCII character sequence '&cent;' in a document will display the ISO-8859 character '¢' on most systems.
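The '&cent;' example can be reproduced outside a browser. This sketch uses the `html.unescape` function from Python's standard library to stand in for what a browser does when it decodes the escape code; the sample string is made up for illustration.

```python
import html

source = "Price: 25&cent;"       # the document text itself contains only ASCII codes
print(source.isascii())          # True
decoded = html.unescape(source)
print(decoded)                   # Price: 25¢
print(hex(ord(decoded[-1])))     # 0xa2
```

The decoded character has the value 0xA2, which lies above the 7-bit ASCII range.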



ASCII printable characters and color, font type and font size issues

It should be understood that no ASCII code specifies the font type, font size or color of the ASCII printable characters. These and any other additional attributes of printable text are optionally specified at a higher coding level, usually by preceding the target characters with an escape sequence, followed by instructions specifying how printable characters from this point forward should be displayed.

For example, in HTML, the tag <FONT COLOR="#FF6666"> specifies that subsequent characters should keep the font type and size previously used but, on a device capable of displaying colors, should be displayed in the specified shade of red. Some color tables and the HTML values needed to produce them appear in the appendix of this document: The Use and Misuse of Color in Web Pages



A note about BREAK and Modem BREAK signaling

BREAK and Modem BREAK are not character codes or symbols of any coding system. They are actually line signaling conditions initiated by the sender in an asynchronous serial communications system.

When the BREAK key is pressed on a real communications terminal or similar device, the asynchronous serial transmission line begins to send continuous Spacing, the opposite of the "rest" state of continuous Marking. While the BREAK signal condition is present, there are no start, data, stop or parity bits being sent.

If the duration of a continuous spacing condition exceeds 1.6 seconds, it is usually considered to be a Modem BREAK indication. Traditionally, a Modem BREAK indication directed the local and distant modems to drop carrier and end the call. Some teleprinters also turned their motors off in response to this signal, while some "party line" networks used a BREAK signal to attract the attention of the network controller.
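The timing rules can be put into numbers with a small Python sketch. It combines the 1.6 second threshold above with the rule, noted in the NUL entry, that a BREAK is spacing lasting longer than one character transmission time. The function names are hypothetical, and the 110 baud line with 11-bit frames is an assumed example rather than a figure from this document.

```python
def character_time(baud, bits_per_frame):
    """Seconds needed to send one framed character (start, data, parity and stop bits)."""
    return bits_per_frame / baud


def classify_spacing(duration, baud, bits_per_frame, modem_break=1.6):
    """Classify a run of continuous spacing that lasts `duration` seconds."""
    if duration > modem_break:
        return "Modem BREAK"
    if duration > character_time(baud, bits_per_frame):
        return "BREAK"
    return "normal data"


# Assumed example line: 110 baud with 11 bits per character frame.
print(character_time(110, 11))           # 0.1
print(classify_spacing(0.05, 110, 11))   # normal data
print(classify_spacing(0.25, 110, 11))   # BREAK
print(classify_spacing(2.0, 110, 11))    # Modem BREAK
```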

In "current-loop" transmission systems, the Spacing condition is equivalent to a lack of current on the loop, and if the condition persists it is treated as a "break" in the circuit.

On time-sharing dial-up systems, a BREAK signal is usually interpreted by the receiving communications controller or computer as a directive to stop the current operation, usually by behaving as though it had received some ASCII control character that normally performed the interrupt function, like CTRL-C.

On keyboards with computers attached or integrated (such as all modern PCs), the BREAK key (if present) is likely translated from a keyboard scan code into some ASCII code (such as CTRL-C), rather than producing an actual BREAK line condition. If an actual BREAK signal is to be sent out a serial port, this has to be done by a special instruction to the tty/serial driver of that operating system. (POSIX environments provide the tcsendbreak() function for this, which is commonly implemented with an ioctl() call.)
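As a hedged illustration of asking the driver for a real BREAK condition, the following Python sketch wraps `termios.tcsendbreak`, the standard library binding for the POSIX tcsendbreak() function. It runs only on UNIX-like systems. The wrapper name `send_break` and the device path in the comment are placeholders, not taken from this document.

```python
import termios


def send_break(fd, duration=0):
    """Ask the tty driver to send a BREAK condition on an open terminal device.

    `fd` must be a file descriptor for a terminal or serial port. A duration
    of 0 requests the driver's default break length, which POSIX specifies as
    at least 0.25 seconds and not more than 0.5 seconds.
    """
    termios.tcsendbreak(fd, duration)


# Possible use, assuming a serial port exists at the placeholder path below:
#   fd = os.open("/dev/ttyS0", os.O_RDWR | os.O_NOCTTY)
#   send_break(fd)
#   os.close(fd)
```

If the descriptor does not refer to a terminal device, the call raises `termios.error`.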



Early Uses of ASCII and alternate coding systems

One of the earliest 7-bit ASCII devices was an improved line of electro-mechanical printers made by the Teletype Corporation. With an operational speed of up to 10 characters per second, these devices were used worldwide for message transmission by Western Union, various news wire services and the military. Later, these devices found new uses as input/output devices connected to computer systems that also communicated using the ASCII character set.

The most widely-manufactured Teletype model was number 33, which was sold under a variety of model names such as the KSR-33 and ASR-33. These devices could only print the basic printable character portion of the ASCII character set (64 characters). This limited these devices to uppercase letters, numbers and most punctuation characters as shown in the table above. Some early video terminals and computers (such as the Digital Equipment Corporation VT50 and the Radio Shack TRS-80 Model I) supported only the basic printable set of characters, despite being designed and manufactured years after the ASCII extended character set was adopted. Some manufacturers did offer upgrades that allowed for the display of all ASCII printable characters.

Prior to the introduction of the ASCII-based Teletype printers, the Teletype Corporation produced teleprinters that used Baudot or "5-Level" character codes, operating at speeds between 40 and 75 baud. These were widely used for over thirty years, but were largely removed from service by the mid-1960s.

IBM's earlier mainframe computers (notably the IBM 360 and 370 families) did not use ASCII. Instead, they used an alternate character coding system called EBCDIC, which was devised by IBM as a way to ensure that any peripherals to be connected to IBM computers were also made by IBM. IBM eventually lost this battle, and by the late 1970s it was common to see IBM systems that used EBCDIC internally but had external communication processors that translated transmissions between IBM's EBCDIC and what other equipment makers were using, which was ASCII.



Related Topics

Extended Binary Coded Decimal Information Code (EBCDIC) Reference (HTML)

Baudot (5-Level) Character Code Reference (HTML)

SIXBIT Character Code Reference (HTML)

RADIX50 Character Code Reference (HTML)

Return to the Telecommunications Reference Index (HTML)

