Internet Engineering Task Force (IETF) P. Faltstrom, Ed.
Request for Comments: 5892 Cisco
Category: Standards Track August 2010
ISSN: 2070-1721
The Unicode Code Points and
Internationalized Domain Names for Applications (IDNA)
Abstract
This document specifies rules for deciding whether a code point,
considered in isolation or in context, is a candidate for inclusion
in an Internationalized Domain Name (IDN).
It is part of the specification of Internationalizing Domain Names in
Applications 2008 (IDNA2008).
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc5892.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Faltstrom Standards Track [Page 1]
RFC 5892 IDNA Code Points August 2010
Table of Contents
1. Introduction
2. Category Definitions Used to Calculate Derived Property Value
2.1. LetterDigits (A)
2.2. Unstable (B)
2.3. IgnorableProperties (C)
2.4. IgnorableBlocks (D)
2.5. LDH (E)
2.6. Exceptions (F)
2.7. BackwardCompatible (G)
2.8. JoinControl (H)
2.9. OldHangulJamo (I)
2.10. Unassigned (J)
3. Calculation of the Derived Property
4. Code Points
5. IANA Considerations
5.1. IDNA-Derived Property Value Registry
5.2. IDNA Context Registry
5.2.1. Template for Context Registry
6. Security Considerations
7. Acknowledgements
Appendix A. Contextual Rules Registry
Appendix A.1. ZERO WIDTH NON-JOINER
Appendix A.2. ZERO WIDTH JOINER
Appendix A.3. MIDDLE DOT
Appendix A.4. GREEK LOWER NUMERAL SIGN (KERAIA)
Appendix A.5. HEBREW PUNCTUATION GERESH
Appendix A.6. HEBREW PUNCTUATION GERSHAYIM
Appendix A.7. KATAKANA MIDDLE DOT
Appendix A.8. ARABIC-INDIC DIGITS
Appendix A.9. EXTENDED ARABIC-INDIC DIGITS
Appendix B. Code Points 0x0000 - 0x10FFFF
Appendix B.1. Code Points in Unicode Character Database (UCD) Format
8. References
8.1. Normative References
8.2. Informative References
1. Introduction
RFC 4690 [RFC4690] suggests an inclusion-based approach for selecting
the code points from The Unicode Standard [Unicode52] that should be
included in the list of code points that may be used in
Internationalized Domain Names.
Specifically, RFC 4690 [RFC4690] says the following:
The IAB has concluded that there is a consensus within the broader
community that lists of code points should be specified by the use
of an inclusion-based mechanism (i.e., identifying the characters
that are permitted), rather than by excluding a small number of
characters from the total Unicode set as Stringprep [RFC3454] and
Nameprep [RFC3491] do today. That conclusion should be reviewed
by the IETF community and action taken as appropriate.
This document reviews and classifies the collections of code points
in the Unicode character set by examining various properties of the
code points. It then defines an algorithm for determining a derived
property value. It specifies a procedure, and not a table, of code
points so that the algorithm can be used to determine code point sets
independent of the version of Unicode that is in use.
This document is not intended to specify precisely how these property
values are to be applied in IDN labels. That information appears in
the Protocol document [RFC5891], but it is important to understand
that the assignment of a value of this property to a particular
character is not sufficient to determine whether it can be used in a
given label. In particular, some combinations of allowed code points
are not advisable for use in IDNs due to rules specific to a script
or class of characters. The requirement for such rules is linked to
the operations in the Protocol document and especially to the
characters designated as requiring contextual rules.
The value of the property is to be interpreted as follows.
o PROTOCOL VALID: Those that are allowed to be used in IDNs. Code
points with this property value are permitted for general use in
IDNs. However, the fact that a label consists only of code points
with this property value does not imply that the label can be used
in the DNS. See the Protocol document for algorithms to make
decisions about labels in domain names. The abbreviated term
PVALID is used to refer to this value in the rest of this
document.
o CONTEXTUAL RULE REQUIRED: Some characteristics of the character,
such as it being invisible in certain contexts or problematic in
others, require that it not be used in labels unless specific
other characters or properties are present. The abbreviated term
CONTEXT is used to refer to this value in the rest of this
document. There are two subdivisions of CONTEXTUAL RULE REQUIRED,
one for Join_Control characters (called CONTEXTJ) and one for other
characters (called CONTEXTO). These are discussed in more detail below and
in the Protocol document.
o DISALLOWED: Those that should clearly not be included in IDNs.
Code points with this property value are not permitted in IDNs.
o UNASSIGNED: Those code points that are not designated (i.e., are
unassigned) in the Unicode Standard.
The mechanisms described here allow determination of the value of the
property for future versions of Unicode (including characters added
after Unicode 5.2). Changes in Unicode properties that do not affect
the outcome of this process do not affect IDN. For example, a
character can have its Unicode General_Category value (see
[Unicode52]) change from So to Sm or from Lo to Ll, without affecting
the algorithm results. Moreover, even if such changes did affect the
results, the BackwardCompatible list (Section 2.7) can be adjusted to
ensure their stability.
Some code points need to be allowed in exceptional circumstances but
should be excluded in all other cases; these rules are also described
in other documents. The most notable of these are the Join Control
characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH
NON-JOINER. Both of them have the derived property value CONTEXTJ.
A character with the derived property value CONTEXTJ or CONTEXTO
(CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate
rule has been established and the context of the character is
consistent with that rule. It is invalid to either register a string
containing these characters or even to look one up unless such a
contextual rule is found and satisfied. Please see Appendix A, "The
Contextual Rules Registry", for more information.
This document is part of a series that, together, constitute a
proposal for updating the IDNA standards to resolve issues uncovered
in recent years, cover a broader range of scripts, and provide for
migration to newer versions of Unicode. See the Rationale document
[RFC5894] for a broader discussion.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Category Definitions Used to Calculate Derived Property Value
The derived property obtains its value based on a two-step procedure.
First, characters are placed in one or more character categories
based on either core properties defined by the Unicode Standard or by
treating the code point as an exception and addressing the code point
by its code point value. These categories are not mutually
exclusive.
In the second step, set operations are used with these categories to
determine the values for an IDN-specific property. Those operations
are specified in Section 3.
Unicode property names and property value names may have short
abbreviations, such as gc for the General_Category property, and Ll
for the Lowercase_Letter property value of the gc property.
In the following specification of categories, the operation that
returns the value of a particular Unicode character property for a
code point is designated by using the formal name of that property
(from PropertyAliases.txt) followed by '(cp)'. For example, the
value of the General_Category property for a code point is indicated
by General_Category(cp).
2.1. LetterDigits (A)
A: General_Category(cp) is in {Ll, Lu, Lo, Nd, Lm, Mn, Mc}
These rules identify characters commonly used in mnemonics and often
informally described as "language characters". In general, only code
points assigned to this category are suitable for use in IDN.
For more information, see Section 4.5 of The Unicode Standard
[Unicode].
The categories used in this rule are:
o Ll - Lowercase_Letter
o Lu - Uppercase_Letter
o Lo - Other_Letter
o Nd - Decimal_Number
o Lm - Modifier_Letter
o Mn - Nonspacing_Mark
o Mc - Spacing_Mark
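As a non-normative illustration, category A can be tested in Python
with the standard unicodedata module. This is a sketch under stated
assumptions: the function and constant names are hypothetical, and
unicodedata reflects the Unicode version bundled with the running
Python (see unicodedata.unidata_version), not Unicode 5.2, so results
for recently added code points may differ from this document.

```python
import unicodedata

# General_Category values that make up LetterDigits (A), Section 2.1.
LETTER_DIGIT_CATEGORIES = frozenset({"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"})


def is_letter_digit(cp: int) -> bool:
    """Return True if General_Category(cp) is in category A."""
    # unicodedata.category() takes a one-character string and returns
    # the two-letter General_Category abbreviation, e.g. "Ll".
    return unicodedata.category(chr(cp)) in LETTER_DIGIT_CATEGORIES
```

For example, is_letter_digit(0x0301) is True because U+0301 COMBINING
ACUTE ACCENT is Mn, while is_letter_digit(0x0021) is False because
EXCLAMATION MARK is Po.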
2.2. Unstable (B)
B: toNFKC(toCaseFold(toNFKC(cp))) != cp
This category is used to group the characters that are not stable
under Normalization Form KC (NFKC) and case folding. In general,
these code points are not suitable for use in IDNs.
The toCaseFold() operation is defined in Section 3.13 of The Unicode
Standard [Unicode].
The toNFKC() operation returns the Normalization Form KC of its
input; for a single code point, the result may be a sequence of
several code points. For more information, see Section 5 of Unicode
Standard Annex #15 [TR15].
It should be noted that NFKC is used, although Normalization Form C
(NFC) is used in the "IDNA Protocol" document [RFC5891].
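Purely as an illustration, category B can be computed in Python with
unicodedata.normalize() and str.casefold(), the latter applying the
Unicode default (full) case folding. This sketch assumes the Unicode
version of the running Python rather than Unicode 5.2; the function
name is hypothetical.

```python
import unicodedata


def is_unstable(cp: int) -> bool:
    """Category B: toNFKC(toCaseFold(toNFKC(cp))) != cp."""
    s = chr(cp)
    # Inner NFKC, then full case folding, then NFKC again.
    once = unicodedata.normalize("NFKC", s).casefold()
    return unicodedata.normalize("NFKC", once) != s
```

For instance, is_unstable(0x00DF) is True because case folding maps
LATIN SMALL LETTER SHARP S to "ss"; this is why U+00DF appears in the
Exceptions category (Section 2.6) as PVALID.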
2.3. IgnorableProperties (C)
C: Default_Ignorable_Code_Point(cp) = True or
White_Space(cp) = True or
Noncharacter_Code_Point(cp) = True
This category is used to group code points that are not recommended
for use in identifiers. In general, these code points are not
suitable for use in an IDN.
The definition for Default_Ignorable_Code_Point can be found in
DerivedCoreProperties.txt [DerivedCoreProperties] and, as of Unicode
5.2, is:
Other_Default_Ignorable_Code_Point + Cf (Format characters)
+ Variation_Selector - White_Space - FFF9..FFFB (Annotation
Characters) - 0600..0603, 06DD, 070F (exceptional Cf characters
that should be visible)
2.4. IgnorableBlocks (D)
D: Block(cp) is in {Combining Diacritical Marks for Symbols,
Musical Symbols, Ancient Greek Musical Notation}
This category is used to identify code points that are not useful in
mnemonics or that are otherwise impractical for IDN use. In general,
these code points are not suitable for use in IDNs.
The definition of blocks can be found in Blocks.txt [BlockNames].
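Python's unicodedata module does not expose the Block property, so
the following illustrative sketch hard-codes the three block ranges
from Blocks.txt. The ranges are assumed unchanged since Unicode 5.2
and should be checked against Blocks.txt for the version in use; all
names are hypothetical.

```python
# Block ranges (inclusive) for IgnorableBlocks (D), per Blocks.txt.
IGNORABLE_BLOCKS = {
    "Combining Diacritical Marks for Symbols": (0x20D0, 0x20FF),
    "Musical Symbols": (0x1D100, 0x1D1FF),
    "Ancient Greek Musical Notation": (0x1D200, 0x1D24F),
}


def in_ignorable_block(cp: int) -> bool:
    """Category D: True if cp lies in one of the three listed blocks."""
    return any(lo <= cp <= hi for lo, hi in IGNORABLE_BLOCKS.values())
```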
2.5. LDH (E)
E: cp is in {002D, 0030..0039, 0061..007A}
This category is used in the second step to preserve the traditional
"hostname" (LDH -- as described in the Definitions document
[RFC5890]) characters ('-', 0-9, and a-z). In general, these code
points are suitable for use in IDNs. Note that there are other rules
regarding the code point U+002D HYPHEN-MINUS that are specified in
the IDNA Protocol Specification [RFC5891].
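Category E is a fixed set, so a direct range test suffices; the
function name below is hypothetical and the sketch is illustrative
only.

```python
def is_ldh(cp: int) -> bool:
    """Category E: U+002D, U+0030..U+0039, or U+0061..U+007A."""
    return cp == 0x002D or 0x0030 <= cp <= 0x0039 or 0x0061 <= cp <= 0x007A
```

Uppercase A-Z are deliberately absent: they are in LetterDigits, but
case folding changes them, so they fall into Unstable and derive
DISALLOWED.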
2.6. Exceptions (F)
F: cp is in {00B7, 00DF, 0375, 03C2, 05F3, 05F4, 0640, 0660,
0661, 0662, 0663, 0664, 0665, 0666, 0667, 0668,
0669, 06F0, 06F1, 06F2, 06F3, 06F4, 06F5, 06F6,
06F7, 06F8, 06F9, 06FD, 06FE, 07FA, 0F0B, 3007,
302E, 302F, 3031, 3032, 3033, 3034, 3035, 303B,
30FB}
This category explicitly lists code points for which the category
cannot be assigned using only the core property values that exist in
the Unicode standard. The values are according to the table below:
PVALID -- Would otherwise have been DISALLOWED
00DF; PVALID # LATIN SMALL LETTER SHARP S
03C2; PVALID # GREEK SMALL LETTER FINAL SIGMA
06FD; PVALID # ARABIC SIGN SINDHI AMPERSAND
06FE; PVALID # ARABIC SIGN SINDHI POSTPOSITION MEN
0F0B; PVALID # TIBETAN MARK INTERSYLLABIC TSHEG
3007; PVALID # IDEOGRAPHIC NUMBER ZERO
CONTEXTO -- Would otherwise have been DISALLOWED
00B7; CONTEXTO # MIDDLE DOT
0375; CONTEXTO # GREEK LOWER NUMERAL SIGN (KERAIA)
05F3; CONTEXTO # HEBREW PUNCTUATION GERESH
05F4; CONTEXTO # HEBREW PUNCTUATION GERSHAYIM
30FB; CONTEXTO # KATAKANA MIDDLE DOT
CONTEXTO -- Would otherwise have been PVALID
0660; CONTEXTO # ARABIC-INDIC DIGIT ZERO
0661; CONTEXTO # ARABIC-INDIC DIGIT ONE
0662; CONTEXTO # ARABIC-INDIC DIGIT TWO
0663; CONTEXTO # ARABIC-INDIC DIGIT THREE
0664; CONTEXTO # ARABIC-INDIC DIGIT FOUR
0665; CONTEXTO # ARABIC-INDIC DIGIT FIVE
0666; CONTEXTO # ARABIC-INDIC DIGIT SIX
0667; CONTEXTO # ARABIC-INDIC DIGIT SEVEN
0668; CONTEXTO # ARABIC-INDIC DIGIT EIGHT
0669; CONTEXTO # ARABIC-INDIC DIGIT NINE
06F0; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ZERO
06F1; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ONE
06F2; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT TWO
06F3; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT THREE
06F4; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT FOUR
06F5; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT FIVE
06F6; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT SIX
06F7; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT SEVEN
06F8; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT EIGHT
06F9; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT NINE
DISALLOWED -- Would otherwise have been PVALID
0640; DISALLOWED # ARABIC TATWEEL
07FA; DISALLOWED # NKO LAJANYALAN
302E; DISALLOWED # HANGUL SINGLE DOT TONE MARK
302F; DISALLOWED # HANGUL DOUBLE DOT TONE MARK
3031; DISALLOWED # VERTICAL KANA REPEAT MARK
3032; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK
3033; DISALLOWED # VERTICAL KANA REPEAT MARK UPPER HALF
3034; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HA
3035; DISALLOWED # VERTICAL KANA REPEAT MARK LOWER HALF
303B; DISALLOWED # VERTICAL IDEOGRAPHIC ITERATION MARK
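One way to carry the table above into code is a mapping from code
point to derived value, built here in Python from the four groups
listed above. The variable name is hypothetical; the sketch is not a
normative encoding of the table.

```python
# Category F (Section 2.6): explicit derived values keyed by code point.
EXCEPTIONS = {}

# PVALID: would otherwise have been DISALLOWED
EXCEPTIONS.update(dict.fromkeys(
    [0x00DF, 0x03C2, 0x06FD, 0x06FE, 0x0F0B, 0x3007], "PVALID"))

# CONTEXTO: would otherwise have been DISALLOWED
EXCEPTIONS.update(dict.fromkeys(
    [0x00B7, 0x0375, 0x05F3, 0x05F4, 0x30FB], "CONTEXTO"))

# CONTEXTO: would otherwise have been PVALID (both Arabic digit ranges)
EXCEPTIONS.update(dict.fromkeys(
    list(range(0x0660, 0x066A)) + list(range(0x06F0, 0x06FA)), "CONTEXTO"))

# DISALLOWED: would otherwise have been PVALID
EXCEPTIONS.update(dict.fromkeys(
    [0x0640, 0x07FA, 0x302E, 0x302F, 0x3031, 0x3032, 0x3033, 0x3034,
     0x3035, 0x303B], "DISALLOWED"))
```

The resulting mapping has 41 entries, matching the set F listed at
the start of this section.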
2.7. BackwardCompatible (G)
G: cp is in {}
This category includes the code points whose property values in
versions of Unicode after 5.2 have changed in such a way that the
derived property value would no longer be PVALID or DISALLOWED. If
future versions of Unicode change the properties of code points so
that their derived value would move away from PVALID or DISALLOWED,
this table can be updated with special exception values so that the
derived property values of those code points stay stable.
2.8. JoinControl (H)
H: Join_Control(cp) = True
This category consists of Join Control characters (i.e., they are not
in LetterDigits (Section 2.1) but are still required in IDN labels
under some circumstances).
2.9. OldHangulJamo (I)
I: Hangul_Syllable_Type(cp) is in {L, V, T}
This category consists of all conjoining Hangul Jamo (Leading Jamo,
Vowel Jamo, and Trailing Jamo).
Elimination of conjoining Hangul Jamo from the set of PVALID
characters results in restricting the set of Korean PVALID characters
just to preformed, modern Hangul syllable characters. Old Hangul
syllables, which must be spelled with sequences of conjoining Hangul
Jamo, are not PVALID for IDNs.
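The following illustrative Python sketch tests category I against
hard-coded ranges of conjoining jamo whose Hangul_Syllable_Type is L,
V, or T. The ranges are our reading of HangulSyllableType.txt for
Unicode 5.2 and are an assumption to verify against the UCD; the
names are hypothetical.

```python
# Conjoining jamo ranges (inclusive) with Hangul_Syllable_Type L, V, or T.
OLD_HANGUL_JAMO_RANGES = (
    (0x1100, 0x115F), (0xA960, 0xA97C),  # L: leading consonants
    (0x1160, 0x11A7), (0xD7B0, 0xD7C6),  # V: vowels
    (0x11A8, 0x11FF), (0xD7CB, 0xD7FB),  # T: trailing consonants
)


def is_old_hangul_jamo(cp: int) -> bool:
    """Category I: True if cp is a conjoining Hangul jamo (L, V, or T)."""
    return any(lo <= cp <= hi for lo, hi in OLD_HANGUL_JAMO_RANGES)
```

Precomposed syllables such as U+AC00 HANGUL SYLLABLE GA have type LV
rather than L, V, or T, so they fall outside this category.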
2.10. Unassigned (J)
J: General_Category(cp) is in {Cn} and
Noncharacter_Code_Point(cp) = False
This category consists of code points in the Unicode character set
that are not (yet) assigned. Note that Unicode distinguishes between
"unassigned code points" and "unassigned characters". The unassigned
code points are exactly (Cn - Noncharacters), which is the set this
rule selects, while the unassigned *characters* are (Cn + Cs).
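Category J can be sketched in Python as follows. Noncharacter_Code_Point
is computed from its fixed definition (U+FDD0..U+FDEF plus U+xxFFFE and
U+xxFFFF in each of the 17 planes), while General_Category comes from
unicodedata, whose Unicode version is that of the running Python rather
than 5.2. Function names are hypothetical.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point for 0 <= cp <= 0x10FFFF."""
    # The mask test matches U+xxFFFE and U+xxFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unassigned(cp: int) -> bool:
    """Category J: General_Category(cp) is Cn and cp is not a noncharacter."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)
```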
3. Calculation of the Derived Property
As described above (Section 1) and in more detail in the IDNA
Protocol document [RFC5891], possible values of the IDN property are:
o PVALID
o CONTEXTJ
o CONTEXTO
o DISALLOWED
o UNASSIGNED
The algorithm to calculate the value of the derived property is as
follows. If the name of a rule (such as Exceptions) is used, that
implies the set of code points that the rule defines, while the same
name used as a function call (such as Exceptions(cp)) implies the
value that cp has in the Exceptions table.
If .cp. .in. Exceptions Then Exceptions(cp);
Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
Else If .cp. .in. Unassigned Then UNASSIGNED;
Else If .cp. .in. LDH Then PVALID;
Else If .cp. .in. JoinControl Then CONTEXTJ;
Else If .cp. .in. Unstable Then DISALLOWED;
Else If .cp. .in. IgnorableProperties Then DISALLOWED;
Else If .cp. .in. IgnorableBlocks Then DISALLOWED;
Else If .cp. .in. OldHangulJamo Then DISALLOWED;
Else If .cp. .in. LetterDigits Then PVALID;
Else DISALLOWED;
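The ordered rules above translate directly into a first-match
cascade. The Python sketch below keeps the cascade separate from the
category tests, which are passed in through a hypothetical Categories
container. The DEMO wiring is only an approximation: it uses Python's
own Unicode data rather than 5.2, approximates IgnorableProperties
because unicodedata exposes neither Default_Ignorable_Code_Point nor
White_Space, and includes only three entries of the Exceptions table.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Callable, Dict

Predicate = Callable[[int], bool]


@dataclass
class Categories:
    """Hypothetical container for the Section 2 category tests."""
    letter_digits: Predicate         # A
    unstable: Predicate              # B
    ignorable_properties: Predicate  # C
    ignorable_blocks: Predicate      # D
    ldh: Predicate                   # E
    exceptions: Dict[int, str]       # F: cp -> derived value
    join_control: Predicate          # H
    old_hangul_jamo: Predicate       # I
    unassigned: Predicate            # J
    backward_compatible: Dict[int, str] = field(default_factory=dict)  # G


def derived_property(cp: int, c: Categories) -> str:
    """Evaluate the Section 3 rules in order; the first match wins."""
    if cp in c.exceptions:
        return c.exceptions[cp]
    if cp in c.backward_compatible:
        return c.backward_compatible[cp]
    if c.unassigned(cp):
        return "UNASSIGNED"
    if c.ldh(cp):
        return "PVALID"
    if c.join_control(cp):
        return "CONTEXTJ"
    if (c.unstable(cp) or c.ignorable_properties(cp)
            or c.ignorable_blocks(cp) or c.old_hangul_jamo(cp)):
        return "DISALLOWED"
    if c.letter_digits(cp):
        return "PVALID"
    return "DISALLOWED"


# Illustrative wiring with Python's unicodedata (not Unicode 5.2).
def _category(cp: int) -> str:
    return unicodedata.category(chr(cp))


def _noncharacter(cp: int) -> bool:
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def _unstable(cp: int) -> bool:
    s = chr(cp)
    once = unicodedata.normalize("NFKC", s).casefold()
    return unicodedata.normalize("NFKC", once) != s


DEMO = Categories(
    letter_digits=lambda cp: _category(cp) in {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"},
    unstable=_unstable,
    # Approximation: Cf, str.isspace() and noncharacters stand in for
    # Default_Ignorable_Code_Point, White_Space and Noncharacter_Code_Point.
    ignorable_properties=lambda cp: (_category(cp) == "Cf"
                                     or chr(cp).isspace() or _noncharacter(cp)),
    ignorable_blocks=lambda cp: 0x20D0 <= cp <= 0x20FF or 0x1D100 <= cp <= 0x1D24F,
    ldh=lambda cp: cp == 0x2D or 0x30 <= cp <= 0x39 or 0x61 <= cp <= 0x7A,
    # Subset of Section 2.6 only, for illustration.
    exceptions={0x00B7: "CONTEXTO", 0x00DF: "PVALID", 0x0640: "DISALLOWED"},
    join_control=lambda cp: cp in (0x200C, 0x200D),
    old_hangul_jamo=lambda cp: any(lo <= cp <= hi for lo, hi in (
        (0x1100, 0x11FF), (0xA960, 0xA97C), (0xD7B0, 0xD7C6), (0xD7CB, 0xD7FB))),
    unassigned=lambda cp: _category(cp) == "Cn" and not _noncharacter(cp),
)
```

The order matters: U+0041 LATIN CAPITAL LETTER A is in LetterDigits,
but it is tested against Unstable first and therefore derives
DISALLOWED.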
4. Code Points
The categories and rules defined in Sections 2 and 3 apply to all
Unicode code points. The table in Appendix B shows, for illustrative
purposes, the consequences of the categories and classification
rules, and the resulting property values.
The list of code points that can be found in Appendix B is
non-normative. Sections 2 and 3 are normative.
5. IANA Considerations
5.1. IDNA-Derived Property Value Registry
IANA has created a registry with the derived properties for the
versions of Unicode released after (and including) version 5.2. The
derived property value is to be calculated in cooperation with a
designated expert [RFC5226] according to the specifications in
Sections 2 and 3 and not by copying the non-normative table found in
Appendix B.
If non-backward-compatible changes or other problems arise during the
creation or designated expert review of the table of derived property
values, they should be flagged for the IESG. Changes to the rules
(as specified in Sections 2 and 3), including BackwardCompatible
(Section 2.7), a set that is empty at the time of publication of this
document, require IETF Review, as described in RFC 5226 [RFC5226].
5.2. IDNA Context Registry
For characters that are defined in the IDNA derived property value
registry (Section 5.1) as CONTEXTO or CONTEXTJ and that therefore
require a contextual rule, IANA has created and now maintains a list
of approved contextual rules. Additions or changes to these rules
require IETF Review, as described in [RFC5226].
Appendix A contains further discussion and a table from which that
registry can be initialized.
5.2.1. Template for Context Registry
The following information is to be given when a new rule is created.
Name: Unique name of the rule
Code point: Rule that should be applied when this code point
exists in the label
Overview: Description in plain English of what the rule verifies
Lookup: Should the rule be applied at time of lookup?
Rule Set: The set of rules, with a reference to the defining
document.
6. Security Considerations
Security Considerations for this version of IDNA, except for the
special issues associated with right-to-left scripts and characters,
are described in the Definitions document [RFC5890]. Specific issues
for labels containing characters associated with scripts written
right to left appear in the Bidi document [RFC5893].
7. Acknowledgements
This document would not have been possible to produce without input
from many people. The main contributors are (in alphabetical order)
Harald Alvestrand, Vint Cerf, Tina Dam, Mark Davis, Gihan Dias,
Mouhammet Diop, Michael Everson, Asmus Freytag, Debbie Garside, Paul
Hoffman, Kent Karlsson, Cary Karp, Jaeyoun Kim, John Klensin, Olaf
Kolkman, Gervase Markham, Ram Mohan, Lisa Moore, Yngve Pettersen,
Erik van der Poel, Hualin Qian, Rick Reed, Pete Resnick, Lakmal
Silva, Michel Suignard, Andrew Sullivan, Wil Tan, Kenneth Whistler,
Chris Wright, and Yoshiro Yoneya.
Appendix A. Contextual Rules Registry
As discussed in Section 5.2 and in the IANA Considerations section of
the Rationale document [RFC5894], IANA maintains a registry of rules
that define the contexts in which particular PROTOCOL-VALID
characters, characters associated with a requirement for Contextual
Information, are permitted. These rules are expressed as tests on the
label in which the characters appear (all, or any part of, the label
may be tested).
The grammatical rules are expressed in pseudo-code. The conventions
used for that pseudo-code are explained here.
Each rule is constructed as a Boolean expression that evaluates to
either True or False. A simple "True;" or "False;" rule sets the
default result value for the rule set. Subsequent conditional rules
that evaluate to True or False may re-set the result value.
A special value "Undefined" is used to deal with any error
conditions, such as an attempt to test a character before the start
of a label or after the end of a label. If any term of a rule
evaluates to Undefined, further evaluation of the rule immediately
terminates, as the result value of the rule will itself be Undefined.
cp represents the code point to be tested.
FirstChar is a special term that denotes the first code point in a
label.
LastChar is a special term that denotes the last code point in a
label.
.eq. represents the equality relation.
A .eq. B evaluates to True if A equals B.
.is. represents checking the position in a label.
A .is. B evaluates to True if A and B have the same position in
the same label.
.ne. represents the non-equality relation.
A .ne. B evaluates to True if A is not equal to B.
.in. represents the set inclusion relation.
A .in. B evaluates to True if A is a member of the set B.
A functional notation, Function_Name(cp), is used to express either
string positions within a label, Boolean character property tests of
a code point, or a regular expression match. When such function
names refer to Boolean character property tests, the function names
use the exact Unicode character property name for the property in
question, and "cp" is evaluated as the Unicode value of the code
point to be tested, rather than as its position in the label. When
such function names refer to string positions within a label, "cp" is
evaluated as its position in the label.
RegExpMatch(X) takes as its parameter X a schematic regular
expression consisting of a mix of Unicode character property values
and literal Unicode code points.
Script(cp) returns the value of the Unicode Script property, as
defined in Scripts.txt in the Unicode Character Database.
Canonical_Combining_Class(cp) returns the value of the Unicode
Canonical_Combining_Class property, as defined in UnicodeData.txt in
the Unicode Character Database.
Before(cp) returns the code point of the character immediately
preceding cp in logical order in the string representing the label.
Before(FirstChar) evaluates to Undefined.
After(cp) returns the code point of the character immediately
following cp in logical order in the string representing the label.
After(LastChar) evaluates to Undefined.
Note that "Before" and "After" do not refer to the visual display
order of the character in a label, which may be reversed or otherwise
modified by the bidirectional algorithm for labels including
characters from scripts written right to left. Instead, "Before" and
"After" refer to the network order of the character in the label.
The clauses "Then True" and "Then False" imply exit from the
pseudo-code routine with the corresponding result.
Repeated evaluation for all characters in a label makes use of the
special construct:
For All Characters:
Expression;
End For;
This construct requires repeated evaluation of "Expression" for each
code point in the label, starting from FirstChar and proceeding to
LastChar.
The different fields in the rules are to be interpreted as follows:
Code point:
The code point, or code points, to which this rule is to be
applied. Normally, this implies that if any of the code points in
a label is as defined, then the rules should be applied. If
evaluated to True, the code point is OK as used; if evaluated to
False, it is not OK.
Overview:
A description of the goal of the rule, in plain English.
Lookup:
True if application of this rule is recommended at lookup time;
False otherwise.
Rule Set:
The rule set itself, as described above.
Appendix A.1. ZERO WIDTH NON-JOINER
Code point:
U+200C
Overview:
This may occur in a formally cursive script (such as Arabic) in a
context where it breaks a cursive connection as required for
orthographic rules, as in the Persian language, for example. It
also may occur in Indic scripts in a consonant-conjunct context
(immediately following a virama), to control required display of
such conjuncts.
Lookup:
True
Rule Set:
False;
If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C
(Joining_Type:T)*(Joining_Type:{R,D})) Then True;
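A Python sketch of this rule set follows, for illustration only.
Canonical_Combining_Class is available as unicodedata.combining(),
with Virama corresponding to the numeric class 9. Joining_Type is not
in the Python standard library, so the caller must supply a lookup;
the joining_type parameter is a hypothetical interface (for example,
one built from ArabicShaping.txt). Undefined is represented as None,
and the regular expression is evaluated around the U+200C being
tested by skipping transparent (T) characters on each side.

```python
import unicodedata
from typing import Callable, Optional

VIRAMA = 9  # numeric Canonical_Combining_Class value named "Virama"


def zwnj_rule(label: str, i: int,
              joining_type: Callable[[str], str]) -> Optional[bool]:
    """Evaluate Appendix A.1 for the U+200C at label[i].

    Returns True, False, or None for Undefined. joining_type is a
    hypothetical caller-supplied lookup returning "L", "D", "R", "T", ...
    """
    if label[i] != "\u200c":
        raise ValueError("label[i] must be U+200C ZERO WIDTH NON-JOINER")
    if i == 0:
        return None  # Before(FirstChar) is Undefined
    if unicodedata.combining(label[i - 1]) == VIRAMA:
        return True
    # Left side: (Joining_Type:{L,D})(Joining_Type:T)* ending before U+200C.
    j = i - 1
    while j >= 0 and joining_type(label[j]) == "T":
        j -= 1
    if j < 0 or joining_type(label[j]) not in ("L", "D"):
        return False
    # Right side: (Joining_Type:T)*(Joining_Type:{R,D}) starting after it.
    k = i + 1
    while k < len(label) and joining_type(label[k]) == "T":
        k += 1
    return k < len(label) and joining_type(label[k]) in ("R", "D")
```

With a lookup that reports U+0628 ARABIC LETTER BEH as D, the label
"\u0628\u200c\u0628" satisfies the rule at index 1; a Devanagari
sequence in which U+200C follows U+094D DEVANAGARI SIGN VIRAMA
satisfies it through the first clause.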
Appendix A.2. ZERO WIDTH JOINER
Code point:
U+200D
Overview:
This may occur in Indic scripts in a consonant-conjunct context
(immediately following a virama), to control required display of
such conjuncts.
Lookup:
True
Rule Set:
False;
If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
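The ZERO WIDTH JOINER rule needs only the virama test. In this
illustrative Python sketch (hypothetical function name), Virama is
the numeric combining class 9 reported by unicodedata.combining(),
and Undefined is represented as None.

```python
import unicodedata
from typing import Optional

VIRAMA = 9  # numeric Canonical_Combining_Class value named "Virama"


def zwj_rule(label: str, i: int) -> Optional[bool]:
    """Evaluate Appendix A.2 for the U+200D at label[i]."""
    if label[i] != "\u200d":
        raise ValueError("label[i] must be U+200D ZERO WIDTH JOINER")
    if i == 0:
        return None  # Before(FirstChar) is Undefined
    return unicodedata.combining(label[i - 1]) == VIRAMA
```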
Appendix A.3. MIDDLE DOT
Code point:
U+00B7
Overview:
Between 'l' (U+006C) characters only, used to permit the Catalan
character ela geminada to be expressed.
Lookup:
False
Rule Set:
False;
If Before(cp) .eq. U+006C And
After(cp) .eq. U+006C Then True;
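A minimal Python sketch of the MIDDLE DOT rule, with Undefined
represented as None; the function name is hypothetical. A dot at
either end of the label makes Before(cp) or After(cp) Undefined.

```python
from typing import Optional


def middle_dot_rule(label: str, i: int) -> Optional[bool]:
    """Evaluate Appendix A.3 for the U+00B7 at label[i]."""
    if label[i] != "\u00b7":
        raise ValueError("label[i] must be U+00B7 MIDDLE DOT")
    if i == 0 or i == len(label) - 1:
        return None  # Before(FirstChar) or After(LastChar) is Undefined
    return label[i - 1] == "l" and label[i + 1] == "l"
```

For the Catalan word "col\u00b7legi", the dot at index 3 sits between
two U+006C characters, so the rule evaluates to True.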
Appendix A.4. GREEK LOWER NUMERAL SIGN (KERAIA)
Code point:
U+0375
Overview:
The script of the following character MUST be Greek.
Lookup:
False
Rule Set:
False;
If Script(After(cp)) .eq. Greek Then True;
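Python's unicodedata does not provide the Script property, so this
illustrative sketch takes a caller-supplied lookup; script_of is a
hypothetical parameter that should return Script values as named in
Scripts.txt. Undefined is represented as None.

```python
from typing import Callable, Optional


def keraia_rule(label: str, i: int,
                script_of: Callable[[str], str]) -> Optional[bool]:
    """Evaluate Appendix A.4 for the U+0375 at label[i]."""
    if label[i] != "\u0375":
        raise ValueError("label[i] must be U+0375 GREEK LOWER NUMERAL SIGN")
    if i == len(label) - 1:
        return None  # After(LastChar) is Undefined
    return script_of(label[i + 1]) == "Greek"
```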
Appendix A.5. HEBREW PUNCTUATION GERESH
Code point:
U+05F3
Overview:
The script of the preceding character MUST be Hebrew.
Lookup:
False
Rule Set:
False;
If Script(Before(cp)) .eq. Hebrew Then True;
Appendix A.6. HEBREW PUNCTUATION GERSHAYIM
Code point:
U+05F4
Overview:
The script of the preceding character MUST be Hebrew.
Lookup:
False
Rule Set:
False;
If Script(Before(cp)) .eq. Hebrew Then True;
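The rules in Appendices A.5 and A.6 are identical, so one illustrative
Python sketch covers both. As with the other Script-based rules,
script_of is a hypothetical caller-supplied lookup, because Python's
unicodedata does not expose the Script property; Undefined is
represented as None.

```python
from typing import Callable, Optional

HEBREW_PUNCTUATION = {"\u05f3", "\u05f4"}  # GERESH, GERSHAYIM


def hebrew_punctuation_rule(label: str, i: int,
                            script_of: Callable[[str], str]) -> Optional[bool]:
    """Evaluate Appendix A.5 or A.6 for the U+05F3/U+05F4 at label[i]."""
    if label[i] not in HEBREW_PUNCTUATION:
        raise ValueError("label[i] must be U+05F3 or U+05F4")
    if i == 0:
        return None  # Before(FirstChar) is Undefined
    return script_of(label[i - 1]) == "Hebrew"
```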
Appendix A.7. KATAKANA MIDDLE DOT
Code point:
U+30FB
Overview:
Note that the Script of Katakana Middle Dot is not any of
"Hiragana", "Katakana", or "Han". The effect of this rule is to
require at least one character in the label to be in one of those
scripts.
Lookup:
False
Rule Set:
False;
For All Characters:
If Script(cp) .in. {Hiragana, Katakana, Han} Then True;
End For;
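Because the rule returns True as soon as any character in the label
qualifies, it reduces to an "any" test over the label. In this
illustrative Python sketch, script_of is again a hypothetical
caller-supplied Script lookup.

```python
from typing import Callable

JAPANESE_SCRIPTS = frozenset({"Hiragana", "Katakana", "Han"})


def katakana_middle_dot_rule(label: str,
                             script_of: Callable[[str], str]) -> bool:
    """Evaluate Appendix A.7: some character must be Hiragana, Katakana, or Han."""
    return any(script_of(ch) in JAPANESE_SCRIPTS for ch in label)
```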
Appendix A.8. ARABIC-INDIC DIGITS
Code point:
0660..0669
Overview:
Cannot be mixed with Extended Arabic-Indic Digits.
Lookup:
False
Rule Set:
True;
For All Characters:
If cp .in. 06F0..06F9 Then False;
End For;
Appendix A.9. EXTENDED ARABIC-INDIC DIGITS
Code point:
06F0..06F9
Overview:
Cannot be mixed with Arabic-Indic Digits.
Lookup:
False
Rule Set:
True;
For All Characters:
If cp .in. 0660..0669 Then False;
End For;
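The rules in Appendices A.8 and A.9 mirror each other: each starts
from True and fails if any digit from the other range appears
anywhere in the label. An illustrative Python sketch with
hypothetical names:

```python
ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9


def arabic_indic_digit_rule(label: str) -> bool:
    """Appendix A.8: False if the label contains any U+06F0..U+06F9."""
    return not any(ord(ch) in EXTENDED_ARABIC_INDIC for ch in label)


def extended_arabic_indic_digit_rule(label: str) -> bool:
    """Appendix A.9: False if the label contains any U+0660..U+0669."""
    return not any(ord(ch) in ARABIC_INDIC for ch in label)
```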
Appendix B. Code Points 0x0000 - 0x10FFFF
If one applies the rules (Section 3) to the code points 0x0000 to
0x10FFFF using the property data of Unicode 5.2, the result is as
follows.
This list is non-normative, and only included for illustrative
purposes. Specifically, what is displayed in the third column is not
the formal name of the code point (as defined in Section 4.8 of The
Unicode Standard [Unicode52]). The differences exist, for example,
for the code points that have the code point value as part of the
name (for example, CJK UNIFIED IDEOGRAPH-4E00) and the naming of
Hangul syllables. For many code points, what you see is the official
name.
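The lines of Appendix B.1 use the semicolon-separated UCD layout
("first[..last] ; VALUE # comment"). A minimal Python sketch for
reading such lines and looking up a code point follows; the names are
hypothetical, and, as stated above, the table is illustrative, so the
normative result always comes from Sections 2 and 3.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    first: int
    last: int
    value: str
    comment: str


def parse_ucd_lines(lines: Iterable[str]) -> Iterator[Entry]:
    """Parse lines such as '0000..002C ; DISALLOWED # ..COMMA'."""
    for line in lines:
        data, _, comment = line.partition("#")
        if not data.strip():
            continue  # blank or comment-only line
        cps, _, value = data.partition(";")
        first, _, last = cps.strip().partition("..")
        yield Entry(int(first, 16), int(last or first, 16),
                    value.strip(), comment.strip())


def lookup(entries: List[Entry], cp: int) -> str:
    """Return the value of the range containing cp (linear scan)."""
    for e in entries:
        if e.first <= cp <= e.last:
            return e.value
    raise KeyError(f"U+{cp:04X} not covered")
```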
Appendix B.1. Code Points in Unicode Character Database (UCD) Format
0000..002C ; DISALLOWED # ..COMMA
002D ; PVALID # HYPHEN-MINUS
002E..002F ; DISALLOWED # FULL STOP..SOLIDUS
0030..0039 ; PVALID # DIGIT ZERO..DIGIT NINE
003A..0060 ; DISALLOWED # COLON..GRAVE ACCENT
0061..007A ; PVALID # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
007B..00B6 ; DISALLOWED # LEFT CURLY BRACKET..PILCROW SIGN
00B7 ; CONTEXTO # MIDDLE DOT
00B8..00DE ; DISALLOWED # CEDILLA..LATIN CAPITAL LETTER THORN
00DF..00F6 ; PVALID # LATIN SMALL LETTER SHARP S..LATIN SMALL LETT
00F7 ; DISALLOWED # DIVISION SIGN
00F8..00FF ; PVALID # LATIN SMALL LETTER O WITH STROKE..LATIN SMAL
0100 ; DISALLOWED # LATIN CAPITAL LETTER A WITH MACRON
0101 ; PVALID # LATIN SMALL LETTER A WITH MACRON
0102 ; DISALLOWED # LATIN CAPITAL LETTER A WITH BREVE
0103 ; PVALID # LATIN SMALL LETTER A WITH BREVE
0104 ; DISALLOWED # LATIN CAPITAL LETTER A WITH OGONEK
0105 ; PVALID # LATIN SMALL LETTER A WITH OGONEK
0106 ; DISALLOWED # LATIN CAPITAL LETTER C WITH ACUTE
0107 ; PVALID # LATIN SMALL LETTER C WITH ACUTE
0108 ; DISALLOWED # LATIN CAPITAL LETTER C WITH CIRCUMFLEX
0109 ; PVALID # LATIN SMALL LETTER C WITH CIRCUMFLEX
010A ; DISALLOWED # LATIN CAPITAL LETTER C WITH DOT ABOVE
010B ; PVALID # LATIN SMALL LETTER C WITH DOT ABOVE
010C ; DISALLOWED # LATIN CAPITAL LETTER C WITH CARON
010D ; PVALID # LATIN SMALL LETTER C WITH CARON
010E ; DISALLOWED # LATIN CAPITAL LETTER D WITH CARON
010F ; PVALID # LATIN SMALL LETTER D WITH CARON
0110 ; DISALLOWED # LATIN CAPITAL LETTER D WITH STROKE
0111 ; PVALID # LATIN SMALL LETTER D WITH STROKE
0112 ; DISALLOWED # LATIN CAPITAL LETTER E WITH MACRON
0113 ; PVALID # LATIN SMALL LETTER E WITH MACRON
0114 ; DISALLOWED # LATIN CAPITAL LETTER E WITH BREVE
0115 ; PVALID # LATIN SMALL LETTER E WITH BREVE
0116 ; DISALLOWED # LATIN CAPITAL LETTER E WITH DOT ABOVE
0117 ; PVALID # LATIN SMALL LETTER E WITH DOT ABOVE
0118 ; DISALLOWED # LATIN CAPITAL LETTER E WITH OGONEK
0119 ; PVALID # LATIN SMALL LETTER E WITH OGONEK
011A ; DISALLOWED # LATIN CAPITAL LETTER E WITH CARON
011B ; PVALID # LATIN SMALL LETTER E WITH CARON
011C ; DISALLOWED # LATIN CAPITAL LETTER G WITH CIRCUMFLEX
011D ; PVALID # LATIN SMALL LETTER G WITH CIRCUMFLEX
011E ; DISALLOWED # LATIN CAPITAL LETTER G WITH BREVE
011F ; PVALID # LATIN SMALL LETTER G WITH BREVE
0120 ; DISALLOWED # LATIN CAPITAL LETTER G WITH DOT ABOVE
0121 ; PVALID # LATIN SMALL LETTER G WITH DOT ABOVE
0122 ; DISALLOWED # LATIN CAPITAL LETTER G WITH CEDILLA
0123 ; PVALID # LATIN SMALL LETTER G WITH CEDILLA
0124 ; DISALLOWED # LATIN CAPITAL LETTER H WITH CIRCUMFLEX
0125 ; PVALID # LATIN SMALL LETTER H WITH CIRCUMFLEX
0126 ; DISALLOWED # LATIN CAPITAL LETTER H WITH STROKE
0127 ; PVALID # LATIN SMALL LETTER H WITH STROKE
0128 ; DISALLOWED # LATIN CAPITAL LETTER I WITH TILDE
0129 ; PVALID # LATIN SMALL LETTER I WITH TILDE
012A ; DISALLOWED # LATIN CAPITAL LETTER I WITH MACRON
012B ; PVALID # LATIN SMALL LETTER I WITH MACRON
012C ; DISALLOWED # LATIN CAPITAL LETTER I WITH BREVE
012D ; PVALID # LATIN SMALL LETTER I WITH BREVE
012E ; DISALLOWED # LATIN CAPITAL LETTER I WITH OGONEK
012F ; PVALID # LATIN SMALL LETTER I WITH OGONEK
0130 ; DISALLOWED # LATIN CAPITAL LETTER I WITH DOT ABOVE
0131 ; PVALID # LATIN SMALL LETTER DOTLESS I
0132..0134 ; DISALLOWED # LATIN CAPITAL LIGATURE IJ..LATIN CAPITAL LET
0135 ; PVALID # LATIN SMALL LETTER J WITH CIRCUMFLEX
0136 ; DISALLOWED # LATIN CAPITAL LETTER K WITH CEDILLA
0137..0138 ; PVALID # LATIN SMALL LETTER K WITH CEDILLA..LATIN SMA
0139 ; DISALLOWED # LATIN CAPITAL LETTER L WITH ACUTE
013A ; PVALID # LATIN SMALL LETTER L WITH ACUTE
013B ; DISALLOWED # LATIN CAPITAL LETTER L WITH CEDILLA
013C ; PVALID # LATIN SMALL LETTER L WITH CEDILLA
013D ; DISALLOWED # LATIN CAPITAL LETTER L WITH CARON
013E ; PVALID # LATIN SMALL LETTER L WITH CARON
013F..0141 ; DISALLOWED # LATIN CAPITAL LETTER L WITH MIDDLE DOT..LATI
0142 ; PVALID # LATIN SMALL LETTER L WITH STROKE
0143 ; DISALLOWED # LATIN CAPITAL LETTER N WITH ACUTE
0144 ; PVALID # LATIN SMALL LETTER N WITH ACUTE
0145 ; DISALLOWED # LATIN CAPITAL LETTER N WITH CEDILLA
0146 ; PVALID # LATIN SMALL LETTER N WITH CEDILLA
0147 ; DISALLOWED # LATIN CAPITAL LETTER N WITH CARON
0148 ; PVALID # LATIN SMALL LETTER N WITH CARON
0149..014A ; DISALLOWED # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE.
014B ; PVALID # LATIN SMALL LETTER ENG
014C ; DISALLOWED # LATIN CAPITAL LETTER O WITH MACRON
014D ; PVALID # LATIN SMALL LETTER O WITH MACRON
014E ; DISALLOWED # LATIN CAPITAL LETTER O WITH BREVE
014F ; PVALID # LATIN SMALL LETTER O WITH BREVE
0150 ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0151 ; PVALID # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0152 ; DISALLOWED # LATIN CAPITAL LIGATURE OE
0153 ; PVALID # LATIN SMALL LIGATURE OE
0154 ; DISALLOWED # LATIN CAPITAL LETTER R WITH ACUTE
0155 ; PVALID # LATIN SMALL LETTER R WITH ACUTE
0156 ; DISALLOWED # LATIN CAPITAL LETTER R WITH CEDILLA
0157 ; PVALID # LATIN SMALL LETTER R WITH CEDILLA
0158 ; DISALLOWED # LATIN CAPITAL LETTER R WITH CARON
0159 ; PVALID # LATIN SMALL LETTER R WITH CARON
015A ; DISALLOWED # LATIN CAPITAL LETTER S WITH ACUTE
015B ; PVALID # LATIN SMALL LETTER S WITH ACUTE
015C ; DISALLOWED # LATIN CAPITAL LETTER S WITH CIRCUMFLEX
015D ; PVALID # LATIN SMALL LETTER S WITH CIRCUMFLEX
015E ; DISALLOWED # LATIN CAPITAL LETTER S WITH CEDILLA
015F ; PVALID # LATIN SMALL LETTER S WITH CEDILLA
0160 ; DISALLOWED # LATIN CAPITAL LETTER S WITH CARON
0161 ; PVALID # LATIN SMALL LETTER S WITH CARON
0162 ; DISALLOWED # LATIN CAPITAL LETTER T WITH CEDILLA
0163 ; PVALID # LATIN SMALL LETTER T WITH CEDILLA
0164 ; DISALLOWED # LATIN CAPITAL LETTER T WITH CARON
0165 ; PVALID # LATIN SMALL LETTER T WITH CARON
0166 ; DISALLOWED # LATIN CAPITAL LETTER T WITH STROKE
0167 ; PVALID # LATIN SMALL LETTER T WITH STROKE
0168 ; DISALLOWED # LATIN CAPITAL LETTER U WITH TILDE
0169 ; PVALID # LATIN SMALL LETTER U WITH TILDE
016A ; DISALLOWED # LATIN CAPITAL LETTER U WITH MACRON
016B ; PVALID # LATIN SMALL LETTER U WITH MACRON
016C ; DISALLOWED # LATIN CAPITAL LETTER U WITH BREVE
016D ; PVALID # LATIN SMALL LETTER U WITH BREVE
016E ; DISALLOWED # LATIN CAPITAL LETTER U WITH RING ABOVE
016F ; PVALID # LATIN SMALL LETTER U WITH RING ABOVE
0170 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0171 ; PVALID # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0172 ; DISALLOWED # LATIN CAPITAL LETTER U WITH OGONEK
0173 ; PVALID # LATIN SMALL LETTER U WITH OGONEK
0174 ; DISALLOWED # LATIN CAPITAL LETTER W WITH CIRCUMFLEX
0175 ; PVALID # LATIN SMALL LETTER W WITH CIRCUMFLEX
0176 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
0177 ; PVALID # LATIN SMALL LETTER Y WITH CIRCUMFLEX
0178..0179 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH DIAERESIS..LATIN
017A ; PVALID # LATIN SMALL LETTER Z WITH ACUTE
017B ; DISALLOWED # LATIN CAPITAL LETTER Z WITH DOT ABOVE
017C ; PVALID # LATIN SMALL LETTER Z WITH DOT ABOVE
017D ; DISALLOWED # LATIN CAPITAL LETTER Z WITH CARON
017E ; PVALID # LATIN SMALL LETTER Z WITH CARON
017F ; DISALLOWED # LATIN SMALL LETTER LONG S
0180 ; PVALID # LATIN SMALL LETTER B WITH STROKE
0181..0182 ; DISALLOWED # LATIN CAPITAL LETTER B WITH HOOK..LATIN CAPI
0183 ; PVALID # LATIN SMALL LETTER B WITH TOPBAR
0184 ; DISALLOWED # LATIN CAPITAL LETTER TONE SIX
0185 ; PVALID # LATIN SMALL LETTER TONE SIX
0186..0187 ; DISALLOWED # LATIN CAPITAL LETTER OPEN O..LATIN CAPITAL L
0188 ; PVALID # LATIN SMALL LETTER C WITH HOOK
0189..018B ; DISALLOWED # LATIN CAPITAL LETTER AFRICAN D..LATIN CAPITA
018C..018D ; PVALID # LATIN SMALL LETTER D WITH TOPBAR..LATIN SMAL
018E..0191 ; DISALLOWED # LATIN CAPITAL LETTER REVERSED E..LATIN CAPIT
0192 ; PVALID # LATIN SMALL LETTER F WITH HOOK
0193..0194 ; DISALLOWED # LATIN CAPITAL LETTER G WITH HOOK..LATIN CAPI
0195 ; PVALID # LATIN SMALL LETTER HV
0196..0198 ; DISALLOWED # LATIN CAPITAL LETTER IOTA..LATIN CAPITAL LET
0199..019B ; PVALID # LATIN SMALL LETTER K WITH HOOK..LATIN SMALL
019C..019D ; DISALLOWED # LATIN CAPITAL LETTER TURNED M..LATIN CAPITAL
019E ; PVALID # LATIN SMALL LETTER N WITH LONG RIGHT LEG
019F..01A0 ; DISALLOWED # LATIN CAPITAL LETTER O WITH MIDDLE TILDE..LA
01A1 ; PVALID # LATIN SMALL LETTER O WITH HORN
01A2 ; DISALLOWED # LATIN CAPITAL LETTER OI
01A3 ; PVALID # LATIN SMALL LETTER OI
01A4 ; DISALLOWED # LATIN CAPITAL LETTER P WITH HOOK
01A5 ; PVALID # LATIN SMALL LETTER P WITH HOOK
01A6..01A7 ; DISALLOWED # LATIN LETTER YR..LATIN CAPITAL LETTER TONE T
01A8 ; PVALID # LATIN SMALL LETTER TONE TWO
01A9 ; DISALLOWED # LATIN CAPITAL LETTER ESH
01AA..01AB ; PVALID # LATIN LETTER REVERSED ESH LOOP..LATIN SMALL
01AC ; DISALLOWED # LATIN CAPITAL LETTER T WITH HOOK
01AD ; PVALID # LATIN SMALL LETTER T WITH HOOK
01AE..01AF ; DISALLOWED # LATIN CAPITAL LETTER T WITH RETROFLEX HOOK..
01B0 ; PVALID # LATIN SMALL LETTER U WITH HORN
01B1..01B3 ; DISALLOWED # LATIN CAPITAL LETTER UPSILON..LATIN CAPITAL
01B4 ; PVALID # LATIN SMALL LETTER Y WITH HOOK
01B5 ; DISALLOWED # LATIN CAPITAL LETTER Z WITH STROKE
01B6 ; PVALID # LATIN SMALL LETTER Z WITH STROKE
01B7..01B8 ; DISALLOWED # LATIN CAPITAL LETTER EZH..LATIN CAPITAL LETT
01B9..01BB ; PVALID # LATIN SMALL LETTER EZH REVERSED..LATIN LETTE
01BC ; DISALLOWED # LATIN CAPITAL LETTER TONE FIVE
01BD..01C3 ; PVALID # LATIN SMALL LETTER TONE FIVE..LATIN LETTER R
01C4..01CD ; DISALLOWED # LATIN CAPITAL LETTER DZ WITH CARON..LATIN CA
01CE ; PVALID # LATIN SMALL LETTER A WITH CARON
01CF ; DISALLOWED # LATIN CAPITAL LETTER I WITH CARON
01D0 ; PVALID # LATIN SMALL LETTER I WITH CARON
01D1 ; DISALLOWED # LATIN CAPITAL LETTER O WITH CARON
01D2 ; PVALID # LATIN SMALL LETTER O WITH CARON
01D3 ; DISALLOWED # LATIN CAPITAL LETTER U WITH CARON
01D4 ; PVALID # LATIN SMALL LETTER U WITH CARON
01D5 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS AND MA
01D6 ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS AND MACR
01D7 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS AND AC
01D8 ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS AND ACUT
01D9 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS AND CA
01DA ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS AND CARO
01DB ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS AND GR
01DC..01DD ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS AND GRAV
01DE ; DISALLOWED # LATIN CAPITAL LETTER A WITH DIAERESIS AND MA
01DF ; PVALID # LATIN SMALL LETTER A WITH DIAERESIS AND MACR
01E0 ; DISALLOWED # LATIN CAPITAL LETTER A WITH DOT ABOVE AND MA
01E1 ; PVALID # LATIN SMALL LETTER A WITH DOT ABOVE AND MACR
01E2 ; DISALLOWED # LATIN CAPITAL LETTER AE WITH MACRON
01E3 ; PVALID # LATIN SMALL LETTER AE WITH MACRON
01E4 ; DISALLOWED # LATIN CAPITAL LETTER G WITH STROKE
01E5 ; PVALID # LATIN SMALL LETTER G WITH STROKE
01E6 ; DISALLOWED # LATIN CAPITAL LETTER G WITH CARON
01E7 ; PVALID # LATIN SMALL LETTER G WITH CARON
01E8 ; DISALLOWED # LATIN CAPITAL LETTER K WITH CARON
01E9 ; PVALID # LATIN SMALL LETTER K WITH CARON
01EA ; DISALLOWED # LATIN CAPITAL LETTER O WITH OGONEK
01EB ; PVALID # LATIN SMALL LETTER O WITH OGONEK
01EC ; DISALLOWED # LATIN CAPITAL LETTER O WITH OGONEK AND MACRO
01ED ; PVALID # LATIN SMALL LETTER O WITH OGONEK AND MACRON
01EE ; DISALLOWED # LATIN CAPITAL LETTER EZH WITH CARON
01EF..01F0 ; PVALID # LATIN SMALL LETTER EZH WITH CARON..LATIN SMA
01F1..01F4 ; DISALLOWED # LATIN CAPITAL LETTER DZ..LATIN CAPITAL LETTE
01F5 ; PVALID # LATIN SMALL LETTER G WITH ACUTE
01F6..01F8 ; DISALLOWED # LATIN CAPITAL LETTER HWAIR..LATIN CAPITAL LE
01F9 ; PVALID # LATIN SMALL LETTER N WITH GRAVE
01FA ; DISALLOWED # LATIN CAPITAL LETTER A WITH RING ABOVE AND A
01FB ; PVALID # LATIN SMALL LETTER A WITH RING ABOVE AND ACU
01FC ; DISALLOWED # LATIN CAPITAL LETTER AE WITH ACUTE
01FD ; PVALID # LATIN SMALL LETTER AE WITH ACUTE
01FE ; DISALLOWED # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01FF ; PVALID # LATIN SMALL LETTER O WITH STROKE AND ACUTE
0200 ; DISALLOWED # LATIN CAPITAL LETTER A WITH DOUBLE GRAVE
0201 ; PVALID # LATIN SMALL LETTER A WITH DOUBLE GRAVE
0202 ; DISALLOWED # LATIN CAPITAL LETTER A WITH INVERTED BREVE
0203 ; PVALID # LATIN SMALL LETTER A WITH INVERTED BREVE
0204 ; DISALLOWED # LATIN CAPITAL LETTER E WITH DOUBLE GRAVE
0205 ; PVALID # LATIN SMALL LETTER E WITH DOUBLE GRAVE
0206 ; DISALLOWED # LATIN CAPITAL LETTER E WITH INVERTED BREVE
0207 ; PVALID # LATIN SMALL LETTER E WITH INVERTED BREVE
0208 ; DISALLOWED # LATIN CAPITAL LETTER I WITH DOUBLE GRAVE
0209 ; PVALID # LATIN SMALL LETTER I WITH DOUBLE GRAVE
020A ; DISALLOWED # LATIN CAPITAL LETTER I WITH INVERTED BREVE
020B ; PVALID # LATIN SMALL LETTER I WITH INVERTED BREVE
020C ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOUBLE GRAVE
020D ; PVALID # LATIN SMALL LETTER O WITH DOUBLE GRAVE
020E ; DISALLOWED # LATIN CAPITAL LETTER O WITH INVERTED BREVE
020F ; PVALID # LATIN SMALL LETTER O WITH INVERTED BREVE
0210 ; DISALLOWED # LATIN CAPITAL LETTER R WITH DOUBLE GRAVE
0211 ; PVALID # LATIN SMALL LETTER R WITH DOUBLE GRAVE
0212 ; DISALLOWED # LATIN CAPITAL LETTER R WITH INVERTED BREVE
0213 ; PVALID # LATIN SMALL LETTER R WITH INVERTED BREVE
0214 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DOUBLE GRAVE
0215 ; PVALID # LATIN SMALL LETTER U WITH DOUBLE GRAVE
0216 ; DISALLOWED # LATIN CAPITAL LETTER U WITH INVERTED BREVE
0217 ; PVALID # LATIN SMALL LETTER U WITH INVERTED BREVE
0218 ; DISALLOWED # LATIN CAPITAL LETTER S WITH COMMA BELOW
0219 ; PVALID # LATIN SMALL LETTER S WITH COMMA BELOW
021A ; DISALLOWED # LATIN CAPITAL LETTER T WITH COMMA BELOW
021B ; PVALID # LATIN SMALL LETTER T WITH COMMA BELOW
021C ; DISALLOWED # LATIN CAPITAL LETTER YOGH
021D ; PVALID # LATIN SMALL LETTER YOGH
021E ; DISALLOWED # LATIN CAPITAL LETTER H WITH CARON
021F ; PVALID # LATIN SMALL LETTER H WITH CARON
0220 ; DISALLOWED # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
0221 ; PVALID # LATIN SMALL LETTER D WITH CURL
0222 ; DISALLOWED # LATIN CAPITAL LETTER OU
0223 ; PVALID # LATIN SMALL LETTER OU
0224 ; DISALLOWED # LATIN CAPITAL LETTER Z WITH HOOK
0225 ; PVALID # LATIN SMALL LETTER Z WITH HOOK
0226 ; DISALLOWED # LATIN CAPITAL LETTER A WITH DOT ABOVE
0227 ; PVALID # LATIN SMALL LETTER A WITH DOT ABOVE
0228 ; DISALLOWED # LATIN CAPITAL LETTER E WITH CEDILLA
0229 ; PVALID # LATIN SMALL LETTER E WITH CEDILLA
022A ; DISALLOWED # LATIN CAPITAL LETTER O WITH DIAERESIS AND MA
022B ; PVALID # LATIN SMALL LETTER O WITH DIAERESIS AND MACR
022C ; DISALLOWED # LATIN CAPITAL LETTER O WITH TILDE AND MACRON
022D ; PVALID # LATIN SMALL LETTER O WITH TILDE AND MACRON
022E ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOT ABOVE
022F ; PVALID # LATIN SMALL LETTER O WITH DOT ABOVE
0230 ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOT ABOVE AND MA
0231 ; PVALID # LATIN SMALL LETTER O WITH DOT ABOVE AND MACR
0232 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH MACRON
0233..0239 ; PVALID # LATIN SMALL LETTER Y WITH MACRON..LATIN SMAL
023A..023B ; DISALLOWED # LATIN CAPITAL LETTER A WITH STROKE..LATIN CA
023C ; PVALID # LATIN SMALL LETTER C WITH STROKE
023D..023E ; DISALLOWED # LATIN CAPITAL LETTER L WITH BAR..LATIN CAPIT
023F..0240 ; PVALID # LATIN SMALL LETTER S WITH SWASH TAIL..LATIN
0241 ; DISALLOWED # LATIN CAPITAL LETTER GLOTTAL STOP
0242 ; PVALID # LATIN SMALL LETTER GLOTTAL STOP
0243..0246 ; DISALLOWED # LATIN CAPITAL LETTER B WITH STROKE..LATIN CA
0247 ; PVALID # LATIN SMALL LETTER E WITH STROKE
0248 ; DISALLOWED # LATIN CAPITAL LETTER J WITH STROKE
0249 ; PVALID # LATIN SMALL LETTER J WITH STROKE
024A ; DISALLOWED # LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL
024B ; PVALID # LATIN SMALL LETTER Q WITH HOOK TAIL
024C ; DISALLOWED # LATIN CAPITAL LETTER R WITH STROKE
024D ; PVALID # LATIN SMALL LETTER R WITH STROKE
024E ; DISALLOWED # LATIN CAPITAL LETTER Y WITH STROKE
024F..02AF ; PVALID # LATIN SMALL LETTER Y WITH STROKE..LATIN SMAL
02B0..02B8 ; DISALLOWED # MODIFIER LETTER SMALL H..MODIFIER LETTER SMA
02B9..02C1 ; PVALID # MODIFIER LETTER PRIME..MODIFIER LETTER REVER
02C2..02C5 ; DISALLOWED # MODIFIER LETTER LEFT ARROWHEAD..MODIFIER LET
02C6..02D1 ; PVALID # MODIFIER LETTER CIRCUMFLEX ACCENT..MODIFIER
02D2..02EB ; DISALLOWED # MODIFIER LETTER CENTRED RIGHT HALF RING..MOD
02EC ; PVALID # MODIFIER LETTER VOICING
02ED ; DISALLOWED # MODIFIER LETTER UNASPIRATED
02EE ; PVALID # MODIFIER LETTER DOUBLE APOSTROPHE
02EF..02FF ; DISALLOWED # MODIFIER LETTER LOW DOWN ARROWHEAD..MODIFIER
0300..033F ; PVALID # COMBINING GRAVE ACCENT..COMBINING DOUBLE OVE
0340..0341 ; DISALLOWED # COMBINING GRAVE TONE MARK..COMBINING ACUTE T
0342 ; PVALID # COMBINING GREEK PERISPOMENI
0343..0345 ; DISALLOWED # COMBINING GREEK KORONIS..COMBINING GREEK YPO
0346..034E ; PVALID # COMBINING BRIDGE ABOVE..COMBINING UPWARDS AR
034F ; DISALLOWED # COMBINING GRAPHEME JOINER
0350..036F ; PVALID # COMBINING RIGHT ARROWHEAD ABOVE..COMBINING L
0370 ; DISALLOWED # GREEK CAPITAL LETTER HETA
0371 ; PVALID # GREEK SMALL LETTER HETA
0372 ; DISALLOWED # GREEK CAPITAL LETTER ARCHAIC SAMPI
0373 ; PVALID # GREEK SMALL LETTER ARCHAIC SAMPI
0374 ; DISALLOWED # GREEK NUMERAL SIGN
0375 ; CONTEXTO # GREEK LOWER NUMERAL SIGN
0376 ; DISALLOWED # GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA
0377 ; PVALID # GREEK SMALL LETTER PAMPHYLIAN DIGAMMA
0378..0379 ; UNASSIGNED # ..
037A ; DISALLOWED # GREEK YPOGEGRAMMENI
037B..037D ; PVALID # GREEK SMALL REVERSED LUNATE SIGMA SYMBOL..GR
037E ; DISALLOWED # GREEK QUESTION MARK
037F..0383 ; UNASSIGNED # ..
0384..038A ; DISALLOWED # GREEK TONOS..GREEK CAPITAL LETTER IOTA WITH
038B ; UNASSIGNED #
038C ; DISALLOWED # GREEK CAPITAL LETTER OMICRON WITH TONOS
038D ; UNASSIGNED #
038E..038F ; DISALLOWED # GREEK CAPITAL LETTER UPSILON WITH TONOS..GRE
0390 ; PVALID # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND T
0391..03A1 ; DISALLOWED # GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LE
03A2 ; UNASSIGNED #
03A3..03AB ; DISALLOWED # GREEK CAPITAL LETTER SIGMA..GREEK CAPITAL LE
03AC..03CE ; PVALID # GREEK SMALL LETTER ALPHA WITH TONOS..GREEK S
03CF..03D6 ; DISALLOWED # GREEK CAPITAL KAI SYMBOL..GREEK PI SYMBOL
03D7 ; PVALID # GREEK KAI SYMBOL
03D8 ; DISALLOWED # GREEK LETTER ARCHAIC KOPPA
03D9 ; PVALID # GREEK SMALL LETTER ARCHAIC KOPPA
03DA ; DISALLOWED # GREEK LETTER STIGMA
03DB ; PVALID # GREEK SMALL LETTER STIGMA
03DC ; DISALLOWED # GREEK LETTER DIGAMMA
03DD ; PVALID # GREEK SMALL LETTER DIGAMMA
03DE ; DISALLOWED # GREEK LETTER KOPPA
03DF ; PVALID # GREEK SMALL LETTER KOPPA
03E0 ; DISALLOWED # GREEK LETTER SAMPI
03E1 ; PVALID # GREEK SMALL LETTER SAMPI
03E2 ; DISALLOWED # COPTIC CAPITAL LETTER SHEI
03E3 ; PVALID # COPTIC SMALL LETTER SHEI
03E4 ; DISALLOWED # COPTIC CAPITAL LETTER FEI
03E5 ; PVALID # COPTIC SMALL LETTER FEI
03E6 ; DISALLOWED # COPTIC CAPITAL LETTER KHEI
03E7 ; PVALID # COPTIC SMALL LETTER KHEI
03E8 ; DISALLOWED # COPTIC CAPITAL LETTER HORI
03E9 ; PVALID # COPTIC SMALL LETTER HORI
03EA ; DISALLOWED # COPTIC CAPITAL LETTER GANGIA
03EB ; PVALID # COPTIC SMALL LETTER GANGIA
03EC ; DISALLOWED # COPTIC CAPITAL LETTER SHIMA
03ED ; PVALID # COPTIC SMALL LETTER SHIMA
03EE ; DISALLOWED # COPTIC CAPITAL LETTER DEI
03EF ; PVALID # COPTIC SMALL LETTER DEI
03F0..03F2 ; DISALLOWED # GREEK KAPPA SYMBOL..GREEK LUNATE SIGMA SYMBO
03F3 ; PVALID # GREEK LETTER YOT
03F4..03F7 ; DISALLOWED # GREEK CAPITAL THETA SYMBOL..GREEK CAPITAL LE
03F8 ; PVALID # GREEK SMALL LETTER SHO
03F9..03FA ; DISALLOWED # GREEK CAPITAL LUNATE SIGMA SYMBOL..GREEK CAP
03FB..03FC ; PVALID # GREEK SMALL LETTER SAN..GREEK RHO WITH STROK
03FD..042F ; DISALLOWED # GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL..
0430..045F ; PVALID # CYRILLIC SMALL LETTER A..CYRILLIC SMALL LETT
0460 ; DISALLOWED # CYRILLIC CAPITAL LETTER OMEGA
0461 ; PVALID # CYRILLIC SMALL LETTER OMEGA
0462 ; DISALLOWED # CYRILLIC CAPITAL LETTER YAT
0463 ; PVALID # CYRILLIC SMALL LETTER YAT
0464 ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED E
0465 ; PVALID # CYRILLIC SMALL LETTER IOTIFIED E
0466 ; DISALLOWED # CYRILLIC CAPITAL LETTER LITTLE YUS
0467 ; PVALID # CYRILLIC SMALL LETTER LITTLE YUS
0468 ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS
0469 ; PVALID # CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS
046A ; DISALLOWED # CYRILLIC CAPITAL LETTER BIG YUS
046B ; PVALID # CYRILLIC SMALL LETTER BIG YUS
046C ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS
046D ; PVALID # CYRILLIC SMALL LETTER IOTIFIED BIG YUS
046E ; DISALLOWED # CYRILLIC CAPITAL LETTER KSI
046F ; PVALID # CYRILLIC SMALL LETTER KSI
0470 ; DISALLOWED # CYRILLIC CAPITAL LETTER PSI
0471 ; PVALID # CYRILLIC SMALL LETTER PSI
0472 ; DISALLOWED # CYRILLIC CAPITAL LETTER FITA
0473 ; PVALID # CYRILLIC SMALL LETTER FITA
0474 ; DISALLOWED # CYRILLIC CAPITAL LETTER IZHITSA
0475 ; PVALID # CYRILLIC SMALL LETTER IZHITSA
0476 ; DISALLOWED # CYRILLIC CAPITAL LETTER IZHITSA WITH DOUBLE
0477 ; PVALID # CYRILLIC SMALL LETTER IZHITSA WITH DOUBLE GR
0478 ; DISALLOWED # CYRILLIC CAPITAL LETTER UK
0479 ; PVALID # CYRILLIC SMALL LETTER UK
047A ; DISALLOWED # CYRILLIC CAPITAL LETTER ROUND OMEGA
047B ; PVALID # CYRILLIC SMALL LETTER ROUND OMEGA
047C ; DISALLOWED # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
047D ; PVALID # CYRILLIC SMALL LETTER OMEGA WITH TITLO
047E ; DISALLOWED # CYRILLIC CAPITAL LETTER OT
047F ; PVALID # CYRILLIC SMALL LETTER OT
0480 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOPPA
0481 ; PVALID # CYRILLIC SMALL LETTER KOPPA
0482 ; DISALLOWED # CYRILLIC THOUSANDS SIGN
0483..0487 ; PVALID # COMBINING CYRILLIC TITLO..COMBINING CYRILLIC
0488..048A ; DISALLOWED # COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..C
048B ; PVALID # CYRILLIC SMALL LETTER SHORT I WITH TAIL
048C ; DISALLOWED # CYRILLIC CAPITAL LETTER SEMISOFT SIGN
048D ; PVALID # CYRILLIC SMALL LETTER SEMISOFT SIGN
048E ; DISALLOWED # CYRILLIC CAPITAL LETTER ER WITH TICK
048F ; PVALID # CYRILLIC SMALL LETTER ER WITH TICK
0490 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
0491 ; PVALID # CYRILLIC SMALL LETTER GHE WITH UPTURN
0492 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH STROKE
0493 ; PVALID # CYRILLIC SMALL LETTER GHE WITH STROKE
0494 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
0495 ; PVALID # CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
0496 ; DISALLOWED # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
0497 ; PVALID # CYRILLIC SMALL LETTER ZHE WITH DESCENDER
0498 ; DISALLOWED # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER
0499 ; PVALID # CYRILLIC SMALL LETTER ZE WITH DESCENDER
049A ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH DESCENDER
049B ; PVALID # CYRILLIC SMALL LETTER KA WITH DESCENDER
049C ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STR
049D ; PVALID # CYRILLIC SMALL LETTER KA WITH VERTICAL STROK
049E ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH STROKE
049F ; PVALID # CYRILLIC SMALL LETTER KA WITH STROKE
04A0 ; DISALLOWED # CYRILLIC CAPITAL LETTER BASHKIR KA
04A1 ; PVALID # CYRILLIC SMALL LETTER BASHKIR KA
04A2 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH DESCENDER
04A3 ; PVALID # CYRILLIC SMALL LETTER EN WITH DESCENDER
04A4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE EN GHE
04A5 ; PVALID # CYRILLIC SMALL LIGATURE EN GHE
04A6 ; DISALLOWED # CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
04A7 ; PVALID # CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
04A8 ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN HA
04A9 ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN HA
04AA ; DISALLOWED # CYRILLIC CAPITAL LETTER ES WITH DESCENDER
04AB ; PVALID # CYRILLIC SMALL LETTER ES WITH DESCENDER
04AC ; DISALLOWED # CYRILLIC CAPITAL LETTER TE WITH DESCENDER
04AD ; PVALID # CYRILLIC SMALL LETTER TE WITH DESCENDER
04AE ; DISALLOWED # CYRILLIC CAPITAL LETTER STRAIGHT U
04AF ; PVALID # CYRILLIC SMALL LETTER STRAIGHT U
04B0 ; DISALLOWED # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STRO
04B1 ; PVALID # CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE
04B2 ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH DESCENDER
04B3 ; PVALID # CYRILLIC SMALL LETTER HA WITH DESCENDER
04B4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE TE TSE
04B5 ; PVALID # CYRILLIC SMALL LIGATURE TE TSE
04B6 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER
04B7 ; PVALID # CYRILLIC SMALL LETTER CHE WITH DESCENDER
04B8 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL ST
04B9 ; PVALID # CYRILLIC SMALL LETTER CHE WITH VERTICAL STRO
04BA ; DISALLOWED # CYRILLIC CAPITAL LETTER SHHA
04BB ; PVALID # CYRILLIC SMALL LETTER SHHA
04BC ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN CHE
04BD ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN CHE
04BE ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH D
04BF ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DES
04C0..04C1 ; DISALLOWED # CYRILLIC LETTER PALOCHKA..CYRILLIC CAPITAL L
04C2 ; PVALID # CYRILLIC SMALL LETTER ZHE WITH BREVE
04C3 ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH HOOK
04C4 ; PVALID # CYRILLIC SMALL LETTER KA WITH HOOK
04C5 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH TAIL
04C6 ; PVALID # CYRILLIC SMALL LETTER EL WITH TAIL
04C7 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH HOOK
04C8 ; PVALID # CYRILLIC SMALL LETTER EN WITH HOOK
04C9 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH TAIL
04CA ; PVALID # CYRILLIC SMALL LETTER EN WITH TAIL
04CB ; DISALLOWED # CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
04CC ; PVALID # CYRILLIC SMALL LETTER KHAKASSIAN CHE
04CD ; DISALLOWED # CYRILLIC CAPITAL LETTER EM WITH TAIL
04CE..04CF ; PVALID # CYRILLIC SMALL LETTER EM WITH TAIL..CYRILLIC
04D0 ; DISALLOWED # CYRILLIC CAPITAL LETTER A WITH BREVE
04D1 ; PVALID # CYRILLIC SMALL LETTER A WITH BREVE
04D2 ; DISALLOWED # CYRILLIC CAPITAL LETTER A WITH DIAERESIS
04D3 ; PVALID # CYRILLIC SMALL LETTER A WITH DIAERESIS
04D4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE A IE
04D5 ; PVALID # CYRILLIC SMALL LIGATURE A IE
04D6 ; DISALLOWED # CYRILLIC CAPITAL LETTER IE WITH BREVE
04D7 ; PVALID # CYRILLIC SMALL LETTER IE WITH BREVE
04D8 ; DISALLOWED # CYRILLIC CAPITAL LETTER SCHWA
04D9 ; PVALID # CYRILLIC SMALL LETTER SCHWA
04DA ; DISALLOWED # CYRILLIC CAPITAL LETTER SCHWA WITH DIAERESIS
04DB ; PVALID # CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS
04DC ; DISALLOWED # CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS
04DD ; PVALID # CYRILLIC SMALL LETTER ZHE WITH DIAERESIS
04DE ; DISALLOWED # CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS
04DF ; PVALID # CYRILLIC SMALL LETTER ZE WITH DIAERESIS
04E0 ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN DZE
04E1 ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN DZE
04E2 ; DISALLOWED # CYRILLIC CAPITAL LETTER I WITH MACRON
04E3 ; PVALID # CYRILLIC SMALL LETTER I WITH MACRON
04E4 ; DISALLOWED # CYRILLIC CAPITAL LETTER I WITH DIAERESIS
04E5 ; PVALID # CYRILLIC SMALL LETTER I WITH DIAERESIS
04E6 ; DISALLOWED # CYRILLIC CAPITAL LETTER O WITH DIAERESIS
04E7 ; PVALID # CYRILLIC SMALL LETTER O WITH DIAERESIS
04E8 ; DISALLOWED # CYRILLIC CAPITAL LETTER BARRED O
04E9 ; PVALID # CYRILLIC SMALL LETTER BARRED O
04EA ; DISALLOWED # CYRILLIC CAPITAL LETTER BARRED O WITH DIAERE
04EB ; PVALID # CYRILLIC SMALL LETTER BARRED O WITH DIAERESI
04EC ; DISALLOWED # CYRILLIC CAPITAL LETTER E WITH DIAERESIS
04ED ; PVALID # CYRILLIC SMALL LETTER E WITH DIAERESIS
04EE ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH MACRON
04EF ; PVALID # CYRILLIC SMALL LETTER U WITH MACRON
04F0 ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH DIAERESIS
04F1 ; PVALID # CYRILLIC SMALL LETTER U WITH DIAERESIS
04F2 ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE
04F3 ; PVALID # CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE
04F4 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS
04F5 ; PVALID # CYRILLIC SMALL LETTER CHE WITH DIAERESIS
04F6 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH DESCENDER
04F7 ; PVALID # CYRILLIC SMALL LETTER GHE WITH DESCENDER
04F8 ; DISALLOWED # CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
04F9 ; PVALID # CYRILLIC SMALL LETTER YERU WITH DIAERESIS
04FA ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH STROKE AND
04FB ; PVALID # CYRILLIC SMALL LETTER GHE WITH STROKE AND HO
04FC ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH HOOK
04FD ; PVALID # CYRILLIC SMALL LETTER HA WITH HOOK
04FE ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH STROKE
04FF ; PVALID # CYRILLIC SMALL LETTER HA WITH STROKE
0500 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DE
0501 ; PVALID # CYRILLIC SMALL LETTER KOMI DE
0502 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DJE
0503 ; PVALID # CYRILLIC SMALL LETTER KOMI DJE
0504 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI ZJE
0505 ; PVALID # CYRILLIC SMALL LETTER KOMI ZJE
0506 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DZJE
0507 ; PVALID # CYRILLIC SMALL LETTER KOMI DZJE
0508 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI LJE
0509 ; PVALID # CYRILLIC SMALL LETTER KOMI LJE
050A ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI NJE
050B ; PVALID # CYRILLIC SMALL LETTER KOMI NJE
050C ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI SJE
050D ; PVALID # CYRILLIC SMALL LETTER KOMI SJE
050E ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI TJE
050F ; PVALID # CYRILLIC SMALL LETTER KOMI TJE
0510 ; DISALLOWED # CYRILLIC CAPITAL LETTER REVERSED ZE
0511 ; PVALID # CYRILLIC SMALL LETTER REVERSED ZE
0512 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH HOOK
0513 ; PVALID # CYRILLIC SMALL LETTER EL WITH HOOK
0514 ; DISALLOWED # CYRILLIC CAPITAL LETTER LHA
0515 ; PVALID # CYRILLIC SMALL LETTER LHA
0516 ; DISALLOWED # CYRILLIC CAPITAL LETTER RHA
0517 ; PVALID # CYRILLIC SMALL LETTER RHA
0518 ; DISALLOWED # CYRILLIC CAPITAL LETTER YAE
0519 ; PVALID # CYRILLIC SMALL LETTER YAE
051A ; DISALLOWED # CYRILLIC CAPITAL LETTER QA
051B ; PVALID # CYRILLIC SMALL LETTER QA
051C ; DISALLOWED # CYRILLIC CAPITAL LETTER WE
051D ; PVALID # CYRILLIC SMALL LETTER WE
051E ; DISALLOWED # CYRILLIC CAPITAL LETTER ALEUT KA
051F ; PVALID # CYRILLIC SMALL LETTER ALEUT KA
0520 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH MIDDLE HOOK
0521 ; PVALID # CYRILLIC SMALL LETTER EL WITH MIDDLE HOOK
0522 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH MIDDLE HOOK
0523 ; PVALID # CYRILLIC SMALL LETTER EN WITH MIDDLE HOOK
0524 ; DISALLOWED # CYRILLIC CAPITAL LETTER PE WITH DESCENDER
0525 ; PVALID # CYRILLIC SMALL LETTER PE WITH DESCENDER
0526..0530 ; UNASSIGNED # ..
0531..0556 ; DISALLOWED # ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITA
0557..0558 ; UNASSIGNED # ..
0559 ; PVALID # ARMENIAN MODIFIER LETTER LEFT HALF RING
055A..055F ; DISALLOWED # ARMENIAN APOSTROPHE..ARMENIAN ABBREVIATION M
0560 ; UNASSIGNED #
0561..0586 ; PVALID # ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LE
0587 ; DISALLOWED # ARMENIAN SMALL LIGATURE ECH YIWN
0588 ; UNASSIGNED #
0589..058A ; DISALLOWED # ARMENIAN FULL STOP..ARMENIAN HYPHEN
058B..0590 ; UNASSIGNED # ..
0591..05BD ; PVALID # HEBREW ACCENT ETNAHTA..HEBREW POINT METEG
05BE ; DISALLOWED # HEBREW PUNCTUATION MAQAF
05BF ; PVALID # HEBREW POINT RAFE
05C0 ; DISALLOWED # HEBREW PUNCTUATION PASEQ
05C1..05C2 ; PVALID # HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
05C3 ; DISALLOWED # HEBREW PUNCTUATION SOF PASUQ
05C4..05C5 ; PVALID # HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT
05C6 ; DISALLOWED # HEBREW PUNCTUATION NUN HAFUKHA
05C7 ; PVALID # HEBREW POINT QAMATS QATAN
05C8..05CF ; UNASSIGNED # ..
05D0..05EA ; PVALID # HEBREW LETTER ALEF..HEBREW LETTER TAV
05EB..05EF ; UNASSIGNED # ..
05F0..05F2 ; PVALID # HEBREW LIGATURE YIDDISH DOUBLE VAV..HEBREW L
05F3..05F4 ; CONTEXTO # HEBREW PUNCTUATION GERESH..HEBREW PUNCTUATIO
05F5..05FF ; UNASSIGNED # ..
0600..0603 ; DISALLOWED # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
0604..0605 ; UNASSIGNED # ..
0606..060F ; DISALLOWED # ARABIC-INDIC CUBE ROOT..ARABIC SIGN MISRA
0610..061A ; PVALID # ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..AR
061B ; DISALLOWED # ARABIC SEMICOLON
061C..061D ; UNASSIGNED # ..
061E..061F ; DISALLOWED # ARABIC TRIPLE DOT PUNCTUATION MARK..ARABIC Q
0620 ; UNASSIGNED #
0621..063F ; PVALID # ARABIC LETTER HAMZA..ARABIC LETTER FARSI YEH
0640 ; DISALLOWED # ARABIC TATWEEL
0641..065E ; PVALID # ARABIC LETTER FEH..ARABIC FATHA WITH TWO DOT
065F ; UNASSIGNED #
0660..0669 ; CONTEXTO # ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT
066A..066D ; DISALLOWED # ARABIC PERCENT SIGN..ARABIC FIVE POINTED STA
066E..0674 ; PVALID # ARABIC LETTER DOTLESS BEH..ARABIC LETTER HIG
0675..0678 ; DISALLOWED # ARABIC LETTER HIGH HAMZA ALEF..ARABIC LETTER
0679..06D3 ; PVALID # ARABIC LETTER TTEH..ARABIC LETTER YEH BARREE
06D4 ; DISALLOWED # ARABIC FULL STOP
06D5..06DC ; PVALID # ARABIC LETTER AE..ARABIC SMALL HIGH SEEN
06DD..06DE ; DISALLOWED # ARABIC END OF AYAH..ARABIC START OF RUB EL H
06DF..06E8 ; PVALID # ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL
06E9 ; DISALLOWED # ARABIC PLACE OF SAJDAH
06EA..06EF ; PVALID # ARABIC EMPTY CENTRE LOW STOP..ARABIC LETTER
06F0..06F9 ; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED A
06FA..06FF ; PVALID # ARABIC LETTER SHEEN WITH DOT BELOW..ARABIC L
0700..070D ; DISALLOWED # SYRIAC END OF PARAGRAPH..SYRIAC HARKLEAN AST
070E ; UNASSIGNED #
070F ; DISALLOWED # SYRIAC ABBREVIATION MARK
0710..074A ; PVALID # SYRIAC LETTER ALAPH..SYRIAC BARREKH
074B..074C ; UNASSIGNED # ..
074D..07B1 ; PVALID # SYRIAC LETTER SOGDIAN ZHAIN..THAANA LETTER N
07B2..07BF ; UNASSIGNED # ..
07C0..07F5 ; PVALID # NKO DIGIT ZERO..NKO LOW TONE APOSTROPHE
07F6..07FA ; DISALLOWED # NKO SYMBOL OO DENNEN..NKO LAJANYALAN
07FB..07FF ; UNASSIGNED # ..
0800..082D ; PVALID # SAMARITAN LETTER ALAF..SAMARITAN MARK NEQUDA
082E..082F ; UNASSIGNED # ..
0830..083E ; DISALLOWED # SAMARITAN PUNCTUATION NEQUDAA..SAMARITAN PUN
083F..08FF ; UNASSIGNED # ..
0900..0939 ; PVALID # DEVANAGARI SIGN INVERTED CANDRABINDU..DEVANA
093A..093B ; UNASSIGNED # ..
093C..094E ; PVALID # DEVANAGARI SIGN NUKTA..DEVANAGARI VOWEL SIGN
094F ; UNASSIGNED #
0950..0955 ; PVALID # DEVANAGARI OM..DEVANAGARI VOWEL SIGN CANDRA
0956..0957 ; UNASSIGNED # ..
0958..095F ; DISALLOWED # DEVANAGARI LETTER QA..DEVANAGARI LETTER YYA
0960..0963 ; PVALID # DEVANAGARI LETTER VOCALIC RR..DEVANAGARI VOW
0964..0965 ; DISALLOWED # DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
0966..096F ; PVALID # DEVANAGARI DIGIT ZERO..DEVANAGARI DIGIT NINE
0970 ; DISALLOWED # DEVANAGARI ABBREVIATION SIGN
0971..0972 ; PVALID # DEVANAGARI SIGN HIGH SPACING DOT..DEVANAGARI
0973..0978 ; UNASSIGNED # ..
0979..097F ; PVALID # DEVANAGARI LETTER ZHA..DEVANAGARI LETTER BBA
0980 ; UNASSIGNED #
0981..0983 ; PVALID # BENGALI SIGN CANDRABINDU..BENGALI SIGN VISAR
0984 ; UNASSIGNED #
0985..098C ; PVALID # BENGALI LETTER A..BENGALI LETTER VOCALIC L
098D..098E ; UNASSIGNED # ..
098F..0990 ; PVALID # BENGALI LETTER E..BENGALI LETTER AI
0991..0992 ; UNASSIGNED # ..
0993..09A8 ; PVALID # BENGALI LETTER O..BENGALI LETTER NA
09A9 ; UNASSIGNED #
09AA..09B0 ; PVALID # BENGALI LETTER PA..BENGALI LETTER RA
09B1 ; UNASSIGNED #
09B2 ; PVALID # BENGALI LETTER LA
09B3..09B5 ; UNASSIGNED # ..
09B6..09B9 ; PVALID # BENGALI LETTER SHA..BENGALI LETTER HA
09BA..09BB ; UNASSIGNED # ..
09BC..09C4 ; PVALID # BENGALI SIGN NUKTA..BENGALI VOWEL SIGN VOCAL
09C5..09C6 ; UNASSIGNED # ..
09C7..09C8 ; PVALID # BENGALI VOWEL SIGN E..BENGALI VOWEL SIGN AI
09C9..09CA ; UNASSIGNED # ..
09CB..09CE ; PVALID # BENGALI VOWEL SIGN O..BENGALI LETTER KHANDA
09CF..09D6 ; UNASSIGNED # ..
09D7 ; PVALID # BENGALI AU LENGTH MARK
09D8..09DB ; UNASSIGNED # ..
09DC..09DD ; DISALLOWED # BENGALI LETTER RRA..BENGALI LETTER RHA
09DE ; UNASSIGNED #
09DF ; DISALLOWED # BENGALI LETTER YYA
09E0..09E3 ; PVALID # BENGALI LETTER VOCALIC RR..BENGALI VOWEL SIG
09E4..09E5 ; UNASSIGNED # ..
09E6..09F1 ; PVALID # BENGALI DIGIT ZERO..BENGALI LETTER RA WITH L
09F2..09FB ; DISALLOWED # BENGALI RUPEE MARK..BENGALI GANDA MARK
09FC..0A00 ; UNASSIGNED # ..
0A01..0A03 ; PVALID # GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN VISA
0A04 ; UNASSIGNED #
0A05..0A0A ; PVALID # GURMUKHI LETTER A..GURMUKHI LETTER UU
0A0B..0A0E ; UNASSIGNED # ..
0A0F..0A10 ; PVALID # GURMUKHI LETTER EE..GURMUKHI LETTER AI
0A11..0A12 ; UNASSIGNED # ..
0A13..0A28 ; PVALID # GURMUKHI LETTER OO..GURMUKHI LETTER NA
0A29 ; UNASSIGNED #
0A2A..0A30 ; PVALID # GURMUKHI LETTER PA..GURMUKHI LETTER RA
0A31 ; UNASSIGNED #
0A32 ; PVALID # GURMUKHI LETTER LA
0A33 ; DISALLOWED # GURMUKHI LETTER LLA
0A34 ; UNASSIGNED #
0A35 ; PVALID # GURMUKHI LETTER VA
0A36 ; DISALLOWED # GURMUKHI LETTER SHA
0A37 ; UNASSIGNED #
0A38..0A39 ; PVALID # GURMUKHI LETTER SA..GURMUKHI LETTER HA
0A3A..0A3B ; UNASSIGNED # ..
0A3C ; PVALID # GURMUKHI SIGN NUKTA
0A3D ; UNASSIGNED #
0A3E..0A42 ; PVALID # GURMUKHI VOWEL SIGN AA..GURMUKHI VOWEL SIGN
0A43..0A46 ; UNASSIGNED # ..
0A47..0A48 ; PVALID # GURMUKHI VOWEL SIGN EE..GURMUKHI VOWEL SIGN
0A49..0A4A ; UNASSIGNED # ..
0A4B..0A4D ; PVALID # GURMUKHI VOWEL SIGN OO..GURMUKHI SIGN VIRAMA
0A4E..0A50 ; UNASSIGNED # ..
0A51 ; PVALID # GURMUKHI SIGN UDAAT
0A52..0A58 ; UNASSIGNED # ..
0A59..0A5B ; DISALLOWED # GURMUKHI LETTER KHHA..GURMUKHI LETTER ZA
0A5C ; PVALID # GURMUKHI LETTER RRA
0A5D ; UNASSIGNED #
0A5E ; DISALLOWED # GURMUKHI LETTER FA
0A5F..0A65 ; UNASSIGNED # ..
0A66..0A75 ; PVALID # GURMUKHI DIGIT ZERO..GURMUKHI SIGN YAKASH
0A76..0A80 ; UNASSIGNED # ..
0A81..0A83 ; PVALID # GUJARATI SIGN CANDRABINDU..GUJARATI SIGN VIS
0A84 ; UNASSIGNED #
0A85..0A8D ; PVALID # GUJARATI LETTER A..GUJARATI VOWEL CANDRA E
0A8E ; UNASSIGNED #
0A8F..0A91 ; PVALID # GUJARATI LETTER E..GUJARATI VOWEL CANDRA O
0A92 ; UNASSIGNED #
0A93..0AA8 ; PVALID # GUJARATI LETTER O..GUJARATI LETTER NA
0AA9 ; UNASSIGNED #
0AAA..0AB0 ; PVALID # GUJARATI LETTER PA..GUJARATI LETTER RA
0AB1 ; UNASSIGNED #
0AB2..0AB3 ; PVALID # GUJARATI LETTER LA..GUJARATI LETTER LLA
0AB4 ; UNASSIGNED #
0AB5..0AB9 ; PVALID # GUJARATI LETTER VA..GUJARATI LETTER HA
0ABA..0ABB ; UNASSIGNED # ..
0ABC..0AC5 ; PVALID # GUJARATI SIGN NUKTA..GUJARATI VOWEL SIGN CAN
0AC6 ; UNASSIGNED #
0AC7..0AC9 ; PVALID # GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN C
0ACA ; UNASSIGNED #
0ACB..0ACD ; PVALID # GUJARATI VOWEL SIGN O..GUJARATI SIGN VIRAMA
0ACE..0ACF ; UNASSIGNED # ..
0AD0 ; PVALID # GUJARATI OM
0AD1..0ADF ; UNASSIGNED # ..
0AE0..0AE3 ; PVALID # GUJARATI LETTER VOCALIC RR..GUJARATI VOWEL S
0AE4..0AE5 ; UNASSIGNED # ..
0AE6..0AEF ; PVALID # GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE
0AF0 ; UNASSIGNED #
0AF1 ; DISALLOWED # GUJARATI RUPEE SIGN
0AF2..0B00 ; UNASSIGNED # ..
0B01..0B03 ; PVALID # ORIYA SIGN CANDRABINDU..ORIYA SIGN VISARGA
0B04 ; UNASSIGNED #
0B05..0B0C ; PVALID # ORIYA LETTER A..ORIYA LETTER VOCALIC L
0B0D..0B0E ; UNASSIGNED # ..
0B0F..0B10 ; PVALID # ORIYA LETTER E..ORIYA LETTER AI
0B11..0B12 ; UNASSIGNED # ..
0B13..0B28 ; PVALID # ORIYA LETTER O..ORIYA LETTER NA
0B29 ; UNASSIGNED #
0B2A..0B30 ; PVALID # ORIYA LETTER PA..ORIYA LETTER RA
0B31 ; UNASSIGNED #
0B32..0B33 ; PVALID # ORIYA LETTER LA..ORIYA LETTER LLA
0B34 ; UNASSIGNED #
0B35..0B39 ; PVALID # ORIYA LETTER VA..ORIYA LETTER HA
0B3A..0B3B ; UNASSIGNED # ..
0B3C..0B44 ; PVALID # ORIYA SIGN NUKTA..ORIYA VOWEL SIGN VOCALIC R
0B45..0B46 ; UNASSIGNED # ..
0B47..0B48 ; PVALID # ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI
0B49..0B4A ; UNASSIGNED # ..
0B4B..0B4D ; PVALID # ORIYA VOWEL SIGN O..ORIYA SIGN VIRAMA
0B4E..0B55 ; UNASSIGNED # ..
0B56..0B57 ; PVALID # ORIYA AI LENGTH MARK..ORIYA AU LENGTH MARK
0B58..0B5B ; UNASSIGNED # ..
0B5C..0B5D ; DISALLOWED # ORIYA LETTER RRA..ORIYA LETTER RHA
0B5E ; UNASSIGNED #
0B5F..0B63 ; PVALID # ORIYA LETTER YYA..ORIYA VOWEL SIGN VOCALIC L
0B64..0B65 ; UNASSIGNED # ..
0B66..0B6F ; PVALID # ORIYA DIGIT ZERO..ORIYA DIGIT NINE
0B70 ; DISALLOWED # ORIYA ISSHAR
0B71 ; PVALID # ORIYA LETTER WA
0B72..0B81 ; UNASSIGNED # ..
0B82..0B83 ; PVALID # TAMIL SIGN ANUSVARA..TAMIL SIGN VISARGA
0B84 ; UNASSIGNED #
0B85..0B8A ; PVALID # TAMIL LETTER A..TAMIL LETTER UU
0B8B..0B8D ; UNASSIGNED # ..
0B8E..0B90 ; PVALID # TAMIL LETTER E..TAMIL LETTER AI
0B91 ; UNASSIGNED #
0B92..0B95 ; PVALID # TAMIL LETTER O..TAMIL LETTER KA
0B96..0B98 ; UNASSIGNED # ..
0B99..0B9A ; PVALID # TAMIL LETTER NGA..TAMIL LETTER CA
0B9B ; UNASSIGNED #
0B9C ; PVALID # TAMIL LETTER JA
0B9D ; UNASSIGNED #
0B9E..0B9F ; PVALID # TAMIL LETTER NYA..TAMIL LETTER TTA
0BA0..0BA2 ; UNASSIGNED # ..
0BA3..0BA4 ; PVALID # TAMIL LETTER NNA..TAMIL LETTER TA
0BA5..0BA7 ; UNASSIGNED # ..
0BA8..0BAA ; PVALID # TAMIL LETTER NA..TAMIL LETTER PA
0BAB..0BAD ; UNASSIGNED # ..
0BAE..0BB9 ; PVALID # TAMIL LETTER MA..TAMIL LETTER HA
0BBA..0BBD ; UNASSIGNED # ..
0BBE..0BC2 ; PVALID # TAMIL VOWEL SIGN AA..TAMIL VOWEL SIGN UU
0BC3..0BC5 ; UNASSIGNED # ..
0BC6..0BC8 ; PVALID # TAMIL VOWEL SIGN E..TAMIL VOWEL SIGN AI
0BC9 ; UNASSIGNED #
0BCA..0BCD ; PVALID # TAMIL VOWEL SIGN O..TAMIL SIGN VIRAMA
0BCE..0BCF ; UNASSIGNED # ..
0BD0 ; PVALID # TAMIL OM
0BD1..0BD6 ; UNASSIGNED # ..
0BD7 ; PVALID # TAMIL AU LENGTH MARK
0BD8..0BE5 ; UNASSIGNED # ..
0BE6..0BEF ; PVALID # TAMIL DIGIT ZERO..TAMIL DIGIT NINE
0BF0..0BFA ; DISALLOWED # TAMIL NUMBER TEN..TAMIL NUMBER SIGN
0BFB..0C00 ; UNASSIGNED # ..
0C01..0C03 ; PVALID # TELUGU SIGN CANDRABINDU..TELUGU SIGN VISARGA
0C04 ; UNASSIGNED #
0C05..0C0C ; PVALID # TELUGU LETTER A..TELUGU LETTER VOCALIC L
0C0D ; UNASSIGNED #
0C0E..0C10 ; PVALID # TELUGU LETTER E..TELUGU LETTER AI
0C11 ; UNASSIGNED #
0C12..0C28 ; PVALID # TELUGU LETTER O..TELUGU LETTER NA
0C29 ; UNASSIGNED #
0C2A..0C33 ; PVALID # TELUGU LETTER PA..TELUGU LETTER LLA
0C34 ; UNASSIGNED #
0C35..0C39 ; PVALID # TELUGU LETTER VA..TELUGU LETTER HA
0C3A..0C3C ; UNASSIGNED # ..
0C3D..0C44 ; PVALID # TELUGU SIGN AVAGRAHA..TELUGU VOWEL SIGN VOCA
0C45 ; UNASSIGNED #
0C46..0C48 ; PVALID # TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
0C49 ; UNASSIGNED #
0C4A..0C4D ; PVALID # TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
0C4E..0C54 ; UNASSIGNED # ..
0C55..0C56 ; PVALID # TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
0C57 ; UNASSIGNED #
0C58..0C59 ; PVALID # TELUGU LETTER TSA..TELUGU LETTER DZA
0C5A..0C5F ; UNASSIGNED # ..
0C60..0C63 ; PVALID # TELUGU LETTER VOCALIC RR..TELUGU VOWEL SIGN
0C64..0C65 ; UNASSIGNED # ..
0C66..0C6F ; PVALID # TELUGU DIGIT ZERO..TELUGU DIGIT NINE
0C70..0C77 ; UNASSIGNED # ..
0C78..0C7F ; DISALLOWED # TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF
0C80..0C81 ; UNASSIGNED # ..
0C82..0C83 ; PVALID # KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA
0C84 ; UNASSIGNED #
0C85..0C8C ; PVALID # KANNADA LETTER A..KANNADA LETTER VOCALIC L
0C8D ; UNASSIGNED #
0C8E..0C90 ; PVALID # KANNADA LETTER E..KANNADA LETTER AI
0C91 ; UNASSIGNED #
0C92..0CA8 ; PVALID # KANNADA LETTER O..KANNADA LETTER NA
0CA9 ; UNASSIGNED #
0CAA..0CB3 ; PVALID # KANNADA LETTER PA..KANNADA LETTER LLA
0CB4 ; UNASSIGNED #
0CB5..0CB9 ; PVALID # KANNADA LETTER VA..KANNADA LETTER HA
0CBA..0CBB ; UNASSIGNED # ..
0CBC..0CC4 ; PVALID # KANNADA SIGN NUKTA..KANNADA VOWEL SIGN VOCAL
0CC5 ; UNASSIGNED #
0CC6..0CC8 ; PVALID # KANNADA VOWEL SIGN E..KANNADA VOWEL SIGN AI
0CC9 ; UNASSIGNED #
0CCA..0CCD ; PVALID # KANNADA VOWEL SIGN O..KANNADA SIGN VIRAMA
0CCE..0CD4 ; UNASSIGNED # ..
0CD5..0CD6 ; PVALID # KANNADA LENGTH MARK..KANNADA AI LENGTH MARK
0CD7..0CDD ; UNASSIGNED # ..
0CDE ; PVALID # KANNADA LETTER FA
0CDF ; UNASSIGNED #
0CE0..0CE3 ; PVALID # KANNADA LETTER VOCALIC RR..KANNADA VOWEL SIG
0CE4..0CE5 ; UNASSIGNED # ..
0CE6..0CEF ; PVALID # KANNADA DIGIT ZERO..KANNADA DIGIT NINE
0CF0 ; UNASSIGNED #
0CF1..0CF2 ; DISALLOWED # KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADH
0CF3..0D01 ; UNASSIGNED # ..
0D02..0D03 ; PVALID # MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISA
0D04 ; UNASSIGNED #
0D05..0D0C ; PVALID # MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC
0D0D ; UNASSIGNED #
0D0E..0D10 ; PVALID # MALAYALAM LETTER E..MALAYALAM LETTER AI
0D11 ; UNASSIGNED #
0D12..0D28 ; PVALID # MALAYALAM LETTER O..MALAYALAM LETTER NA
0D29 ; UNASSIGNED #
0D2A..0D39 ; PVALID # MALAYALAM LETTER PA..MALAYALAM LETTER HA
0D3A..0D3C ; UNASSIGNED # ..
0D3D..0D44 ; PVALID # MALAYALAM SIGN AVAGRAHA..MALAYALAM VOWEL SIG
0D45 ; UNASSIGNED #
0D46..0D48 ; PVALID # MALAYALAM VOWEL SIGN E..MALAYALAM VOWEL SIGN
0D49 ; UNASSIGNED #
0D4A..0D4D ; PVALID # MALAYALAM VOWEL SIGN O..MALAYALAM SIGN VIRAM
0D4E..0D56 ; UNASSIGNED # ..
0D57 ; PVALID # MALAYALAM AU LENGTH MARK
0D58..0D5F ; UNASSIGNED # ..
0D60..0D63 ; PVALID # MALAYALAM LETTER VOCALIC RR..MALAYALAM VOWEL
0D64..0D65 ; UNASSIGNED # ..
0D66..0D6F ; PVALID # MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
0D70..0D75 ; DISALLOWED # MALAYALAM NUMBER TEN..MALAYALAM FRACTION THR
0D76..0D78 ; UNASSIGNED # ..
0D79 ; DISALLOWED # MALAYALAM DATE MARK
0D7A..0D7F ; PVALID # MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER
0D80..0D81 ; UNASSIGNED # ..
0D82..0D83 ; PVALID # SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARG
0D84 ; UNASSIGNED #
0D85..0D96 ; PVALID # SINHALA LETTER AYANNA..SINHALA LETTER AUYANN
0D97..0D99 ; UNASSIGNED # ..
0D9A..0DB1 ; PVALID # SINHALA LETTER ALPAPRAANA KAYANNA..SINHALA L
0DB2 ; UNASSIGNED #
0DB3..0DBB ; PVALID # SINHALA LETTER SANYAKA DAYANNA..SINHALA LETT
0DBC ; UNASSIGNED #
0DBD ; PVALID # SINHALA LETTER DANTAJA LAYANNA
0DBE..0DBF ; UNASSIGNED # ..
0DC0..0DC6 ; PVALID # SINHALA LETTER VAYANNA..SINHALA LETTER FAYAN
0DC7..0DC9 ; UNASSIGNED # ..
0DCA ; PVALID # SINHALA SIGN AL-LAKUNA
0DCB..0DCE ; UNASSIGNED # ..
0DCF..0DD4 ; PVALID # SINHALA VOWEL SIGN AELA-PILLA..SINHALA VOWEL
0DD5 ; UNASSIGNED #
0DD6 ; PVALID # SINHALA VOWEL SIGN DIGA PAA-PILLA
0DD7 ; UNASSIGNED #
0DD8..0DDF ; PVALID # SINHALA VOWEL SIGN GAETTA-PILLA..SINHALA VOW
0DE0..0DF1 ; UNASSIGNED # ..
0DF2..0DF3 ; PVALID # SINHALA VOWEL SIGN DIGA GAETTA-PILLA..SINHAL
0DF4 ; DISALLOWED # SINHALA PUNCTUATION KUNDDALIYA
0DF5..0E00 ; UNASSIGNED # ..
0E01..0E32 ; PVALID # THAI CHARACTER KO KAI..THAI CHARACTER SARA A
0E33 ; DISALLOWED # THAI CHARACTER SARA AM
0E34..0E3A ; PVALID # THAI CHARACTER SARA I..THAI CHARACTER PHINTH
0E3B..0E3E ; UNASSIGNED # ..
0E3F ; DISALLOWED # THAI CURRENCY SYMBOL BAHT
0E40..0E4E ; PVALID # THAI CHARACTER SARA E..THAI CHARACTER YAMAKK
0E4F ; DISALLOWED # THAI CHARACTER FONGMAN
0E50..0E59 ; PVALID # THAI DIGIT ZERO..THAI DIGIT NINE
0E5A..0E5B ; DISALLOWED # THAI CHARACTER ANGKHANKHU..THAI CHARACTER KH
0E5C..0E80 ; UNASSIGNED # ..
0E81..0E82 ; PVALID # LAO LETTER KO..LAO LETTER KHO SUNG
0E83 ; UNASSIGNED #
0E84 ; PVALID # LAO LETTER KHO TAM
0E85..0E86 ; UNASSIGNED # ..
0E87..0E88 ; PVALID # LAO LETTER NGO..LAO LETTER CO
0E89 ; UNASSIGNED #
0E8A ; PVALID # LAO LETTER SO TAM
0E8B..0E8C ; UNASSIGNED # ..
0E8D ; PVALID # LAO LETTER NYO
0E8E..0E93 ; UNASSIGNED # ..
0E94..0E97 ; PVALID # LAO LETTER DO..LAO LETTER THO TAM
0E98 ; UNASSIGNED #
0E99..0E9F ; PVALID # LAO LETTER NO..LAO LETTER FO SUNG
0EA0 ; UNASSIGNED #
0EA1..0EA3 ; PVALID # LAO LETTER MO..LAO LETTER LO LING
0EA4 ; UNASSIGNED #