This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on May 5, 2004.
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document contains test vectors for Nameprep and IDNA.
The Nameprep and IDNA specifications lack thorough examples that would have aided in implementing them. This document acts as a complement to those specifications by providing such examples.
It should be pointed out that this document is not normative; any errors in it must not be treated as defining Nameprep or IDNA. When conforming to the specifications conflicts with generating the output given in this document, implementations should conform to the specifications.
The tests follow a certain syntax, described here by showing one complete example with comments intermixed. The comments are prefixed with the '#' character.
# First the (UTF-8) string is printed as a C octet string, with
# characters [A-Za-z .0-9] shown inline and other characters shown
# escaped with \xAB where AB is the hex sequence of that octet. The
# number of octets is also shown.

in (length 3 bytes):
\xE1\xBE\xB7

# The input is also printed as Unicode codepoints.

input (length 1):
U+1fb7

# After printing the input, the nameprep steps start. When the
# string is modified, the specific operation that caused it is printed
# along with the new string of Unicode code points.

# 1) Map -- For each character in the input, check if it has a mapping
# and, if so, replace it with its mapping. This is described in
# section 3.

Table B.2 maps U+1fb7 to U+03b1 U+0342 U+03b9.
U+03b1 U+0342 U+03b9

# 2) Normalize -- Possibly normalize the result of step 1 using Unicode
# normalization. This is described in section 4.

Unicode normalization with form KC maps string into:
U+1fb6 U+03b9

# 3) Prohibit -- Check for any characters that are not allowed in the
# output. If any are found, return an error. This is described in
# section 5.

# 4) Check bidi -- Possibly check for right-to-left characters, and if
# any are found, make sure that the whole string satisfies the
# requirements for bidirectional strings. If the string does not
# satisfy the requirements for bidirectional strings, return an
# error. This is described in section 6.
#
# 1) The characters in section 5.8 MUST be prohibited.
# 2) If a string contains any RandALCat character, the string MUST NOT
# contain any LCat character.
# 3) If a string contains any RandALCat character, a RandALCat
# character MUST be the first character of the string, and a
# RandALCat character MUST be the last character of the string.

# The output is printed as Unicode codepoints.

output (length 2):
U+1fb6 U+03b9

# And finally the output is printed as UTF-8

out (length 5 bytes):
\xE1\xBE\xB6\xCE\xB9
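The octet-string notation used on the "in" and "out" lines can be reproduced with a few lines of C. The following is a minimal sketch and not part of Nameprep; the names is_inline_octet and format_octets are hypothetical helpers introduced here for illustration only, and the sketch assumes an ASCII execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Return nonzero if the octet belongs to the set [A-Za-z .0-9] that
   the test vectors print inline. */
static int is_inline_octet(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == ' ' || c == '.';
}

/* Hypothetical helper: write the escaped form of in[0..inlen) to out.
   Octets outside the inline set become \xAB with upper-case hex
   digits.  Returns the number of characters written (excluding the
   terminating NUL), or (size_t)-1 if out is too small. */
static size_t format_octets(const unsigned char *in, size_t inlen,
                            char *out, size_t outsize)
{
    size_t i, used = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < inlen; i++) {
        size_t need = is_inline_octet(in[i]) ? 1 : 4;

        /* Keep one byte in reserve for the terminating NUL. */
        if (used + need + 1 > outsize)
            return (size_t)-1;
        if (need == 1)
            out[used] = (char)in[i];
        else
            snprintf(out + used, 5, "\\x%02X", (unsigned)in[i]);
        used += need;
    }
    if (used + 1 > outsize)
        return (size_t)-1;
    out[used] = '\0';
    return used;
}
```

Applied to the three input octets of the example above, the helper yields the 12-character text \xE1\xBE\xB7. Note that the hyphen is not in the inline set, which is why it appears as \x2D in the IDNA test vectors.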
The tests follow a certain syntax, described here by showing one complete example with comments intermixed. The comments are prefixed with the '#' character.
# First the (UTF-8) string is printed as a C octet string, with
# characters [A-Za-z .0-9] shown inline and other characters shown
# escaped with \xAB where AB is the hex sequence of that octet. The
# number of octets is also shown.

in (length 39 bytes):
'Hello\x2DAnother\x2DWa'
'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81'
'\xAE\xE5\xA0\xB4\xE6\x89\x80

# The input is also printed as Unicode codepoints.

input (length 25):
U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e U+006f U+0074
U+0068 U+0065 U+0072 U+002d U+0057 U+0061 U+0079 U+002d U+305d U+308c
U+305e U+308c U+306e U+5834 U+6240

# After printing the input, the IDNA ToASCII step starts. The output
# is printed as an ASCII string.

out: xn--hello-another-way--fc4qua05auwb3674vfr0b
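The last stage of ToASCII is the Punycode encoding defined in [3], followed by prepending the ACE prefix. The following is a minimal sketch of that stage only, written from the algorithm in RFC 3492. It performs no Nameprep processing, so it reproduces the "out" lines in this document only for input that is already in prepared (for example, lower-case) form. The function name ace_encode_sketch and its helper put are hypothetical, and the overflow checks required by RFC 3492 are omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bootstring parameters for Punycode (RFC 3492, section 5). */
static const unsigned long BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38,
                           DAMP = 700, INITIAL_BIAS = 72, INITIAL_N = 128;

/* Map a digit value 0..35 to 'a'..'z', '0'..'9'. */
static char encode_digit(unsigned long d)
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation (RFC 3492, section 6.1). */
static unsigned long adapt(unsigned long delta, unsigned long numpoints,
                           int firsttime)
{
    unsigned long k = 0;

    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Append one character, keeping room for the terminating NUL. */
static int put(char *out, size_t outsize, size_t *used, char ch)
{
    if (*used + 1 >= outsize)
        return -1;
    out[(*used)++] = ch;
    return 0;
}

/* Hypothetical helper: Punycode-encode in[0..inlen) and prefix the
   result with "xn--".  Returns 0 on success, -1 if out is too small. */
static int ace_encode_sketch(const unsigned long *in, size_t inlen,
                             char *out, size_t outsize)
{
    static const char prefix[] = "xn--";
    unsigned long n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
    size_t i, h, b, used = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; prefix[i] != '\0'; i++)
        if (put(out, outsize, &used, prefix[i]) < 0)
            return -1;
    for (i = 0; i < inlen; i++)          /* copy the basic code points */
        if (in[i] < 0x80 && put(out, outsize, &used, (char)in[i]) < 0)
            return -1;
    h = b = used - strlen(prefix);
    if (b > 0 && put(out, outsize, &used, '-') < 0)
        return -1;

    while (h < inlen) {
        unsigned long m = (unsigned long)-1;

        for (i = 0; i < inlen; i++)      /* smallest code point >= n */
            if (in[i] >= n && in[i] < m)
                m = in[i];
        delta += (m - n) * (h + 1);
        n = m;
        for (i = 0; i < inlen; i++) {
            if (in[i] < n)
                delta++;
            if (in[i] == n) {
                unsigned long q = delta, k, t;

                /* Emit delta as a variable-length integer. */
                for (k = BASE; ; k += BASE) {
                    t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
                    if (q < t)
                        break;
                    if (put(out, outsize, &used,
                            encode_digit(t + (q - t) % (BASE - t))) < 0)
                        return -1;
                    q = (q - t) / (BASE - t);
                }
                if (put(out, outsize, &used, encode_digit(q)) < 0)
                    return -1;
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    out[used] = '\0';
    return 0;
}
```

Called with the eight code points U+03b5 U+03bb U+03bb U+03b7 U+03bd U+03b9 U+03ba U+03ac, the sketch produces xn--hxargifdar, which matches the Greek test vector in this document. A complete ToASCII implementation must run Nameprep before this step.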
in (length 34 bytes):
'\xD9\x84\xD9\x8A\xD9\x87\xD9\x85\xD8\xA7\xD8\xA8\xD8\xAA\xD9\x83'
'\xD9\x84\xD9\x85\xD9\x88\xD8\xB4\xD8\xB9\xD8\xB1\xD8\xA8\xD9\x8A'
'\xD8\x9F
input (length 17):
U+0644 U+064a U+0647 U+0645 U+0627 U+0628 U+062a U+0643 U+0644 U+0645
U+0648 U+0634 U+0639 U+0631 U+0628 U+064a U+061f
out: xn--egbpdaj6bu4bxfgehfvwxn

in (length 27 bytes):
'\xE4\xBB\x96\xE4\xBB\xAC\xE4\xB8\xBA\xE4\xBB\x80\xE4\xB9\x88\xE4'
'\xB8\x8D\xE8\xAF\xB4\xE4\xB8\xAD\xE6\x96\x87
input (length 9):
U+4ed6 U+4eec U+4e3a U+4ec0 U+4e48 U+4e0d U+8bf4 U+4e2d U+6587
out: xn--ihqwcrb4cv8a8dqg056pqjye

in (length 27 bytes):
'\xE4\xBB\x96\xE5\x80\x91\xE7\x88\xB2\xE4\xBB\x80\xE9\xBA\xBD\xE4'
'\xB8\x8D\xE8\xAA\xAA\xE4\xB8\xAD\xE6\x96\x87
input (length 9):
U+4ed6 U+5011 U+7232 U+4ec0 U+9ebd U+4e0d U+8aaa U+4e2d U+6587
out: xn--ihqwctvzc91f659drss3x8bo0yb

in (length 26 bytes):
'Pro\xC4\x8Dprost\xC4\x9Bneml'
'uv\xC3\xAD\xC4\x8Desky
input (length 22):
U+0050 U+0072 U+006f U+010d U+0070 U+0072 U+006f U+0073 U+0074 U+011b
U+006e U+0065 U+006d U+006c U+0075 U+0076 U+00ed U+010d U+0065 U+0073
U+006b U+0079
out: xn--proprostnemluvesky-uyb24dma41a

in (length 44 bytes):
'\xD7\x9C\xD7\x9E\xD7\x94\xD7\x94\xD7\x9D\xD7\xA4\xD7\xA9\xD7\x95'
'\xD7\x98\xD7\x9C\xD7\x90\xD7\x9E\xD7\x93\xD7\x91\xD7\xA8\xD7\x99'
'\xD7\x9D\xD7\xA2\xD7\x91\xD7\xA8\xD7\x99\xD7\xAA
input (length 22):
U+05dc U+05de U+05d4 U+05d4 U+05dd U+05e4 U+05e9 U+05d5 U+05d8 U+05dc
U+05d0 U+05de U+05d3 U+05d1 U+05e8 U+05d9 U+05dd U+05e2 U+05d1 U+05e8
U+05d9 U+05ea
out: xn--4dbcagdahymbxekheh6e0a7fei0b

in (length 90 bytes):
'\xE0\xA4\xAF\xE0\xA4\xB9\xE0\xA4\xB2\xE0\xA5\x8B\xE0\xA4\x97\xE0'
'\xA4\xB9\xE0\xA4\xBF\xE0\xA4\xA8\xE0\xA5\x8D\xE0\xA4\xA6\xE0\xA5'
'\x80\xE0\xA4\x95\xE0\xA5\x8D\xE0\xA4\xAF\xE0\xA5\x8B\xE0\xA4\x82'
'\xE0\xA4\xA8\xE0\xA4\xB9\xE0\xA5\x80\xE0\xA4\x82\xE0\xA4\xAC\xE0'
'\xA5\x8B\xE0\xA4\xB2\xE0\xA4\xB8\xE0\xA4\x95\xE0\xA4\xA4\xE0\xA5'
'\x87\xE0\xA4\xB9\xE0\xA5\x88\xE0\xA4\x82
input (length 30):
U+092f U+0939 U+0932 U+094b U+0917 U+0939 U+093f U+0928 U+094d U+0926
U+0940 U+0915 U+094d U+092f U+094b U+0902 U+0928 U+0939 U+0940 U+0902
U+092c U+094b U+0932 U+0938 U+0915 U+0924 U+0947 U+0939 U+0948 U+0902
out: xn--i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd

in (length 54 bytes):
'\xE3\x81\xAA\xE3\x81\x9C\xE3\x81\xBF\xE3\x82\x93\xE3\x81\xAA\xE6'
'\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E\xE3\x82\x92\xE8\xA9\xB1\xE3\x81'
'\x97\xE3\x81\xA6\xE3\x81\x8F\xE3\x82\x8C\xE3\x81\xAA\xE3\x81\x84'
'\xE3\x81\xAE\xE3\x81\x8B
input (length 18):
U+306a U+305c U+307f U+3093 U+306a U+65e5 U+672c U+8a9e U+3092 U+8a71
U+3057 U+3066 U+304f U+308c U+306a U+3044 U+306e U+304b
out: xn--n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa

in (length 56 bytes):
'\xD0\xBF\xD0\xBE\xD1\x87\xD0\xB5\xD0\xBC\xD1\x83\xD0\xB6\xD0\xB5'
'\xD0\xBE\xD0\xBD\xD0\xB8\xD0\xBD\xD0\xB5\xD0\xB3\xD0\xBE\xD0\xB2'
'\xD0\xBE\xD1\x80\xD1\x8F\xD1\x82\xD0\xBF\xD0\xBE\xD1\x80\xD1\x83'
'\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8
input (length 28):
U+043f U+043e U+0447 U+0435 U+043c U+0443 U+0436 U+0435 U+043e U+043d
U+0438 U+043d U+0435 U+0433 U+043e U+0432 U+043e U+0440 U+044f U+0442
U+043f U+043e U+0440 U+0443 U+0441 U+0441 U+043a U+0438
out: xn--b1abfaaepdrnnbgefbadotcwatmq2g4l

in (length 42 bytes):
'Porqu\xC3\xA9nopuedens'
'implementehablar'
'enEspa\xC3\xB1ol
input (length 40):
U+0050 U+006f U+0072 U+0071 U+0075 U+00e9 U+006e U+006f U+0070 U+0075
U+0065 U+0064 U+0065 U+006e U+0073 U+0069 U+006d U+0070 U+006c U+0065
U+006d U+0065 U+006e U+0074 U+0065 U+0068 U+0061 U+0062 U+006c U+0061
U+0072 U+0065 U+006e U+0045 U+0073 U+0070 U+0061 U+00f1 U+006f U+006c
out: xn--porqunopuedensimplementehablarenespaol-fmd56a

in (length 45 bytes):
'T\xE1\xBA\xA1isaoh\xE1\xBB\x8Dkh\xC3\xB4'
'ngth\xE1\xBB\x83ch\xE1\xBB\x89n\xC3\xB3i'
'ti\xE1\xBA\xBFngVi\xE1\xBB\x87t
input (length 31):
U+0054 U+1ea1 U+0069 U+0073 U+0061 U+006f U+0068 U+1ecd U+006b U+0068
U+00f4 U+006e U+0067 U+0074 U+0068 U+1ec3 U+0063 U+0068 U+1ec9 U+006e
U+00f3 U+0069 U+0074 U+0069 U+1ebf U+006e U+0067 U+0056 U+0069 U+1ec7
U+0074
out: xn--tisaohkhngthchnitingvit-kjcr8268qyxafd2f1b9g

in (length 20 bytes):
'3\xE5\xB9\xB4B\xE7\xB5\x84\xE9\x87\x91\xE5\x85\xAB\xE5\x85'
'\x88\xE7\x94\x9F
input (length 8):
U+0033 U+5e74 U+0042 U+7d44 U+91d1 U+516b U+5148 U+751f
out: xn--3b-ww4c5e180e575a65lsy2b

in (length 34 bytes):
'\xE5\xAE\x89\xE5\xAE\xA4\xE5\xA5\x88\xE7\xBE\x8E\xE6\x81\xB5\x2D'
'with\x2DSUPER\x2DMONKE'
'YS
input (length 24):
U+5b89 U+5ba4 U+5948 U+7f8e U+6075 U+002d U+0077 U+0069 U+0074 U+0068
U+002d U+0053 U+0055 U+0050 U+0045 U+0052 U+002d U+004d U+004f U+004e
U+004b U+0045 U+0059 U+0053
out: xn---with-super-monkeys-pc58ag80a8qai00g7n9n

in (length 39 bytes):
'Hello\x2DAnother\x2DWa'
'y\x2D\xE3\x81\x9D\xE3\x82\x8C\xE3\x81\x9E\xE3\x82\x8C\xE3\x81'
'\xAE\xE5\xA0\xB4\xE6\x89\x80
input (length 25):
U+0048 U+0065 U+006c U+006c U+006f U+002d U+0041 U+006e U+006f U+0074
U+0068 U+0065 U+0072 U+002d U+0057 U+0061 U+0079 U+002d U+305d U+308c
U+305e U+308c U+306e U+5834 U+6240
out: xn--hello-another-way--fc4qua05auwb3674vfr0b

in (length 22 bytes):
'\xE3\x81\xB2\xE3\x81\xA8\xE3\x81\xA4\xE5\xB1\x8B\xE6\xA0\xB9\xE3'
'\x81\xAE\xE4\xB8\x8B2
input (length 8):
U+3072 U+3068 U+3064 U+5c4b U+6839 U+306e U+4e0b U+0032
out: xn--2-u9tlzr9756bt3uc0v

in (length 23 bytes):
'Maji\xE3\x81\xA7Koi\xE3\x81\x99\xE3\x82\x8B'
'5\xE7\xA7\x92\xE5\x89\x8D
input (length 13):
U+004d U+0061 U+006a U+0069 U+3067 U+004b U+006f U+0069 U+3059 U+308b
U+0035 U+79d2 U+524d
out: xn--majikoi5-783gue6qz075azm5e

in (length 23 bytes):
'\xE3\x83\x91\xE3\x83\x95\xE3\x82\xA3\xE3\x83\xBCde\xE3\x83'
'\xAB\xE3\x83\xB3\xE3\x83\x90
input (length 9):
U+30d1 U+30d5 U+30a3 U+30fc U+0064 U+0065 U+30eb U+30f3 U+30d0
out: xn--de-jg4avhby1noc0d

in (length 21 bytes):
'\xE3\x81\x9D\xE3\x81\xAE\xE3\x82\xB9\xE3\x83\x94\xE3\x83\xBC\xE3'
'\x83\x89\xE3\x81\xA7
input (length 7):
U+305d U+306e U+30b9 U+30d4 U+30fc U+30c9 U+3067
out: xn--d9juau41awczczp

in (length 16 bytes):
'\xCE\xB5\xCE\xBB\xCE\xBB\xCE\xB7\xCE\xBD\xCE\xB9\xCE\xBA\xCE\xAC
input (length 8):
U+03b5 U+03bb U+03bb U+03b7 U+03bd U+03b9 U+03ba U+03ac
out: xn--hxargifdar

in (length 13 bytes):
'bon\xC4\xA1usa\xC4\xA7\xC4\xA7a
input (length 10):
U+0062 U+006f U+006e U+0121 U+0075 U+0073 U+0061 U+0127 U+0127 U+0061
out: xn--bonusaa-5bb1da

in (length 56 bytes):
'\xD0\xBF\xD0\xBE\xD1\x87\xD0\xB5\xD0\xBC\xD1\x83\xD0\xB6\xD0\xB5'
'\xD0\xBE\xD0\xBD\xD0\xB8\xD0\xBD\xD0\xB5\xD0\xB3\xD0\xBE\xD0\xB2'
'\xD0\xBE\xD1\x80\xD1\x8F\xD1\x82\xD0\xBF\xD0\xBE\xD1\x80\xD1\x83'
'\xD1\x81\xD1\x81\xD0\xBA\xD0\xB8
input (length 28):
U+043f U+043e U+0447 U+0435 U+043c U+0443 U+0436 U+0435 U+043e U+043d
U+0438 U+043d U+0435 U+0433 U+043e U+0432 U+043e U+0440 U+044f U+0442
U+043f U+043e U+0440 U+0443 U+0441 U+0441 U+043a U+0438
out: xn--b1abfaaepdrnnbgefbadotcwatmq2g4l
These test vectors do not test Nameprep or IDNA proper; rather, they test the UTF-8 handling of software. The input octets below are not valid UTF-8. Instead of outputting the indicated Unicode code point, implementations should raise an error indicating that the input is invalid.
in (length 2 bytes):
\xC3\xDF
output (length 1):
U+00df

in (length 2 bytes):
\xC7\xF0
output (length 1):
U+01f0
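Both inputs above fail for the same reason: the second octet (0xDF and 0xF0, respectively) does not have the bit pattern 10xxxxxx required of a UTF-8 continuation octet. The following is a minimal sketch of a strict well-formedness check that rejects them; the function name utf8_is_valid is a hypothetical helper, not an interface defined by Nameprep or IDNA.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if s[0..len) is well-formed UTF-8 and
   0 otherwise.  Rejects bad or missing continuation octets, overlong
   forms, surrogates (U+D800..U+DFFF) and values above U+10FFFF. */
static int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp, min;
        size_t n, j;          /* n = number of continuation octets */

        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;         /* stray continuation or invalid lead */
        }
        if (len - i <= n)
            return 0;         /* sequence is truncated */
        for (j = 1; j <= n; j++) {
            if ((s[i + j] & 0xC0) != 0x80)
                return 0;     /* not of the form 10xxxxxx */
            cp = (cp << 6) | (s[i + j] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += n + 1;
    }
    return 1;
}
```

One design consequence is worth noting: a check this strict also rejects the encoded surrogate \xED\xBD\x82 (U+DF42) that appears in the Nameprep test table, so an implementation that validates its input first may report a decoding error for that vector rather than a prohibited-character error.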
The security considerations from Nameprep and IDNA are inherited.
These test vectors are not believed to introduce new security considerations nor disrupt the operation of the Internet, but may expose security weaknesses in existing implementations. Any such incident should not be regarded as a problem with this document, though, but rather taken as evidence that this document served its purpose.
[1] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[2] Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003.
[3] Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, March 2003.
Simon Josefsson
EMail: [email protected]
Some IDNA test vectors were borrowed from Punycode [3]. Several people pointed out invalid UTF-8 encodings used in earlier versions of the draft.
In order to avoid having implementors type in the test vectors above, a C structure with the data is provided.
The comment field holds the section titles used in this document. The in field contains the UTF-8 encoded input string. The out field contains the expected output, or NULL if the expected result is an error. The profile field can be ignored. The only significant setting for the flags field is STRINGPREP_NO_UNASSIGNED, which tells the Nameprep implementation to perform unassigned code point checking (compare the "AllowUnassigned" flag in IDNA). The rc field contains the expected error code, where 0 indicates success; the names of the other codes should be self-explanatory.
struct stringprep
{
  char *comment;
  char *in;
  char *out;
  char *profile;
  int flags;
  int rc;
}
strprep[] = {
  {
    "Map to nothing",
    "foo\xC2\xAD\xCD\x8F\xE1\xA0\x86\xE1\xA0\x8B"
    "bar" "\xE2\x80\x8B\xE2\x81\xA0" "baz\xEF\xB8\x80\xEF\xB8\x88"
    "\xEF\xB8\x8F\xEF\xBB\xBF",
    "foobarbaz"},
  {
    "Case folding ASCII U+0043 U+0041 U+0046 U+0045",
    "CAFE", "cafe"},
  {
    "Case folding 8bit U+00DF (german sharp s)",
    "\xC3\x9F", "ss"},
  {
    "Case folding U+0130 (turkish capital I with dot)",
    "\xC4\xB0", "i\xcc\x87"},
  {
    "Case folding multibyte U+0143 U+037A",
    "\xC5\x83\xCD\xBA", "\xC5\x84 \xCE\xB9"},
  {
    "Case folding U+2121 U+33C6 U+1D7BB",
    "\xE2\x84\xA1\xE3\x8F\x86\xF0\x9D\x9E\xBB",
    "telc\xE2\x88\x95" "kg\xCF\x83"},
  {
    "Normalization of U+006a U+030c U+00A0 U+00AA",
    "\x6A\xCC\x8C\xC2\xA0\xC2\xAA", "\xC7\xB0 a"},
  {
    "Case folding U+1FB7 and normalization",
    "\xE1\xBE\xB7", "\xE1\xBE\xB6\xCE\xB9"},
  {
    "Self-reverting case folding U+01F0 and normalization",
    "\xC7\xB0", "\xC7\xB0"},
  {
    "Self-reverting case folding U+0390 and normalization",
    "\xCE\x90", "\xCE\x90"},
  {
    "Self-reverting case folding U+03B0 and normalization",
    "\xCE\xB0", "\xCE\xB0"},
  {
    "Self-reverting case folding U+1E96 and normalization",
    "\xE1\xBA\x96", "\xE1\xBA\x96"},
  {
    "Self-reverting case folding U+1F56 and normalization",
    "\xE1\xBD\x96", "\xE1\xBD\x96"},
  {
    "ASCII space character U+0020",
    "\x20", "\x20"},
  {
    "Non-ASCII 8bit space character U+00A0",
    "\xC2\xA0", "\x20"},
  {
    "Non-ASCII multibyte space character U+1680",
    "\xE1\x9A\x80", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Non-ASCII multibyte space character U+2000",
    "\xE2\x80\x80", "\x20"},
  {
    "Zero Width Space U+200b",
    "\xE2\x80\x8b", ""},
  {
    "Non-ASCII multibyte space character U+3000",
    "\xE3\x80\x80", "\x20"},
  {
    "ASCII control characters U+0010 U+007F",
    "\x10\x7F", "\x10\x7F"},
  {
    "Non-ASCII 8bit control character U+0085",
    "\xC2\x85", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Non-ASCII multibyte control character U+180E",
    "\xE1\xA0\x8E", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Zero Width No-Break Space U+FEFF",
    "\xEF\xBB\xBF", ""},
  {
    "Non-ASCII control character U+1D175",
    "\xF0\x9D\x85\xB5", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Plane 0 private use character U+F123",
    "\xEF\x84\xA3", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Plane 15 private use character U+F1234",
    "\xF3\xB1\x88\xB4", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Plane 16 private use character U+10F234",
    "\xF4\x8F\x88\xB4", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Non-character code point U+8FFFE",
    "\xF2\x8F\xBF\xBE", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Non-character code point U+10FFFF",
    "\xF4\x8F\xBF\xBF", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Surrogate code U+DF42",
    "\xED\xBD\x82", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Non-plain text character U+FFFD",
    "\xEF\xBF\xBD", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Ideographic description character U+2FF5",
    "\xE2\xBF\xB5", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Display property character U+0341",
    "\xCD\x81", "\xCC\x81"},
  {
    "Left-to-right mark U+200E",
    "\xE2\x80\x8E", "\xCC\x81", "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Deprecated U+202A",
    "\xE2\x80\xAA", "\xCC\x81", "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Language tagging character U+E0001",
    "\xF3\xA0\x80\x81", "\xCC\x81", "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Language tagging character U+E0042",
    "\xF3\xA0\x81\x82", NULL, "Nameprep", 0,
    STRINGPREP_CONTAINS_PROHIBITED},
  {
    "Bidi: RandALCat character U+05BE and LCat characters",
    "foo\xD6\xBE" "bar", NULL, "Nameprep", 0,
    STRINGPREP_BIDI_BOTH_L_AND_RAL},
  {
    "Bidi: RandALCat character U+FD50 and LCat characters",
    "foo\xEF\xB5\x90" "bar", NULL, "Nameprep", 0,
    STRINGPREP_BIDI_BOTH_L_AND_RAL},
  {
    "Bidi: RandALCat character U+FB38 and LCat characters",
    "foo\xEF\xB9\xB6" "bar", "foo \xd9\x8e" "bar"},
  {
    "Bidi: RandALCat without trailing RandALCat U+0627 U+0031",
    "\xD8\xA7\x31", NULL, "Nameprep", 0,
    STRINGPREP_BIDI_LEADTRAIL_NOT_RAL},
  {
    "Bidi: RandALCat character U+0627 U+0031 U+0628",
    "\xD8\xA7\x31\xD8\xA8", "\xD8\xA7\x31\xD8\xA8"},
  {
    "Unassigned code point U+E0002",
    "\xF3\xA0\x80\x82", NULL, "Nameprep", STRINGPREP_NO_UNASSIGNED,
    STRINGPREP_CONTAINS_UNASSIGNED},
  {
    "Larger test (shrinking)",
    "X\xC2\xAD\xC3\x9F\xC4\xB0\xE2\x84\xA1\x6a\xcc\x8c\xc2\xa0\xc2"
    "\xaa\xce\xb0\xe2\x80\x80",
    "xssi\xcc\x87" "tel\xc7\xb0 a\xce\xb0 ", "Nameprep"},
  {
    "Larger test (expanding)",
    "X\xC3\x9F\xe3\x8c\x96\xC4\xB0\xE2\x84\xA1\xE2\x92\x9F\xE3\x8c\x80",
    "xss\xe3\x82\xad\xe3\x83\xad\xe3\x83\xa1\xe3\x83\xbc\xe3\x83\x88"
    "\xe3\x83\xab" "i\xcc\x87" "tel\x28" "d\x29\xe3\x82\xa2\xe3\x83\x91"
    "\xe3\x83\xbc\xe3\x83\x88"},
};
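A table of this shape is typically consumed by a small driver loop. The following is a minimal sketch of such a loop, assuming the implementation under test can be wrapped in a function with the hypothetical signature prep_fn shown below; the names stringprep_case, prep_fn, run_cases and ascii_fold_only are all introduced here for illustration and are not part of any Nameprep interface. The stand-in ascii_fold_only exists only to exercise the loop and is not a Nameprep implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same field layout as the test vector table. */
struct stringprep_case {
    const char *comment;
    const char *in;
    const char *out;      /* NULL when an error is expected */
    const char *profile;
    int flags;
    int rc;               /* 0 means success */
};

/* Hypothetical wrapper around the implementation under test: writes
   the prepared string to buf and returns 0, or returns a nonzero
   error code. */
typedef int (*prep_fn)(const char *in, int flags, char *buf, size_t buflen);

/* Run every case and return the number of failures. */
static size_t run_cases(const struct stringprep_case *cases, size_t n,
                        prep_fn prep)
{
    size_t i, failures = 0;
    char buf[256];

    assert(cases != NULL && prep != NULL);
    for (i = 0; i < n; i++) {
        int rc = prep(cases[i].in, cases[i].flags, buf, sizeof buf);

        if (rc != cases[i].rc)
            failures++;       /* wrong status code */
        else if (rc == 0 && (cases[i].out == NULL ||
                             strcmp(buf, cases[i].out) != 0))
            failures++;       /* succeeded with the wrong output */
        /* When an error is expected, the out field is not examined. */
    }
    return failures;
}

/* Stand-in used only to exercise the runner: folds ASCII letters to
   lower case.  This is NOT a Nameprep implementation. */
static int ascii_fold_only(const char *in, int flags, char *buf,
                           size_t buflen)
{
    size_t i, len = strlen(in);

    (void)flags;
    if (len + 1 > buflen)
        return 1;
    for (i = 0; i < len; i++)
        buf[i] = (in[i] >= 'A' && in[i] <= 'Z')
                     ? (char)(in[i] - 'A' + 'a') : in[i];
    buf[len] = '\0';
    return 0;
}
```

The status code is compared before the output so that vectors expecting an error are judged on the rc field alone. With the stand-in, the "CAFE" vector passes while the U+00DF vector is counted as a failure, as expected for something that only folds ASCII.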
In order to avoid having implementors type in the IDNA test vectors above, a C structure with the data is provided.
The name field holds the section titles used in this document. The inlen and in fields contain the number of Unicode code points and the code points themselves. The out field contains the expected ToASCII output. The allowunassigned and usestd3asciirules fields can be ignored. The toasciirc and tounicoderc fields contain the expected error codes, where 0 indicates success; the names of the other codes should be self-explanatory.
struct idna
{
  char *name;
  size_t inlen;
  unsigned long in[100];
  char *out;
  int allowunassigned;
  int usestd3asciirules;
  int toasciirc;
  int tounicoderc;
}
idna[] = {
  {
    "Arabic (Egyptian)", 17,
    {
      0x0644, 0x064A, 0x0647, 0x0645, 0x0627, 0x0628, 0x062A, 0x0643,
      0x0644, 0x0645, 0x0648, 0x0634, 0x0639, 0x0631, 0x0628, 0x064A,
      0x061F},
    IDNA_ACE_PREFIX "egbpdaj6bu4bxfgehfvwxn", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Chinese (simplified)", 9,
    {
      0x4ED6, 0x4EEC, 0x4E3A, 0x4EC0, 0x4E48, 0x4E0D, 0x8BF4, 0x4E2D,
      0x6587},
    IDNA_ACE_PREFIX "ihqwcrb4cv8a8dqg056pqjye", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Chinese (traditional)", 9,
    {
      0x4ED6, 0x5011, 0x7232, 0x4EC0, 0x9EBD, 0x4E0D, 0x8AAA, 0x4E2D,
      0x6587},
    IDNA_ACE_PREFIX "ihqwctvzc91f659drss3x8bo0yb", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Czech", 22,
    {
      0x0050, 0x0072, 0x006F, 0x010D, 0x0070, 0x0072, 0x006F, 0x0073,
      0x0074, 0x011B, 0x006E, 0x0065, 0x006D, 0x006C, 0x0075, 0x0076,
      0x00ED, 0x010D, 0x0065, 0x0073, 0x006B, 0x0079},
    IDNA_ACE_PREFIX "Proprostnemluvesky-uyb24dma41a", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Hebrew", 22,
    {
      0x05DC, 0x05DE, 0x05D4, 0x05D4, 0x05DD, 0x05E4, 0x05E9, 0x05D5,
      0x05D8, 0x05DC, 0x05D0, 0x05DE, 0x05D3, 0x05D1, 0x05E8, 0x05D9,
      0x05DD, 0x05E2, 0x05D1, 0x05E8, 0x05D9, 0x05EA},
    IDNA_ACE_PREFIX "4dbcagdahymbxekheh6e0a7fei0b", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Hindi (Devanagari)", 30,
    {
      0x092F, 0x0939, 0x0932, 0x094B, 0x0917, 0x0939, 0x093F, 0x0928,
      0x094D, 0x0926, 0x0940, 0x0915, 0x094D, 0x092F, 0x094B, 0x0902,
      0x0928, 0x0939, 0x0940, 0x0902, 0x092C, 0x094B, 0x0932, 0x0938,
      0x0915, 0x0924, 0x0947, 0x0939, 0x0948, 0x0902},
    IDNA_ACE_PREFIX "i1baa7eci9glrd9b2ae1bj0hfcgg6iyaf8o0a1dig0cd", 0, 0,
    IDNA_SUCCESS},
  {
    "Japanese (kanji and hiragana)", 18,
    {
      0x306A, 0x305C, 0x307F, 0x3093, 0x306A, 0x65E5, 0x672C, 0x8A9E,
      0x3092, 0x8A71, 0x3057, 0x3066, 0x304F, 0x308C, 0x306A, 0x3044,
      0x306E, 0x304B},
    IDNA_ACE_PREFIX "n8jok5ay5dzabd5bym9f0cm5685rrjetr6pdxa", 0, 0,
    IDNA_SUCCESS},
  {
    "Russian (Cyrillic)", 28,
    {
      0x043F, 0x043E, 0x0447, 0x0435, 0x043C, 0x0443, 0x0436, 0x0435,
      0x043E, 0x043D, 0x0438, 0x043D, 0x0435, 0x0433, 0x043E, 0x0432,
      0x043E, 0x0440, 0x044F, 0x0442, 0x043F, 0x043E, 0x0440, 0x0443,
      0x0441, 0x0441, 0x043A, 0x0438},
    IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
    IDNA_SUCCESS, IDNA_SUCCESS},
  {
    "Spanish", 40,
    {
      0x0050, 0x006F, 0x0072, 0x0071, 0x0075, 0x00E9, 0x006E, 0x006F,
      0x0070, 0x0075, 0x0065, 0x0064, 0x0065, 0x006E, 0x0073, 0x0069,
      0x006D, 0x0070, 0x006C, 0x0065, 0x006D, 0x0065, 0x006E, 0x0074,
      0x0065, 0x0068, 0x0061, 0x0062, 0x006C, 0x0061, 0x0072, 0x0065,
      0x006E, 0x0045, 0x0073, 0x0070, 0x0061, 0x00F1, 0x006F, 0x006C},
    IDNA_ACE_PREFIX "PorqunopuedensimplementehablarenEspaol-fmd56a", 0, 0,
    IDNA_SUCCESS},
  {
    "Vietnamese", 31,
    {
      0x0054, 0x1EA1, 0x0069, 0x0073, 0x0061, 0x006F, 0x0068, 0x1ECD,
      0x006B, 0x0068, 0x00F4, 0x006E, 0x0067, 0x0074, 0x0068, 0x1EC3,
      0x0063, 0x0068, 0x1EC9, 0x006E, 0x00F3, 0x0069, 0x0074, 0x0069,
      0x1EBF, 0x006E, 0x0067, 0x0056, 0x0069, 0x1EC7, 0x0074},
    IDNA_ACE_PREFIX "TisaohkhngthchnitingVit-kjcr8268qyxafd2f1b9g", 0, 0,
    IDNA_SUCCESS},
  {
    "Japanese", 8,
    {
      0x0033, 0x5E74, 0x0042, 0x7D44, 0x91D1, 0x516B, 0x5148, 0x751F},
    IDNA_ACE_PREFIX "3B-ww4c5e180e575a65lsy2b", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Japanese", 24,
    {
      0x5B89, 0x5BA4, 0x5948, 0x7F8E, 0x6075, 0x002D, 0x0077, 0x0069,
      0x0074, 0x0068, 0x002D, 0x0053, 0x0055, 0x0050, 0x0045, 0x0052,
      0x002D, 0x004D, 0x004F, 0x004E, 0x004B, 0x0045, 0x0059, 0x0053},
    IDNA_ACE_PREFIX "-with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n", 0, 0,
    IDNA_SUCCESS},
  {
    "Japanese", 25,
    {
      0x0048, 0x0065, 0x006C, 0x006C, 0x006F, 0x002D, 0x0041, 0x006E,
      0x006F, 0x0074, 0x0068, 0x0065, 0x0072, 0x002D, 0x0057, 0x0061,
      0x0079, 0x002D, 0x305D, 0x308C, 0x305E, 0x308C, 0x306E, 0x5834,
      0x6240},
    IDNA_ACE_PREFIX "Hello-Another-Way--fc4qua05auwb3674vfr0b", 0, 0,
    IDNA_SUCCESS},
  {
    "Japanese", 8,
    {
      0x3072, 0x3068, 0x3064, 0x5C4B, 0x6839, 0x306E, 0x4E0B, 0x0032},
    IDNA_ACE_PREFIX "2-u9tlzr9756bt3uc0v", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Japanese", 13,
    {
      0x004D, 0x0061, 0x006A, 0x0069, 0x3067, 0x004B, 0x006F, 0x0069,
      0x3059, 0x308B, 0x0035, 0x79D2, 0x524D},
    IDNA_ACE_PREFIX "MajiKoi5-783gue6qz075azm5e", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Japanese", 9,
    {
      0x30D1, 0x30D5, 0x30A3, 0x30FC, 0x0064, 0x0065, 0x30EB, 0x30F3,
      0x30D0},
    IDNA_ACE_PREFIX "de-jg4avhby1noc0d", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Japanese", 7,
    {
      0x305D, 0x306E, 0x30B9, 0x30D4, 0x30FC, 0x30C9, 0x3067},
    IDNA_ACE_PREFIX "d9juau41awczczp", 0, 0, IDNA_SUCCESS,
    IDNA_SUCCESS},
  {
    "Greek", 8,
    {
      0x03b5, 0x03bb, 0x03bb, 0x03b7, 0x03bd, 0x03b9, 0x03ba, 0x03ac},
    IDNA_ACE_PREFIX "hxargifdar", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
  {
    "Maltese (Malti)", 10,
    {
      0x0062, 0x006f, 0x006e, 0x0121, 0x0075, 0x0073, 0x0061, 0x0127,
      0x0127, 0x0061},
    IDNA_ACE_PREFIX "bonusaa-5bb1da", 0, 0, IDNA_SUCCESS, IDNA_SUCCESS},
  {
    "Russian (Cyrillic)", 28,
    {
      0x043f, 0x043e, 0x0447, 0x0435, 0x043c, 0x0443, 0x0436, 0x0435,
      0x043e, 0x043d, 0x0438, 0x043d, 0x0435, 0x0433, 0x043e, 0x0432,
      0x043e, 0x0440, 0x044f, 0x0442, 0x043f, 0x043e, 0x0440, 0x0443,
      0x0441, 0x0441, 0x043a, 0x0438},
    IDNA_ACE_PREFIX "b1abfaaepdrnnbgefbadotcwatmq2g4l", 0, 0,
    IDNA_SUCCESS, IDNA_SUCCESS},
};
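The IDNA table stores its input as Unicode code points, whereas the IDNA test vector listings also show the input as UTF-8 octets. The following is a minimal sketch of the conversion between the two, which can be used to cross-check the byte lengths given on the "in" lines; the function name ucs4_to_utf8 is a hypothetical helper introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode cps[0..n) as UTF-8 into out.  Returns
   the number of octets written, or (size_t)-1 if out is too small or
   a code point is a surrogate or lies above U+10FFFF. */
static size_t ucs4_to_utf8(const unsigned long *cps, size_t n,
                           unsigned char *out, size_t outsize)
{
    /* Lead octet markers, indexed by sequence length. */
    static const unsigned char lead[] = { 0, 0, 0xC0, 0xE0, 0xF0 };
    size_t i, k, used = 0;

    assert(cps != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        unsigned long c = cps[i];
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;

        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF) ||
            outsize - used < need)
            return (size_t)-1;
        if (need == 1) {
            out[used++] = (unsigned char)c;
            continue;
        }
        /* Lead octet carries the length marker and the top bits. */
        out[used++] = (unsigned char)(lead[need] | (c >> (6 * (need - 1))));
        /* Each continuation octet carries six further bits. */
        for (k = need - 1; k-- > 0; )
            out[used++] = (unsigned char)(0x80 | ((c >> (6 * k)) & 0x3F));
    }
    return used;
}
```

For the Greek entry, the eight code points encode to 16 octets beginning with 0xCE 0xB5, in agreement with the corresponding "in (length 16 bytes)" line. When comparing ToASCII results against the out field, note that several entries in the table are written in mixed case, so a case-insensitive comparison may be needed.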
The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.
Copyright (C) The Internet Society (2003). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.
This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Funding for the RFC Editor function is currently provided by the Internet Society.