Next: GNU Free Documentation License, Previous: PR29 discussion, Up: GNU Libidn [Contents][Index]
Some strings contain characters whose NFKC-normalized form contains the ASCII dot (0x2E, “.”). Examples of these characters are U+2024 (ONE DOT LEADER) and U+248C (DIGIT FIVE FULL STOP). Such strings have the interesting property that their IDNA ToASCII output will contain embedded dots. For example:
ToASCII (hi U+248C com) = hi5.com
ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
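The effect can be reproduced outside Libidn. The following is a minimal sketch in Python, using only the standard library; it does not call Libidn. It approximates Nameprep by NFKC normalization plus lowercasing, which is an assumption that is adequate only for these two inputs, and the helper name to_ascii_label is hypothetical.

```python
import unicodedata


def to_ascii_label(label):
    """Rough ToASCII for ONE label (illustration only, not full RFC 3490).

    Assumption: NFKC plus lowercasing stands in for Nameprep.
    """
    prepared = unicodedata.normalize("NFKC", label).lower()
    if prepared.isascii():
        # Nothing to encode: the label is returned as-is, dots included.
        return prepared
    # Punycode copies basic (ASCII) code points verbatim, including ".".
    return "xn--" + prepared.encode("punycode").decode("ascii")


# The two characters normalize to text containing an ASCII dot.
print(repr(unicodedata.normalize("NFKC", "\u248c")))  # '5.'
print(repr(unicodedata.normalize("NFKC", "\u2024")))  # '.'

# Treated as a single label, the output therefore contains an embedded dot.
print(to_ascii_label("hi\u248ccom"))                          # hi5.com
print(to_ascii_label("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))  # xn--rksmrgs.com-l8as9u
```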
This demonstrates the two general cases. In the first, the ASCII dot
is part of an output that does not begin with the IDN prefix xn--. In
the second, the dot is part of an output that is prefixed with xn--.
The input strings are, from the DNS point of view, a single label.
The IDNA algorithm translates one label at a time, so the output is
expected to be only one label. What is important here is to make sure
the DNS resolver receives the correct query. The DNS protocol does
not use the dot to delimit labels on the wire; rather, it uses
length-value pairs. Thus the correct queries would be for
{7}hi5.com and {22}xn--rksmrgs.com-l8as9u respectively.
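To make the length-value encoding concrete, here is a small sketch in Python of how a list of labels is laid out on the wire. The function name encode_qname is hypothetical and not part of Libidn. Unlike the {n} notation used in the text, the sketch also appends the zero-length root label that ends every name in a real query.

```python
def encode_qname(labels):
    """Encode ASCII labels as a DNS name in wire format (RFC 1035 style)."""
    wire = bytearray()
    for label in labels:
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError("a DNS label must be 1 to 63 octets long")
        wire.append(len(raw))  # length octet
        wire += raw            # label octets; a "." here is ordinary data
    wire.append(0)             # zero-length root label terminates the name
    return bytes(wire)


# One label that happens to contain a dot ...
print(encode_qname(["hi5.com"]))     # b'\x07hi5.com\x00'
# ... is a different query from two labels separated by a dot.
print(encode_qname(["hi5", "com"]))  # b'\x03hi5\x03com\x00'
```

The two results differ even though both would be printed as hi5.com in the usual textual form, which is why it matters whether the resolver is handed one label or two.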
Some implementations (1) have decided that these input strings are
potentially confusing for the user. The string hi U+248C com looks
like hi5.com on systems that support Unicode properly. These
implementations do not follow RFC 3490. They yield:
ToASCII (hi U+248C com) = hi5.com
ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com
The DNS queries they perform are {3}hi5{3}com and
{18}xn--rksmrgs-5wao1o{3}com respectively. Arguably, this leads to a
better user experience, and suggests that the IDNA specification is
sub-optimal in this area.
It has been suggested to normalize the entire input string using NFKC
before passing it to IDNA ToASCII. You may use
stringprep_utf8_nfkc_normalize or stringprep_ucs4_nfkc_normalize for
this. This appears to lead to behaviour similar to that of
IE/Firefox, which would avoid the problem, but this needs to be
confirmed. Feel free to discuss the issue with us.
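The suggested workaround can be sketched as follows, again in Python with the standard library rather than with Libidn itself. The whole string is NFKC-normalized first, so any dot produced by normalization becomes a label separator before the per-label conversion runs. The function names are hypothetical, Nameprep is approximated by NFKC plus lowercasing, and only the ASCII dot is treated as a separator; all three are simplifying assumptions.

```python
import unicodedata


def to_ascii_label(label):
    """Rough ToASCII for one label; NFKC + lowercasing stands in for Nameprep."""
    prepared = unicodedata.normalize("NFKC", label).lower()
    if prepared.isascii():
        return prepared
    return "xn--" + prepared.encode("punycode").decode("ascii")


def to_ascii_after_nfkc(name):
    """Normalize the whole string first, then convert each dot-separated label."""
    normalized = unicodedata.normalize("NFKC", name)
    return ".".join(to_ascii_label(part) for part in normalized.split("."))


print(to_ascii_after_nfkc("hi\u248ccom"))                          # hi5.com
print(to_ascii_after_nfkc("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))  # xn--rksmrgs-5wao1o.com
```

With this ordering each input yields two labels, matching the output of the implementations described above. Whether this matches their behaviour for all inputs is not established by the sketch.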
Alternative workarounds are being considered. Eventually Libidn may
implement a new flag to the idna_* functions that implements a
recommended way to work around this problem.