The transcoder facilities are exported by (rnrs io ports).
Several different Unicode encoding schemes describe standard ways to encode characters and strings as byte sequences and to decode those sequences. Within this document, a codec is an immutable Scheme object that represents a Unicode or similar encoding scheme.
An end-of-line style is a symbol that, if it is not none, describes how a textual port transcodes representations of line endings.
A transcoder is an immutable Scheme object that combines a codec with an end-of-line style and a method for handling decoding errors. Each transcoder represents some specific bidirectional (but not necessarily lossless), possibly stateful translation between byte sequences and Unicode characters and strings. Every transcoder can operate in the input direction (bytes to characters) or in the output direction (characters to bytes). A transcoder parameter name means that the corresponding argument must be a transcoder.
A binary port is a port that supports binary I/O, does not have an associated transcoder and does not support textual I/O. A textual port is a port that supports textual I/O, and does not support binary I/O. A textual port may or may not have an associated transcoder.
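The distinction between binary and textual ports can be observed directly. The following is a minimal sketch, assuming the port procedures open-bytevector-input-port, binary-port?, textual-port?, and port-transcoder from the same library (they are not described in this section):

```scheme
(import (rnrs))

;; No transcoder argument: the result is a binary port.
(define bin (open-bytevector-input-port #vu8(104 105)))

;; With a transcoder argument: the result is a textual port.
(define txt
  (open-bytevector-input-port #vu8(104 105)
                              (make-transcoder (utf-8-codec))))

(display (list (binary-port? bin) (textual-port? bin)))  ; binary only
(newline)
(display (list (binary-port? txt) (textual-port? txt)))  ; textual only
(newline)
(display (port-transcoder bin))  ; a binary port has no transcoder
(newline)
```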
The procedures latin-1-codec, utf-8-codec, and utf-16-codec return predefined codecs for the ISO 8859-1, UTF-8, and UTF-16 encoding schemes, respectively. A call to any of these procedures returns a value that is equal in the sense of eqv? to the result of any other call to the same procedure.
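As a small illustration of the eqv? guarantee, using the utf-8-codec procedure from this library:

```scheme
(import (rnrs))

;; Two separate calls to the same codec procedure yield eqv? values.
(define same-codec? (eqv? (utf-8-codec) (utf-8-codec)))

(display same-codec?)
(newline)
```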
In the form (eol-style eol-style-symbol), eol-style-symbol should be a symbol whose name is one of lf, cr, crlf, nel, crnel, ls, and none. The form evaluates to the corresponding symbol. If the name of eol-style-symbol is not one of these symbols, the effect and result are implementation-dependent; in particular, the result may be an eol-style symbol acceptable as an eol-style argument to make-transcoder. Otherwise, an exception is raised.
All eol-style symbols except none describe a specific line-ending encoding:
lf: linefeed
cr: carriage return
crlf: carriage return, linefeed
nel: next line
crnel: carriage return, next line
ls: line separator
For a textual port with a transcoder, and whose transcoder has an eol-style symbol none, no conversion occurs. For a textual input port, any eol-style symbol other than none means that all of the above line-ending encodings are recognized and are translated into a single linefeed. For a textual output port, none and lf are equivalent. Linefeed characters are encoded according to the specified eol-style symbol, and all other characters that participate in possible line endings are encoded as is.
Note: Only the name of eol-style-symbol is significant.
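The conversion rules above can be sketched with in-memory ports. This example assumes the procedures open-bytevector-input-port, call-with-bytevector-output-port, get-string-all, and put-string from the same library; the results noted in the comments are what the rules above prescribe, not output from a particular implementation:

```scheme
(import (rnrs))

;; The eol-style form evaluates to the corresponding symbol.
(define style (eol-style crlf))

;; Input direction: the bytes are "a", CR, LF, "b" in UTF-8. With any
;; eol-style other than none, the CR LF pair should be read as a
;; single linefeed.
(define decoded
  (get-string-all
   (open-bytevector-input-port
    #vu8(97 13 10 98)
    (make-transcoder (utf-8-codec) (eol-style lf)))))

;; Output direction: with crlf, each linefeed should be written as
;; the two bytes CR LF.
(define encoded
  (call-with-bytevector-output-port
   (lambda (port) (put-string port "a\nb"))
   (make-transcoder (utf-8-codec) style)))
```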
The procedure native-eol-style returns the default end-of-line style of the underlying platform, e.g., lf on Unix and crlf on Windows.
This condition type could be defined by
(define-condition-type &i/o-decoding &i/o-port make-i/o-decoding-error i/o-decoding-error?)
An exception with this type is raised when one of the operations for textual input from a port encounters a sequence of bytes that cannot be translated into a character or string by the input direction of the port’s transcoder.
When such an exception is raised, the port’s position is past the invalid encoding.
This condition type could be defined by
(define-condition-type &i/o-encoding &i/o-port make-i/o-encoding-error i/o-encoding-error? (char i/o-encoding-error-char))
An exception with this type is raised when one of the operations for textual output to a port encounters a character that cannot be translated into bytes by the output direction of the port’s transcoder. char is the character that could not be encoded.
In the form (error-handling-mode error-handling-mode-symbol), error-handling-mode-symbol should be a symbol whose name is one of ignore, raise, and replace. The form evaluates to the corresponding symbol. If error-handling-mode-symbol is not one of these identifiers, the effect and result are implementation-dependent: the result may be an error-handling-mode symbol acceptable as a handling-mode argument to make-transcoder. If it is not acceptable as a handling-mode argument to make-transcoder, an exception is raised.
Note: Only the name of error-handling-mode-symbol is significant.
The error-handling mode of a transcoder specifies the behavior of textual I/O operations in the presence of encoding or decoding errors.
If a textual input operation encounters an invalid or incomplete character encoding, and the error-handling mode is ignore, an appropriate number of bytes of the invalid encoding are ignored and decoding continues with the following bytes. If the error-handling mode is replace, the replacement character U+FFFD is injected into the data stream, an appropriate number of bytes are ignored, and decoding continues with the following bytes. If the error-handling mode is raise, an exception with condition type &i/o-decoding is raised.
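The three decoding behaviors can be compared side by side. This is a minimal sketch: decode-utf8 is a hypothetical helper introduced only for illustration, and it assumes open-bytevector-input-port and get-string-all from the same library. The byte #xFF never occurs in well-formed UTF-8, so it triggers the error-handling mode in each case:

```scheme
(import (rnrs))

;; Hypothetical helper: decode BYTES as UTF-8 under handling mode MODE.
(define (decode-utf8 bytes mode)
  (get-string-all
   (open-bytevector-input-port
    bytes
    (make-transcoder (utf-8-codec) (eol-style none) mode))))

;; "a", an invalid byte, "b".
(define bad #vu8(97 #xFF 98))

;; ignore: the invalid byte is skipped.
(define ignored (decode-utf8 bad (error-handling-mode ignore)))

;; replace: U+FFFD stands in for the invalid byte.
(define replaced (decode-utf8 bad (error-handling-mode replace)))

;; raise: an &i/o-decoding condition is raised; here it is caught
;; and mapped to a symbol.
(define raised
  (guard (e ((i/o-decoding-error? e) 'decoding-error))
    (decode-utf8 bad (error-handling-mode raise))))
```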
If a textual output operation encounters a character it cannot encode, and the error-handling mode is ignore, the character is ignored and encoding continues with the next character. If the error-handling mode is replace, a codec-specific replacement character is emitted by the transcoder, and encoding continues with the next character. The replacement character is U+FFFD for transcoders whose codec is one of the Unicode encodings, but is the ? character for the Latin-1 encoding. If the error-handling mode is raise, an exception with condition type &i/o-encoding is raised.
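The encoding side can be sketched the same way. Here encode-latin1 is a hypothetical helper introduced only for illustration, and the example assumes call-with-bytevector-output-port and put-string from the same library. U+03BB (Greek small letter lambda) has no Latin-1 encoding, so it exercises each mode; the results noted in the comments follow from the rules above:

```scheme
(import (rnrs))

;; Hypothetical helper: encode STR as Latin-1 under handling mode MODE.
(define (encode-latin1 str mode)
  (call-with-bytevector-output-port
   (lambda (port) (put-string port str))
   (make-transcoder (latin-1-codec) (eol-style none) mode)))

;; "a", U+03BB, "b".
(define text "a\x3BB;b")

;; ignore: the unencodable character is dropped.
(define ignored (encode-latin1 text (error-handling-mode ignore)))

;; replace: the Latin-1 replacement character ? (byte 63) is emitted.
(define replaced (encode-latin1 text (error-handling-mode replace)))

;; raise: the &i/o-encoding condition carries the offending character.
(define offending
  (guard (e ((i/o-encoding-error? e) (i/o-encoding-error-char e)))
    (encode-latin1 text (error-handling-mode raise))))
```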
In a call to make-transcoder, codec must be a codec; eol-style, if present, an eol-style symbol; and handling-mode, if present, an error-handling-mode symbol.
eol-style may be omitted, in which case it defaults to the native end-of-line style of the underlying platform. handling-mode may be omitted, in which case it defaults to replace. The result is a transcoder with the behavior specified by its arguments.
The procedure native-transcoder returns an implementation-dependent transcoder that represents a possibly locale-dependent “native” transcoding.
The procedures transcoder-codec, transcoder-eol-style, and transcoder-error-handling-mode are accessors for transcoder objects; when applied to a transcoder returned by make-transcoder, they return the codec, eol-style, and handling-mode arguments, respectively.
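A brief sketch of constructing a transcoder and reading its components back. The last definition assumes that the accessor reports the default when handling-mode is omitted, which the text above implies but does not state outright:

```scheme
(import (rnrs))

(define tx
  (make-transcoder (utf-8-codec)
                   (eol-style lf)
                   (error-handling-mode raise)))

;; Each accessor returns the corresponding constructor argument.
(define codec-ok? (eqv? (transcoder-codec tx) (utf-8-codec)))
(define style-ok? (eq? (transcoder-eol-style tx) 'lf))
(define mode-ok?  (eq? (transcoder-error-handling-mode tx) 'raise))

;; With only a codec supplied, the handling mode should be replace.
(define default-mode
  (transcoder-error-handling-mode (make-transcoder (utf-8-codec))))
```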
(bytevector->string bytevector transcoder) returns the string that results from transcoding the bytevector according to the input direction of the transcoder.
(string->bytevector string transcoder) returns the bytevector that results from transcoding the string according to the output direction of the transcoder.
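A minimal round trip through both directions of one transcoder, using the procedures string->bytevector and bytevector->string. The eol-style none is chosen here so that line endings play no part:

```scheme
(import (rnrs))

(define tx (make-transcoder (utf-8-codec) (eol-style none)))

;; Output direction: U+03BB encodes to the two bytes #xCE #xBB in UTF-8.
(define bytes (string->bytevector "\x3BB;" tx))

;; Input direction: decoding those bytes gives the original string back.
(define round-trip (bytevector->string bytes tx))
```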