TextEncode

Description:	Converts text from one character set to another. By default, takes text in the local code page and converts it into UTF-8 encoded text where a suitable conversion exists.
Returns:	String
Usage:	Script Only.
Function Groups:	String and Buffer
Related to:
Format:	TextEncode(InputText[, ErrMsgOut, InputCharacterSet, OutputCharacterSet]);
Parameters:

InputText

Required text. The text to be converted.

ErrMsgOut

Optional return parameter. If provided, this will be set to a text description of any error that occurs during the conversion.

InputCharacterSet

Optional text. The IANA character set name of the input text. Defaults to the character set given by the local system code page.

OutputCharacterSet

Optional text. The IANA character set name to use for output. Defaults to "UTF-8".

Comments:

The return value will be the input string converted to the output character set (UTF-8 by default). If the input contains any byte sequence that cannot be represented in the output character set, the return value will be invalid, and ErrMsgOut, if provided, will describe the error.
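The success and failure behaviour described above can be modelled in Python. This is a hypothetical sketch, not the TextEncode implementation. The function name `text_encode` is invented for illustration. Python's `None` stands in for an invalid return value, and the returned message string plays the role of ErrMsgOut. The default input character set "windows-1252" is an assumed local code page.

```python
from typing import Optional, Tuple


def text_encode(data: bytes, input_cs: str = "windows-1252",
                output_cs: str = "UTF-8") -> Tuple[Optional[bytes], str]:
    """Hypothetical model of TextEncode (not the real implementation).

    Returns (result, err_msg). result is None wherever TextEncode would
    return an invalid value; err_msg plays the role of ErrMsgOut.
    """
    try:
        # Decode from the input character set, then re-encode strictly.
        return data.decode(input_cs).encode(output_cs), ""
    except (UnicodeError, LookupError) as exc:
        # Undecodable input, unencodable output, or an unknown charset name.
        return None, str(exc)


ok, err = text_encode(b"\x80")       # 0x80 is the euro sign in Windows-1252
print(ok, repr(err))                 # b'\xe2\x82\xac' ''
bad, bad_err = text_encode(b"\x81")  # 0x81 is unassigned in Windows-1252
print(bad, bad_err)                  # None, followed by the decoder's message
```

Python accepts IANA-style names such as "Windows-1252", "ISO-8859-1" and "UTF-16BE" as codec aliases, so the same names can be passed to this model.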

To convert from ISO-8859-1 to UTF-8, you must make two conversions: first from ISO-8859-1 to UTF-16, then from UTF-16 to UTF-8. Other conversions to UTF-8 may also require this intermediate UTF-16 step.
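The Python sketch below mirrors the two conversion steps. It produces reference bytes you can compare against the result of each TextEncode call. Python can convert directly in one step, so it is only a cross-check. The helper name `convert` is hypothetical. This page does not say which byte order a plain "UTF-16" conversion uses, so the sketch assumes UTF-16BE to keep the intermediate bytes deterministic.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: decode from src, re-encode to dst (strict)."""
    return data.decode(src).encode(dst)


latin1 = b"Caf\xe9"                                 # "Café" in ISO-8859-1
utf16 = convert(latin1, "ISO-8859-1", "UTF-16BE")   # step 1: ISO-8859-1 -> UTF-16
utf8 = convert(utf16, "UTF-16BE", "UTF-8")          # step 2: UTF-16 -> UTF-8

print(utf16.hex(" "))   # 00 43 00 61 00 66 00 e9
print(utf8.hex(" "))    # 43 61 66 c3 a9
```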

Examples:

<
{=========================== TextEncodeTest =============================}
{ Example code to demonstrate TextEncode().                                 }
{========================================================================}
TextEncodeTest
[
  Protected Input             { Input string in local code page          };
  Protected UTF8Result        { Input encoded as UTF-8                   };
]

Main [
  If Watch(1);
  [
    Input = "This is a string in Windows-1252 encoding with symbols: £©";

    UTF8Result = TextEncode(Input);
  ]
]

{ End of TextEncodeTest }
>

If the system code page is set to Windows-1252 (for example, by selecting "English (United States)" as described in Using a Non-English Character Set), UTF8Result will contain "£" encoded as the bytes [0xC2, 0xA3] and "©" as [0xC2, 0xA9].
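The byte values quoted above can be checked independently in Python. This is not a script example: it only shows the Windows-1252 bytes that go in and the UTF-8 bytes that should come out for "£©".

```python
text = "£©"
local = text.encode("windows-1252")                  # bytes as stored in the local code page
utf8 = local.decode("windows-1252").encode("UTF-8")  # the conversion TextEncode performs by default

print(local.hex(" ").upper())   # A3 A9
print(utf8.hex(" ").upper())    # C2 A3 C2 A9
```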

Example 2

(showing only the call to TextEncode)

{ Convert UTF-8 encoded text to UTF-16 big-endian }
utf16result = TextEncode(Input, ErrorMessage, "UTF-8", "UTF-16BE");
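For reference, the Python sketch below shows the bytes a UTF-8 to UTF-16BE conversion should produce for a short sample string. "A£" is an assumed input, not taken from the example. Python's "UTF-16BE" codec writes no byte-order mark. This page does not say whether TextEncode adds one, so check the first two bytes of utf16result if that matters.

```python
utf8_input = "A£".encode("UTF-8")                      # 41 C2 A3
utf16be = utf8_input.decode("UTF-8").encode("UTF-16BE")

print(utf16be.hex(" ").upper())   # 00 41 00 A3
```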