Module utf8
Basic UTF-8 character counting support for Luakit.
This module provides a partial implementation of the Lua 5.3 UTF-8 library.
Functions
utf8.len (s, begin, end)
Return the number of characters (not bytes) of a UTF-8-encoded string.
If the optional parameters begin and/or end are given, only characters that start between byte positions begin and end (both inclusive) are counted.
An error is raised if s (or the characters that start in the slice from begin to end) contains invalid UTF-8 characters, or if begin or end point to byte indices not in s.
Parameters
- s
  Type: string
  The string whose length is to be returned.
- begin
  Type: integer
  Optional. Default: 1
  Only consider s from (1-based byte) index begin onwards. If negative, count backwards from the end of s (with -1 being the last byte).
- end
  Type: integer
  Optional. Default: -1
  Only consider s up to and including (1-based byte) index end. If negative, count backwards from the end of s (with -1 being the last byte).
Return Values
- integer
  The length (in UTF-8 characters) of s.
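The counting rule above can be illustrated with a minimal pure-Lua sketch. This is not Luakit's implementation: utf8_len_sketch, normalize_index and seq_length are hypothetical names, and first/last stand in for begin/end because end is a reserved word in Lua. The sketch assumes that an empty string has length 0 and that a begin index pointing into the middle of a character is an error. It only checks lead-byte ranges and continuation bytes, so overlong encodings and surrogates are not rejected.

```lua
-- Hypothetical sketch of the utf8.len counting rule; not Luakit's code.

-- Map a possibly negative 1-based byte index onto 1..#s (-1 = last byte).
local function normalize_index(s, i)
  if i < 0 then i = #s + i + 1 end
  if i < 1 or i > #s then
    error("byte index out of range: " .. tostring(i))
  end
  return i
end

-- Byte length of the UTF-8 sequence whose lead byte is b, or nil if b
-- cannot start a sequence (continuation byte or invalid lead byte).
local function seq_length(b)
  if b < 0x80 then return 1
  elseif b >= 0xC2 and b <= 0xDF then return 2
  elseif b >= 0xE0 and b <= 0xEF then return 3
  elseif b >= 0xF0 and b <= 0xF4 then return 4
  end
  return nil
end

-- Count characters that START at a byte position in [first, last].
local function utf8_len_sketch(s, first, last)
  if #s == 0 then return 0 end -- assumption: empty string has length 0
  first = normalize_index(s, first or 1)
  last = normalize_index(s, last or -1)
  local count, pos = 0, first
  while pos <= last do
    local n = seq_length(s:byte(pos))
    if not n or pos + n - 1 > #s then
      error("invalid UTF-8 at byte " .. pos)
    end
    -- Every byte after the lead byte must be a continuation byte (0x80..0xBF).
    for k = pos + 1, pos + n - 1 do
      local c = s:byte(k)
      if c < 0x80 or c > 0xBF then
        error("invalid UTF-8 at byte " .. k)
      end
    end
    count = count + 1
    pos = pos + n
  end
  return count
end
```

A character that starts inside the slice is counted even if its trailing bytes lie beyond end, which matches the "characters that start in the slice" wording above. For example, "h\195\164user" ("häuser") is 7 bytes long but 6 characters long, and only 2 characters start within bytes 1 to 2.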
utf8.offset (string, woffset, base)
Convert an offset (in UTF-8 characters) to a byte offset.
If the optional parameter base is given and positive, count characters starting from (byte) index base.
An error is raised if base is smaller than 1 or larger than the (byte) length of string, or if base points to a byte inside string that is not the starting byte of a UTF-8-encoded character.
Examples
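utf8.offset("abc",2,2) would return 3 (counting from byte 2, the second character is "c", at byte 3).
utf8.offset("abc",-3) would return 1 (the third character from the end is "a", at byte 1).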
Parameters
- string
  Type: string
  The string in which offsets should be converted.
- woffset
  Type: integer
  The offset (1-based, in UTF-8 characters) which should be converted.
- base
  Type: integer
  Optional.
  A (1-based byte) index in string. Defaults to 1 if woffset is positive, and to the (byte) length of string if woffset is negative. See the description above.
Return Values
- integer
  The (1-based) byte offset of the woffset-th UTF-8 character in string.
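A similar pure-Lua sketch for utf8.offset follows. It is hypothetical (utf8_offset_sketch and is_continuation are illustrative names, not Luakit functions) and is written only to reproduce the two examples above. Three behaviours are assumptions not stated in this documentation: woffset == 0 raises an error, an offset that runs past either end of the string returns nil, and the default base for a negative woffset is backed up from the last byte to the lead byte of the final character.

```lua
-- Hypothetical sketch of the utf8.offset conversion; not Luakit's code.

-- True if byte b is a UTF-8 continuation byte (0x80..0xBF).
local function is_continuation(b)
  return b >= 0x80 and b <= 0xBF
end

local function utf8_offset_sketch(s, woffset, base)
  if woffset == 0 then
    error("woffset must be non-zero") -- assumption: 0 is not supported
  end
  if base == nil then
    if woffset > 0 then
      base = 1
    else
      -- Default to the last byte, then back up to the lead byte of the
      -- final character (assumption, see the lead-in).
      base = #s
      while base > 1 and is_continuation(s:byte(base)) do
        base = base - 1
      end
    end
  end
  if base < 1 or base > #s then
    error("base out of range: " .. tostring(base))
  end
  if is_continuation(s:byte(base)) then
    error("base is not the first byte of a character")
  end
  local pos = base
  if woffset > 0 then
    -- Character 1 starts at base; step forward woffset - 1 characters.
    for _ = 2, woffset do
      pos = pos + 1
      while pos <= #s and is_continuation(s:byte(pos)) do
        pos = pos + 1
      end
      if pos > #s then return nil end -- assumption: no such character
    end
  else
    -- Character -1 starts at base; step backward -woffset - 1 characters.
    for _ = 2, -woffset do
      pos = pos - 1
      while pos >= 1 and is_continuation(s:byte(pos)) do
        pos = pos - 1
      end
      if pos < 1 then return nil end -- assumption: no such character
    end
  end
  return pos
end
```

With these conventions, woffset = 1 and woffset = -1 both refer to the character that starts at base. That is why utf8_offset_sketch("abc", -3) lands on byte 1 when base defaults to the length of the string.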
Attribution
Copyright 2017 Dennis Hofheinz