Module `utf8`

Basic UTF-8 character counting support for Luakit

This module provides a partial implementation of the Lua 5.3 UTF-8 library.

Functions

utf8.len (s, begin, end)

Return the number of characters (not bytes) of a UTF-8-encoded string.

If the optional parameters begin and/or end are given, only characters whose first byte lies between byte positions begin and end (both inclusive) are counted.

An error is raised if s (or the characters that start in the slice from begin to end) contains invalid UTF-8, or if begin or end points to a byte index outside s.

Parameters

s
Type: string

The string whose length is to be returned.
begin
Type: integer

Optional

Default: 1

Only consider s from (1-based byte) index begin onwards. If negative, count from end of s (with -1 being the last byte).
end
Type: integer

Optional

Default: -1

Only consider s up to and including (1-based byte) index end. If negative, count from end of s (with -1 being the last byte).

Return Values

integer
The length (in UTF-8 characters) of s, or of the part of s selected by begin and end.
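
The difference between byte length and character length can be illustrated with a short sketch. It assumes that a global utf8 table is available (as with this module, or with the standard library in Lua 5.3 and later), and it only uses valid UTF-8 input. The variable names are illustrative.

```lua
-- Assumption: `utf8` is a global table providing `len`
-- (Luakit's utf8 module, or the Lua 5.3+ standard library).
-- "h\195\169llo" is "héllo"; the two escaped bytes encode the single character 'é'.
local s = "h\195\169llo"

local byte_count  = #s               -- length in bytes
local char_count  = utf8.len(s)      -- length in UTF-8 characters
local from_byte_4 = utf8.len(s, 4)   -- characters that begin at byte 4 or later ("llo")
local last_two    = utf8.len(s, -2)  -- a negative begin counts from the end ("lo")

print(byte_count, char_count, from_byte_4, last_two)
```

For this string the byte length is 6 while utf8.len reports 5 characters, because 'é' occupies two bytes.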

utf8.offset (string, woffset, base)

Convert an offset (in UTF-8 characters) to a byte offset.

If the optional parameter base is given and positive, characters are counted starting from (byte) index base.

An error is raised if base is smaller than 1 or larger than the (byte) length of string, or if base points to a byte inside string that is not the first byte of a UTF-8-encoded character.

Examples

utf8.offset("abc", 2, 2) returns 3
utf8.offset("abc", -3) returns 1

Parameters

string
Type: string

The string in which offsets should be converted.
woffset
Type: integer

The offset (1-based, in UTF-8 characters) which should be converted.
base
Type: integer

Optional

A (1-based byte) index in string. Defaults to 1 if woffset is positive, and to the (byte) length of string if woffset is negative. See the description above.

Return Values

integer
The (1-based) byte offset of the woffset-th UTF-8 character in string.
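
A common use of the returned byte offset is slicing a string by character position with string.sub, which itself works on bytes. The sketch below assumes a global utf8 table (this module, or the Lua 5.3+ standard library) and valid UTF-8 input. The variable names are illustrative.

```lua
-- Assumption: `utf8` is a global table providing `offset`
-- (Luakit's utf8 module, or the Lua 5.3+ standard library).
-- "h\195\169llo" is "héllo"; 'é' occupies bytes 2 and 3.
local s = "h\195\169llo"

-- Byte index at which the 3rd character starts.
local third = utf8.offset(s, 3)
-- string.sub takes byte indices, so this yields everything from the 3rd character on.
local tail = s:sub(third)

-- A negative character offset counts from the end of the string.
local second_last = utf8.offset(s, -2)
local ending = s:sub(second_last)

print(third, tail, second_last, ending)
```

Here the third character starts at byte 4 rather than byte 3, since the preceding 'é' is two bytes long.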

Attribution

Copyright

2017 Dennis Hofheinz
