UTF-8 processing library
utf8proc is a library for processing UTF-8 encoded Unicode strings. Its features include Unicode normalization, stripping of default ignorable characters, case folding, and detection of grapheme cluster boundaries. A special character mapping is also available. For example, it maps the characters "Hyphen" (U+2010), "Minus" (U+2212) and "Hyphen-Minus" (U+002D, ASCII minus) all to the ASCII minus sign, so that they compare as equal.
The library can be used in C programs, and most of its functionality is also available as a Ruby library. For PostgreSQL there is an extension that provides a function for preparing strings for case-insensitive indices.
The currently supported Unicode version is 5.0.0.
Intended Audience: Software developers
This software is now maintained by the Public Software Group.
Downloads are available at the new project page:
Project page of "utf8proc"