UTF-8 processing library
utf8proc is a library for processing UTF-8 encoded Unicode strings. Its features include Unicode normalization, stripping of default ignorable characters, case folding, and detection of grapheme cluster boundaries. A special character mapping is also available. For example, it converts the characters "Hyphen" (U+2010), "Minus" (U+2212) and "Hyphen-Minus" (U+002D, ASCII Minus) all into the ASCII minus sign, so that they compare as equal.
The library can be used in C programs, and most of its functionality is also available as a Ruby library. A PostgreSQL extension is available as well; it provides a function for preparing strings for use in case-insensitive indices.
The currently supported Unicode version is 5.0.0.
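The comparison-oriented mapping described above can be illustrated with a minimal sketch. It does not use utf8proc or its API; it relies only on Python's standard `unicodedata` module and `str.casefold()`, and it follows whatever Unicode version the Python interpreter ships with. The names `DASH_LIKE` and `comparison_key` are hypothetical, and the table covers only the characters named in the description.

```python
import unicodedata

# Hypothetical table for illustration: only the two non-ASCII characters
# named above. U+002D is already the ASCII minus sign and needs no entry.
DASH_LIKE = {
    0x2010: "-",  # HYPHEN
    0x2212: "-",  # MINUS SIGN
}


def comparison_key(text: str) -> str:
    """Return a key under which visually equivalent strings compare equal.

    Steps: map dash-like characters to ASCII '-', apply Unicode case
    folding, then normalize to NFC so composed and decomposed forms agree.
    """
    mapped = text.translate(DASH_LIKE)
    folded = mapped.casefold()
    return unicodedata.normalize("NFC", folded)


if __name__ == "__main__":
    a = "Well\u2010Known"  # written with U+2010 HYPHEN
    b = "well-known"       # written with U+002D HYPHEN-MINUS
    print(comparison_key(a) == comparison_key(b))
```

A key of this kind is what a case-insensitive index would store in place of the raw string, so that lookups match regardless of case or of which dash character was typed.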
Intended Audience
Software developers
Download
This software is now maintained by the Public Software Group. Downloads are available at the new project page:
Project page of "utf8proc"


