UTF-8 processing library

utf8proc is a library for processing UTF-8 encoded Unicode strings. Its features include Unicode normalization, stripping of default ignorable characters, case folding, and detection of grapheme cluster boundaries. A special character mapping is also available: for example, it converts the characters "Hyphen" (U+2010), "Minus" (U+2212) and "Hyphen-Minus" (U+002D, ASCII minus) all into the ASCII minus sign, so that they compare as equal.
The library can be used in C programs, and most of the functionality is also available as a Ruby library. For PostgreSQL there is an extension that provides a function for preparing strings for use in case-insensitive indices.
The currently supported Unicode version is 5.0.0.
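
To illustrate the idea behind the special character mapping described above, here is a minimal, self-contained sketch in C. It does not use utf8proc and is not how the library implements the mapping; the function name lump_dashes and its signature are hypothetical, chosen only for this example. It handles just the two non-ASCII characters named above and assumes the input is valid, NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of utf8proc: copies the UTF-8 string `in`
 * to `out`, replacing U+2010 (HYPHEN) and U+2212 (MINUS SIGN) with the
 * ASCII hyphen-minus (U+002D). Returns the output length in bytes
 * (excluding the terminator), or -1 if `out_size` is too small. */
long lump_dashes(const char *in, char *out, size_t out_size)
{
    const char hyphen[] = "\xE2\x80\x90"; /* U+2010 encoded as UTF-8 */
    const char minus[]  = "\xE2\x88\x92"; /* U+2212 encoded as UTF-8 */
    size_t n = 0;

    if (out_size == 0)
        return -1; /* no room even for the terminator */

    while (*in != '\0') {
        if (n + 1 >= out_size)
            return -1; /* no room for this byte plus the terminator */
        if (strncmp(in, hyphen, 3) == 0 || strncmp(in, minus, 3) == 0) {
            out[n++] = '-'; /* map both dash variants to ASCII minus */
            in += 3;        /* skip the three-byte sequence */
        } else {
            out[n++] = *in++; /* copy every other byte unchanged */
        }
    }
    out[n] = '\0';
    return (long)n;
}
```

After such a mapping, two strings that differ only in which of the three dash characters they use become byte-for-byte identical, so an ordinary strcmp treats them as equal. The real library applies a much broader mapping table and combines it with normalization and case folding.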

Intended Audience

Software developers

Download

This software is now maintained by the Public Software Group.
Downloads are available at the new project page:

Project page of "utf8proc"