Ubuntu Manpage: Text::Soundex - Implementation of the soundex algorithm.

name
synopsis
description
examples
limitations
copyright and license
maintainer
history

Provided by: libtext-soundex-perl_3.05-2build4_amd64

NAME

       Text::Soundex - Implementation of the soundex algorithm.

SYNOPSIS

         use Text::Soundex;

         # Original algorithm.
         $code = soundex($name);    # Get the soundex code for a name.
         @codes = soundex(@names);  # Get the list of codes for a list of names.

         # American Soundex variant (NARA) - Used for US census data.
         $code = soundex_nara($name);    # Get the soundex code for a name.
         @codes = soundex_nara(@names);  # Get the list of codes for a list of names.

         # Redefine the value that soundex() will return if the input string
         # contains no identifiable sounds within it.
         $Text::Soundex::nocode = 'Z000';

DESCRIPTION

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for
names with the same pronunciation to be encoded to the same representation so that they can be matched
despite minor differences in spelling. Soundex is the most widely known of all phonetic algorithms and is
often used (incorrectly) as a synonym for "phonetic algorithm". Improvements to Soundex are the basis for
many modern phonetic algorithms. (Wikipedia, 2007)

This module implements the original soundex algorithm developed by Robert Russell and Margaret Odell,
patented in 1918 and 1922, as well as a variation called "American Soundex" used for US census data, and
current maintained by the National Archives and Records Administration (NARA).

The soundex algorithm may be recognized from Donald Knuth's The Art of Computer Programming. The
algorithm described by Knuth is the NARA algorithm.

The value returned for strings which have no soundex encoding is defined using $Text::Soundex::nocode.
The default value is "undef", however values such as 'Z000' are commonly used alternatives.

For backward compatibility with older versions of this module the $Text::Soundex::nocode is exported into
the caller's namespace as $soundex_nocode.

In scalar context, soundex() returns the soundex code of its first argument. In list context, a list is
returned in which each element is the soundex code for the corresponding argument passed to soundex().
For example, the following code assigns @codes the value "('M200', 'S320')":

@codes = soundex qw(Mike Stok);

To use "Text::Soundex" to generate codes that can be used to search one of the publically available US
Censuses, a variant of the soundex algorithm must be used:

use Text::Soundex;
$code = soundex_nara($name);

An example of where these algorithm differ follows:

use Text::Soundex;
print soundex("Ashcraft"), "\n"; # prints: A226
print soundex_nara("Ashcraft"), "\n"; # prints: A261

EXAMPLES

       Donald Knuth's examples of names and the soundex codes they map to are listed below:

         Euler, Ellery -> E460
         Gauss, Ghosh -> G200
         Hilbert, Heilbronn -> H416
         Knuth, Kant -> K530
         Lloyd, Ladd -> L300
         Lukasiewicz, Lissajous -> L222

       so:

         $code = soundex 'Knuth';         # $code contains 'K530'
         @list = soundex qw(Lloyd Gauss); # @list contains 'L300', 'G200'

LIMITATIONS

       As the soundex algorithm was originally used a long time ago in the US it considers only the English
       alphabet and pronunciation. In particular, non-ASCII characters will be ignored. The recommended method
       of dealing with characters that have accents, or other unicode characters, is to use the Text::Unidecode
       module available from CPAN. Either use the module explicitly:

           use Text::Soundex;
           use Text::Unidecode;

           print soundex(unidecode("Fran\xE7ais")), "\n"; # Prints "F652\n"

       Or use the convenient wrapper routine:

           use Text::Soundex 'soundex_unicode';

           print soundex_unicode("Fran\xE7ais"), "\n";    # Prints "F652\n"

       Since the soundex algorithm maps a large space (strings of arbitrary length) onto a small space (single
       letter plus 3 digits) no inference can be made about the similarity of two strings which end up with the
       same soundex code.  For example, both "Hilbert" and "Heilbronn" end up with a soundex code of "H416".

COPYRIGHT AND LICENSE

       This software is copyright (c) 1998-2003 by Mark Mielke.

       This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5
       programming language system itself.

MAINTAINER

       This module is currently maintain by Mark Mielke ("mark@mielke.cc").

HISTORY

       Version 3 is a significant update to provide support for versions of Perl later than Perl 5.004.
       Specifically, the XS version of the soundex() subroutine understands strings that are encoded using UTF-8
       (unicode strings).

       Version 2 of this module was a re-write by Mark Mielke ("mark@mielke.cc") to improve the speed of the
       subroutines. The XS version of the soundex() subroutine was introduced in 2.00.

       Version 1 of this module was written by Mike Stok ("mike@stok.co.uk") and was included into the Perl core
       library set.

       Dave Carlsen ("dcarlsen@csranet.com") made the request for the NARA algorithm to be included. The NARA
       soundex page can be viewed at: "http://www.archives.gov/research/census/soundex.html"

       Ian Phillips ("ian@pipex.net") and Rich Pinder ("rpinder@hsc.usc.edu") supplied ideas and spotted
       mistakes for v1.x.