    use Text::Soundex;

    # Original algorithm.
    $code  = soundex($name);         # Get the soundex code for a name.
    @codes = soundex(@names);        # Get the list of codes for a list of names.

    # American Soundex variant (NARA) - Used for US census data.
    $code  = soundex_nara($name);    # Get the soundex code for a name.
    @codes = soundex_nara(@names);   # Get the list of codes for a list of names.

    # Redefine the value that soundex() will return if the input string
    # contains no identifiable sounds within it.
    $Text::Soundex::nocode = 'Z000';

This module implements the original soundex algorithm developed by Robert Russell and Margaret Odell, patented in 1918 and 1922, as well as a variation called "American Soundex" used for US census data and currently maintained by the National Archives and Records Administration (NARA).
The soundex algorithm may be recognized from Donald Knuth's The Art of Computer Programming. The algorithm described by Knuth is the NARA algorithm.
The value returned for strings that have no soundex encoding is defined by $Text::Soundex::nocode. The default value is undef; however, values such as 'Z000' are commonly used alternatives.
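As a minimal sketch of how the fallback value might be overridden for a limited scope (assuming Text::Soundex is installed; the input '1234' is only an illustrative string with no letters), "local" can confine the change to a single block:

```perl
use strict;
use warnings;
use Text::Soundex;

# '1234' is an illustrative input containing no letters, so it has
# no soundex encoding and the fallback value is returned instead.
my $default = soundex('1234');
print defined $default ? $default : 'undef', "\n";

{
    # Override the fallback inside this block only; the previous
    # value is restored automatically when the block exits.
    local $Text::Soundex::nocode = 'Z000';
    my $code = soundex('1234');
    print "$code\n";
}
```

Using "local" rather than a plain assignment keeps the override from leaking into other code that relies on the default.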
For backward compatibility with older versions of this module, $Text::Soundex::nocode is exported into the caller's namespace as $soundex_nocode.
In scalar context, "soundex()" returns the soundex code of its first argument. In list context, a list is returned in which each element is the soundex code for the corresponding argument passed to "soundex()". For example, the following code assigns @codes the value "('M200', 'S320')":
@codes = soundex qw(Mike Stok);
To use "Text::Soundex" to generate codes that can be used to search one of the publicly available US Censuses, a variant of the soundex algorithm must be used:

    use Text::Soundex;
    $code = soundex_nara($name);

An example of where the two algorithms differ follows:

    use Text::Soundex;
    print soundex("Ashcraft"), "\n";       # prints: A226
    print soundex_nara("Ashcraft"), "\n";  # prints: A261
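The two results for "Ashcraft" differ because of how H and W are handled. The following pure-Perl sketch illustrates the idea; the helper name sketch_soundex and its $hw_transparent flag are hypothetical and are not part of Text::Soundex, and the module's own implementation may differ. In the original algorithm H and W behave like vowels and so separate two letters that share a digit, while in the NARA variant they are ignored and the two letters are coded once:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not part of Text::Soundex).
# With a true $hw_transparent, H and W do not break a run of letters
# that share a digit (NARA-style rule); otherwise they act like vowels.
sub sketch_soundex {
    my ($name, $hw_transparent) = @_;

    (my $letters = uc $name) =~ tr/A-Z//cd;   # keep ASCII letters only
    return unless length $letters;            # no encodable letters

    # Map each letter to a digit: 0 = vowel or Y, 9 = H or W.
    (my $digits = $letters)
        =~ tr/AEIOUYHWBFPVCGJKQSXZDTLMNR/00000099111122222222334556/;

    my $code = substr($letters, 0, 1);        # first letter is kept as-is
    my $prev = substr($digits, 0, 1);         # its digit still suppresses repeats
    for my $d (split //, substr($digits, 1)) {
        if ($d eq '9') {
            next if $hw_transparent;          # NARA: skip, run continues
            $d = '0';                         # original: treat as a vowel
        }
        $code .= $d if $d ne '0' && $d ne $prev;
        $prev = $d;
    }
    return substr($code . '000', 0, 4);       # pad or truncate to 4 characters
}

print sketch_soundex('Ashcraft', 0), "\n";    # original-style rule
print sketch_soundex('Ashcraft', 1), "\n";    # NARA-style rule
```

In "Ashcraft" the S and C both map to 2 and are separated only by H, so the choice of rule decides whether the 2 is emitted once or twice.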
    Euler, Ellery           -> E460
    Gauss, Ghosh            -> G200
    Hilbert, Heilbronn      -> H416
    Knuth, Kant             -> K530
    Lloyd, Ladd             -> L300
    Lukasiewicz, Lissajous  -> L222

    $code = soundex 'Knuth';         # $code contains 'K530'
    @list = soundex qw(Lloyd Gauss); # @list contains 'L300', 'G200'

    use Text::Soundex;
    use Text::Unidecode;

    print soundex(unidecode("Fran\xE7ais")), "\n"; # Prints "F652\n"
Or use the convenient wrapper routine:
    use Text::Soundex 'soundex_unicode';

    print soundex_unicode("Fran\xE7ais"), "\n";    # Prints "F652\n"
Since the soundex algorithm maps a large space (strings of arbitrary length) onto a small space (single letter plus 3 digits) no inference can be made about the similarity of two strings which end up with the same soundex code. For example, both "Hilbert" and "Heilbronn" end up with a soundex code of "H416".
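One way to work within this limitation is to treat a shared code only as a first-pass filter. The sketch below (assuming Text::Soundex is installed; the name list and the %names_by_code variable are illustrative) groups names by code so that each group can then be checked with a stricter comparison:

```perl
use strict;
use warnings;
use Text::Soundex;

# Illustrative input; any list of names could be used here.
my @names = qw(Hilbert Heilbronn Knuth Kant Gauss);

# Bucket the names by soundex code. A shared bucket marks names as
# candidates for a match; it does not show that they are similar.
my %names_by_code;
for my $name (@names) {
    my $code = soundex($name);
    next unless defined $code;        # skip names with no encoding
    push @{ $names_by_code{$code} }, $name;
}

for my $code (sort keys %names_by_code) {
    print "$code: @{ $names_by_code{$code} }\n";
}
```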
Version 2 of this module was a re-write by Mark Mielke ("[email protected]") to improve the speed of the subroutines. The XS version of the soundex() subroutine was introduced in 2.00.
Version 1 of this module was written by Mike Stok ("[email protected]") and was included in the Perl core library set.
Dave Carlsen ("[email protected]") made the request for the NARA algorithm to be included. The NARA soundex page can be viewed at: "http://www.nara.gov/genealogy/soundex/soundex.html"
Ian Phillips ("[email protected]") and Rich Pinder ("[email protected]") supplied ideas and spotted mistakes for v1.x.