Provided by: python-jellyfish-doc_0.5.6-3build2_all

NAME

       jellyfish - jellyfish Documentation

OVERVIEW

       jellyfish is a library of functions for approximate and phonetic matching of strings.

       The library provides implementations of the following algorithms:

   Phonetic Encoding
       These algorithms convert a word to a normalized representation of its pronunciation.
       Each takes a single string and returns a coded representation.

   American Soundex
       soundex(s)
              Calculate the American Soundex of the string s.

       Soundex is an algorithm to convert a word (typically a name) to a four-character code in
       the form 'A123', where 'A' is the first letter of the name and the three digits represent
       similar sounds.

       For   example  soundex('Ann')  ==  soundex('Anne')  ==  'A500'  and  soundex('Rupert')  ==
       soundex('Robert') == 'R163'.

       See the Soundex article at Wikipedia for more details.
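
       The encoding rules can be illustrated with a short pure-Python sketch.  This is not
       jellyfish's implementation: the name soundex_sketch is hypothetical, and the handling
       of H and W follows the commonly described American Soundex rules, which is an
       assumption about how the library behaves on such inputs.

```python
def soundex_sketch(name):
    """Illustrative American Soundex; not the jellyfish implementation."""
    # Letters that sound alike share a digit; vowels, H, W and Y get no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for letter in group:
            codes[letter] = digit

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = codes.get(letters[0], "")    # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not break a run of equal codes
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit                    # a vowel resets prev, so the code may repeat
    return (result + "000")[:4]         # pad with zeros and truncate to four characters
```

       With this sketch, 'Ann' and 'Anne' both encode to 'A500', and 'Rupert' and 'Robert'
       both encode to 'R163', matching the examples above.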

   Metaphone
       metaphone(s)
              Calculate the metaphone code for the string s.

       The metaphone algorithm was designed as an improvement on Soundex.  It transforms a word
       into a string of characters drawn from '0BFHJKLMNPRSTWXY', where '0' represents a 'th'
       sound and 'X' represents a '[sc]h' sound.

       For example metaphone('Klumpz') == metaphone('Clumps') == 'KLMPS'.

       See the Metaphone article at Wikipedia for more details.

   NYSIIS
       nysiis(s)
              Calculate the NYSIIS code for the string s.

       NYSIIS is a phonetic algorithm developed by the New York State Identification and
       Intelligence System.  It transforms a word into a phonetic code.  Like soundex and
       metaphone, it is primarily intended for use on names (as they would be pronounced in
       English).

       For example nysiis('John') == nysiis('Jan') == 'JAN'.

       See the NYSIIS article at Wikipedia for more details.

   Match Rating Approach (codex)
       match_rating_codex(s)
              Calculate the match rating approach value (also called PNI) for the string s.

       The Match Rating Approach (MRA) determines whether or not two names are pronounced
       similarly.  It consists of an encoding function (similar to soundex or nysiis), which is
       implemented here, and match_rating_comparison(), which does the actual comparison.

       See the Match Rating Approach article at Wikipedia for more details.
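
       As a rough illustration of the encoding step, the sketch below follows the rules as
       they are usually described: drop vowels that are not the first letter, collapse
       doubled letters, and keep only the first three and last three characters of longer
       codes.  The name mra_codex_sketch is hypothetical, and the exact order of steps is
       an assumption, so results may differ from match_rating_codex() on some inputs.

```python
def mra_codex_sketch(name):
    """Illustrative Match Rating Approach encoding; not the jellyfish implementation."""
    name = name.upper()
    # Step 1: keep the first character, drop every later vowel.
    kept = [ch for i, ch in enumerate(name) if i == 0 or ch not in "AEIOU"]
    # Step 2: collapse runs of the same letter into a single letter.
    collapsed = []
    for ch in kept:
        if not collapsed or collapsed[-1] != ch:
            collapsed.append(ch)
    code = "".join(collapsed)
    # Step 3: codes longer than six characters keep their first and last three.
    if len(code) > 6:
        code = code[:3] + code[-3:]
    return code
```

       Under these assumptions 'Byrne' encodes to 'BYRN' and 'Catherine' to 'CTHRN'.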

   Stemming
   Porter Stemmer
       porter_stem(s)
              Reduce the string s to its stem using the common Porter stemmer.

       Stemming is the process of reducing a word to its root  form,  for  example  'stemmed'  to
       'stem'.

       Martin  Porter's  algorithm  is  a  common algorithm used for stemming that works for many
       purposes.

       See the official homepage for the Porter Stemming Algorithm for more details.

   String Comparison
       These functions measure how different (or, for Jaro and Jaro-Winkler, how similar) two
       strings are.

   Levenshtein Distance
       levenshtein_distance(s1, s2)
              Compute the Levenshtein distance between s1 and s2.

       Levenshtein distance represents the number of insertions, deletions, and substitutions
       required to change one word to another.

       For example: levenshtein_distance('berne', 'born') == 2 representing the transformation of
       the first e to o and the deletion of the second e.

       See the Levenshtein distance article at Wikipedia for more details.
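
       The definition above corresponds to a standard dynamic-programming recurrence.  The
       following is a minimal pure-Python sketch of that recurrence, not the library's C or
       Python implementation; the name levenshtein_sketch is hypothetical.

```python
def levenshtein_sketch(s1, s2):
    """Illustrative Levenshtein distance using two rolling rows."""
    # previous[j] holds the distance between the processed prefix of s1 and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # delete c1
                current[j - 1] + 1,      # insert c2
                previous[j - 1] + cost,  # substitute c1 with c2, or keep a match
            ))
        previous = current
    return previous[-1]
```

       For 'berne' and 'born' this sketch yields 2, in line with the example above.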

   Damerau-Levenshtein Distance
       damerau_levenshtein_distance(s1, s2)
              Compute the Damerau-Levenshtein distance between s1 and s2.

       A modification of Levenshtein distance, Damerau-Levenshtein distance counts transpositions
       of adjacent characters (such as ifsh for fish) as a single edit.

       For example levenshtein_distance('fish', 'ifsh') == 2, as it requires a deletion and an
       insertion, whereas damerau_levenshtein_distance('fish', 'ifsh') == 1, as the swap counts
       as a single transposition.

       See the Damerau-Levenshtein distance article at Wikipedia for more details.
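
       To show how a transposition becomes a single edit, the sketch below extends the
       Levenshtein recurrence with one extra case.  It implements the restricted variant
       often called optimal string alignment, which is an assumption made for brevity; it can
       give larger values than unrestricted Damerau-Levenshtein distance on some inputs and
       is not claimed to match the library's results.  The name osa_distance_sketch is
       hypothetical.

```python
def osa_distance_sketch(s1, s2):
    """Illustrative optimal string alignment (restricted Damerau-Levenshtein) distance."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of s1[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of s2[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Extra case: the last two characters of each prefix are swapped.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

       For 'fish' and 'ifsh' the sketch returns 1, because swapping the adjacent 'f' and 'i'
       is counted as one edit.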

   Hamming Distance
       hamming_distance(s1, s2)
              Compute the Hamming distance between s1 and s2.

       Hamming distance is the number of positions at which the characters of two strings
       differ.

       Typically Hamming distance is undefined when strings are of  different  length,  but  this
       implementation     considers    extra    characters    as    differing.     For    example
       hamming_distance('abc', 'abcd') == 1.

       See the Hamming distance article at Wikipedia for more details.
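
       The length-tolerant behavior described above can be sketched in a few lines.  This is
       an illustration of the stated rule (extra characters count as differing), not the
       library's code; the name hamming_sketch is hypothetical.

```python
def hamming_sketch(s1, s2):
    """Illustrative Hamming distance that tolerates strings of unequal length."""
    # Compare position by position over the shared length.
    differing = sum(1 for a, b in zip(s1, s2) if a != b)
    # Every character beyond the shorter string counts as one difference.
    return differing + abs(len(s1) - len(s2))
```

       Here hamming_sketch('abc', 'abcd') evaluates to 1: the three shared positions match
       and the trailing 'd' counts as one difference.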

   Jaro Distance
       jaro_distance(s1, s2)
              Compute the Jaro distance between s1 and s2.

       Jaro distance is a string-edit distance that gives a  floating  point  response  in  [0,1]
       where 0 represents two completely dissimilar strings and 1 represents identical strings.
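
       As an illustration of how such a score is derived, the sketch below follows the usual
       textbook formulation of Jaro: count characters that match within a sliding window,
       count how many of those matches are out of order, and average three ratios.  It is a
       sketch under that assumption, not the library's implementation, and the name
       jaro_sketch is hypothetical.

```python
def jaro_sketch(s1, s2):
    """Illustrative Jaro score in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters only match if their positions are at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3
```

       For the commonly cited pair 'MARTHA' and 'MARHTA', all six characters match and one
       transposition is found, giving (1 + 1 + 5/6) / 3, roughly 0.944.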

   Jaro-Winkler Distance
       jaro_winkler(s1, s2)
              Compute the Jaro-Winkler distance between s1 and s2.

       Jaro-Winkler is a modification of Jaro distance that favors strings sharing a common
       prefix.  Like Jaro, it gives a floating point response in [0,1] where 0 represents two
       completely dissimilar strings and 1 represents identical strings.

       See the Jaro-Winkler distance article at Wikipedia for more details.
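
       The Winkler modification is commonly described as a boost applied to an existing Jaro
       score when the two strings share a prefix.  The sketch below shows only that
       adjustment.  The function name, the scaling factor of 0.1, and the prefix cap of 4 are
       assumptions taken from the usual description of the method, not from the library;
       some implementations also apply the boost only above a score threshold, which this
       sketch omits.

```python
def winkler_adjust_sketch(jaro_score, s1, s2, scaling=0.1, max_prefix=4):
    """Illustrative Winkler prefix boost applied to a precomputed Jaro score."""
    # Length of the common prefix, capped at max_prefix characters.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    # Move the score toward 1.0 in proportion to the shared prefix length.
    return jaro_score + prefix * scaling * (1.0 - jaro_score)
```

       For 'MARTHA' and 'MARHTA', with a Jaro score of 17/18 and a shared prefix 'MAR', the
       adjusted score is 17/18 + 3 * 0.1 * (1/18), roughly 0.961.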

   Match Rating Approach (comparison)
       match_rating_comparison(s1, s2)
              Compare s1 and s2 using the match rating approach algorithm.  Returns True if the
              strings are considered equivalent or False if they are not.  Can also return None
              if s1 and s2 are not comparable (length differs by more than 3).

       The Match Rating Approach (MRA) determines whether or not two names are pronounced
       similarly.  Strings are first encoded using match_rating_codex(), then compared
       according to the MRA algorithm.

       See the Match Rating Approach article at Wikipedia for more details.

   Changelog
   0.5.6 - June 23 2016
       • bugfix for metaphone & soundex raising unexpected TypeErrors on Windows (#54)

   0.5.5 - June 21 2016
       • bugfix for metaphone WH case

   0.5.4 - May 13 2016
       • bugfix for C version of damerau_levenshtein thanks to Tyler Sellon

   0.5.3 - March 15 2016
       • style/packaging changes

   0.5.2 - February 3 2016
       • testing fixes for Python 3.5

       • bugfix for Metaphone w/ silent H thanks to Jeremy Carbaugh

   0.5.1 - July 12 2015
       • bugfixes for NYSIIS

       • bugfixes for metaphone

       • bugfix for C version of jaro_winkler

   0.5.0 - April 23 2015
       • consistent unicode behavior, all functions take unicode and reject bytes on Py2 and 3, C
         and Python

       • parametrize tests

       • Windows compiler support

   0.4.0 - March 27 2015
       • tons of new tests

       • documentation

       • split out cjellyfish

       • test all w/ unicode and plenty of fixes to accommodate

       • 100% test coverage

   0.3.4 - February 4 2015
       • fix segfaults and memory leaks via Danrich Parrol

   0.3.3 - November 20 2014
       • fix bugs in damerau and NYSIIS

   0.3.2 - August 11 2014
       • fix for jaro-winkler from David McKean

       • more packaging fixes

   0.3.1 - July 16 2014
       • packaging fix for C/Python alternative

   0.3.0 - July 15 2014
       • python alternatives where C isn't available

   0.2.2 - March 14 2014
       • testing fixes

       • assorted bugfixes in NYSIIS

   0.2.0 - January 26 2012
       • incorporate some speed changes from Peter Scott

       • segfault bugfixes.

   0.1.2 - September 16 2010
       • initial working release

IMPLEMENTATION

       Each algorithm has C and Python implementations.

       On a typical CPython install the C implementation will be used.  The Python  versions  are
       available for PyPy and systems where compiling the CPython extension is not possible.


AUTHOR

       James Turk

COPYRIGHT

       2017, James Turk