Ubuntu Manpage: langident - identifies the language files are written in

Provided by: liblingua-identify-perl_0.56-2_all

NAME

       langident - identifies the language files are written in

SYNOPSIS

         langident [OPTIONS] file1 [file2 ...]

DESCRIPTION

Identifies the language each file is written in, using the Perl module Lingua::Identify.

OPTIONS

-a
Show all results (not just the most probable language).

-c
Show the confidence level for the most probable language (printed as the first value right
after that language).

-d
Debug (development only).

-E ENCODING
Select an input encoding. Defaults to UTF-8.

# use ISO-8859-1 (latin1)
langident -E ISO-8859-1 file

-e METHODS
Select the method(s) to use. There are three ways of doing this:

# using a single method
langident -e ngrams3 file

# using several methods (separate them with a comma)
langident -e prefixes3,suffixes3 file

# using several methods and assigning a different weight to each of them
langident -e smallwords=2,prefixes3=1,ngrams3=1.3 file

The available methods are the following: smallwords, prefixes1, prefixes2, prefixes3,
prefixes4, suffixes1, suffixes2, suffixes3, suffixes4, ngrams1, ngrams2, ngrams3 and
ngrams4.
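
A wrapper script may want to reject a misspelled method name before calling langident. The following is a minimal sketch, not part of langident itself: the helper name check_methods and the variable LANGIDENT_METHODS are hypothetical, and the check relies only on the method list given above.

```shell
#!/bin/sh
# Method names as listed in this manual page.
LANGIDENT_METHODS="smallwords prefixes1 prefixes2 prefixes3 prefixes4 suffixes1 suffixes2 suffixes3 suffixes4 ngrams1 ngrams2 ngrams3 ngrams4"

# check_methods SPEC  (hypothetical helper, not shipped with langident)
# Returns 0 if every entry of a comma-separated -e specification,
# with or without an "=WEIGHT" suffix, names a method from the list above.
check_methods() {
    rest=$1
    [ -n "$rest" ] || return 1
    while [ -n "$rest" ]; do
        item=${rest%%,*}              # first comma-separated entry
        case $rest in
            *,*) rest=${rest#*,} ;;   # keep the remaining entries
            *)   rest= ;;             # that was the last entry
        esac
        name=${item%%=*}              # drop an optional =WEIGHT suffix
        case $name in
            ''|*[!a-z0-9]*)
                echo "invalid method name: $name" >&2
                return 1 ;;
        esac
        case " $LANGIDENT_METHODS " in
            *" $name "*) ;;           # known method
            *)  echo "unknown method: $name" >&2
                return 1 ;;
        esac
    done
    return 0
}

# Possible use in a wrapper script:
#   check_methods "$spec" && langident -e "$spec" file
```

The weights themselves are not validated here; only the names are compared against the list.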

-h
Display help message and exit.

-l
List all available languages and exit.

-m NUMBER
Set the maximum number of results (languages) to display: the N most probable languages, in
descending order of probability.

Overrides the -a switch.

-o LANGUAGES
Only work with specified languages.

# identify between Portuguese and English only
langident -o pt,en *

-p
Also show percentages.

-s SIZE
Maximum size to examine.

-v
Show version and exit.

EXAMPLES

       Use methods ngrams2 and ngrams1, giving ngrams2 twice the weight of ngrams1 (-e
       switch); the output includes the three most probable languages (-m switch) with their
       percentages (-p switch) and the confidence level (-c switch) of the first result.

         $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README
         README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
         $
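
       When the output is consumed by another script, the file name and the most probable
       language can be pulled out of each line. The sketch below assumes every output line has
       the shape FILE:LANG followed by space-separated values, as in the sample above, and
       that file names contain no colon; both are inferred from that single sample rather than
       from a documented format. The function name top_language is hypothetical.

```shell
#!/bin/sh
# top_language  (hypothetical helper, not shipped with langident)
# Reads langident output on standard input and prints one
# "FILE<TAB>LANG" line per input line of the assumed form "FILE:LANG ...".
top_language() {
    while IFS= read -r line; do
        case $line in
            *:*) ;;                  # looks like a result line
            *)   continue ;;         # skip anything else
        esac
        file=${line%%:*}             # text before the first colon
        rest=${line#*:}              # text after the first colon
        lang=${rest%% *}             # first token: most probable language
        printf '%s\t%s\n' "$file" "$lang"
    done
}

# Possible use:
#   langident -c -p -m 3 README | top_language
```

       The numeric values are deliberately left untouched, since their order depends on which
       of the -c and -p switches are given.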

TO DO

       •     Add a switch to ignore HTML tags (and maybe other formats too)

AUTHOR

       Jose Alves de Castro, <cog@cpan.org>

COPYRIGHT AND LICENSE

       Copyright 2004 by Jose Alves de Castro

       This library is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.