Ubuntu Manpage: estcmd - command line interface of the core API

Provided by: hyperestraier_1.4.13-12ubuntu1_amd64

NAME

       estcmd - command line interface of the core API

SYNOPSIS

       estcmd create [-tr] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-attr name type] db

       estcmd put [-tr] [-cl] [-ws] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] db [file]

       estcmd out [-cl] [-pc enc] db expr

       estcmd edit [-pc enc] db expr name [value]

       estcmd get [-nl|-nb] [-pidx path] [-pc enc] db expr [attr]

       estcmd list [-nl|-nb] [-lp] db

       estcmd uriid [-nl|-nb] [-pidx path] [-pc enc] db expr

       estcmd meta db [name [value]]

       estcmd inform [-nl|-nb] db

       estcmd optimize [-onp] [-ond] db

       estcmd merge [-cl] db target

       estcmd repair [-rst|-rsh] db

       estcmd  search  [-nl|-nb]  [-pidx  path] [-ic enc] [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-sn wnum
       hnum anum] [-kn num] [-um] [-ec rn] [-gs|-gf|-ga] [-cd] [-ni]  [-sf|-sfr|-sfu|-sfi]  [-hs]
       [-attr expr] [-ord expr] [-max num] [-sk num] [-aux num] [-dis name] [-sim id] db [phrase]

       estcmd  gather  [-tr]  [-cl] [-ws] [-no] [-fe|-ft|-fh|-fm] [-fx sufs cmd] [-fz] [-fo] [-rm
       sufs] [-ic enc] [-il lang] [-bc] [-lt num] [-lf num] [-pc enc] [-px name] [-aa name value]
       [-apn|-acc]  [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-ss name] [-sd] [-cm] [-cs num] [-ncm]
       [-kn num] [-um] db [file|dir]

       estcmd purge [-cl] [-no] [-fc] [-pc enc] [-attr expr] db [prefix]

       estcmd extkeys [-no] [-fc] [-dfdb file] [-ncm] [-ni]  [-kn  num]  [-um]  [-attr  expr]  db
       [prefix]

       estcmd words [-nl|-nb] [-dfdb file] [-kw|-kt] db

       estcmd draft [-ft|-fh|-fm] [-ic enc] [-il lang] [-bc] [-lt num] [-kn num] [-um] [file]

       estcmd break [-ic enc] [-il lang] [-apn|-acc] [-wt] [file]

       estcmd iconv [-ic enc] [-il lang] [-oc enc] [file]

       estcmd regex [-inv] [-repl str] expr [file]

       estcmd scandir [-tf|-td] [-pa|-pu] [dir]

       estcmd  multi  [-db db] [-nl|-nb] [-ic enc] [-gs|-gf|-ga] [-cd] [-ni] [-sf|-sfr|-sfu|-sfi]
       [-hs] [-hu] [-attr expr] [-ord expr] [-max num] [-sk num] [-aux num] [-dis name] [phrase]

       estcmd randput [-ren|-rla|-reu|-ror|-rjp|-rch] [-cs num] db dnum

       estcmd wicked db dnum

       estcmd regression db

       estcmd version

DESCRIPTION

       estcmd is an aggregation of sub commands.  The name of a sub command is specified  by  the
       first  argument.   Other arguments are parsed according to each sub command.  The argument
       db specifies the path of an index.

       estcmd create [-tr] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-attr name type] db
              Create an index.
              If -tr is specified, a new index is created regardless of whether one exists.
              If -apn is specified, N-gram analysis is performed against European text also.
              If -acc is specified, character category analysis is performed  instead  of  N-gram
              analysis.
              If -xs is specified, the index is tuned to register less than 50000 documents.
              If -xl is specified, the index is tuned to register more than 300000 documents.
              If -xh is specified, the index is tuned to register more than 1000000 documents.
              If -xh2 is specified, the index is tuned to register more than 5000000 documents.
              If -xh3 is specified, the index is tuned to register more than 10000000 documents.
              If -sv is specified, scores are stored as void.
              If -si is specified, scores are stored as 32-bit integer.
              If -sa is specified, scores are stored as-is and marked not to be tuned when
              searching.
              -attr specifies an attribute index and its data type.  This option can be specified
              multiple times.
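
              As an illustration of the options above, the following sketch assembles a
              create command for a small index with two attribute indexes.  The index path
              and the attribute names "@title" and "@size" are assumptions chosen for the
              example; the command is only executed if estcmd is installed.

```shell
# Hypothetical index location; any writable path works.
db="${TMPDIR:-/tmp}/casket-example.$$"

# One -attr pair per attribute index: a name, then "seq", "str", or "num".
# "@title" and "@size" are assumed attribute names, not taken from this page.
set -- create -tr -xs -attr @title str -attr @size num "$db"
echo "estcmd $*"

# Only run the command where estcmd is actually installed.
if command -v estcmd >/dev/null 2>&1; then
  estcmd "$@"
fi
```

              Here -xs is chosen because the example index is expected to stay small.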

       estcmd put [-tr] [-cl] [-ws] [-apn|-acc] [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] db [file]
              Register a document in document draft format into an index.
              file specifies a target file.  If it is omitted, the standard input is read.
              If -tr is specified, a new index is created regardless of whether one exists.
              If -cl is specified, regions of an overwritten document are cleaned up.
              If -ws is specified, scores are weighted statically with score weighting attribute.
              If -apn is specified, N-gram analysis is performed against European text also.
              If  -acc  is  specified, character category analysis is performed instead of N-gram
              analysis.
              If -xs is specified, the index is tuned to register less than 50000 documents.
              If -xl is specified, the index is tuned to register more than 300000 documents.
              If -xh is specified, the index is tuned to register more than 1000000 documents.
              If -xh2 is specified, the index is tuned to register more than 5000000 documents.
              If -xh3 is specified, the index is tuned to register more than 10000000 documents.
              If -sv is specified, scores are stored as void.
              If -si is specified, scores are stored as 32-bit integer.
              If -sa is specified, scores are stored as-is and marked not to be tuned when
              searching.
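
              A minimal sketch of registering one document follows.  This page does not
              describe the document draft layout; the example assumes "@name=value"
              attribute lines, a blank line, and then the body text.  The file names, the
              URI, and the index path are hypothetical.

```shell
workdir=$(mktemp -d)
draft="$workdir/hello.est"

# Assumed draft layout: attribute lines, an empty line, then the text.
cat > "$draft" <<'EOF'
@uri=http://example.com/hello.html
@title=Hello

Hello, this is a sample document.
EOF

# Register the draft; -cl cleans up regions if the document already existed.
if command -v estcmd >/dev/null 2>&1; then
  estcmd put -cl "$workdir/casket" "$draft"
fi
```

              Because file is optional, the same draft could also be piped to the command
              on the standard input.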

       estcmd out [-pc enc] [-cl] db expr
              Remove information of a document from an index.
              expr specifies the ID number, the URI, or the local path of a document.
              If -cl is specified, regions of the document are cleaned up.
              -pc specifies the encoding of file paths.  By default, it is ISO-8859-1.

       estcmd edit [-pc enc] db expr name [value]
              Edit an attribute of a document in an index.
              expr specifies the ID number, the URI, or the local path of a document.
              name specifies the name of an attribute.
              value  specifies  the  value  of the attribute.  If it is omitted, the attribute is
              removed.
              -pc specifies the encoding of the file path and the attribute value.   By  default,
              it is ISO-8859-1.

       estcmd get [-nl|-nb] [-pidx path] [-pc enc] db expr [attr]
              Output document draft of a document in an index.
              expr specifies the ID number, the URI, or the local path of a document.
              If attr is specified, only the value of the attribute is output.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              -pidx  specifies the path of a pseudo index.  This option can be specified multiple
              times.
              -pc specifies the encoding of file paths.  By default, it is ISO-8859-1.

       estcmd list [-nl|-nb] [-lp] db
              Output a list of all documents in an index.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              If -lp is specified, the local path equivalent to each "file://" URL is output.

       estcmd uriid [-nl|-nb] [-pidx path] [-pc enc] db expr
              Output the ID number of a document specified by URI.
              expr specifies the URI or the local path of a document.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              -pidx specifies the path of a pseudo index.  This option can be specified  multiple
              times.
              -pc specifies the encoding of file paths.  By default, it is ISO-8859-1.

       estcmd meta db [name [value]]
              Handle meta data.
              name  specifies  the name of a piece of meta data.  If it is omitted, a list of all
              names is output.
              value specifies the value of the meta data to be recorded.  If it is  omitted,  the
              current value is output.  If it is an empty string, the meta data is removed.

       estcmd inform [-nl|-nb] db
              Output the number of documents and the number of unique words in an index.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.

       estcmd optimize [-onp] [-ond] db
              Optimize an index and clean up dispensable regions.
              If -onp is specified, cleaning up dispensable regions is omitted.
              If -ond is specified, optimizing the database files is omitted.

       estcmd merge [-cl] db target
              Merge another index.
              target specifies the path of another index.
              If -cl is specified, regions of overwritten documents are cleaned up.

       estcmd repair [-rst|-rsh] db
              Repair a broken index.
              If -rst is specified, strict consistency check is performed.
              If -rsh is specified, consistency check is omitted.

       estcmd  search  [-nl|-nb]  [-pidx  path] [-ic enc] [-vu|-va|-vf|-vs|-vh|-vx|-dd] [-sn wnum
       hnum anum] [-kn num] [-um] [-ec rn] [-gs|-gf|-ga] [-cd] [-ni]  [-sf|-sfr|-sfu|-sfi]  [-hs]
       [-attr expr] [-ord expr] [-max num] [-sk num] [-aux num] [-dis name] [-sim id] db [phrase]
              Search an index for documents.
              phrase specifies the search phrase.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              -pidx  specifies the path of a pseudo index.  This option can be specified multiple
              times.
              -ic specifies the input encoding.  By default, it is UTF-8.
              If -vu is specified, ID numbers and URIs are output as TSV.
              If -va is specified, multipart format including attributes is output.
              If -vf is specified, multipart format including document draft is output.
              If -vs is specified, multipart format including attributes and snippets is output.
              If -vh is specified, human readable format including  attributes  and  snippets  is
              output.
              If -vx is specified, XML including attributes and snippets is output.
              If -dd is specified, document draft data are dumped and saved into separated files.
              -sn specifies three widths: the whole width of a snippet, the width of the string
              picked up from the beginning of the text, and the width of the strings picked up
              around each highlighted word.
              -kn  specifies  the  number  of  keywords  to  be  extracted.   By default, keyword
              extraction is not performed.
              If -um is specified, morphological analyzers are used for keyword extraction.
              -ec specifies the lower limit of similarity eclipse.
              If -gs is specified, every key of N-gram is checked.  By default, every second
              key is checked.
              If -gf is specified, every third key of N-gram is checked.
              If -ga is specified, every fourth key of N-gram is checked.
              If -cd is specified, each document is checked to confirm that it definitely
              matches the search phrase.
              If -ni is specified, TF-IDF tuning is omitted.
              If -sf is specified, the phrase is treated as a simplified form.
              If -sfr is specified, the phrase is treated as a rough form.
              If -sfu is specified, the phrase is treated as a union form.
              If -sfi is specified, the phrase is treated as an intersection form.
              If -hs is specified, score information is output as an attribute.
              -attr specifies an attribute  search  condition.   This  option  can  be  specified
              multiple times.
              -ord specifies the order expression.  By default, it is descending by score.
              -max  specifies  the  maximum number of shown documents.  Negative means unlimited.
              By default, it is 10.
              -sk specifies the number of documents to be skipped.  By default, it is 0.
              -aux specifies permission to adopt result of the auxiliary index.   If  it  is  not
              more than 0, the auxiliary index is not used.  By default, it is 32.
              -dis specifies the name of the distinct attribute.
              -sim specifies the ID number of the seed document for similarity search.
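
              The -vu output is convenient for scripting.  The sketch below extracts the URI
              column from it.  It assumes that result lines have the form of a numeric ID, a
              tab, and a URI, and skips every other line; the function name and the sample
              data are made up for the example.

```shell
# uris_from_vu: read "estcmd search -vu" output on stdin, print only the URIs.
# Assumption: result lines look like "<numeric id><TAB><uri>"; others are skipped.
uris_from_vu() {
  awk -F '\t' '$1 ~ /^[0-9]+$/ && NF >= 2 { print $2 }'
}

# Stand-in data so the function can be tried without an index.
sample=$(printf '1\tfile:///tmp/a.txt\n2\tfile:///tmp/b.txt\nnot a result line\n' |
  uris_from_vu)
echo "$sample"

# Real use (hypothetical index "casket"):
#   estcmd search -vu -max 20 -sk 0 casket "search phrase" | uris_from_vu
```

              Raising -sk by the value of -max on each call would page through a longer
              result set.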

       estcmd  gather  [-tr]  [-cl] [-ws] [-no] [-fe|-ft|-fh|-fm] [-fx sufs cmd] [-fz] [-fo] [-rm
       sufs] [-ic enc] [-il lang] [-bc] [-lt num] [-lf num] [-pc enc] [-px name] [-aa name value]
       [-apn|-acc]  [-xs|-xl|-xh|-xh2|-xh3] [-sv|-si|-sa] [-ss name] [-sd] [-cm] [-cs num] [-ncm]
       [-kn num] [-um] db [file|dir]
              Scan the local file system and register documents into an index.
              If the third argument is the name of a file, a list of paths of target documents
              is read from it.  If it is "-", the list is read from the standard input.
              If the third argument is the name of a directory, all files under the directory
              are treated as target documents.
              If -tr is specified, a new index is created regardless of whether one exists.
              If -cl is specified, regions of overwritten documents are cleaned up.
              If -ws is specified, scores are weighted statically with score weighting attribute.
              If -no is specified, operations are printed but not actually executed.
              If -fe is specified, target files are treated as document draft.  By  default,  the
              format is detected by the suffix of each document.
              If -ft is specified, target files are treated as plain text.
              If -fh is specified, target files are treated as HTML.
              If -fm is specified, target files are treated as MIME.
              If -fx is specified, target files with the specified suffixes are processed by the
              specified outer command.  "*" matches any file.  If the command is prefixed with
              "T@", the output of the command is treated as plain text.  If the command is
              prefixed with "H@", the output of the command is treated as HTML.  If the command
              is prefixed with "M@", the output of the command is treated as MIME.  Otherwise,
              the output is treated as document draft.  This option can be specified multiple
              times.
              If -fz is specified, documents which do not match the condition of -fx are ignored.
              If -fo is specified, target files are not read.  This is useful for efficient
              processing by the outer command.
              If -rm is specified, target files with the specified  suffixes  are  removed.   "*"
              matches any file.  This option can be specified multiple times.
              -ic specifies the input encoding.  By default, it is detected automatically.
              -il specifies the preferred input language.  By default, English is preferred.
              If -bc is specified, binary files are detected and ignored.
              -lt specifies the text size limit in kilobytes.  By default, it is 128KB.  If it
              is negative, the size is unlimited.
              -lf specifies the file size limit in megabytes.  By default, it is 32MB.  If it
              is negative, the size is unlimited.
              -pc specifies the encoding of file paths.  By default, it is ISO-8859-1.
              -px specifies the name of an attribute read from the list of paths.  The list of
              paths can be in TSV format: the first field is treated as the path of a target
              document, and the second and following fields are treated as attribute values.
              Each -px names the attribute for the next such field, in order.  This option can
              be specified multiple times.
              -aa  specifies  the name and the value of an additional attribute.  This option can
              be specified multiple times.
              If -apn is specified, N-gram analysis is performed against European text also.
              If -acc is specified, character category analysis is performed  instead  of  N-gram
              analysis.
              If -xs is specified, the index is tuned to register less than 50000 documents.
              If -xl is specified, the index is tuned to register more than 300000 documents.
              If -xh is specified, the index is tuned to register more than 1000000 documents.
              If -xh2 is specified, the index is tuned to register more than 5000000 documents.
              If -xh3 is specified, the index is tuned to register more than 10000000 documents.
              If -sv is specified, scores are stored as void.
              If -si is specified, scores are stored as 32-bit integer.
              If -sa is specified, scores are stored as-is and marked not to be tuned when
              searching.
              -ss specifies the name of an attribute for substitute score.
              If -sd is specified,  the  modification  date  of  each  file  is  recorded  as  an
              attribute.
              If -cm is specified, documents whose modification date has not changed are ignored.
              -cs specifies the size of cache memory in megabytes.  By default, it is 64MB.
              If -ncm is specified, checking availability of the virtual memory is omitted.
              -kn  specifies  the  number  of  keywords  to  be  extracted.   By default, keyword
              extraction is not performed.
              If -um is specified, morphological analyzers are used for keyword extraction.
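
              The TSV path list and -px can be combined as in the following sketch.  The
              directory layout, the attribute names "@title" and "@author", and the index
              path are assumptions for the example; gather is only run if estcmd is
              installed.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/docs"
printf 'first document\n'  > "$workdir/docs/a.txt"
printf 'second document\n' > "$workdir/docs/b.txt"

# Path list: field 1 is the path; fields 2 and 3 are attribute values
# that the two -px options below name, in order.
list="$workdir/list.tsv"
printf '%s\t%s\t%s\n' "$workdir/docs/a.txt" "Alpha" "alice" >  "$list"
printf '%s\t%s\t%s\n' "$workdir/docs/b.txt" "Beta"  "bob"   >> "$list"

# -ft treats the targets as plain text; -sd records each modification date.
if command -v estcmd >/dev/null 2>&1; then
  estcmd gather -cl -ft -sd -px @title -px @author "$workdir/casket" "$list"
fi
```

              Adding -no to the gather command would print the operations without
              changing the index, which is a cheap way to check the list first.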

       estcmd purge [-cl] [-no] [-fc] [-pc enc] [-attr expr] db [prefix]
              Purge information of documents which do not exist on the file system.
              If prefix is specified, only documents whose URIs begin with it are targeted.  It
              can be specified by the local path of a directory.
              If -cl is specified, regions of the deleted documents are cleaned up.
              If -no is specified, operations are printed but not actually executed.
              If -fc is specified, information of all target documents is deleted.
              -pc specifies the encoding of file paths.  By default, it is ISO-8859-1.
              -attr  specifies  an  attribute  search  condition.   This  option can be specified
              multiple times.

       estcmd extkeys [-no] [-fc] [-dfdb file] [-ncm] [-ni]  [-kn  num]  [-um]  [-attr  expr]  db
       [prefix]
              Create a database of keywords extracted from documents.
              If prefix is specified, only documents whose URIs begin with it are targeted.
              If -no is specified, operations are printed but not actually executed.
              If -fc is specified, all target documents are processed whether or not they have
              existing records.
              -dfdb specifies an outer database of document frequency.  By default, document
              frequency is calculated dynamically according to the index.
              If -ncm is specified, checking availability of the virtual memory is omitted.
              If -ni is specified, TF-IDF tuning is omitted.
              -kn specifies the number of keywords to be extracted.  By default, it is 32.
              If -um is specified, morphological analyzers are used for keyword extraction.
              -attr  specifies  an  attribute  search  condition.   This  option can be specified
              multiple times.

       estcmd words [-nl|-nb] [-dfdb file] [-kw|-kt] db
              Output a list of all unique words and the record size of each, which is treated as
              document frequency.
              If -nl is specified, the index is opened without file locking.
              If -nb is specified, file locking is performed without blocking.
              -dfdb  specifies  an  outer  database  where the result is stored.  By default, the
              result is output to the standard output as TSV.   If  the  outer  database  already
              exists, the value of each record is incremented.
              If -kw is specified, keywords and numbers of corresponding documents are output.
              If -kt is specified, keywords and their related terms are output.

       estcmd draft [-ft|-fh|-fm] [-ic enc] [-il lang] [-bc] [-lt num] [-kn num] [-um] [file]
              For test and debug.

       estcmd break [-ic enc] [-il lang] [-apn|-acc] [-wt] [file]
              For test and debug.

       estcmd iconv [-ic enc] [-il lang] [-oc enc] [file]
              For test and debug.

       estcmd regex [-inv] [-repl str] expr [file]
              For test and debug.

       estcmd scandir [-tf|-td] [-pa|-pu] [dir]
              For test and debug.

       estcmd  multi  [-db db] [-nl|-nb] [-ic enc] [-gs|-gf|-ga] [-cd] [-ni] [-sf|-sfr|-sfu|-sfi]
       [-hs] [-hu] [-attr expr] [-ord expr] [-max num] [-sk num] [-aux num] [-dis name] [phrase]
              For test and debug.

       estcmd randput [-ren|-rla|-reu|-ror|-rjp|-rch] [-cs num] db dnum
              For test and debug.

       estcmd wicked db dnum
              For test and debug.

       estcmd regression db
              For test and debug.

       estcmd version
              Show the version information.

       All sub commands return 0 if the operation succeeds; otherwise they return 1.  The put,
       out, gather, purge, randput, wicked, and regression sub commands close the database
       and exit when they catch signal 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), 13 (SIGPIPE), or
       15 (SIGTERM).
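
       Because the exit status is either 0 or 1, a script can branch on it directly.  The
       helper below is a small sketch; the name run_step is invented for the example and is
       not part of estcmd.

```shell
# run_step LABEL CMD...: run a command and report based on its exit status.
run_step() {
  label=$1
  shift
  if "$@"; then
    echo "$label: ok"
  else
    echo "$label: failed" >&2
    return 1
  fi
}

# Typical use (hypothetical index "casket"), stopping at the first failure:
#   run_step gather   estcmd gather -cl -sd -cm casket /path/to/docs &&
#   run_step optimize estcmd optimize casket
```

       Chaining the steps with && keeps a later step such as optimize from running after
       an earlier one has failed.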

       The data type of attribute indexes specified by -attr option of create sub command  should
       be "seq" for sequential type, "str" for string type, or "num" for number type.

       Each pseudo index specified by -pidx option of search sub command and so on is a directory
       containing files of document draft.  If you search a main index with pseudo indexes,  meta
       search of the main index and pseudo indexes is performed.

       The encoding name specified by -ic option should be a name registered with the IETF, such
       as UTF-8 or ISO-8859-1.  The language name specified by -il option should be one of "en"
       (English), "ja" (Japanese), "zh" (Chinese), or "ko" (Korean).

       The  outer  command  specified  by  -fx  option  of gather receives the path of the target
       document by the first argument and the path  for  output  by  the  second  argument.   The
       original  path  of  the  target document is given as the value of the environment variable
       `ESTORIGFILE'.
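
       The calling convention above can be sketched with a small filter script.  The script
       name, the ".log" suffix, and the paths are hypothetical; the filter copies the input
       to the output path and prepends the original path as one line of text.

```shell
workdir=$(mktemp -d)
filter="$workdir/log-filter.sh"

cat > "$filter" <<'EOF'
#!/bin/sh
# $1: path of the target document, $2: path where the result must be written.
# ESTORIGFILE holds the original path of the target document.
{
  printf 'source: %s\n' "${ESTORIGFILE:-$1}"
  cat "$1"
} > "$2"
EOF
chmod +x "$filter"

# Try the filter by hand with the same two arguments and environment variable.
printf 'log line\n' > "$workdir/in.log"
ESTORIGFILE="$workdir/in.log" sh "$filter" "$workdir/in.log" "$workdir/out.txt"
cat "$workdir/out.txt"

# With gather (hypothetical index "casket"; "T@" marks the output as plain text):
#   estcmd gather -fx .log "T@$filter" -fz casket /path/to/logs
```

       Testing a filter by hand like this, before handing it to gather, makes it easier to
       see whether a problem lies in the filter or in the indexing options.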

       Note that similarity search is very slow, by  default.   To  improve  the  performance  of
       similarity search, running "estcmd extkeys" beforehand is strongly recommended.
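
       The recommended order can be sketched as follows.  The index path and the seed
       document ID are hypothetical, and nothing is executed unless estcmd and the index
       are both present.

```shell
db="casket"   # hypothetical index path
seed_id=1     # hypothetical ID number of the seed document

if command -v estcmd >/dev/null 2>&1 && [ -e "$db" ]; then
  # Build the keyword database once, then run the similarity search.
  estcmd extkeys "$db" &&
    estcmd search -vu -sim "$seed_id" -max 5 "$db"
else
  echo "skipped: estcmd or index '$db' not available"
fi
```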
