Provided by: macsyfinder_2.0-2_amd64

NAME

       macsyfinder - detection of macromolecular systems in protein datasets

SYNOPSIS

       macsyfinder [-h] [--sequence-db SEQUENCE_DB]
       [--db-type {unordered_replicon,ordered_replicon,gembase,unordered}]
       [--replicon-topology {linear,circular}] [--topology-file TOPOLOGY_FILE] [--idx]
       [--inter-gene-max-space INTER_GENE_MAX_SPACE INTER_GENE_MAX_SPACE]
       [--min-mandatory-genes-required MIN_MANDATORY_GENES_REQUIRED MIN_MANDATORY_GENES_REQUIRED]
       [--min-genes-required MIN_GENES_REQUIRED MIN_GENES_REQUIRED]
       [--max-nb-genes MAX_NB_GENES MAX_NB_GENES] [--multi-loci MULTI_LOCI]
       [--hmmer HMMER_EXE] [--index-db INDEX_DB_EXE] [--e-value-search E_VALUE_RES]
       [--i-evalue-select I_EVALUE_SEL] [--coverage-profile COVERAGE_PROFILE]
       [-d DEF_DIR] [-o OUT_DIR] [-r RES_SEARCH_DIR] [--res-search-suffix RES_SEARCH_SUFFIX]
       [--res-extract-suffix RES_EXTRACT_SUFFIX] [-p PROFILE_DIR]
       [--profile-suffix PROFILE_SUFFIX] [-w WORKER_NB] [-v] [--log LOG_FILE]
       [--config CFG_FILE] [--previous-run PREVIOUS_RUN] systems [systems ...]

DESCRIPTION

       MacSyFinder is a program to model and detect macromolecular systems, genetic pathways,
       etc., in protein datasets. In prokaryotes, these systems often have evolutionarily
       conserved properties: they are made of conserved components and are encoded in compact
       loci (conserved genetic architecture). With MacSyFinder, the user models these systems
       so as to reflect these conserved features and allow their efficient detection.

OPTIONS

   positional arguments:
       systems
              The systems to detect. This argument is mandatory and has no keyword associated
              with it. To detect all the protein secretion systems and related appendages, set it
              to "all" (case insensitive). Otherwise, one or more systems can be specified, for
              example: "T2SS T4P".

   optional arguments:
       -h, --help
              show this help message and exit

   Input dataset options:
       --sequence-db SEQUENCE_DB
              Path to the sequence dataset in FASTA format.

       --db-type {unordered_replicon,ordered_replicon,gembase,unordered}
              The type of dataset to deal with: "unordered_replicon" corresponds to a
              non-assembled genome, "unordered" to a metagenomic dataset, "ordered_replicon" to
              an assembled genome, and "gembase" to a set of replicons where sequence identifiers
              follow this convention: ">RepliconName SequenceID".

       --replicon-topology {linear,circular}
              The topology of the replicons (this option is meaningful only if the db_type is
              'ordered_replicon' or 'gembase').

       --topology-file TOPOLOGY_FILE
              Topology file path. The topology file allows one to specify a topology (linear or
              circular) for each replicon (this option is meaningful only if the db_type is
              'ordered_replicon' or 'gembase'). A topology file is a tabular file with two
              columns: the first is the replicon name, and the second the corresponding topology,
              for example: "RepliconA linear".

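              The two-column topology file format described above can be read as in the
              following minimal Python sketch. The function parse_topology_file is hypothetical
              and not part of MacSyFinder; it assumes whitespace-separated columns, skips blank
              lines, and rejects any topology other than "linear" or "circular".

```python
from typing import Dict

VALID_TOPOLOGIES = {"linear", "circular"}


def parse_topology_file(text: str) -> Dict[str, str]:
    """Map each replicon name to its topology ('linear' or 'circular').

    Hypothetical helper: assumes one replicon per line, with the replicon name
    in the first column and the topology in the second, whitespace-separated.
    """
    topologies: Dict[str, str] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        name, topology = fields
        if topology not in VALID_TOPOLOGIES:
            raise ValueError(f"line {lineno}: unknown topology {topology!r}")
        topologies[name] = topology
    return topologies


print(parse_topology_file("RepliconA linear\nRepliconB circular\n"))
# {'RepliconA': 'linear', 'RepliconB': 'circular'}
```
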
       --idx  Forces building the indexes for the sequence dataset even if they were previously
              computed and are present at the dataset location (default = False)

   Systems detection options:
       --inter-gene-max-space INTER_GENE_MAX_SPACE INTER_GENE_MAX_SPACE
              Co-localization criterion: maximum number of components not matched by a profile
              allowed between two matched components for them to be considered contiguous. This
              option is only meaningful for 'ordered' datasets. The first value must match a
              system name, the second a number of components. This option can be repeated
              several times: "--inter-gene-max-space T2SS 12 --inter-gene-max-space Flagellum 20"

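              The co-localization criterion can be illustrated with this minimal Python sketch,
              assuming that matched components are given as gene positions (indices) on a single
              ordered, linear replicon. The function cluster_hits is hypothetical, not MacSyFinder
              code, and it ignores wrap-around on circular replicons.

```python
from typing import List


def cluster_hits(positions: List[int], inter_gene_max_space: int) -> List[List[int]]:
    """Group matched gene positions into clusters of contiguous components.

    Two consecutive matched genes are considered contiguous when the number of
    unmatched genes lying between them is <= inter_gene_max_space.
    Assumes a linear replicon (no wrap-around for circular topologies).
    """
    clusters: List[List[int]] = []
    for pos in sorted(set(positions)):
        # pos - previous - 1 = number of genes strictly between the two hits
        if clusters and pos - clusters[-1][-1] - 1 <= inter_gene_max_space:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


print(cluster_hits([3, 5, 6, 20, 22], inter_gene_max_space=2))
# [[3, 5, 6], [20, 22]]
```
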
       --min-mandatory-genes-required MIN_MANDATORY_GENES_REQUIRED MIN_MANDATORY_GENES_REQUIRED
              The minimal number of mandatory genes required for system assessment. The first
              value must correspond to a system name, the second value to an integer. This option
              can be repeated several times: "--min-mandatory-genes-required T2SS 15
              --min-mandatory-genes-required Flagellum 10"
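
              The repeatable "SYSTEM VALUE" option shape shared by the detection options can be
              reproduced with Python's standard argparse module, as in this minimal sketch. The
              parser and the pairs_to_dict helper are hypothetical illustrations, not
              MacSyFinder's own code; in this sketch, a system given twice keeps its last value.

```python
import argparse
from typing import Dict, List


def pairs_to_dict(pairs: List[List[str]]) -> Dict[str, int]:
    """Turn [[system, value], ...] into {system: int(value)}; later repeats win."""
    return {system: int(value) for system, value in pairs}


# Hypothetical parser reproducing only the documented option shape.
parser = argparse.ArgumentParser(prog="example")
parser.add_argument(
    "--min-mandatory-genes-required",
    nargs=2,              # each occurrence takes a system name and an integer
    action="append",      # the option may be repeated
    default=[],
    metavar=("SYSTEM", "INTEGER"),
)

args = parser.parse_args(
    ["--min-mandatory-genes-required", "T2SS", "15",
     "--min-mandatory-genes-required", "Flagellum", "10"]
)
print(pairs_to_dict(args.min_mandatory_genes_required))
# {'T2SS': 15, 'Flagellum': 10}
```
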

       --min-genes-required MIN_GENES_REQUIRED MIN_GENES_REQUIRED
              The minimal number of genes required for system assessment (includes both
              'mandatory' and 'accessory' components). The first value must correspond to a
              system name, the second value to an integer. This option can be repeated several
              times: "--min-genes-required T2SS 15 --min-genes-required Flagellum 10"
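
              How the two minimal thresholds combine for one candidate system is sketched below
              in Python. The function meets_quorum and its list-of-roles input are hypothetical
              simplifications; following the option description, the total count includes both
              'mandatory' and 'accessory' components, and the comparisons are assumed inclusive.

```python
from typing import Iterable


def meets_quorum(gene_roles: Iterable[str],
                 min_mandatory_genes_required: int,
                 min_genes_required: int) -> bool:
    """Check the two minimal-count thresholds for one candidate system.

    gene_roles lists the role of each distinct matched gene, for example
    ['mandatory', 'accessory', ...]. Other roles are ignored by this sketch.
    """
    roles = list(gene_roles)
    n_mandatory = roles.count("mandatory")
    n_total = n_mandatory + roles.count("accessory")
    return n_mandatory >= min_mandatory_genes_required and n_total >= min_genes_required


candidate = ["mandatory", "mandatory", "accessory"]
print(meets_quorum(candidate, min_mandatory_genes_required=2, min_genes_required=3))  # True
print(meets_quorum(candidate, min_mandatory_genes_required=3, min_genes_required=3))  # False
```
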

       --max-nb-genes MAX_NB_GENES MAX_NB_GENES
              The maximal number of genes required for system assessment. The first value must
              correspond to a system name, the second value to an integer. This option can be
              repeated several times: "--max-nb-genes T2SS 5 --max-nb-genes Flagellum 10"

       --multi-loci MULTI_LOCI
              Allows the storage of multi-loci systems for the specified systems, given as a
              comma-separated list (--multi-loci sys1,sys2). (default = False)

   Options for Hmmer execution and hits filtering:
       --hmmer HMMER_EXE
              Path to the Hmmer program.

       --index-db INDEX_DB_EXE
              The indexer to be used for Hmmer. The value can be either 'makeblastdb' or
              'formatdb', or the path to one of these binaries (default = makeblastdb)

       --e-value-search E_VALUE_RES
              Maximal e-value for hits to be reported during Hmmer search. (default = 1)

       --i-evalue-select I_EVALUE_SEL
              Maximal independent e-value for Hmmer hits to be  selected  for  system  detection.
              (default = 0.001)

       --coverage-profile COVERAGE_PROFILE
              Minimal  profile  coverage required in the hit alignment to allow the hit selection
              for system detection.  (default = 0.5)
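
              The two selection thresholds above can be combined as in this minimal Python
              sketch. The Hit record is hypothetical; the coverage formula (aligned profile span
              divided by profile length) and the inclusive comparisons are assumptions made for
              illustration, not a description of MacSyFinder internals. The default values
              mirror those documented for --i-evalue-select and --coverage-profile.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical, simplified view of one Hmmer domain hit."""
    gene: str
    i_evalue: float      # independent e-value of the domain hit
    hmm_from: int        # first profile position in the alignment (1-based)
    hmm_to: int          # last profile position in the alignment (inclusive)
    profile_length: int  # total length of the profile


def profile_coverage(hit: Hit) -> float:
    """Fraction of the profile covered by the alignment (assumed formula)."""
    return (hit.hmm_to - hit.hmm_from + 1) / hit.profile_length


def select_hits(hits: List[Hit], i_evalue_sel: float = 0.001,
                coverage_profile: float = 0.5) -> List[Hit]:
    """Keep hits with i-evalue <= i_evalue_sel and coverage >= coverage_profile."""
    return [h for h in hits
            if h.i_evalue <= i_evalue_sel and profile_coverage(h) >= coverage_profile]


hits = [
    Hit("gspD", 1e-30, 1, 180, 200),  # kept: 90% coverage, tiny i-evalue
    Hit("gspE", 1e-10, 1, 60, 200),   # dropped: only 30% coverage
    Hit("gspF", 0.01, 1, 190, 200),   # dropped: i-evalue above 0.001
]
print([h.gene for h in select_hits(hits)])  # ['gspD']
```
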

   Path options:
       -d DEF_DIR, --def DEF_DIR
              Path to the systems definition files.

       -o OUT_DIR, --out-dir OUT_DIR
              Path to the directory where to store results. If --out-dir is specified,
              --res-search-dir will be ignored.

       -r RES_SEARCH_DIR, --res-search-dir RES_SEARCH_DIR
              Path to the directory where to store MacSyFinder search results directories
              (default = current working directory).

       --res-search-suffix RES_SEARCH_SUFFIX
              The suffix to give to Hmmer raw output files.

       --res-extract-suffix RES_EXTRACT_SUFFIX
              The suffix to give to filtered hits output files.

       -p PROFILE_DIR, --profile-dir PROFILE_DIR
              Path to the profiles directory.

       --profile-suffix PROFILE_SUFFIX
              The suffix of profile files. For each 'Gene' element, the corresponding profile is
              searched for in the 'profile_dir', in a file whose name is the Gene name followed by
              the profile suffix. For instance, if the Gene is named 'gspG' and the suffix is
              '.hmm3', then the profile should be placed at the specified location and be named
              'gspG.hmm3'.
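
              The naming rule above amounts to a simple path concatenation, sketched here in
              Python. The function profile_path is a hypothetical illustration, not a
              MacSyFinder function.

```python
import os


def profile_path(profile_dir: str, gene_name: str, profile_suffix: str) -> str:
    """Build the expected profile location: <profile_dir>/<gene name><suffix>."""
    return os.path.join(profile_dir, gene_name + profile_suffix)


print(profile_path("profiles", "gspG", ".hmm3"))  # profiles/gspG.hmm3 on POSIX systems
```
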

   General options:
       -w WORKER_NB, --worker WORKER_NB
              Number of workers to be used by MacSyFinder, for when the user wants to run
              MacSyFinder in multithreaded mode (0 means all cores will be used; default = 1).
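
              The special value 0 can be resolved to a concrete worker count as in this minimal
              Python sketch. The function resolve_worker_nb is hypothetical; rejecting negative
              values and falling back to 1 when the core count is unknown are assumptions of
              this sketch, not documented MacSyFinder behavior.

```python
import os


def resolve_worker_nb(worker_nb: int) -> int:
    """Translate a --worker value into a worker count: 0 means 'use all cores'."""
    if worker_nb < 0:
        raise ValueError("worker number must be >= 0")
    if worker_nb == 0:
        # os.cpu_count() returns None when the count cannot be determined
        return os.cpu_count() or 1
    return worker_nb


print(resolve_worker_nb(4))  # 4
```
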

       -v, --verbosity
              Increases the verbosity level. There are 4 levels: Error messages (default),
              Warning (-v), Info (-vv) and Debug (-vvv).
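
              The four verbosity levels map naturally onto the standard Python logging levels,
              as in this minimal sketch. The function level_from_verbosity is hypothetical, and
              capping extra -v flags at Debug is an assumption of the sketch.

```python
import logging


def level_from_verbosity(v_count: int) -> int:
    """Map the number of -v flags to a logging level (extra -v flags stay at DEBUG)."""
    levels = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]
    # clamp the index into the valid range [0, len(levels) - 1]
    return levels[max(0, min(v_count, len(levels) - 1))]


print(logging.getLevelName(level_from_verbosity(0)))  # ERROR
print(logging.getLevelName(level_from_verbosity(2)))  # INFO
```
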

       --log LOG_FILE
              Path to the directory where to store the 'macsyfinder.log' log file.

       --config CFG_FILE
              Path to an optional MacSyFinder configuration file to be used.

       --previous-run PREVIOUS_RUN
              Path to a previous MacSyFinder run directory. It allows one to skip the Hmmer
              search step on the same dataset, as it reuses the previous run's results and thus
              its Hmmer detection parameters. The configuration file from this previous run will
              be used. (conflicts with options --config, --sequence-db, --profile-suffix,
              --res-extract-suffix, --e-value-search, --db-type, --hmmer)

       For more details, visit the MacSyFinder website and see the MacSyFinder documentation.