Provided by: metaphlan2_2.9.22-1_all

NAME

       metaphlan2_strainer   -   METAgenomic  PHyLogenetic  ANalysis  for  metagenomic  taxonomic
       profiling (strainer)

SYNOPSIS

       metaphlan2_strainer.py [-h] --ifn_samples IFN_SAMPLES [IFN_SAMPLES ...]  --mpa_pkl MPA_PKL
       --output_dir    OUTPUT_DIR   [--ifn_markers   IFN_MARKERS]   [--nprocs_main   NPROCS_MAIN]
       [--nprocs_load_samples  NPROCS_LOAD_SAMPLES]   [--nprocs_align_clean   NPROCS_ALIGN_CLEAN]
       [--nprocs_raxml   NPROCS_RAXML]   [--bootstrap_raxml  BOOTSTRAP_RAXML]  [--ifn_ref_genomes
       IFN_REF_GENOMES [IFN_REF_GENOMES ...]]  [--N_in_marker N_IN_MARKER] [--marker_strip_length
       MARKER_STRIP_LENGTH]      [--marker_in_clade      MARKER_IN_CLADE]      [--sample_in_clade
       SAMPLE_IN_CLADE]     [--sample_in_marker     SAMPLE_IN_MARKER]      [--gap_in_trailing_col
       GAP_IN_TRAILING_COL]           [--gap_trailing_col_limit           GAP_TRAILING_COL_LIMIT]
       [--gap_in_internal_col  GAP_IN_INTERNAL_COL]  [--gap_in_sample   GAP_IN_SAMPLE]   [--N_col
       N_COL]  [--N_count  N_COUNT]  [--long_gap_length  LONG_GAP_LENGTH]  [--long_gap_percentage
       LONG_GAP_PERCENTAGE] [--p_value P_VALUE] [--clades CLADES [CLADES ...]]  [--marker_list_fn
       MARKER_LIST_FN]       [--print_clades_only]      [--alignment_program      {muscle,mafft}]
       [--relaxed_parameters]          [--relaxed_parameters2]           [--keep_alignment_files]
       [--keep_full_alignment_files] [--save_sample2fullfreq] [--use_threads]

DESCRIPTION

       Metaphlan2_strainer is a computational tool for tracking individual strains across a large
       set of samples. The input of metaphlan2_strainer is a set of metagenomic samples and the
       output is a set of phylogenetic trees. For each sample, metaphlan2_strainer extracts the
       strain of a specific species by merging and concatenating all reads mapped against that
       species' markers in the MetaPhlAn2 database.
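
       As a rough illustration of how the three required options from the SYNOPSIS fit together,
       the sketch below assembles a command line in Python. All file and directory names
       (sample1.markers, sample2.markers, mpa_db.pkl, strainer_output) are hypothetical
       placeholders, not files shipped with the package; only the option names come from this
       page.

```python
import shlex

# Hypothetical input names, for illustration only.
samples = ["sample1.markers", "sample2.markers"]

cmd = [
    "metaphlan2_strainer.py",
    "--ifn_samples", *samples,          # one or more sample files
    "--mpa_pkl", "mpa_db.pkl",          # hypothetical database file name
    "--output_dir", "strainer_output",  # hypothetical output directory
    "--nprocs_main", "4",
    "--print_clades_only",              # list candidate clades, build no tree
]

# Print the shell-quoted command instead of running it.
print(shlex.join(cmd))
```

       To actually run the command, the list could be passed to subprocess.run(cmd, check=True).
       Starting with --print_clades_only and then rerunning with --clades follows the workflow
       suggested under OPTIONS.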

OPTIONS

   optional arguments
       -h, --help
              show this help message and exit

       --ifn_samples IFN_SAMPLES [IFN_SAMPLES ...]
              The list of sample files (space separated). Wildcards can also be used.

       --mpa_pkl MPA_PKL
              The database of metaphlan3.py.

       --output_dir OUTPUT_DIR
              The output directory.

       --ifn_markers IFN_MARKERS
              The marker file in fasta format.

       --nprocs_main NPROCS_MAIN
              The number of processors used for the main threads. Default 1.

       --nprocs_load_samples NPROCS_LOAD_SAMPLES
              The number of processors used for loading samples. Default nprocs_main.

       --nprocs_align_clean NPROCS_ALIGN_CLEAN
              The number of processors used for aligning and cleaning markers. Default
              nprocs_main.

       --nprocs_raxml NPROCS_RAXML
              The number of processors used for running raxml. Default nprocs_main.

       --bootstrap_raxml BOOTSTRAP_RAXML
              The number of runs for bootstrapping when building the tree. Default 0.

       --ifn_ref_genomes IFN_REF_GENOMES [IFN_REF_GENOMES ...]
              The reference genome file names. They are separated by spaces.

       --N_in_marker N_IN_MARKER
              The consensus markers with the rate of N nucleotides greater  than  this  threshold
              are removed. Default 0.2.

       --marker_strip_length MARKER_STRIP_LENGTH
              The number of nucleotides to delete from each of the two ends of a marker.
              Default 50.

       --marker_in_clade MARKER_IN_CLADE
              In each sample, clades in which the rate of present markers is less than this
              threshold are removed. Default 0.8.

       --sample_in_clade SAMPLE_IN_CLADE
              Only clades present in at least sample_in_clade samples are kept. Default 2.

       --sample_in_marker SAMPLE_IN_MARKER
              If the percentage of samples in which a marker is present is less than this
              threshold, that marker is removed. Default 0.8.

       --gap_in_trailing_col GAP_IN_TRAILING_COL
              If the number of the trailing  nucleotide  columns  in  aligned  markers  with  the
              percentage    of    gaps    greater   than   gap_in_trailing_col   is   less   than
              gap_trailing_col_limit, these columns will be removed.  Default 0.2.

       --gap_trailing_col_limit GAP_TRAILING_COL_LIMIT
              If the number of the trailing  nucleotide  columns  in  aligned  markers  with  the
              percentage    of    gaps    greater   than   gap_in_trailing_col   is   less   than
              gap_trailing_col_limit, these columns will be removed.  Default 101.

       --gap_in_internal_col GAP_IN_INTERNAL_COL
              The internal nucleotide columns in aligned markers  with  the  percentage  of  gaps
              greater than gap_in_internal_col will be removed. Default 0.3.

       --gap_in_sample GAP_IN_SAMPLE
              Samples whose full sequences from all markers have a percentage of gaps greater
              than this threshold will be removed. Default 0.2.

       --N_col N_COL
              In aligned markers, if the percentage of nucleotide columns containing more than
              N_count Ns is less than this threshold, these columns will be removed. Default 0.8.

       --N_count N_COUNT
              In aligned markers, if the percentage of nucleotide columns containing more than
              N_count Ns is less than the N_col threshold, these columns will be removed.
              Default 0.

       --long_gap_length LONG_GAP_LENGTH
              In each concatenated sequence of a sample, a run of sequential gap positions is a
              gap group. A gap group with length greater than this threshold is considered a long
              gap group. If the ratio between the number of unique positions in all long gap
              groups and the concatenated sequence length is less than long_gap_percentage, these
              positions will be removed from all concatenated sequences. Default 2.

       --long_gap_percentage LONG_GAP_PERCENTAGE
              This threshold is combined with long_gap_length to remove long gaps. Default 0.8.

       --p_value P_VALUE
              The p_value to reject a non-polymorphic site. Default 0.05.

       --clades CLADES [CLADES ...]
              The clades (space separated) for which the script will compute the marker
              alignments in fasta format and the phylogenetic trees. If a file name is specified,
              the clade list is read from that file, with one clade name per line. Default
              "automatically identify all clades".

       --marker_list_fn MARKER_LIST_FN
              The  file name containing the list of considered markers. The other markers will be
              discarded. Default "None".

       --print_clades_only
              Only print the potential clades and stop without building any tree. This option  is
              useful  when  you want to check quickly all possible clades and rerun only for some
              specific ones. Default "False".

       --alignment_program {muscle,mafft}
              The alignment program. Default "muscle".

       --relaxed_parameters
              Set marker_in_clade=0.5, sample_in_marker=0.5, N_in_marker=0.5,  gap_in_sample=0.5.
              Default "False".

       --relaxed_parameters2
              Set  marker_in_clade=0.2, sample_in_marker=0.2, N_in_marker=0.8, gap_in_sample=0.8.
              Default "False".

       --keep_alignment_files
              Keep the alignment files of all markers before the cleaning step.

       --keep_full_alignment_files
              Keep the alignment files of all markers before truncating the starting and ending
              parts, and before the cleaning step. This is equivalent to --keep_alignment_files
              --marker_strip_length 0.

       --save_sample2fullfreq
              Save sample2fullfreq to a msgpack file sample2fullfreq.msgpack.

       --use_threads
              Use multithreading. Default "Use multiprocessing".
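
       The long gap rule described under --long_gap_length and --long_gap_percentage can be
       sketched as follows. This is one reading of the option text above, not the program's
       actual code: the function names long_gap_positions and remove_long_gaps are hypothetical,
       gaps are assumed to be written as "-", and all sequences are assumed to be already
       aligned to the same length.

```python
from typing import Dict, Optional, Set


def long_gap_positions(seq: str, long_gap_length: int) -> Set[int]:
    """Positions inside gap groups (runs of '-') longer than long_gap_length."""
    positions: Set[int] = set()
    run_start: Optional[int] = None
    # The appended sentinel character closes a gap run that reaches the end.
    for i, ch in enumerate(seq + "A"):
        if ch == "-":
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > long_gap_length:
                positions.update(range(run_start, i))
            run_start = None
    return positions


def remove_long_gaps(
    seqs: Dict[str, str],
    long_gap_length: int = 2,
    long_gap_percentage: float = 0.8,
) -> Dict[str, str]:
    """Drop long-gap positions from every sequence if they are rare enough."""
    length = len(next(iter(seqs.values())))
    drop: Set[int] = set()
    for seq in seqs.values():
        drop |= long_gap_positions(seq, long_gap_length)
    # Remove only when the ratio of long-gap positions is below the threshold.
    if not drop or len(drop) / length >= long_gap_percentage:
        return dict(seqs)
    return {
        name: "".join(c for i, c in enumerate(seq) if i not in drop)
        for name, seq in seqs.items()
    }


example = {"s1": "ACGT---ACGT", "s2": "ACGTAC-ACGT"}
cleaned = remove_long_gaps(example)
```

       In this example, s1 contains one gap group of length 3, which exceeds the default
       long_gap_length of 2. Its three positions make up 3 of 11 columns, below the default
       long_gap_percentage of 0.8, so those columns are dropped from both sequences.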

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used  for
       any other usage of the program.