NAME

       metaphlan2_strainer - METAgenomic PHyLogenetic ANalysis for metagenomic taxonomic profiling (strainer)

SYNOPSIS

       metaphlan2_strainer.py  [-h]  --ifn_samples IFN_SAMPLES [IFN_SAMPLES ...]  --mpa_pkl MPA_PKL --output_dir
       OUTPUT_DIR    [--ifn_markers    IFN_MARKERS]    [--nprocs_main    NPROCS_MAIN]     [--nprocs_load_samples
       NPROCS_LOAD_SAMPLES]     [--nprocs_align_clean    NPROCS_ALIGN_CLEAN]    [--nprocs_raxml    NPROCS_RAXML]
       [--bootstrap_raxml   BOOTSTRAP_RAXML]   [--ifn_ref_genomes   IFN_REF_GENOMES    [IFN_REF_GENOMES    ...]]
       [--N_in_marker     N_IN_MARKER]     [--marker_strip_length     MARKER_STRIP_LENGTH]    [--marker_in_clade
       MARKER_IN_CLADE]    [--sample_in_clade     SAMPLE_IN_CLADE]     [--sample_in_marker     SAMPLE_IN_MARKER]
       [--gap_in_trailing_col     GAP_IN_TRAILING_COL]     [--gap_trailing_col_limit     GAP_TRAILING_COL_LIMIT]
       [--gap_in_internal_col GAP_IN_INTERNAL_COL] [--gap_in_sample GAP_IN_SAMPLE]  [--N_col  N_COL]  [--N_count
       N_COUNT]   [--long_gap_length  LONG_GAP_LENGTH]  [--long_gap_percentage  LONG_GAP_PERCENTAGE]  [--p_value
       P_VALUE]  [--clades  CLADES  [CLADES  ...]]   [--marker_list_fn   MARKER_LIST_FN]   [--print_clades_only]
       [--alignment_program         {muscle,mafft}]        [--relaxed_parameters]        [--relaxed_parameters2]
       [--keep_alignment_files] [--keep_full_alignment_files] [--save_sample2fullfreq] [--use_threads]

DESCRIPTION

       Metaphlan2_strainer is a computational tool for tracking individual strains across a large set of
       samples. The input of metaphlan2_strainer is a set of metagenomic samples and the output is a set of
       phylogenetic trees. For each sample, metaphlan2_strainer extracts the strain of a specific species by
       merging and concatenating all reads mapped against that species' markers in the MetaPhlAn2 database.
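
       As an illustration of the synopsis, the following sketch assembles an argument list containing the
       three required options. It only builds and prints the command; it does not run the tool. All file
       names and the clade name are hypothetical placeholders, and build_command is a helper invented for
       this example.

```python
import shlex


def build_command(samples, mpa_pkl, output_dir, clades=None, nprocs_main=1):
    """Assemble an argument list for metaphlan2_strainer.py (nothing is executed)."""
    cmd = ["metaphlan2_strainer.py", "--ifn_samples", *samples,
           "--mpa_pkl", mpa_pkl,
           "--output_dir", output_dir,
           "--nprocs_main", str(nprocs_main)]
    if clades:
        # --clades accepts one or more space-separated clade names.
        cmd += ["--clades", *clades]
    return cmd


# Hypothetical inputs, for illustration only.
cmd = build_command(["sample1.markers", "sample2.markers"],
                    "db/mpa.pkl", "output", clades=["s__Example_species"])
print(" ".join(shlex.quote(arg) for arg in cmd))
```

       Passing the list to subprocess.run(cmd, check=True) would launch the program, assuming it is
       installed and on the PATH.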

OPTIONS

optional arguments
-h, --help
show this help message and exit

--ifn_samples IFN_SAMPLES [IFN_SAMPLES ...]
The list of sample files (space separated). Wildcards can also be used.

--mpa_pkl MPA_PKL
The database of metaphlan2.py.

--output_dir OUTPUT_DIR
The output directory.

--ifn_markers IFN_MARKERS
The marker file in fasta format.

--nprocs_main NPROCS_MAIN
The number of processors used for the main threads. Default 1.

--nprocs_load_samples NPROCS_LOAD_SAMPLES
The number of processors used for loading samples. Default nprocs_main.

--nprocs_align_clean NPROCS_ALIGN_CLEAN
The number of processors used for aligning and cleaning markers. Default nprocs_main.

--nprocs_raxml NPROCS_RAXML
The number of processors used for running RAxML. Default nprocs_main.

--bootstrap_raxml BOOTSTRAP_RAXML
The number of bootstrap runs when building the tree. Default 0.

--ifn_ref_genomes IFN_REF_GENOMES [IFN_REF_GENOMES ...]
The reference genome file names. They are separated by spaces.

--N_in_marker N_IN_MARKER
Consensus markers whose proportion of N nucleotides is greater than this threshold are removed.
Default 0.2.

--marker_strip_length MARKER_STRIP_LENGTH
The number of nucleotides to delete from each of the two ends of a marker. Default 50.
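
A minimal sketch of what stripping both ends of a marker amounts to. The function name strip_marker and
the handling of markers shorter than twice the strip length are assumptions made for this example, not
behaviour documented for the tool.

```python
def strip_marker(seq, strip_length=50):
    """Drop strip_length nucleotides from each end of a marker sequence."""
    if len(seq) <= 2 * strip_length:
        # Assumption: a marker this short has nothing left after stripping.
        return ""
    return seq[strip_length:len(seq) - strip_length]


# A toy marker: 50 flanking bases, 8 central bases, 50 flanking bases.
marker = "A" * 50 + "ACGTACGT" + "T" * 50
print(strip_marker(marker))  # ACGTACGT
```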

--marker_in_clade MARKER_IN_CLADE
In each sample, clades whose proportion of present markers is less than this threshold are removed.
Default 0.8.

--sample_in_clade SAMPLE_IN_CLADE
Only clades present in at least sample_in_clade samples are kept. Default 2.

--sample_in_marker SAMPLE_IN_MARKER
If the percentage of samples in which a marker is present is less than this threshold, that marker is
removed. Default 0.8.
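
The rule above can be sketched as a simple filter. The data layout (a mapping from marker name to the
set of samples containing it) and the names filter_markers and marker2samples are assumptions made for
this example; the tool's internal representation is not described here.

```python
def filter_markers(marker2samples, n_samples, sample_in_marker=0.8):
    """Keep markers present in at least sample_in_marker of all samples."""
    return {
        marker: samples
        for marker, samples in marker2samples.items()
        if len(samples) / n_samples >= sample_in_marker
    }


# Toy data: five samples, two markers (all names are hypothetical).
marker2samples = {
    "marker_A": {"s1", "s2", "s3", "s4"},  # present in 4/5 = 0.8 of samples, kept
    "marker_B": {"s1", "s2", "s3"},        # present in 3/5 = 0.6 of samples, removed
}
kept = filter_markers(marker2samples, n_samples=5)
print(sorted(kept))  # ['marker_A']
```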

--gap_in_trailing_col GAP_IN_TRAILING_COL
The gap threshold for trailing columns. If the number of trailing nucleotide columns in aligned
markers whose percentage of gaps is greater than gap_in_trailing_col is less than
gap_trailing_col_limit, these columns are removed. Default 0.2.

--gap_trailing_col_limit GAP_TRAILING_COL_LIMIT
The limit on the number of trailing columns. If the number of trailing nucleotide columns in aligned
markers whose percentage of gaps is greater than gap_in_trailing_col is less than
gap_trailing_col_limit, these columns are removed. Default 101.
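
One possible reading of how gap_in_trailing_col and gap_trailing_col_limit interact is sketched below.
It treats only the right-hand end of the alignment and represents the alignment as a list of
equal-length strings; both choices, and the function names, are assumptions made for this example and
may differ from what the tool does.

```python
def gap_fraction(column):
    """Fraction of gap characters ('-') in one alignment column."""
    return column.count("-") / len(column)


def trim_trailing_columns(alignment, gap_in_trailing_col=0.2,
                          gap_trailing_col_limit=101):
    """Remove the run of gappy columns at the right end, if the run is short enough."""
    length = len(alignment[0])
    run = 0
    # Count consecutive columns, starting from the right end, that are too gappy.
    while run < length:
        column = [seq[length - 1 - run] for seq in alignment]
        if gap_fraction(column) <= gap_in_trailing_col:
            break
        run += 1
    if 0 < run < gap_trailing_col_limit:
        return [seq[:length - run] for seq in alignment]
    return alignment


aln = ["ACGTAC", "ACGT--", "ACGTA-"]
print(trim_trailing_columns(aln))  # ['ACGT', 'ACGT', 'ACGT']
```

With gap_trailing_col_limit=2 the same input is returned unchanged, because the run of two gappy
columns is not shorter than the limit.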

--gap_in_internal_col GAP_IN_INTERNAL_COL
The internal nucleotide columns in aligned markers with the percentage of gaps greater than
gap_in_internal_col will be removed. Default 0.3.
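
A minimal sketch of column filtering by gap percentage. For simplicity it applies the threshold to
every column of a toy alignment rather than distinguishing internal from trailing columns; that
simplification and the name drop_gappy_columns are assumptions made for this example.

```python
def drop_gappy_columns(alignment, gap_in_internal_col=0.3):
    """Remove columns whose fraction of gaps exceeds gap_in_internal_col."""
    n_seqs = len(alignment)
    keep = [
        i for i in range(len(alignment[0]))
        if sum(seq[i] == "-" for seq in alignment) / n_seqs <= gap_in_internal_col
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Column 1 has 1/4 gaps (kept); column 2 has 2/4 gaps (removed).
aln = ["AC-T", "A--T", "ACGT", "ACGT"]
print(drop_gappy_columns(aln))  # ['ACT', 'A-T', 'ACT', 'ACT']
```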

--gap_in_sample GAP_IN_SAMPLE
Samples whose full sequence, built from all markers, has a percentage of gaps greater than this
threshold are removed. Default 0.2.

--N_col N_COL
In aligned markers, if the percentage of nucleotide columns containing more than N_count Ns is less
than this threshold, these columns are removed. Default 0.8.

--N_count N_COUNT
In aligned markers, if the percentage of nucleotide columns containing more than N_count Ns is less
than the N_col threshold, these columns are removed. Default 0.
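
The interplay of N_col and N_count can be sketched as follows, reading the description literally:
columns with more than N_count Ns are collected, and they are removed only when they make up less
than N_col of all columns. The function name, the list-of-strings alignment and the assumption that Ns
are upper case are choices made for this example.

```python
def drop_n_columns(alignment, n_col=0.8, n_count=0):
    """Remove columns with more than n_count Ns, if such columns are rare enough."""
    length = len(alignment[0])
    n_columns = {
        i for i in range(length)
        if sum(seq[i] == "N" for seq in alignment) > n_count
    }
    if len(n_columns) / length < n_col:
        return ["".join(c for i, c in enumerate(seq) if i not in n_columns)
                for seq in alignment]
    return alignment


# Columns 1 and 2 each contain one N: 2/4 = 0.5 of columns, below the 0.8 threshold.
aln = ["ACNT", "ACGT", "ANGT"]
print(drop_n_columns(aln))  # ['AT', 'AT', 'AT']
```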

--long_gap_length LONG_GAP_LENGTH
In each concatenated sequence of a sample, a run of consecutive gap positions forms a gap group. A
gap group with length greater than this threshold is considered a long gap group. If the ratio
between the number of unique positions in all long gap groups and the concatenated sequence length
is less than long_gap_percentage, these positions are removed from all concatenated sequences.
Default 2.

--long_gap_percentage LONG_GAP_PERCENTAGE
This threshold is combined with long_gap_length to remove long gaps. Default 0.8.
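
The long-gap rule described under --long_gap_length can be sketched as below. The function names and
the list-of-strings representation of the concatenated sequences are assumptions made for this
example; it follows the wording of the option text, not the tool's source.

```python
import re


def long_gap_positions(seq, long_gap_length=2):
    """Positions covered by gap groups longer than long_gap_length."""
    positions = set()
    for match in re.finditer(r"-+", seq):
        if match.end() - match.start() > long_gap_length:
            positions.update(range(match.start(), match.end()))
    return positions


def remove_long_gaps(seqs, long_gap_length=2, long_gap_percentage=0.8):
    """Remove long-gap positions from all sequences if they are a small enough share."""
    positions = set()
    for seq in seqs:
        positions |= long_gap_positions(seq, long_gap_length)
    if len(positions) / len(seqs[0]) < long_gap_percentage:
        return ["".join(c for i, c in enumerate(seq) if i not in positions)
                for seq in seqs]
    return seqs


# Only the three-gap run in the first sequence is long (length 3 > 2).
seqs = ["AC---GTAC-", "ACGTAGT--C", "ACGTAGTACC"]
print(remove_long_gaps(seqs))  # ['ACGTAC-', 'ACGT--C', 'ACGTACC']
```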

--p_value P_VALUE
The p_value to reject a non-polymorphic site. Default 0.05.

--clades CLADES [CLADES ...]
The clades (space separated) for which the script will compute the marker alignments in fasta
format and the phylogenetic trees. If a file name is specified, the clade list is read from that
file, with one clade name per line. Default "automatically identify all clades".

--marker_list_fn MARKER_LIST_FN
The file name containing the list of considered markers. The other markers will be discarded.
Default "None".

--print_clades_only
Only print the potential clades and stop without building any tree. This option is useful when you
want to quickly check all possible clades and rerun only for some specific ones. Default "False".

--alignment_program {muscle,mafft}
The alignment program. Default "muscle".

--relaxed_parameters
Set marker_in_clade=0.5, sample_in_marker=0.5, N_in_marker=0.5, gap_in_sample=0.5. Default
"False".

--relaxed_parameters2
Set marker_in_clade=0.2, sample_in_marker=0.2, N_in_marker=0.8, gap_in_sample=0.8. Default
"False".
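
The two presets can be compared with the defaults listed above in a small lookup sketch. The values
are taken from this page; the function effective_thresholds is hypothetical, and letting
--relaxed_parameters2 take precedence when both flags are given is an assumption, since the page does
not say how the tool resolves that case.

```python
DEFAULTS = {"marker_in_clade": 0.8, "sample_in_marker": 0.8,
            "N_in_marker": 0.2, "gap_in_sample": 0.2}
RELAXED = {"marker_in_clade": 0.5, "sample_in_marker": 0.5,
           "N_in_marker": 0.5, "gap_in_sample": 0.5}
RELAXED2 = {"marker_in_clade": 0.2, "sample_in_marker": 0.2,
            "N_in_marker": 0.8, "gap_in_sample": 0.8}


def effective_thresholds(relaxed=False, relaxed2=False):
    """Return the four thresholds after applying the chosen preset."""
    params = dict(DEFAULTS)
    if relaxed:
        params.update(RELAXED)
    if relaxed2:
        # Assumption: the second preset wins if both flags are set.
        params.update(RELAXED2)
    return params


print(effective_thresholds(relaxed=True))
print(effective_thresholds(relaxed2=True))
```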

--keep_alignment_files
Keep the alignment files of all markers before the cleaning step.

--keep_full_alignment_files
Keep the alignment files of all markers before truncating the starting and ending parts and before
the cleaning step. This is equivalent to --keep_alignment_files --marker_strip_length 0

--save_sample2fullfreq
Save sample2fullfreq to a msgpack file sample2fullfreq.msgpack.

--use_threads
Use multithreading. Default "Use multiprocessing".

AUTHOR

       This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage
       of the program.

metaphlan2_strainer 2.5.0                           July 2016                             METAPHLAN2_STRAINER(1)