Provided by: bali-phy_3.6.1+dfsg-3_amd64
NAME
alignment-thin - Remove sequences or columns from an alignment.
SYNOPSIS
alignment-thin alignment-file [OPTIONS]
DESCRIPTION
Remove sequences or columns from an alignment.
GENERAL OPTIONS:
-h, --help
       Print usage information.

-V, --verbose
       Output more log messages on stderr.
SEQUENCE FILTERING OPTIONS:
-p arg, --protect arg
       Sequences that cannot be removed (comma-separated).

-k arg, --keep arg
       Remove sequences not in comma-separated list arg.

-r arg, --remove arg
       Remove sequences in comma-separated list arg.

-l arg, --longer-than arg
       Remove sequences not longer than arg.

-s arg, --shorter-than arg
       Remove sequences not shorter than arg.

-c arg, --cutoff arg
       Remove similar sequences with #mismatches < cutoff.

-d arg, --down-to arg
       Remove similar sequences down to arg sequences.

--remove-crazy arg
       Remove arg outlier sequences, defined as sequences that are
       missing too many conserved sites.

--conserved arg (=0.75)
       Fraction of sequences that must contain a letter for it to be
       considered conserved.
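The interplay of --conserved and --remove-crazy can be illustrated with a small Python sketch. It is not the program's implementation. It assumes that "-" is the only gap character, that a column counts as conserved when the fraction of sequences with a letter is at least the threshold, and that outliers are ranked by how many conserved columns they lack. The function names are hypothetical.

```python
def conserved_columns(seqs, min_fraction=0.75, gap="-"):
    """Indices of columns where at least min_fraction of the sequences have a letter.

    Assumption: ">=" is used for the comparison; the real tool may differ.
    """
    n = len(seqs)
    cols = []
    for i in range(len(seqs[0])):
        letters = sum(1 for s in seqs if s[i] != gap)
        if letters / n >= min_fraction:
            cols.append(i)
    return cols


def missing_conserved(seqs, min_fraction=0.75, gap="-"):
    """For each sequence, count the conserved columns in which it has a gap."""
    cols = conserved_columns(seqs, min_fraction, gap)
    return [sum(1 for i in cols if s[i] == gap) for s in seqs]


# Toy alignment: four sequences of equal (aligned) length.
alignment = ["ACGT-A", "ACGTTA", "AC-T-A", "--G--A"]

print(conserved_columns(alignment))   # [0, 1, 2, 3, 5]
print(missing_conserved(alignment))   # [0, 0, 1, 3]
```

Under these assumptions the last sequence lacks the most conserved sites, so it would be the first candidate for removal as an outlier.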
COLUMN FILTERING OPTIONS:
-K arg, --keep-columns arg
       Keep columns from this sequence.

-m arg, --min-letters arg
       Remove columns with fewer than arg letters.

-u arg, --remove-unique arg
       Remove insertions in a single sequence if longer than arg
       letters.

-e, --erase-empty-columns
       Remove columns with no characters (all gaps).
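As a rough illustration of column filtering, the following Python sketch drops columns that contain fewer than a given number of letters. It is not the program's implementation. It assumes that "-" is the only gap character and that a "letter" is any non-gap character. The function name thin_columns is hypothetical.

```python
def thin_columns(seqs, min_letters=1, gap="-"):
    """Keep only columns that have at least min_letters non-gap characters.

    With min_letters=1 this removes all-gap columns, which is the effect
    described for --erase-empty-columns.
    """
    width = len(seqs[0])
    keep = [i for i in range(width)
            if sum(1 for s in seqs if s[i] != gap) >= min_letters]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy alignment: column 1 is all gaps, column 3 has a single letter.
alignment = ["A-C-G", "A-CTG", "--C-G"]

print(thin_columns(alignment, min_letters=1))  # ['AC-G', 'ACTG', '-C-G']
print(thin_columns(alignment, min_letters=2))  # ['ACG', 'ACG', '-CG']
```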
OUTPUT OPTIONS:
-S, --sort
       Sort partially ordered columns to group similar gaps.

-L, --show-lengths
       Just print out sequence lengths.

-N, --show-names
       Just print out sequence names.

-F arg, --find-dups arg
       For each sequence, find the closest other sequence.
EXAMPLES:
Remove columns without a minimum number of letters:
       % alignment-thin --min-letters=5 file.fasta > file-thinned.fasta

Remove sequences by name:
       % alignment-thin --remove=seq1,seq2 file.fasta > file2.fasta
       % alignment-thin --keep=seq1,seq2 file.fasta > file2.fasta

Remove short sequences:
       % alignment-thin --longer-than=250 file.fasta > file-long.fasta

Remove similar sequences with <= 5 differences from the closest other sequence:
       % alignment-thin --cutoff=5 file.fasta > more-than-5-differences.fasta

Remove similar sequences until we have the right number of sequences:
       % alignment-thin --down-to=30 file.fasta > file-30taxa.fasta

Remove dissimilar sequences that are missing conserved columns:
       % alignment-thin --remove-crazy=10 file.fasta > file2.fasta

Protect some sequences from being removed:
       % alignment-thin --down-to=30 file.fasta --protect=seq1,seq2 > file2.fasta
       % alignment-thin --down-to=30 file.fasta --protect=@filename > file2.fasta
REPORTING BUGS:
BAli-Phy online help: <http://www.bali-phy.org/docs.php>. Please send bug reports to <bali-phy-users@googlegroups.com>.
AUTHORS
Benjamin Redelings.

Feb 2018                                                    alignment-thin(1)