Provided by: mira-assembler_4.9.6-10_amd64
NAME
mirabait - a grep-like tool to select reads with kmers up to 256 bp
SYNOPSIS
mirabait [options] {-b baitfile [-b ...] | -B file | -j joblibrary} {-p file_1 file_2 | -P file3}* [file4 ...]
DESCRIPTION
mirabait selects reads from a read collection which are partly similar or equal to sequences defined as target baits. Similarity is defined by finding a user-adjustable number of k-mers (sequences of k consecutive bases) that the bait sequences and the screened sequences have in common, either in forward direction only or in both forward and reverse complement direction. A DUST-like filter for repeats of up to 4 bases can optionally be applied. When used on paired files, mirabait selects sequences where at least one mate matches.
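The selection principle described above can be illustrated with a minimal Python sketch. It is not mirabait's implementation: all function names are hypothetical, sequences are assumed to contain only A, C, G, T and N, and the DUST-like filter is left out. It only shows how shared k-mers, optional reverse complement matching, and the "at least one mate matches" rule fit together.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Yield every k-mer (substring of k consecutive bases) of seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_bait_kmers(baits, k, both_strands=True):
    """Collect the k-mers of all bait sequences into a set.

    With both_strands=True the k-mers of the reverse complement are added
    as well, so reads from either strand can match.
    """
    bait_set = set()
    for bait in baits:
        bait_set.update(kmers(bait, k))
        if both_strands:
            bait_set.update(kmers(reverse_complement(bait), k))
    return bait_set


def count_hits(read, bait_set, k):
    """Count the k-mer positions of the read that occur in the bait set."""
    return sum(1 for kmer in kmers(read, k) if kmer in bait_set)


def select_pair(read1, read2, bait_set, k, min_hits=1):
    """Select a pair when at least one mate reaches min_hits shared k-mers."""
    return (count_hits(read1, bait_set, k) >= min_hits
            or count_hits(read2, bait_set, k) >= min_hits)
```

In this sketch, `min_hits` plays the role of a positive -n value and `both_strands=False` corresponds to switching off reverse complement checking.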
OPTIONS
Main options:

  -b file
      Load bait sequences from file (multiple -b allowed).
  -B file
      Load baits from a kmer statistics file, not from sequence files. Only one -B is allowed, and it cannot be combined with -b (see -K for creating such a file).
  -j job
      Set options for a predefined job from the supplied MIRA library. Currently available jobs:
        rrna      Bait rRNA sequences
  -p file1 file2
      Load paired sequences to search from file1 and file2. The files must contain the same number of sequences, and sequence names must be in the same order. Multiple -p allowed, but they must come before non-paired files.
  -P file
      Load paired sequences from file. The file must be interleaved: pairs must follow each other, non-pairs are not allowed. Multiple -P allowed, but they must come before non-paired files.
  -k int
      kmer length of bait in bases (<=256, default=31).
  -n int
      If >0: minimum number of k-mer baits needed (default=1).
      If <=0: allowed number of missed kmers over sequence length.
  -d
      Do not use kmers with microrepeats (DUST-like, see also -D).
  -D int
      Set length of microrepeats in kmers to discard from bait.
        int > 0   microrepeat length as a percentage of kmer length.
                  E.g.: -k 17 -D 67 --> 11.39 bases --> 12 bases.
        int < 0   microrepeat length in bases.
        int != 0  implies -d; int = 0 turns the DUST filter off.
  -i
      Select sequences that do not hit bait.
  -I
      Select sequences that hit and do not hit bait (to different files).
  -r
      No checking of reverse complement direction.
  -t int
      Number of threads to use (default=0 -> up to 4 CPU cores).

Options for output definition:

Normally mirabait writes separate result files (named 'bait_match_*' and 'bait_miss_*') for each input to the current directory. To change this behaviour and other aspects of the output, use these options:

  -c
      No case change of sequence to denote bait hits.
  -l int
      Length of a line (FASTA only, default 0=unlimited).
  -K file
      Save kmer statistics to 'file' (see also -B).
  -N name
      Change the prefix 'bait' to <name>. Has no effect if -o/-O is used and targets are not directories.
  -o <path>
      Save sequences matching bait to path. If path is a directory, write separate files into this directory. If not, combine all matching sequences from the input file(s) into a single file specified by the path.
  -O <path>
      Like -o, but for sequences not matching.

Other options:

  -T dir
      Use 'dir' as directory for temporary files instead of the current working directory.
  -m integer
      Memory to use for computing kmer statistics.
        0..100    use percentage of free system memory
        >100      amount of MiB to use (e.g. 16384 for 16 GiB)
      Default 75 (75% of free system memory).
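The numeric conventions of -D and -m can be checked with a small Python sketch. The function names are hypothetical and this is not mirabait's code. Rounding the percentage up is an assumption inferred from the single documented example (11.39 bases --> 12 bases), and the amount of free memory is passed in by the caller rather than queried from the system.

```python
def microrepeat_length(k, d):
    """Translate a -D value into a microrepeat length in bases.

    d > 0: percentage of the k-mer length k, rounded up (assumed rounding).
    d < 0: length given directly in bases.
    d == 0: filter off, reported here as None.
    """
    if d == 0:
        return None
    if d < 0:
        return -d
    # Integer ceiling of k * d / 100, avoiding floating point rounding.
    return (k * d + 99) // 100


def memory_budget_mib(m, free_mib):
    """Translate a -m value into a memory budget in MiB.

    0..100 is read as a percentage of free memory, anything above 100
    as an absolute amount in MiB.
    """
    if m < 0:
        raise ValueError("memory setting must not be negative")
    if m <= 100:
        return free_mib * m // 100
    return m


print(microrepeat_length(17, 67))      # 12
print(memory_budget_mib(75, 32768))    # 24576
```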
Defining file types to load/save:
Normally mirabait recognises file types according to the file extension (even when packed). If you need to force a certain file type because the file extension is non-standard, use the EMBOSS notation: <filetype>::<name_of_file>. E.g., to tell mirabait that "somefile.dat" is FASTQ, use:

  fastq::somefile.dat

Recognised types are: caf, fasta, fastq, gbf, gbk, gbff, maf and phd.

mirabait writes files in the same file type as the corresponding input files.

Examples:

  mirabait -b b.fasta file.fastq
  mirabait -I -j rrna -p file_1.fastq file_2.fastq
  mirabait -b b1.fasta -b b2.gbk file.fastq
  mirabait -b fasta::baits.dat -p fastq::file_1.dat fastq::file_2.dat
  mirabait -b b.fasta -p file_1.fastq file_2.fastq -P file3.fasta file4.caf
  mirabait -I -b b.fasta -p file_1.fastq file_2.fastq -P file3.fasta file4.caf
  mirabait -k 27 -n 10 -b b.fasta file.fastq
  mirabait -b fasta::b.dat fastq::file.dat
  mirabait -o /dev/shm/ -b b.fasta -p file_1.fastq file_2.fastq
  mirabait -o /dev/shm/match -b b.fasta -p file_1.fastq file_2.fastq
  mirabait -b human_genome.fasta -K HG_kmerstats.mhs.gz -p file1.fastq file2.fastq
  mirabait -B HG_kmerstats.mhs.gz -p file1.fastq file2.fastq
  mirabait -d -B HG_kmerstats.mhs.gz -p file1.fastq file2.fastq
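How a file argument with or without the <filetype>:: prefix might be resolved can be sketched in Python as follows. This is an illustration only, not mirabait's parser: the function name is hypothetical, the set of packed suffixes (.gz, .bz2, .xz) is an assumption, and only the type names listed above are treated as known extensions.

```python
import os

# Type names taken from the list of recognised types above.
KNOWN_TYPES = {"caf", "fasta", "fastq", "gbf", "gbk", "gbff", "maf", "phd"}
# Assumption: which compression suffixes count as "packed".
PACKED_SUFFIXES = {".gz", ".bz2", ".xz"}


def resolve_file_type(spec):
    """Return (filetype, path) for a file argument.

    An explicit '<filetype>::<name_of_file>' prefix wins. Otherwise the
    type is taken from the extension, ignoring one packing suffix.
    """
    prefix, sep, rest = spec.partition("::")
    if sep and prefix.lower() in KNOWN_TYPES:
        return prefix.lower(), rest

    root, ext = os.path.splitext(spec)
    if ext.lower() in PACKED_SUFFIXES:
        # Look at the extension underneath the packing suffix.
        root, ext = os.path.splitext(root)
    filetype = ext.lower().lstrip(".")
    if filetype not in KNOWN_TYPES:
        raise ValueError(f"cannot determine file type of {spec!r}")
    return filetype, spec


print(resolve_file_type("fastq::somefile.dat"))  # ('fastq', 'somefile.dat')
print(resolve_file_type("reads.fastq.gz"))       # ('fastq', 'reads.fastq.gz')
```

In this sketch a name such as somefile.dat without a prefix raises an error, which is the situation the forced notation is meant to resolve.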
SEE ALSO
mira(1), miraconvert(1)

More extensive documentation is provided in the MIRA manual, available online at http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html

On Debian, the manual can be installed with the mira-doc package and can then be found at /usr/share/doc/mira-assembler/DefinitiveGuideToMIRA.html. On other systems, you may want to check in /usr/local/share/mira/doc or run "locate DefinitiveGuideToMIRA" to find it locally.

You can also subscribe to one of the MIRA mailing lists at http://www.chevreux.org/mira_mailinglists.html

After subscribing, mail general questions to the MIRA talk mailing list: mira_talk@freelists.org
BUGS
To report bugs or ask for features, please use the ticketing system at: http://sourceforge.net/projects/mira-assembler/
AUTHOR
Bastien Chevreux <bach@chevreux.org>

This manual page was written by Bastien Chevreux <bach@chevreux.org> but can be freely used for any documentation purpose.