NAME

       bp_mask_by_search - mask sequence(s) based on its alignment results

SYNOPSIS

         bp_mask_by_search.pl -f blast genomefile blastfile.bls > maskedgenome.fa

DESCRIPTION

       Mask a sequence based on its significant alignments to another sequence.  You need to
       provide the report file and the entire sequence data which you want to mask.  By default
       this will assume you have done a TBLASTN (or TFASTY) and will try to mask the hit
       sequence, assuming you've provided the sequence file for the hit database.  If you would
       like to do the reverse and mask the query sequence, specify the -t/--type query flag.
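
       To make the idea of masking aligned regions concrete, here is a minimal Python sketch.
       It is not the script's Perl implementation: the function name mask_regions and the
       1-based, inclusive coordinate convention are assumptions made for illustration only.

```python
def mask_regions(seq, regions, hardmask=False, maskchar="N"):
    """Mask 1-based, inclusive (start, end) regions of seq.

    Soft masking lowercases each region; hard masking overwrites it
    with maskchar. Hypothetical helper, not part of BioPerl.
    """
    chars = list(seq)
    for start, end in regions:
        # Reverse-strand alignments may be reported with start > end.
        lo, hi = min(start, end), max(start, end)
        # Clamp to the sequence bounds before masking.
        for i in range(max(lo - 1, 0), min(hi, len(chars))):
            chars[i] = maskchar if hardmask else chars[i].lower()
    return "".join(chars)
```

       With this sketch, mask_regions("ACGTACGTAC", [(3, 5)]) lowercases positions 3 to 5,
       and passing hardmask=True replaces them with the mask character instead.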

       This reads the whole sequence file into memory, so for large genomes it may fall over.
       I'm using DB_File to avoid keeping everything in memory; one further solution is to
       split the genome into pieces (do this BEFORE you run the DB search, though: you want to
       use the exact file you BLASTed with as input to this program).
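
       One way to split a genome into pieces is to group its FASTA records into smaller
       batches.  The Python sketch below shows the grouping step only; the function name
       split_fasta and the records_per_piece parameter are hypothetical, and it assumes the
       input is an iterable of FASTA lines.

```python
def split_fasta(lines, records_per_piece):
    """Group FASTA lines into pieces of at most records_per_piece records.

    Returns a list of pieces; each piece is a list of lines.
    Hypothetical helper for illustration only.
    """
    pieces, current, count = [], [], 0
    for line in lines:
        if line.startswith(">"):
            # A new record begins; close the current piece if it is full.
            if count == records_per_piece:
                pieces.append(current)
                current, count = [], 0
            count += 1
        current.append(line)
    if current:
        pieces.append(current)
    return pieces
```

       Each piece would then be written to its own file, searched separately, and that same
       file passed to this program together with its report.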

       Below, the double dash (--) options are of the form --format=fasta or --format fasta, or
       you can just say -f fasta.

       By -f/--format I mean that either form is acceptable.  The =s, =n, or =c suffix specifies
       that the argument expects a string, a number, or a character, respectively.

       Options:
           -f/--format=s    Search report format (fasta,blast,axt,hmmer,etc)
           -sf/--sformat=s  Sequence format (fasta,genbank,embl,swissprot)
           --hardmask       (boolean) Hard mask the sequence
                            with the maskchar [default is lowercase mask]
           --maskchar=c     Character to mask with [default is N]; change
                            to 'X' for protein sequences
           -e/--evalue=n    Evalue cutoff for HSPs and Hits; only
                            mask sequence if the alignment has the
                            specified evalue or better
           -o/--out/
           --outfile=file   Output file to save the masked sequence to.
           -t/--type=s      Alignment seq type you want to mask, the
                            'hit' or the 'query' sequence. [default is 'hit']
           --minlen=n       Minimum length of an HSP for it to be used
                            in masking [default 0]
           -h/--help        See this help information
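
       The -e/--evalue and --minlen options both decide which HSPs contribute to the mask.
       The Python sketch below shows one plausible reading of that filter.  The HSP tuple
       and select_hsps function are hypothetical, and it assumes that "or better" means an
       evalue less than or equal to the cutoff and that HSP length is counted inclusively.
       It models HSPs only, not the Hit-level cutoff.

```python
from typing import NamedTuple


class HSP(NamedTuple):
    """Hypothetical stand-in for one alignment segment."""
    start: int
    end: int
    evalue: float


def select_hsps(hsps, evalue=None, minlen=0):
    """Keep HSPs that pass the evalue cutoff and the minimum length."""
    kept = []
    for hsp in hsps:
        # Inclusive length; start may exceed end on the reverse strand.
        length = abs(hsp.end - hsp.start) + 1
        if evalue is not None and hsp.evalue > evalue:
            continue  # worse than the cutoff
        if length < minlen:
            continue  # too short to be used for masking
        kept.append(hsp)
    return kept
```

       With no cutoff and the default minlen of 0, every HSP is kept.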

AUTHOR - Jason Stajich

       Jason Stajich, jason-at-bioperl-dot-org.