Ubuntu Manpage: Beagle - Genotype calling, genotype phasing and imputation of ungenotyped markers

Provided by: beagle_220208-1_all

NAME

       Beagle - Genotype calling, genotype phasing and imputation of ungenotyped markers

SYNOPSIS

       java -Xmx[GB]g -jar /usr/share/beagle/beagle.jar [options]

DESCRIPTION

       Beagle performs genotype calling, genotype phasing, imputation of ungenotyped markers, and
       identity-by-descent segment detection. Genotype imputation works on phased haplotypes using a
       Li and Stephens haplotype frequency model. Beagle also implements the Refined IBD algorithm for
       detecting homozygosity-by-descent (HBD) and identity-by-descent (IBD) segments.

OPTIONS

Data input/output parameters
gt=filename
Optional
Specifies a VCF file containing a GT (genotype) format field for each marker. If a genotype
contains the phased allele separator, "|", then Beagle will preserve the phase of the genotype
during the analysis. If you use the gt argument, all genotypes in the output file will be phased
and non-missing.

gl=filename
Optional
Specifies a VCF file containing a GL or PL (genotype likelihood) format field for each marker.
Any data in the GT format field will be ignored. If both GL and PL format fields are present for
a marker, the GL format field will be used.

gtgl=filename
Optional
Specifies a VCF file containing a GT, GL or PL format field for each marker. If a genotype is
non-missing, Beagle will ignore the genotype likelihood. If both GL and PL format fields are
present for a marker, the GL field will be used.
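The precedence rules for gtgl can be illustrated with a short Python sketch. The function name select_genotype_data and the per-sample dict layout are hypothetical, not Beagle internals. The only assumption beyond the text above is the standard VCF convention that GL holds log10 likelihoods and PL holds phred-scaled likelihoods, so a PL value p corresponds to a linear likelihood of 10^(-p/10).

```python
def select_genotype_data(fields):
    """Pick the data used for one sample at one marker under gtgl rules.

    `fields` is a hypothetical dict mapping FORMAT keys ("GT", "GL", "PL")
    to the raw string values for this sample.
    Returns ("GT", genotype), ("LIK", linear_likelihoods) or ("MISSING", None).
    """
    gt = fields.get("GT")
    # A non-missing genotype takes precedence; likelihoods are ignored.
    if gt is not None and "." not in gt.replace("|", "/").split("/"):
        return ("GT", gt)
    # GL (log10 likelihoods) is preferred when both GL and PL are present.
    if "GL" in fields:
        return ("LIK", [10 ** float(x) for x in fields["GL"].split(",")])
    if "PL" in fields:
        # PL is phred-scaled: linear likelihood = 10^(-PL/10).
        return ("LIK", [10 ** (-int(x) / 10) for x in fields["PL"].split(",")])
    return ("MISSING", None)
```

For example, a sample with GT=./. and both GL=0,-1,-2 and PL=0,30,60 is scored from its GL values, giving linear likelihoods of about 1.0, 0.1 and 0.01.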

ref=filename
Optional
Specifies a VCF file containing phased reference genotypes. See the impute parameter.

out=prefix
Required
Specifies the output filename prefix. The prefix may be an absolute or relative filename, but it
cannot be a directory name.

excludesamples=filename
Optional
Specifies a file containing non-reference samples (one sample per line) to be excluded from the
analysis and output files.

excludemarkers=filename
Optional
Specifies a file containing markers (one marker per line) to be excluded from the analysis and the
output files. An excluded marker identifier can be either an identifier from the VCF record’s ID
field or a genomic coordinate in the format CHROM:POS.
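A minimal sketch of how a VCF record could be matched against an exclusion file, assuming exact string matching. The helper names load_exclusions and is_excluded are hypothetical, and splitting a multi-valued ID column on ";" is an assumption of this sketch, not documented Beagle behavior.

```python
def load_exclusions(lines):
    """Collect non-blank identifiers, one per line (hypothetical helper)."""
    return {line.strip() for line in lines if line.strip()}


def is_excluded(chrom, pos, vcf_id, excluded):
    """True if the record matches an excluded ID or CHROM:POS coordinate."""
    # VCF uses "." for a missing ID and ";" to separate multiple IDs
    # (splitting on ";" is an assumption made for this sketch).
    ids = [i for i in vcf_id.split(";") if i != "."]
    if any(i in excluded for i in ids):
        return True
    return f"{chrom}:{pos}" in excluded
```

With an exclusion file containing the lines rs123 and 20:14370, a record on chromosome 20 at position 14370 with ID "." is excluded by coordinate, while a record with ID rs123 is excluded by identifier regardless of position.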

map=filename
Optional
Specifies a PLINK format genetic map on the cM scale. HapMap GRCh36 and GRCh37 genetic maps in
PLINK format are available for download from the Beagle website. Use of a genetic map is
recommended if you are imputing ungenotyped markers. If no genetic map is specified, Beagle will
assume a constant recombination rate of 1 cM/Mb.
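The following Python sketch shows one way to turn base-pair positions into cM positions. It assumes a PLINK map whose third column is the genetic position in cM and whose fourth column is the base-pair position, and it assumes linear interpolation between map entries (with linear extrapolation beyond the ends); the function names are hypothetical and this is not necessarily how Beagle handles every edge case. Without a map, the documented fallback of 1 cM/Mb is simply bp / 10^6.

```python
from bisect import bisect_left


def genetic_position(bp, map_bp, map_cm):
    """Estimate the cM position of base-pair position `bp`.

    `map_bp` (PLINK column 4) must be sorted and strictly increasing;
    `map_cm` (PLINK column 3) holds the matching cM positions.
    """
    if len(map_bp) < 2:
        raise ValueError("need at least two map positions")
    i = bisect_left(map_bp, bp)
    # Clamp to the first/last interval so positions outside the map
    # are extrapolated from the nearest two entries (an assumption).
    i = min(max(i, 1), len(map_bp) - 1)
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (bp - x0) / (x1 - x0)


def default_position(bp):
    """Without a genetic map: constant 1 cM/Mb, so cM = bp / 1e6."""
    return bp / 1_000_000
```

For a map with entries (1,000,000 bp, 0.0 cM), (2,000,000 bp, 2.0 cM) and (4,000,000 bp, 3.0 cM), position 3,000,000 falls halfway through the second interval and maps to 2.5 cM.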

chrom=chrom:start-end
Optional
Specifies a chromosome or chromosome interval using a chromosome identifier in the VCF file and
the starting and ending positions of the interval. The entire chromosome, the beginning of the
chromosome, and the end of the chromosome can be specified by chrom=[chrom], chrom=[chrom:-end], and
chrom=[chrom:start-], respectively.
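The four accepted forms of the chrom value can be parsed as in this Python sketch. The function parse_chrom_interval is hypothetical and illustrative only; splitting on the last colon and returning None for an open end are choices made here, not documented Beagle behavior.

```python
def parse_chrom_interval(spec):
    """Split a chrom= value into (chrom, start, end); None means open-ended.

    Accepted forms: chrom, chrom:start-end, chrom:-end and chrom:start-.
    """
    chrom, sep, interval = spec.rpartition(":")
    if not sep:
        # No colon: the entire chromosome.
        return spec, None, None
    start_text, dash, end_text = interval.partition("-")
    if not dash:
        raise ValueError(f"missing '-' in interval: {spec!r}")
    start = int(start_text) if start_text else None
    end = int(end_text) if end_text else None
    if start is not None and end is not None and start > end:
        raise ValueError(f"start exceeds end: {spec!r}")
    return chrom, start, end
```

For instance, "20:-5000000" selects the beginning of chromosome 20 up to position 5000000, and "20:5000000-" selects the rest of the chromosome.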

maxlr=number_≥_1
Default = 5000
Specifies the maximum likelihood ratio retained at a genotype. If M is the maximum of the
likelihoods of the possible genotypes, any likelihood that is less than M/maxlr is set to 0.0 to
improve computational efficiency.
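The thresholding rule fits in a few lines of Python. The function apply_maxlr is a hypothetical illustration of the rule stated above, operating on linear-scale likelihoods for one sample at one marker; it is not taken from Beagle's source.

```python
def apply_maxlr(likelihoods, maxlr=5000.0):
    """Zero out genotype likelihoods smaller than max(likelihoods) / maxlr."""
    if maxlr < 1:
        raise ValueError("maxlr must be >= 1")
    threshold = max(likelihoods) / maxlr
    return [0.0 if x < threshold else x for x in likelihoods]
```

With the default maxlr=5000 and likelihoods (1.0, 0.001, 0.0001), the threshold is 0.0002, so only the last likelihood is set to 0.0.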

General parameters
nthreads=positive_integer
Default: machine-dependent
Specifies the number of threads of execution. If no nthreads parameter is specified, the nthreads
parameter will be set equal to the number of CPU cores on the host machine.

lowmem=true/false
Default = false
Specifies whether a memory-efficient algorithm should be used. The memory-efficient algorithm
increases run time by less than a factor of 2.

window=positive_integer
Default = 50000
Specifies the number of markers to include in each sliding window. The window parameter must be
at least twice as large as the overlap parameter. The window parameter controls the amount of
memory used in the analysis. For human data, it is recommended that the window parameter be
greater than or equal to the typical number of markers in 5 cM.

overlap=positive_integer
Default = 3000
Specifies the number of markers of overlap between sliding windows. For human data, it is
recommended that the overlap be set to the typical number of markers in 0.5 cM (when ibd=false) or
2.0 cM (when ibd=true).
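The window and overlap guidance above can be turned into concrete marker counts once the marker density is known. This Python sketch assumes the density is given as an average number of markers per cM (for example, about 333 markers per cM for 1 marker per 3 kb at 1 cM/Mb); the function names are hypothetical, and the computed window is only the recommended lower bound.

```python
import math


def markers_in_cm(cm, markers_per_cm):
    """Typical number of markers spanning `cm` centimorgans, rounded up."""
    return math.ceil(cm * markers_per_cm)


def suggest_window_overlap(markers_per_cm, ibd=False):
    """Suggest (window, overlap) for human data from the guidance above."""
    window = markers_in_cm(5.0, markers_per_cm)  # window: at least 5 cM
    overlap = markers_in_cm(2.0 if ibd else 0.5, markers_per_cm)
    return window, overlap


def check_window_overlap(window, overlap):
    """Enforce the documented constraint window >= 2 * overlap."""
    if window < 2 * overlap:
        raise ValueError(f"window={window} must be >= 2 * overlap={overlap}")
```

At 333 markers per cM this suggests a window of at least 1665 markers with an overlap of 167 markers (666 when ibd=true). The defaults, window=50000 and overlap=3000, already satisfy the window >= 2 * overlap constraint.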

seed=integer
Default = -99999
Specifies the seed for the random number generator.

Phasing and imputation parameters
niterations=non-negative_integer
Default = 5
Specifies the number of phasing iterations. The phasing iterations are preceded by 10 burn-in
iterations which carry out the Beagle version 4.0 phasing algorithm. If you want to phase your
data with the Beagle 4.0 phasing algorithm, use niterations=0. Accuracy and compute time increase
with the number of iterations.

impute=true/false
Default = true
Specifies whether markers that are present in the reference panel but absent in your data will be
imputed. This option has no effect unless both the ref and gt arguments are used.

gprobs=true/false
Default = false
Specifies whether a GP (genotype probability) format field will be included in the output VCF file
when imputing ungenotyped markers. By default, a GP field is not printed because a DS (alternate
allele dose) format field is always printed when imputing ungenotyped markers.
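For a biallelic marker, the alternate allele dose is the expected number of ALT alleles, which is why DS summarizes much of what GP contains. This Python sketch assumes GP is ordered as the probabilities of 0/0, 0/1 and 1/1 (the standard VCF genotype order); the function name is hypothetical, and multiallelic markers are not covered.

```python
def dosage_from_gp(gp):
    """Expected ALT allele count from GP = (P(0/0), P(0/1), P(1/1))."""
    _p_ref_ref, p_het, p_alt_alt = gp
    return p_het + 2.0 * p_alt_alt
```

For GP = (0.1, 0.6, 0.3), the dose is 0.6 + 2 × 0.3 = 1.2.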

ne=integer
Default = 1000000
Specifies the effective population size when imputing ungenotyped markers. The default value is
suitable for a large outbred human population. Smaller values in the hundreds or thousands for
the ne parameter are suggested for inbred human and animal populations.

err=non-negative_number
Default = 0.0001
Specifies the allele miscall rate. The default value should give good results for most sequence
and SNP array data.

cluster=non-negative_number
Default = 0.005
Specifies the maximum cM distance between individual markers that are combined into an aggregate
marker when imputing ungenotyped markers.
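This page does not describe how aggregate markers are formed, so the following Python sketch is only a hypothetical greedy grouping that respects the stated bound: no two markers in the same group are more than cluster cM apart. The function name and the left-to-right strategy are assumptions, not Beagle's algorithm.

```python
def cluster_markers(positions_cm, cluster=0.005):
    """Group sorted cM positions so each group spans at most `cluster` cM.

    Greedy left-to-right sketch: a new group starts whenever the next
    marker is more than `cluster` cM from the first marker of the current
    group. Returns lists of marker indices.
    """
    groups = []
    for i, pos in enumerate(positions_cm):
        if groups and pos - positions_cm[groups[-1][0]] <= cluster:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```

Because positions are sorted, measuring each marker against the first marker of its group guarantees that the whole group spans at most cluster cM.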

IBD parameters
ibd=true/false
Default = false
Specifies whether IBD analysis will be performed when the gt argument is used.

ibdlod=non-negative_number
Default = 3.0
Specifies the minimum LOD score for reported IBD.

ibdscale=non-negative_number
Default: data-dependent
Specifies the scale parameter used to build the haplotype frequency model for IBD analysis. If no
ibdscale parameter is specified, the scale parameter for the IBD analysis will be set to max{2,
sqrt[sample size]/100}, which we have found to work well for outbred populations.
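The default scale can be computed directly from the formula above. The function name default_ibdscale is hypothetical.

```python
import math


def default_ibdscale(sample_size):
    """Default IBD scale parameter: max{2, sqrt(sample size) / 100}."""
    return max(2.0, math.sqrt(sample_size) / 100.0)
```

Since sqrt(40000)/100 = 2, the default only rises above 2 for sample sizes larger than 40,000. For example, 250,000 samples give a scale of 5.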

ibdtrim=non-negative_integer
Default = 40
Specifies the number of markers trimmed from the end of a shared haplotype when testing for IBD.
Note: The default ibdtrim parameter is designed for European samples genotyped with a 1M SNP array
(~1 marker per 3 kb). For human SNP array data, it is recommended to set the ibdtrim parameter
to the typical number of markers in a 0.15 cM region. Pilot studies of randomly selected genomic
regions can be used to fine-tune the value of the ibdtrim parameter.
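The "typical number of markers in a 0.15 cM region" can be estimated from the cM positions of your own markers. This Python sketch takes the median count over windows that start at each marker; the function name and the choice of the median as "typical" are assumptions made for illustration.

```python
from bisect import bisect_left
from statistics import median


def typical_markers_in_region(positions_cm, width_cm=0.15):
    """Median number of markers in a width_cm window starting at each marker.

    `positions_cm` must be sorted. Windows that would run past the last
    marker are skipped so edge effects do not bias the count downward.
    """
    counts = []
    last = positions_cm[-1]
    for i, start in enumerate(positions_cm):
        end = start + width_cm
        if end > last:
            break
        # Markers with positions in [start, end).
        counts.append(bisect_left(positions_cm, end) - i)
    if not counts:
        raise ValueError("marker span is shorter than width_cm")
    return median(counts)
```

With markers spaced every 0.003 cM (about 1 marker per 3 kb at 1 cM/Mb), the result is about 50 markers per 0.15 cM.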

AUTHOR

       Beagle was written by Brian L. Browning.

       This  manual  page  was written by Dylan Aïssi <bob.dybian@gmail.com>, for the Debian project (but may be
       used by others).

4.1                                               February 2016                                        Beagle(1)
