Provided by: ncbi-entrez-direct_14.6.20210224+dfsg-5ubuntu0.3_amd64
NAME
xtract - NCBI Entrez Direct XML conversion and transformation tool
SYNOPSIS
xtract [-help] [-strict] [-mixed] [-accent] [-ascii] [-compress] [-stops] [-input filename] [-transform filename] [-pattern expr] [-group expr] [-block expr] [-subset expr] [-path path] [-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else] [-position pos] [-equals str] [-contains str] [-is-within str] [-starts-with str] [-ends-with str] [-is-not str] [-is-before str] [-is-after str] [-matches str] [-resembles str] [-is-equal-to expr] [-differs-from expr] [-gt N] [-ge N] [-lt N] [-le N] [-eq N] [-ne N] [-ret str] [-tab str] [-sep str] [-pfx str] [-sfx str] [-rst] [-clr] [-pfc str] [-deq str] [-def str] [-lbl str] [-set tag] [-rec tag] [-wrp tag] [-enc tag] [-plg str] [-elg str] [-pkg tag] [-fwd str] [-awd str] [-element element] [-first element] [-last element] [-NAME] [--STATS] [-num element] [-len element] [-sum element] [-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element] [-dev element] [-med element] [-mul element] [-div element] [-mod element] [-bin element] [-bit element] [-encode element] [-plain element] [-upper element] [-lower element] [-chain element] [-title element] [-year element] [-doi element] [-translate element] [-terms element] [-words element] [-pairs element] [-order element] [-reverse element] [-letters element] [-clauses element] [-replace -reg target -exp replacement] [-revcomp] [-nucleic] [-fasta] [-ncbi2na] [-ncbi4na] [-molwt] [-0-based element] [-1-based element] [-ucsc-based element] [-insd arg ...] [-histogram] [-e2index] [-indices element] [-head str] [-tail str] [-hd str] [-tl str] [-select condition] [-in filename] [-sort element] [-format fmt [-unicode style]] [-verify] [-outline] [-synopsis] [-contour [delimiter]] [-examples] [-unix] [-version]
DESCRIPTION
xtract converts an XML document into a table of data values according to user-specified rules.
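As a rough illustration of this conversion, the Python sketch below mimics what a command such as "xtract -pattern Rec -element Id Name" is expected to produce: one line per record, with values separated by tabs. The function xml_to_rows, the sample XML, and its tag names are hypothetical. This is not xtract's implementation, and it ignores every other option.

```python
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text, pattern, elements, tab="\t", ret="\n"):
    """Approximate '-pattern PATTERN -element E1 E2 ...' (assumed semantics)."""
    root = ET.fromstring(xml_text)
    rows = []
    # Each object named by the pattern becomes one output row.
    for record in root.iter(pattern):
        values = []
        for name in elements:
            # Collect every item inside the record that matches the tag name.
            values.extend((node.text or "").strip() for node in record.iter(name))
        rows.append(tab.join(values))
    return "".join(row + ret for row in rows)


# Hypothetical sample input, not taken from any NCBI record.
SAMPLE = """<Set>
  <Rec><Id>1</Id><Name>alpha</Name></Rec>
  <Rec><Id>2</Id><Name>beta</Name></Rec>
</Set>"""

if __name__ == "__main__":
    print(xml_to_rows(SAMPLE, "Rec", ["Id", "Name"]), end="")
```

The tab and ret parameters stand in for the separators that -tab and -ret override.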
OPTIONS
Processing Flags
  -strict                     Remove HTML and MathML tags.
  -mixed                      Allow mixed content XML.
  -accent                     Delete Unicode accents and diacritical marks.
  -ascii                      Convert Unicode to numeric HTML character entities.
  -compress                   Compress runs of spaces.
  -stops                      Retain stop words in selected phrases.

Data Source
  -input filename             Read XML from file instead of standard input.
  -transform filename         File of substitutions for -translate.

Exploration Argument Hierarchy
  -pattern expr
  -group expr
  -block expr
  -subset expr                Name of record within set. Use of different argument
                              names allows command-line control of nested looping.

Path Navigation
  -path path                  Explore by list of adjacent object names.

Exploration Constructs
  Object                      DateRevised
  Parent/Child                Book/AuthorList
  Path                        MedlineCitation/Article/Journal/JournalIssue/PubDate
  Heterogeneous               "PubmedArticleSet/*"
  Exhaustive                  "History/**"
  Nested                      "*/Taxon"
  Recursive                   "**/Gene-commentary"

Conditional Execution
  -if expr [constraint]       Element (or @attribute) must exist and satisfy any specified constraint.
  -unless expr [constraint]   Skip if element matches.
  -and condition              Preceding and following tests must both pass.
  -or condition               Any passing test suffices.
  -else                       Execute if conditional test failed.
  -position pos               first/last/outer/inner/even/odd/all.

String Constraints
  -equals str                 String must match exactly.
  -contains str               Substring must be present.
  -is-within str              String must be present.
  -starts-with str            Substring must be at beginning.
  -ends-with str              Substring must be at end.
  -is-not str                 String must not match.
  -is-before str              First string < second string.
  -is-after str               First string > second string.
  -matches str                Matches without commas or semicolons.
  -resembles str              Requires all words, but in any order.

Object Constraints
  -is-equal-to expr           Object values must match.
  -differs-from expr          Object values must differ.

Numeric Constraints
  -gt N                       Greater than.
  -ge N                       Greater than or equal to.
  -lt N                       Less than.
  -le N                       Less than or equal to.
  -eq N                       Equal to.
  -ne N                       Not equal to.
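
The string constraints above can be read as simple predicates on an element's value. The Python sketch below is one hedged interpretation of a few of them, assuming the case-insensitive comparison described under NOTES. The function name satisfies is hypothetical, and -is-within, -matches, and -resembles are left out because their exact rules are not spelled out here.

```python
def satisfies(value, constraint, target):
    """Evaluate a subset of xtract-style string constraints (assumed semantics)."""
    # Comparisons are made case-insensitively, per the NOTES section.
    v, t = value.lower(), target.lower()
    checks = {
        "-equals": lambda: v == t,
        "-contains": lambda: t in v,
        "-starts-with": lambda: v.startswith(t),
        "-ends-with": lambda: v.endswith(t),
        "-is-not": lambda: v != t,
        "-is-before": lambda: v < t,   # element value sorts before target
        "-is-after": lambda: v > t,    # element value sorts after target
    }
    if constraint not in checks:
        raise ValueError(f"constraint not covered by this sketch: {constraint}")
    return checks[constraint]()


if __name__ == "__main__":
    print(satisfies("Homo sapiens", "-starts-with", "homo"))
```

A test such as "-if Organism -starts-with homo" would, under these assumptions, pass for a record whose Organism value is "Homo sapiens".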

Format Customization
  -ret str                    Override line break between patterns.
  -tab str                    Replace tab character between fields.
  -sep str                    Separator between group members.
  -pfx str                    Prefix to print before group.
  -sfx str                    Suffix to print after group.
  -rst                        Reset -sep through -elg.
  -clr                        Clear queued tab separator.
  -pfc str                    Preface combines -clr and -pfx.
  -deq str                    Delete and replace queued tab separator.
  -def str                    Default placeholder for missing fields.
  -lbl str                    Insert arbitrary text.

XML Generation
  -set tag                    XML tag for entire set.
  -rec tag                    XML tag for each record.
  -wrp tag                    Wrap elements in XML object.
  -enc tag                    Encase instance in XML object.
  -plg str                    Prologue to print before instance.
  -elg str                    Epilogue to print after instance.
  -pkg tag                    Package subset in XML object.
  -fwd str                    Foreword to print before subset.
  -awd str                    Afterword to print after subset.

Element Selection
  -element element            Print all items that match tag name.
  -first element              Only print value of first item.
  -last element               Only print value of last item.
  -NAME                       Record value in named variable.
  --STATS                     Accumulate values into variable.

-element Constructs
  Tag                         Caption
  Group                       Initials,LastName
  Parent/Child                MedlineCitation/PMID
  Recursive                   "**/Gene-commentary_accession"
  Unrestricted                PubDate/*
  Attribute                   DescriptorName@MajorTopicYN
  Range                       MedlineDate[1:4]
  Substring                   "Title[phospholipase | rattlesnake]"
  Object Count                "#Author"
  Item Length                 "%Title"
  Element Depth               "^PMID"
  Variable                    "&NAME"

Special -element Operations
  Parent Index                "+"
  Object Name                 "+"
  XML Subtree                 "*"
  Children                    "$"
  Attributes                  "@"

Numeric Processing
  -num element                Count.
  -len element                Length.
  -sum element                Sum.
  -min element                Minimum.
  -max element                Maximum.
  -inc element                Increment.
  -dec element                Decrement.
  -sub element                Difference.
  -avg element                Average.
  -dev element                Deviation.
  -med element                Median.
  -mul element                Product.
  -div element                Quotient.
  -mod element                Remainder.
  -bin element                Binary.
  -bit element                Bit count.

String Processing
  -encode element             XML-encode <, >, &, ", and ' characters.
  -plain element              Remove embedded mixed-content markup tags.
  -upper element              Convert text to uppercase.
  -lower element              Convert text to lowercase.
  -chain element              Change spaces to underscores.
  -title element              Capitalize initial letters of words.
  -year element               Extract first 4-digit year from string.
  -doi element                Add https://doi.org/ prefix, URL encode.
  -translate element          Substitute values with -transform table.

Text Processing
  -terms element              Partition text at spaces.
  -words element              Split at punctuation marks.
  -pairs element              Adjacent informative words.
  -order element              Rearrange words in sorted order.
  -reverse element            Reverse words in string.
  -letters element            Separate individual letters.
  -clauses element            Break at phrase separators.

Regular Expression
  -replace                    Substitute text using regular expressions.
  -reg target                 Target expression.
  -exp pattern                Replacement pattern.

Sequence Processing
  -revcomp                    Reverse complement nucleotide sequence.
  -nucleic                    Subrange determines forward or revcomp.
  -fasta                      Split sequence into blocks of 50 letters.
  -ncbi2na                    Expand ncbi2na to IUPAC. (May need to truncate result to actual sequence length.)
  -ncbi4na                    Expand ncbi4na to IUPAC. (May need to truncate result to actual sequence length.)
  -molwt                      Calculate molecular weight of peptide.

Sequence Coordinates
  -0-based element            Zero-based.
  -1-based element            One-based.
  -ucsc-based element         Half-open.

Command Generator
  -insd arg ...               Generate INSDSeq extraction commands. Print them if invoked
                              standalone; run them if invoked as part of a pipeline.
                              Requires one or more arguments, which may appear in the
                              following order:

                              Descriptor(s)   INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]
                              Completeness    complete/partial
                              Feature(s)      CDS/mRNA/...[,...]
                              Qualifier(s)    INSDFeature_key/"#INSDInterval"/gene/product/feat_location/sub_sequence/... [...]

Frequency Table
  -histogram                  Collects data for sort-uniq-count(1) on entire set of records.

Entrez Indexing
  -e2index                    Create Entrez index XML.
  -indices element            Index normalized words.

Output Organization
  -head str                   Print before everything else.
  -tail str                   Print after everything else.
  -hd str                     Print before each record.
  -tl str                     Print after each record.

Record Selection
  -select condition           Select record subset by conditions.
  -in filename                File of identifiers to use for selection.

Record Rearrangement
  -sort element               Element to use as sort key.

Reformatting
  -format fmt
      copy                    Fast block copy (still applies processing flags).
      compact                 Compress runs of spaces.
      flush                   Suppress line indentation.
      indent                  Indent according to nesting depth.
      expand                  Place each attribute on a separate line.

Validation
  -verify                     Report XML data integrity problems.

Summary
  -outline                    Display outline of XML structure.
  -synopsis                   Display individual XML paths.
  -contour [delimiter]        Display XML paths to leaf nodes (delimited by / by default).

Documentation
  -help                       Print usage information and some example argument combinations.
  -examples                   Complete examples of edirect(1) and xtract usage.
  -unix                       Illustrate common Unix command arguments.
  -version                    Print version number.
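
The interplay of the -tab, -sep, and -def formatting options can be pictured with a small Python sketch. It assumes that -sep joins multiple values found for one element argument, that -tab joins the resulting fields, and that a field with no values is dropped unless -def supplies a placeholder. The function format_row and its parameter names are hypothetical, and the queued-separator behavior behind -clr, -pfc, and -deq is not modeled.

```python
def format_row(fields, tab="\t", sep="\t", default=None):
    """Join one record's fields the way -tab/-sep/-def are assumed to behave.

    fields is a list of lists: one inner list of values per element argument.
    """
    cells = []
    for values in fields:
        if values:
            # Members of the same group are joined with the -sep string.
            cells.append(sep.join(values))
        elif default is not None:
            # A missing field is replaced by the -def placeholder.
            cells.append(default)
    # Fields are joined with the -tab string.
    return tab.join(cells)


if __name__ == "__main__":
    row = format_row([["123"], ["Smith", "Jones"], []], sep=",", default="-")
    print(row)
```

Under these assumptions the call above corresponds loosely to supplying -sep "," and -def "-" ahead of three -element arguments, the last of which matches nothing in the record.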
NOTES
String constraints use case-insensitive comparisons. Numeric constraints and selection arguments use integer values. -num and -len selections are synonyms for Object Count (#) and Item Length (%). -words, -pairs, and -indices convert to lower case.
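
The Object Count (#) and Item Length (%) constructs mentioned above can be approximated in Python as follows. This is a sketch only: object_count and item_length are hypothetical helper names, the sample record is invented, and length is measured here in characters of the element's text, which is an assumption about how xtract counts.

```python
import xml.etree.ElementTree as ET


def object_count(record, name):
    """Rough analogue of "#Name" or -num: how many Name objects the record holds."""
    return sum(1 for _ in record.iter(name))


def item_length(record, name):
    """Rough analogue of "%Name" or -len: text length of each Name item."""
    return [len(node.text or "") for node in record.iter(name)]


# Hypothetical record used only for illustration.
RECORD = ET.fromstring(
    "<Rec><Title>Snake venom</Title><Author>A</Author><Author>B</Author></Rec>"
)

if __name__ == "__main__":
    print(object_count(RECORD, "Author"))
    print(item_length(RECORD, "Title"))
```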
SEE ALSO
download-ncbi-data(1), edirect(1), esample(1), index-extras(1), index-pubmed(1), pm-index(1), pm-invert(1), pm-stash(1), rchive(1), sort-uniq-count(1), transmute(1), xml2tbl(1), xy-plot(1).