Provided by: ncbi-entrez-direct_12.0.20190816+ds-1ubuntu0.2_amd64 

NAME
xtract - convert XML into a table of data values
SYNOPSIS
xtract [-help] [-strict] [-mixed] [-accent] [-ascii] [-compress] [-stops] [-input filename] [-transform filename] [-pattern expr] [-group expr] [-block expr] [-subset expr] [-path path] [-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else] [-position pos] [-equals str] [-contains str] [-is-within str] [-starts-with str] [-ends-with str] [-is-not str] [-is-equal-to expr] [-differs-from expr] [-gt N] [-ge N] [-lt N] [-le N] [-eq N] [-ne N] [-ret str] [-tab str] [-sep str] [-pfx str] [-sfx str] [-plg str] [-elg str] [-rst] [-clr] [-pfc str] [-deq str] [-wrp tag] [-def str] [-lbl str] [-element element] [-first element] [-last element] [-NAME] [-num element] [-len element] [-sum element] [-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element] [-dev element] [-med element] [-bin element] [-bit element] [-encode element] [-plain element] [-upper element] [-lower element] [-title element] [-year element] [-translate element] [-terms element] [-words element] [-pairs element] [-reverse element] [-letters element] [-clauses element] [-indices element] [-e2index] [-revcomp] [-nucleic] [-0-based element] [-1-based element] [-ucsc-based element] [-insd arg ...] [-head str] [-tail str] [-hd str] [-tl str] [-format fmt] [-unicode style] [-script style] [-mathml terse] [-filter element action target] [-verify] [-outline] [-synopsis] [-select condition] [-in filename] [-j2x] [-set tag] [-rec tag] [-nest flat|recurse|plural|depth] [-examples] [-version]
DESCRIPTION
xtract converts an XML document into a table of data values according to user-specified rules.
OPTIONS
Processing Flags

  -strict               Remove HTML and MathML tags.
  -mixed                Allow mixed content XML.
  -accent               Delete Unicode accents and diacritical marks.
  -ascii                Convert Unicode to numeric HTML character entities.
  -compress             Compress runs of spaces.
  -stops                Retain stop words in selected phrases.

Data Source

  -input filename       Read XML from file instead of standard input.
  -transform filename   File of substitutions for -translate.

Exploration Argument Hierarchy

  -pattern expr
  -group expr
  -block expr
  -subset expr
                        Name of record within set. Use of different argument
                        names allows command-line control of nested looping.

Path Navigation

  -path path            Explore by list of adjacent object names.

Exploration Constructs

  Object                DateRevised
  Parent/Child          Book/AuthorList
  Path                  MedlineCitation/Article/Journal/JournalIssue/PubDate
  Heterogeneous         "PubmedArticleSet/*"
  Exhaustive            "History/**"
  Nested                "*/Taxon"
  Recursive             "**/Gene-commentary"

Conditional Execution

  -if expr [constraint]
                        Element (or @attribute) must exist and satisfy any
                        specified constraint.
  -unless expr [constraint]
                        Skip if element matches.
  -and condition        Preceding and following tests must both pass.
  -or condition         Any passing test suffices.
  -else                 Execute if conditional test failed.
  -position pos         first/last/outer/inner/even/odd/all.

String Constraints

  -equals str           String must match exactly.
  -contains str         Substring must be present.
  -is-within str        String must be present.
  -starts-with str      Substring must be at beginning.
  -ends-with str        Substring must be at end.
  -is-not str           String must not match.

Object Constraints

  -is-equal-to expr     Object values must match.
  -differs-from expr    Object values must differ.

Numeric Constraints

  -gt N                 Greater than.
  -ge N                 Greater than or equal to.
  -lt N                 Less than.
  -le N                 Less than or equal to.
  -eq N                 Equal to.
  -ne N                 Not equal to.

Format Customization

  -ret str              Override line break between patterns.
  -tab str              Replace tab character between fields.
  -sep str              Separator between group members.
  -pfx str              Prefix to print before group.
  -sfx str              Suffix to print after group.
  -plg str              Prologue to print once before elements.
  -elg str              Epilogue to print once after elements.
  -rst                  Reset -sep through -elg.
  -clr                  Clear queued tab separator.
  -pfc str              Preface combines -clr and -pfx.
  -deq str              Delete and replace queued tab separator.
  -wrp tag              Wrap elements in XML object.
  -def str              Default placeholder for missing fields.
  -lbl str              Insert arbitrary text.

Element Selection

  -element element      Print all items that match tag name.
  -first element        Only print value of first item.
  -last element         Only print value of last item.
  -NAME                 Record value in named variable.

-element Constructs

  Tag                   Caption
  Group                 Initials,LastName
  Parent/Child          MedlineCitation/PMID
  Recursive             "**/Gene-commentary_accession"
  Unrestricted          PubDate/*
  Attribute             DescriptorName@MajorTopicYN
  Range                 MedlineDate[1:4]
  Substring             "Title[phospholipase | rattlesnake]"
  Object Count          "#Author"
  Item Length           "%Title"
  Element Depth         "^PMID"
  Variable              "&NAME"

Special -element Operations

  Parent Index          "+"
  Object Name           "+"
  XML Subtree           "*"
  Children              "$"
  Attributes            "@"

Numeric Processing

  -num element          Count.
  -len element          Length.
  -sum element          Sum.
  -min element          Minimum.
  -max element          Maximum.
  -inc element          Increment.
  -dec element          Decrement.
  -sub element          Difference.
  -avg element          Average.
  -dev element          Deviation.
  -med element          Median.
  -bin element          Binary.
  -bit element          Bit count.

String Processing

  -encode element       URL-encode <, >, &, ", and ' characters.
  -plain element        Remove embedded mixed-content markup tags.
  -upper element        Convert text to uppercase.
  -lower element        Convert text to lowercase.
  -title element        Capitalize initial letters of words.
  -year element         Extract first 4-digit year from string.
  -translate element    Substitute values with -transform table.

Text Processing

  -terms element        Partition text at spaces.
  -words element        Split at punctuation marks.
  -pairs element        Adjacent informative words.
  -reverse element      Reverse words in string.
  -letters element      Separate individual letters.
  -clauses element      Break at phrase separators.
  -indices element      Word pair index generation.
  -e2index              Create Entrez index XML.

Sequence Processing

  -revcomp              Reverse-complement nucleotide sequence.
  -nucleic              Subrange determines forward or revcomp.

Sequence Coordinates

  -0-based element      Zero-based.
  -1-based element      One-based.
  -ucsc-based element   Half-open.

Command Generator

  -insd arg ...         Generate INSDSeq extraction commands. Print them if
                        invoked standalone; run them if invoked as part of a
                        pipeline. Requires one or more arguments, which may
                        appear in the following order:

    Descriptor(s)       INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]
    Completeness        complete/partial
    Feature(s)          CDS/mRNA/...[,...]
    Qualifier(s)        INSDFeature_key/"#INSDInterval"/gene/product/sub_sequence/... [...]

Miscellaneous

  -head str             Print before everything else.
  -tail str             Print after everything else.
  -hd str               Print before each record.
  -tl str               Print after each record.

Phrase Filtering

  -require str          Keep records that contain a given phrase.
  -exclude str          Keep records that do not contain a given phrase.

Reformatting

  -format fmt
    copy                Fast block copy (still applies processing flags).
    compact             Compress runs of spaces.
    flush               Suppress line indentation.
    indent              Indent according to nesting depth.
    expand              Place each attribute on a separate line.

  -unicode style        How to handle Unicode superscript and subscript digits
                        (first converted to ASCII form in all cases).
    fuse                Run them all together, with no additional markup.
    space               Add spaces between digits in different positions.
    period              Add periods between digits in different positions.
    brackets            Surround superscripts by square brackets and
                        subscripts by parentheses.
    markdown            Surround superscripts with carets and subscripts with
                        tildes.
    slash               Add backslashes when going up in height and forward
                        slashes when going down.
    tag                 Put superscripts in XML sup elements and subscripts in
                        sub elements.

  -script style         How to handle XML sup and sub elements (denoting
                        superscripts and subscripts, respectively).
    brackets            Surround superscripts by square brackets and
                        subscripts by parentheses.
    markdown            Surround superscripts with carets and subscripts with
                        tildes.

  -mathml terse         Flatten MathML markup tersely.

Modification

  -filter element action target

  Actions:
    retain              Keep matching elements (no-op).
    remove              Remove matching elements.
    encode              HTML-escape special characters.
    decode              Decode HTML escapes.
    shrink              Compress runs of spaces.
    expand              Place each attribute on a separate line.
    accent              Strip off Unicode accents.

  Targets:
    content             Plain-text content.
    cdata               CDATA blocks.
    comment             Comments.
    object              The whole object.
    attributes          Attributes.
    container           Start and end tags.

Validation

  -verify               Report XML data integrity problems.

Summary

  -outline              Display outline of XML structure.
  -synopsis             Display count of unique XML paths.

Record Selection

  -select condition     Select record subset by conditions.
  -in filename          File of identifiers to select.

Data Conversion

  -j2x                  Convert JSON stream to XML suitable for -path
                        navigation.
  -set tag              Replace set wrapper tag.
  -rec tag              Replace record wrapper tag.
  -nest flat|recurse|plural|depth
                        Nested array naming policy.

Documentation

  -help                 Print usage information and some example argument
                        combinations.
  -examples             Complete examples of edirect(1) and xtract usage.
  -version              Print version number.
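
The two -script styles listed under Reformatting can be pictured with a small Python model. This is only an illustrative sketch and not xtract's own code: the restyle function and the STYLES table are hypothetical names, and the sketch assumes non-nested sup and sub elements.

```python
import re

# Hypothetical table modeling the two -script styles; not xtract's own code.
STYLES = {
    "brackets": {"sup": ("[", "]"), "sub": ("(", ")")},
    "markdown": {"sup": ("^", "^"), "sub": ("~", "~")},
}


def restyle(text: str, style: str) -> str:
    """Replace <sup>/<sub> elements with the delimiters of the chosen style."""
    marks = STYLES[style]

    def swap(match: re.Match) -> str:
        left, right = marks[match.group(1)]
        return left + match.group(2) + right

    # Handles non-nested elements only; nested markup is out of scope here.
    return re.sub(r"<(sup|sub)>(.*?)</\1>", swap, text)


print(restyle("H<sub>2</sub>O and x<sup>2</sup>", "brackets"))  # H(2)O and x[2]
print(restyle("H<sub>2</sub>O and x<sup>2</sup>", "markdown"))  # H~2~O and x^2^
```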
NOTES
String constraints use case-insensitive comparisons. Numeric constraints and selection arguments use integer values. -num and -len selections are synonyms for Object Count (#) and Item Length (%). -words, -pairs, and -indices convert to lower case.
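
As a rough illustration of the case-insensitive comparison rule, the following Python sketch models five of the string constraints from OPTIONS. It is an assumption-laden model rather than the tool's implementation: the matches function is a hypothetical name, and -is-within is left out because its description does not say which string must contain the other.

```python
# Hypothetical model of xtract's string constraints; not the tool's actual code.

def matches(value: str, constraint: str, target: str) -> bool:
    """Return True if value satisfies the named constraint against target."""
    # Per NOTES, string constraints compare without regard to case.
    v, t = value.lower(), target.lower()
    checks = {
        "-equals": v == t,
        "-contains": t in v,
        "-starts-with": v.startswith(t),
        "-ends-with": v.endswith(t),
        "-is-not": v != t,
    }
    return checks[constraint]


print(matches("Homo sapiens", "-equals", "HOMO SAPIENS"))   # True
print(matches("Homo sapiens", "-starts-with", "homo"))      # True
print(matches("Homo sapiens", "-is-not", "Mus musculus"))   # True
```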
SEE ALSO
download-ncbi-data(1), edirect(1), esample(1), index-bioc(1), index-pubmed(1), pm-index(1), pm-invert(1), pm-stash(1), rchive(1), transmute(1), xml2tbl(1), xy-plot(1).