Provided by: ncbi-entrez-direct_12.0.20190816+ds-1ubuntu0.2_amd64 

NAME
xtract - convert XML into a table of data values
SYNOPSIS
xtract [-help] [-strict] [-mixed] [-accent] [-ascii] [-compress] [-stops] [-input filename]
[-transform filename] [-pattern expr] [-group expr] [-block expr] [-subset expr] [-path path]
[-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else]
[-position pos] [-equals str] [-contains str] [-is-within str] [-starts-with str] [-ends-with str]
[-is-not str] [-is-equal-to expr] [-differs-from expr] [-gt N] [-ge N] [-lt N] [-le N] [-eq N] [-ne N]
[-ret str] [-tab str] [-sep str] [-pfx str] [-sfx str] [-plg str] [-elg str] [-rst] [-clr] [-pfc str]
[-deq str] [-wrp tag] [-def str] [-lbl str] [-element element] [-first element] [-last element] [-NAME]
[-num element] [-len element] [-sum element] [-min element] [-max element] [-inc element] [-dec element]
[-sub element] [-avg element] [-dev element] [-med element] [-bin element] [-bit element]
[-encode element] [-plain element] [-upper element] [-lower element] [-title element] [-year element]
[-translate element] [-terms element] [-words element] [-pairs element] [-reverse element]
[-letters element] [-clauses element] [-indices element] [-e2index] [-revcomp] [-nucleic]
[-0-based element] [-1-based element] [-ucsc-based element] [-insd arg ...] [-head str] [-tail str]
[-hd str] [-tl str] [-require str] [-exclude str] [-format fmt] [-unicode style] [-script style]
[-mathml terse] [-filter element action target] [-verify] [-outline] [-synopsis] [-select condition]
[-in filename] [-j2x] [-set tag] [-rec tag] [-nest flat|recurse|plural|depth] [-examples] [-version]
DESCRIPTION
xtract converts an XML document into a table of data values according to user-specified rules.
OPTIONS
Processing Flags
-strict
Remove HTML and MathML tags.
-mixed Allow mixed content XML.
-accent
Delete Unicode accents and diacritical marks.
-ascii Convert Unicode to numeric HTML character entities.
-compress
Compress runs of spaces.
-stops Retain stop words in selected phrases.
Data Source
-input filename
Read XML from file instead of standard input.
-transform filename
File of substitutions for -translate.
Exploration Argument Hierarchy
-pattern expr
-group expr
-block expr
-subset expr
Name of record within set. Use of different argument names allows command-line control of nested
looping.
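The nested looping these arguments control can be pictured with a short Python sketch. It is an illustration only, not xtract's implementation: the sample XML, the function name rows, its parameters, and the comma used between group members are all assumptions made for the example. Each -pattern match yields one output line, and each -block match inside it contributes one field.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
SAMPLE = (
    "<Set>"
    "<Rec><Id>1</Id><Authors><Author>Ann</Author><Author>Bob</Author></Authors></Rec>"
    "<Rec><Id>2</Id><Authors><Author>Cy</Author></Authors></Rec>"
    "</Set>"
)

def rows(xml_text, pattern, first, block, element, tab="\t", sep=","):
    """Outer loop over `pattern` records, inner loop over `block` objects."""
    root = ET.fromstring(xml_text)
    lines = []
    for rec in root.iter(pattern):            # one output line per pattern match
        fields = [rec.findtext(first, default="")]
        for blk in rec.iter(block):           # one field per block match
            fields.append(sep.join(e.text or "" for e in blk.iter(element)))
        lines.append(tab.join(fields))
    return lines

for line in rows(SAMPLE, "Rec", "Id", "Authors", "Author"):
    print(line)
```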
Path Navigation
-path path
Explore by list of adjacent object names.
Exploration Constructs
Object          DateRevised
Parent/Child    Book/AuthorList
Path            MedlineCitation/Article/Journal/JournalIssue/PubDate
Heterogeneous   "PubmedArticleSet/*"
Exhaustive      "History/**"
Nested          "*/Taxon"
Recursive       "**/Gene-commentary"
Conditional Execution
-if expr [constraint]
Element (or @attribute) must exist and satisfy any specified constraint.
-unless expr [constraint]
Skip if element matches.
-and condition
Preceding and following tests must both pass.
-or condition
Any passing test suffices.
-else Execute if conditional test failed.
-position pos
first/last/outer/inner/even/odd/all.
String Constraints
-equals str
String must match exactly.
-contains str
Substring must be present.
-is-within str
Value must occur within the given string.
-starts-with str
Substring must be at beginning.
-ends-with str
Substring must be at end.
-is-not str
String must not match.
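A rough Python model of these tests may help when reading the list above. The function name check is hypothetical, the case folding follows the NOTES section, and the reading of -is-within (the element value occurs inside the argument, the reverse of -contains) is an assumption rather than something this page states.

```python
def check(value, constraint, arg):
    """Apply one string constraint, ignoring case."""
    v, a = value.lower(), arg.lower()
    tests = {
        "equals": v == a,
        "contains": a in v,
        "is-within": v in a,        # assumed reading: value lies inside the argument
        "starts-with": v.startswith(a),
        "ends-with": v.endswith(a),
        "is-not": v != a,
    }
    return tests[constraint]

print(check("Homo sapiens", "contains", "SAPIENS"))
```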
Object Constraints
-is-equal-to expr
Object values must match.
-differs-from expr
Object values must differ.
Numeric Constraints
-gt N Greater than.
-ge N Greater than or equal to.
-lt N Less than.
-le N Less than or equal to.
-eq N Equal to.
-ne N Not equal to.
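The six comparisons map directly onto integer operators, as this small Python sketch shows. The name numeric_test is hypothetical, and treating a non-integer value as a failed test is an assumption of the sketch, not documented xtract behavior.

```python
import operator

OPS = {
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "eq": operator.eq, "ne": operator.ne,
}

def numeric_test(value, op, n):
    """Compare an element value with N using integer arithmetic."""
    try:
        return OPS[op](int(value), int(n))
    except ValueError:
        return False    # assumption: a non-integer value fails the test

print(numeric_test("12", "ge", 10))
```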
Format Customization
-ret str
Override line break between patterns.
-tab str
Replace tab character between fields.
-sep str
Separator between group members.
-pfx str
Prefix to print before group.
-sfx str
Suffix to print after group.
-plg str
Prologue to print once before elements.
-elg str
Epilogue to print once after elements.
-rst Reset -sep through -elg.
-clr Clear queued tab separator.
-pfc str
Preface combines -clr and -pfx.
-deq str
Delete and replace queued tab separator.
-wrp tag
Wrap elements in XML object.
-def str
Default placeholder for missing fields.
-lbl str
Insert arbitrary text.
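How the most common of these settings combine can be sketched in Python. This is a simplified model under stated assumptions, not xtract's code: the function name format_record is hypothetical, tab is assumed as the default for both -tab and -sep, a missing field is assumed to be skipped unless -def supplies a placeholder, and the placeholder is assumed to be wrapped in -pfx and -sfx like any other value.

```python
def format_record(groups, tab="\t", sep="\t", pfx="", sfx="", default=None):
    """groups holds one list of values per -element argument."""
    fields = []
    for values in groups:
        if not values:
            if default is None:
                continue            # no -def: the missing field is skipped
            values = [default]      # -def: placeholder stands in for the field
        fields.append(pfx + sep.join(values) + sfx)
    return tab.join(fields)

print(format_record([["1"], ["Ann", "Bob"], []], sep=", ", default="-"))
```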
Element Selection
-element element
Print all items that match tag name.
-first element
Only print value of first item.
-last element
Only print value of last item.
-NAME Record value in named variable.
-element Constructs
Tag             Caption
Group           Initials,LastName
Parent/Child    MedlineCitation/PMID
Recursive       "**/Gene-commentary_accession"
Unrestricted    PubDate/*
Attribute       DescriptorName@MajorTopicYN
Range           MedlineDate[1:4]
Substring       "Title[phospholipase | rattlesnake]"
Object Count    "#Author"
Item Length     "%Title"
Element Depth   "^PMID"
Variable        "&NAME"
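Three of these constructs (Range, Object Count, and Item Length) are easy to model in Python. The sketch below is illustrative only: the sample record and function names are invented, and reading the range [1:4] as 1-based and inclusive (so that it yields the first four characters) is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this illustration.
RECORD = ET.fromstring(
    "<Rec><Title>Snake venom</Title>"
    "<Author>Ann</Author><Author>Bob</Author>"
    "<MedlineDate>1998 Dec-1999 Jan</MedlineDate></Rec>"
)

def char_range(text, first, last):
    """Range, e.g. MedlineDate[1:4]: assumed 1-based and inclusive."""
    return text[first - 1:last]

def object_count(record, tag):
    """Object Count, e.g. "#Author": number of matching objects."""
    return sum(1 for _ in record.iter(tag))

def item_length(record, tag):
    """Item Length, e.g. "%Title": character length of the value."""
    return len(record.findtext(tag, default=""))

print(char_range(RECORD.findtext("MedlineDate"), 1, 4))
print(object_count(RECORD, "Author"))
print(item_length(RECORD, "Title"))
```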
Special -element Operations
Parent Index "+"
Object Name "?"
XML Subtree "*"
Children "$"
Attributes "@"
Numeric Processing
-num element
Count.
-len element
Length.
-sum element
Sum.
-min element
Minimum.
-max element
Maximum.
-inc element
Increment.
-dec element
Decrement.
-sub element
Difference.
-avg element
Average.
-dev element
Deviation.
-med element
Median.
-bin element
Binary.
-bit element
Bit count.
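The simpler numeric operations can be modeled in a few lines of Python. The function name summarize is hypothetical, and whether each operation yields one result per item or one per record is an assumption of the sketch. -sub, -avg, -dev, and -med are left out because this page does not specify their operand order or rounding.

```python
def summarize(values):
    """Integer-only model of several numeric operations on one element's values."""
    nums = [int(v) for v in values]
    return {
        "num": len(values),                       # count of items
        "len": [len(v) for v in values],          # length of each item
        "sum": sum(nums),
        "min": min(nums),
        "max": max(nums),
        "inc": [n + 1 for n in nums],
        "dec": [n - 1 for n in nums],
        "bin": [format(n, "b") for n in nums],    # binary representation
        "bit": [bin(n).count("1") for n in nums], # number of set bits
    }

print(summarize(["3", "5", "10"]))
```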
String Processing
-encode element
Escape <, >, &, ", and ' characters as XML entities.
-plain element
Remove embedded mixed-content markup tags.
-upper element
Convert text to uppercase.
-lower element
Convert text to lowercase.
-title element
Capitalize initial letters of words.
-year element
Extract first 4-digit year from string.
-translate element
Substitute values with -transform table.
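Two of these operations, -year and -title, are sketched below in Python to make their descriptions concrete. The function names are hypothetical. Requiring the four digits to stand alone (not inside a longer number) and leaving non-initial letters untouched are assumptions of the sketch.

```python
import re

def first_year(text):
    """Return the first run of exactly four digits, or an empty string."""
    match = re.search(r"(?<!\d)\d{4}(?!\d)", text)
    return match.group(0) if match else ""

def title_case(text):
    """Capitalize the initial letter of each space-separated word."""
    return " ".join(w[:1].upper() + w[1:] for w in text.split())

print(first_year("1998 Dec-1999 Jan"))
print(title_case("snake venom phospholipase"))
```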
Text Processing
-terms element
Partition text at spaces.
-words element
Split at punctuation marks.
-pairs element
Adjacent informative words.
-reverse element
Reverse words in string.
-letters element
Separate individual letters.
-clauses element
Break at phrase separators.
-indices element
Word pair index generation.
-e2index
Create Entrez index XML.
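The difference between -terms, -words, -reverse, and -letters is easiest to see side by side. The Python sketch below is a simplified model with hypothetical function names. Lower-casing in words follows the NOTES section, while treating every non-alphanumeric character as a split point is an assumption.

```python
import re

def terms(text):
    """Partition at spaces only; punctuation stays attached."""
    return text.split()

def words(text):
    """Split at spaces and punctuation, converting to lower case."""
    return [w for w in re.split(r"[^0-9A-Za-z]+", text.lower()) if w]

def reverse_words(text):
    """Reverse the order of the words in the string."""
    return " ".join(reversed(text.split()))

def letters(text):
    """Separate the string into individual characters."""
    return list(text)

sample = "Tumor-necrosis factor, alpha"
print(terms(sample))
print(words(sample))
print(reverse_words(sample))
```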
Sequence Processing
-revcomp
Reverse-complement nucleotide sequence.
-nucleic
Subrange determines forward or revcomp.
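For reference, reverse-complementing a nucleotide sequence means complementing each base and reversing the result. A minimal Python sketch follows. The function name is hypothetical, and only A, C, G, T, and U are handled; IUPAC ambiguity codes pass through unchanged in this simplified version.

```python
# Complement table for unambiguous bases; U is treated like T on input.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")

def revcomp(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

print(revcomp("ATGCC"))
```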
Sequence Coordinates
-0-based element
Zero-based.
-1-based element
One-based.
-ucsc-based element
Half-open.
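The three conventions describe the same interval with different numbers. The Python sketch below converts a one-based, closed interval into the other two forms. It illustrates the conventions in general, with hypothetical function names, and makes no claim about which convention xtract assumes for its input.

```python
def one_to_zero_based(start, stop):
    """One-based closed interval to zero-based closed interval."""
    return start - 1, stop - 1

def one_to_half_open(start, stop):
    """One-based closed interval to half-open (zero-based start, exclusive stop)."""
    return start - 1, stop

start, stop = 11, 20                      # one-based, both ends included
print(one_to_zero_based(start, stop))
print(one_to_half_open(start, stop))
```

In the half-open form the interval length is simply stop minus start, which is one reason that convention is popular for genome browsers.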
Command Generator
-insd arg ...
Generate INSDSeq extraction commands. Print them if invoked standalone; run them if invoked as
part of a pipeline. Requires one or more arguments, which may appear in the following order:
Descriptor(s) INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]
Completeness complete/partial
Feature(s) CDS/mRNA/...[,...]
Qualifier(s) INSDFeature_key/"#INSDInterval"/gene/product/sub_sequence/... [...]
Miscellaneous
-head str
Print before everything else.
-tail str
Print after everything else.
-hd str
Print before each record.
-tl str
Print after each record.
Phrase Filtering
-require str
Keep records that contain a given phrase.
-exclude str
Keep records that do not contain a given phrase.
Reformatting
-format fmt
copy Fast block copy (still applies processing flags).
compact Compress runs of spaces.
flush Suppress line indentation.
indent Indent according to nesting depth.
expand Place each attribute on a separate line.
-unicode style
How to handle Unicode superscript and subscript digits (first converted to ASCII form in all
cases).
fuse Run them all together, with no additional markup.
space Add spaces between digits in different positions.
period Add periods between digits in different positions.
brackets Surround superscripts by square brackets and subscripts by parentheses.
markdown Surround superscripts with carets and subscripts with tildes.
slash Add backslashes when going up in height and forward slashes when going down.
tag Put superscripts in XML sup elements and subscripts in sub elements.
-script style
How to handle XML sup and sub elements (denoting superscripts and subscripts, respectively).
brackets Surround superscripts by square brackets and subscripts by parentheses.
markdown Surround superscripts with carets and subscripts with tildes.
-mathml terse
Flatten MathML markup tersely.
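The two -script styles can be pictured as simple rewrites of sup and sub elements. The following Python sketch is an approximation with a hypothetical function name. It uses regular expressions on attribute-free, non-nested tags, which is a simplifying assumption and not how xtract parses XML.

```python
import re

STYLES = {
    # style: ((superscript open, close), (subscript open, close))
    "brackets": (("[", "]"), ("(", ")")),
    "markdown": (("^", "^"), ("~", "~")),
}

def script(text, style):
    """Rewrite <sup> and <sub> elements according to the chosen style."""
    (sup_l, sup_r), (sub_l, sub_r) = STYLES[style]
    text = re.sub(r"<sup>(.*?)</sup>", lambda m: sup_l + m.group(1) + sup_r, text)
    text = re.sub(r"<sub>(.*?)</sub>", lambda m: sub_l + m.group(1) + sub_r, text)
    return text

sample = "Ca<sup>2+</sup> and H<sub>2</sub>O"
print(script(sample, "brackets"))
print(script(sample, "markdown"))
```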
Modification
-filter element action target
Actions:
retain Keep matching elements (no-op).
remove Remove matching elements.
encode HTML-escape special characters.
decode Decode HTML escapes.
shrink Compress runs of spaces.
expand Place each attribute on a separate line.
accent Strip off Unicode accents.
Targets:
content Plain-text content.
cdata CDATA blocks.
comment Comments.
object The whole object.
attributes Attributes.
container Start and end tags.
Validation
-verify
Report XML data integrity problems.
Summary
-outline
Display outline of XML structure.
-synopsis
Display count of unique XML paths.
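The idea behind counting unique XML paths can be sketched in Python as a recursive walk that tallies each slash-separated path from the root. The function name, the sample document, and the path format are assumptions of this illustration; the actual -synopsis output layout is not described here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def path_counts(xml_text):
    """Count how often each root-to-element path occurs."""
    counts = Counter()

    def walk(node, prefix):
        path = prefix + "/" + node.tag if prefix else node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts

# Hypothetical sample data, invented for this illustration.
DOC = "<Set><Rec><Id>1</Id></Rec><Rec><Id>2</Id><Name>x</Name></Rec></Set>"
for path, n in sorted(path_counts(DOC).items()):
    print(n, path)
```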
Record Selection
-select condition
Select record subset by conditions.
-in filename
File of identifiers to select.
Data Conversion
-j2x Convert JSON stream to XML suitable for -path navigation.
-set tag
Replace set wrapper tag.
-rec tag
Replace record wrapper tag.
-nest flat|recurse|plural|depth
Nested array naming policy.
Documentation
-help Print usage information and some example argument combinations.
-examples
Complete examples of edirect(1) and xtract usage.
-version
Print version number.
NOTES
String constraints use case-insensitive comparisons.
Numeric constraints and selection arguments use integer values.
-num and -len selections are synonyms for Object Count (#) and Item Length (%).
-words, -pairs, and -indices convert to lower case.
SEE ALSO
download-ncbi-data(1), edirect(1), esample(1), index-bioc(1), index-pubmed(1), pm-index(1), pm-invert(1),
pm-stash(1), rchive(1), transmute(1), xml2tbl(1), xy-plot(1).
NCBI 2020-02-02 XTRACT(1)