Provided by: ncbi-entrez-direct_19.0.20230216+dfsg-2_amd64

NAME

       xtract - NCBI Entrez Direct XML conversion and transformation tool

SYNOPSIS

       xtract   [-help]  [-strict]  [-mixed]  [-self]  [-accent]  [-ascii]  [-compress]  [-stops]
       [-input filename] [-transform filename] [-aliases filename] [-pattern expr]  [-group expr]
       [-block expr]          [-subset expr]         [-path path]         [-if expr [constraint]]
       [-unless expr [constraint]]  [-and condition]  [-or condition]   [-else]   [-position pos]
       [-equals str]    [-contains str]   [-includes str]   [-is-within str]   [-starts-with str]
       [-ends-with str]    [-is-not str]    [-is-before str]    [-is-after str]    [-matches str]
       [-resembles str]  [-is-equal-to expr] [-differs-from expr] [-gt N] [-ge N] [-lt N] [-le N]
       [-eq N] [-ne N] [-ret str]  [-tab str]  [-sep str]  [-pfx str]  [-sfx str]  [-rst]  [-clr]
       [-pfc str]  [-deq str]  [-def str]  [-lbl str] [-set tag] [-rec tag] [-wrp tag] [-enc tag]
       [-plg str] [-elg str] [-pkg tag] [-fwd str] [-awd str] [-element element] [-first element]
       [-last element]   [-backward element]   [-NAME]  [--STATS]  [-num element]  [-len element]
       [-sum element] [-acc element] [-min element] [-max element] [-inc element]  [-dec element]
       [-sub element]  [-avg element] [-dev element] [-med element] [-mul element] [-div element]
       [-mod element] [-bin element] [-oct element] [-hex element] [-bit element]  [-pad element]
       [-encode element]   [-upper element]  [-lower element]  [-chain element]  [-title element]
       [-mirror element]  [-alnum element]  [-basic element]  [-plain element]  [-simple element]
       [-author element]   [-prose element]  [-terms element]  [-words element]  [-pairs element]
       [-order element] [-reverse element] [-letters element] [-clauses element]  [-year element]
       [-month element]   [-date element]   [-page element]  [-auth element]  [-initials element]
       [-jour element]   [-trim element]   [-wct element]   [-doi element]   [-translate element]
       [-classify element] [-replace -reg target -exp replacement] [-revcomp] [-nucleic] [-fasta]
       [-ncbi2na] [-ncbi4na] [-molwt] [-0-based element] [-1-based element] [-ucsc-based element]
       [-insd arg ...]  [-histogram]  [-e2index [extras]]  [-indices element]  [-article element]
       [-abstract element]  [-paragraph element]   [-stemmed element]   [-head str]   [-tail str]
       [-hd str]     [-tl str]     [-select condition]    [-in filename]    [-sort[-fwd] element]
       [-sort-rev element]  [-format fmt  [-unicode style]]  [-verify]   [-outline]   [-synopsis]
       [-contour [delimiter]] [-examples] [-unix] [-version]

DESCRIPTION

       xtract  converts  an  XML document into a table of data values according to user-specified
       rules.

OPTIONS

   Processing Flags
       -strict
              Remove HTML and MathML tags.

       -mixed Allow mixed content XML.

       -self  Allow detection of empty self-closing tags.

       -accent
              Delete Unicode accents and diacritical marks.

       -ascii Convert Unicode to numeric HTML character entities.

       -compress
              Compress runs of spaces.

       -stops Retain stop words in selected phrases.

   Data Source
       -input filename
              Read XML from file instead of standard input.

       -transform filename
              File of substitutions for -translate.

       -aliases filename
              Mappings file for -classify operation.

   Exploration Argument Hierarchy
       -pattern expr
       -group expr
       -block expr
       -subset expr
              Name of record within set.  Use of different  argument  names  allows  command-line
              control of nested looping.
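
       The nested looping described above can be pictured as ordinary nested loops over an
       XML tree. What follows is a minimal Python sketch of that idea, not xtract's own
       implementation. The sample XML and the function name tabulate are hypothetical, and
       the sketch assumes a tab between fields and a line break between records.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names follow the PubMed-style examples on this page.
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <PMID>1</PMID>
    <Author><LastName>Smith</LastName></Author>
    <Author><LastName>Jones</LastName></Author>
  </PubmedArticle>
  <PubmedArticle>
    <PMID>2</PMID>
    <Author><LastName>Lee</LastName></Author>
  </PubmedArticle>
</PubmedArticleSet>
"""

def tabulate(xml_text, pattern, block, outer_field, inner_field, tab="\t", ret="\n"):
    """Outer loop over `pattern` records, inner loop over `block` objects."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter(pattern):             # outer level, in the role of -pattern
        fields = [record.findtext(outer_field, default="")]
        for obj in record.iter(block):            # inner level, in the role of -block
            fields.append(obj.findtext(inner_field, default=""))
        lines.append(tab.join(fields))            # fields on one line, joined by `tab`
    return ret.join(lines)                        # one line per record, joined by `ret`

print(tabulate(SAMPLE, "PubmedArticle", "Author", "PMID", "LastName"))
```

       Each record yields one output line. An invocation along the lines of
       xtract -pattern PubmedArticle -element PMID -block Author -element LastName
       expresses the same two-level loop on the command line.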

   Path Navigation
       -path path
              Explore by list of adjacent object names.

   Exploration Constructs
       Object         DateRevised
       Parent/Child   Book/AuthorList
       Path           MedlineCitation/Article/Journal/JournalIssue/PubDate
       Heterogeneous  "PubmedArticleSet/*"
       Exhaustive     "History/**"
       Nested         "*/Taxon"

   Conditional Execution
       -if expr [constraint]
              Element (or @attribute) must exist and satisfy any specified constraint.

       -unless expr [constraint]
              Skip if element matches.

       -and condition
              Preceding and following tests must both pass.

       -or condition
              Any passing test suffices.

       -else  Execute if conditional test failed.

       -position pos
              first/last/outer/inner/even/odd/all.

   String Constraints
       -equals str
              String must match exactly.

       -contains str
              Substring must be present.

       -includes str
              Substring must match at word boundaries.

       -is-within str
              String must be present.

       -starts-with str
              Substring must be at beginning.

       -ends-with str
              Substring must be at end.

       -is-not str
              String must not match.

       -is-before str
              First string < second string.

       -is-after str
              First string > second string.

       -matches str
              Matches without commas or semicolons.

       -resembles str
              Requires all words, but in any order.
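
       Several of the string constraints above can be approximated by short predicates. The
       Python sketch below is illustrative only and does not reproduce xtract's code. It
       assumes case-insensitive comparison (see NOTES), and it reads -resembles as "every
       word of the argument occurs somewhere in the value", which is an interpretation. The
       helper names words and check are hypothetical.

```python
import re

def words(s):
    """Lowercased alphanumeric words of a string."""
    return re.findall(r"\w+", s.lower())

# Each predicate receives the element's text `v` and the constraint argument `s`.
CONSTRAINTS = {
    "equals":      lambda v, s: v.lower() == s.lower(),
    "contains":    lambda v, s: s.lower() in v.lower(),
    "starts-with": lambda v, s: v.lower().startswith(s.lower()),
    "ends-with":   lambda v, s: v.lower().endswith(s.lower()),
    "is-not":      lambda v, s: v.lower() != s.lower(),
    # Substring that begins and ends at word boundaries.
    "includes":    lambda v, s: re.search(r"\b" + re.escape(s) + r"\b", v, re.IGNORECASE) is not None,
    # Assumption: all words of the argument must appear, in any order.
    "resembles":   lambda v, s: set(words(s)) <= set(words(v)),
}

def check(value, constraint, argument):
    """Apply one named constraint to a value."""
    return CONSTRAINTS[constraint](value, argument)

print(check("tumor suppressor", "contains", "suppress"))   # plain substring test
print(check("tumor suppressor", "includes", "suppress"))   # word-boundary test
```

       The last two calls show the difference between -contains and -includes: the first
       accepts a partial word, the second does not.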

   Object Constraints
       -is-equal-to expr
              Object values must match.

       -differs-from expr
              Object values must differ.

   Numeric Constraints
       -gt N  Greater than.

       -ge N  Greater than or equal to.

       -lt N  Less than.

       -le N  Less than or equal to.

       -eq N  Equal to.

       -ne N  Not equal to.

   Format Customization
       -ret str
              Override line break between patterns.

       -tab str
              Replace tab character between fields.

       -sep str
              Separator between group members.

       -pfx str
              Prefix to print before group.

       -sfx str
              Suffix to print after group.

       -rst   Reset -sep through -elg.

       -clr   Clear queued tab separator.

       -pfc str
              Preface combines -clr and -pfx.

       -deq str
              Delete and replace queued tab separator.

       -def str
              Default placeholder for missing fields.

       -lbl str
              Insert arbitrary text.

   XML Generation
       -set tag
              XML tag for entire set.

       -rec tag
              XML tag for each record.

       -wrp tag
              Wrap elements in XML object.

       -enc tag
              Encase instance in XML object.

       -plg str
              Prologue to print before instance.

       -elg str
              Epilogue to print after instance.

       -pkg tag
              Package subset in XML object.

       -fwd str
              Foreword to print before subset.

       -awd str
              Afterword to print after subset.

   Element Selection
       -element element
              Print all items that match tag name.

       -first element
              Only print value of first item.

       -last element
              Only print value of last item.

       -backward element
              Print values in reverse order.

       -NAME  Record value in named variable.

       --STATS
              Accumulate values into variable.

   -element Constructs
       Tag            Caption
       Group          Initials,LastName
       Parent/Child   MedlineCitation/PMID
       Recursive      "**/Gene-commentary_accession"
       Unrestricted   PubDate/*
       Attribute      DescriptorName@MajorTopicYN
       Range          MedlineDate[1:4]
       Substring      "Title[phospholipase | rattlesnake]"
       Object Count   "#Author"
       Item Length    "%Title"
       Element Depth  "^PMID"
       Variable       "&NAME"

   Special -element Operations
       Parent Index   "+"
       Object Name    "?"
       Object Value   "~"
       XML Subtree    "*"
       Children       "$"
       Attributes     "@"
       ASN.1 Record   "."
       JSON Record    "%"

   Numeric Processing
       -num element
              Count.

       -len element
              Length.

       -sum element
              Sum.

       -acc element
              Accumulator.

       -min element
              Minimum.

       -max element
              Maximum.

       -inc element
              Increment.

       -dec element
              Decrement.

       -sub element
              Difference.

       -avg element
              Average.

       -dev element
              Deviation.

       -med element
              Median.

       -mul element
              Product.

       -div element
              Quotient.

       -mod element
              Remainder.

       -bin element
              Binary.

       -oct element
              Octal.

       -hex element
              Hexadecimal.

       -bit element
              Bit count.

       -pad element
              Zero-pad to eight digits.
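
       For orientation, a few of the numeric operations above correspond to the following
       integer arithmetic. This is a Python sketch under assumptions rather than a
       description of xtract's output: the function names aggregate and render are
       hypothetical, and details such as letter case or prefixes in hexadecimal output are
       not specified on this page.

```python
def aggregate(values):
    """Count, sum, minimum, and maximum of a list of integers."""
    return {
        "num": len(values),   # count of items
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
    }

def render(n):
    """Alternative renderings of a single non-negative integer."""
    return {
        "bin": format(n, "b"),        # binary digits, no prefix
        "oct": format(n, "o"),        # octal digits, no prefix
        "hex": format(n, "x"),        # hexadecimal digits, lowercase, no prefix
        "bit": bin(n).count("1"),     # number of set bits
        "pad": f"{n:08d}",            # zero-padded to eight digits
    }

print(aggregate([3, 10, 255]))
print(render(10))
```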

   Character Processing
       -encode element
              XML-encode <, >, &, ", and ' characters.

       -upper element
              Convert text to uppercase.

       -lower element
              Convert text to lowercase.

       -chain element
              Change spaces to underscores.

       -title element
              Capitalize initial letters of words.

       -mirror element
              Reverse order of letters.

       -alnum element
              Convert non-alphanumeric characters to spaces.
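
       The character operations above are simple per-string transformations. The Python
       sketch below shows one plausible reading of each; it is not taken from xtract. The
       function names are hypothetical, title leaves the remaining letters of each word
       unchanged, and alnum treats only ASCII letters and digits as alphanumeric, both of
       which are assumptions.

```python
import re
from xml.sax.saxutils import escape

def encode(s):
    """XML-encode <, >, &, double quote, and apostrophe."""
    return escape(s, {'"': "&quot;", "'": "&apos;"})

def chain(s):
    """Change spaces to underscores."""
    return s.replace(" ", "_")

def title(s):
    """Capitalize the initial letter of each space-separated word."""
    return " ".join(w[:1].upper() + w[1:] for w in s.split(" "))

def mirror(s):
    """Reverse the order of letters."""
    return s[::-1]

def alnum(s):
    """Replace every character that is not an ASCII letter or digit with a space."""
    return re.sub(r"[^0-9A-Za-z]", " ", s)

print(encode('a<b & "c"'))
print(chain("heat shock protein"))
print(alnum("IL-2/IL-4"))
```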

   String Processing
       -basic element
              Convert superscripts and subscripts.

       -plain element
              Remove embedded mixed-content markup tags.

       -simple element
              Normalize accented letters; spell Greek letters.

       -author element
              Multi-step author cleanup.

       -prose element
              Text conversion to ASCII.

   Text Processing
       -terms element
              Partition text at spaces.

       -words element
              Split at punctuation marks.

       -pairs element
              Adjacent informative words.

       -order element
              Rearrange words in sorted order.

       -reverse element
              Reverse words in string.

       -letters element
              Separate individual letters.

       -clauses element
              Break at phrase separators.

   Citation Functions
       -year element
              Extract first 4-digit year from string.

       -month element
              Match first month name and return a corresponding integer.

       -date element
              YYYY/MM/DD from -unit "PubDate" -date "*"

       -page element
              Get digits (and letters) of first page number.

       -auth element
              Change GenBank authors to Medline form.

       -initials element
              Parse initials from forename or given name.

       -jour element
              Clean up journal name punctuation.

       -trim element
              Remove extra spaces and leading zeros.

       -wct element
              Count number of -words in a string.

       -doi element
              Add https://doi.org/ prefix, URL encode.
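
       Three of the citation functions above lend themselves to a compact illustration. The
       following Python sketch is an approximation, not xtract's algorithm. The function
       names are hypothetical, month matches three-letter English abbreviations anywhere in
       the string, and doi percent-encodes everything except unreserved characters and the
       slash; each of those choices is an assumption.

```python
import re
from urllib.parse import quote

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def year(s):
    """First run of exactly four digits, or an empty string if there is none."""
    m = re.search(r"(?<!\d)\d{4}(?!\d)", s)
    return m.group(0) if m else ""

def month(s):
    """Integer 1-12 for the first month abbreviation found, or None."""
    m = re.search("|".join(MONTHS), s.lower())
    return MONTHS.index(m.group(0)) + 1 if m else None

def doi(s):
    """Prefix with https://doi.org/ and URL-encode the identifier."""
    return "https://doi.org/" + quote(s, safe="/")

print(year("2004 Jan-Feb"))
print(month("2004 Jan-Feb"))
print(doi("10.1000/a b"))
```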

   Value Transformation
       -translate element
              Substitute values with -transform table.

       -classify element
              Match words or phrases as substrings against the -aliases table.

   Regular Expression
       -replace
              Substitute text using regular expressions.
              -reg target        Target expression.
              -exp replacement   Replacement pattern.

   Sequence Processing
       -revcomp
              Reverse complement nucleotide sequence.

       -nucleic
              Subrange determines forward or revcomp.

       -fasta Split sequence into blocks of 70 uppercase letters.

       -ncbi2na
              Expand ncbi2na to IUPAC.  (May need to truncate result to actual sequence length.)

       -ncbi4na
              Expand ncbi4na to IUPAC.  (May need to truncate result to actual sequence length.)

       -molwt Calculate molecular weight of peptide.
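
       The sequence operations above follow well-known conventions, sketched here in Python
       for illustration. This is not xtract's code and the function names are hypothetical.
       The ncbi2na part assumes the standard packing (two bits per base, A=0, C=1, G=2, T=3,
       four bases per byte, first base in the high-order bits) and takes raw bytes; how
       xtract receives the packed data is not described on this page.

```python
# Complement table for unambiguous bases only; other letters are left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse the sequence, then complement each base."""
    return seq[::-1].translate(COMPLEMENT)

def fasta_blocks(seq, width=70):
    """Uppercase the sequence and cut it into lines of at most `width` letters."""
    seq = seq.upper()
    return [seq[i:i + width] for i in range(0, len(seq), width)]

def expand_ncbi2na(packed, length=None):
    """Unpack 2-bit codes into letters, four bases per byte, high bits first."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append("ACGT"[(byte >> shift) & 3])
    seq = "".join(bases)
    # Every byte yields four letters, so the true length is needed to drop padding.
    return seq if length is None else seq[:length]

print(revcomp("AACG"))
print(fasta_blocks("acgt" * 40))
print(expand_ncbi2na(bytes([0x1B, 0xC0]), length=5))
```

       The length argument shows why the expanded result may need truncating: a five-base
       sequence occupies two bytes, which unpack to eight letters.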

   Sequence Coordinates
       -0-based element
              Zero-based.

       -1-based element
              One-based.

       -ucsc-based element
              Half-open.
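
       The three coordinate conventions named above are related by fixed offsets. The Python
       sketch below states those relationships in general terms; it does not show which
       convention xtract assumes for its input, which this page leaves unspecified. The
       function names are hypothetical, and "half-open" is taken to mean a zero-based start
       with an exclusive stop.

```python
def one_based_to_zero_based(start, stop):
    """Closed 1-based interval to closed 0-based interval: shift both ends."""
    return start - 1, stop - 1

def one_based_to_half_open(start, stop):
    """Closed 1-based interval to 0-based half-open interval: shift only the start."""
    return start - 1, stop

def half_open_to_one_based(start, stop):
    """0-based half-open interval back to closed 1-based interval."""
    return start + 1, stop

def closed_length(start, stop):
    """Number of positions in a closed interval (either base)."""
    return stop - start + 1

def half_open_length(start, stop):
    """Number of positions in a half-open interval."""
    return stop - start

print(one_based_to_zero_based(100, 200))
print(one_based_to_half_open(100, 200))
```

       The interval length is the same in every convention, which gives a quick sanity check
       after converting.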

   Command Generator
       -insd arg ...
              Generate  INSDSeq  extraction commands.  Print them if invoked standalone; run them
              if invoked as part of a pipeline.  Requires one or more arguments, which may appear
              in the following order:

              Descriptor(s)  INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]

              Completeness   complete/partial

              Feature(s)     CDS/mRNA/...[,...]

              Qualifier(s)   INSDFeature_key/"#INSDInterval"/gene/product/
                             feat_location/sub_sequence/... [...]

   Frequency Table
       -histogram
              Collects data for sort-uniq-count(1) on entire set of records.

   Entrez Indexing
       -e2index [extras]
              Create Entrez index XML.  extras  (true  or  false;  false  by  default)  indicates
              whether to index extra fields.

       -indices element
              Index normalized words.

       -article element
              Title positional index.

       -abstract element
              Abstract positional index.

       -paragraph element
              Index text paragraphs.

       -stemmed element
              Apply Porter2 algorithm.

   Output Organization
       -head str
              Print before everything else.

       -tail str
              Print after everything else.

       -hd str
              Print before each record.

       -tl str
              Print after each record.

   Record Selection
       -select condition
              Select record subset by conditions.

       -in filename
              File of identifiers to use for selection.

   Record Rearrangement
       -sort[-fwd] element
              Element to use as sort key.

       -sort-rev element
              Sort records in reverse order.

   Reformatting
       -format fmt
              copy     Fast block copy (still applies processing flags).
              compact  Compress runs of spaces.
              flush    Suppress line indentation.
              indent   Indent according to nesting depth.
              expand   Place each attribute on a separate line.

   Validation
       -verify
              Report XML data integrity problems.

   Summary
       -outline
              Display outline of XML structure.

       -synopsis
              Display individual XML paths.

       -contour [delimiter]
              Display XML paths to leaf nodes (delimited by / by default).

   Documentation
       -help  Print usage information and some example argument combinations.

       -examples
              Complete usage examples, involving additional Entrez Direct tools.

       -unix  Illustrate common Unix command arguments.

       -version
              Print version number.

NOTES

       String constraints use case-insensitive comparisons.

       Numeric constraints and selection arguments use integer values.

       -num and -len selections are synonyms for Object Count (#) and Item Length (%).

       -words, -pairs, and -indices convert to lower case.

SEE ALSO

       archive-pmc(1),     archive-pubmed(1),    custom-index(1),    disambiguate-nucleotides(1),
       download-ncbi-data(1),    ds2pme(1),    esample(1),     fetch-pmc(1),     fetch-pubmed(1),
       find-in-gene(1),    fuse-segments(1),    gene2range(1),   hgvs2spdi(1),   index-extras(1),
       index-pubmed(1),  pma2pme(1),  rchive(1),  snp2hgvs(1),  snp2tbl(1),   sort-uniq-count(1),
       spdi2tbl(1), tbl2prod(1), transmute(1), uniq-table(1), xml2fsa(1), xml2tbl(1), xy-plot(1).