Provided by: ncbi-entrez-direct_7.40.20170928+ds-1_amd64

NAME

       xtract - convert XML into a table of data values

SYNOPSIS

       xtract   [-help]   [-strict]   [-mixed]   [-accent]   [-ascii]  [-compress]  [-spaces]  [-input filename]
       [-pattern expr]      [-group expr]       [-block expr]       [-subset expr]       [-if expr [constraint]]
       [-unless expr [constraint]]   [-and condition]   [-or condition]  [-else]  [-position pos]  [-equals str]
       [-contains str] [-starts-with str] [-ends-with str] [-is-not str] [-gt N] [-ge N] [-lt N] [-le N] [-eq N]
       [-ne N]  [-ret str]  [-tab str]  [-sep str]  [-pfx str]  [-sfx str]  [-clr]  [-pfc str] [-rst] [-def str]
       [-lbl str] [-element element]  [-first element]  [-last element]  [-NAME]  [-num element]  [-len element]
       [-sum element]  [-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element]
       [-dev element]  [-encode element]  [-upper element]  [-lower element]  [-title element]  [-terms element]
       [-words element]  [-pairs element] [-letters element] [-indices element] [-phrase str] [-0-based element]
       [-1-based element] [-ucsc-based element]  [-insd arg ...]  [-head str]  [-tail str]  [-hd str]  [-tl str]
       [-format fmt]  [-filter element  action target]  [-verify]  [-outline]  [-synopsis]  [-archive directory]
       [-index element]  [-flag strict|mixed|none]  [-gzip]  [-hash]  [-skip filename]   [-examples]   [-extras]
       [-version]

DESCRIPTION

       xtract converts an XML document into a table of data values according to user-specified rules.

OPTIONS

   Processing Flags
       -strict
              Remove HTML highlight tags.

       -mixed Allow PubMed mixed content.

       -accent
              Delete Unicode accents.

       -ascii Convert Unicode to numeric character references.

       -compress
              Compress runs of spaces.

       -spaces
              Fix non-ASCII spaces.

       -input filename
              Read XML from file instead of standard input.

   Exploration Argument Hierarchy
       -pattern expr
       -group expr
       -block expr
       -subset expr
              Name of record within set.  The four argument names form a hierarchy, from -pattern (outermost) to
              -subset (innermost); using different names allows command-line control of nested looping.
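
       As a rough illustration of the nested looping that -pattern and -block control, the Python sketch below
       walks a small, made-up PubMed-like document with the standard xml.etree.ElementTree module.  It is an
       analogue of the idea, not xtract's implementation; the sample XML, the rows() helper, and its parameter
       names are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, loosely modeled on PubMed XML.
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <PMID>111</PMID>
    <AuthorList>
      <Author><LastName>Smith</LastName><Initials>JA</Initials></Author>
      <Author><LastName>Jones</LastName><Initials>RB</Initials></Author>
    </AuthorList>
  </PubmedArticle>
  <PubmedArticle>
    <PMID>222</PMID>
    <AuthorList>
      <Author><LastName>Lee</LastName><Initials>K</Initials></Author>
    </AuthorList>
  </PubmedArticle>
</PubmedArticleSet>
"""

def rows(xml_text, pattern="PubmedArticle", block="Author"):
    """Yield one list of fields per pattern record (outer loop),
    with one extra field per block record found inside it (inner loop)."""
    root = ET.fromstring(xml_text)
    for record in root.iter(pattern):        # outer loop: one output line each
        fields = [record.findtext("PMID", default="")]
        for author in record.iter(block):    # inner loop: nested records
            initials = author.findtext("Initials", default="")
            last = author.findtext("LastName", default="")
            fields.append(f"{initials} {last}")
        yield fields

for fields in rows(SAMPLE):
    print("\t".join(fields))
```

       The inner loop plays the role of -block Author inside -pattern PubmedArticle.  A command along the lines
       of xtract -pattern PubmedArticle -element PMID -block Author -sep " " -element Initials,LastName
       expresses a similar traversal.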

   Exploration Constructs
       Object         DateCreated
       Parent/Child   Book/AuthorList
       Heterogeneous  "PubmedArticleSet/*"
       Nested         "*/Taxon"
       Recursive      "**/Gene-commentary"

   Conditional Execution
       -if expr [constraint]
              Element (or @attribute) must exist and satisfy any specified constraint.

       -unless expr [constraint]
              Skip if element matches.

       -and condition
              Preceding and following tests must both pass.

       -or condition
              Any passing test suffices.

       -else  Execute if conditional test failed.

       -position pos
              Must be at first/last location in list.

   String Constraints
       -equals str
              String must match exactly.

       -contains str
              Substring must be present.

       -starts-with str
              Substring must be at beginning.

       -ends-with str
              Substring must be at end.

       -is-not str
              String must not match.

   Numeric Constraints
       -gt N  Greater than.

       -ge N  Greater than or equal to.

       -lt N  Less than.

       -le N  Less than or equal to.

       -eq N  Equal to.

       -ne N  Not equal to.
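
       The constraint semantics can be sketched in Python as plain predicates.  This is a minimal sketch, not
       xtract's code: it follows the NOTES section (case-insensitive string comparisons, integer numeric
       values), the function names string_test() and numeric_test() are hypothetical, and treating non-integer
       text as a failed numeric test is an assumption.

```python
def string_test(value, op, target):
    """Case-insensitive string constraints, as described in NOTES."""
    v, t = value.casefold(), target.casefold()
    checks = {
        "equals": v == t,
        "contains": t in v,
        "starts-with": v.startswith(t),
        "ends-with": v.endswith(t),
        "is-not": v != t,
    }
    return checks[op]

def numeric_test(value, op, n):
    """Integer comparisons. Assumption: text that is not an integer fails."""
    try:
        x = int(value.strip())
    except ValueError:
        return False
    checks = {
        "gt": x > n, "ge": x >= n,
        "lt": x < n, "le": x <= n,
        "eq": x == n, "ne": x != n,
    }
    return checks[op]

print(string_test("Homo sapiens", "starts-with", "homo"))  # True
print(numeric_test("1999", "ge", 2000))                    # False
```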

   Format Customization
       -ret str
              Override line break between patterns.

       -tab str
              Replace tab character between fields.

       -sep str
              Separator between group members.

       -pfx str
              Prefix to print before group.

       -sfx str
              Suffix to print after group.

       -clr   Clear queued tab separator.

       -pfc str
              Preface: combines -clr and -pfx.

       -rst   Reset -sep, -pfx, and -sfx.

       -def str
              Default placeholder for missing fields.

       -lbl str
              Insert arbitrary text.
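
       One way to read how these separators fit together is sketched below in Python.  It is a simplified model
       based only on the descriptions above, not xtract's implementation: the helpers format_record() and
       format_table() are hypothetical, the tab defaults for tab and sep are assumptions, and so is the choice
       to drop a missing field entirely when no -def placeholder is given.

```python
def format_record(groups, tab="\t", sep="\t", pfx="", sfx="", default=None):
    """Join one record. `groups` is a list of groups; each group is the
    list of values matched by one selection (empty list = missing)."""
    fields = []
    for items in groups:
        if not items:
            if default is None:
                continue                 # assumption: no placeholder, no column
            fields.append(default)       # -def placeholder for a missing field
        else:
            # -pfx before the group, -sep between members, -sfx after it
            fields.append(pfx + sep.join(items) + sfx)
    return tab.join(fields)              # -tab between fields

def format_table(records, ret="\n", **options):
    """Join all records, ending each one with the -ret string."""
    return "".join(format_record(groups, **options) + ret for groups in records)

records = [
    [["111"], ["Smith", "Jones"]],
    [["222"], []],                       # second record has no authors
]
print(format_table(records, sep=", ", default="-"), end="")
```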

   Element Selection
       -element element
              Print all items that match tag name.

       -first element
              Only print value of first item.

       -last element
              Only print value of last item.

       -NAME  Record value in named variable.

   -element Constructs
       Tag            Caption
       Group          Initials,LastName
       Attribute      DescriptorName@MajorTopicYN
       Recursive      "**/Gene-commentary_accession"
       Object Count   "#Author"
       Item Length    "%Title"
       Element Depth  "^PMID"
       Variable       "&NAME"
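
       A few of these constructs can be mimicked in Python to show what each one returns.  The sketch below
       covers only the plain Tag, Attribute, Object Count, and Item Length forms; the select() helper and the
       sample record are hypothetical, and measuring Item Length in characters is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical record, loosely modeled on PubMed XML.
RECORD = ET.fromstring("""
<Article>
  <PMID>111</PMID>
  <Title>Example title</Title>
  <Author><LastName>Smith</LastName></Author>
  <Author><LastName>Jones</LastName></Author>
  <DescriptorName MajorTopicYN="Y">Genes</DescriptorName>
</Article>
""")

def select(record, construct):
    """Return the strings selected by a small subset of -element constructs."""
    if construct.startswith("#"):        # Object Count: number of matching elements
        return [str(sum(1 for _ in record.iter(construct[1:])))]
    if construct.startswith("%"):        # Item Length: length of each value
        return [str(len(e.text or "")) for e in record.iter(construct[1:])]
    if "@" in construct:                 # Attribute: Tag@AttributeName
        tag, attr = construct.split("@", 1)
        return [e.get(attr) for e in record.iter(tag) if e.get(attr) is not None]
    return [e.text or "" for e in record.iter(construct)]   # plain Tag

for construct in ("PMID", "#Author", "%Title", "DescriptorName@MajorTopicYN"):
    print(construct, select(RECORD, construct))
```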

   Special -element Operations
       Parent Index   "+"
       XML Subtree    "*"
       Children       "$"
       Attributes     "@"

   Numeric Processing
       -num element
              Count.

       -len element
              Length.

       -sum element
              Sum.

       -min element
              Minimum.

       -max element
              Maximum.

       -inc element
              Increment.

       -dec element
              Decrement.

       -sub element
              Difference.

       -avg element
              Average.

       -dev element
              Deviation.

   String Processing
       -encode element
              Escape <, >, &, ", and ' characters as XML entities.

       -upper element
              Convert text to uppercase.

       -lower element
              Convert text to lowercase.

       -title element
              Capitalize initial letters of words.

   Phrase Processing
       -terms element
              Partition phrase at spaces.

       -words element
              Split at punctuation marks.

       -pairs element
              Adjacent informative words.

       -letters element
              Separate individual letters.

       -indices element
              Experimental index generation.

   Phrase Filtering
       -phrase str
              Keep records that contain a given phrase.

   Sequence Coordinates
       -0-based element
              Zero-based.

       -1-based element
              One-based.

       -ucsc-based element
              Half-open.

   Command Generator
       -insd arg ...
              Generate INSDSeq extraction commands.  Print them if invoked standalone; run them if invoked as
              part of a pipeline.  Requires one or more arguments, which may appear in the following order:

              Descriptor(s)  INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]

              Completeness   complete/partial

              Feature(s)     CDS/mRNA/...[,...]

              Qualifier(s)   INSDFeature_key/"#INSDInterval"/gene/product/... [...]

   Miscellaneous
       -head str
              Print before everything else.

       -tail str
              Print after everything else.

       -hd str
              Print before each record.

       -tl str
              Print after each record.

   Reformatting
       -format fmt
              copy     Fast block copy (still applies processing flags).
              compact  Compress runs of spaces.
              flush    Suppress line indentation.
              indent   Indent according to nesting depth.
              expand   Place each attribute on a separate line.

   Modification
       -filter element action target
              Actions:
              retain      Keep matching elements (no-op).
              remove      Remove matching elements.
              encode      HTML-escape special characters.
              decode      Decode HTML escapes.
              shrink      Compress runs of spaces.
              expand      Place each attribute on a separate line.
              accent      Strip off Unicode accents.

              Targets:
              content     Plain-text content.
              cdata       CDATA blocks.
              comment     Comments.
              object      The whole object.
              attributes  Attributes.
              container   Start and end tags.

   Summary
       -outline
              Display outline of XML structure.

       -synopsis
              Display count of unique XML paths.

   Local Record Indexing
       -archive directory
              Base path for individual XML files.

       -index element
              Name of element to use for identifier.

       -flag strict|mixed|none

       -gzip  Use compression for local XML files.

       -hash  Print UIDs and checksum values to standard output.

       -skip filename
              File of UIDs to skip.

   Documentation
       -help  Print usage information and some example argument combinations.

       -examples
              Complete examples of edirect(1) and xtract usage.

       -extras
              Batch and local processing examples, and a summary of specialized options the main -help text
              doesn't cover.

       -version
              Print version number.

NOTES

       String constraints use case-insensitive comparisons.

       Numeric constraints and selection arguments use integer values.

       -num and -len selections are synonyms for Object Count (#) and Item Length (%).

       -words, -pairs, and -indices convert to lower case.

SEE ALSO

       edirect(1), xy-plot(1).