Provided by: trurl_0.8-1_amd64

NAME

       trurl - transpose URLs

SYNOPSIS

       trurl [options / URLs]

DESCRIPTION

       trurl parses, manipulates and outputs URLs and parts of URLs.

       It  uses  the RFC 3986 definition of URLs and it uses libcurl's URL parser to do so, which
       includes a few "extensions". The URL support is limited to "hierarchical" URLs,  the  ones
       that use "://" separators after the scheme.

       Typically you pass in one or more URLs and decide which parts of them to output, possibly
       modifying the URLs as well.

       trurl  knows  URLs  and  every  URL  consists  of  up  to  ten  separate  and  independent
       "components".  These  components can be extracted, removed and updated with trurl and they
       are referred to by their respective names: scheme, user, password,  options,  host,  port,
       path, query, fragment and zoneid.

OPTIONS

       Options start with one or two dashes. Many of the options require an additional value next
       to them.

       Any other argument is interpreted as a URL argument, and is treated as if it was following
       a --url option.

       The first argument that is exactly two dashes ("--") marks the end of options; any
       argument after the end of options is interpreted as a URL argument even if it starts  with
       a dash.

       -a, --append [component]=[data]
              Append  data  to  a  component. This can only append data to the path and the query
              components.

              For path, this URL encodes and appends the new segment to the path, separated  with
              a slash.

              For  query,  this  URL  encodes and appends the new segment to the query, separated
              with an ampersand (&). If the appended segment contains an equal  sign  ('=')  that
              one  will  be  kept  verbatim  and  both  sides of the first occurrence will be URL
              encoded separately.
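
              The query rule above can be modelled in a few lines of Python. This is only a
              sketch of the described behaviour, not trurl's implementation: append_query is a
              hypothetical helper, and urllib.parse.quote is assumed as a stand-in encoder, so
              the exact set of escaped characters may differ from what trurl produces.

```python
from urllib.parse import quote

def append_query(query, segment, sep="&"):
    """Append one segment to a query string (hypothetical helper)."""
    # Split on the FIRST '=' only; that '=' is kept verbatim.
    name, eq, value = segment.partition("=")
    # Each side is encoded separately (quote() is an assumed stand-in encoder).
    encoded = quote(name, safe="") + eq + quote(value, safe="")
    return encoded if not query else query + sep + encoded

print(append_query("name=hello", "q=tom&jerry"))
```

              Any later '=' or '&' inside the segment is escaped, so the appended data cannot
              split into extra pairs.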

       --accept-space
              When set, trurl will try to accept spaces as part of the URL and instead URL encode
              such occurrences accordingly.

              According  to  RFC  3986,  a  space  cannot  legally  be part of a URL. This option
              provides a best-effort to convert the provided string into a valid URL.

       --default-port
              When set, trurl will use the scheme's default port number for  URLs  with  a  known
              scheme, and without an explicit port number.

              Note  that trurl only knows default port numbers for URL schemes that are supported
              by libcurl.

              Since, by default, trurl removes default port numbers from URLs with a known
              scheme, this option has little effect unless one of --get, --json, or --keep-port
              is also specified.

       -f, --url-file [file name]
              Read URLs to work on from the given file. Use the file name "-" (a single minus) to
              tell trurl to read the URLs from stdin.

              Each  line  needs  to  be a single valid URL. trurl will remove one carriage return
              character at the end of the line if present, trim off all the  trailing  space  and
              tab characters, and skip all empty (after trimming) lines.

              The  maximum  line  length  supported in a file like this is 4094 bytes. Lines that
              exceed that length are skipped, and a warning is printed to stderr  when  they  are
              encountered.
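
              The line handling described above can be sketched as follows. clean_lines and
              MAX_LINE are hypothetical names, and whether the length limit counts the line
              terminator is an assumption; the sketch checks the length after removing the
              newline.

```python
import sys

MAX_LINE = 4094  # maximum supported line length in bytes, per the text above

def clean_lines(raw_lines):
    """Yield the URL strings kept from the raw byte lines of a URL file."""
    for raw in raw_lines:
        line = raw[:-1] if raw.endswith(b"\n") else raw
        if len(line) > MAX_LINE:
            # Assumption: the limit applies to the line without its newline.
            print("warning: skipping long line", file=sys.stderr)
            continue
        if line.endswith(b"\r"):
            line = line[:-1]            # remove one carriage return
        line = line.rstrip(b" \t")      # trim trailing spaces and tabs
        if line:                        # skip lines that are empty after trimming
            yield line.decode("utf-8", "replace")
```

              Leading whitespace is left alone, since the text above only mentions trailing
              characters.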

       -g, --get [format]
              Output  text  and URL data according to the provided format string. Components from
              the URL can be output when specified as {component} or [component], with  the  name
              of the part shown within curly braces or brackets. You cannot mix braces and
              brackets for this purpose in the same command line.

              The following component names are available (case sensitive):  url,  scheme,  user,
              password, options, host, port, path, query, fragment and zoneid.

              {component} will expand to nothing if the given component does not have a value.

              Components  are  shown  URL  decoded by default. If you instead write the component
              prefixed with a colon like "{:path}", it gets output URL encoded.

              You may also prefix components with default: and/or puny:, in any order.

              If default: is specified, like "{default:url}" or "{default:port}", and the port is
              not explicitly specified in the URL, the scheme's default port will be output if it
              is known.

              If puny: is specified, like "{puny:url}" or "{puny:host}", the "punycoded"  version
              of the host name will be used in the output.

              If  --default-port is specified, all formats are expanded as if they used default:;
              and if --punycode is used, all formats are expanded as if  they  used  puny:.  Also
              note that "{url}" is affected by the --keep-port option.

              Hosts provided as IPv6 numerical addresses will be provided within square brackets.
              Like "[fe80::20c:29ff:fe9c:409b]".

              Hosts provided as IPv4 numerical addresses will be  "normalized"  and  provided  as
              four dot-separated decimal numbers when output.

              You can access specific keys in the query string using the format {query:key}. Then
              the value of the first matching key will be output using a  case  sensitive  match.
              When  extracting  a  URL  decoded  query  key that contains %00, such octet will be
              replaced with a single period '.' in the output.

              You can access specific keys in the query string and output all values using the
              format  {query-all:key}.  This looks for 'key' case sensitively and will output all
              values for that key space-separated.
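
              As an illustration of the two lookups, the following Python sketch models the
              first-match and all-values behaviour. The names expand and query_all are
              hypothetical, the pairs are assumed to be separated by "&", and comparing the
              key against the raw (still encoded) name is an assumption.

```python
from urllib.parse import unquote_to_bytes

def _decode(value):
    # A decoded %00 octet is shown as a single period.
    raw = unquote_to_bytes(value).replace(b"\x00", b".")
    return raw.decode("utf-8", "replace")

def query_all(query, key, sep="&"):
    """Decoded values of every pair whose name equals key (case sensitive)."""
    values = []
    for pair in query.split(sep):
        name, _, value = pair.partition("=")
        if name == key:
            values.append(_decode(value))
    return values

def expand(query, spec):
    """Expand 'query:key' or 'query-all:key' against a raw query string."""
    kind, _, key = spec.partition(":")
    values = query_all(query, key)
    if kind == "query-all":
        return " ".join(values)          # all values, space-separated
    return values[0] if values else ""   # first match, or nothing

print(expand("a=home&A=x&a=two%00three", "query-all:a"))
```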

              The "format" string supports the following backslash sequences:

              \\ - backslash

              \t - tab

              \n - newline

              \r - carriage return

              \{ - an open curly brace that does not start a variable

              \[ - an open bracket that does not start a variable

              All other text in the format string will be shown as-is.

       -h, --help
              Show the help output.

       --iterate [component]=[item1 item2 ...]
              Set the component to multiple values and output the result once for each iteration.
              Several  combined  iterations  are  allowed  to generate combinations, but only one
              --iterate option per  component.  The  listed  items  to  iterate  over  should  be
              separated by single spaces.
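
              Combining several --iterate options amounts to taking a Cartesian product of the
              item lists. A minimal Python sketch, assuming a hypothetical iterate helper; the
              order in which trurl emits the combinations is not stated above, so the order
              produced here is an assumption.

```python
from itertools import product

def iterate(options):
    """Expand 'component=item1 item2 ...' strings into every combination."""
    names, choices = [], []
    for opt in options:
        name, _, items = opt.partition("=")
        if name in names:
            # Only one --iterate option is allowed per component.
            raise ValueError("duplicate component: " + name)
        names.append(name)
        choices.append(items.split(" "))   # items are separated by single spaces
    return [dict(zip(names, combo)) for combo in product(*choices)]

for combo in iterate(["scheme=http ftp", "port=80 8080"]):
    print(combo)
```

              Two options with two items each give four results, one per combination.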

       --json Outputs  all  set components of the URLs as JSON objects. All components of the URL
              that have data will get populated in the parts object using their component  names.
              See below for details on the format.

       --keep-port
              By  default,  trurl removes default port numbers from URLs with a known scheme even
              if they are explicitly specified in the input URL. This option makes trurl not
              remove them.

       --no-guess-scheme
              Disables  libcurl's scheme guessing feature. URLs that do not contain a scheme will
              be treated as invalid URLs.

       --punycode
              Uses the "punycoded" version of the host name, which is  how  International  Domain
              Names  are  converted  into  plain  ASCII.  If  the host name is not using IDN, the
              regular ASCII name is used.

       --query-separator [what]
              Specify the single letter used for separating query pairs. The default is  "&"  but
              at least in the past sometimes semicolons ";" or even colons ":" have been used for
              this purpose. If your URL uses something other than the default letter, setting the
              right one makes sure trurl can do its query operations properly.

       --redirect [URL]
              Redirect  the  URL  to this new location.  The redirection is performed on the base
              URL, so, if no base URL is specified, no redirection will be performed.
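
              Redirection resolves the new location relative to the base URL. The Python
              sketch below uses urllib.parse.urljoin as a stand-in for that resolution;
              redirect is a hypothetical helper and libcurl's parser may treat edge cases
              differently.

```python
from urllib.parse import urljoin

def redirect(base, location):
    """Resolve 'location' against 'base' (stand-in for --redirect)."""
    if not base:
        # Without a base URL there is nothing to redirect from.
        return None
    return urljoin(base, location)

print(redirect("https://curl.se/we/are.html", "here.html"))
print(redirect("https://curl.se/we/are.html", "/top.html"))
```

              A relative location replaces the last path segment of the base, while a location
              starting with a slash replaces the whole path.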

       -s, --set [component][:]=[data]
              Set this URL component. Setting a blank string ("") will clear the component from
              the URL.

              The  following  components  can be set: url, scheme, user, password, options, host,
              port, path, query, fragment and zoneid.

              If a simple "="-assignment is used, the data is URL encoded when applied.  If  ":="
              is used, the data is assumed to already be URL encoded and will be stored as-is.
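
              The difference between "=" and ":=" can be sketched in Python as below.
              parse_set is a hypothetical helper and urllib.parse.quote is an assumed stand-in
              encoder; which characters trurl leaves unescaped depends on the component and is
              not modelled here.

```python
from urllib.parse import quote

def parse_set(arg):
    """Split a --set argument into (component, stored value)."""
    component, _, data = arg.partition("=")
    if component.endswith(":"):
        # ':=' form: the data is taken as already URL encoded, stored as-is.
        return component[:-1], data
    # '=' form: the data is URL encoded when applied (stand-in encoder).
    return component, quote(data, safe="")

print(parse_set("user=a b"))
print(parse_set("user:=a%20b"))
```

              Passing already encoded data with a plain "=" would encode the percent signs a
              second time, which is what ":=" avoids.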

              If  no URL or --url-file argument is provided, trurl will try to create a URL using
              the components provided  by  the  --set  options.  If  not  enough  components  are
              specified, this will fail.

       --sort-query
              The "variable=content" tuples in the query component are sorted in a case
              insensitive alphabetical order. This helps make URLs identical that otherwise
              only had their query pairs in different orders.
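
              A minimal Python sketch of such a sort, assuming a hypothetical sort_query
              helper. It compares whole "variable=content" strings in lower case; how trurl
              breaks ties between pairs that differ only in case is an assumption.

```python
def sort_query(query, sep="&"):
    """Sort the pairs of a query string case insensitively."""
    pairs = query.split(sep)
    # sorted() is stable, so pairs that compare equal keep their input order.
    return sep.join(sorted(pairs, key=str.lower))

print(sort_query("b=a&c=b&a=c"))
```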

       --url [URL]
              Set  the  input  URL  to work with. The URL may be provided without a scheme, which
              then typically is not actually a legal URL but trurl will try to figure out what is
              meant and guess what scheme to use (unless --no-guess-scheme is used).

              Providing multiple URLs will make trurl act on all URLs in a serial fashion.

              If  the  URL cannot be parsed for whatever reason, trurl will simply move on to the
              next provided URL - unless --verify is used.

       --urlencode
              Outputs URL encoded version of components by default when using --get or --json.

       --trim [component]=[what]
              Trims data off a component. Currently this can only trim a query component.

              "what" is specified as a full word or as a word prefix  (using  a  single  trailing
              asterisk  ('*'))  which  makes  trurl  remove the tuples from the query string that
              match the instruction.
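
              The matching rule can be sketched in Python as follows. trim_query is a
              hypothetical helper; it matches against the name part of each pair and compares
              case sensitively, both of which are assumptions not stated above.

```python
def trim_query(query, what, sep="&"):
    """Remove pairs whose name matches 'what' (full word or 'prefix*')."""
    is_prefix = what.endswith("*")
    word = what[:-1] if is_prefix else what
    kept = []
    for pair in query.split(sep):
        name = pair.partition("=")[0]
        matched = name.startswith(word) if is_prefix else name == word
        if not matched:
            kept.append(pair)
    return sep.join(kept)

print(trim_query("search=hey&utm_source=tracker", "utm_*"))
```

              Without the trailing asterisk only an exact name match is removed, so "utm_"
              alone would leave "utm_source" in place.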

       -v, --version
              Show version information and exit.

       --verify
              When a URL is provided, return error immediately if it does not parse  as  a  valid
              URL. In normal cases, trurl can forgive a bad URL input.

JSON output format

       The --json option outputs a JSON array with one or more objects, one for each URL.

       Each  URL  JSON  object  contains a number of properties, a series of key/value pairs. The
       exact set depends on the given URL.

       url    This key exists in every object. It is the complete  URL.  Affected  by  --default-
              port, --keep-port, and --punycode.

       parts  This  key exists in every object, and contains an object with a key for each of the
              settable URL components. If a component is missing, it means it is not  present  in
              the URL. The parts are URL decoded unless --urlencode is used.

              scheme The URL scheme.

              user   The user name.

              password
                     The password.

              options
                     The  options.  Note  that  only  a  few  URL  schemes  support the "options"
                     component.

              host   The normalized host name. It might be a UTF-8 name if an IDN name was
                     used. It can also be a normalized IPv4 or IPv6 address. An IPv6 address
                     always starts with a bracket ([) - and no other host names can contain such
                     a symbol. If --punycode is used, the punycode version of the host is
                     output instead.

              port   The provided port number as a string. If the port number was not provided in
                     the  URL,  but  the scheme is a known one, and --default-port is in use, the
                     default port for that scheme will be provided here.

              path   The path. Including the leading slash.

              query  The full query, excluding the question mark separator.

              fragment
                     The fragment, excluding the pound sign separator.

              zoneid The zone id, which can only be present in an IPv6 address. When this key  is
                     present, then host is an IPv6 numerical address.

       params This  key  contains  an  array of query key/value objects. Each such pair is listed
              with "key" and "value" and their respective contents in the output.

              The key/values are extracted from the query where they are separated by ampersands
              (&), or by the character the user sets with --query-separator.

              The query pairs are listed in the order of appearance in a left-to-right order, but
              can be made alpha-sorted with --sort-query.

              It is only present if the URL has a query.
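
       A script consuming this format might look like the Python sketch below. The sample
       document mirrors the one shown under EXAMPLES, written as valid JSON; summarize is a
       hypothetical helper, not part of trurl.

```python
import json

SAMPLE = """
[
  {
    "url": "https://fake.host/search?q=answers&user=me#frag",
    "parts": {
      "scheme": "https",
      "host": "fake.host",
      "path": "/search",
      "query": "q=answers&user=me",
      "fragment": "frag"
    },
    "params": [
      {"key": "q", "value": "answers"},
      {"key": "user", "value": "me"}
    ]
  }
]
"""

def summarize(text):
    """Return (host, path, [(key, value), ...]) for each URL object."""
    rows = []
    for entry in json.loads(text):
        parts = entry["parts"]
        # "params" is only present when the URL has a query.
        params = [(p["key"], p["value"]) for p in entry.get("params", [])]
        # Individual parts may be missing, so use .get() rather than indexing.
        rows.append((parts.get("host"), parts.get("path"), params))
    return rows

for host, path, params in summarize(SAMPLE):
    print(host, path, params)
```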

EXAMPLES

       Replace the host name of a URL
              $ trurl --url https://curl.se --set host=example.com
              https://example.com/

       Create a URL by setting components
              $ trurl --set host=example.com --set scheme=ftp
              ftp://example.com/

       Redirect a URL
              $ trurl --url https://curl.se/we/are.html --redirect here.html
              https://curl.se/we/here.html

       Change port number
              This also shows how trurl will remove dot-dot sequences
              $ trurl --url https://curl.se/we/../are.html --set port=8080
              https://curl.se:8080/are.html

       Extract the path from a URL
              $ trurl --url https://curl.se/we/are.html --get '{path}'
              /we/are.html

       Extract the port from a URL
              This gets the default port based on the scheme if the port is not set in the URL.
              $ trurl --url https://curl.se/we/are.html --get '{default:port}'
              443

       Append a path segment to a URL
              $ trurl --url https://curl.se/hello --append path=you
              https://curl.se/hello/you

       Append a query segment to a URL
              $ trurl --url "https://curl.se?name=hello" --append query=search=string
              https://curl.se/?name=hello&search=string

       Read URLs from stdin
              $ cat urllist.txt | trurl --url-file -
              ...

       Output JSON
              $ trurl "https://fake.host/search?q=answers&user=me#frag" --json
              [
                {
                  "url": "https://fake.host/search?q=answers&user=me#frag",
                  "parts": {
                      "scheme": "https",
                      "host": "fake.host",
                      "path": "/search",
                      "query": "q=answers&user=me",
                      "fragment": "frag"
                  },
                  "params": [
                    {
                      "key": "q",
                      "value": "answers"
                    },
                    {
                      "key": "user",
                      "value": "me"
                    }
                  ]
                }
              ]

       Remove tracking tuples from query
              $ trurl "https://curl.se?search=hey&utm_source=tracker" --trim query="utm_*"
              https://curl.se/?search=hey

       Show a specific query key value
              $ trurl "https://example.com?a=home&here=now&thisthen" -g '{query:a}'
              home

       Sort the key/value pairs in the query component
              $ trurl "https://example.com?b=a&c=b&a=c" --sort-query
              https://example.com/?a=c&b=a&c=b

       Work with a query that uses a semicolon separator
              $ trurl "https://curl.se?search=fool;page=5" --trim query="search" --query-separator ";"
              https://curl.se/?page=5

       Accept spaces in the URL path
              $ trurl "https://curl.se/this has space/index.html" --accept-space
              https://curl.se/this%20has%20space/index.html

       Create multiple variations of a URL with different schemes
              $ trurl "https://curl.se/path/index.html" --iterate "scheme=http ftp sftp"
              http://curl.se/path/index.html
              ftp://curl.se/path/index.html
              sftp://curl.se/path/index.html

WWW

       https://curl.se/trurl

SEE ALSO

       curl_url_set(3) curl_url_get(3)