Provided by: miller_6.0.0-1ubuntu0.1_amd64
NAME
Miller -- like awk, sed, cut, join, and sort for name-indexed data such as CSV and tabular JSON.
SYNOPSIS
Usage: mlr [flags] {verb} [verb-dependent options ...] {zero or more file names}
If zero file names are provided, standard input is read, e.g.
  mlr --csv sort -f shape example.csv
Output of one verb may be chained as input to another using "then", e.g.
  mlr --csv stats1 -a min,mean,max -f quantity then sort -f color example.csv
Please see 'mlr help topics' for more information. Please also see https://miller.readthedocs.io
DESCRIPTION
Miller operates on key-value-pair data while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller's natural data structure is the insertion-ordered hash map. This encompasses a variety of data formats, including but not limited to the familiar CSV, TSV, and JSON. (Miller can handle positionally-indexed data as a special case.) This manpage documents mlr 6.0.0.
EXAMPLES
  mlr --icsv --opprint cat example.csv
  mlr --icsv --opprint sort -f shape example.csv
  mlr --icsv --opprint sort -f shape -nr index example.csv
  mlr --icsv --opprint cut -f flag,shape example.csv
  mlr --csv filter '$color == "red"' example.csv
  mlr --icsv --ojson put '$ratio = $quantity / $rate' example.csv
  mlr --icsv --opprint --from example.csv sort -nr index then cut -f shape,quantity
FILE FORMATS
CSV/CSV-lite: comma-separated values with separate header line.
TSV: the same, but with tabs in place of commas.
  +---------------------+
  | apple,bat,cog       |
  | 1,2,3               |
  | 4,5,6               |
  +---------------------+
  Record 1: "apple":"1", "bat":"2", "cog":"3"
  Record 2: "apple":"4", "bat":"5", "cog":"6"

JSON (array of objects):
  +---------------------+
  | [                   |
  |   {                 |
  |     "apple": 1,     |
  |     "bat": 2,       |
  |     "cog": 3        |
  |   },                |
  |   {                 |
  |     "dish": {       |
  |       "egg": 7,     |
  |       "flint": 8    |
  |     },              |
  |     "garlic": ""    |
  |   }                 |
  | ]                   |
  +---------------------+
  Record 1: "apple":"1", "bat":"2", "cog":"3"
  Record 2: "dish.egg":"7", "dish.flint":"8", "garlic":""

JSON Lines (sequence of one-line objects):
  +------------------------------------------------+
  | {"apple": 1, "bat": 2, "cog": 3}               |
  | {"dish": {"egg": 7, "flint": 8}, "garlic": ""} |
  +------------------------------------------------+
  Record 1: "apple":"1", "bat":"2", "cog":"3"
  Record 2: "dish.egg":"7", "dish.flint":"8", "garlic":""

PPRINT: pretty-printed tabular
  +---------------------+
  | apple bat cog       |
  | 1     2   3         |
  | 4     5   6         |
  +---------------------+
  Record 1: "apple":"1", "bat":"2", "cog":"3"
  Record 2: "apple":"4", "bat":"5", "cog":"6"

Markdown tabular (supported for output only):
  +-----------------------+
  | | apple | bat | cog | |
  | | ---   | --- | --- | |
  | | 1     | 2   | 3   | |
  | | 4     | 5   | 6   | |
  +-----------------------+
  Record 1: "apple":"1", "bat":"2", "cog":"3"
  Record 2: "apple":"4", "bat":"5", "cog":"6"

XTAB: pretty-printed transposed tabular
  +---------------------+
  | apple 1             |
  | bat   2             |
  | cog   3             |
  |                     |
  | dish  7             |
  | egg   8             |
  +---------------------+
  Record 1: "apple":"1", "bat":"2", "cog":"3"
  Record 2: "dish":"7", "egg":"8"

DKVP: delimited key-value pairs (Miller default format)
  +---------------------+
  | apple=1,bat=2,cog=3 |
  | dish=7,egg=8,flint  |
  +---------------------+
  Record 1: "apple":"1", "bat":"2", "cog":"3"
  Record 2: "dish":"7", "egg":"8", "3":"flint"

NIDX: implicitly numerically indexed (Unix-toolkit style)
  +---------------------+
  | the quick brown     |
  | fox jumped          |
  +---------------------+
  Record 1: "1":"the", "2":"quick", "3":"brown"
  Record 2: "1":"fox", "2":"jumped"
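
The same records can be moved between any of these formats just by choosing input and output flags. A brief sketch, reusing the example.csv file from the EXAMPLES section above:
  mlr --icsv --ojson cat example.csv
  mlr --icsv --oxtab cat example.csv
The first command reads CSV and writes the same records as a JSON array of objects; the second writes them in transposed XTAB form.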
HELP OPTIONS
Type 'mlr help {topic}' for any of the following:
Essentials:
  mlr help topics
  mlr help basic-examples
  mlr help file-formats
Flags:
  mlr help flags
  mlr help list-separator-aliases
  mlr help list-separator-regex-aliases
  mlr help comments-in-data-flags
  mlr help compressed-data-flags
  mlr help csv-only-flags
  mlr help file-format-flags
  mlr help flatten-unflatten-flags
  mlr help format-conversion-keystroke-saver-flags
  mlr help legacy-flags
  mlr help miscellaneous-flags
  mlr help output-colorization-flags
  mlr help pprint-only-flags
  mlr help profiling-flags
  mlr help separator-flags
Verbs:
  mlr help list-verbs
  mlr help usage-verbs
  mlr help verb
Functions:
  mlr help list-functions
  mlr help list-function-classes
  mlr help list-functions-in-class
  mlr help usage-functions
  mlr help usage-functions-by-class
  mlr help function
Keywords:
  mlr help list-keywords
  mlr help usage-keywords
  mlr help keyword
Other:
  mlr help auxents
  mlr help mlrrc
  mlr help output-colorization
  mlr help type-arithmetic-info
Shorthands:
  mlr -g = mlr help flags
  mlr -l = mlr help list-verbs
  mlr -L = mlr help usage-verbs
  mlr -f = mlr help list-functions
  mlr -F = mlr help usage-functions
  mlr -k = mlr help list-keywords
  mlr -K = mlr help usage-keywords
Lastly, 'mlr help ...' will search for your exact text '...' using the sources of 'mlr help flag', 'mlr help verb', 'mlr help function', and 'mlr help keyword'. Use 'mlr help find ...' for approximate (substring) matches, e.g. 'mlr help find map' for all things with "map" in their names.
VERB LIST
altkv bar bootstrap cat check clean-whitespace count-distinct count count-similar cut decimate fill-down fill-empty filter flatten format-values fraction gap grep group-by group-like having-fields head histogram json-parse json-stringify join label least-frequent merge-fields most-frequent nest nothing put regularize remove-empty-columns rename reorder repeat reshape sample sec2gmtdate sec2gmt seqgen shuffle skip-trivial-records sort sort-within-records stats1 stats2 step tac tail tee template top unflatten uniq unsparsify
FUNCTION LIST
abs acos acosh any append apply arrayify asin asinh asserting_absent asserting_array asserting_bool asserting_boolean asserting_empty asserting_empty_map asserting_error asserting_float asserting_int asserting_map asserting_nonempty_map asserting_not_array asserting_not_empty asserting_not_map asserting_not_null asserting_null asserting_numeric asserting_present asserting_string atan atan2 atanh bitcount boolean capitalize cbrt ceil clean_whitespace collapse_whitespace cos cosh depth dhms2fsec dhms2sec erf erfc every exp expm1 flatten float floor fmtnum fold fsec2dhms fsec2hms get_keys get_values gmt2localtime gmt2sec gsub haskey hexfmt hms2fsec hms2sec hostname int invqnorm is_absent is_array is_bool is_boolean is_empty is_empty_map is_error is_float is_int is_map is_nonempty_map is_not_array is_not_empty is_not_map is_not_null is_null is_numeric is_present is_string joink joinkv joinv json_parse json_stringify leafcount length localtime2gmt localtime2sec log log10 log1p logifit lstrip madd mapdiff mapexcept mapselect mapsum max md5 mexp min mmul msub os pow qnorm reduce regextract regextract_or_else round roundm rstrip sec2dhms sec2gmt sec2gmtdate sec2hms sec2localdate sec2localtime select sgn sha1 sha256 sha512 sin sinh sort splita splitax splitkv splitkvx splitnv splitnvx sqrt ssub strftime strftime_local string strip strlen strptime strptime_local sub substr substr0 substr1 system systime systimeint tan tanh tolower toupper truncate typeof unflatten uptime urand urand32 urandelement urandint urandrange version ! != !=~ % & && * ** + - . .* .+ .- ./ / // < << <= <=> == =~ > >= >> >>> ?: ?? ??? ^ ^^ | || ~
COMMENTS-IN-DATA FLAGS
Miller lets you put comments in your data, such as

  # This is a comment for a CSV file
  a,b,c
  1,2,3
  4,5,6

Notes:
* Comments are only honored at the start of a line.
* In the absence of any of the below four options, comments are data like any other text. (The comments-in-data feature is opt-in.)
* When `--pass-comments` is used, comment lines are written to standard output immediately upon being read; they are not part of the record stream. Results may be counterintuitive. A suggestion is to place comments at the start of data files.

--pass-comments                Immediately print commented lines (prefixed by `#`) within the input.
--pass-comments-with {string}  Immediately print commented lines within input, with specified prefix.
--skip-comments                Ignore commented lines (prefixed by `#`) within the input.
--skip-comments-with {string}  Ignore commented lines within input, with specified prefix.
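
For instance, given a CSV file whose leading lines start with `#` (the file name commented.csv is illustrative only), typical opt-in usage looks like:
  mlr --csv --skip-comments cat commented.csv
  mlr --csv --pass-comments cat commented.csv
The first drops the comment lines entirely; the second echoes them to standard output as they are read, while the remaining lines are parsed as CSV.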
COMPRESSED-DATA FLAGS
Miller offers a few different ways to handle reading data files which have been compressed.

* Decompression done within the Miller process itself: `--bz2in` `--gzin` `--zin`
* Decompression done outside the Miller process: `--prepipe` `--prepipex`

Using `--prepipe` and `--prepipex` you can specify an action to be taken on each input file. The prepipe command must be able to read from standard input; it will be invoked with `{command} < {filename}`. The prepipex command must take a filename as argument; it will be invoked with `{command} {filename}`.

Examples:
  mlr --prepipe gunzip
  mlr --prepipe zcat -cf
  mlr --prepipe xz -cd
  mlr --prepipe cat

Note that this feature is quite general and is not limited to decompression utilities. You can use it to apply per-file filters of your choice. For output compression (or other) utilities, simply pipe the output: `mlr ... | {your compression command} > outputfilenamegoeshere`

Lastly, note that if `--prepipe` or `--prepipex` is specified, it replaces any decisions that might have been made based on the file suffix. Likewise, `--gzin`/`--bz2in`/`--zin` are ignored if `--prepipe` is also specified.

--bz2in            Uncompress bzip2 within the Miller process. Done by default if file ends in `.bz2`.
--gzin             Uncompress gzip within the Miller process. Done by default if file ends in `.gz`.
--prepipe {decompression command}
                   You can, of course, already do without this for single input files, e.g. `gunzip < myfile.csv.gz | mlr ...`. Allowed at the command line, but not in `.mlrrc` to avoid unexpected code execution.
--prepipe-bz2      Same as `--prepipe bz2`, except this is allowed in `.mlrrc`.
--prepipe-gunzip   Same as `--prepipe gunzip`, except this is allowed in `.mlrrc`.
--prepipe-zcat     Same as `--prepipe zcat`, except this is allowed in `.mlrrc`.
--prepipex {decompression command}
                   Like `--prepipe` with one exception: doesn't insert `<` between command and filename at runtime. Useful for some commands like `unzip -qc` which don't read standard input. Allowed at the command line, but not in `.mlrrc` to avoid unexpected code execution.
--zin              Uncompress zlib within the Miller process. Done by default if file ends in `.z`.
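
For example, to read gzip-compressed CSV (the file name data.csv.gz is illustrative only), either of the following sketches should work -- the first relying on the `.gz` suffix default, the second making the decompression explicit:
  mlr --csv cat data.csv.gz
  mlr --csv --prepipe gunzip cat data.csv.gz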
CSV-ONLY FLAGS
These are flags which are applicable to CSV format.

--allow-ragged-csv-input or --ragged
                   If a data line has fewer fields than the header line, fill remaining keys with empty string. If a data line has more fields than the header line, use integer field labels as in the implicit-header case.
--headerless-csv-output or --ho
                   Print only CSV data lines; do not print CSV header lines.
--implicit-csv-header or --headerless-csv-input or --hi
                   Use 1,2,3,... as field labels, rather than from line 1 of input files. Tip: combine with `label` to recreate missing headers.
--no-implicit-csv-header
                   Opposite of `--implicit-csv-header`. This is the default anyway -- the main use is for the flags to `mlr join` if you have main file(s) which are headerless but you want to join in on a file which does have a CSV header. Then you could use `mlr --csv --implicit-csv-header join --no-implicit-csv-header -l your-join-in-with-header.csv ... your-headerless.csv`.
-N                 Keystroke-saver for `--implicit-csv-header --headerless-csv-output`.
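
As a sketch of the tip above about recreating missing headers (the file name headerless.csv and the field names are illustrative only):
  mlr --csv --implicit-csv-header label host,status,count headerless.csv
This reads the headerless file with positional labels 1,2,3 and then renames those fields to host, status, and count.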
FILE-FORMAT FLAGS
See the File formats doc page, and/or `mlr help file-formats`, for more about file formats Miller supports.

Examples: `--csv` for CSV-formatted input and output; `--icsv --opprint` for CSV-formatted input and pretty-printed output.

Please use `--iformat1 --oformat2` rather than `--format1 --oformat2`. The latter sets up input and output flags for `format1`, not all of which are overridden in all cases by setting output format to `format2`.

--asv or --asvlite    Use ASV format for input and output data.
--csv or -c           Use CSV format for input and output data.
--csvlite             Use CSV-lite format for input and output data.
--dkvp                Use DKVP format for input and output data.
--gen-field-name      Specify field name for --igen. Defaults to "i".
--gen-start           Specify start value for --igen. Defaults to 1.
--gen-step            Specify step value for --igen. Defaults to 1.
--gen-stop            Specify stop value for --igen. Defaults to 100.
--iasv or --iasvlite  Use ASV format for input data.
--icsv                Use CSV format for input data.
--icsvlite            Use CSV-lite format for input data.
--idkvp               Use DKVP format for input data.
--igen                Ignore input files and instead generate sequential numeric input using --gen-field-name, --gen-start, --gen-step, and --gen-stop values. See also the seqgen verb, which is more useful/intuitive.
--ijson               Use JSON format for input data.
--ijsonl              Use JSON Lines format for input data.
--inidx               Use NIDX format for input data.
--io {format name}    Use format name for input and output data. For example: `--io csv` is the same as `--csv`.
--ipprint             Use PPRINT format for input data.
--itsv                Use TSV format for input data.
--itsvlite            Use TSV-lite format for input data.
--iusv or --iusvlite  Use USV format for input data.
--ixtab               Use XTAB format for input data.
--json or -j          Use JSON format for input and output data.
--jsonl               Use JSON Lines format for input and output data.
--nidx                Use NIDX format for input and output data.
--oasv or --oasvlite  Use ASV format for output data.
--ocsv                Use CSV format for output data.
--ocsvlite            Use CSV-lite format for output data.
--odkvp               Use DKVP format for output data.
--ojson               Use JSON format for output data.
--ojsonl              Use JSON Lines format for output data.
--omd                 Use markdown-tabular format for output data.
--onidx               Use NIDX format for output data.
--opprint             Use PPRINT format for output data.
--otsv                Use TSV format for output data.
--otsvlite            Use TSV-lite format for output data.
--ousv or --ousvlite  Use USV format for output data.
--oxtab               Use XTAB format for output data.
--pprint              Use PPRINT format for input and output data.
--tsv                 Use TSV format for input and output data.
--tsvlite or -t       Use TSV-lite format for input and output data.
--usv or --usvlite    Use USV format for input and output data.
--xtab                Use XTAB format for input and output data.
-i {format name}      Use format name for input data. For example: `-i csv` is the same as `--icsv`.
-o {format name}      Use format name for output data. For example: `-o csv` is the same as `--ocsv`.
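
As a rough sketch of the --igen flags described above (exact end-point behavior may vary; no input file is needed since the records are generated):
  mlr --igen --gen-start 1 --gen-step 1 --gen-stop 5 --ojson cat
This ignores input files and should emit a short run of JSON records whose field i (the default --gen-field-name) steps from 1 toward 5. The seqgen verb is the friendlier way to do the same.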
FLATTEN-UNFLATTEN FLAGS
These flags control how Miller converts record values which are maps or arrays, when input is JSON and output is non-JSON (flattening) or input is non-JSON and output is JSON (unflattening). See the Flatten/unflatten doc page for more information.

--flatsep or --jflatsep {string}
                     Separator for flattening multi-level JSON keys, e.g. `{"a":{"b":3}}` becomes `a:b => 3` for non-JSON formats. Defaults to `.`.
--no-auto-flatten    When output is non-JSON, suppress the default auto-flatten behavior. Default: if `$y = [7,8,9]` then this flattens to `y.1=7,y.2=8,y.3=9`, and similarly for maps. With `--no-auto-flatten`, instead we get `$y=[7, 8, 9]`.
--no-auto-unflatten  When input is non-JSON and output is JSON, suppress the default auto-unflatten behavior. Default: if the input has `y.1=7,y.2=8,y.3=9` then this unflattens to `$y=[7,8,9]`. With `--no-auto-unflatten`, instead we get `${y.1}=7,${y.2}=8,${y.3}=9`.
--xvright            Right-justify values for XTAB format.
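
For instance, writing JSON input that contains a nested map to CSV output auto-flattens by default; a sketch (the file name nested.json is illustrative only):
  mlr --ijson --ocsv cat nested.json
  mlr --ijson --ocsv --flatsep : cat nested.json
With the default separator, a field like {"dish": {"egg": 7}} becomes a CSV column named dish.egg; with --flatsep : it becomes dish:egg instead.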
FORMAT-CONVERSION KEYSTROKE-SAVER FLAGS
As keystroke-savers for format-conversion you may use the following. The letters c, t, j, l, d, n, x, p, and m refer to the formats CSV, TSV, JSON, JSON Lines, DKVP, NIDX, XTAB, PPRINT, and markdown, respectively. Note that markdown format is available for output only.

| In\out | CSV   | TSV   | JSON  | JSONL | DKVP  | NIDX  | XTAB  | PPRINT | Markdown |
+--------+-------+-------+-------+-------+-------+-------+-------+--------+----------+
| CSV    |       | --c2t | --c2j | --c2l | --c2d | --c2n | --c2x | --c2p  | --c2m    |
| TSV    | --t2c |       | --t2j | --t2l | --t2d | --t2n | --t2x | --t2p  | --t2m    |
| JSON   | --j2c | --j2t |       | --j2l | --j2d | --j2n | --j2x | --j2p  | --j2m    |
| JSONL  | --l2c | --l2t |       |       | --l2d | --l2n | --l2x | --l2p  | --l2m    |
| DKVP   | --d2c | --d2t | --d2j | --d2l |       | --d2n | --d2x | --d2p  | --d2m    |
| NIDX   | --n2c | --n2t | --n2j | --n2l | --n2d |       | --n2x | --n2p  | --n2m    |
| XTAB   | --x2c | --x2t | --x2j | --x2l | --x2d | --x2n |       | --x2p  | --x2m    |
| PPRINT | --p2c | --p2t | --p2j | --p2l | --p2d | --p2n | --p2x |        | --p2m    |

-p   Keystroke-saver for `--nidx --fs space --repifs`.
-T   Keystroke-saver for `--nidx --fs tab`.
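
For example, the following two commands are equivalent, the second using the keystroke-saver from the table above:
  mlr --icsv --opprint cat example.csv
  mlr --c2p cat example.csv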
LEGACY FLAGS
These are flags which don't do anything in the current Miller version. They are accepted as no-op flags in order to keep old scripts from breaking.

--jknquoteint                  Type information from JSON input files is now preserved throughout the processing stream.
--jlistwrap or --jl            Wrap JSON output in outermost `[ ]`. This is the default for JSON output format.
--jquoteall                    Type information from JSON input files is now preserved throughout the processing stream.
--json-fatal-arrays-on-input   Miller now supports arrays as of version 6.
--json-map-arrays-on-input     Miller now supports arrays as of version 6.
--json-skip-arrays-on-input    Miller now supports arrays as of version 6.
--jsonx                        The `--jvstack` flag is now default true in Miller 6.
--jvquoteall                   Type information from JSON input files is now preserved throughout the processing stream.
--jvstack                      Put one key-value pair per line for JSON output (multi-line output). This is the default for JSON output format.
--mmap                         Miller no longer uses memory-mapping to access data files.
--no-jlistwrap                 Do not wrap JSON output in outermost `[ ]`. This is the default for JSON Lines output format.
--no-jvstack                   Put objects/arrays all on one line for JSON output. This is the default for JSON Lines output format.
--no-mmap                      Miller no longer uses memory-mapping to access data files.
--ojsonx                       The `--jvstack` flag is now default true in Miller 6.
--quote-all                    Ignored as of version 6. Types are inferred/retained through the processing flow now.
--quote-minimal                Ignored as of version 6. Types are inferred/retained through the processing flow now.
--quote-none                   Ignored as of version 6. Types are inferred/retained through the processing flow now.
--quote-numeric                Ignored as of version 6. Types are inferred/retained through the processing flow now.
--quote-original               Ignored as of version 6. Types are inferred/retained through the processing flow now.
--vflatsep                     Ignored as of version 6. This functionality is subsumed into JSON formatting.
MISCELLANEOUS FLAGS
These are flags which don't fit into any other category.

--fflush             Force buffered output to be written after every output record. The default is to flush output after every record if the output is to the terminal, or less often if the output is to a file or a pipe. The default is a significant performance optimization for large files. Use this flag to force frequent updates even when output is to a pipe or file, at a performance cost.
--from {filename}    Use this to specify an input file before the verb(s), rather than after. May be used more than once. Example: `mlr --from a.dat --from b.dat cat` is the same as `mlr cat a.dat b.dat`.
--hash-records       This is an internal parameter which normally does not need to be modified. It controls the mechanism by which Miller accesses fields within records. In general --no-hash-records is faster, and is the default. For specific use-cases involving data having many fields, and many of them being processed during a given processing run, --hash-records might offer a slight performance benefit.
--infer-int-as-float or -A
                     Cast all integers in data files to floats.
--infer-none or -S   Don't treat values like 123 or 456.7 in data files as int/float; leave them as strings.
--infer-octal or -O  Treat numbers like 0123 in data files as numeric; default is string. Note that 00..07 etc. scan as int; 08 and 09 scan as float.
--load {filename}    Load DSL script file for all put/filter operations on the command line. If the name following `--load` is a directory, load all `*.mlr` files in that directory. This is just like `put -f` and `filter -f` except it's up-front on the command line, so you can do something like `alias mlr='mlr --load ~/myscripts'` if you like.
--mfrom {filenames}  Use this to specify one or more input files before the verb(s), rather than after. May be used more than once. The list of filenames must end with `--`. This is useful for example since `--from *.csv` doesn't do what you might hope but `--mfrom *.csv --` does.
--mload {filenames}  Like `--load` but works with more than one filename, e.g. `--mload *.mlr --`.
--no-dedupe-field-names
                     By default, if an input record has a field named `x` and another also named `x`, the second will be renamed `x_2`, and so on. With this flag provided, the second `x`'s value will replace the first `x`'s value when the record is read. This flag has no effect on JSON input records, where duplicate keys always result in the last one's value being retained.
--no-fflush          Let buffered output not be written after every output record. The default is to flush output after every record if the output is to the terminal, or less often if the output is to a file or a pipe. The default is a significant performance optimization for large files. Use this flag to allow less-frequent updates when output is to the terminal. This is unlikely to be a noticeable performance improvement, since direct-to-screen output for large files has its own overhead.
--no-hash-records    See --hash-records.
--nr-progress-mod {m}
                     With m a positive integer: print filename and record count to os.Stderr every m input records.
--ofmt {format}      E.g. `%.18f`, `%.0f`, `%9.6e`. Please use sprintf-style codes for floating-point numbers. If not specified, default formatting is used. See also the `fmtnum` function and the `format-values` verb.
--records-per-batch {n}
                     This is an internal parameter for maximum number of records in a batch size. Normally this does not need to be modified.
--seed {n}           With `n` of the form `12345678` or `0xcafefeed`. For `put`/`filter` `urand`, `urandint`, and `urand32`.
--tz {timezone}      Specify timezone, overriding `$TZ` environment variable (if any).
-I                   Process files in-place. For each file name on the command line, output is written to a temp file in the same directory, which is then renamed over the original. Each file is processed in isolation: if the output format is CSV, CSV headers will be present in each output file, statistics are only over each file's own records; and so on.
-n                   Process no input files, nor standard input either. Useful for `mlr put` with `begin`/`end` statements only. (Same as `--from /dev/null`.) Also useful in `mlr -n put -v '...'` for analyzing abstract syntax trees (if that's your thing).
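
Two small sketches of the flags above: --from places the input file before the verb chain, and -n suppresses input entirely for begin/end-only programs:
  mlr --icsv --opprint --from example.csv sort -f shape
  mlr -n put 'end { print 1 + 2 }'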
OUTPUT-COLORIZATION FLAGS
Miller uses colors to highlight outputs. You can specify color preferences. Note: output colorization does not work on Windows.

Things having colors:
* Keys in CSV header lines, JSON keys, etc.
* Values in CSV data lines, JSON scalar values, etc.; in regression-test output
* Some online-help strings

Rules for coloring:
* By default, colorize output only if writing to stdout and stdout is a TTY.
  * Example: color: `mlr --csv cat foo.csv`
  * Example: no color: `mlr --csv cat foo.csv > bar.csv`
  * Example: no color: `mlr --csv cat foo.csv | less`
* The default colors were chosen since they look OK with white or black terminal background, and are differentiable with common varieties of human color vision.

Mechanisms for coloring:
* Miller uses ANSI escape sequences only. This does not work on Windows except within Cygwin.
* Requires the `TERM` environment variable to be set to a non-empty string.
* Doesn't try to check whether the terminal is capable of 256-color ANSI vs 16-color ANSI. Note that if colors are in the range 0..15 then 16-color ANSI escapes are used, so this is in the user's control.

How you can control colorization:
* Suppression/unsuppression:
  * Environment variable `export MLR_NO_COLOR=true` means don't color even if stdout+TTY.
  * Environment variable `export MLR_ALWAYS_COLOR=true` means do color even if not stdout+TTY. For example, you might want to use this when piping mlr output to `less -r`.
  * Command-line flags `--no-color` or `-M`, `--always-color` or `-C`.
* Color choices can be specified by using environment variables, or command-line flags, with values 0..255:
  * `export MLR_KEY_COLOR=208`, `MLR_VALUE_COLOR=33`, etc.: `MLR_KEY_COLOR` `MLR_VALUE_COLOR` `MLR_PASS_COLOR` `MLR_FAIL_COLOR` `MLR_REPL_PS1_COLOR` `MLR_REPL_PS2_COLOR` `MLR_HELP_COLOR`
  * Command-line flags `--key-color 208`, `--value-color 33`, etc.: `--key-color` `--value-color` `--pass-color` `--fail-color` `--repl-ps1-color` `--repl-ps2-color` `--help-color`
  * This is particularly useful if your terminal's background color clashes with current settings.

If environment-variable settings and command-line flags are both provided, the latter take precedence.

Please do `mlr --list-color-codes` to see the available color codes (like 170), and `mlr --list-color-names` to see available names (like `orchid`).

--always-color or -C  Instructs Miller to colorize output even when it normally would not. Useful for piping output to `less -r`.
--fail-color          Specify the color (see `--list-color-codes` and `--list-color-names`) for failing cases in `mlr regtest`.
--help-color          Specify the color (see `--list-color-codes` and `--list-color-names`) for highlights in `mlr help` output.
--key-color           Specify the color (see `--list-color-codes` and `--list-color-names`) for record keys.
--list-color-codes    Show the available color codes in the range 0..255, such as 170 for example.
--list-color-names    Show the names for the available color codes, such as `orchid` for example.
--no-color or -M      Instructs Miller to not colorize any output.
--pass-color          Specify the color (see `--list-color-codes` and `--list-color-names`) for passing cases in `mlr regtest`.
--value-color         Specify the color (see `--list-color-codes` and `--list-color-names`) for record values.
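
For instance, to keep colors when paging output, either of the following should work:
  mlr --icsv --opprint -C cat example.csv | less -r
  MLR_ALWAYS_COLOR=true mlr --icsv --opprint cat example.csv | less -r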
PPRINT-ONLY FLAGS
These are flags which are applicable to PPRINT output format.

--barred  Prints a border around PPRINT output (not available for input).
--right   Right-justifies all fields for PPRINT output.
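
For example:
  mlr --icsv --opprint --barred cat example.csv
prints the records of example.csv as a bordered, pretty-printed table.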
PROFILING FLAGS
These are flags for profiling Miller performance.

--cpuprofile {CPU-profile file name}
                 Create a CPU-profile file for performance analysis. Instructions will be printed to stderr. This flag must be the very first thing after 'mlr' on the command line.
--time           Print elapsed execution time in seconds to stderr at the end of the execution of the program.
--traceprofile   Create a trace-profile file for performance analysis. Instructions will be printed to stderr. This flag must be the very first thing after 'mlr' on the command line.
SEPARATOR FLAGS
See the Separators doc page for more about record separators, field separators, and pair separators. Also see the File formats doc page, or `mlr help file-formats`, for more about the file formats Miller supports.

In brief:
* For DKVP records like `x=1,y=2,z=3`, the fields (key-value pairs) are separated from one another by a comma, keys are separated from values by an equals sign, and each record is separated from the next by a newline.
* Each file format has its own default separators.
* Most formats, such as CSV, don't support pair-separators: keys are on the CSV header line and values are on each CSV data line; keys and values are not placed next to one another.
* Some separators are not programmable: for example JSON uses a colon as a pair separator but this is non-modifiable in the JSON spec.
* You can set separators differently between Miller's input and output -- hence `--ifs` and `--ofs`, etc.

Notes about line endings:
* Default line endings (`--irs` and `--ors`) are newline which is interpreted to accept carriage-return/newline files (e.g. on Windows) for input, and to produce platform-appropriate line endings on output.

Notes about all other separators:
* IPS/OPS are only used for DKVP and XTAB formats, since only in these formats do key-value pairs appear juxtaposed.
* IRS/ORS are ignored for XTAB format. Nominally IFS and OFS are newlines; XTAB records are separated by two or more consecutive IFS/OFS -- i.e. a blank line. Everything above about `--irs/--ors/--rs auto` becomes `--ifs/--ofs/--fs auto` for XTAB format. (XTAB's default IFS/OFS are "auto".)
* OFS must be single-character for PPRINT format. This is because it is used with repetition for alignment; multi-character separators would make alignment impossible.
* OPS may be multi-character for XTAB format, in which case alignment is disabled.
* TSV is simply CSV using tab as field separator (`--fs tab`).
* FS/PS are ignored for markdown format; RS is used.
* All FS and PS options are ignored for JSON format, since they are not relevant to the JSON format.
* You can specify separators in any of the following ways, shown by example:
  - Type them out, quoting as necessary for shell escapes, e.g. `--fs '|' --ips :`
  - C-style escape sequences, e.g. `--rs '\r\n' --fs '\t'`.
  - To avoid backslashing, you can use any of the following names:
      ascii_esc  = "\x1b"
      ascii_etx  = "\x04"
      ascii_fs   = "\x1c"
      ascii_gs   = "\x1d"
      ascii_null = "\x01"
      ascii_rs   = "\x1e"
      ascii_soh  = "\x02"
      ascii_stx  = "\x03"
      ascii_us   = "\x1f"
      asv_fs     = "\x1f"
      asv_rs     = "\x1e"
      colon      = ":"
      comma      = ","
      cr         = "\r"
      crcr       = "\r\r"
      crlf       = "\r\n"
      crlfcrlf   = "\r\n\r\n"
      equals     = "="
      lf         = "\n"
      lflf       = "\n\n"
      newline    = "\n"
      pipe       = "|"
      semicolon  = ";"
      slash      = "/"
      space      = " "
      tab        = "\t"
      usv_fs     = "\xe2\x90\x9f"
      usv_rs     = "\xe2\x90\x9e"
  - Similarly, you can use the following for `--ifs-regex` and `--ips-regex`:
      spaces     = "( )+"
      tabs       = "(\t)+"
      whitespace = "([ \t])+"
* Default separators by format:
    Format    FS     PS     RS
    csv       ","    N/A    "\n"
    csvlite   ","    N/A    "\n"
    dkvp      ","    "="    "\n"
    json      N/A    N/A    N/A
    markdown  " "    N/A    "\n"
    nidx      " "    N/A    "\n"
    pprint    " "    N/A    "\n"
    xtab      "\n"   " "    "\n\n"

--fs {string}         Specify FS for input and output.
--ifs {string}        Specify FS for input.
--ifs-regex {string}  Specify FS for input as a regular expression.
--ips {string}        Specify PS for input.
--ips-regex {string}  Specify PS for input as a regular expression.
--irs {string}        Specify RS for input.
--ofs {string}        Specify FS for output.
--ops {string}        Specify PS for output.
--ors {string}        Specify RS for output.
--ps {string}         Specify PS for input and output.
--repifs              Let IFS be repeated: e.g. for splitting on multiple spaces.
--rs {string}         Specify RS for input and output.
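
A sketch of setting separators independently on input and output (the file names are illustrative only):
  mlr --inidx --ifs pipe --onidx --ofs tab cat pipe-delimited.txt
  mlr --idkvp --ifs ';' --ips ':' --ocsv cat data.dkvp
The first reads pipe-delimited positional data and writes it tab-delimited; the second reads DKVP data of the form a:1;b:2 and writes it as CSV.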
AUXILIARY COMMANDS
Available subcommands: aux-list hex lecat termcvt unhex help regtest repl version
For more information, please invoke mlr {subcommand} --help.
MLRRC
You can set up personal defaults via a $HOME/.mlrrc and/or ./.mlrrc. For example, if you usually process CSV, then you can put "--csv" in your .mlrrc file and that will be the default input/output format unless otherwise specified on the command line.

The .mlrrc file format is one "--flag" or "--option value" per line, with the leading "--" optional. Hash-style comments and blank lines are ignored.

Sample .mlrrc:
  # Input and output formats are CSV by default (unless otherwise specified
  # on the mlr command line):
  csv
  # These are no-ops for CSV, but when I do use JSON output, I want these
  # pretty-printing options to be used:
  jvstack
  jlistwrap

How to specify location of .mlrrc:
* If $MLRRC is set:
  o If its value is "__none__" then no .mlrrc files are processed.
  o Otherwise, its value (as a filename) is loaded and processed. If there are syntax errors, they abort mlr with a usage message (as if you had mistyped something on the command line). If the file can't be loaded at all, though, it is silently skipped.
  o Any .mlrrc in your home directory or current directory is ignored whenever $MLRRC is set in the environment.
* Otherwise:
  o If $HOME/.mlrrc exists, it's then processed as above.
  o If ./.mlrrc exists, it's then also processed as above. (I.e. current-directory .mlrrc defaults are stacked over home-directory .mlrrc defaults.)
* The command-line flag "--norc" can be used to suppress loading the .mlrrc file even when other conditions are met.

See also: https://miller.readthedocs.io/en/latest/customization.html
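
For example, to run a single command while ignoring any personal defaults, the $MLRRC mechanism described above can be used for just that invocation (the --norc flag is equivalent):
  MLRRC=__none__ mlr --icsv --ojson cat example.csv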
REPL
Usage: mlr repl [options] {zero or more data-file names}

-v  Prints the expression's AST (abstract syntax tree), which gives full transparency on the precedence and associativity rules of Miller's grammar, to stdout.
-d  Like -v but uses a parenthesized-expression format for the AST.
-D  Like -d but with output all on one line.
-w  Show warnings about uninitialized variables.
-q  Don't show startup banner.
-s  Don't show prompts.
--load {DSL script file}
    Load script file before presenting the prompt. If the name following --load is a directory, load all "*.mlr" files in that directory.
--mload {DSL script files} --
    Like --load but works with more than one filename, e.g. '--mload *.mlr --'.
-h|--help  Show this message.

Or any --icsv, --ojson, etc. reader/writer options as for the main Miller command line. Any data-file names are opened just as if you had waited and typed :open {filenames} at the Miller REPL prompt.
VERBS
altkv
  Usage: mlr altkv [options]
  Given fields with values of the form a,b,c,d,e,f emits a=b,c=d,e=f pairs.
  Options:
  -h|--help  Show this message.

bar
  Usage: mlr bar [options]
  Replaces a numeric field with a number of asterisks, allowing for cheesy bar plots. These align best with --opprint or --oxtab output format.
  Options:
  -f {a,b,c}      Field names to convert to bars.
  --lo {lo}       Lower-limit value for min-width bar: default '0.000000'.
  --hi {hi}       Upper-limit value for max-width bar: default '100.000000'.
  -w {n}          Bar-field width: default '40'.
  --auto          Automatically computes limits, ignoring --lo and --hi. Holds all records in memory before producing any output.
  -c {character}  Fill character: default '*'.
  -x {character}  Out-of-bounds character: default '#'.
  -b {character}  Blank character: default '.'.
  Nominally the fill, out-of-bounds, and blank characters will be strings of length 1. However you can make them all longer if you so desire.
  -h|--help       Show this message.

bootstrap
  Usage: mlr bootstrap [options]
  Emits an n-sample, with replacement, of the input records. See also mlr sample and mlr shuffle.
  Options:
  -n         Number of samples to output. Defaults to number of input records. Must be non-negative.
  -h|--help  Show this message.

cat
  Usage: mlr cat [options]
  Passes input records directly to output. Most useful for format conversion.
  Options:
  -n          Prepend field "n" to each record with record-counter starting at 1.
  -N {name}   Prepend field {name} to each record with record-counter starting at 1.
  -g {a,b,c}  Optional group-by-field names for counters, e.g. a,b,c.
  -h|--help   Show this message.

check
  Usage: mlr check [options]
  Consumes records without printing any output. Useful for doing a well-formedness check on input data.
  Options:
  -h|--help  Show this message.

clean-whitespace
  Usage: mlr clean-whitespace [options]
  For each record, for each field in the record, whitespace-cleans the keys and/or values. Whitespace-cleaning entails stripping leading and trailing whitespace, and replacing multiple whitespace with singles. For finer-grained control, please see the DSL functions lstrip, rstrip, strip, collapse_whitespace, and clean_whitespace.
  Options:
  -k|--keys-only    Do not touch values.
  -v|--values-only  Do not touch keys.
  It is an error to specify -k as well as -v -- to clean keys and values, leave off -k as well as -v.
  -h|--help         Show this message.

count-distinct
  Usage: mlr count-distinct [options]
  Prints number of records having distinct values for specified field names. Same as uniq -c.
  Options:
  -f {a,b,c}  Field names for distinct count.
  -n          Show only the number of distinct values. Not compatible with -u.
  -o {name}   Field name for output count. Default "count". Ignored with -u.
  -u          Do unlashed counts for multiple field names. With -f a,b and without -u, computes counts for distinct combinations of a and b field values. With -f a,b and with -u, computes counts for distinct a field values and counts for distinct b field values separately.

count
  Usage: mlr count [options]
  Prints number of records, optionally grouped by distinct values for specified field names.
  Options:
  -g {a,b,c}  Optional group-by-field names for counts, e.g. a,b,c.
  -n          Show only the number of distinct values. Not interesting without -g.
  -o {name}   Field name for output-count. Default "count".
  -h|--help   Show this message.

count-similar
  Usage: mlr count-similar [options]
  Ingests all records, then emits each record augmented by a count of the number of other records having the same group-by field values.
  Options:
  -g {a,b,c}  Group-by-field names for counts, e.g. a,b,c.
  -o {name}   Field name for output-counts. Defaults to "count".
  -h|--help   Show this message.

cut
  Usage: mlr cut [options]
  Passes through input records with specified fields included/excluded.
  Options:
  -f {a,b,c}       Comma-separated field names for cut, e.g. a,b,c.
  -o               Retain fields in the order specified here in the argument list. Default is to retain them in the order found in the input data.
  -x|--complement  Exclude, rather than include, field names specified by -f.
  -r               Treat field names as regular expressions. "ab", "a.*b" will match any field name containing the substring "ab" or matching "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may be used. The -o flag is ignored when -r is present.
  -h|--help        Show this message.
  Examples:
    mlr cut -f hostname,status
    mlr cut -x -f hostname,status
    mlr cut -r -f '^status$,sda[0-9]'
    mlr cut -r -f '^status$,"sda[0-9]"'
    mlr cut -r -f '^status$,"sda[0-9]"i' (this is case-insensitive)

decimate
  Usage: mlr decimate [options]
  Passes through one of every n records, optionally by category.
  Options:
  -b          Decimate by printing first of every n.
  -e          Decimate by printing last of every n (default).
  -g {a,b,c}  Optional group-by-field names for decimate counts, e.g. a,b,c.
  -n {n}      Decimation factor (default 10).
  -h|--help   Show this message.

fill-down
  Usage: mlr fill-down [options]
  If a given record has a missing value for a given field, fill that from the corresponding value from a previous record, if any. By default, a 'missing' field either is absent, or has the empty-string value. With -a, a field is 'missing' only if it is absent.
  Options:
  --all                Operate on all fields in the input.
  -a|--only-if-absent  If a given record has a missing value for a given field, fill that from the corresponding value from a previous record, if any. By default, a 'missing' field either is absent, or has the empty-string value. With -a, a field is 'missing' only if it is absent.
  -f                   Field names for fill-down.
  -h|--help            Show this message.

fill-empty
  Usage: mlr fill-empty [options]
  Fills empty-string fields with specified fill-value.
  Options:
  -v {string}  Fill-value: defaults to "N/A".
  -S           Don't infer type -- so '-v 0' would fill string 0 not int 0.

filter
  Usage: mlr filter [options] {DSL expression}
  Options:
  -f {file name}   File containing a DSL expression (see examples below). If the filename is a directory, all *.mlr files in that directory are loaded.
  -e {expression}  You can use this after -f to add an expression. Example use case: define functions/subroutines in a file you specify with -f, then call them with an expression you specify with -e.
  (If you mix -e and -f then the expressions are evaluated in the order encountered. Since the expression pieces are simply concatenated, please be sure to use intervening semicolons to separate expressions.)
  -s name=value    Predefines out-of-stream variable @name to have value {value}. Thus mlr put -s foo=97 '$column += @foo' is like mlr put 'begin {@foo = 97} $column += @foo'. The value part is subject to type-inferencing. May be specified more than once, e.g. -s name1=value1 -s name2=value2. Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
  -x               (default false) Prints records for which {expression} evaluates to false, not true, i.e. invert the sense of the filter expression.
  -q               Does not include the modified record in the output stream. Useful for when all desired output is in begin and/or end blocks.
  -S and -F        These are no-ops in Miller 6 and above, since now type-inferencing is done by the record-readers before filter/put is executed. Supported as no-op pass-through flags for backward compatibility.
  -h|--help        Show this message.
  Parser-info options:
  -w               Print warnings about things like uninitialized variables.
  -W               Same as -w, but exit the process if there are any warnings.
  -p               Prints the expression's AST (abstract syntax tree), which gives full transparency on the precedence and associativity rules of Miller's grammar, to stdout.
  -d               Like -p but uses a parenthesized-expression format for the AST.
  -D               Like -d but with output all on one line.
  -E               Echo DSL expression before printing parse-tree.
  -v               Same as -E -p.
  -X               Exit after parsing but before stream-processing. Useful with -v/-d/-D, if you only want to look at parser information.
  Records will pass the filter depending on the last bare-boolean statement in the DSL expression. That can be the result of <, ==, >, etc., the return value of a function call which returns boolean, etc.
  Examples:
    mlr --csv --from example.csv filter '$color == "red"'
    mlr --csv --from example.csv filter '$color == "red" && $flag == true'
  More example filter expressions:
    First record in each file:
      'FNR == 1'
    Subsampling:
      'urand() < 0.001'
    Compound booleans:
      '$color != "blue" && $value > 4.2'
      '($x < 0.5 && $y < 0.5) || ($x > 0.5 && $y > 0.5)'
    Regexes with case-insensitive flag:
      '($name =~ "^sys.*east$") || ($name =~ "^dev.[0-9]+"i)'
    Assignments, then bare-boolean filter statement:
      '$ab = $a+$b; $cd = $c+$d; $ab != $cd'
    Bare-boolean filter statement within a conditional:
      'if (NR < 100) { $x > 0.3; } else { $x > 0.002; }'
    Using 'any' higher-order function to see if $index is 10, 20, or 30:
      'any([10,20,30], func(e) {return $index == e})'
  See also https://miller.readthedocs.io/reference-dsl for more context.

flatten
  Usage: mlr flatten [options]
  Flattens multi-level maps to single-level ones. Example: field with name 'a' and value '{"b": { "c": 4 }}' becomes name 'a.b.c' and value 4.
  Options:
  -f         Comma-separated list of field names to flatten (default all).
  -s         Separator, defaulting to mlr --flatsep value.
  -h|--help  Show this message.

format-values
  Usage: mlr format-values [options]
  Applies format strings to all field values, depending on autodetected type.
  * If a field value is detected to be integer, applies integer format.
  * Else, if a field value is detected to be float, applies float format.
  * Else, applies string format.
  Note: this is a low-keystroke way to apply formatting to many fields. To get finer control, please see the fmtnum function within the mlr put DSL.
  Note: this verb lets you apply arbitrary format strings, which can produce undefined behavior and/or program crashes. See your system's "man printf".
  Options:
  -i {integer format}  Defaults to "%d". Examples: "%06lld", "%08llx". Note that Miller integers are long long so you must use formats which apply to long long, e.g. with ll in them. Undefined behavior results otherwise.
  -f {float format}    Defaults to "%f". Examples: "%8.3lf", "%.6le". Note that Miller floats are double-precision so you must use formats which apply to double, e.g. with l[efg] in them. Undefined behavior results otherwise.
  -s {string format}   Defaults to "%s". Examples: "_%s", "%08s". Note that you must use formats which apply to string, e.g. with s in them. Undefined behavior results otherwise.
  -n                   Coerce field values autodetected as int to float, and then apply the float format.
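
  A hedged example of format-values, again using example.csv from the EXAMPLES section (the format string is illustrative):
    mlr --icsv --opprint format-values -n -f '%.3f' example.csv
  This coerces integer-looking values to float and prints every numeric field with three decimal places, leaving string fields unchanged.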
fraction
  Usage: mlr fraction [options]
  For each record's value in specified fields, computes the ratio of that value to the sum of values in that field over all input records. E.g. with input records x=1 x=2 x=3 and x=4, emits output records x=1,x_fraction=0.1 x=2,x_fraction=0.2 x=3,x_fraction=0.3 and x=4,x_fraction=0.4
  Note: this is internally a two-pass algorithm: on the first pass it retains input records and accumulates sums; on the second pass it computes quotients and emits output records. This means it produces no output until all input is read.
  Options:
  -f {a,b,c}  Field name(s) for fraction calculation.
  -g {d,e,f}  Optional group-by-field name(s) for fraction counts.
  -p          Produce percents [0..100], not fractions [0..1]. Output field names end with "_percent" rather than "_fraction".
  -c          Produce cumulative distributions, i.e. running sums: each output value folds in the sum of the previous for the specified group. E.g. with input records x=1 x=2 x=3 and x=4, emits output records x=1,x_cumulative_fraction=0.1 x=2,x_cumulative_fraction=0.3 x=3,x_cumulative_fraction=0.6 and x=4,x_cumulative_fraction=1.0

gap
  Usage: mlr gap [options]
  Emits an empty record every n records, or when certain values change.
  Options:
  -g {a,b,c}  Print a gap whenever values of these fields (e.g. a,b,c) change.
  -n {n}      Print a gap every n records.
  One of -n or -g is required. -n is ignored if -g is present.
  -h|--help   Show this message.

grep
  Usage: mlr grep [options] {regular expression}
  Passes through records which match the regular expression.
  Options:
  -i         Use case-insensitive search.
  -v         Invert: pass through records which do not match the regex.
  -h|--help  Show this message.
  Note that "mlr filter" is more powerful, but requires you to know field names. By contrast, "mlr grep" allows you to regex-match the entire record. It does this by formatting each record in memory as DKVP, using command-line-specified ORS/OFS/OPS, and matching the resulting line against the regex specified here. In particular, the regex is not applied to the input stream: if you have CSV with header line "x,y,z" and data line "1,2,3" then the regex will be matched, not against either of these lines, but against the DKVP line "x=1,y=2,z=3". Furthermore, not all the options to system grep are supported, and this command is intended to be merely a keystroke-saver. To get all the features of system grep, you can do "mlr --odkvp ... | grep ... | mlr --idkvp ..."

group-by
  Usage: mlr group-by [options] {comma-separated field names}
  Outputs records in batches having identical values at specified field names.
  Options:
  -h|--help  Show this message.

group-like
  Usage: mlr group-like [options]
  Outputs records in batches having identical field names.
  Options:
  -h|--help  Show this message.

having-fields
  Usage: mlr having-fields [options]
  Conditionally passes through records depending on each record's field names.
  Options:
  --at-least {comma-separated names}
  --which-are {comma-separated names}
  --at-most {comma-separated names}
  --all-matching {regular expression}
  --any-matching {regular expression}
  --none-matching {regular expression}
  Examples:
    mlr having-fields --which-are amount,status,owner
    mlr having-fields --any-matching 'sda[0-9]'
    mlr having-fields --any-matching '"sda[0-9]"'
    mlr having-fields --any-matching '"sda[0-9]"i' (this is case-insensitive)

head
  Usage: mlr head [options]
  Passes through the first n records, optionally by category. Without -g, ceases consuming more input (i.e.
  is fast) when n records have been read.
  Options:
  -g {a,b,c}  Optional group-by-field names for head counts, e.g. a,b,c.
  -n {n}      Head-count to print. Default 10.
  -h|--help   Show this message.

histogram
  Just a histogram. Input values < lo or > hi are not counted.
  Usage: mlr histogram [options]
  -f {a,b,c}   Value-field names for histogram counts.
  --lo {lo}    Histogram low value.
  --hi {hi}    Histogram high value.
  --nbins {n}  Number of histogram bins. Defaults to 20.
  --auto       Automatically computes limits, ignoring --lo and --hi. Holds all values in memory before producing any output.
  -o {prefix}  Prefix for output field name. Default: no prefix.
  -h|--help    Show this message.

json-parse
  Usage: mlr json-parse [options]
  Tries to convert string field values to parsed JSON, e.g. "[1,2,3]" -> [1,2,3].
  Options:
  -f {...}   Comma-separated list of field names to json-parse (default all).
  -h|--help  Show this message.

json-stringify
  Usage: mlr json-stringify [options]
  Produces string field values from field-value data, e.g. [1,2,3] -> "[1,2,3]".
  Options:
  -f {...}      Comma-separated list of field names to json-stringify (default all).
  --jvstack     Produce multi-line JSON output.
  --no-jvstack  Produce single-line JSON output per record (default).
  -h|--help     Show this message.

join
  Usage: mlr join [options]
  Joins records from specified left file name with records from all file names at the end of the Miller argument list. Functionality is essentially the same as the system "join" command, but for record streams.
  Options:
  -f {left file name}
  -j {a,b,c}   Comma-separated join-field names for output.
  -l {a,b,c}   Comma-separated join-field names for left input file; defaults to -j values if omitted.
  -r {a,b,c}   Comma-separated join-field names for right input file(s); defaults to -j values if omitted.
  --lp {text}  Additional prefix for non-join output field names from the left file.
  --rp {text}  Additional prefix for non-join output field names from the right file(s).
  --np         Do not emit paired records.
  --ul         Emit unpaired records from the left file.
  --ur         Emit unpaired records from the right file(s).
  -s|--sorted-input
               Require sorted input: records must be sorted lexically by their join-field names, else not all records will be paired. The only likely use case for this is with a left file which is too big to fit into system memory otherwise.
  -u           Enable unsorted input. (This is the default even without -u.) In this case, the entire left file will be loaded into memory.
  --prepipe {command}
               As in main input options; see mlr --help for details. If you wish to use a prepipe command for the main input as well as here, it must be specified there as well as here.
  --prepipex {command}
               Likewise.
  File-format options default to those for the right file names on the Miller argument list, but may be overridden for the left file as follows. Please see the main "mlr --help" for more information on syntax for these arguments:
  -i {one of csv,dkvp,nidx,pprint,xtab}
  --irs {record-separator character}
  --ifs {field-separator character}
  --ips {pair-separator character}
  --repifs
  --implicit-csv-header
  --no-implicit-csv-header
  For example, if you have 'mlr --csv ... join -l foo ...' then the left-file format will be specified CSV as well unless you override with 'mlr --csv ... join --ijson -l foo' etc. Likewise, if you have 'mlr --csv --implicit-csv-header ...' then the join-in file will be expected to be headerless as well unless you put '--no-implicit-csv-header' after 'join'. Please use "mlr --usage-separator-options" for information on specifying separators.
  Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#join for more information including examples.

label
  Usage: mlr label [options] {new1,new2,new3,...}
  Given n comma-separated names, renames the first n fields of each record to have the respective name. (Fields past the nth are left with their original names.) Particularly useful with --inidx or --implicit-csv-header, to give useful names to otherwise integer-indexed fields.
  Options:
  -h|--help  Show this message.

least-frequent
  Usage: mlr least-frequent [options]
  Shows the least frequently occurring distinct values for specified field names. The first entry is the statistical anti-mode; the remaining are runners-up.
  Options:
  -f {one or more comma-separated field names}. Required flag.
  -n {count}. Optional flag defaulting to 10.
  -b         Suppress counts; show only field values.
  -o {name}  Field name for output count. Default "count".
  See also "mlr most-frequent".

merge-fields
  Usage: mlr merge-fields [options]
  Computes univariate statistics for each input record, accumulated across specified fields.
  Options:
  -a {sum,count,...}  Names of accumulators. One or more of:
     count     Count instances of fields
     mode      Find most-frequently-occurring values for fields; first-found wins tie
     antimode  Find least-frequently-occurring values for fields; first-found wins tie
     sum       Compute sums of specified fields
     mean      Compute averages (sample means) of specified fields
     var       Compute sample variance of specified fields
     stddev    Compute sample standard deviation of specified fields
     meaneb    Estimate error bars for averages (assuming no sample autocorrelation)
     skewness  Compute sample skewness of specified fields
     kurtosis  Compute sample kurtosis of specified fields
     min       Compute minimum values of specified fields
     max       Compute maximum values of specified fields
  -f {a,b,c}  Value-field names on which to compute statistics. Requires -o.
  -r {a,b,c}  Regular expressions for value-field names on which to compute statistics. Requires -o.
  -c {a,b,c}  Substrings for collapse mode. All fields which have the same names after removing substrings will be accumulated together. Please see examples below.
  -i          Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields.
  -o {name}   Output field basename for -f/-r.
  -k          Keep the input fields which contributed to the output statistics; the default is to omit them.
  String-valued data make sense unless arithmetic on them is required, e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data, numbers are less than strings.
  Example input data: "a_in_x=1,a_out_x=2,b_in_y=4,b_out_x=8".
  Example: mlr merge-fields -a sum,count -f a_in_x,a_out_x -o foo
    produces "b_in_y=4,b_out_x=8,foo_sum=3,foo_count=2" since "a_in_x,a_out_x" are summed over.
  Example: mlr merge-fields -a sum,count -r in_,out_ -o bar
    produces "bar_sum=15,bar_count=4" since all four fields are summed over.
  Example: mlr merge-fields -a sum,count -c in_,out_
    produces "a_x_sum=3,a_x_count=2,b_y_sum=4,b_y_count=1,b_x_sum=8,b_x_count=1" since "a_in_x" and "a_out_x" both collapse to "a_x", "b_in_y" collapses to "b_y", and "b_out_x" collapses to "b_x".

most-frequent
  Usage: mlr most-frequent [options]
  Shows the most frequently occurring distinct values for specified field names. The first entry is the statistical mode; the remaining are runners-up.
  Options:
  -f {one or more comma-separated field names}. Required flag.
  -n {count}. Optional flag defaulting to 10.
  -b         Suppress counts; show only field values.
  -o {name}  Field name for output count. Default "count".
  See also "mlr least-frequent".

nest
  Usage: mlr nest [options]
  Explodes specified field values into separate fields/records, or reverses this.
  Options:
  --explode,--implode    One is required.
  --values,--pairs       One is required.
  --across-records,--across-fields
                         One is required.
  -f {field name}        Required.
  --nested-fs {string}   Defaults to ";". Field separator for nested values.
  --nested-ps {string}   Defaults to ":". Pair separator for nested key-value pairs.
  --evar {string}        Shorthand for --explode --values --across-records --nested-fs {string}
  --ivar {string}        Shorthand for --implode --values --across-records --nested-fs {string}
  Please use "mlr --usage-separator-options" for information on specifying separators.
  Examples:
    mlr nest --explode --values --across-records -f x
    with input record "x=a;b;c,y=d" produces output records "x=a,y=d" "x=b,y=d" "x=c,y=d".
    Use --implode to do the reverse.

    mlr nest --explode --values --across-fields -f x
    with input record "x=a;b;c,y=d" produces output records "x_1=a,x_2=b,x_3=c,y=d".
    Use --implode to do the reverse.

    mlr nest --explode --pairs --across-records -f x
    with input record "x=a:1;b:2;c:3,y=d" produces output records "a=1,y=d" "b=2,y=d" "c=3,y=d".

    mlr nest --explode --pairs --across-fields -f x
    with input record "x=a:1;b:2;c:3,y=d" produces output records "a=1,b=2,c=3,y=d".
  Notes:
  * With --pairs, --implode doesn't make sense since the original field name has been lost.
  * The combination "--implode --values --across-records" is non-streaming: no output records are produced until all input records have been read. In particular, this means it won't work in tail -f contexts. But all other flag combinations result in streaming (tail -f friendly) data processing.
  * It's up to you to ensure that the nested-fs is distinct from your data's IFS: e.g. by default the former is semicolon and the latter is comma.
  See also mlr reshape.

nothing
  Usage: mlr nothing [options]
  Drops all input records. Useful for testing, or after tee/print/etc. have produced other output.
  Options:
  -h|--help  Show this message.

put
  Usage: mlr put [options] {DSL expression}
  Options:
  -f {file name}   File containing a DSL expression (see examples below). If the filename is a directory, all *.mlr files in that directory are loaded.
  -e {expression}  You can use this after -f to add an expression. Example use case: define functions/subroutines in a file you specify with -f, then call them with an expression you specify with -e.
  (If you mix -e and -f then the expressions are evaluated in the order encountered. Since the expression pieces are simply concatenated, please be sure to use intervening semicolons to separate expressions.)
  -s name=value    Predefines out-of-stream variable @name to have value {value}. Thus mlr put -s foo=97 '$column += @foo' is like mlr put 'begin {@foo = 97} $column += @foo'. The value part is subject to type-inferencing. May be specified more than once, e.g. -s name1=value1 -s name2=value2. Note: the value may be an environment variable, e.g. -s sequence=$SEQUENCE
  -x               (default false) Prints records for which {expression} evaluates to false, not true, i.e. invert the sense of the filter expression.
  -q               Does not include the modified record in the output stream. Useful for when all desired output is in begin and/or end blocks.
  -S and -F        These are no-ops in Miller 6 and above, since now type-inferencing is done by the record-readers before filter/put is executed. Supported as no-op pass-through flags for backward compatibility.
-h|--help Show this message. Parser-info options: -w Print warnings about things like uninitialized variables. -W Same as -w, but exit the process if there are any warnings. -p Prints the expression's AST (abstract syntax tree), which gives full transparency on the precedence and associativity rules of Miller's grammar, to stdout. -d Like -p but uses a parenthesized-expression format for the AST. -D Like -d but with output all on one line. -E Echo DSL expression before printing parse-tree -v Same as -E -p. -X Exit after parsing but before stream-processing. Useful with -v/-d/-D, if you only want to look at parser information. Examples: mlr --from example.csv put '$qr = $quantity * $rate' More example put expressions: If-statements: 'if ($flag == true) { $quantity *= 10}' 'if ($x > 0.0) { $y=log10($x); $z=sqrt($y) } else {$y = 0.0; $z = 0.0}' Newly created fields can be read after being written: '$new_field = $index**2; $qn = $quantity * $new_field' Regex-replacement: '$name = sub($name, "http.*com"i, "")' Regex-capture: 'if ($a =~ "([a-z]+)_([0-9]+)") { $b = "left_\1"; $c = "right_\2" }' Built-in variables: '$filename = FILENAME' Aggregations (use mlr put -q): '@sum += $x; end {emit @sum}' '@sum[$shape] += $quantity; end {emit @sum, "shape"}' '@sum[$shape][$color] += $x; end {emit @sum, "shape", "color"}' ' @min = min(@min,$x); @max=max(@max,$x); end{emitf @min, @max} ' See also https://miller.readthedocs.io/reference-dsl for more context. regularize Usage: mlr regularize [options] For records seen earlier in the data stream with the same field names in a different order, outputs them with field names in the previously encountered order. Options: -h|--help Show this message. remove-empty-columns Usage: mlr remove-empty-columns [options] Omits fields which are empty on every input row. Non-streaming. Options: -h|--help Show this message. rename Usage: mlr rename [options] {old1,new1,old2,new2,...} Renames specified fields. Options: -r Treat old field names as regular expressions. "ab", "a.*b" will match any field name containing the substring "ab" or matching "a.*b", respectively; anchors of the form "^ab$", "^a.*b$" may be used. New field names may be plain strings, or may contain capture groups of the form "\1" through "\9". Wrapping the regex in double quotes is optional, but is required if you wish to follow it with 'i' to indicate case-insensitivity. -g Do global replacement within each field name rather than first-match replacement. -h|--help Show this message. Examples: mlr rename old_name,new_name mlr rename old_name_1,new_name_1,old_name_2,new_name_2 mlr rename -r 'Date_[0-9]+,Date,' Rename all such fields to be "Date" mlr rename -r '"Date_[0-9]+",Date' Same mlr rename -r 'Date_([0-9]+).*,\1' Rename all such fields to be of the form 20151015 mlr rename -r '"name"i,Name' Rename "name", "Name", "NAME", etc. to "Name" reorder Usage: mlr reorder [options] Moves specified names to start of record, or end of record. Options: -e Put specified field names at record end: default is to put them at record start. -f {a,b,c} Field names to reorder. -b {x} Put field names specified with -f before field name specified by {x}, if any. If {x} isn't present in a given record, the specified fields will not be moved. -a {x} Put field names specified with -f after field name specified by {x}, if any. If {x} isn't present in a given record, the specified fields will not be moved. -h|--help Show this message. Examples: mlr reorder -f a,b sends input record "d=4,b=2,a=1,c=3" to "a=1,b=2,d=4,c=3". mlr reorder -e -f a,b sends input record "d=4,b=2,a=1,c=3" to "d=4,c=3,a=1,b=2".
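Illustrative chained example (column names assumed, not taken from the usage text above): mlr --icsv --opprint reorder -e -f index then sort -f shape data.csv moves the index column to the end of each record and then sorts the rows lexically by shape.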
repeat Usage: mlr repeat [options] Copies input records to output records multiple times. Options must be exactly one of the following: -n {repeat count} Repeat each input record this many times. -f {field name} Same, but take the repeat count from the specified field name of each input record. -h|--help Show this message. Example: echo x=0 | mlr repeat -n 4 then put '$x=urand()' produces: x=0.488189 x=0.484973 x=0.704983 x=0.147311 Example: echo a=1,b=2,c=3 | mlr repeat -f b produces: a=1,b=2,c=3 a=1,b=2,c=3 Example: echo a=1,b=2,c=3 | mlr repeat -f c produces: a=1,b=2,c=3 a=1,b=2,c=3 a=1,b=2,c=3 reshape Usage: mlr reshape [options] Wide-to-long options: -i {input field names} -o {key-field name,value-field name} -r {input field regexes} -o {key-field name,value-field name} These pivot/reshape the input data such that the input fields are removed and separate records are emitted for each key/value pair. Note: this works with tail -f and produces output records for each input record seen. Long-to-wide options: -s {key-field name,value-field name} These pivot/reshape the input data to undo the wide-to-long operation. Note: this does not work with tail -f; it produces output records only after all input records have been read. Examples: Input file "wide.txt": time X Y 2009-01-01 0.65473572 2.4520609 2009-01-02 -0.89248112 0.2154713 2009-01-03 0.98012375 1.3179287 mlr --pprint reshape -i X,Y -o item,value wide.txt time item value 2009-01-01 X 0.65473572 2009-01-01 Y 2.4520609 2009-01-02 X -0.89248112 2009-01-02 Y 0.2154713 2009-01-03 X 0.98012375 2009-01-03 Y 1.3179287 mlr --pprint reshape -r '[A-Z]' -o item,value wide.txt time item value 2009-01-01 X 0.65473572 2009-01-01 Y 2.4520609 2009-01-02 X -0.89248112 2009-01-02 Y 0.2154713 2009-01-03 X 0.98012375 2009-01-03 Y 1.3179287 Input file "long.txt": time item value 2009-01-01 X 0.65473572 2009-01-01 Y 2.4520609 2009-01-02 X -0.89248112 2009-01-02 Y 0.2154713 2009-01-03 X 0.98012375 2009-01-03 Y 1.3179287 mlr --pprint reshape -s item,value long.txt time X Y 2009-01-01 0.65473572 2.4520609 2009-01-02 -0.89248112 0.2154713 2009-01-03 0.98012375 1.3179287 See also mlr nest. sample Usage: mlr sample [options] Reservoir sampling (subsampling without replacement), optionally by category. See also mlr bootstrap and mlr shuffle. Options: -g {a,b,c} Optional: group-by-field names for samples, e.g. a,b,c. -k {k} Required: number of records to output in total, or by group if using -g. -h|--help Show this message. sec2gmtdate Usage: ../c/mlr sec2gmtdate {comma-separated list of field names} Replaces a numeric field representing seconds since the epoch with the corresponding GMT year-month-day timestamp; leaves non-numbers as-is. This is nothing more than a keystroke-saver for the sec2gmtdate function: ../c/mlr sec2gmtdate time1,time2 is the same as ../c/mlr put '$time1=sec2gmtdate($time1);$time2=sec2gmtdate($time2)' sec2gmt Usage: mlr sec2gmt [options] {comma-separated list of field names} Replaces a numeric field representing seconds since the epoch with the corresponding GMT timestamp; leaves non-numbers as-is. This is nothing more than a keystroke-saver for the sec2gmt function: mlr sec2gmt time1,time2 is the same as mlr put '$time1 = sec2gmt($time1); $time2 = sec2gmt($time2)' Options: -1 through -9: format the seconds using 1..9 decimal places, respectively. --millis Input numbers are treated as milliseconds since the epoch. --micros Input numbers are treated as microseconds since the epoch. 
--nanos Input numbers are treated as nanoseconds since the epoch. -h|--help Show this message. seqgen Usage: mlr seqgen [options] Produces a sequence of counters, discarding the input record stream. Produces output as specified by the options. Options: -f {name} (default "i") Field name for counters. --start {value} (default 1) Inclusive start value. --step {value} (default 1) Step value. --stop {value} (default 100) Inclusive stop value. -h|--help Show this message. Start, stop, and/or step may be floating-point. Output is integer if start, stop, and step are all integers. Step may be negative. It may not be zero unless start == stop. shuffle Usage: mlr shuffle [options] Outputs records randomly permuted. No output records are produced until all input records are read. See also mlr bootstrap and mlr sample. Options: -h|--help Show this message. skip-trivial-records Usage: mlr skip-trivial-records [options] Passes through all records except those with zero fields, or those for which all fields have empty value. Options: -h|--help Show this message. sort Usage: mlr sort {flags} Sorts records primarily by the first specified field, secondarily by the second field, and so on. (Any records not having all specified sort keys will appear at the end of the output, in the order they were encountered, regardless of the specified sort order.) The sort is stable: records that compare equal will sort in the order they were encountered in the input record stream. Options: -f {comma-separated field names} Lexical ascending -r {comma-separated field names} Lexical descending -c {comma-separated field names} Case-folded lexical ascending -cr {comma-separated field names} Case-folded lexical descending -n {comma-separated field names} Numerical ascending; nulls sort last -nf {comma-separated field names} Same as -n -nr {comma-separated field names} Numerical descending; nulls sort first -h|--help Show this message. Example: mlr sort -f a,b -nr x,y,z which is the same as: mlr sort -f a -f b -nr x -nr y -nr z sort-within-records Usage: mlr sort-within-records [options] Outputs records sorted lexically ascending by keys. Options: -r Recursively sort subobjects/submaps, e.g. for JSON input. -h|--help Show this message. stats1 Usage: mlr stats1 [options] Computes univariate statistics for one or more given fields, accumulated across the input record stream. Options: -a {sum,count,...} Names of accumulators: one or more of: median This is the same as p50 p10 p25.2 p50 p98 p100 etc.
count Count instances of fields mode Find most-frequently-occurring values for fields; first-found wins tie antimode Find least-frequently-occurring values for fields; first-found wins tie sum Compute sums of specified fields mean Compute averages (sample means) of specified fields var Compute sample variance of specified fields stddev Compute sample standard deviation of specified fields meaneb Estimate error bars for averages (assuming no sample autocorrelation) skewness Compute sample skewness of specified fields kurtosis Compute sample kurtosis of specified fields min Compute minimum values of specified fields max Compute maximum values of specified fields -f {a,b,c} Value-field names on which to compute statistics --fr {regex} Regex for value-field names on which to compute statistics (compute statistics on values in all field names matching regex) --fx {regex} Inverted regex for value-field names on which to compute statistics (compute statistics on values in all field names not matching regex) -g {d,e,f} Optional group-by-field names --gr {regex} Regex for optional group-by-field names (group by values in field names matching regex) --gx {regex} Inverted regex for optional group-by-field names (group by values in field names not matching regex) --grfx {regex} Shorthand for --gr {regex} --fx {that same regex} -i Use interpolated percentiles, like R's type=7; default like type=1. Not sensical for string-valued fields. -s Print iterative stats. Useful in tail -f contexts (in which case please avoid pprint-format output since end of input stream will never be seen). -h|--help Show this message. Example: mlr stats1 -a min,p10,p50,p90,max -f value -g size,shape Example: mlr stats1 -a count,mode -f size Example: mlr stats1 -a count,mode -f size -g shape Example: mlr stats1 -a count,mode --fr '^[a-h].*$' --gr '^k.*$' This computes count and mode statistics on all field names beginning with a through h, grouped by all field names starting with k. Notes: * p50 and median are synonymous. * min and max output the same results as p0 and p100, respectively, but use less memory. * String-valued data make sense unless arithmetic on them is required, e.g. for sum, mean, interpolated percentiles, etc. In case of mixed data, numbers are less than strings. * count and mode allow text input; the rest require numeric input. In particular, 1 and 1.0 are distinct text for count and mode. * When there are mode ties, the first-encountered datum wins. stats2 Usage: mlr stats2 [options] Computes bivariate statistics for one or more given field-name pairs, accumulated across the input record stream. -a {linreg-ols,corr,...} Names of accumulators: one or more of: linreg-ols Linear regression using ordinary least squares linreg-pca Linear regression using principal component analysis r2 Quality metric for linreg-ols (linreg-pca emits its own) logireg Logistic regression corr Sample correlation cov Sample covariance covx Sample-covariance matrix -f {a,b,c,d} Value-field name-pairs on which to compute statistics. There must be an even number of names. -g {e,f,g} Optional group-by-field names. -v Print additional output for linreg-pca. -s Print iterative stats. Useful in tail -f contexts (in which case please avoid pprint-format output since end of input stream will never be seen). --fit Rather than printing regression parameters, applies them to the input data to compute new fit fields. All input records are held in memory until end of input stream. Has effect only for linreg-ols, linreg-pca, and logireg.
Only one of -s or --fit may be used. Example: mlr stats2 -a linreg-pca -f x,y Example: mlr stats2 -a linreg-ols,r2 -f x,y -g size,shape Example: mlr stats2 -a corr -f x,y step Usage: mlr step [options] Computes values dependent on the previous record, optionally grouped by category. Options: -a {delta,rsum,...} Names of steppers: comma-separated, one or more of: delta Compute differences in field(s) between successive records shift Include value(s) in field(s) from previous record, if any from-first Compute differences in field(s) from first record ratio Compute ratios in field(s) between successive records rsum Compute running sums of field(s) between successive records counter Count instances of field(s) between successive records ewma Exponentially weighted moving average over successive records -f {a,b,c} Value-field names on which to compute statistics -g {d,e,f} Optional group-by-field names -F Computes integerable things (e.g. counter) in floating point. As of Miller 6 this happens automatically, but the flag is accepted as a no-op for backward compatibility with Miller 5 and below. -d {x,y,z} Weights for ewma. 1 means current sample gets all weight (no smoothing), near under 1 is light smoothing, near over 0 is heavy smoothing. Multiple weights may be specified, e.g. "mlr step -a ewma -f sys_load -d 0.01,0.1,0.9". Default if omitted is "-d 0.5". -o {a,b,c} Custom suffixes for EWMA output fields. If omitted, these default to the -d values. If supplied, the number of -o values must be the same as the number of -d values. -h|--help Show this message. Examples: mlr step -a rsum -f request_size mlr step -a delta -f request_size -g hostname mlr step -a ewma -d 0.1,0.9 -f x,y mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y mlr step -a ewma -d 0.1,0.9 -o smooth,rough -f x,y -g group_name Please see https://miller.readthedocs.io/en/latest/reference-verbs.html#filter or https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average for more information on EWMA. tac Usage: mlr tac [options] Prints records in reverse order from the order in which they were encountered. Options: -h|--help Show this message. tail Usage: mlr tail [options] Passes through the last n records, optionally by category. Options: -g {a,b,c} Optional group-by-field names for tail counts, e.g. a,b,c. -n {n} Tail-count to print. Default 10. -h|--help Show this message. tee Usage: mlr tee [options] {filename} Options: -a Append to existing file, if any, rather than overwriting. -p Treat filename as a pipe-to command. Any of the output-format command-line flags (see mlr -h). Example: using mlr --icsv --opprint put '...' then tee --ojson ./mytap.dat then stats1 ... the input is CSV, the output is pretty-print tabular, but the tee-file output is written in JSON format. -h|--help Show this message. template Usage: mlr template [options] Places input-record fields in the order specified by list of column names. If the input record is missing a specified field, it will be filled with the fill-with. If the input record possesses an unspecified field, it will be discarded. Options: -f {a,b,c} Comma-separated field names for template, e.g. a,b,c. -t {filename} CSV file whose header line will be used for template. --fill-with {filler string} What to fill absent fields with. Defaults to the empty string. -h|--help Show this message. Example: * Specified fields are a,b,c. * Input record is c=3,a=1,f=6. * Output record is a=1,b=,c=3. top Usage: mlr top [options] -f {a,b,c} Value-field names for top counts.
-g {d,e,f} Optional group-by-field names for top counts. -n {count} How many records to print per category; default 1. -a Print all fields for top-value records; default is to print only value and group-by fields. Requires a single value-field name only. --min Print top smallest values; default is top largest values. -F Keep top values as floats even if they look like integers. -o {name} Field name for output indices. Default "top_idx". Prints the n records with smallest/largest values at specified fields, optionally by category. unflatten Usage: mlr unflatten [options] Reverses flatten. Example: field with name 'a.b.c' and value 4 becomes name 'a' and value '{"b": { "c": 4 }}'. Options: -f {a,b,c} Comma-separated list of field names to unflatten (default all). -s {string} Separator, defaulting to mlr --flatsep value. -h|--help Show this message. uniq Usage: mlr uniq [options] Prints distinct values for specified field names. With -c, same as count-distinct. For uniq, -f is a synonym for -g. Options: -g {d,e,f} Group-by-field names for uniq counts. -c Show repeat counts in addition to unique values. -n Show only the number of distinct values. -o {name} Field name for output count. Default "count". -a Output each unique record only once. Incompatible with -g. With -c, produces unique records, with repeat counts for each. With -n, produces only one record which is the unique-record count. With neither -c nor -n, produces unique records. unsparsify Usage: mlr unsparsify [options] Prints records with the union of field names over all input records. For field names absent in a given record but present in others, fills in a value. This verb retains all input before producing any output. Options: --fill-with {filler string} What to fill absent fields with. Defaults to the empty string. -f {a,b,c} Specify field names to be operated on. Any other fields won't be modified, and operation will be streaming. -h|--help Show this message. Example: if the input is two records, one being 'a=1,b=2' and the other being 'b=3,c=4', then the output is the two records 'a=1,b=2,c=' and ’a=,b=3,c=4'.
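Illustrative --fill-with sketch (filler value assumed): with the same two input records as above, mlr unsparsify --fill-with N/A produces 'a=1,b=2,c=N/A' and 'a=N/A,b=3,c=4', using the supplied string instead of the default empty string for absent fields.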
FUNCTIONS FOR FILTER/PUT
abs (class=math #args=1) Absolute value. acos (class=math #args=1) Inverse trigonometric cosine. acosh (class=math #args=1) Inverse hyperbolic cosine. any (class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, yields a boolean true if the argument function returns true for any array/map element, false otherwise. For arrays, the function should take one argument, for array element; for maps, it should take two, for map-element key and value. In either case it should return a boolean. Examples: Array example: any([10,20,30], func(e) {return $index == e}) Map example: any({"a": "foo", "b": "bar"}, func(k,v) {return $[k] == v}) append (class=collections #args=2) Appends second argument to end of first argument, which must be an array. apply (class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, applies the function to each element of the array/map. For arrays, the function should take one argument, for array element; it should return a new element. For maps, it should take two arguments, for map-element key and value; it should return a new key-value pair (i.e. a single-entry map). Examples: Array example: apply([1,2,3,4,5], func(e) {return e ** 3}) returns [1, 8, 27, 64, 125]. Map example: apply({"a":1, "b":3, "c":5}, func(k,v) {return {toupper(k): v ** 2}}) returns {"A": 1, "B":9, "C": 25}", arrayify (class=collections #args=1) Walks through a nested map/array, converting any map with consecutive keys "1", "2", ... into an array. Useful to wrap the output of unflatten. asin (class=math #args=1) Inverse trigonometric sine. asinh (class=math #args=1) Inverse hyperbolic sine. asserting_absent (class=typing #args=1) Aborts with an error if is_absent on the argument returns false, else returns its argument. asserting_array (class=typing #args=1) Aborts with an error if is_array on the argument returns false, else returns its argument. asserting_bool (class=typing #args=1) Aborts with an error if is_bool on the argument returns false, else returns its argument. asserting_boolean (class=typing #args=1) Aborts with an error if is_boolean on the argument returns false, else returns its argument. asserting_empty (class=typing #args=1) Aborts with an error if is_empty on the argument returns false, else returns its argument. asserting_empty_map (class=typing #args=1) Aborts with an error if is_empty_map on the argument returns false, else returns its argument. asserting_error (class=typing #args=1) Aborts with an error if is_error on the argument returns false, else returns its argument. asserting_float (class=typing #args=1) Aborts with an error if is_float on the argument returns false, else returns its argument. asserting_int (class=typing #args=1) Aborts with an error if is_int on the argument returns false, else returns its argument. asserting_map (class=typing #args=1) Aborts with an error if is_map on the argument returns false, else returns its argument. asserting_nonempty_map (class=typing #args=1) Aborts with an error if is_nonempty_map on the argument returns false, else returns its argument. asserting_not_array (class=typing #args=1) Aborts with an error if is_not_array on the argument returns false, else returns its argument. asserting_not_empty (class=typing #args=1) Aborts with an error if is_not_empty on the argument returns false, else returns its argument. 
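Illustrative use of the asserting_* family (field name assumed): mlr put '$name = asserting_not_empty($name)' data.csv aborts with an error on the first record where $name is present but empty, and passes the value through unchanged otherwise.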
asserting_not_map (class=typing #args=1) Aborts with an error if is_not_map on the argument returns false, else returns its argument. asserting_not_null (class=typing #args=1) Aborts with an error if is_not_null on the argument returns false, else returns its argument. asserting_null (class=typing #args=1) Aborts with an error if is_null on the argument returns false, else returns its argument. asserting_numeric (class=typing #args=1) Aborts with an error if is_numeric on the argument returns false, else returns its argument. asserting_present (class=typing #args=1) Aborts with an error if is_present on the argument returns false, else returns its argument. asserting_string (class=typing #args=1) Aborts with an error if is_string on the argument returns false, else returns its argument. atan (class=math #args=1) One-argument arctangent. atan2 (class=math #args=2) Two-argument arctangent. atanh (class=math #args=1) Inverse hyperbolic tangent. bitcount (class=arithmetic #args=1) Count of 1-bits. boolean (class=conversion #args=1) Convert int/float/bool/string to boolean. capitalize (class=string #args=1) Convert string's first character to uppercase. cbrt (class=math #args=1) Cube root. ceil (class=math #args=1) Ceiling: nearest integer at or above. clean_whitespace (class=string #args=1) Same as collapse_whitespace and strip. collapse_whitespace (class=string #args=1) Strip repeated whitespace from string. cos (class=math #args=1) Trigonometric cosine. cosh (class=math #args=1) Hyperbolic cosine. depth (class=collections #args=1) Prints maximum depth of map/array. Scalars have depth 0. dhms2fsec (class=time #args=1) Recovers floating-point seconds as in dhms2fsec("5d18h53m20.250000s") = 500000.250000 dhms2sec (class=time #args=1) Recovers integer seconds as in dhms2sec("5d18h53m20s") = 500000 erf (class=math #args=1) Error function. erfc (class=math #args=1) Complementary error function. every (class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, yields a boolean true if the argument function returns true for every array/map element, false otherwise. For arrays, the function should take one argument, for array element; for maps, it should take two, for map-element key and value. In either case it should return a boolean. Examples: Array example: every(["a", "b", "c"], func(e) {return $[e] >= 0}) Map example: every({"a": "foo", "b": "bar"}, func(k,v) {return $[k] == v}) exp (class=math #args=1) Exponential function e**x. expm1 (class=math #args=1) e**x - 1. flatten (class=collections #args=2,3) Flattens multi-level maps to single-level ones. Useful for nested JSON-like structures for non-JSON file formats like CSV. Examples: flatten("a", ".", {"b": { "c": 4 }}) is {"a.b.c" : 4}. flatten("", ".", {"a": { "b": 3 }}) is {"a.b" : 3}. Two-argument version: flatten($*, ".") is the same as flatten("", ".", $*). float (class=conversion #args=1) Convert int/float/bool/string to float. floor (class=math #args=1) Floor: nearest integer at or below. fmtnum (class=conversion #args=2) Convert int/float/bool to string using printf-style format string, e.g. '$s = fmtnum($n, "%08d")' or '$t = fmtnum($n, "%.6e")'. fold (class=higher-order-functions #args=3) Given a map or array as first argument and a function as second argument, accumulates entries into a final output -- for example, sum or product. For arrays, the function should take two arguments, for accumulated value and array element. 
For maps, it should take four arguments, for accumulated key and value, and map-element key and value; it should return the updated accumulator as a new key-value pair (i.e. a single-entry map). The start value for the accumulator is taken from the third argument. Examples: Array example: fold([1,2,3,4,5], func(acc,e) {return acc + e**3}, 10000) returns 10225. Map example: fold({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum": accv+ev**2}}, {"sum":10000}) returns 10035. fsec2dhms (class=time #args=1) Formats floating-point seconds as in fsec2dhms(500000.25) = "5d18h53m20.250000s" fsec2hms (class=time #args=1) Formats floating-point seconds as in fsec2hms(5000.25) = "01:23:20.250000" get_keys (class=collections #args=1) Returns array of keys of map or array get_values (class=collections #args=1) Returns array of values of map or array -- in the latter case, returns a copy of the array gmt2localtime (class=time #args=1,2) Convert from a GMT-time string to a local-time string. Consults $TZ unless second argument is supplied. Examples: gmt2localtime("1999-12-31T22:00:00Z") = "2000-01-01 00:00:00" with TZ="Asia/Istanbul" gmt2localtime("1999-12-31T22:00:00Z", "Asia/Istanbul") = "2000-01-01 00:00:00" gmt2sec (class=time #args=1) Parses GMT timestamp as integer seconds since the epoch. Example: gmt2sec("2001-02-03T04:05:06Z") = 981173106 gsub (class=string #args=3) '$name=gsub($name, "old", "new")' (replace all). haskey (class=collections #args=2) True/false if map has/hasn't key, e.g. 'haskey($*, "a")' or 'haskey(mymap, mykey)', or true/false if array index is in bounds / out of bounds. Error if 1st argument is not a map or array. Note -n..-1 alias to 1..n in Miller arrays. hexfmt (class=conversion #args=1) Convert int to hex string, e.g. 255 to "0xff". hms2fsec (class=time #args=1) Recovers floating-point seconds as in hms2fsec("01:23:20.250000") = 5000.250000 hms2sec (class=time #args=1) Recovers integer seconds as in hms2sec("01:23:20") = 5000 hostname (class=system #args=0) Returns the hostname as a string. int (class=conversion #args=1) Convert int/float/bool/string to int. invqnorm (class=math #args=1) Inverse of normal cumulative distribution function. Note that invqnorm(urand()) is normally distributed. is_absent (class=typing #args=1) False if field is present in input, true otherwise is_array (class=typing #args=1) True if argument is an array. is_bool (class=typing #args=1) True if field is present with boolean value. Synonymous with is_boolean. is_boolean (class=typing #args=1) True if field is present with boolean value. Synonymous with is_bool. is_empty (class=typing #args=1) True if field is present in input with empty string value, false otherwise. is_empty_map (class=typing #args=1) True if argument is a map which is empty. is_error (class=typing #args=1) True if argument is an error, such as taking string length of an integer. is_float (class=typing #args=1) True if field is present with value inferred to be float is_int (class=typing #args=1) True if field is present with value inferred to be int is_map (class=typing #args=1) True if argument is a map. is_nonempty_map (class=typing #args=1) True if argument is a map which is non-empty. is_not_array (class=typing #args=1) True if argument is not an array. is_not_empty (class=typing #args=1) False if field is present in input with empty value, true otherwise is_not_map (class=typing #args=1) True if argument is not a map. is_not_null (class=typing #args=1) False if argument is null (empty or absent), true otherwise.
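Illustrative sketch of these typing predicates (field names assumed): mlr put 'if (is_not_null($comment)) { $has_comment = "yes" } else { $has_comment = "no" }' data.csv branches on whether $comment is neither empty nor absent in each record.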
is_null (class=typing #args=1) True if argument is null (empty or absent), false otherwise. is_numeric (class=typing #args=1) True if field is present with value inferred to be int or float is_present (class=typing #args=1) True if field is present in input, false otherwise. is_string (class=typing #args=1) True if field is present with string (including empty-string) value joink (class=conversion #args=2) Makes string from map/array keys. Examples: joink({"a":3,"b":4,"c":5}, ",") = "a,b,c". joink([1,2,3], ",") = "1,2,3". joinkv (class=conversion #args=3) Makes string from map/array key-value pairs. Examples: joinkv([3,4,5], "=", ",") = "1=3,2=4,3=5" joinkv({"a":3,"b":4,"c":5}, "=", ",") = "a=3,b=4,c=5" joinv (class=conversion #args=2) Makes string from map/array values. Examples: joinv([3,4,5], ",") = "3,4,5" joinv({"a":3,"b":4,"c":5}, ",") = "3,4,5" json_parse (class=collections #args=1) Converts value from JSON-formatted string. json_stringify (class=collections #args=1,2) Converts value to JSON-formatted string. Default output is single-line. With optional second boolean argument set to true, produces multiline output. leafcount (class=collections #args=1) Counts total number of terminal values in map/array. For single-level map/array, same as length. length (class=collections #args=1) Counts number of top-level entries in array/map. Scalars have length 1. localtime2gmt (class=time #args=1,2) Convert from a local-time string to a GMT-time string. Consults $TZ unless second argument is supplied. Examples: localtime2gmt("2000-01-01 00:00:00") = "1999-12-31T22:00:00Z" with TZ="Asia/Istanbul" localtime2gmt("2000-01-01 00:00:00", "Asia/Istanbul") = "1999-12-31T22:00:00Z" localtime2sec (class=time #args=1,2) Parses local timestamp as integer seconds since the epoch. Consults $TZ environment variable, unless second argument is supplied. Examples: localtime2sec("2001-02-03 04:05:06") = 981165906 with TZ="Asia/Istanbul" localtime2sec("2001-02-03 04:05:06", "Asia/Istanbul") = 981165906 log (class=math #args=1) Natural (base-e) logarithm. log10 (class=math #args=1) Base-10 logarithm. log1p (class=math #args=1) log(1+x). logifit (class=math #args=3) Given m and b from logistic regression, compute fit: $yhat=logifit($x,$m,$b). lstrip (class=string #args=1) Strip leading whitespace from string. madd (class=arithmetic #args=3) a + b mod m (integers) mapdiff (class=collections #args=variadic) With 0 args, returns empty map. With 1 arg, returns copy of arg. With 2 or more, returns copy of arg 1 with all keys from any of remaining argument maps removed. mapexcept (class=collections #args=variadic) Returns a map with keys from remaining arguments, if any, unset. Remaining arguments can be strings or arrays of string. E.g. 'mapexcept({1:2,3:4,5:6}, 1, 5, 7)' is '{3:4}' and 'mapexcept({1:2,3:4,5:6}, [1, 5, 7])' is '{3:4}'. mapselect (class=collections #args=variadic) Returns a map with only keys from remaining arguments set. Remaining arguments can be strings or arrays of string. E.g. 'mapselect({1:2,3:4,5:6}, 1, 5, 7)' is '{1:2,5:6}' and 'mapselect({1:2,3:4,5:6}, [1, 5, 7])' is '{1:2,5:6}'. mapsum (class=collections #args=variadic) With 0 args, returns empty map. With >= 1 arg, returns a map with key-value pairs from all arguments. Rightmost collisions win, e.g. 'mapsum({1:2,3:4},{1:5})' is '{1:5,3:4}'. max (class=math #args=variadic) Max of n numbers; null loses. md5 (class=hashing #args=1) MD5 hash.
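Illustrative hashing sketch (field names assumed): mlr put '$key = md5($name . "," . $color)' data.csv adds a deterministic digest of two concatenated fields; the sha1, sha256, and sha512 functions below are used the same way.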
mexp (class=arithmetic #args=3) a ** b mod m (integers) min (class=math #args=variadic) Min of n numbers; null loses. mmul (class=arithmetic #args=3) a * b mod m (integers) msub (class=arithmetic #args=3) a - b mod m (integers) os (class=system #args=0) Returns the operating-system name as a string. pow (class=arithmetic #args=2) Exponentiation. Same as **, but as a function. qnorm (class=math #args=1) Normal cumulative distribution function. reduce (class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, accumulates entries into a final output -- for example, sum or product. For arrays, the function should take two arguments, for accumulated value and array element, and return the accumulated element. For maps, it should take four arguments, for accumulated key and value, and map-element key and value; it should return the updated accumulator as a new key-value pair (i.e. a single-entry map). The start value for the accumulator is the first element for arrays, or the first element's key-value pair for maps. Examples: Array example: reduce([1,2,3,4,5], func(acc,e) {return acc + e**3}) returns 225. Map example: reduce({"a":1, "b":3, "c": 5}, func(acck,accv,ek,ev) {return {"sum_of_squares": accv + ev**2}}) returns {"sum_of_squares": 35}. regextract (class=string #args=2) '$name=regextract($name, "[A-Z]{3}[0-9]{2}")' regextract_or_else (class=string #args=3) '$name=regextract_or_else($name, "[A-Z]{3}[0-9]{2}", "default")' round (class=math #args=1) Round to nearest integer. roundm (class=math #args=2) Round to nearest multiple of m: roundm($x,$m) is the same as round($x/$m)*$m. rstrip (class=string #args=1) Strip trailing whitespace from string. sec2dhms (class=time #args=1) Formats integer seconds as in sec2dhms(500000) = "5d18h53m20s" sec2gmt (class=time #args=1,2) Formats seconds since epoch as GMT timestamp. Leaves non-numbers as-is. With second integer argument n, includes n decimal places for the seconds part. Examples: sec2gmt(1234567890) = "2009-02-13T23:31:30Z" sec2gmt(1234567890.123456) = "2009-02-13T23:31:30Z" sec2gmt(1234567890.123456, 6) = "2009-02-13T23:31:30.123456Z" sec2gmtdate (class=time #args=1) Formats seconds since epoch (integer part) as GMT timestamp with year-month-date. Leaves non-numbers as-is. Example: sec2gmtdate(1440768801.7) = "2015-08-28". sec2hms (class=time #args=1) Formats integer seconds as in sec2hms(5000) = "01:23:20" sec2localdate (class=time #args=1,2) Formats seconds since epoch (integer part) as local timestamp with year-month-date. Leaves non-numbers as-is. Consults $TZ environment variable unless second argument is supplied. Examples: sec2localdate(1440768801.7) = "2015-08-28" with TZ="Asia/Istanbul" sec2localdate(1440768801.7, "Asia/Istanbul") = "2015-08-28" sec2localtime (class=time #args=1,2,3) Formats seconds since epoch (integer part) as local timestamp. Consults $TZ environment variable unless third argument is supplied. Leaves non-numbers as-is. 
With second integer argument n, includes n decimal places for the seconds part. Examples: sec2localtime(1234567890) = "2009-02-14 01:31:30" with TZ="Asia/Istanbul" sec2localtime(1234567890.123456) = "2009-02-14 01:31:30" with TZ="Asia/Istanbul" sec2localtime(1234567890.123456, 6) = "2009-02-14 01:31:30.123456" with TZ="Asia/Istanbul" sec2localtime(1234567890.123456, 6, "Asia/Istanbul") = "2009-02-14 01:31:30.123456" select (class=higher-order-functions #args=2) Given a map or array as first argument and a function as second argument, includes each input element in the output if the function returns true. For arrays, the function should take one argument, for array element; for maps, it should take two, for map-element key and value. In either case it should return a boolean. Examples: Array example: select([1,2,3,4,5], func(e) {return e >= 3}) returns [3, 4, 5]. Map example: select({"a":1, "b":3, "c":5}, func(k,v) {return v >= 3}) returns {"b":3, "c": 5}. sgn (class=math #args=1) +1, 0, -1 for positive, zero, negative input respectively. sha1 (class=hashing #args=1) SHA1 hash. sha256 (class=hashing #args=1) SHA256 hash. sha512 (class=hashing #args=1) SHA512 hash. sin (class=math #args=1) Trigonometric sine. sinh (class=math #args=1) Hyperbolic sine. sort (class=higher-order-functions #args=1-2) Given a map or array as first argument and string flags or function as optional second argument, returns a sorted copy of the input. With one argument, sorts array elements naturally, and maps naturally by map keys. If the second argument is a string, it can contain any of "f" for lexical (default "n" for natural/numeric), "c" for case-folded lexical, and "r" for reversed/descending sort. If the second argument is a function, then for arrays it should take two arguments a and b, returning < 0, 0, or > 0 as a < b, a == b, or a > b respectively; for maps the function should take four arguments ak, av, bk, and bv, again returning < 0, 0, or > 0, using a and b's keys and values. Examples: Array example: sort([5,2,3,1,4], func(a,b) {return b <=> a}) returns [5,4,3,2,1]. Map example: sort({"c":2,"a":3,"b":1}, func(ak,av,bk,bv) {return bv <=> av}) returns {"a":3,"c":2,"b":1}. splita (class=conversion #args=2) Splits string into array with type inference. Example: splita("3,4,5", ",") = [3,4,5] splitax (class=conversion #args=2) Splits string into array without type inference. Example: splitax("3,4,5", ",") = ["3","4","5"] splitkv (class=conversion #args=3) Splits string by separators into map with type inference. Example: splitkv("a=3,b=4,c=5", "=", ",") = {"a":3,"b":4,"c":5} splitkvx (class=conversion #args=3) Splits string by separators into map without type inference (keys and values are strings). Example: splitkvx("a=3,b=4,c=5", "=", ",") = {"a":"3","b":"4","c":"5"} splitnv (class=conversion #args=2) Splits string by separator into integer-indexed map with type inference. Example: splitnv("a,b,c", ",") = {"1":"a","2":"b","3":"c"} splitnvx (class=conversion #args=2) Splits string by separator into integer-indexed map without type inference (values are strings). Example: splitnvx("3,4,5", ",") = {"1":"3","2":"4","3":"5"} sqrt (class=math #args=1) Square root. ssub (class=string #args=3) Like sub but does no regexing. No characters are special. strftime (class=time #args=2) Formats seconds since the epoch as timestamp.
Format strings are as in the C library (please see "man strftime" on your system), with the Miller-specific addition of "%1S" through "%9S" which format the seconds with 1 through 9 decimal places, respectively. ("%S" uses no decimal places.) See also strftime_local. Examples: strftime(1440768801.7,"%Y-%m-%dT%H:%M:%SZ") = "2015-08-28T13:33:21Z" strftime(1440768801.7,"%Y-%m-%dT%H:%M:%3SZ") = "2015-08-28T13:33:21.700Z" strftime_local (class=time #args=2,3) Like strftime but consults the $TZ environment variable to get local time zone. Examples: strftime_local(1440768801.7, "%Y-%m-%d %H:%M:%S %z") = "2015-08-28 16:33:21 +0300" with TZ="Asia/Istanbul" strftime_local(1440768801.7, "%Y-%m-%d %H:%M:%3S %z") = "2015-08-28 16:33:21.700 +0300" with TZ="Asia/Istanbul" strftime_local(1440768801.7, "%Y-%m-%d %H:%M:%3S %z", "Asia/Istanbul") = "2015-08-28 16:33:21.700 +0300" string (class=conversion #args=1) Convert int/float/bool/string/array/map to string. strip (class=string #args=1) Strip leading and trailing whitespace from string. strlen (class=string #args=1) String length. strptime (class=time #args=2) strptime: Parses timestamp as floating-point seconds since the epoch. See also strptime_local. Examples: strptime("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ") = 1440768801.000000 strptime("2015-08-28T13:33:21.345Z", "%Y-%m-%dT%H:%M:%SZ") = 1440768801.345000 strptime("1970-01-01 00:00:00 -0400", "%Y-%m-%d %H:%M:%S %z") = 14400 strptime("1970-01-01 00:00:00 EET", "%Y-%m-%d %H:%M:%S %Z") = -7200 strptime_local (class=time #args=2,3) Like stpftime but consults the $TZ environment variable to get local time zone. Examples: strptime_local("2015-08-28T13:33:21Z", "%Y-%m-%dT%H:%M:%SZ") = 1440758001 with TZ="Asia/Istanbul" strptime_local("2015-08-28T13:33:21.345Z","%Y-%m-%dT%H:%M:%SZ") = 1440758001.345 with TZ="Asia/Istanbul" strptime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S") = 1440758001 with TZ="Asia/Istanbul" strptime_local("2015-08-28 13:33:21", "%Y-%m-%d %H:%M:%S", "Asia/Istanbul") = 1440758001 sub (class=string #args=3) '$name=sub($name, "old", "new")' (replace once). substr (class=string #args=3) substr is an alias for substr0. See also substr1. Miller is generally 1-up with all array and string indices, but, this is a backward-compatibility issue with Miller 5 and below. Arrays are new in Miller 6; the substr function is older. substr0 (class=string #args=3) substr0(s,m,n) gives substring of s from 0-up position m to n inclusive. Negative indices -len .. -1 alias to 0 .. len-1. See also substr and substr1. substr1 (class=string #args=3) substr1(s,m,n) gives substring of s from 1-up position m to n inclusive. Negative indices -len .. -1 alias to 1 .. len. See also substr and substr0. system (class=system #args=1) Run command string, yielding its stdout minus final carriage return. systime (class=time #args=0) help string will go here systimeint (class=time #args=0) help string will go here tan (class=math #args=1) Trigonometric tangent. tanh (class=math #args=1) Hyperbolic tangent. tolower (class=string #args=1) Convert string to lowercase. toupper (class=string #args=1) Convert string to uppercase. truncate (class=string #args=2) Truncates string first argument to max length of int second argument. typeof (class=typing #args=1) Convert argument to type of argument (e.g. "str"). For debug. unflatten (class=collections #args=2) Reverses flatten. Useful for nested JSON-like structures for non-JSON file formats like CSV. See also arrayify. 
Example: unflatten({"a.b.c" : 4}, ".") is {"a": { "b": { "c": 4 }}}. uptime (class=time #args=0) help string will go here urand (class=math #args=0) Floating-point numbers uniformly distributed on the unit interval. Int-valued example: '$n=floor(20+urand()*11)'. urand32 (class=math #args=0) Integer uniformly distributed between 0 and 2**32-1 inclusive. urandelement (class=math #args=1) Random sample from the first argument, which must be a non-empty array. urandint (class=math #args=2) Integer uniformly distributed between inclusive integer endpoints. urandrange (class=math #args=2) Floating-point numbers uniformly distributed on the interval [a, b). version (class=system #args=0) Returns the Miller version as a string. ! (class=boolean #args=1) Logical negation. != (class=boolean #args=2) String/numeric inequality. Mixing number and string results in string compare. !=~ (class=boolean #args=2) String (left-hand side) does not match regex (right-hand side), e.g. '$name !=~ "^a.*b$"'. % (class=arithmetic #args=2) Remainder; never negative-valued (pythonic). & (class=arithmetic #args=2) Bitwise AND. && (class=boolean #args=2) Logical AND. * (class=arithmetic #args=2) Multiplication, with integer*integer overflow to float. ** (class=arithmetic #args=2) Exponentiation. Same as pow, but as an infix operator. + (class=arithmetic #args=1,2) Addition as binary operator; unary plus operator. - (class=arithmetic #args=1,2) Subtraction as binary operator; unary negation operator. . (class=string #args=2) String concatenation. .* (class=arithmetic #args=2) Multiplication, with integer-to-integer overflow. .+ (class=arithmetic #args=2) Addition, with integer-to-integer overflow. .- (class=arithmetic #args=2) Subtraction, with integer-to-integer overflow. ./ (class=arithmetic #args=2) Integer division; not pythonic. / (class=arithmetic #args=2) Division. Integer / integer is floating-point. // (class=arithmetic #args=2) Pythonic integer division, rounding toward negative. < (class=boolean #args=2) String/numeric less-than. Mixing number and string results in string compare. << (class=arithmetic #args=2) Bitwise left-shift. <= (class=boolean #args=2) String/numeric less-than-or-equals. Mixing number and string results in string compare. <=> (class=boolean #args=2) Comparator, nominally for sorting. Given a <=> b, returns <0, 0, >0 as a < b, a == b, or a > b, respectively. == (class=boolean #args=2) String/numeric equality. Mixing number and string results in string compare. =~ (class=boolean #args=2) String (left-hand side) matches regex (right-hand side), e.g. '$name =~ "^a.*b$"'. > (class=boolean #args=2) String/numeric greater-than. Mixing number and string results in string compare. >= (class=boolean #args=2) String/numeric greater-than-or-equals. Mixing number and string results in string compare. >> (class=arithmetic #args=2) Bitwise signed right-shift. >>> (class=arithmetic #args=2) Bitwise unsigned right-shift. ?: (class=boolean #args=3) Standard ternary operator. ?? (class=boolean #args=2) Absent-coalesce operator. $a ?? 1 evaluates to 1 if $a isn't defined in the current record. ??? (class=boolean #args=2) Absent-empty-coalesce operator. $a ??? 1 evaluates to 1 if $a isn't defined in the current record, or has empty value. ^ (class=arithmetic #args=2) Bitwise XOR. ^^ (class=boolean #args=2) Logical XOR. | (class=arithmetic #args=2) Bitwise OR. || (class=boolean #args=2) Logical OR. ~ (class=arithmetic #args=1) Bitwise NOT. Beware '$y=~$x' since =~ is the regex-match operator: try '$y = ~$x'.
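Illustrative sketch combining several of the operators above (field names assumed): mlr put '$n = ($count ?? 0) + 1; $sign = $x >= 0 ? "nonneg" : "neg"' data.csv uses the absent-coalesce operator to default a missing $count to 0 and the ternary operator to classify $x.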
KEYWORDS FOR PUT AND FILTER
all all: used in "emit1", "emit", "emitp", and "unset" as a synonym for @* begin begin: defines a block of statements to be executed before input records are ingested. The body statements must be wrapped in curly braces. Example: 'begin { @count = 0 }' bool bool: declares a boolean local variable in the current curly-braced scope. Type-checking happens at assignment: 'bool b = 1' is an error. break break: causes execution to continue after the body of the current for/while/do-while loop. call call: used for invoking a user-defined subroutine. Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)' continue continue: causes execution to skip the remaining statements in the body of the current for/while/do-while loop. For-loop increments are still applied. do do: with "while", introduces a do-while loop. The body statements must be wrapped in curly braces. dump dump: prints all currently defined out-of-stream variables immediately to stdout as JSON. With >, >>, or |, the data do not become part of the output record stream but are instead redirected. The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump }' Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump > "mytap.dat"}' Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump >> "mytap.dat"}' Example: mlr --from f.dat put -q '@v[NR]=$*; end { dump | "jq .[]"}' edump edump: prints all currently defined out-of-stream variables immediately to stderr as JSON. Example: mlr --from f.dat put -q '@v[NR]=$*; end { edump }' elif elif: the way Miller spells "else if". The body statements must be wrapped in curly braces. else else: terminates an if/elif/elif chain. The body statements must be wrapped in curly braces. emit1 emit1: inserts an out-of-stream variable into the output record stream. Unlike the other map variants, side-by-sides, indexing, and redirection are not supported, but you can emit any map-valued expression. Example: mlr --from f.dat put 'emit1 $*' Example: mlr --from f.dat put 'emit1 mapsum({"id": NR}, $*)' Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emit emit: inserts an out-of-stream variable into the output record stream. Hashmap indices present in the data but not slotted by emit arguments are not output. With >, >>, or |, the data do not become part of the output record stream but are instead redirected. The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. You can use any of the output-format command-line flags, e.g. --ocsv, --ofs, etc., to control the format of the output if the output is redirected. See also mlr -h. 
Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, $*' Example: mlr --from f.dat put 'emit > "/tmp/data-".$a, mapexcept($*, "a")' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums' Example: mlr --from f.dat put --ojson '@sums[$a][$b]+=$x; emit > "tap-".$a.$b.".dat", @sums' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @sums, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > "mytap.dat", @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit >> "mytap.dat", @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "gzip > mytap.dat.gz", @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emit | "grep somepattern", @*, "index1", "index2"' Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitf emitf: inserts non-indexed out-of-stream variable(s) side-by-side into the output record stream. With >, >>, or |, the data do not become part of the output record stream but are instead redirected. The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. You can use any of the output-format command-line flags, e.g. --ocsv, --ofs, etc., to control the format of the output if the output is redirected. See also mlr -h. Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a' Example: mlr --from f.dat put --oxtab '@a=$i;@b+=$x;@c+=$y; emitf > "tap-".$i.".dat", @a' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > "mytap.dat", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf >> "mytap.dat", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf > stderr, @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern", @a, @b, @c' Example: mlr --from f.dat put '@a=$i;@b+=$x;@c+=$y; emitf | "grep somepattern > mytap.dat", @a, @b, @c' Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. emitp emitp: inserts an out-of-stream variable into the output record stream. Hashmap indices present in the data but not slotted by emitp arguments are output concatenated with ":". With >, >>, or |, the data do not become part of the output record stream but are instead redirected. The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. You can use any of the output-format command-line flags, e.g. --ocsv, --ofs, etc., to control the format of the output if the output is redirected. See also mlr -h. 
Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums' Example: mlr --from f.dat put --opprint '@sums[$a][$b]+=$x; emitp > "tap-".$a.$b.".dat", @sums' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @sums, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > "mytap.dat", @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp >> "mytap.dat", @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "gzip > mytap.dat.gz", @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp > stderr, @*, "index1", "index2"' Example: mlr --from f.dat put '@sums[$a][$b]+=$x; emitp | "grep somepattern", @*, "index1", "index2"' Please see https://miller.readthedocs.io://johnkerl.org/miller/doc for more information. end end: defines a block of statements to be executed after input records are ingested. The body statements must be wrapped in curly braces. Example: 'end { emit @count }' Example: 'end { eprint "Final count is " . @count }' eprint eprint: prints expression immediately to stderr. Example: mlr --from f.dat put -q 'eprint "The sum of x and y is ".($x+$y)' Example: mlr --from f.dat put -q 'for (k, v in $*) { eprint k . " => " . v }' Example: mlr --from f.dat put '(NR % 1000 == 0) { eprint "Checkpoint ".NR}' eprintn eprintn: prints expression immediately to stderr, without trailing newline. Example: mlr --from f.dat put -q 'eprintn "The sum of x and y is ".($x+$y); eprint ""' false false: the boolean literal value. filter filter: includes/excludes the record in the output record stream. Example: mlr --from f.dat put 'filter (NR == 2 || $x > 5.4)' Instead of put with 'filter false' you can simply use put -q. The following uses the input record to accumulate data but only prints the running sum without printing the input record: Example: mlr --from f.dat put -q '@running_sum += $x * $y; emit @running_sum' float float: declares a floating-point local variable in the current curly-braced scope. Type-checking happens at assignment: 'float x = 0' is an error. for for: defines a for-loop using one of three styles. The body statements must be wrapped in curly braces. For-loop over stream record: Example: 'for (k, v in $*) { ... }' For-loop over out-of-stream variables: Example: 'for (k, v in @counts) { ... }' Example: 'for ((k1, k2), v in @counts) { ... }' Example: 'for ((k1, k2, k3), v in @*) { ... }' C-style for-loop: Example: 'for (var i = 0, var b = 1; i < 10; i += 1, b *= 2) { ... }' func func: used for defining a user-defined function. Example: 'func f(a,b) { return sqrt(a**2+b**2)} $d = f($x, $y)' funct funct: used for saying that a function argument is a user-defined function. Example: 'func g(num a, num b, funct f) :num { return f(a**2+b**2) }' if if: starts an if/elif/elif chain. The body statements must be wrapped in curly braces. in in: used in for-loops over stream records or out-of-stream variables. int int: declares an integer local variable in the current curly-braced scope. Type-checking happens at assignment: 'int x = 0.0' is an error. map map: declares an map-valued local variable in the current curly-braced scope. Type-checking happens at assignment: 'map b = 0' is an error. map b = {} is always OK. map b = a is OK or not depending on whether a is a map. num num: declares an int/float local variable in the current curly-braced scope. Type-checking happens at assignment: 'num b = true' is an error. 
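Illustrative sketch of a typed local (field names assumed): mlr put 'num t = $quantity * $rate; $total = fmtnum(t, "%.2f")' data.csv declares t as int/float, so a non-numeric assignment is flagged at that statement rather than propagating silently.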
print print: prints expression immediately to stdout. Example: mlr --from f.dat put -q 'print "The sum of x and y is ".($x+$y)' Example: mlr --from f.dat put -q 'for (k, v in $*) { print k . " => " . v }' Example: mlr --from f.dat put '(NR % 1000 == 0) { print > stderr, "Checkpoint ".NR}' printn printn: prints expression immediately to stdout, without trailing newline. Example: mlr --from f.dat put -q 'printn "."; end { print "" }' return return: specifies the return value from a user-defined function. Omitted return statements (including via if-branches) result in an absent-null return value, which in turns results in a skipped assignment to an LHS. stderr stderr: Used for tee, emit, emitf, emitp, print, and dump in place of filename to print to standard error. stdout stdout: Used for tee, emit, emitf, emitp, print, and dump in place of filename to print to standard output. str str: declares a string local variable in the current curly-braced scope. Type-checking happens at assignment. subr subr: used for defining a subroutine. Example: 'subr s(k,v) { print k . " is " . v} call s("a", $a)' tee tee: prints the current record to specified file. This is an immediate print to the specified file (except for pprint format which of course waits until the end of the input stream to format all output). The > and >> are for write and append, as in the shell, but (as with awk) the file-overwrite for > is on first write, not per record. The | is for piping to a process which will process the data. There will be one open file for each distinct file name (for > and >>) or one subordinate process for each distinct value of the piped-to command (for |). Output-formatting flags are taken from the main command line. You can use any of the output-format command-line flags, e.g. --ocsv, --ofs, etc., to control the format of the output. See also mlr -h. emit with redirect and tee with redirect are identical, except tee can only output $*. Example: mlr --from f.dat put 'tee > "/tmp/data-".$a, $*' Example: mlr --from f.dat put 'tee >> "/tmp/data-".$a.$b, $*' Example: mlr --from f.dat put 'tee > stderr, $*' Example: mlr --from f.dat put -q 'tee | "tr \[a-z\\] \[A-Z\\]", $*' Example: mlr --from f.dat put -q 'tee | "tr \[a-z\\] \[A-Z\\] > /tmp/data-".$a, $*' Example: mlr --from f.dat put -q 'tee | "gzip > /tmp/data-".$a.".gz", $*' Example: mlr --from f.dat put -q --ojson 'tee | "gzip > /tmp/data-".$a.".gz", $*' true true: the boolean literal value. unset unset: clears field(s) from the current record, or an out-of-stream or local variable. Example: mlr --from f.dat put 'unset $x' Example: mlr --from f.dat put 'unset $*' Example: mlr --from f.dat put 'for (k, v in $*) { if (k =~ "a.*") { unset $[k] } }' Example: mlr --from f.dat put '...; unset @sums' Example: mlr --from f.dat put '...; unset @sums["green"]' Example: mlr --from f.dat put '...; unset @*' var var: declares an untyped local variable in the current curly-braced scope. Examples: 'var a=1', 'var xyz=""' while while: introduces a while loop, or with "do", introduces a do-while loop. The body statements must be wrapped in curly braces. ENV ENV: access to environment variables by name, e.g. '$home = ENV["HOME"]' FILENAME FILENAME: evaluates to the name of the current file being processed. FILENUM FILENUM: evaluates to the number of the current file being processed, starting with 1. FNR FNR: evaluates to the number of the current record within the current file being processed, starting with 1. Resets at the start of each file. 
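Illustrative sketch (file names assumed): mlr put '$source = FILENAME . ":" . FNR' file1.dkvp file2.dkvp tags each record with the file it came from and its record number within that file.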
IFS IFS: evaluates to the input field separator from the command line. IPS IPS: evaluates to the input pair separator from the command line. IRS IRS: evaluates to the input record separator from the command line, or to LF or CRLF from the input data if in autodetect mode (which is the default). M_E M_E: the mathematical constant e. M_PI M_PI: the mathematical constant pi. NF NF: evaluates to the number of fields in the current record. NR NR: evaluates to the number of the current record over all files being processed, starting with 1. Does not reset at the start of each file. OFS OFS: evaluates to the output field separator from the command line. OPS OPS: evaluates to the output pair separator from the command line. ORS ORS: evaluates to the output record separator from the command line, or to LF or CRLF from the input data if in autodetect mode (which is the default).
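Illustrative sketch using these built-in variables (input format assumed): mlr filter 'NF >= 3 && NR <= 100' data.dkvp keeps only records among the first hundred read that have at least three fields.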
AUTHOR
Miller is written by John Kerl <kerl.john.r@gmail.com>. This manual page has been composed from Miller's help output by Eric MSP Veith <eveith@veith-m.de>.
SEE ALSO
awk(1), sed(1), cut(1), join(1), sort(1), RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files, the Miller docsite https://miller.readthedocs.io 2022-01-10 MILLER(1)