Provided by: unicode_0.9.7_all
NAME
unicode - command line Unicode database query tool
SYNOPSIS
unicode [options] string
DESCRIPTION
This manual page documents the unicode command. unicode is a command line Unicode database query tool.
OPTIONS
-h --help
    Show help and exit.

-x --hexadecimal
    Assume string to be a hexadecimal number.

-d --decimal
    Assume string to be a decimal number.

-o --octal
    Assume string to be an octal number.

-b --binary
    Assume string to be a binary number.

-r --regexp
    Assume string to be a regular expression.

-s --string
    Assume string to be a sequence of characters.

-a --auto
    Try to guess type of string from one of the above (default).

-mMAXCOUNT --max=MAXCOUNT
    Maximum number of codepoints to display (default: 20); use 0 for unlimited.

-iCHARSET --io=IOCHARSET
    I/O character set. For maximal pleasure, run unicode on a UTF-8 capable terminal and specify IOCHARSET to be UTF-8. unicode tries to guess this value from your locale, so with a properly set up locale, you should not need to specify it.

--fcp=CHARSET --fromcp=CHARSET
    Convert numerical arguments from this encoding (default: no conversion). Multibyte encodings are supported. This is ignored for non-numerical arguments.

-cADDCHARSET --charset-add=ADDCHARSET
    Show hexadecimal representation of displayed characters in this additional charset.

-CUSE_COLOUR --colour=USE_COLOUR
    USE_COLOUR is one of on, off, auto. --colour=on will use ANSI colour codes to colourise the output; --colour=off won't use colours; --colour=auto will test if standard output is a tty, and use colours only when it is. --color is a synonym of --colour.

-v --verbose
    Be more verbose about displayed characters, e.g. display Unihan information, if available.

-w --wikipedia
    Spawn browser pointing to Wikipedia entry about the character.

--list
    List (approximately) all known encodings.
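The numeric options (-x, -d, -o, -b) differ only in the base used to read the string as a codepoint number. The sketch below illustrates that idea with Python's standard unicodedata module; it is not the tool's actual implementation, and the names BASES and describe are hypothetical, chosen only for this example.

```python
import unicodedata

# Hypothetical mapping from option letter to numeric base (illustration only).
BASES = {"x": 16, "d": 10, "o": 8, "b": 2}


def describe(string, option):
    """Read `string` as a number in the base implied by `option`
    and return the codepoint together with its Unicode name."""
    codepoint = int(string, BASES[option])
    char = chr(codepoint)
    # unicodedata.name() accepts a default for unnamed codepoints.
    return "U+%04X %s" % (codepoint, unicodedata.name(char, "<no name>"))


# The same character, written in four different bases.
print(describe("e1", "x"))
print(describe("225", "d"))
print(describe("341", "o"))
print(describe("11100001", "b"))
```

All four calls refer to the same codepoint, U+00E1, which is why the -a (auto) mode has to guess which reading was intended when the string is ambiguous.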
USAGE
unicode tries to guess the type of an argument. In particular, if an argument looks like a valid hexadecimal representation of a Unicode codepoint, it will be considered to be such. Using

    unicode face

will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE, and it will not search for 'face' in character descriptions. For the latter, use:

    unicode -r face

For example, you can use any of the following to display information about U+00E1 LATIN SMALL LETTER A WITH ACUTE (á):

    unicode 00E1
    unicode U+00E1
    unicode á
    unicode 'latin small letter a with acute'

You can specify a range of characters as arguments; unicode will show these characters in a nice tabular format, aligned to 256-codepoint boundaries. Use two dots ".." to indicate the range, e.g.

    unicode 0450..0520

will display the whole Cyrillic and Hebrew blocks (characters from U+0400 to U+05FF), and

    unicode 0400..

will display just characters from U+0400 up to U+04FF.

Use --fromcp to query codepoints from other encodings:

    unicode --fromcp cp1250 -d 200

Multibyte encodings are supported:

    unicode --fromcp big5 -x aff3

and multi-char strings are supported, too:

    unicode --fromcp utf-8 -x c599c3adc5a5
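Two of the behaviours described above can be reproduced in a few lines of Python: rounding a range outwards to multiples of 256, and turning a number given with --fromcp into characters. This is a minimal sketch under stated assumptions, not the tool's own code; align_range and from_codepage are hypothetical helper names, and the conversion assumes the number has no leading zero bytes.

```python
def align_range(start, end=None):
    """Round a codepoint range outwards to 256-codepoint boundaries.
    An open range such as 0400.. is treated here as a single row."""
    if end is None:
        end = start
    return start & ~0xFF, end | 0xFF


def from_codepage(number, charset):
    """Interpret `number` as big-endian bytes in `charset` and decode them.
    Assumption: leading zero bytes are not significant."""
    length = max(1, (number.bit_length() + 7) // 8)
    raw = number.to_bytes(length, "big")
    return raw.decode(charset)


# unicode 0450..0520 covers U+0400 up to U+05FF
low, high = align_range(0x0450, 0x0520)
print("U+%04X..U+%04X" % (low, high))

# unicode 0400.. covers U+0400 up to U+04FF
low, high = align_range(0x0400)
print("U+%04X..U+%04X" % (low, high))

# unicode --fromcp cp1250 -d 200
print(["U+%04X" % ord(c) for c in from_codepage(200, "cp1250")])

# unicode --fromcp utf-8 -x c599c3adc5a5
print(["U+%04X" % ord(c) for c in from_codepage(0xC599C3ADC5A5, "utf-8")])
```

The codepoints are printed as U+XXXX strings rather than as raw characters so that the example does not depend on the terminal's character set.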
BUGS
Tabular format does not deal well with full-width, combining, control and RTL characters.
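The character classes named here are the ones whose display width or direction differs from one column per character, which is what a fixed-width table assumes. As an illustration only, the following Python sketch looks up the relevant properties with the standard unicodedata module; the sample characters and the helper name layout_properties are choices made for this example.

```python
import unicodedata

# One sample character per problem class (illustrative choices).
SAMPLES = {
    "full-width": "\u4e2d",  # CJK UNIFIED IDEOGRAPH-4E2D
    "combining": "\u0301",   # COMBINING ACUTE ACCENT
    "control": "\u0007",     # BEL control character
    "RTL": "\u05d0",         # HEBREW LETTER ALEF
}


def layout_properties(char):
    """Return the properties that affect how a character fits in a table cell:
    East Asian width, combining class, general category, bidi class."""
    return (
        unicodedata.east_asian_width(char),
        unicodedata.combining(char),
        unicodedata.category(char),
        unicodedata.bidirectional(char),
    )


for label, char in SAMPLES.items():
    print("%-10s U+%04X %r" % (label, ord(char), layout_properties(char)))
```

A wide character occupies two terminal columns, a combining mark occupies none, a control character may not be printable at all, and a right-to-left character can reorder the surrounding cells.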
SEE ALSO
ascii(1)
AUTHOR
Radovan Garabík <garabik @ kassiopeia.juls.savba.sk>

2003-01-31 UNICODE(1)