lunar (1) clevercsv-detect.1.gz

Provided by: python3-clevercsv_0.7.5+ds-1build1_amd64

NAME

       clevercsv-detect - Detect the dialect of a CSV file

SYNOPSIS

       clevercsv detect [-c | --consistency] [-e ENCODING | --encoding=ENCODING]
                        [-n NUM_CHARS | --num-chars=NUM_CHARS] [-p | --plain |
                        -j | --json] [--no-skip] [--add-runtime] <path>

DESCRIPTION

       Detect the dialect of a CSV file.

OPTIONS

       -h, --help
           Show this help message and exit.

       -c, --consistency
           By default, the dialect of a CSV file is detected using a two-step process. First, a
           strict set of checks determines whether the file adheres to a very basic format (for
           example, when all cells in the file are integers). If none of these checks succeeds,
           the data consistency measure of Van den Burg et al. (2019) is used to detect the
           dialect. With this option, you can force detection to always use the data consistency
           measure, which can be useful for testing or research purposes.
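           For reference, the data consistency measure can be summarized as follows. This is a
           paraphrase of Van den Burg et al. (2019), not of the CleverCSV source: K is the number
           of distinct row patterns obtained when the file x is parsed with dialect θ, N_k is the
           number of rows with pattern k, L_k is the number of cells in pattern k, and T(x, θ) is
           the fraction of cells whose data type is recognized (so 0 ≤ T ≤ 1). Consult the paper
           for the exact definitions, including the small constant used when a pattern has a
           single cell.

```latex
P(x, \theta) = \frac{1}{K} \sum_{k=1}^{K} N_k \, \frac{L_k - 1}{L_k}, \qquad
Q(x, \theta) = P(x, \theta) \, T(x, \theta), \qquad
\hat{\theta} = \operatorname*{arg\,max}_{\theta} \, Q(x, \theta)
```

           Because T never exceeds 1, Q never exceeds P; the --no-skip option described below
           concerns a shortcut that relies on this bound.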

       -e, --encoding
           The file encoding of the given CSV file is automatically detected using chardet. While
           chardet is usually accurate, it is not perfect. In the rare cases that it detects the
           wrong encoding, you can override it by providing the encoding through this flag.
           Moreover, when you have a number of CSV files with a known file encoding, you can use
           this option to skip encoding detection and speed up the detection process.
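           The effect of this option can be illustrated with a small Python sketch. The helper
           read_text is hypothetical and not part of CleverCSV's API; it only mirrors the idea
           that a known encoding lets the file be decoded directly, whereas an unknown one must
           first be guessed. Here the guess uses chardet.detect when chardet is installed and
           falls back to UTF-8 otherwise, which is an assumption of the sketch.

```python
def read_text(path, encoding=None):
    """Return the contents of `path` as text (hypothetical helper).

    With an explicit encoding the file is decoded directly; without one,
    the encoding is guessed first, which costs an extra pass over the bytes.
    """
    if encoding is not None:
        # Known encoding: no detection step is needed.
        with open(path, "r", encoding=encoding, newline="") as fh:
            return fh.read()
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        import chardet  # third-party; optional in this sketch
        # chardet.detect returns a dict whose "encoding" may be None.
        encoding = chardet.detect(raw)["encoding"] or "utf-8"
    except ImportError:
        encoding = "utf-8"  # assumed fallback when chardet is unavailable
    return raw.decode(encoding)
```

           For example, read_text("data.csv", encoding="latin-1") decodes the file without any
           guessing, which corresponds to passing --encoding=latin-1.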

       -n, --num-chars
           On large CSV files, dialect detection can be slow because of the large number of
           possible dialects to consider. To alleviate this, you can limit the number of
           characters used for detection.

           Keep in mind that CleverCSV may need to read a certain number of characters to infer
           the dialect correctly. For example, for the imdb.csv file in the CleverCSV GitHub
           repository, the correct dialect can only be found after at least 66 lines of the file
           have been read. Therefore, if it is feasible to run CleverCSV on the entire file,
           doing so is generally recommended.
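           A minimal Python sketch of what limiting the sample means, assuming the sample is
           simply the first NUM_CHARS characters of the decoded file. The helpers read_sample and
           lines_covered are hypothetical, not CleverCSV functions. Counting the lines a sample
           covers gives a rough idea of whether it is large enough, as in the imdb.csv example
           above.

```python
def read_sample(path, num_chars=None, encoding="utf-8"):
    """Return the first `num_chars` characters of the file (all of it if None)."""
    with open(path, "r", encoding=encoding, newline="") as fh:
        # read(-1) reads until the end of the file.
        return fh.read(-1 if num_chars is None else num_chars)


def lines_covered(sample):
    """Count complete lines in the sample (assumes LF or CRLF line endings)."""
    return sample.count("\n")
```

           If lines_covered(read_sample(path, n)) is small compared to the number of rows needed
           to disambiguate the dialect, a larger value of n (or no limit) is the safer choice.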

       -p, --plain
           Print the components of the dialect on separate lines.

       -j, --json
           Print the dialect to standard output in the form of a JSON object. This object will
           always have the 'delimiter', 'quotechar', 'escapechar', and 'strict' keys. If
           --add-runtime is specified, it will also have a 'runtime' key.
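           As an illustration, the JSON object can be turned into a reader from Python's
           standard csv module. This is a sketch under assumptions: it expects a single-character
           delimiter, treats an empty string or null 'quotechar' or 'escapechar' as "not used",
           and ignores the optional 'runtime' key. The helper reader_from_detect_json is
           hypothetical and not part of CleverCSV.

```python
import csv
import io
import json

# Keys that the JSON output is documented to always contain.
REQUIRED_KEYS = {"delimiter", "quotechar", "escapechar", "strict"}


def reader_from_detect_json(json_text, data):
    """Return a csv.reader over `data` configured from the detected dialect."""
    dialect = json.loads(json_text)
    missing = REQUIRED_KEYS - set(dialect)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Assumption: an empty string or null means the character is not used.
    quotechar = dialect["quotechar"] or None
    escapechar = dialect["escapechar"] or None
    return csv.reader(
        io.StringIO(data, newline=""),
        delimiter=dialect["delimiter"],
        quotechar=quotechar,
        escapechar=escapechar,
        # The csv module requires QUOTE_NONE when there is no quote character.
        quoting=csv.QUOTE_MINIMAL if quotechar else csv.QUOTE_NONE,
        strict=bool(dialect["strict"]),
    )
```

           In practice, json_text would be the captured standard output of
           clevercsv detect --json <path>.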

       --no-skip
           The data consistency score used for dialect detection consists of two components: a
           pattern score and a type score. The type score lies between 0 and 1, and the
           consistency score is the product of the two, so it can never exceed the pattern score.
           When computing the data consistency scores for different dialects, we therefore skip
           the computation of the type score if the pattern score is lower than the best data
           consistency score seen so far, since such a dialect cannot win. This option disables
           this behaviour and computes the type score for all dialects, which is mainly useful
           for debugging and testing purposes.
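           The skipping rule can be illustrated with a deliberately simplified Python sketch. It
           is not CleverCSV's implementation: it varies only the delimiter, splits rows naively
           without any quote or escape handling (so a row's pattern reduces to its number of
           cells), and recognizes only empty cells, integers, and decimal numbers as known types.
           It assumes, following Van den Burg et al. (2019), that the consistency score is the
           product of the pattern and type scores. All function names are hypothetical, and
           skip=False plays the role of --no-skip.

```python
import re
from collections import Counter

# Toy type detector: an empty cell or a (signed) integer/decimal number.
_KNOWN = re.compile(r"(?:[+-]?\d+(?:\.\d+)?)?")


def parse(data, delimiter):
    """Naive parsing: no quotes or escapes, unlike a real CSV dialect."""
    return [row.split(delimiter) for row in data.splitlines()]


def pattern_score(rows, eps=1e-3):
    """P = (1/K) * sum_k N_k * (L_k - 1) / L_k over the K distinct row lengths."""
    counts = Counter(len(row) for row in rows)
    if not counts:
        return 0.0
    total = sum(n * max(eps, length - 1) / length for length, n in counts.items())
    return total / len(counts)


def type_score(rows):
    """T = fraction of cells whose type the toy detector recognizes."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 0.0
    return sum(bool(_KNOWN.fullmatch(cell)) for cell in cells) / len(cells)


def detect_delimiter(data, candidates=",;|\t", skip=True):
    """Return (best delimiter, number of type scores computed).

    Ties keep the earlier candidate; CleverCSV has its own tie-breaking.
    """
    best, best_q, type_evals = None, -1.0, 0
    for delim in candidates:
        rows = parse(data, delim)
        p = pattern_score(rows)
        # Q = P * T with T <= 1, so Q <= P: a dialect whose P is below the
        # best Q so far cannot win, and its type score is not needed.
        if skip and p < best_q:
            continue
        type_evals += 1
        q = p * type_score(rows)
        if q > best_q:
            best, best_q = delim, q
    return best, type_evals
```

           For example, detect_delimiter("x;1;2\ny;3;4\n") returns (";", 2), because the pipe
           and tab candidates are skipped, while passing skip=False returns (";", 4) with the
           same winner.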

       --add-runtime
           Add the runtime of the detection to the output.

       <path>
           Path to the CSV file.

CLEVERCSV

       Part of the CleverCSV suite