Provided by: ccextractor_0.94+ds1-1build1_amd64


       CCExtractor - closed captions extractor


       ccextractor [options] inputfile1 [inputfile2...] [-o outputfilename] [-o1 outputfilename1]
       [-o2 outputfilename2]


       Extracts closed captions and teletext subtitles from video streams.   DVB,  .TS,  ReplayTV
       4000  and  5000,  dvr-ms, bttv, Tivo, Dish Network, .mp4, HDHomeRun are known to work.  It
       can do two things:

       - Save the data to a "raw", unprocessed file which you can later use
         as input for other tools.

       - Generate a subtitle file (.srt, .smi, or .txt) which you can directly
         use with your favourite player.


   File name related options

       inputfile: file(s) to process

       -o outputfilename

              Use -o parameters to define output filename if you  don't  like  the  default  ones
              (same as infile plus _1 or _2 when needed and file extension, e.g. .srt).

       -cf filename

              Write 'clean' data to a file. Clean means the ES without TS or PES headers.


       -stdout

              Write output to stdout (console) instead of file. If stdout is used, then -o, -o1
              and -o2 can't be used. Also -stdout will redirect all messages to stderr (error).


              Dump the PES Header to stdout (console). This is used for debugging purposes to see
              the contents of each PES packet header.


              Write the DVB subtitle debug traces to console.


              Ignore PTS jumps (default).


              Fix PTS jumps. Use this parameter if you experience timeline resets/jumps in the


              Reads input from stdin (console) instead of file.

       You can pass as many input files as you need. They will be processed in order.  If a  file
       name  is  suffixed by +, ccextractor will try to follow a numerical sequence. For example,
       DVD001.VOB+ means DVD001.VOB, DVD002.VOB and so on until there are no more files.   Output
       will  be  one  single  file  (either  raw  or srt). Use this if you made your recording in
       several cuts (to skip commercials for  example)  but  you  want  one  subtitle  file  with
       contiguous timing.
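
       The sequence-following behaviour can be pictured with a short Python sketch. This is
       only an illustration, not CCExtractor's code: it assumes the number to increment is
       the run of digits at the end of the base name and that zero padding is preserved. The
       helper names next_in_sequence and expand_sequence are hypothetical.

```python
import os
import re


def next_in_sequence(name):
    """Return the next file name in a numbered sequence, or None if there is no number.

    Assumption: the counter is the run of digits that ends the base name
    (before the extension), and its zero padding is kept.
    """
    stem, ext = os.path.splitext(name)
    match = re.search(r"(\d+)$", stem)
    if match is None:
        return None
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return stem[:match.start()] + bumped + ext


def expand_sequence(first, exists=os.path.exists):
    """List `first` and its successors for as long as each file exists."""
    files = []
    current = first
    while current is not None and exists(current):
        files.append(current)
        current = next_in_sequence(current)
    return files
```

       With this sketch, expand_sequence("DVD001.VOB") would list DVD001.VOB, DVD002.VOB
       and so on, stopping at the first name that is not found on disk.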

   Output file segmentation

       -outinterval x

              Output in intervals of x seconds.

       --segmentonkeyonly -key

              When segmenting files, do it only after an I-frame, trying to behave like FFmpeg.

   Network support

       -udp port

              Read the input via UDP (listening on the specified port) instead of reading a file.

       -udp [host:]port

              Read the input via UDP (listening on the specified port) instead of reading a file.
              Host can be a hostname or IPv4 address. If host is not specified  then  listens  on
              the local host.

       -udp [src@host:]port

              Read the input via UDP (listening on the specified port) instead of reading a file.
              Host and src can be a hostname or IPv4 address.  If  host  is  not  specified  then
              listens on the local host.

       -sendto host[:port]

              Sends data in BIN format to the server according to CCExtractor's protocol over
              TCP. For IPv6 use [address]:port
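
              The reason IPv6 needs brackets is that the address itself contains colons. A
              minimal Python sketch of splitting such a target follows; split_host_port is a
              hypothetical helper, and returning None when no port is given is an assumption
              (this page does not state a default port).

```python
def split_host_port(target):
    """Split 'host', 'host:port' or '[ipv6]:port' into (host, port or None)."""
    if target.startswith("["):
        # Bracketed IPv6 literal: everything up to ']' is the address.
        end = target.index("]")
        host = target[1:end]
        rest = target[end + 1:]
        port = rest[1:] if rest.startswith(":") else None
    elif ":" in target:
        # Hostname or IPv4 address followed by a port.
        host, port = target.rsplit(":", 1)
    else:
        host, port = target, None
    return host, (int(port) if port else None)
```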

       -tcp port

              Reads the input data in BIN format according to CCExtractor's protocol, listening
              on the specified port on the local host.

       -tcppassword password

              Sets the server password for new connections to the TCP server.

       -tcpdesc description

              Sends the server a short description of the captions, e.g. channel name or file name.

   Options that affect what will be processed

       -1, -2, -12

              Output Field 1 data, Field 2 data, or both (DEFAULT is -1)


              Prevent overwriting of existing files. The output will be appended instead.


              When in srt/sami mode, process captions in channel 2 instead of channel 1.

       -svc --service N1[cs1],N2[cs2]...

              Enable CEA-708 (DTVCC) captions processing for the listed services. The parameter
              is a comma-delimited list of service numbers, such as "1,2" to process the primary
              and secondary language services. Pass "all" to process all services found.

              If captions in a service are stored in 16-bit encoding, you can specify what
              charset or encoding was used. Pass its name after the service number (e.g.
              "1[EUC-KR],3" or "all[EUC-KR]") and the text will be converted from the specified
              charset to UTF-8 using iconv. See the iconv documentation to check whether the
              required encoding/charset is supported.

              In general, if you want English subtitles you don't need to use  these  options  as
              they  are broadcast in field 1, channel 1. If you want the second language (usually
              Spanish) you may need to try -2, or -cc2, or both.
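
              To make the service list syntax concrete, here is a small Python sketch that
              parses values such as "1[EUC-KR],3" or "all[EUC-KR]". It only illustrates the
              syntax described above; parse_services is a hypothetical name and the sketch
              says nothing about how CCExtractor itself parses the argument.

```python
import re


def parse_services(spec):
    """Parse a service list into (service, charset or None) pairs.

    `service` is an int, or the string "all".
    """
    result = []
    for item in spec.split(","):
        match = re.fullmatch(r"(all|\d+)(?:\[([^\]]+)\])?", item.strip())
        if match is None:
            raise ValueError(f"bad service entry: {item!r}")
        service = match.group(1)
        charset = match.group(2)  # None when no [charset] suffix was given
        result.append((service if service == "all" else int(service), charset))
    return result
```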

   Input formats

       With the exception of McPoodle's raw format, which is just the closed caption data with no
       other info, CCExtractor can usually detect the input format correctly. To force a specific


       where format is one of these:

              ts   -> For Transport Streams.

              ps   -> For Program Streams.

              es   -> For Elementary Streams.

              asf  -> ASF container (such as DVR-MS).

              wtv  -> Windows Television (WTV)

              bin  -> CCExtractor's own binary format.

              raw  -> For McPoodle's raw files.

              mp4  -> MP4/MOV/M4V and similar.

              mkv  -> Matroska container and WebM.

       -ts, -ps, -es, -mp4, -wtv and -asf (or --dvr-ms) can be used as shortcuts.

   Output formats


       where format is one of these:

              srt     -> SubRip (default, so not actually needed).

              ass/ssa -> SubStation Alpha.

              webvtt  -> WebVTT format

              webvtt-full -> WebVTT format with styling

              sami    -> MS Synchronized Accessible Media Interchange.

              bin     -> CC data in CCExtractor's own binary format.

              raw     -> CC data in McPoodle's Broadcast format.

              dvdraw  -> CC data in McPoodle's DVD format.

              txt     ->  Transcript  (no  time  codes,  no  roll-up  captions,  just  the  plain

              ttxt    -> Timed Transcript (transcription with time info)

              smptett -> SMPTE Timed Text (W3C TTML) format.

              spupng   -> Set of .xml and .png files for use with dvdauthor's spumux.  See "Notes
              on spupng output format"

              null -> Don't produce any file output

              report -> Prints to stdout information about captions  in  specified  input.  Don't
              produce any file output

   Options that affect how input files will be processed

       -gt --goptime

              Use  GOP  for  timing  instead  of  PTS.  This only applies to Program or Transport
              Streams with MPEG2 data and overrides the default PTS timing. GOP timing is  always
              used for Elementary Streams.

       -nogt --nogoptime

              Never  use  GOP  timing  (use  PTS),  even if ccextractor detects GOP timing is the
              reasonable choice.

       -fp --fixpadding

              Fix padding - some cards (or providers, or  whatever)  seem  to  send  0000  as  CC
              padding instead of 8080. If you get bad timing, this might solve it.


              Use  90090  (instead  of  90000) as MPEG clock frequency. (reported to be needed at
              least by Panasonic DMR-ES15 DVD Recorder)
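
              The effect of the clock frequency on timing can be seen with a little
              arithmetic. The Python sketch below converts a timestamp expressed in clock
              ticks to milliseconds; ticks_to_ms is a hypothetical helper, and how
              CCExtractor applies the frequency internally is not described on this page.

```python
def ticks_to_ms(ticks, clock_hz=90000):
    """Convert a timestamp in clock ticks to whole milliseconds."""
    return ticks * 1000 // clock_hz
```

              A device that really counts 90090 ticks per second, but is decoded as if it
              counted 90000, ends up about one millisecond late for every second of video
              (ticks_to_ms(90090) gives 1001 instead of 1000), which accumulates to several
              seconds over an hour.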

       -ve --videoedited

              By default, ccextractor will process input files in sequence as if they were all
              one large file (i.e. split by a generic, non video-aware tool). If you are
              processing video that was split with an editing tool, use -ve so ccextractor
              doesn't try to rebuild the original timing.

       -s --stream [secs]

              Consider  the  file as a continuous stream that is growing as ccextractor processes
              it, so don't try to figure  out  its  size  and  don't  terminate  processing  when
              reaching  the  current  end  (i.e.  wait  for more data to arrive). If the optional
              parameter secs is present, it means the number of  seconds  without  any  new  data
              after  which  ccextractor  should exit. Use this parameter if you want to process a
              live stream but not kill ccextractor externally.  Note: If -s is used then only one
              input file is allowed.

       -poc  --usepicorder

              Use  the  pic_order_cnt_lsb  in AVC/H.264 data streams to order the CC information.
              The default way is to use the PTS information. Use this switch only when needed.


              Force MythTV code branch.


              Disable MythTV code branch.  The MythTV branch is needed for analog captures  where
              the  closed  caption  data  is  stored  in  the  VBI, such as those with bttv cards
              (Hauppauge 250 for example). This is detected automatically so you don't need to
              worry about this unless autodetection doesn't work for you.


              This switch works around a bug in Windows 7's built in software to convert *.wtv to
              *.dvr-ms. For analog NTSC recordings  the  CC  information  is  marked  as  digital
              captions. Use this switch only when needed.


              Read  the  captions  from the MPEG2 video stream rather than the captions stream in
              WTV files

       -pn --program-number

              In TS mode, specifically select a program to process.  Not needed if  the  TS  only
              has  one.  If this parameter is not specified and CCExtractor detects more than one
              program in the input, it will list the programs found and terminate  without  doing
              anything, unless -autoprogram (see below) is used.


       -autoprogram

              If there's more than one program in the stream, just use the first one we find that
              contains a suitable stream.


              Don't try to find out the stream for  caption/teletext  data,  just  use  this  one


              Instead  of selecting the stream by its PID, select it by its type (pick the stream
              that has this type in the PMT)


              Assume the data is of this type, don't autodetect. This parameter may be needed  if
              -datapid or -datastreamtype is used and CCExtractor cannot determine how to process
              the stream. The value will usually be 2 (MPEG video) or 6 (MPEG private data).

       -haup --hauppauge

              If the video was recorded using a Hauppauge card, it might need special processing.
              This parameter will force the special treatment.


              In  MP4  files  the  closed caption data can be embedded in the video track or in a
              dedicated CC track. If a dedicated track is detected it will be  processed  instead
              of  the  video  track. If you need to force the video track to be processed instead
              use this option.


              Some streams come with broadcast date information. When  such  data  is  available,
              CCExtractor will set its time reference to the received data. Use this parameter if
              you prefer your own reference. Note: Currently this only affects Teletext in timed
              transcript with -datets.


              Ignore SCTE-20 data if present.


              Create a separate file for CSS instead of inline.


              Enable  debug  so  the  calculated  distance for each two strings is displayed. The
              output  includes  both  strings,  the  calculated  distance,  the  maximum  allowed
              distance, and whether the strings are ultimately considered equivalent or not, i.e.
              whether the calculated distance is less than or equal to the maximum allowed.

       -anvid --analyzevideo

              Analyze the video stream even if it's not used for subtitles. This  allows  one  to
              provide video information.

   Levenshtein distance

              When processing teletext files, CCExtractor tries to correct typos by comparing
              consecutive lines. If line N+1 is almost identical to line N except for minor
              changes (plus new characters), it assumes that line N contained a typo that was
              corrected in N+1. This is currently implemented only for teletext because that is
              where sample files that could benefit from it were available. You can adjust, or
              disable, the algorithm settings with the following parameters.


              Don't attempt to correct typos with Levenshtein distance.

       -levdistmincnt value

              Minimum distance we always allow regardless of the length of the strings. Default:
              2. This means that if the calculated distance is 0, 1 or 2, we consider the strings
              to be equivalent.

       -levdistmaxpct value

              Maximum distance we allow, as a percentage of the shortest string length. Default:
              10%. For example, consider a comparison of one string of 30 characters and one of
              60 characters. We want to determine whether the first 30 characters of  the  longer
              string  are  more or less the same as the shortest string, i.e. whether the longest
              string is the shortest one plus new characters and maybe  some  corrections.  Since
              the  shortest  string is 30 characters and  the default percentage is 10%, we would
              allow a distance of up to 3 between the first 30 characters.
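
              The comparison described above can be sketched in Python as follows. This is an
              illustration under stated assumptions, not CCExtractor's implementation: in
              particular, combining the two settings by taking the larger of the minimum
              count and the percentage-based limit is an assumption, and the names
              levenshtein, max_allowed and equivalent are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def max_allowed(len_a, len_b, min_cnt=2, max_pct=10):
    """Largest distance still treated as 'the same line'.

    Assumption: the limit is the larger of min_cnt and max_pct percent
    of the shortest string length.
    """
    shortest = min(len_a, len_b)
    return max(min_cnt, shortest * max_pct // 100)


def equivalent(line_n, line_n1, min_cnt=2, max_pct=10):
    """Compare the shorter line with the same-length prefix of the longer one."""
    short, long_ = sorted((line_n, line_n1), key=len)
    distance = levenshtein(short, long_[:len(short)])
    return distance <= max_allowed(len(short), len(long_), min_cnt, max_pct)
```

              With the defaults, max_allowed(30, 60) is 3, matching the worked example
              above.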

   Options that affect what kind of output will be produced

       -chapters (Experimental)

              Produces a chapter file from MP4 files. Note that this must only be used with MP4
              files; for other files it will simply generate a subtitle file.


              Append  a  BOM  (Byte  Order Mark) to output files.  Note that most text processing
              tools in Linux will not like BOM.  This is the default in Windows builds.


              Do not append a BOM (Byte Order Mark) to output files. Note  that  this  may  break
              files when using Windows. This is the default in non-Windows builds.


              Encode subtitles in Unicode instead of Latin-1.


              Encode subtitles in UTF-8 (no longer needed because UTF-8 is now the default).


              Encode subtitles in Latin-1

       -nofc --nofontcolor

              For .srt/.sami/.vtt, don't add font color tags.


              For .srt/.sami/.vtt, don't convert HTML-unsafe characters.

       -nots --notypesetting

              For .srt/.sami/.vtt, don't add typesetting tags.


              Trim lines.

       -dc --defaultcolor

              Select  a  different  default  color  (instead of white). This causes all output in
              .srt/.smi/.vtt files to have a font tag, which makes  the  files  larger.  Add  the
              color you want in RGB, such as -dc #FF0000 for red.

       -sc --sentencecap

              Sentence capitalization. Use if you hate ALL CAPS in subtitles.

       -sbs --splitbysentence

              Split  output text so each frame contains a complete sentence. Timings are adjusted
              based on number of characters.

       --capfile -caf file

              Add the contents of 'file' to the list of  words  that  must  be  capitalized.  For
              example, if file is a plain text file that contains


              Whenever  those  words are found they will be written exactly as they appear in the
              file.  Use one line per word. Lines starting with #  are  considered  comments  and

       -unixts REF

              For timed transcripts that have an absolute date (instead of a timestamp relative
              to the file start), use this time reference (UNIX timestamp). 0 => Use current
              system time. ccextractor will automatically switch to transport stream UTC
              timestamps when available.


              In transcripts, write time as YYYYMMDDHHMMss,ms.


              In transcripts, write time as ss,ms


              Transcripts are generated with a specific format that is convenient for a  specific
              project,  feel  free to play with it but be aware that this format is really live -
              don't rely on its output format not changing between versions.


              Use LF (UNIX) instead of CRLF (DOS, Windows) as line terminator.


              Based on position on screen, attempt to determine the different speakers and add a
              dash (-) when each of them talks (.srt/.vtt only, -trim required).

       -xmltv mode

              Produce an XMLTV file containing the EPG data from the source TS file. Mode: 1 =
              full output, 2 = live output, 3 = both.


              Create a .sem file for each output file that is open and delete it on file close.


              For DVB subtitles, also output the color of the subtitles, if the output format  is
              SRT or WebVTT.


              In DVB subtitles, disable color in output.

              For  DVB  subtitles, select which language's caption stream will be processed. e.g.
              'eng' for English. If there are multiple languages, only  this  specified  language
              stream will be processed (default).


              Manually select the name of the Tesseract .traineddata file. Helpful if you want to
              OCR a caption stream of one language  with  the  data  of  another  language.  e.g.
              '-dvblang chs -ocrlang chi_tra' will decode the Chinese (Simplified) caption stream
              but perform OCR using the Chinese (Traditional) trained data. This option is also
              helpful  when  the  traineddata  file  has non standard names that don't follow ISO


              Select the OEM mode for Tesseract, could be 0, 1 or  2.   0:  OEM_TESSERACT_ONLY  -
              default  value,  the  fastest  mode.   1:  OEM_LSTM_ONLY  -  use LSTM algorithm for
              recognition.  2: OEM_TESSERACT_LSTM_COMBINED - both algorithms.


              For MKV subtitles, select which language's caption stream will be  processed.  e.g.
              'eng' for English. Language codes can be either the 3-letter bibliographic
              ISO-639-2 form (like "fre" for French) or a language code followed by a dash and a
              country code for specialities in languages (like "fre-ca" for Canadian French).


              When  processing  DVB  don't  use  the OCR to write the text as comments in the XML


              Specify the full path of the font that is to be used when generating SPUPNG files.

   Options that affect how ccextractor reads and writes (buffering)

       -bi --bufferinput

              Forces input buffering.

       -nobi -nobufferinput

              Disables input buffering.

       -bs --buffersize val

              Specify a size for reading, in bytes (suffix with K or M for kilobytes and
              megabytes). Default is 16M.
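
              As an illustration of the suffix notation, the following Python sketch turns a
              value such as 16M into a byte count. parse_size is a hypothetical helper, and
              treating K and M as multiples of 1024 is an assumption; this page does not say
              whether decimal or binary multiples are meant.

```python
def parse_size(text):
    """Convert '4096', '512K' or '16M' to a number of bytes."""
    units = {"K": 1024, "M": 1024 * 1024}  # assumption: binary multiples
    suffix = text[-1:].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)
```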


              keep-output-close.  If  used  then  CCExtractor  will  close  the output file after
              writing each subtitle frame and attempt to create it again when needed.

       -ff --forceflush

              Flush the file buffer whenever content is written.

   Options that affect the built-in 608 closed caption decoder


              Direct Roll-Up. When in roll-up mode, write character by character instead of  line
              by line. Note that this produces (much) larger files.

       -noru --norollup

              If  you  hate  the  repeated  lines  caused  by the roll-up emulation, you can have
              ccextractor write only one line at a time, getting rid of these repeated lines.

       -ru1 / -ru2 / -ru3

              Roll-up captions can consist of 2, 3 or 4 visible lines at any time (the number of
              lines is part of the transmission). If having 3 or 4 lines annoys you, you can use
              -ru1, -ru2 or -ru3 to force the decoder to always use 1, 2 or 3 lines. Note that 1
              line is not a real roll-up mode, so CCExtractor does what it can. In -ru1 the start
              timestamp is actually the timestamp of the first character received, which is
              possibly more accurate.

   Options that affect timing

       -delay ms

              For  srt/sami/webvtt,  add  this  number of milliseconds to all times. For example,
              -delay 400 makes subtitles appear 400ms late. You can also use negative numbers  to
              make subs appear early.

       Notes on times: -startat and -endat times are used first, then -delay.  So if you use -srt
       -startat 3:00 -endat 5:00 -delay 120000, ccextractor will generate a .srt file, with  only
       data  from  3:00  to 5:00 in the input file(s) and then add that (huge) delay, which would
       make the final file start at 5:00 and end at 7:00.
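
       The order of operations in the note above can be checked with a few lines of Python.
       This is a sketch of the arithmetic only: captions are modelled as (start_ms, end_ms,
       text) tuples, select_and_delay is a hypothetical name, and the handling of captions
       that straddle a boundary is an assumption.

```python
def select_and_delay(captions, start_ms, end_ms, delay_ms):
    """Keep captions inside [start_ms, end_ms], then shift them by delay_ms."""
    # Step 1: the -startat / -endat window is applied to the original times.
    kept = [(s, e, text) for (s, e, text) in captions
            if start_ms <= s and e <= end_ms]
    # Step 2: the delay is added afterwards, to whatever survived step 1.
    return [(s + delay_ms, e + delay_ms, text) for (s, e, text) in kept]
```

       For the example in the note, a caption ending at 300000 ms (5:00) in the input ends at
       420000 ms (7:00) in the output.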

   Options that affect what segment of the input file(s) to process

       -startat time

              Only write caption information that starts after the given time. Time can be
              seconds, MM:SS or HH:MM:SS. For example, -startat 3:00 means 'start writing from
              minute 3'.

       -endat time

              Stop processing after the given time (same format as -startat).  The  -startat  and
              -endat  options  are  honored  in  all  output formats.  In all formats with timing
              information the times are unchanged.
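
              The three accepted time notations (seconds, MM:SS and HH:MM:SS) can be
              normalised to seconds as in the Python sketch below. parse_time is a
              hypothetical helper written for illustration; it is not taken from
              CCExtractor.

```python
def parse_time(text):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' to a number of seconds."""
    parts = [int(p) for p in text.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported time format: {text!r}")
    seconds = 0
    for part in parts:
        # Each further field is worth 60 times less than the one before it.
        seconds = seconds * 60 + part
    return seconds
```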

       -scr --screenfuls num

              Write 'num' screenfuls and terminate processing.

   Options that affect which codec is searched for in the input

       If no codec type is selected, the first elementary stream suitable for subtitles is
       selected. Note that -teletext and -noteletext override this option.

       -codec dvbsub

              Select the DVB subtitle stream from all elementary streams. If no stream of DVB
              subtitle type is found, nothing is selected and no subtitle is generated.

       -nocodec dvbsub

              Ignore DVB subtitles and follow the default behaviour.

       -codec teletext

              Select the teletext subtitle stream from the elementary streams.

       -nocodec teletext

              Ignore teletext subtitles.

              NOTE: options given in the form -foo=bar, -foo = bar and --foo=bar are invalid. The
              only valid form is -foo bar. The nocodec and codec parameters must not be the same;
              if they are, the nocodec parameter is ignored. This flag should be passed only
              once; more than one is not supported yet and the last parameter would be taken in

   Adding start and end credits

       CCExtractor can _try_ to add a custom message (for credits for example) at the  start  and
       end  of  the  file,  looking for a window where there are no captions. If there is no such
       window, then no text will be added.  The start window must be between the times given  and
       must have enough time to display the message for at least the specified time.

       --startcreditstext txt

              Write  this  text  as start credits. If there are several lines, separate them with
              the characters \n, for example Line1\nLine 2.

       --startcreditsnotbefore time

              Don't display the start credits before this time (S, or MM:SS). Default: 0

       --startcreditsnotafter time

              Don't display the start credits after this time (S, or MM:SS). Default: 5:00

       --startcreditsforatleast time

              Start credits need to be displayed for at least this time (S, or MM:SS). Default: 2

       --startcreditsforatmost time

              Start credits should be displayed for at most this time (S, or MM:SS). Default: 5

       --endcreditstext txt

              Write this text as end credits. If there are several lines, separate them with  the
              characters \n, for example Line1\nLine 2.

       --endcreditsforatleast time

              End credits need to be displayed for at least this time (S, or MM:SS). Default: 2

       --endcreditsforatmost time

              End credits should be displayed for at most this time (S, or MM:SS). Default: 5
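
              The search for a caption-free window can be sketched in Python as below, shown
              for the start credits. This is one possible reading of the description and not
              CCExtractor's algorithm: it assumes the whole window must lie between the "not
              before" and "not after" times, and find_credit_slot is a hypothetical name.
              Times are in seconds and captions is a sorted list of (start, end) pairs.

```python
def find_credit_slot(captions, not_before=0, not_after=300, at_least=2, at_most=5):
    """Return (start, end) of the first usable caption-free window, or None."""
    cursor = not_before
    # A sentinel "caption" at infinity lets the loop examine the final gap too.
    for start, end in captions + [(float("inf"), float("inf"))]:
        gap_end = min(start, not_after)
        if gap_end - cursor >= at_least:
            # Long enough: show the credits, but never longer than at_most.
            return cursor, min(cursor + at_most, gap_end)
        cursor = max(cursor, end)
        if cursor >= not_after:
            break
    return None
```

              If every gap is shorter than at_least, the function returns None, which
              corresponds to the case above where no text is added.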

   Options that affect debug data


              Show lots of debugging output.


              Print  debug  traces  from the EIA-608 decoder. If you need to submit a bug report,
              please send the output from this option.


              Print debug information from the (currently in development) EIA-708 (DTV) decoder.


              Enable lots of time stamp output.


              Enable XDS debug data (lots of it).


              Print debug info about the analysed elementary video stream.


              Print debug trace with the raw 608/708 data with time stamps.


              Disable the syncing code. Only useful for debugging purposes.


              Disable the removal of trailing padding blocks when exporting to bin  format.  Only
              useful for debugging purposes.


              Print  debug  info  about  the parsed container file. (Only for TS/ASF files at the


              Print Program Association Table dump.


              Print Program Map Table dump.


              Hex-dump defective TS packets.


              If no CC packets are detected based on the PMT, try to find data in all packets  by

   Teletext related options

       -tpage page

              Use this page for subtitles (if this parameter is not used, try to autodetect). In
              Spain the page is always 888; it may vary in other countries.


              Enable verbose mode in the teletext decoder.


              Force teletext mode even if teletext is not detected.  If  used,  you  should  also
              pass -datapid to specify the stream ID you want to process.


              Disable  teletext processing. This might be needed for video streams that have both
              teletext packets and CEA-608/708 packets (if teletext is processed then CEA-608/708
              processing is disabled).

   Transcript customizing options

       -customtxt format

              Use  the  passed format to customize the (Timed) Transcript output. The format must
              be like this: 1100100 (7 digits).  These indicate whether the next things should be
              displayed or not in the (timed) transcript. They represent (in order):

              - Display start time

              - Display end time

              - Display caption mode

              - Display caption channel

              - Use a relative timestamp (relative to the sample)

              - Display XDS info

              - Use colors


              0000101 is the default setting for transcripts
              1110101 is the default for timed transcripts
              1111001 is the default setting for -ucla

              Make  sure  you  use  this  parameter after others that might affect these settings
              (-out, -ucla, -xds, -txt, -ttxt ...)
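
              To read a format string such as 1110101, map each digit to the item in the same
              position of the list above. The Python sketch below does exactly that; the
              field names and decode_customtxt are hypothetical labels chosen for
              illustration.

```python
FIELDS = (
    "start_time",
    "end_time",
    "caption_mode",
    "caption_channel",
    "relative_timestamp",
    "xds_info",
    "colors",
)


def decode_customtxt(fmt):
    """Map a 7-digit 0/1 format string to a dict of named switches."""
    if len(fmt) != len(FIELDS) or set(fmt) - {"0", "1"}:
        raise ValueError(f"expected 7 digits of 0 or 1, got {fmt!r}")
    return {name: digit == "1" for name, digit in zip(FIELDS, fmt)}
```

              For instance, the timed transcript default 1110101 turns on start time, end
              time, caption mode, relative timestamps and colors, and leaves caption channel
              and XDS info off.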

   Communication with other programs and console output


              Report progress and interesting events to stderr in an easy-to-parse format. This is
              intended to be used by other programs. See docs directory for details.


              Suppress the output of the progress bar


              Don't write any message.

       Notes  on the CEA-708 decoder: While it is starting to be useful, it's a work in progress.
       A number of things don't work yet in the decoder itself, and many of the  auxiliary  tools
       (case  conversion  to  name  one)  won't do anything yet. Feel free to submit samples that
       cause problems and feature requests.

       Notes on spupng output format: One .xml file is created per output field. A  set  of  .png
       files  are  created  in  a  directory  with  the  same base name as the corresponding .xml
       file(s), but with a .d extension. Each .png file will contain an  image  representing  one
       caption and named subNNNN.png, starting with sub0000.png.

       For example, the command:

              ccextractor -out=spupng input.mpg

       will create the files:


       The command:

              ccextractor -out=spupng -o /tmp/output -12 input.mpg

       will create the files:


   Burned-in subtitle extraction


       -hardsubx

              Enable the burned-in subtitle extraction subsystem.
              NOTE: The following options will work only if -hardsubx is specified before them:


              Set the OCR mode to either frame-wise, word-wise or letter-wise.
              e.g. -ocr_mode frame (default), -ocr_mode word, -ocr_mode letter


              Specify the color of the subtitles
              Possible values are in the set {white,yellow,green,cyan,blue,magenta,red}.
              Alternatively, a custom hue value between 1 and 360 may also be specified.
              e.g. -subcolor white or -subcolor 270 (for violet).
              Refer to an HSV color chart for values.


              Specify the minimum duration that a subtitle line must exist on the screen.
              The value is specified in seconds.
              A lower value gives better results, but takes more processing time.
              The recommended value is 0.5 (default).
              e.g. -min_sub_duration 1.0 (for a duration of 1 second)


              Specify whether italics are to be detected from the OCR text.
              Italic detection automatically enforces the OCR mode to be word-wise


              Specify the classifier confidence threshold between 1 and 100.
              Try and use a threshold which works for you if you get a lot of garbage text.
              e.g. -conf_thresh 50


              For white subtitles only, specify the luminance threshold between 1 and 100.
              This  threshold  is  content  dependent,  and  adjusting values may give you better
              Recommended values are in the range 80 to 100.
              The default value is 95

              An example command for burned-in subtitle extraction is as follows:

              ccextractor video.mp4 -hardsubx -subcolor white  -detect_italics  -whiteness_thresh
              90 -conf_thresh 60


              Display current CCExtractor version and detailed information.


       This tool homepage


       Originally  based  on  McPoodle's  tools. Check his page for lots of information on closed
       captions technical details.


CCExtractor 0.86, Carlos Fernandez Sanz, Volker Quetschke.        April 2018        CCEXTRACTOR(1)