Ubuntu Manpage: y4mscaler - Scale/crop/translate a YUV4MPEG2 stream

Provided by: mjpegtools_2.1.0+debian-4_amd64

NAME

       y4mscaler - Scale/crop/translate a YUV4MPEG2 stream

SYNOPSIS

       y4mscaler [options] < Y4Mstream > Y4Mstream

DESCRIPTION

y4mscaler is a general-purpose video scaler which operates on YUV4MPEG2 streams, as produced and consumed
by the MJPEGtools such as lav2yuv and mpeg2enc(1).

y4mscaler is meant to be used in a pipeline. Thus, input is from stdin, and output is to stdout.

The essential function of y4mscaler is to scale a specified "active" region of the input stream (the
source) into a specified active region of the output stream (the target). Pixels outside of the active
region of the source are ignored; pixels outside of the active region of the target are filled with a
background color. The source may additionally have a matte applied to it; pixels outside the source
matte are set to a separately specified background color.

y4mscaler correctly handles chroma subsampling, and thus it can also perform chroma subsampling
conversions. The YUV4MPEG2 stream format supports three varieties of 4:2:0 subsampling, as well as
4:1:1, 4:2:2, 4:4:4, a 4:4:4 modes with an alpha channel, and a monochrome luma-only mode. (See "NOTES
ON CHROMA MODES AND SUBSAMPLING".)

y4mscaler can perform simple interlacing conversions: switching from top-field-first to bottom-field-
first and vice-versa (by lossily discarding the first field), and creating a progressive stream from
interlaced by discarding every other field (effectively halving the vertical resolution).

The source and target are defined by many, many parameters, but y4mscaler has many, many heuristics
built-in to automagically set them appropriately. Most source parameters are taken from the input stream
header. Remaining source and target parameters which are not specified by the user are guessed in a sane
manner.

y4mscaler includes preset parameters for a number of common target streams: DVD, VideoCD (VCD), SuperVCD
(SVCD), associated still image formats, and DV.

EXAMPLES

       To create a stream appropriate for use in an SVCD:

            y4mscaler -O preset=svcd

       To create a stream for a VideoCD (a non-interlaced format), from a  DV  source  (an  interlaced  format),
       shifting the input frame 4 pixels to the left:

            y4mscaler -I ilace=bottom-only -I active=-4+0cc -O preset=vcd

       To  take  a  widescreen NTSC DV source, and convert it to a letterboxed stream, with blue bars on the top
       and bottom:

            y4mscaler -O sar=ntsc -O bg=RGB:0,0,255

       To take a widescreen NTSC DV source, and convert it to a "fullscreen" stream (i.e. the sides are clipped,
       just like on TV):

            y4mscaler -O sar=ntsc -O infer=clip

       To take a centered, letterboxed NTSC source, and convert it to a widescreen (16:9) format stream for DVD,
       with the black bars removed:

            y4mscaler -O preset=dvd -O sar=ntsc_wide -O infer=clip

       To take the center 100x100 pixel chunk of an NTSC DV stream, surround it with a 20-pixel blue border, and
       blow that up to a full-screen SuperVCD stream:

            y4mscaler -I active=140x140+0+0cc -I matte=100x100+0+0cc -I bg=RGB:0,0,255 -O preset=svcd

OPTIONS

The first three options, -v, -V, and -h, are simple straightforward options which take either no
arguments or one numeric argument.

-v [0,1,2]
Set verbosity level.
0 = warnings and errors only.
1 = add informative messages, too (default).
2 = add chatty debugging message, too.

-V Show version information and exit.

-h Show a help message (synopsis of options).

The -I, -O, and -S options each take one argument of the form parameter=value, which specify parameters
for the input, output, and scaling, respectively. These options can be used repeatedly to specify
multiple parameters. The parameter names and values are not case-sensitive. Definitions of the form
"parameter=[AAA|BBB|CCC]" mean that only one of the listed keywords AAA, BBB, or CCC may be chosen.
Succeeding options will override earlier ones.

-I input_parameter
Specify parameters for the source/input stream. All '-I' arguments are evaluated in order, and
later arguments on the command-line will override earlier ones. All '-I' arguments are evaluated
before any '-O' arguments.

active=WxH+X+Yaa
Specify the active region of the source frame, which is scaled to fit the active region of the
target frame. The default is the full frame. (The "WxH" may be omitted, and the region size
defaults to the size of of the source frame.) W and H are width and height. X and Y are the
offset of the anchor point. "aa" is the anchor mode (default: TL); see "NOTES ON REGION
GEOMETRY" for details.
Example: active=200x180+30+24cc

matte=WxH+X+Y
Specify a matte region for the source frame. All pixels outside of this region are set to the
source background color. The default matte is the full frame. (The "WxH" may be omitted, and the
region size defaults to the size of of the source frame.) W and H are width and height. X and Y
are the offset of the anchor point. "aa" is the anchor mode (default: TL); see "NOTES ON REGION
GEOMETRY" for details.
Example: matte=200x180+30+24cc

bg=RGB:r,g,b
bg=YCBCR:y,cb,cr
bg=RGBA:r,g,b,a
bg=YCBCRA:y,cb,cr,a
Set the source background color. Pixels outside of the source's matte region are set to this
color. One can specify the color as either a R'G'B' or Y'CbCr triplet. For example, the default
color is black, specified as "bg=YCBCR:16,128,128" or "bg=RGB:0,0,0". The 'A' versions will set
the alpha (transparency) value of the color. The alpha range is [0,255] for RGBA and [16,235]
for YCBCRA. The default is fully-opaque (255 for RGBA, 235 for YCBCRA).

norm=[NTSC|PAL|SECAM]
Specify the "norm" of the source stream. This is normally inferred from the stream header.

ilace=[NONE|TOP_FIRST|BOTTOM_FIRST|TOP_ONLY|BOTTOM_ONLY]
Specify the interlacing used by the source stream. NONE, TOP_FIRST, and BOTTOM_FIRST correspond
to non-interlaced, top-field-first, and bottom-field-first. These values are normally inferred
from the stream header; specifying them will override the stream header.
TOP_ONLY and BOTTOM_ONLY specify that only the top or bottom field of each frame should be used;
the other field is discarded. These options can only be used with an interlaced input, and cause
the interlaced stream to be treated as a progressive stream with half the height. (This is
particularly useful in creating a VCD from a full-size interlaced input stream.) These two
special options can only be used when the source is a pure progressive stream (as opposed to a
YUV4MPEG2 "mixed-mode" stream).

chromass=[420JPEG|420MPEG2|420PALDV|444|422|411|mono|444alpha]
Specify the chroma subsampling mode used in the source stream. This parameter is inferred from
the stream header, so this keyword should almost never be used in a source specification. The
only useful reason to specify this keyword is to override one variety of 4:2:0 with another. Any
other use will cause processing to fail.

sar=N:D
sar=[NTSC|PAL|NTSC_WIDE|PAL_WIDE]
Specify the sample-aspect-ratio of the source stream. The value can either be or numeric ratio
(such as "10:11") or one of the keywords, which correspond to the CCIR-601 values for 4:3 or 16:9
displays, respectively. This parameter is usually inferred from the stream header.

-O output_parameter
Specify parameters for the destination/output stream. All '-O' arguments are evaluated in order,
and later arguments on the command-line will override earlier ones. All '-O' arguments are
evaluated after any '-I' arguments.

size=WxH
size=SRC
Set the output/target frame size, as width W and height H in pixels. Use the keyword SRC to
specify that the target frame size should match the source frame size.

active=WxH+X+Yaa
Specify the active region of the target frame, into which the active region of the source frame
is scaled. The default is the full target frame. (The "WxH" may be omitted, and the region size
defaults to the size of of the target frame.) W and H are width and height. X and Y are the
offset of the anchor point. "aa" is the anchor mode (default: TL); see "NOTES ON REGION
GEOMETRY" for details.
Example: active=200x180+30+24cc

bg=RGB:r,g,b
bg=YCBCR:y,cb,cr
bg=RGBA:r,g,b,a
bg=YCBCRA:y,cb,cr,a
Set the target background color. Pixels outside of the target's active region are set to this
color. One can specify the color as either a R'G'B' or Y'CbCr triplet. For example, the default
color is black, specified as "bg=YCBCR:16,128,128" or "bg=RGB:0,0,0". The 'A' versions will set
the alpha (transparency) value of the color. The alpha range is [0,255] for RGBA and [16,235]
for YCBCRA. The default is fully-opaque (255 for RGBA, 235 for YCBCRA).

ilace=[NONE|TOP_FIRST|BOTTOM_FIRST]
Specify the interlacing used by the target stream. NONE, TOP_FIRST, and BOTTOM_FIRST correspond
to non-interlaced, top-field-first, and bottom-field-first. The default if to match the source
stream.
If the source and target are both interlaced, but with different modes (i.e. one is bottom-first,
and the other is top-first), then y4mscaler will convert one mode to the other by dropping the
first source field.

sar=N:D
sar=[SRC|NTSC|PAL|NTSC_WIDE|PAL_WIDE]
Specify the sample-aspect-ratio of the source stream. The value can either be or numeric ratio
(such as "10:11") or one of the keywords, which correspond to the CCIR-601 values for 4:3 or 16:9
displays, respectively. The keyword SRC specifies that the target SAR should match the source.

scale=N/D
Xscale=N/D
Yscale=N/D
Set the scaling ratios, as a fraction; for example, scale=1/2. "scale=" sets both X and Y
factors simultaneously. "Xscale=" and "Yscale=" can be used to set them independently.

infer=[PAD|CLIP|PRESERVE_X|PRESERVE_Y]
Set the mode used to infer scaling ratios from active regions and SAR's. The keywords are
mutually exclusive. The default is PAD.

infer=[SIMPLIFY|EXACT]
Set whether the above heuristic uses exact ratios, or whether it is allowed to slightly adjust
active regions to simplify the scaling ratios. The keywords are mutually exclusive. The default
is SIMPLIFY.

align=[TL|TC|TR|CL|CC|CR|BL|BC|BR]
Set the alignment point between the source and target active regions. The keywords specify "top-
left", "top-center", "top-right", etc. The specified corner or point from the source region will
be mapped to the same spot in the target region; and cropping or padding which is applied to the
active regions will preserve this mapping. The default is CC, for "center-center", i.e. the
source and target regions are mutually centered. The keywords are mutually exclusive. The
default is CC. See "NOTES ON SOURCE AND TARGET ALIGNMENT" for details.

VCD - 352-wide VideoCD, progressive

CVD - 352-wide (full-height) ChinaVideoDisc

SVCD - 480-wide SuperVCD

DVD - 720-wide DVD

DVD_WIDE - 720-wide DVD, anamorphic pixels

DV - 720-wide DV (bottom-field-first, 4:1:1)

DV_WIDE - 720-wide DV, anamorphic pixels

SVCD_STILL_HI - high-resolution SVCD still image

SVCD_STILL_LO - low-resolution SVCD still image

VCD_STILL_HI - high-resolution VCD still image

VCD_STILL_LO - low-resolution SVCD still image

ATSC_720P - ATSC 720p (progressive HDTV)

ATSC_1080I - ATSC 1080i (interlaced HDTV)

ATSC_1080P - ATSC 1080p (HDTV)

-S scaling_parameter
Specify parameters for the scaling engine. All '-S' arguments are evaluated in order, and later
arguments on the command-line will override earlier ones.

mode=MONO
Request monochrome scaling. The source is treated as monochrome and its chroma channels are
ignored. The chroma channels of the output stream will be zeroed to yield a grayscale output.

mode=LINESWITCH
Request line swapping. Effectively, the top and bottom fields within each frame will be swapped.
This may help with malformed streams that have a messed up spatial order. This option is only
effective on interlaced streams.

scaler=scaler-name
Use a particular scaling engine. The available engines are:
'default' - Matto's Generic Scaler (the default)

option=scaler-option
Specify an option for the chosen scaling engine. To see all the available options, use
"option=help".

For the default engine, the available scaler-options select the filter kernel:

box - box filter

linear - linear interpolation

quadratic - quadratic interpolation

cubic - cubic interpolation, Mitchell-Netravali spline

cubicCR - cubic interpolation, Catmull-Rom spline

cubicB - cubic interpolation, B-spline

cubicK4 - Keys 4th-order cubic

sinc:N - sinc with Lanczos window, N cycles

To select kernels for the x and y scaling directions independently, use two kernel names
separated by a comma, e.g. option=box,quadratic.

sinc:N will give the best quality results (least aliasing), but is the slowest. The quality
improves with larger values of N, as does processing time. cubic is generally regarded in the
graphics world as the 3rd-order cubic spline with the best trade-off between smoothing and
aliasing. box yields the worst quality results (most aliasing), but is the fastest. The default
kernel is cubicK4, which has a flatter passband and sharper cutoff than cubic. (It requires the
same computational power as sinc:4, but produces less ringing artifacts.)

NOTES ON TARGET PRESETS

       The  following  table  details  the settings provided by the various target "preset=" keywords.  When two
       values are given the primary is for NTSC streams; the value in {braces} is for PAL streams.  If interlace
       value is unspecified, it is inherited from the source, otherwise  the  indicated  target  interlacing  is
       required.

        Preset         Frame Size    Interlace     SAR            Subsampling
       -----------------------------------------------------------------------
         VCD           352x240{288}  none          10:11{59:54}   4:2:0-JPEG
         CVD           352x480{576}  ---           20:11{59:27}   4:2:0-MPEG2
        SVCD           480x480{576}  ---           15:11{59:36}   4:2:0-MPEG2
         DVD           720x480{576}  ---           10:11{59:54}   4:2:0-MPEG2
         DVD_WIDE      720x480{576}  ---           40:33{118:81}  4:2:0-MPEG2
         DV            720x480{576}  bottom-first  10:11{59:54}   4:1:1
         DV_WIDE       720x480{576}  bottom-first  40:33{118:81}  4:1:1
        SVCD_STILL_HI  704x480{576}  none          10:11{59:54}   4:2:0-MPEG2
        SVCD_STILL_LO  480x480{576}  none          15:11{59:36}   4:2:0-MPEG2
         VCD_STILL_HI  704x480{576}  none          10:11{59:54}   4:2:0-JPEG
         VCD_STILL_LO  352x240{288}  none          10:11{59:54}   4:2:0-JPEG
        ATSC_720p         1280x720   none          1:1            4:2:0-MPEG2
        ATSC_1080i        1920x1080  (required)    1:1            4:2:0-MPEG2
        ATSC_1080p        1920x1080  none          1:1            4:2:0-MPEG2

NOTES ON REGION GEOMETRY

       Active  and  matte regions are specified using a geometry string of the form "WxH+X+Yaa".  The "WxH" part
       specifies the size of the region, as a Width and Height in pixels.  (In some  cases,  the  "WxH"  may  be
       omitted,  and the region size defaults to the full frame size.)  The "+X+Y" specifies the position of the
       region, as an offset relative to the anchor point specified by "aa".

       The "aa" code can be one of TL, TC, TR, CL, CC, CR, BL, BC, or BR.  These  stand  for  "top-left",  "top-
       center", ..., "bottom-center", "bottom-right".  These codes are not case-sensitive.

       The "+X+Y" specifies the offset of the region's anchor point from the frame's anchor point.  For example,
       "+20+30TL"  means  that  the  top-left  corner of the region will be offset 20 pixels to the right and 30
       pixels down from the top-left corner of the frame.

       The offset values can also be negative.  For example,  "-4+0CC"  means  that  the  center  (vertical  and
       horizontal) of the region is offset 4 pixels to the left of the center of the frame.

       The default anchoring point for geometry strings is TL, i.e. the top-left corner.

NOTES ON SOURCE AND TARGET ALIGNMENT

       Often,  the source and target active regions do not match exactly.  This happens when, using the given or
       calculated scaling ratios, the source region scales to a different size or shape than the target  region.
       In  this case, the source and target regions are mutually clipped, so that only the portion of the source
       which fits will be scaled into the target.

       Before any clipping or padding, the source and target regions are aligned so that  the  points  specified
       via the "align=aa" parameter coincide.  The "aa" code specifies an anchor point as described above.

       For  example,  "align=BC"  specifies that the bottom-center of the source region should get mapped to the
       bottom-center of the target region.  In other words, the source region will be horizontally centered  and
       vertically aligned to the bottom of the target region before clipping:

               ----------------  source
               |abcdefghijklmn|
            ---|opqrstuvwxyz01|---  target      ----------------
            |  |234567890ABCDE|  |              |234567890ABCDE|
            |  |FGHIJKLMNOPQRS|  |              |FGHIJKLMNOPQRS|
            |  |TUVWXYZabcdefg|  |              |TUVWXYZabcdefg|
            ----------------------              ----------------
                   Before                       Mutually Clipped

       If  instead "align=TR" were centered, the source would be clipped in a different place, and scaled into a
       different region of the target frame:

            ----------------------                 ----------------
            |     |abcdefghijklmn|                 |abcdefghijklmn|
            |     |opqrstuvwxyz01|                 |opqrstuvwxyz01|
            |     |234567890ABCDE|                 |234567890ABCDE|
            ------|FGHIJKLMNOPQRS|                 ----------------
         target   |TUVWXYZabcdefg| source
                  ----------------
                   Before                       Mutually Clipped

       The default alignment mode is "CC", that is, the source and target are mutually centered.

NOTES ON SCALE FACTOR INFERENCE

If the X and Y scaling factors are not explicitly provided, y4mscaler will infer the factors from the
source and target active regions and sample aspect ratios (SAR's).

If the active regions are not compatible shape-wise (given the SAR's), the source and target regions will
be clipped or padded according to one of four policies. The policy is selected using the "infer="
parameter and one of the keywords PAD, CLIP, PRESERVE_X, or PRESERVE_Y. (The default is PAD.)

PAD
Pick scaling factors which will pad the source, but ensure that all of the source image content
ends up in the target.

CLIP
Pick scaling factors which will clip the source, but which will fill the target region as much as
possible.

PRESERVE_X
Pick scaling factors which preserve as much of the horizontal source content as possible.

PRESERVE_Y
Pick scaling factors which preserve as much of the vertical source content as possible.

The policy is further affected by a choice of two other keywords, SIMPLIFY, or EXACT. (The default is
SIMPLIFY.)

EXACT
Calculate exact scaling factors.

SIMPLIFY
Adjust the active regions and scaling factors (within 10% or so), to simplify the ratios as much as
possible. (For example, crop or pad slightly to achieve a ratio of 2/1 rather than 45/22.)

NOTES ON CHROMA MODES AND SUBSAMPLING

y4mscaler can convert streams from one chroma subsampling mode to another. Such conversions are always
lossy operations, even if the overall frame is undergoing 1/1 scaling.

y4mscaler will infer the source's subsampling mode from tags in the input stream header. The target
presets ("preset=XXX") will attempt to set the target subsampling mode appropriately. Otherwise, by
default the target subsampling mode will match the source. One can explicitly set the subsampling mode
for the source and/or the target by using the "chromass=" parameter.

y4mscaler is capable of reading and writing streams in the 4:4:4, 4:2:2, 4:1:1, and 4:2:0 (all three
varieties) subsampling modes. The first three, however, are a relatively new addition to the YUV4MPEG2
standard, and many MJPEGtools will fail to process them correctly, if at all. smil2yuv and raw2yuv can
produce native 4:1:1 streams from NTSC DV video, which can then be converted to 4:2:0 by y4mscaler before
further processing by other tools.

If the source has an alpha-channel (i.e. 444ALPHA mode) and the target does not, the alpha channel will
simply be discarded. On the other hand, if the target has an alpha-channel but the source does not, a
constant alpha-channel will be created using the alpha-value of the target's background color (as set by
"-O bg="). The default is fully-opaque.

Similarly, if the target has chroma channels but the source does not (i.e. a luma-only MONO stream), then
the chroma channels in the output will be set according to the background color.

NOTES ON ANOMALOUS INTERLACE MIXTURES

       The  YUV4MPEG2  format  allows  for  "mixed-mode  interlacing"  streams,  which  may contain a mixture of
       progressive and interlaced frames.  Each frame is tagged as temporally  interlaced  or  progressive,  and
       vertically-subsampled  frames  (4:2:0  formats)  are  further  tagged  as  spatially  interlaced  or not.
       Unfortunately, this allows for the possibility  of  anomalous  frames,  which  happen  to  be  temporally
       interlaced  (fields  sampled  at different times) but spatially progressive (subsampling performed across
       entire frame), or vice-versa.  The only  reasonable  thing  to  do  with  such  anomalous  frames  is  to
       vertically-upsample the chroma, essentially making to problem go away as quickly as possible.

       y4mscaler  will  only  process such frames if the target output format is non-vertically-subsampled (e.g.
       4:4:4, 4:2:2, etc.) and no other vertical processing is  required.   Otherwise  y4mscaler  will  bail  on
       processing  in  midstream  when  it  encounters  an  anomalous  frame.   If  there  is any possibility of
       encountering such an error, y4mscaler will print a warning when processing begins.

EXIT STATUS

       0      Successful program execution.

       1      Usage, syntax, or operational error.

AUTHOR

       This manual page is copyright 2005 by Matthew Marjanovic.
       Feel free to direct any questions, remarks, problems, or bug reports  concerning  this  tool  to  <dmg  @
       mir.com>.

       For more info, see our website at:
              <http://www.mir.com/DMG/>

       For more information on MJPEGtools, consult:
              <http://mjpeg.sourceforge.net/>