Provided by: sox_14.0.0-5_i386 bug


       SoX - Sound eXchange, the Swiss Army knife of audio manipulation


       The  core  problem is that you need some experience in using effects in
       order to say ‘that any old sound file sounds  with  effects  absolutely
       hip’.  There  isn’t  any  rule-based  system which tell you the correct
       setting of all the parameters for every effect.  But  after  some  time
       you will become an expert in using effects.

       Here are some examples which can be used with any music sample.  (For a
       sample where only a single instrument  is  playing,  extreme  parameter
       setting   may   make  well-known  ‘typically’  or  ‘classical’  sounds.
       Likewise, for drums, vocals or guitars.)

       Single effects will be explained and some given parameter settings that
       can  be  used  to  understand the theory by listening to the sound file
       with the added effect.

       Using multiple effects in parallel or in series can result either in  a
       very  nice sound or (mostly) in a dramatic overloading in variations of
       sounds such that your ear may  follow  the  sound  but  you  will  feel
       unsatisfied.  Hence,  for  the  first time using effects try to compose
       them as minimally as possible.  We  don’t  regard  the  composition  of
       effects  in the examples because too many combinations are possible and
       you really need a very fast machine and a lot of memory to play them in

       However,  real-time  playing  of  sounds will greatly speed up learning
       and/or tuning the parameter settings for your sounds in  order  to  get
       that ‘perfect’ effect.

       Basically,  we  will use the ‘play’ front-end of SoX since it is easier
       to listen sounds coming out of  the  speaker  or  earphone  instead  of
       looking at cryptic data in sound files.

       For easy listening of (‘xxx’ is any sound format):

            play effect-name effect-parameters

       Or more SoX-like (for ‘dsp’ output on a UNIX/Linux computer):

            sox -t oss -2 -s /dev/dsp effect-name effect-parameters

       or (for ‘au’ output):

            sox  -t  sunau  -2  -s  /dev/audio  effect-name  effect-

       And for date freaks:

            sox file.yyy effect-name effect-parameters

       Additional options can be used. However, in this  case,  for  real-time
       playing you’ll need a very fast machine.


       I  played  all  examples  in  real-time on a Pentium 100 with 32 MB and
       Linux 2.0.30 using a self-recorded sample (  3:15  min  long  in  ‘wav’
       format  with  44.1  kHz  sample  rate  and stereo 16 bit ).  The sample
       should not contain any  of  the  effects.  However,  if  you  take  any
       recording of a sound track from radio or tape or CD, and it sounds like
       a live concert or ten people are playing the  same  rhythm  with  their
       drums  or  funky-grooves, then take any other sample.  (Typically, less
       then four different instruments and no synthesizer  in  the  sample  is
       suitable. Likewise, the combination vocal, drums, bass and guitar.)

       An  echo  effect  can  be  naturally  found  in the mountains, standing
       somewhere on a mountain and shouting a single word will result  in  one
       or  more  repetitions  of  the  word (if not, turn a bit around and try
       again, or climb to the next mountain).

       However, the time difference between  shouting  and  repeating  is  the
       delay  (time),  its  loudness  is  the  decay.  Multiple echos can have
       different delays and decays.

       It is very popular to use echos  to  play  an  instrument  with  itself
       together, like some guitar players (Queen’s Brian May) or vocalists do.
       For music samples of more than one instrument, echo can be used to  add
       a second sample shortly after the original one.

       This  will  sound  as  if  you  are  doubling the number of instruments
       playing in the same sample:

            play echo 0.8 0.88 60 0.4

       If the delay is very short, then  it  sound  like  a  (metallic)  robot
       playing music:

            play echo 0.8 0.88 6 0.4

       Longer delay will sound like an open air concert in the mountains:

            play echo 0.8 0.9 1000 0.3

       One mountain more, and:

            play echo 0.8 0.9 1000 0.3 1800 0.25

       Like  the  echo  effect,  echos stand for ‘ECHO in Sequel’, that is the
       first echos takes the input, the second the input and the first  echos,
       the  third the input and the first and the second echos, ... and so on.
       Care should be taken using many  echos  (see  introduction);  a  single
       echos has the same effect as a single echo.

       The sample will be bounced twice in symmetric echos:

            play echos 0.8 0.7 700 0.25 700 0.3

       The sample will be bounced twice in asymmetric echos:

            play echos 0.8 0.7 700 0.25 900 0.3

       The sample will sound as if played in a garage:

            play echos 0.8 0.7 40 0.25 63 0.3

       The  chorus effect has its name because it will often be used to make a
       single vocal sound like a chorus.  But  it  can  be  applied  to  other
       instrument samples too.

       It  works  like the echo effect with a short delay, but the delay isn’t
       constant.  The  delay  is  varied  using  a  sinusoidal  or  triangular
       modulation.  The modulation depth defines the range the modulated delay
       is played before or after the delay. Hence the delayed sound will sound
       slower  or  faster, that is the delayed sound tuned around the original
       one, like in a chorus where some vocals are a bit out of tune.

       The typical delay is around 40ms to 60ms, the speed of  the  modulation
       is best near 0.25Hz and the modulation depth around 2ms.

       A single delay will make the sample more overloaded:

            play chorus 0.7 0.9 55 0.4 0.25 2 -t

       Two delays of the original samples sound like this:

            play chorus 0.6 0.9 50 0.4 0.25 2 -t 60 0.32 0.4 1.3 -s

       A big chorus of the sample is (three additional samples):

            play  chorus 0.5 0.9 50 0.4 0.25 2 -t 60 0.32 0.4 2.3 -t
       40 0.3 0.3 1.3 -s

       The flanger effect is like the chorus  effect,  but  the  delay  varies
       between  0ms  and  maximal  5ms.  It sound like wind blowing, sometimes
       faster or slower including changes of the speed.

       The flanger effect is widely used in funk and  soul  music,  where  the
       guitar sound varies frequently slow or a bit faster.

       Now, let’s groove the sample:

            play flanger

       listen  carefully  between  the difference of sinusoidal and triangular

            play flanger triangle

       The reverb effect is often used in audience hall which are to small  or
       contain too many many visitors which disturb (dampen) the reflection of
       sound at the walls.  Reverb will make the sound be perceived as  if  it
       were  in  a large hall.  You can try the reverb effect in your bathroom
       or garage or sport hall by shouting loud some words.  You’ll  hear  the
       words reflected from the walls.

       The  biggest  problem in using the reverb effect is the correct setting
       of the (wall) delays such that the sound is realistic and doesn’t sound
       like  music  playing  in  a  tin  can  or has overloaded feedback which
       destroys any illusion of playing in a big hall.   To  help  you  obtain
       realistic  reverb  effects, you should decide first how long the reverb
       should take place until it is not loud enough to be registered by  your
       ears.  This  is  be  done  by varying the reverb time ‘t’.  To simulate
       small halls, use 200ms.  To simulate large halls, use 1000ms.  Clearly,
       the  walls  of  such  a  hall aren’t far away, so you should define its
       setting be given every wall its delay time.  However, if the wall is to
       far away for the reverb time, you won’t hear the reverb, so the nearest
       wall will be best at ‘t/4’ delay and the farthest at ‘t/2’. You can try
       other  distances as well, but it won’t sound very realistic.  The walls
       shouldn’t stand to close to each other and not in  a  multiple  integer
       distance  to each other ( so avoid wall like: 200 and 202, or something
       like 100 and 200 ).

       Since audience halls do have a lot of walls, we  will  start  designing
       one beginning with one wall:

            play reverb 1 600 180

       One wall more:

            play reverb 1 600 180 200

       Next two walls:

            play reverb 1 600 180 200 220 240

       Now, why not a futuristic hall with six walls:

            play reverb 1 600 180 200 220 240 280 300

       If  you  run  out  of  machine  power  or  memory,  then  stop  as many
       applications as possible (every interrupt will consume  a  lot  of  CPU
       time which for bigger halls is absolutely necessary).

       The  phaser  effect  is  like  the flanger effect, but it uses a reverb
       instead of an echo and does phase shifting. You’ll hear the  difference
       in  the  examples  comparing both effects.  The delay modulation can be
       sinusoidal  or  triangular,  preferable  is  the  later  for   multiple
       instruments. For single instrument sounds, the sinusoidal phaser effect
       will give a sharper phasing effect.  The decay shouldn’t be to close to
       1 which will cause dramatic feedback.  A good range is about 0.5 to 0.1
       for the decay.

       We will take a parameter setting as before  (gain-out  is  lower  since
       feedback can raise the output dramatically):

            play phaser 0.8 0.74 3 0.4 0.5 -t

       The drunken loudspeaker system (now less alcohol):

            play phaser 0.9 0.85 4 0.23 1.3 -s

       A popular sound of the sample is as follows:

            play phaser 0.89 0.85 1 0.24 2 -t

       The sample sounds if ten springs are in your ears:

            play phaser 0.6 0.66 3 0.6 2 -t

       The  compander  effect  allows  the  dynamic  range  of  a signal to be
       compressed or expanded.  It works by calculating the input signal level
       averaged  over time according to the given attack and decay parameters,
       and setting the output signal level according to  the  given  transfer-
       function parameters.

       For  most  situations,  the  attack time (response to the music getting
       louder) should be shorter than the decay time because our ears are more
       sensitive to suddenly loud music than to suddenly soft music.

       For  example,  suppose  you  are  listening  to  Strauss’s ‘Also Sprach
       Zarathustra’ in a noisy environment such as a moving vehicle.   If  you
       turn  up  the  volume  enough  to  hear the soft passages over the road
       noise, the loud sections will be too loud.  So you could try this:

            sox asz.flac asz-car.flac compand 0.3,1 6:-70,-60,-20 -5 -90 0.2

       The transfer function (‘6:-70,...’) says that very soft  sounds  (below
       -70dB)  will  remain  unchanged.   This  will  stop  the compander from
       boosting the volume on ‘silent’ passages  such  as  between  movements.
       However,  sounds  in  the  range  -60dB to 0dB (maximum volume) will be
       boosted so that the 60dB dynamic range of the original  music  will  be
       compressed  3-to-1 into a 20dB range, which is wide enough to enjoy the
       music but narrow enough to get around the road noise.  The ‘6:’ selects
       6dB  soft-knee  companding.  The -5 (dB) output gain is needed to avoid
       clipping (the number is inexact, and was derived  by  experimentation).
       The  -90  (dB)  for  the  initial volume will work fine for a clip that
       starts with near silence, and the delay of 0.2 (seconds) has the effect
       of  causing  the compander to react a bit more quickly to sudden volume

       In order to visualise the transfer function, SoX can  be  invoked  with
       the --plot option, e.g.

            sox -n -n --plot gnuplot compand 0,0 6:-70,-60,-20 -5 > my.plt
            gnuplot my.plt

       The  following  (one  long)  command shows how multi-band companding is
       typically used in FM radio:

            play vol -3dB filter 8000- 32 100 mcompand \
            "0.005,0.1 -47,-40,-34,-34,-17,-33" 100 \
            "0.003,0.05 -47,-40,-34,-34,-17,-33" 400 \
            "0.000625,0.0125 -47,-40,-34,-34,-15,-33" 1600 \
            "0.0001,0.025 -47,-40,-34,-34,-31,-31,-0,-30" 6400 \
            "0,0.025 -38,-31,-28,-28,-0,-25" \
            vol 15dB highpass 22 highpass 22 filter -17500 256 \
            vol 9dB lowpass -1 17801

       The audio file is played with a simulated FM radio sound (or  broadcast
       signal  condition  if  the lowpass filter at the end is skipped).  Note
       that the pipeline is set up with US-style 75us preemphasis.

   Changing the Rate of Playback
       You can use stretch to change the rate of playback of an  audio  sample
       while preserving the pitch.  For example to play at half the speed:

            play file.wav stretch 2

       To play a file at twice the speed:

            play file.wav stretch 0.5

       Other  related  options  are  ‘speed’  to change the speed of play (and
       changing the pitch accordingly), and pitch, to alter  the  pitch  of  a
       sample.   For  example  to  speed a sample so it plays in half the time
       (for those Mickey Mouse voices):

            play file.wav speed 2

       To raise the pitch of a sample 1 while note (100 cents):

            play file.wav pitch 100

   Reducing noise in a recording
       First find a period of silence in your recording, such as the beginning
       or  end  of  a  piece.  If  the  first 1.5 seconds of the recording are
       silent, do

            sox file.wav -n trim 0 1.5 noiseprof /tmp/profile

       Next, use the noisered effect to actually reduce the noise:

            play file.wav noisered /tmp/profile

   Making a recording
       Thanks to Douglas Held for the following suggestion:

            rec parameters filename other-effects silence 1 5 2%

       This use of the silence effect allows you to start a recording  session
       but only start writing to disk once non-silence is detect. For example,
       use this to start your favorite command line  for  recording  and  walk
       over  to  your record player and start the song.  No periods of silence
       will be recorded.

   Scripting with SoX
       One of the benefits of a command-line tool is that it is easy to use it
       in  scripts  to  perform more complex tasks.  In marine radio, a Mayday
       emergency call is transmitted preceded by a 30-second alert sound.  The
       alert  sound comprises two audio tones at 1300Hz and 2100Hz alternating
       at a rate of 4Hz.  The following shows how SoX can be used in a  script
       to  construct  an audio file containing the alert sound.  The scripting
       language shown is ‘Bourne shell’  (sh)  but  it  should  be  simple  to
       translate  this to another scripting language if you do not have access
       to sh.

       # Make sure we append to a file that’s initially empty:
       rm -f 2tones.raw

       for freq in 1300 2200; do
         sox -c1 -r8000 -n -t raw - synth 0.25 sine $freq vol 0.7 >> 2tones.raw

       # We need 60 copies of 2tones.raw (0.5 sec) to get 30 secs of audio:

       # Make sure we append to a file that’s initially empty:
       rm -f alert.raw

       while [ $iterations -ge 1 ]; do
         cat 2tones.raw >> alert.raw
         iterations=`expr $iterations - 1`

       # Add a file header and save some disc space:
       sox -s2 -c1 -r8000 alert.raw alert.ogg

       play alert.ogg

       If you try out the above script, you may want to hit Ctrl-C fairly soon
       after  the  alert tone starts playing - it’s not a pleasant sound!  The
       synth effect is used to generate each of the tones; -c1 -r8000  selects
       mono,  8kHz sampling-rate audio (i.e. relatively low fidelity, suitable
       for the marine radio transmission channel); each tone is generated at a
       length  of 0.25 seconds to give the required 4Hz alternation.  Note the
       use of  ‘raw’  as  the  intermediary  file  format;  a  self-describing
       (header)  format  would  just get in the way here.  The self-describing
       header is added only at the final stage; in this case, .ogg is  chosen,
       since lossy compression is appropriate for this application.

       There are further practical examples of scripting with SoX available to
       download from the SoX web-site [1].


       sox(1), libsox(3)

       [1]    SoX       -       Sound        eXchange        |        Scripts,


       This    man    page    was   written   largely   by   Juergen   Mueller
       (  Other SoX authors and contributors are listed
       in the AUTHORS file that is distributed with the source code.