More vocoder examples
Technical data:
Processor is a dsPIC30F6012A 16-bit DSC from Microchip running at 25MHz. Audio sample rate is 24kHz. Vocoder is 20 bands (20 analysis + 20 synthesis), each 1/3 octave wide. Filters have an 8th-order Linkwitz-Riley (Butterworth squared) band-pass response (24dB/oct slope each side), i.e. 40 filters of four bi-quads each, 160 separate bi-quad filters in total. Lowest band is 75Hz and highest band is 7kHz (although the bands in the lowest octave do little with speech signals).
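As a rough sketch of how those numbers fit together (this is not the dsPIC code), the Python below lays out 20 bands at 1/3 octave spacing and counts the bi-quads. It assumes the 75Hz figure is the centre of the lowest band, which is a guess; on that assumption the top band is centred near 6kHz with its upper edge at about 6.8kHz, roughly in line with the quoted 7kHz. The function names are made up for illustration.

```python
NUM_BANDS = 20          # from the text
LOWEST_HZ = 75.0        # from the text; ASSUMED here to be a centre frequency
BANDS_PER_OCTAVE = 3    # 1/3 octave spacing, from the text
BIQUADS_PER_FILTER = 4  # an 8th-order band-pass is four 2nd-order sections


def band_centres(num_bands=NUM_BANDS, lowest_hz=LOWEST_HZ,
                 per_octave=BANDS_PER_OCTAVE):
    """Centre frequency of each band, spaced 1/per_octave octave apart."""
    return [lowest_hz * 2.0 ** (k / per_octave) for k in range(num_bands)]


def band_edges(centre_hz, per_octave=BANDS_PER_OCTAVE):
    """Lower and upper edge of a band one 1/per_octave octave wide."""
    half_band = 2.0 ** (1.0 / (2 * per_octave))
    return centre_hz / half_band, centre_hz * half_band


def total_biquads(num_bands=NUM_BANDS):
    """One analysis and one synthesis filter per band."""
    return num_bands * 2 * BIQUADS_PER_FILTER


if __name__ == "__main__":
    for k, fc in enumerate(band_centres()):
        lo, hi = band_edges(fc)
        print(f"band {k:2d}: {lo:7.1f} .. {hi:7.1f} Hz (centre {fc:7.1f} Hz)")
    print("bi-quads in total:", total_biquads())
```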
The modulator is sometimes high-pass filtered above 6kHz and the resulting sibilance mixed in with the modulated carrier to increase intelligibility. A +3dB/oct spectral tilt is added for classic sawtooth/square based carriers, and a -3dB/oct spectral tilt is added for a white noise carrier, to compensate for the natural spectrum of the carrier signal. No user interface or spectral display yet! Formant shifting was done by directing the envelope detector level from each band to the gain-control element in another band. Multi-rate DSP tricks are used to allow this basic 16-bit DSP to process 160 bi-quad filters in real time, and to mitigate state/coefficient rounding problems like limit cycling.
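The tilt and formant-shift ideas can be sketched in a few lines of Python. This is only an illustration under assumptions: the tilt is applied here as a fixed gain per band around a hypothetical 1kHz reference, and the shift is a whole number of bands (so 3 bands is one octave at 1/3 octave spacing). How the real firmware does either is not described above, and `tilt_gains` and `route_envelopes` are invented names.

```python
import math


def tilt_gains(centres_hz, db_per_octave, ref_hz=1000.0):
    """Linear gain per band for a straight-line spectral tilt.

    ref_hz is a hypothetical pivot frequency where the gain is 1.0.
    """
    gains = []
    for fc in centres_hz:
        octaves_from_ref = math.log2(fc / ref_hz)
        gains.append(10.0 ** (db_per_octave * octaves_from_ref / 20.0))
    return gains


def route_envelopes(envelopes, shift_bands):
    """Formant shift: the level detected in band k drives the gain of
    band k + shift_bands. Bands with no source are left silent."""
    n = len(envelopes)
    routed = [0.0] * n
    for src, level in enumerate(envelopes):
        dst = src + shift_bands
        if 0 <= dst < n:
            routed[dst] = level
    return routed


if __name__ == "__main__":
    centres = [75.0 * 2.0 ** (k / 3.0) for k in range(20)]  # assumed layout
    saw_tilt = tilt_gains(centres, +3.0)     # for sawtooth/square carriers
    noise_tilt = tilt_gains(centres, -3.0)   # for a white noise carrier
    detected = [0.0] * 20
    detected[10] = 1.0                       # pretend only band 10 is active
    shifted = route_envelopes(detected, -1)  # formants down 1/3 octave
    band_gains = [g * t for g, t in zip(shifted, saw_tilt)]
    print(band_gains.index(max(band_gains)))
```

A positive `shift_bands` moves the formants up and a negative one moves them down, while the carrier pitch is left alone.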
Picture of the current Microchip dsPICDEM dev board:
Audio examples:
Original male speech recorded from the radio.
Vocoded with fixed-pitch sawtooth carrier. Sounds very robotic.
Vocoded, with a little of the amplitude envelope fed to the sawtooth pitch input so that the pitch sweeps up with increasing speech amplitude. Sounds a bit more natural. Presumably when we speak louder our voices also tend to go up in pitch? Okay, so maybe some people's voices go up in pitch more than others!
Raw sawtooth carrier swept more drastically in response to the speech amplitude envelope. Rawwww baby. It’s amazing how this can be carved into anything that vaguely resembles speech.
Original speech sample vocoded with the more drastically swept sawtooth carrier. Sounding more natural. There’s no proper pitch-tracking or anything going on here, just the pitch of the carrier excitation being swept in sympathy with the instantaneous speech amplitude.
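For anyone wanting to try the same trick in software, here is a minimal Python sketch of a sawtooth whose pitch follows an amplitude envelope. It is an assumption-laden illustration, not the dsPIC implementation: the envelope is taken as a ready-made list of levels in the range 0 to 1, the base pitch and sweep depth are arbitrary, and the saw is a naive (aliasing) phase accumulator.

```python
import math


def swept_sawtooth(envelope, sample_rate=24000, base_hz=100.0,
                   depth_octaves=0.5):
    """Sawtooth carrier whose pitch rises with the envelope level.

    envelope      : one level per output sample, nominally 0..1
    base_hz       : pitch when the envelope is zero (hypothetical value)
    depth_octaves : pitch rise at envelope level 1.0 (hypothetical value)
    """
    out = []
    phase = 0.0  # position within the current cycle, 0..1
    for level in envelope:
        freq = base_hz * 2.0 ** (depth_octaves * level)
        out.append(2.0 * phase - 1.0)   # naive saw, not band-limited
        phase += freq / sample_rate
        phase -= math.floor(phase)      # wrap back into 0..1
    return out


if __name__ == "__main__":
    # One second of a level that ramps up, so the pitch glides upwards.
    ramp = [n / 24000 for n in range(24000)]
    carrier = swept_sawtooth(ramp, depth_octaves=1.0)
    print(len(carrier), min(carrier), max(carrier))
```

A small `depth_octaves` corresponds to the subtle sweep of the earlier example, and a larger value to the more drastic sweep used here.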
Speech vocoded again, but this time with multiple detuned sawtooths as the carrier, each swept by the amplitude envelope. Sounds like multiple voices. All the slightly detuned saws being passed through the same filter bank voicing gives the impression of several differently pitched voices speaking in unison. You can also get a similar effect using noise as the carrier.
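A detuned-saw carrier along these lines might look like the Python sketch below. The number of saws, the detune amounts in cents, the base pitch and the sweep depth are all hypothetical values picked for illustration; the text does not say what the real ones were. The saws are naive phase accumulators, so they will alias.

```python
def detuned_saws(envelope, sample_rate=24000, base_hz=110.0,
                 depth_octaves=0.5,
                 detune_cents=(-12.0, -5.0, 0.0, 7.0, 13.0)):
    """Mix of slightly detuned sawtooths, all swept by one envelope.

    envelope     : one level per output sample, nominally 0..1
    detune_cents : offset of each saw from the common pitch (assumed values)
    """
    ratios = [2.0 ** (cents / 1200.0) for cents in detune_cents]
    # Spread the starting phases so the saws do not begin in lock-step.
    phases = [i / len(ratios) for i in range(len(ratios))]
    out = []
    for level in envelope:
        freq = base_hz * 2.0 ** (depth_octaves * level)
        total = 0.0
        for i, ratio in enumerate(ratios):
            total += 2.0 * phases[i] - 1.0
            phases[i] = (phases[i] + freq * ratio / sample_rate) % 1.0
        out.append(total / len(ratios))  # keep the mix within -1..1
    return out


if __name__ == "__main__":
    level = [0.3] * 24000  # constant envelope, one second at 24kHz
    carrier = detuned_saws(level)
    print(len(carrier), max(abs(s) for s in carrier))
```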
Same trick with more detuned sawtooths and slower attack and decay times on the vocoder. Sounds like lots of voices chanting. The slower attack and decay times make it sound like more people all chanting together but with sloppy timing. Also sounds a bit “reverby.”
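To show what the attack and decay times control, here is a minimal Python sketch of a per-band envelope detector: full-wave rectification followed by a one-pole smoother with separate attack and decay time constants. This is a common textbook arrangement and only an assumption about how such a detector could work; the actual detector in the vocoder is not described, and the default times are made up.

```python
import math


def envelope_follower(samples, sample_rate=24000,
                      attack_s=0.005, decay_s=0.050):
    """Rectify, then smooth with separate attack and decay time constants.

    attack_s / decay_s are hypothetical defaults, in seconds.
    """
    attack_coeff = math.exp(-1.0 / (attack_s * sample_rate))
    decay_coeff = math.exp(-1.0 / (decay_s * sample_rate))
    level = 0.0
    out = []
    for x in samples:
        rectified = abs(x)
        # Rising input uses the attack time, falling input the decay time.
        coeff = attack_coeff if rectified > level else decay_coeff
        level = coeff * level + (1.0 - coeff) * rectified
        out.append(level)
    return out


if __name__ == "__main__":
    burst = [1.0] * 1000 + [0.0] * 1000
    tight = envelope_follower(burst, decay_s=0.010)
    sloppy = envelope_follower(burst, decay_s=0.200)
    # The slower decay is still hanging on well after the burst has ended.
    print(round(tight[-1], 4), round(sloppy[-1], 4))
```

Longer times smear each band's level over time, which fits the sloppy, slightly reverberant character described above.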
Vocoded speech, formants shifted down 1/3 octave. Deep! Speech shifted too far down becomes hard to comprehend quite quickly unless the words are spoken quite slowly.
Vocoded speech, carrier pitched up and formants shifted up 1 octave, I think. Sounds more female, or maybe childlike.
Vocoded speech, carrier pitched up more but with less formant shift. Again sounds more childlike, or maybe chipmunk-like!
Original female speech sample from radio.
Vocoded with monosynth melody, and stereo chorus added afterwards. Similar to autotune?
Vocoded with white noise, formants shifted down. Scary! You can also get some very sinister ghostly voices by speaking slowly and using white noise excitation with slow attack/decay times to make it sound more indistinct.
Vocoded with white noise, formants shifted up. Whispery!
Original male speech sample from radio again. Getting bored of hearing about fuel price rises now!
Vocoded with synthesiser chords. Singing news reader.
Vocoded with synth chords and formant shift down. Sounds a bit more mellow.
Vocoded with synth chords and rich stereo chorus added afterwards. Just being daft now…
Vocoded with synth chords, formants shifted up 1 octave and stereo chorus.
Thanks for listening,
-Richie Burnett,