More vocoder examples

Technical data:

Processor is a Microchip dsPIC30F6012A 16-bit DSC running at 25MHz. Audio sample rate is 24kHz.

Vocoder is 20 bands (20 analysis + 20 synthesis), each 1/3 octave wide. Filters are 8th-order Linkwitz-Riley (Butterworth squared) band-pass responses with a 24dB/oct slope on each side, making 160 separate bi-quad filters in total. Lowest band is 75Hz and highest band is 7kHz, although the bands in the lowest octave do little with speech signals.

Sibilance from the modulator above 6kHz is sometimes high-pass filtered and mixed in with the modulated carrier to increase intelligibility. To compensate for the natural spectrum of the carrier signal, a +3dB/oct spectral tilt is added for classic sawtooth/square based carriers, and a -3dB/oct spectral tilt is added for a white noise carrier.

Formant shifting was done by directing the envelope detector level from each band to the gain-control element in another band. Multi-rate DSP tricks are used to allow this basic 16-bit DSP to process 160 bi-quad filters in realtime, and to mitigate state/coefficient rounding problems like limit-cycling.

No user-interface or spectral display yet!
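As a rough check on the figures in the technical data, the sketch below lays out 20 bands at 1/3 octave spacing, counts the bi-quads, and works out a per-band tilt gain. It is not the firmware running on the dsPIC. All function and constant names are made up for illustration, and it assumes that 75Hz is the centre of the lowest band and that an 8th-order filter is built from four cascaded bi-quads. Under those assumptions the count comes out at 160, and the top band is centred at roughly 6kHz, so the 7kHz figure may refer to the upper edge of that band (a guess).

```python
import math

# Figures taken from the technical data above.
NUM_BANDS = 20
BANDS_PER_OCTAVE = 3      # 1/3 octave spacing
FILTER_ORDER = 8          # 8th-order band-pass per band

# Assumption: 75 Hz is the centre frequency of the lowest band.
LOWEST_CENTRE_HZ = 75.0


def band_centres(lowest_hz=LOWEST_CENTRE_HZ, num_bands=NUM_BANDS,
                 per_octave=BANDS_PER_OCTAVE):
    """Centre frequencies spaced 1/per_octave octave apart."""
    return [lowest_hz * 2.0 ** (k / per_octave) for k in range(num_bands)]


def biquad_count(num_bands=NUM_BANDS, order=FILTER_ORDER):
    """Each bi-quad is 2nd order, and there is an analysis and a synthesis bank."""
    biquads_per_filter = order // 2
    return 2 * num_bands * biquads_per_filter


def tilt_gain_db(centre_hz, reference_hz, db_per_octave):
    """Gain for a band under a constant dB/octave spectral tilt."""
    return db_per_octave * math.log2(centre_hz / reference_hz)
```

For example, `tilt_gain_db(f, 75.0, 3.0)` would give the boost for a band centred at `f` with a sawtooth carrier, and passing `-3.0` would give the cut for a white noise carrier. Which band the tilt is referenced to is also an assumption here.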

Picture of the current Microchip dsPICDEM dev board:

Audio examples:

Original male speech recorded from the radio.

Vocoded with a fixed-pitch sawtooth carrier. Sounds very robotic.

Vocoded, with a small amount of the amplitude envelope fed to the sawtooth pitch input to sweep the pitch up with increasing speech amplitude. Sounds a bit more natural. Presumably when we speak louder our voice tends to also naturally go up in pitch? Okay, so maybe some people's voices go up in pitch more than others!

Raw sawtooth carrier swept more drastically in response to speech amplitude envelope. Rawwww baby. It’s amazing how this can be carved into anything that vaguely resembles speech.

Original speech sample vocoded with the more drastically swept sawtooth carrier. Sounding more natural. There’s no proper pitch-tracking or anything going on here. Just simply sweeping the pitch of the carrier excitation in sympathy with the instantaneous speech amplitude.
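The amplitude-to-pitch trick described above can be sketched in a few lines. This is only an illustration, not the code in the vocoder: the function name, the 100Hz base pitch and the half-octave sweep depth are all assumed values, and the envelope is taken to be already normalised to the range 0 to 1. The sawtooth is a naive phase accumulator with no band-limiting.

```python
SAMPLE_RATE = 24000.0  # audio sample rate from the technical data


def swept_sawtooth(envelope, base_hz=100.0, sweep_octaves=0.5,
                   fs=SAMPLE_RATE):
    """Naive sawtooth whose pitch rises with the envelope level.

    envelope: per-sample speech amplitude, assumed normalised to 0..1.
    base_hz and sweep_octaves are illustrative values, not measured ones.
    """
    phase = 0.0  # runs from 0 up to (but not including) 1 each cycle
    out = []
    for level in envelope:
        # Louder speech pushes the carrier pitch up, in octaves.
        freq = base_hz * 2.0 ** (sweep_octaves * level)
        out.append(2.0 * phase - 1.0)
        phase += freq / fs
        phase -= int(phase)  # wrap back into 0..1
    return out
```

A small `sweep_octaves` gives the gentle sweep of the earlier example, and a larger one gives the more drastic sweep used here.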

Speech vocoded again, but this time with multiple detuned sawtooths as the carrier, each swept by the amplitude envelope. Sounds like multiple voices. All the slightly detuned saws being passed through the same filter bank voicing gives the impression of several differently pitched voices speaking in unison. You can also get a similar effect using noise as the carrier.
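A carrier made of several detuned sawtooths might look something like the sketch below. The detune amounts in cents, the base pitch and the function names are all invented for illustration, and the envelope-driven pitch sweep is left out to keep it short. The saws are naive (not band-limited) and are simply averaged.

```python
def detuned_frequencies(base_hz, detune_cents):
    """Convert detune offsets in cents (1200 per octave) to frequencies."""
    return [base_hz * 2.0 ** (c / 1200.0) for c in detune_cents]


def detuned_saws(num_samples, base_hz=110.0,
                 detune_cents=(-12.0, -5.0, 0.0, 7.0, 13.0), fs=24000.0):
    """Average of several naive sawtooths with slightly different pitches.

    The detune values are arbitrary examples, not the ones used in the clip.
    """
    freqs = detuned_frequencies(base_hz, detune_cents)
    phases = [0.0] * len(freqs)
    out = []
    for _ in range(num_samples):
        sample = 0.0
        for i, f in enumerate(freqs):
            sample += 2.0 * phases[i] - 1.0
            phases[i] = (phases[i] + f / fs) % 1.0  # wrap into 0..1
        out.append(sample / len(freqs))
    return out
```

Because the saws drift in and out of phase with each other, the summed carrier beats slowly, which is presumably part of what makes it sound like several voices at once.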

Same trick with more detuned sawtooths and slower attack and decay times on the vocoder. Sounds like lots of voices chanting. The slower attack and decay times make it sound like more people all chanting together but with sloppy timing. Also sounds a bit “reverby.”
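To show what slower attack and decay times do to a band envelope, here is a minimal one-pole envelope follower with separate attack and decay time constants. It is a common textbook form, assumed here for illustration. The source does not say how the vocoder's envelope detectors are actually built, and the function name and time values are hypothetical.

```python
import math


def envelope_follower(samples, attack_s, decay_s, fs=24000.0):
    """Rectify the input and smooth it with separate attack/decay times."""
    attack_coeff = math.exp(-1.0 / (attack_s * fs))
    decay_coeff = math.exp(-1.0 / (decay_s * fs))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)  # full-wave rectification
        # Use the attack time while rising, the decay time while falling.
        coeff = attack_coeff if level > env else decay_coeff
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


# A 0.1 s burst followed by 0.1 s of silence, tracked fast and slow.
burst = [1.0] * 2400 + [0.0] * 2400
fast = envelope_follower(burst, attack_s=0.002, decay_s=0.02)
slow = envelope_follower(burst, attack_s=0.05, decay_s=0.2)
```

With the slow settings the envelope is still rising when the burst ends and is still well above zero 0.1 seconds later, so syllables smear into each other. That smearing is consistent with the sloppy timing and "reverby" character described above.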

Vocoded speech, formants shifted down 1/3 octave. Deep! Shifting too far down quickly makes the speech hard to comprehend unless the words are spoken quite slowly.
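Since the bands are 1/3 octave apart, the formant shifting described in the technical data (routing each band's envelope level to the gain control of another band) amounts to sliding the envelopes along the filter bank by a whole number of bands. A minimal sketch, with a hypothetical function name and with the assumption that bands left without a source are simply silenced:

```python
def shift_formants(envelopes, shift_bands):
    """Route analysis band k's envelope to synthesis band k + shift_bands.

    envelopes: one envelope level per analysis band, lowest band first.
    shift_bands: -1 shifts formants down 1/3 octave, +3 shifts up one octave.
    Bands with no source are set to zero (an assumption, not from the source).
    """
    num_bands = len(envelopes)
    gains = [0.0] * num_bands
    for k, level in enumerate(envelopes):
        target = k + shift_bands
        if 0 <= target < num_bands:
            gains[target] = level
    return gains
```

For example, `shift_formants(levels, -1)` would correspond to the 1/3 octave downward shift in this clip.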

Vocoded speech, carrier pitched up and formants shifted up 1 octave, I think. Sounds more female, or maybe childlike.

Vocoded speech, carrier pitched up more but with less formant shift. Again sounds more childlike, or maybe chipmunk-like!

Original female speech sample from radio.

Vocoded with monosynth melody, and stereo chorus added afterwards. Similar to autotune?

Vocoded with white noise, formants shifted down. Scary! You can also get some very sinister ghostly voices by speaking slowly, using white noise excitation, and slow attack/decay times to make it sound more indistinct.

Vocoded with white noise, formants shifted up. Whispery!

Original male speech sample from radio again. Getting bored of hearing about fuel price rises now!

Vocoded with synthesiser chords. Singing news reader.

Vocoded with synth chords and formant shift down. Sounds a bit more mellow.

Vocoded with synth chords and rich stereo chorus added afterwards. Just being daft now…

Vocoded with synth chords, formants shifted up 1 octave and stereo chorus.

Thanks for listening,

-Richie Burnett,