I thought that some people might be interested in how an audio drama podcast is put together. This isn't about all podcasts, but it's how I did the sound for Earbud Theater's summer 2018 double-sized episode, The Current.
Grab a good pair of headphones, or crank up some decent stereo speakers, to get the most from the sample audio. Oh, and if you haven't listened to the episode yet, you'll probably want to do that first.
It starts, of course, with the story. Jared Rivet wrote the script and directed the episode. The cast was gathered and recorded, and the selects (preferred takes) were assembled by Casey Wolfe and shipped to me, hundreds of miles north of where Earbud's primary talent base lives.
A challenge is that we are rarely able to record in an actual studio. It's usually a conference room, and that means air conditioner noise, room reflections, and audio from the various actors bleeding into each other's microphones. Sometimes the sessions can even be split across multiple rooms. When I first get the episode, it sounds like this:
My first step is always removing noise and isolating one track per character. In the past I've been very surgical about the noise removal. Now iZotope RX 6 has a Dialogue De-noise module which lets me do a really good job on huge batches of audio. (It may seem as though this is an iZotope commercial. It isn't. I'm just a very satisfied user, and their software is a vital part of my workflow.) They also have a De-Bleed module which lets me mute the sound of one actor bleeding onto the track of another.
I then drop three more iZotope plugins on each character track: De-Plosive (for when puffs of air hit the mic), Mouth De-Click, and then Alloy 2 running as a noise gate. A gate is just an automatic way of turning down the volume when there's no signal. Used judiciously it doesn't affect the sound of the performance, and gives the impression of a very low noise floor.
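For the curious, the idea behind a gate fits in a few lines of Python. This is only a rough sketch of the concept, not what Alloy 2 does internally. The function name and the threshold, release, and floor_gain values are made up for illustration.

```python
def noise_gate(samples, threshold=0.05, release=0.999, floor_gain=0.0):
    """Turn the signal down whenever its envelope drops below the threshold.

    samples    -- list of floats in the range -1.0..1.0
    threshold  -- envelope level below which the gate closes (assumed value)
    release    -- per-sample decay of the envelope, closer to 1.0 is slower
    floor_gain -- gain applied while the gate is closed (0.0 is full mute)
    """
    envelope = 0.0
    gated = []
    for x in samples:
        # Peak follower: jump up instantly with the signal, decay slowly after.
        envelope = max(abs(x), envelope * release)
        gain = 1.0 if envelope >= threshold else floor_gain
        gated.append(x * gain)
    return gated
```

The slow decay keeps the gate open through the tail of a word rather than chopping it off. A real gate also fades the gain in and out instead of switching it, which is what keeps it from clicking.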
Here is what it sounds like after that pass. I've artificially panned characters broadly across the sound stage so you can tell how well they are isolated:
Now we start the fun part: Putting this into a believable space. The first step is often adding ambience. That's built up from multiple layers. Since The Current is set on a boat on a lake at night, we cue up Earbud Theater's favorite chorus, the crickets. I added two more layers of wind (from the people who provide David Lynch with his winds), then waves lapping on a boat. Finally a recording of some walla (unintelligible background voices) for a distant party. You can hear those layers build up here:
Filtering and Pitching
This episode required five kinds of voice filtering. We had Carly's walkie-talkie, which needed to sound small and close to her. Over on the left we had the radio that the divers used to communicate. Carly had her bullhorn, and then there was the muffled sound of the divers talking through their dive masks while still on the boat. Those effects were achieved primarily through EQ, again using iZotope Alloy 2, followed by some distortion from iZotope Trash 2. In the case of the muffled mask there's also a convolver reverb called Space (more about that later) placing the voice inside a small cup. The bullhorn was Alloy 2 again, using a preset that essentially comb filtered, saturated, and heavily compressed the voice. And, finally, Carly's "internal" monologue used another Alloy 2 preset, this one meant for hip-hop background singers. Go figure. This clip has a quick example of each:
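If you want a feel for the "EQ, then distortion" recipe, here is a minimal Python sketch of a small-speaker voice: throw away the lows and highs, then saturate what is left. It is a toy under my own assumptions, not the Alloy 2 or Trash 2 processing. The function names, cutoff frequencies, and drive amount are all hypothetical.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Very gentle low-pass filter: smooths away content above cutoff_hz."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """High-pass built by subtracting the low-passed signal from the input."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(samples, low)]

def small_speaker(samples, sample_rate=48000,
                  low_cut=400.0, high_cut=3000.0, drive=4.0):
    """Band-limit the voice, then soft-clip it with tanh for some grit."""
    band = one_pole_highpass(samples, low_cut, sample_rate)
    band = one_pole_lowpass(band, high_cut, sample_rate)
    return [math.tanh(drive * x) for x in band]
```

Narrowing the band makes the voice sound smaller, and raising drive makes it sound more overloaded. Real EQs use much steeper filters than these one-pole ones.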
Our cast was all adults, and we needed some voices to sound a little younger, so they were tweaked slightly using the Polyvox module of iZotope VocalSynth 2 to pitch up their formants. The effect is subtle. Take a listen to Travis without Polyvox, followed by Travis with Polyvox:
Creating The Current's Voice
While the malevolent force at the bottom of the lake isn't articulate, I felt it important to give it a subtle vocal quality. The base sound was a combination of some submarine ambience and a low-pitched slow pumping machine. It sounds OK on its own:
Are we happy with that? No. We are not happy. Fortunately we have VocalSynth 2, which boasts a Talk Box module. Readers of a certain age will remember Peter Frampton making his guitar talk with one. This is a digital simulation. All I needed was a vocal performance, so I recorded myself:
Ooh! Who else smells an Emmy? Anyway, let's use that to modulate the raw current effect using the Talk Box:
Subtle, but more organic and less mechanical than the raw sounds.
Naturally we can't skip the step of adding sound effects. For this episode I was able to find just about everything I needed in my sound effects library. It's a matter of locating something that sounds right, often layering multiple effects together, to get the desired result. For the most part each effect gets its own track. Pretty soon the Pro Tools session looks like this:
Some of the footsteps and bumping around were performed Foley using the Edward Ultimate Suite from Tovusound. It allows using a MIDI keyboard to perform right to picture (as when working on a film) or to "sonic picture", as in this case. As effects are added the dialogue timing is adjusted to fit.
Remember I mentioned that convolver reverb? I used two of them on this show: Space, and Waves IR-1. They both function in a similar way, and have totally transformed how I work. It turns out that you can record an IR, or "impulse response," of a space and use math to make any source sound like it's getting the reverb of that space. Lucky for me, computers are really fast at math.
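The math in question is convolution: every sample of the dry sound triggers its own scaled copy of the impulse response, and all those copies get added together. Here is a bare-bones Python sketch of that idea. The function name is mine, and this is the slow textbook version, not how Space or IR-1 are implemented. Real convolvers get the same result much faster with FFTs.

```python
def convolve(dry, impulse_response):
    """Apply an impulse response to a dry signal by direct convolution.

    Both arguments are lists of float samples at the same sample rate.
    The result is longer than the input because the reverb tail rings on.
    """
    wet = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        if x == 0.0:
            continue  # silence contributes nothing, so skip the inner loop
        for j, h in enumerate(impulse_response):
            # Each input sample launches a copy of the IR scaled by that sample.
            wet[i + j] += x * h
    return wet
```

Feed it a single click and you get the impulse response straight back, which is exactly why recording a sharp click in a room captures that room's reverb.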
Sharp-eared listeners will already have noticed that even the filtering samples above had a sense of space to them. They were being convolved with an IR sampled at a location ominously called Divorce Beach. And the dive masks were, as I said, the inside of a glass. In episodes where the action moves from one place to another I have several of these reverb returns set up, one for each environment, and automate the sending of audio to the appropriate one. That's how we can get out of a car, go outdoors, and then into a house and have you believe that you're really in those spaces. I did a lot of that for our 5-part serial, After the Haunting. This time it was all outdoors on a boat, except for Carly's watery demise. Compare and contrast:
Parallel with all of this is the addition of music. For this episode the entire score was written for us by Brandon Moore. It really kicks the episode up to a new level. Check out a scene without, then with music.
You can listen to Brandon's entire score for the episode on his website. Check it out. It's awesome.
By this time the director has heard a lot of IP (in progress) versions of the episode and is getting weary of my emails. The goal here is to get all the right sounds in all the right places. I usually start panning the characters and sound effects to roughly their proper spatial locations. That's to make close things sound close and distant things sound distant, and to present a stereo sound stage where you can localize everything that's happening. But we're not done yet.
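Placing a mono sound somewhere between the left and right speakers comes down to two gains. One common way to pick them is a constant-power pan law, sketched below in Python. This is a generic textbook approach with a function name of my own choosing. I'm not claiming it is the exact pan law Pro Tools applies.

```python
import math

def constant_power_pan(sample, pan):
    """Split a mono sample into (left, right) with a constant-power pan law.

    pan runs from -1.0 (hard left) through 0.0 (center) to +1.0 (hard right).
    """
    pan = max(-1.0, min(1.0, pan))        # keep the pan position in range
    angle = (pan + 1.0) * math.pi / 4.0   # map -1..+1 onto 0..90 degrees
    return sample * math.cos(angle), sample * math.sin(angle)
```

Using cosine and sine keeps the combined power of the two channels the same wherever the sound sits, so a character doesn't seem to get quieter while crossing the center of the stage.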
At this point we know how long the episode is. We know what the dialogue, effects, and music are. But it still doesn't sound right. It's kind of a mess, actually. That's when I shift gears and reconfigure my studio (read "carry the computer downstairs") for the mix. All of the final decisions are made listening to my reference system, which is built around Wilson Audio Specialties Sasha Series 1 speakers.
We won't go deep down that rabbit hole. Suffice it to say that if you want the best speakers possible just get the Wilson Audio Specialties loudspeakers that fit your budget.
At long last, after a little mix love, we get our final product.
That's a Wrap
I hope this glimpse behind the curtain gives you a better idea of what your favorite podcasts are doing to bring you a compelling and rich sonic environment. It's a lot of fun.
Who knows? Maybe now you'll want to re-listen to the whole episode and see if you can pick apart all the elements. Maybe drink in my moving performance as The Current.
Or just scare yourself again.
If I left you a little confused about anything, leave a comment and I'll try to confuse you more thoroughly.