YMMV: Write for Audio

📻 Girls, rock your boys, get wild, wild, wild...

I've blogged before about how, in art, form follows format. If I may quote myself:

"...there's a reason hour-long television shows used to have a 4-act structure with A and B plots--they were built around commercial breaks and you needed things to cut between. Then, with the rise of prestige television and on-demand streaming, that formula largely disappeared and was replaced with serialized cinematic storytelling because that's more bingeable."

This holds true across media, and right now fiction is cross-format, existing in print, digital print, and audio. The preferred format is still print, but I have a feeling that this is going to change in a few years and that we should all...

Write for Audio

Let me first admit my bias. The physical act of reading has always been a bit of a struggle for me (hellooooooo ADHD), so I consume most of the books I read as audio. And there are certain things that books do when translated to audio that bug me. Scene breaks don't translate cleanly at all, for example. There's an extra long gap, but since everyone listens to everything at double-speed anyway, that space can get lost, and suddenly you've changed perspective character without it being explicitly flagged.

Books that have a lot of art or accompanying PDFs irritate me as well, because I feel like I'm being excluded. I was listening to Brandon Sanderson's Yumi and the Nightmare Painter and periodically the narrative would stop to describe the accompanying picture! That's just a bad user experience. Celebrity memoirs are particularly bad about this. You want to listen to the audiobook because it's read by the author and the author is frequently a comedian, but then there are always accompanying childhood photos that you feel like you're missing out on. Ben Folds' memoir included piano pieces in the audiobook and personal photos in the print edition. Which version am I supposed to get, Ben!?

And don't even get me started on footnotes. Nope, it's too late, I'm starting. I hate them. I hate them in academic pieces, I hate when Terry Pratchett uses them, I hate hate hate them. I compulsively feel like I have to read them, but they always break the flow of the narrative, and I never feel like the effect is worth it. I'm a huge Eddie Izzard fan--I've seen her live three times now and can probably recite Dressed to Kill from memory ("in heels, as well")--but couldn't finish the ebook of her memoir because of all the gull-durned footnotes. And there's no consistent way to handle them in audio. They're frequently dropped, but sometimes they're clumsily inserted, and sometimes they're just... alluded to? I was trying to listen to Infinite Jest and the narrator announced that the footnotes were a huge part of the story, but there wasn't a good way to include them, so whenever one came up there would be a beep to let you know that one was there so you could look it up while following along in a print edition.

The f**k? I'm not going to follow along in a print edition. I'm driving!

Emily St. John Mandel's Station Eleven has graphic novel pages that aren't even alluded to in the audiobook, so it just feels incomplete. But I think the absolute worst was Edgar Cantero's Meddling Kids, which has whole passages that are script pages and are read verbatim--with every attribution given before every line of dialog.

READER

That sounds tedious.

It's even worse than you're imagining.

READER

How?

Just read this out loud, including the names.

READER

Oh goodness...

Yep.

READER

And there's no blocking or--

Nope.

READER

And it goes on for--

Pages.

READER

Oh no. Oh no no no. This is awful. Dear God! Let me out of here. HAAAALP!

You get the idea...

None of these were written with audio in mind. Much like the electronic edition, the audio edition is something of an afterthought, not something that's baked into the work from the get-go. And I am a firm believer in the power of intentionality. That means thinking about how your work is going to be presented in audio while you're writing it, and it means understanding what changes when a word leaves the page and enters your ears.

So what is it that changes?

First off, punctuation gets very fuzzy. It's not going to be obvious where paragraph breaks are, the difference between an em-dash and a parenthetical aside, or the difference between a period and a semi-colon. Type-faces too. Italics will cue the audiobook reader about what things to emphasize, but the audiobook reader is going to emphasize a lot of things that aren't italicized. This means that your main tool is going to be word choice. So it helps to go through a draft and look at places where you have relied on punctuation or italics to help convey meaning, and see if you can do that with prose alone.
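If you'd rather not hunt for these by eye, a few lines of Python can do the first pass. This is only a rough sketch under my own assumptions: the function name flag_visual_cues and the pattern list are made up for illustration, and it assumes your draft is plain text with Markdown-style *italics* and double hyphens for dashes. Adjust it to however your manuscript is actually formatted.

```python
import re

# Hypothetical patterns for cues a listener can't see (my own picks, not a standard).
PATTERNS = {
    "dash": re.compile(r"--|\u2014"),
    "semicolon": re.compile(r";"),
    "parenthetical": re.compile(r"\([^)]*\)"),
    "italics": re.compile(r"\*[^*\n]+\*"),  # assumes Markdown-style *italics*
}


def flag_visual_cues(draft):
    """Return (line_number, cue_name, line_text) for each line leaning on a visual cue."""
    hits = []
    for number, line in enumerate(draft.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "She ran.\nHe *never* called; she waited."
    for number, name, text in flag_visual_cues(sample):
        print(f"line {number}: {name}: {text}")
```

It won't tell you how to fix anything. It just hands you a list of lines to read aloud and reconsider.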

By extension, it means your sentences need to have a digestible rhythm to them. Your paragraphs need to have very clear entrances and exits. Because written word shouldn't be conversational and doesn't even need to feel conversational, but spoken word necessarily must, and that means striking the balance between the two. And that means having a lot of care and deliberation in your editing, which we will talk about in a future post.

Long exchanges of pure dialog get tedious in audio. The dialog tags "asked" and "said" are largely invisible on the page, but when read aloud they start to stand out. So when writing with audio in mind, it becomes more important to intersperse dialog with other action, other character business, because this allows you to do more indirect tagging. You do have to be careful, because there's no longer a visual signifier that delineates dialog from narrative, but a good audiobook reader will voice them differently anyway, so it's not a huge deal. What becomes trickier is differentiating between literal thoughts and literal speech. Consider the following paragraph:

Karen put her key in the ignition. Why was she doing this? "Let's see what this baby can do," she said. Here goes nothing!

We have a line of narration followed by a line of free indirect speech followed by a line of literal speech followed by a line of literal thought. How do you differentiate those? There is not a standard way to do this in audio. So it becomes incumbent on you, the author, to not stack them so close together. They're all useful tools to have in your tool box, so I'm not advocating eschewing any of them, but you do have to be deliberate in how you apply them.

In fact, it's a good idea to err on the side of clarity anyway. You can over-explain things a little. It's a lot easier to lose the thread of the story in an audiobook because it keeps playing when you get distracted, and you can't just jump up and re-read a paragraph like you can in print. But if you need to re-read because you can't quite parse what you just heard, that's just hella annoying.

You also need to keep chapters a consumable length. When you're listening to an audiobook, it's nice to be able to stop listening at a chapter break. Since people listen to these things on commutes or while running errands, having a chapter that takes ten or fifteen minutes to consume is a pretty reasonable length. This is another reason I advocate for 3,000 word chapters. That's about twenty minutes of reading time, but remember that nobody listens to these things at regular speed. Longer chapters are, of course, fine depending on genre expectations, but it is useful to be consistent from chapter to chapter. A listener should be able to start another chapter with a rough idea of how long it will take to finish. This is another reason I advocate writing to a chosen word count.
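The arithmetic behind those numbers is simple enough to sketch. This is a back-of-the-envelope estimate, not a rule: the function name listening_minutes is made up for illustration, and the default of 150 words per minute is an assumption I'm backing out of the "3,000 words is about twenty minutes" figure. Real narrators vary.

```python
def listening_minutes(word_count, words_per_minute=150, playback_speed=1.0):
    """Estimate how long a chapter takes to hear, in minutes.

    words_per_minute is an assumed narration pace; playback_speed is the
    listener's speed setting (2.0 means double speed).
    """
    return word_count / (words_per_minute * playback_speed)


if __name__ == "__main__":
    chapter_words = 3000
    print(listening_minutes(chapter_words))                      # 20.0
    print(listening_minutes(chapter_words, playback_speed=2.0))  # 10.0
```

At double speed, a 3,000-word chapter lands right at the ten-minute end of that commute-friendly range.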

To give an extreme example of this being done badly, the chapter The Last Battle from the final book in the Wheel of Time sequence is over 50,000 words long. Once more for the kids in the cheap seats: This chapter is over fifty thousand words. It is broken up into many, many individual scenes. It's really cool on the page, because that length is making a statement. But it borders on word salad in audio--not because it's badly written, but because you just kind of get lost in it. You cannot consume that in one sitting, and there are not obvious places to stop, so you lose the arc of the chapter. Because it's not a chapter; it's a freaking novella.

Character names and spellings should be kept fairly intuitive. They don't need to be absolutely basic, but the last thing you want is for someone to hear a name and read it and not realize that those are the same. Visual aids should be non-essential. By all means include them. They're cool, but someone should be able to comprehend your epic fantasy without having to consult the maps at the beginning. And cross-references like footnotes, endnotes, and appendices should also be kept non-essential.

Now, exceptions abound to everything. Short fiction does get podcasted a lot, but flash fiction doesn't. Given that flash fiction is such a great place for experimental storytelling, you can pretty much ignore all of this advice in that case. Also, feel free to be experimental in longer form. Do not let me talk you out of it. Sometimes you have a really awesome idea that only works on paper and, you know what, just shine on you crazy diamond. Go to town! Or perhaps you're working in a medium where audio just isn't very likely to ever be an issue. I use italics liberally in my blog because it will never be anything other than a blog.

Because I have been informed that if I ever start a podcast I will be putting my marriage in jeopardy. Which is fair.

So, I've talked a lot about how, but I haven't really gone into why I think this is important. I've talked about why it's important to me personally, but not why I think it's a good practice in general. And the short answer is... it's just a hunch.

Given the current state of GenAI tools and their ability to create believable human speech, I think in the next ten years audiobooks are going to become far cheaper to produce and less expensive to purchase. Hell, if you have a library card, you can already download many, many titles to your phone, but this is not something people necessarily think about when they hear the word "audiobook" or "library." But it's just a matter of time before someone assembles a pool of voices to train an audio language model on with enough fidelity to be commercially viable. You give it a script or a recording of the author reading their own work, and it gives you back a recording. The tech isn't there yet, but it will be sooner rather than later. This reduces the barriers to entry for an indie publisher immensely, especially if the burden of proofreading... er... proof-listening... is shifted to the creator.

And you can bet your ass the big houses will be getting in on this too. It would not surprise me in the least if I heard that Tor was doing this already, training up voice models from their many, many hours of Michael Kramer recordings with an aim of eventually licensing that voice from his estate. Which is a hella cynical thing to say, but it's not like they haven't been caught recently using AI-generated book covers.

I also think that there will be a shift away from prioritizing physical media in general. A lot of people prefer physical books--I'm married to one of those people--so they will never ever go away, but the last five years have taught us that complex supply chains are brittle and paper can get expensive. I think the big publishers are going to start thinking about digital presence a lot more aggressively than they do right now, and it's going to be focused on audio because the ebook markets are being overrun with bots. This hasn't happened with audio yet. Yet. Look, the industry is a garbage fire right now, but eventually some new blood is going to come in and the landscape is going to shift, and I honestly believe that this shift will include pivoting towards cheap and accessible audiobooks and more of a presence in the podcast-sphere.

But, who knows, I've been wrong before.

Next week, I'm taking the week off. I'm burning through my material faster than I'd expected and I need to re-jigger my schedule. Also, these posts skew much longer than my normal blogging and I have things to do. Helldivers 2 ain't gonna play itself.

After next week, we're going to start looking at how we edit...

In YOUR MILEAGE MAY VARY, Kurt is outlining some of the more unusual bits of authorial wisdom he's amassed over the years.

Comments

ConFigures said…

Contrarily, I hardly ever listen at doublespeed, or anything but as recorded (though I'll hit the skip 30 seconds ahead liberally). But even I get annoyed at audiobooks that handle breaks badly, and at podcasts/audio dramas that jump into commercials without a warning ding or any sort of audio cue.

April 14, 2024 at 5:24 PM

Kurt Pankau Dot Com
