“a lot of research into things that have very little meaning”

I usually have captions on for most videos/streaming, although I tend to turn them off on YouTube – because most of them are awful. Even major video producers seem to use automated captions that have barely been cleaned up. I can hear well enough that it’s easy to tell that they’re wildly inaccurate, they’re poorly formatted, and it’s often impossible to tell who’s talking on a video with more than one person.

But over the last few months, I’ve been pleasantly surprised by the folks at Polygon. I’m a long-time fan of the Monster Factory series, for which the captions have been somewhere between mediocre and competent.

In the last year or so, Polygon has upped their game, and I think it’s worth pointing out as an example of YouTube captioning done very well.

In addition to being clear, readable, and accurate, these captions go above and beyond to deliver a great experience for the caption-reading audience. They get the basics right and make captioning not just bearable but enjoyable for those of us who use them regularly.

Screenshots [1] below are all from Unraveled, which is an ongoing series of video game explainers. It’s one of the smartest, funniest things on the internet right now in any format, and the captions are a great part of that. These exemplify the work that Polygon is doing with captioning.

Actually using quotation marks where they make sense:

[image description: Brian David Gilbert, host of Unraveled, a younger white man in a suit and glasses (hereafter BDG), stands in front of a board with little pieces of paper, overlaid with a graphic of a basketball. Caption reads So we have “Adult Link,” “Child Link,” “Basketball.”]

Clearly indicating who is speaking:

[image description: Tweet contains four images, each is a split screen of what appears to be a Skype call with BDG on the right and his mother on the left. Each image is captioned with a line of dialogue from one or the other, starting with Mom – “I can’t tell you how, what an exciting time this is for me.”]

Including the fumbling of normal human speech:

[image description: BDG standing in front of a whiteboard with what looks like a complex formula. Caption reads I actually have… um, I’m a little confused now.]

Describing music:

[image description: Split screen of headshots, BDG on the right, his mother on the left. Caption reads Mom – *sings the entire Tetris song*]

Showing changes in speech volume (in the time-honored format of ALL CAPS IS SHOUTING):

[image description: two images, both of BDG standing in front of a board with many little pieces of paper with the only readable words being “TENETS OF THE SONIC BIBLE”. Captions read THEN EITHER SONIC IS A GOD/OR COULD KILL GOD]

Making important points important (in this case, there’s also some verbal emphasis, so it’s both indicating something important in the content and the sound that goes with that emphasis):

[image description: BDG (now with mustache), stands in front of a white board with lots of writing, headed with The 10 Keys. Caption reads BONUS POINTS for horny Todd.]

Not just describing a sound, but giving it emotional context:

[image description: three images of BDG in the same setting as the previous image, but close up, with expressions of increasingly uncomfortable laughter. Captions read *frightening laughter*/*pained laughter*/*pain*]

And then, sometimes, just commentary on the action. Is this overkill? Maybe. Does it increase my enjoyment of the experience, as someone who uses captions regularly? Absolutely.

[image description: BDG in a convention hall, sitting on the floor with his hair over his eyes. Caption reads *sits sadly*]

[image description: BDG holds up two spiral-bound texts labeled EVERY BOOK IN SKYRIM VOLUME ONE and EVERY BOOK IN SKYRIM VOLUME TWO. Caption reads *pained grin*]

[image description: red-tinted translucent BDG with murderboard overlaid onto image of The Legend of Zelda chess game. Caption reads *exasperated look at the camera*]

[image description: BDG standing in front of a board covered with small pieces of paper, some are renderings of video game characters, others have military ranks written on them. Caption reads *realizing an unfortunate truth*]

So: yay Polygon! Nice work! Unfortunately, it’s not quite perfect. My main gripe: captions overlaying text on the screen. Videos in this series often include extra textual information, like this Dark Souls Boss Title Generator:

[image description: rules for a Dark Souls Boss Title Generator using the first letter of your last name and the day you were born. No captions. (and no, I’m not typing out the whole thing. my apologies)]

I’ve “had” to watch them multiple times to be able to see the video with captioning, as well as the text covered by that captioning.

Occasionally these are the same thing: when someone is talking offscreen, there’s often text on the screen that’s both the same as the captions, and overlapping the captions. A true screenshot of this in context would have the caption overlapping the “Pat (offscreen)” text, unfortunately!

[image description: two images of BDG standing in front of a board with a grid of strings with a listing of military ranks down the left side. The first image has open-captioned text reading Pat (offscreen): Look at this Galoomba!. The second has closed-captions reading Brian: LOOK AT THIS GALOOMBA!]

Honestly, I think the solution should probably come from YouTube, with an option for a captions bar to appear below the video. In the meantime, it would be great if video creators took captioning into account in creating those visuals. (A lot like how postcards and magazines are designed with areas set aside for address labels and post office stickers, perhaps.)

And these aren’t just features of a single Polygon series! Below is a screenshot [2] from a video about what makes a good Star Wars game, and it has both great captioning, showing that the presenter is making a very specific noise, and not so great, because the captions overlay a bit of text in the video:

Screenshot of a video with captions reading Does it have the (blaster noises) overlaying text that reads 2. PEW PEW Does it sound like - additional text is covered.

But in comparison to other YouTube video creators, the folks at Polygon are doing a truly admirable job in making accessible work that is both competent and enjoyable. I hope others follow in their footsteps. [3]


Also published on Medium.

Notes

1. Unraveled screenshots are via the Unraveled Out of Context Twitter account, just so I didn’t have to spend four hours taking a few dozen screenshots. Thanks, I GUESS, to Dylan for noting that the tweets from @nocontextunrav didn’t have alt text, which meant that I ended up writing descriptions of images and realized that 85% of Unraveled is just Brian David Gilbert standing in front of a board with little pieces of paper on it. Or a whiteboard.
2. I actually took this screenshot myself, so I can use real alt text.
3. Bon Appétit, I’m looking at you.

Author: Elaine Nelson

Elaine Nelson was directionless with an English degree in the late 90s and then: GODDAMN INTERNET. In her current gig, she wrangles content and content management systems, but her last job was Webmaster, so she's dabbled in all sorts of web work. She's an editor at The Interconnected, previously published in The Pastry Box, and once had a poem published in an anthology of GenX writing, when that was the big new thing.