Summary: You can turn an audio clip into a video by adding a static image, then upload it to YouTube (privately), and YouTube will add subtitles / closed captions automatically. Download the subtitle file (which is time-stamped) and you have a transcription of your interview. You can use a Python script to remove the timestamps.
[Stop press: here’s a 2014 post, ‘Dirty, Fast, and Free Audio Transcription with YouTube’, that recommends using TunesToTube to upload an mp3 file to YouTube directly, without needing to make a video. The subtitle quality seems to have been poorer in 2014 and looks like it has improved since. Back then the subtitle file also didn’t seem to include timestamps, though.]
This is more of a suggestion than a solution. I’ve not tried it in anger, and there may be a trade-off between saving you effort and creating more work for you: while cutting out the timestamps looks like it can be automated, you still have to correct the text and knit it back together. But I thought it was worth highlighting the possibility.
I’ve been impressed with Google’s subtitling capabilities, both live during Google Meet events and on uploaded YouTube videos. As part of my job I’ve been editing videos (using iMovie, which comes free with Macs) and uploading them to YouTube. Each video is about an hour long, and shortly after I upload one, YouTube adds subtitles automatically. They’re fairly accurate.
You can download the entire set of subtitles as a file, and you end up with something that looks like this. The text below comes from a video I edited of my boss giving a Zoom presentation to computing teachers on making a Turing Machine out of chocolate.
Image 1: Above is a screenshot of a resulting subtitle file divided into ‘paragraphs’ each containing the equivalent of approximately five seconds of speech, with a timestamp above the text.
When the file is used for subtitling, each timestamp maps the snippet of text below it to an on-screen subtitle. Each block covers about five seconds, so imagine how many blocks you’d have if you interviewed someone for an hour or more: 60 minutes × 12 blocks per minute = 720 blocks. (I tested it on a friend’s recording and it stretched to 48 pages of a Word doc!)
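To show how those blocks hang together, here is a minimal Python sketch that reads an .sbv file into (start, end, text) tuples. It assumes the layout shown in Image 1: a `start,end` timestamp line in H:MM:SS.mmm form, one or more lines of text, then a blank line. The names `parse_sbv` and `to_seconds` are my own, purely illustrative, and not from any library.

```python
import re

# Assumed SBV block header: start and end times as H:MM:SS.mmm separated by a comma,
# e.g. "0:00:04.160,0:00:08.320".
HEADER = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})")


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields into seconds (a float)."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_sbv(text):
    """Return a list of (start_seconds, end_seconds, caption) tuples, one per block."""
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        match = HEADER.fullmatch(lines[0].strip()) if lines else None
        if match is None:
            continue  # ignore anything that isn't a timestamped block
        fields = match.groups()
        start = to_seconds(*fields[:4])
        end = to_seconds(*fields[4:])
        # A block's text may wrap over several lines; join them with spaces.
        caption = " ".join(line.strip() for line in lines[1:] if line.strip())
        cues.append((start, end, caption))
    return cues
```

For example, `len(parse_sbv(open("interview.sbv", encoding="utf-8").read()))` would count the blocks in a (hypothetical) `interview.sbv`; at roughly five seconds per block, an hour of speech should come out somewhere near 720.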
So at the end of this process you’ll have an enormous .sbv file (it opens in TextEdit on a Mac, and I’d assume Notepad will open it too). Producing a text that contains just the speech means a lot of timestamp wrangling. If your interview is long, it may seem less onerous to write the transcription afresh than to fight with several hundred timestamps, deleting and backspacing to clean it up.
Try automating the removal of timestamps
You can partially automate this by using a copy of this Python script. I’ve not tried this bit, but it seems you just upload a copy of your file (though I haven’t worked out how!) and it does the rest. I say a copy of your file because you’ll probably want to keep a version that still has the timestamps in, particularly if you’re coding the interview and need to know when something was said.
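I haven’t reproduced the linked script here, but as an alternative, here is a minimal sketch of the same idea in Python. It assumes each timestamp sits on its own line in the `0:00:04.160,0:00:08.320` form shown in Image 1. It writes the cleaned text to a new `.txt` file next to the original, so the timestamped `.sbv` is left untouched. The function names and the `strip_sbv.py` file name are hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed SBV timestamp line: "start,end" in H:MM:SS.mmm form, e.g. "0:00:04.160,0:00:08.320".
TIMESTAMP_LINE = re.compile(r"^\s*\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}\s*$")


def strip_timestamps(sbv_text):
    """Drop timestamp lines and blank lines, joining the captions into one run of text."""
    snippets = [
        line.strip()
        for line in sbv_text.splitlines()
        if line.strip() and not TIMESTAMP_LINE.match(line)
    ]
    return " ".join(snippets)


def make_plain_copy(sbv_path):
    """Write <name>.txt beside the original .sbv and return its path; the .sbv is not modified."""
    source = Path(sbv_path)
    target = source.with_suffix(".txt")
    target.write_text(strip_timestamps(source.read_text(encoding="utf-8")), encoding="utf-8")
    return target


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python strip_sbv.py interview.sbv  ->  writes interview.txt
    print(make_plain_copy(sys.argv[1]))
```

The output is one long paragraph, so you’d still need to correct the text and break it up yourself.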
These are the results of a Google search for ‘.sbv remove timestamps’, which show other solutions.
So from a technical point of view this is straightforward, but its success will depend on how well Google can transcribe the speech, and how well the resulting output file (a mass of text) suits your needs.
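If one unbroken mass of text is hard to work with, the timestamps might help before you throw them away: a gap between one block ending and the next one starting may mark a pause in speech. The sketch below starts a new paragraph whenever that gap is at least `pause` seconds. This is my own heuristic, not something YouTube provides; I haven’t checked how often the automatic captions actually leave gaps, so the 2-second default is a guess you’d need to tune. The function name is hypothetical.

```python
import re

# Assumed SBV block header: "start,end" times in H:MM:SS.mmm form.
HEADER = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})")


def seconds(hours, minutes, secs, millis):
    """Convert captured timestamp fields to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(secs) + int(millis) / 1000


def paragraphs_by_pause(sbv_text, pause=2.0):
    """Join captions into paragraphs, starting a new one when the silence between
    one block's end time and the next block's start time is at least `pause` seconds."""
    paragraphs, current, last_end = [], [], None
    for block in re.split(r"\n\s*\n", sbv_text.strip()):
        lines = block.splitlines()
        match = HEADER.fullmatch(lines[0].strip()) if lines else None
        if match is None:
            continue  # skip anything that isn't a timestamped block
        fields = match.groups()
        start, end = seconds(*fields[:4]), seconds(*fields[4:])
        # `current` is only non-empty after a block has set `last_end`.
        if current and start - last_end >= pause:
            paragraphs.append(" ".join(current))
            current = []
        current.extend(line.strip() for line in lines[1:] if line.strip())
        last_end = end
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Overlapping blocks give a negative gap and so never trigger a break, which seems like the safer failure mode.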
How to do this: an example in iMovie
- First, create your video
- Import your audio file into a new project. It may be worth testing the system first with a short (2-minute) clip to see how well this will work for you.
- Import any image you like, then add it above your audio and ‘stretch’ it so that it lasts until the end of the audio clip. This creates a video that contains a single image and your audio track.
- ‘Share’ it as a file
- Add it to YouTube
- Upload as a private or unlisted video
- Download the video’s subtitles. Go to https://studio.youtube.com/ » Click Videos » Click on the relevant video to bring up the ‘Video details’ page » Click Subtitles in the menu on the left
- (See image below) Hover over the subtitle record (it will likely say “English (Automatic)”) until the three dots appear on the right, then click them and choose Download. You can also choose ‘Edit on Classic Studio’, which lets you correct the subtitles as you ‘watch’ (listen to) your video before downloading. Playback pauses while you type your corrections.
- You’ll end up with an .sbv file
- Make a copy (it may be useful to have a copy that still has timestamps)
- Either remove the timestamps manually if it’s a short clip, or try something automated; see ‘Try automating the removal of timestamps’ above for links.
Image 2: A screenshot showing two views of the same subtitles panel on YouTube. The upper view indicates the three dots (at the right) that need to be clicked to bring up the options; the lower view shows what the options are once clicked, including Download.
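The timestamped copy is what lets you find when something was said, which matters if you’re coding the interview. Here is a minimal Python sketch of a search that returns the start time of every block containing a phrase. It assumes the same .sbv layout as Image 1, and the function name is hypothetical. A phrase that happens to be split across two blocks won’t be found, so short search terms work best.

```python
import re

# Assumed SBV block header; group 1 captures the start time without milliseconds.
HEADER = re.compile(r"(\d+:\d{2}:\d{2})\.\d{3},\d+:\d{2}:\d{2}\.\d{3}")


def find_mentions(sbv_text, phrase):
    """Return (start_time, caption) pairs for every block whose text contains
    `phrase`, ignoring case. start_time is the H:MM:SS part of the header."""
    hits = []
    for block in re.split(r"\n\s*\n", sbv_text.strip()):
        lines = block.splitlines()
        match = HEADER.fullmatch(lines[0].strip()) if lines else None
        if match is None:
            continue  # not a timestamped block
        caption = " ".join(line.strip() for line in lines[1:] if line.strip())
        if phrase.lower() in caption.lower():
            hits.append((match.group(1), caption))
    return hits
```

For example, searching the chocolate Turing Machine talk for "chocolate" would list each block mentioning it alongside a time you can jump to in the video.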
Featured image in blog post header: the iMovie editing window split in two, with the image file above (I’ve used a copy of a test card from Wikipedia; the image was made by Zacabeb) and the audio clip below it.
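If you don’t have iMovie, the same image-plus-audio video can probably be made with the free command-line tool ffmpeg instead. This is a swapped-in alternative that I haven’t used for this workflow, so treat it as a sketch. The Python below builds and runs a commonly used ffmpeg command that loops a single still image for the length of an audio file. The function names and file names are hypothetical.

```python
import shutil
import subprocess


def still_image_video_command(image, audio, output):
    """Build an ffmpeg command that shows one still image for the length of the audio."""
    return [
        "ffmpeg",
        "-loop", "1",                 # repeat the single image as a continuous video stream
        "-i", image,
        "-i", audio,
        "-c:v", "libx264",
        "-tune", "stillimage",        # x264 tuning intended for static pictures
        "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",  # yuv420p needs an even width and height
        "-pix_fmt", "yuv420p",        # widely supported pixel format
        "-c:a", "aac",
        "-shortest",                  # stop when the shorter input (the audio) ends
        output,
    ]


def make_video(image, audio, output):
    """Run the command, failing loudly if ffmpeg is missing or reports an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on the PATH")
    subprocess.run(still_image_video_command(image, audio, output), check=True)
```

Usage would be something like `make_video("test-card.png", "interview.mp3", "interview.mp4")`. Keeping the command-building separate means you can print it and check it before anything runs.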
For people with Android phones, this may be of interest