Getting Google / YouTube to do your audio transcription for you – up to a point

Summary: You can create a video from an audio clip by adding a static image, upload it to YouTube (privately) and it will add subtitles / closed captions automatically. Download the subtitle file (which is time-stamped) and you have an audio transcription of your interview. You can use a Python script to remove the timestamps.

[Stop press: here’s a post from 2014, ‘Dirty, Fast, and Free Audio Transcription with YouTube’, that recommends using TunesToTube to upload an mp3 file to YouTube directly without needing to make a video. It seems that in 2014 the subtitle quality was poorer; it looks like it has improved since. In 2014 the subtitle file also didn’t seem to include timestamps.]

This is more of a suggestion than a solution. I’ve not tried it in anger and there may be a balance between saving you effort and creating more work for you (while it looks like cutting out timestamps can be automated you still have to correct the text and knit it back together) – but I thought it was worth highlighting the possibility.

I’ve been impressed with Google’s subtitling capabilities – both live during Google Meet events and on uploaded YouTube videos. As part of my job I’ve been editing videos (using free iMovie, bundled with Macs) and uploading them to YouTube. Each video is about an hour long and not long after uploading it to YouTube subtitles are automatically added and they’re fairly accurate.

You can download a copy of the entire subtitles and you end up with something that looks like this. The text below comes from a video I edited of my boss giving a Zoom presentation to computing teachers on making a Turing Machine, out of chocolate.

Eb20rlyXkAIRzBX.png
Image 1: Above is a screenshot of a resulting subtitle file divided into ‘paragraphs’ each containing the equivalent of approximately five seconds of speech, with a timestamp above the text.

If used as a subtitling file the timestamp would map the snippet of text below it to an on-screen subtitle. Each block is about five seconds. Imagine how many blocks you’d have if you’ve interviewed someone for an hour or more (I tested it on a friend’s and it stretched to 48 pages of a Word doc!) – it’s 60 mins x 12 [lots of 5 second block] = 720 blocks.

So at the end of this process you’ll end up with an enormous .sbv file (which opens with TextEdit on a Mac and I’d assume Notepad will also open it). There will be a lot of timestamp wrangling if you need to produce a text containing just the speech parts. If your interview is rather long it may seem less onerous to write the transcription afresh than fight with several hundred timestamps, deleting and back-space-ing to clean it up.

Try automating the removal of timestamps
You can partially automate this by using a copy of this Python script (I’ve not tried this bit, but it seems you just upload a copy of your file (though I haven’t worked out how!) and it does the rest). I say a copy of your file as you probably want to keep a version that also has timestamps in, particularly if you’re coding the interview and need to know when something was said.
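
If you'd rather run something on your own computer, here is a minimal sketch of the idea in Python. It is not the script linked above. It assumes the .sbv layout shown in Image 1 (a timing line such as 0:00:00.599,0:00:04.160, then one or more lines of speech, then a blank line), and the function name and sample captions are made up for illustration.

```python
import re

# Assumed shape of an .sbv timing line: "0:00:00.599,0:00:04.160" (start,end).
TIMESTAMP = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")


def strip_sbv_timestamps(sbv_text: str) -> str:
    """Return only the spoken text from .sbv captions, joined into one paragraph."""
    spoken = []
    for line in sbv_text.splitlines():
        line = line.strip()
        if not line or TIMESTAMP.match(line):
            continue  # skip blank separators and timing lines
        spoken.append(line)
    return " ".join(spoken)


if __name__ == "__main__":
    # Invented sample captions, only here to show the shape of the input.
    sample = (
        "0:00:00.599,0:00:04.160\n"
        "hello and welcome to this\n"
        "\n"
        "0:00:04.160,0:00:09.120\n"
        "talk about chocolate\n"
    )
    print(strip_sbv_timestamps(sample))
```

To use it on a real file you would read the copy of your .sbv file into a string and pass it to strip_sbv_timestamps. The original file is left alone, so you still have the version with timestamps.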

These are the results of a Google search for .sbv remove timestamps, which shows other solutions.

So from a technical point of view this is straightforward, but its success will depend on how easily Google can transcribe the speech, and how much the resulting output file (a mass of text) suits your needs.

How to do this, example in iMovie

  1. First create your video
    1. Import your audio file into a new project – maybe test the system to see how well this will work for you with a short 2m clip first.
    2. Import any image you like, then add it above your audio and ‘stretch’ it until the end of the audio clip – this creates a video that contains a single image and your audio track.
    3. ‘Share’ it as a file
  2. Add it to YouTube
    1. Upload as a private or unlisted video
    2. Wait…
    3. Download the video’s subtitles. Go to https://studio.youtube.com/ » Click Videos » Click on the relevant video to bring up the ‘Video details’ page » Click Subtitles in the menu on the left
    4. (See image below) Hover over the subtitle record (it will likely say “English (Automatic)”) until the 3 dots appear on the right, click them and choose Download. You can also ‘Edit on Classic Studio’ which lets you correct the subtitles as you ‘watch’ (listen) to your video before downloading. As you type your corrections the playback is paused.
  3. You’ll end up with an .sbv file
    1. Make a copy (it may be useful to have a copy that still has timestamps)
    2. Either remove the timestamps manually if it’s a small clip, or try something automated, see ‘Try automating the removal of timestamps’ above for links.

 

How to access the automatic subtitles on a YouTube video you have uploaded

Image 2: A screenshot showing two views of the same subtitles panel on YouTube. The upper view indicates the three dots (at the right) that need to be clicked to bring up the options; the lower view shows what the options are once clicked, including Download.

Featured image in blog post header: a view of the iMovie editing window split in two with the image file above (I’ve used a copy of a test card from Wikipedia, the image used was made by Zacabeb) and an audio clip below it.

For people with Android phones this may be of interest

 

Using the Internet Archive’s Wayback Machine to find copies of deleted pages

Ever clicked on a link and found that the page doesn’t exist? This post is for you.

1. Manage your expectations
2. Check search engine caches for recently deleted pages
3. Try the Internet Archive’s Wayback Machine for much older pages
4. Other relevant posts on this blog
5. Troubleshooting and alternative options

1. Manage your expectations

Although there are several tools to uncover deleted pages there’s no guarantee that you’ll find the page you’re looking for. Not all websites are captured or sometimes the particular page you want hasn’t been saved. Best to always approach these searches as a pleasant surprise if you find anything.

2. Check search engine caches for recently deleted pages

Search engines index websites by crawling through all their links, and they sometimes keep a cached copy of the page. When you type in a search term and press enter you’re shown a list of possible hits and if you click on the main link you’ll go straight to the page. On Google and Bing (and I’m sure many other search engines) this tiny little arrow will show you a copy that the search engine has saved in its cache. No arrow = no available cache.

Click arrow to access Google cache

Cached pages are continually overwritten and updated, so the cache of a page deleted today may disappear in a few days. This option only works for recently deleted pages (sometimes it works for tweets too, try searching for the person’s profile and see if anything shows up).

If you find what you’re looking for you might like to save a copy of the webpage as a file (eg in Firefox this is File / Save Page As…) or save it as a screenshot.

3. Try the Internet Archive’s Wayback Machine for much older pages

If you don’t find a copy using a search engine then try the Wayback Machine. This tool captures all sorts of websites automatically but people can also ask it to save a copy of a website (from now onwards) if it’s not currently there.

Go to https://archive.org/

Internet Archive Wayback Machine.png

and type in the address of the website (homepage) or particular link (blog post etc) that you’re interested in, then press enter on your keyboard or click anywhere outside the text box.

Wayback Machine with address typed in.png

Either you’ll see a page telling you nothing’s been saved (see 5. Troubleshooting and alternative options) or you’ll see something like this.

Wayback results page example.png

This tells me that pages from this very blog have been saved 23 times in three and a half years and I can use the year tabs at the top to scroll back. Each black bar represents a month, its length indicates the number of copies made. Here’s 2016 – two copies saved – one on October 17th (highlighted) and another on 14 November.

2016 saves for this blog on Wayback Machine from Internet Archive

To access the saved copy hover over the blue dot on the date it was collected and a moment later the little pop up will show with a link to one or more snapshots taken. The timestamp is the link to a copy of the site / page taken at that time on that date. Click to visit, the example for this website is below – you can see that the numbers in the link relate to the year, month, day and time.

https://web.archive.org/web/20161017175533/https://howtodotechystuff.wordpress.com/
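
As a rough illustration of how those numbers break down, here is a small Python sketch that reads the date and time back out of a snapshot link like the one above. It assumes the link always carries a 14-digit timestamp (year, month, day, hours, minutes, seconds) straight after /web/, which is what the example shows; the function name is my own.

```python
import re
from datetime import datetime

# Assumed link shape: https://web.archive.org/web/<14 digits>/<original address>
SNAPSHOT = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(url: str):
    """Split a snapshot link into (capture time, original address)."""
    match = SNAPSHOT.match(url)
    if match is None:
        raise ValueError(f"not a 14-digit snapshot link: {url}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured, match.group(2)


if __name__ == "__main__":
    captured, original = parse_snapshot_url(
        "https://web.archive.org/web/20161017175533/"
        "https://howtodotechystuff.wordpress.com/"
    )
    print(captured)  # 2016-10-17 17:55:33
    print(original)  # https://howtodotechystuff.wordpress.com/
```

So 20161017175533 reads as 17 October 2016 at 17:55:33, matching the highlighted October 17th save.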

There’s a video showing the full process below (includes a slight delay as the archived page takes longer to open).

4. Other relevant posts on this blog

5. Troubleshooting and alternative options

Sometimes a page you’re after hasn’t been captured and that’s the end of the search. You might be given the option to look at all pages within a site so that’s worth a look. I’ve also been presented with a page that looks like this – it’s displayed while you’re redirected to something. Before closing the tab you might as well wait and see where you end up.

Wayback Machine redirect notice.png

Page I was trying to reach: https://web.archive.org/web/20130929051516/https://twitter.com/JoBrodie/followers

Page I ended up being taken to: https://web.archive.org/web/20171002100900/https:/twitter.com/login?redirect_after_login=%2FJoBrodie%2Ffollowers – you can see where you’re going to be redirected to on the page (though you won’t know what it looks like until you’ve been redirected there).

There are other services like the Wayback Machine, here’s a selection.

It’s also helpful to search Twitter and search engines for references to the page you’re after. Even if your page has gone people might have taken screenshots and shared them via Twitter or in blog posts / newspaper articles.

 

 

How to create bookmarks / anchor tags on Google Docs documents

Sometimes you want to link within a document so that when you click a link you leap to the relevant bit without having to scroll, this can easily be done in Google Docs using the Bookmark facility. On web pages it’s usually known as anchors.

The thing buried at the bottom of your document that you will bookmark is ‘the pointee’ and the link in your table of contents that points to it is ‘the pointer’. The pointer points to the pointee 🙂

Quick ‘tl;dr’ instructions

  1. Select your Pointee (the bit of text you want to point to), click Insert » Bookmark
  2. Select Pointer (text that will hyperlink to it), click Insert » Link and expand the Bookmarks section, to select your bookmark. Done!
  3. An example here: make a copy of this document to edit and try it yourself.

 

Detailed instructions with pictures

  1. Select the text of the Pointee. It could just be the first word in a heading or paragraph, or a full sentence.
    01 Select the Pointee
  2. From the menu at the top click Insert » Bookmark, it will then have a little blue flag next to it.
  3. Select the text of the Pointer. That could be a word or phrase in a table of contents, or any word or phrase that you want to make clickable so that clicking it takes the reader to the pointee.
    04 Select the text of The Pointer
  4. Click Insert » Link, then expand the Bookmarks section (it’ll have just one bookmark in it for now) and select the one you’ve previously created. Click Apply.
    05 Insert a Link
    06 Expand the Bookmarks
    07 Click Apply
  5. Go to your pointer, click on its new link, it will pop up a “where you’re going to be taken to” tiny window, click on the link again and off you go.
    08 Click the Pointer link
  6. Try it out yourself, make a copy of this document and edit it.

 

 

 

Occasional workaround for reading US websites which are skittish about EU visitors, GDPR and cookies

tl;dr version
Search for the URL itself, read the cached copy.


Recently there has been a spate of American news sites returning a page that says the content isn’t available to me since I’m in the UK / EU and, because I’m subject to some unspecified horror to do with the GDPR and cookies, the website is worried about me seeing it and hopes I might just go away.

About 90 per cent of the time this problem is rapidly solved by searching for and reading Google’s cache of the page. The appearance may be a little different but the text is usually there and perfectly readable. Here’s an example of how to do this.

Yesterday I wanted to read the awful story about a young black woman who died after it was assumed she’d not be able to pay for the ambulance service that she needed. Her mother had found her slumped in the bath after she’d collapsed with a suspected stroke. She’d given birth via C section a few days before.

Here’s the address I clicked on (via a tweet)

https://www.wpbf.com/article/mom-of-woman-who-died-claims-medics-assumed-daughter-couldn-t-afford-ambulance-ride/22558170

On clicking the link the page said

Screenshot 2018-07-28 11.04.50.png
Fig 1. “Sorry, this content is not available in your region.”

Try this – it doesn’t always work though
The next stage is to copy that address / URL (the wpbf.com bit next to the green padlock) – the quickest way to do that is to put the cursor into that address bar, it should automatically select the URL but if not Ctrl+A will do that. Then Ctrl+C to copy and open a new tab with your preferred search engine and paste (Ctrl+V) into the search bar and search [see also: handy keyboard shortcuts]

Screenshot 2018-07-28 11.04.21.png
Fig 2. Search results returned after searching for the web address / URL itself

Ignore the top stories option. You might just about be able to make out a tiny little green arrowhead pointing downwards to the right of the green URL for this search result. That’s where Google hides the cache of its pages. Here’s a close-up.

screen-shot-2016-10-16-at-21-24-12
Fig 3. Where to find cached copies of pages, if available

Clicking on the green arrow will bring up a menu saying ‘Cached’ and clicking on that usually, but not always, brings up the page you want – it did in this case too.

Screenshot 2018-07-28 11.30.21
Fig 4. In this instance the cached copy was available and readable

The entire text is visible but for copyright reasons I’ll leave it at that. Here’s the link if you want to read it yourself, it’s a sobering read.

This is a very useful and more widely applicable trick
There are other cases (*cough*) where content isn’t shown to you, for all sorts of reasons unrelated to the GDPR. It is nearly always worth checking the cached version first before either admitting defeat, asking a friend for a copy or reading a different newspaper’s story.

For the exceptionally patient
At the bottom of Fig 2 there’s a paragraph of text beneath the green URL and the green padlock. Google can nearly always read the page (whether there’s a cached version or not) even if you can’t. If you search for a phrase that appears there (put it in ” ” marks when searching) then Google will show that phrase in the search results, often in context which means it may show other bits of surrounding text. Frankly it takes ages but it may be possible (I’ve done it to uncover and reference a quote for work once) to work your way through very slowly and uncover a large portion or even the entirety of the otherwise hidden text.

Further reading
Google cache (& other search engines): finding deleted pages or seeing your words on the page in colour (this blog)

• Google is fiddling about with mobile search results, using ‘AMP’. Not sinister, bit annoying though

tl;dr
You might have noticed ‘amp’ appearing in mobile search engine results on Google. This began in Oct 2015 and makes mobile pages load much faster (effectively loaded from Google’s cached copy), but the page looks like it’s from Google, quite a few users who’ve noticed it have found it puzzling and it’s a bit fiddly to share the ‘real’ address. Your device hasn’t been hacked and it’s not particularly sinister but lots of web publishers are a bit ‘hmm’ about it and feel Google’s put its metaphoric bag on the seat next to it and taken up a bit more space.

Recently I was mildly alarmed / irritated to notice that a page I’d failed to open on iPhone Safari (that had nothing to do with Google) somehow had ‘Google’ at the top of the page, instead of ‘The Guardian’, and the URL had ‘amp’ in it – I briefly wondered if I’d been hacked or something exciting like that, but it turns out – no, nothing quite that sinister but this new amp thing is annoying plenty of people, though when it does work it can actually make pages load ridiculously fast (which is great). AMP stands for Accelerated Mobile Pages.

Before I discovered that, and while trying to open the Guardian article I retraced my steps which showed me that ‘AMP’ was appearing in a few of my search results, next to a lightning bolt, and I found that it wasn’t always that straightforward to remove it from the address, to get the right link*, because it seemed pretty well embedded into the address.

I’ve just recreated the experience, with an example that turned out to be fairly straightforward to edit (I was hoping to find the one that wasn’t but couldn’t remember what I’d originally searched in November).

A more recent mobile search was for the frequency of the chiltern radio beacon╚ and the search results included the following amp-containing URL https://www.google.co.uk/amp/s/amp.theguardian.com/media/2008/jun/26/gcapmedia.radio, which, after deleting the ‘www.google.co.uk/amp/s/amp.’ part, gave https://theguardian.com/media/2008/jun/26/gcapmedia.radio which worked fine. Note that if the website doesn’t support https then you might have to delete that bit too to make it work. Or use a different search engine! I’m reluctant though, on principle 😉
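
For anyone who would rather not edit addresses by hand, here is a minimal Python sketch of the same deletion. It only handles links shaped like the Guardian example (https://www.google.<something>/amp/s/ followed by the site's own address), it assumes the /s/ part means the real page is served over https, and dropping a leading 'amp.' is a guess based on this one site. The function name is mine and this is not the applet mentioned in the tweet replies.

```python
from urllib.parse import urlsplit


def strip_google_amp(url: str) -> str:
    """Undo the Google AMP wrapper for links shaped like the example above.

    Anything that does not match the expected shape is returned unchanged.
    Any query string on the Google link is dropped.
    """
    parts = urlsplit(url)
    prefix = "/amp/s/"
    if not parts.netloc.startswith("www.google.") or not parts.path.startswith(prefix):
        return url
    rest = parts.path[len(prefix):]  # e.g. "amp.theguardian.com/media/..."
    if rest.startswith("amp."):
        rest = rest[len("amp."):]    # assumption: the site uses an "amp." subdomain
    return "https://" + rest


if __name__ == "__main__":
    print(strip_google_amp(
        "https://www.google.co.uk/amp/s/amp.theguardian.com/media/2008/jun/26/gcapmedia.radio"
    ))
```

Other publishers arrange their AMP addresses differently, so treat this as a starting point and check the result opens before sharing it.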

In the replies to Deb’s tweet above someone has highlighted an applet that will return ‘canonical’ (for purposes of argument this just means correct^) URLs though I’m afraid it’s github which is beyond my technical skill.

Ardan (according to their bio) works for Google search.

Further reading
Google Helping Mobile Publishing? Some Publishers Are Not So Sure New York Times (1 January 2017)

Google will change AMP display to make it easier to find & share publishers’ direct URLs. Change will be to the header in AMP content, expected in early 2017. Search Engine Land (21 December 2016)

Footnotes

╚ If you are not far from the Chiltern radio beacon (a non-directional radio beacon / aviation navigation aid) you can hear it emitting its Morse callsign (C -.-. H .... T - for Chiltern) on 277 kHz. I once found it by accident and was intrigued, wanted to find it again. It also features in a song.

*Related to this – if you’re sharing a link to Wikipedia from your phone please remember to delete the m otherwise you send readers on PCs to the mobile version (they can select the desktop version by scrolling to the end of the page, which is a bit of a faff). If you share the non-mobile version then people on mobile devices will be shown the mobile version anyway, and people on PCs will see the desktop version. I’ve no idea why computers can’t ‘de-resolve’ a mobile link to show the desktop version but… not yet it seems.

Compare and contrast these links below (if you’re reading on a mobile both will take you to the mobile site but you can select the desktop version at the bottom of the Wikipedia page).
https://en.m.wikipedia.org/wiki/Wikipedia
https://en.wikipedia.org/wiki/Wikipedia
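
Deleting the m can also be scripted. This is a small Python sketch, assuming the mobile address always has the form <language>.m.wikipedia.org as in the two links above; the function name is invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit

MOBILE_SUFFIX = ".m.wikipedia.org"


def wikipedia_desktop_link(url: str) -> str:
    """Turn a mobile Wikipedia link into the ordinary one; leave others alone."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.endswith(MOBILE_SUFFIX):
        # "en.m.wikipedia.org" becomes "en.wikipedia.org"
        host = host[: -len(MOBILE_SUFFIX)] + ".wikipedia.org"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(wikipedia_desktop_link("https://en.m.wikipedia.org/wiki/Wikipedia"))
```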

^for a more technical definition of canonical url see https://en.wikipedia.org/wiki/Canonical_link_element

• Downloading your old Twitter faves, setting up IFTTT to capture new ones

Table of Contents

  1. Capturing old favourites
  2. Capturing new favourites ‘going forwards’
  3. Useful background info

1. Capturing old favourites
To download your already-liked favourites do the following

  1. Log into Twitter
  2. Go to tweetbook.in and authorise it to access your account
  3. Select a time range, choose Favorites and create your PDF e-book of your favourited tweets

If you have as many favourites as I have (3,502 over 7 years, oops) you probably won’t be able to get them all in one go (2012 alone yielded a 134 page PDF!) but you have the option of trying to grab them all at once.

screen-shot-2016-10-16-at-13-53-41

Fig 1. Authorise Tweetbook.in with Twitter

screen-shot-2016-10-16-at-13-55-31

Fig 2. Pick a date range… or leave blank to pick all (it may fail if you have lots)

screen-shot-2016-10-16-at-14-08-59

Fig 3. Once your tweetbook is ready the green ‘Download’ button will appear

The output
Each page of the PDF has only a handful of tweets on it (it’s not very efficient) but the timestamp is hyperlinked so you can search for a tweet (Ctrl+F or Command+F to search within any document) and then find the original on Twitter.

Caution: I don’t know if it will display only public tweets that you’ve followed or, because you’ve logged in, if it can pick up any tweets from locked (private) accounts that you follow. Be aware that if you publicly share the contents you might be sharing tweets that people want kept private.

2. Capturing new favourites ‘going forwards’
You can use an IFTTT recipe so that every time you click favourite / like on a tweet it will be saved in some way of your choosing – for example you might use a Google spreadsheet to capture the tweet, or email it to yourself.

To do this… do this

  1. Log in to Twitter and Google Drive / Gmail*
  2. Visit IFTTT and create an account.
  3. This is an example of a recipe you can use:
    Twitter Likes (Favorites) to Google Spreadsheet (other recipe options available*)
  4. You’ll be taken through the steps of connecting your Google Drive as one ‘channel’ and your Twitter account as another channel – this allows your Twitter account to save your favourites to a Google Drive spreadsheet directly (you don’t need to set that up, it happens automatically).
  5. Favourite a tweet then go and visit your Google Drive and you’ll find a new spreadsheet created with your favourite in. After 1,000 tweets the system will create a fresh spreadsheet (same name with ‘1’ appended, and so on).

*or Evernote, or some other capturing system, examples here and here

screen-shot-2016-10-16-at-16-54-01

3. Useful background info
Favouriting a tweet does not trap it permanently – if the original is deleted then you do not have a copy of it so ‘post-favouriting-processing’ would be necessary to capture it.

Other ways to capture a tweet include

  • taking a screenshot (it can be helpful to include its address / URL)
  • embedding it in a blog or Storify (in both cases subsequent deletion of the original won’t matter as your copy will remain)
  • use Freezepage to capture a copy of the ‘page’ on which the tweet appears (you need to use the tweet’s own address – you can find this in its timestamp – and remove the S from the httpS bit of the address)

I’ve written a short post on ‘forensic’ use of Twitter (where you’re collecting someone’s tweets for legal reasons) but note that I’m not a lawyer so bear that in mind.

Further reading
Capturing web pages (remember a tweet IS a web page as it has its own address!) – Nightingale Collaboration

 

• Google cache (& other search engines): finding deleted pages or seeing your words on the page in colour

Search engines crawl and index webpages and save copies of them. This can be useful if a page has been deleted and you want to see what was last on it or if you need to take screenshots as evidence etc. Some search engines will also show you your search terms highlighted in different colours – this is useful in showing you the relevance of the page, ie whether or not your words are closely located in a paragraph or randomly scattered on the page.

When working at Diabetes UK I used Google’s cached pages for almost every search I ran until Google stopped providing this service to logged in users (!), though it’s still available if you log out, and on other search engines (see below). “If the page has the word diabetes in some side-bar or mentioned in passing (not useful to me, I want stuff about my search terms) this is immediately cued to me in a delightful display of colours.” (Source, my main blog).

1. Finding a deleted page on Google

Search for the page* but instead of clicking on the blue linked title in your search results click instead on the small green arrow next to the address (URL) and then choose the Cached option. If there’s no green arrow there might not be a cached version, but have a look at other cache options including the Wayback Machine.

Screen Shot 2016-03-12 at 09.42.22

If the page has since been deleted then the Cached version will give you the last-saved-by-Google option. Other search engines do similar things.

2. Seeing your search terms helpfully highlighted

Google no longer offers this to logged-in users (if you’re happy to try out browser add-ons and scripts there’s some advice in the link above) but other search engines do – Bing is one example. Here’s what a search result looks like and then what the highlighted page looks like.

Screen Shot 2016-03-12 at 09.50.42

Cached page below showing highlighted search terms –

Screen Shot 2016-03-12 at 09.55.44.png

3. Finding words onscreen on any page

Even without the useful highlighting of cached copies you can still find your search terms on any page (website, Word document, PDF, spreadsheet) by using the Find option.

  • On a PC it’s Ctrl+F (or Edit menu, Find)
  • On Macs it’s cmd+F
  • On iPhones you can find a word on Safari by clicking the URL to highlight it and type your word. Although this deletes the URL (it will return if you press Cancel, or you can copy it to paste back later) it will show you a range of options including, if you scroll down, any evidence that your word appears on that page. I’d agree that it’s not a very intuitive system.

*Search tips – obviously “words appearing on the page” is always a good search strategy but you can also restrict your search to a particular site, eg site:www.diabetes.org.uk or inurl:diabetes, you can even search for the web address itself, in the example given in (1) you would type http://www.diabetes.org.uk/kidneys into Google’s search bar.
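
As a rough sketch of how those search tips combine, the Python below glues plain words, an exact phrase, site: and inurl: into one query and turns it into a Google search address. The function names are made up, and the https://www.google.com/search?q= form is simply the usual address you see in the browser after searching, not an official API.

```python
from urllib.parse import quote_plus


def build_query(words: str = "", exact_phrase: str = "", site: str = "", inurl: str = "") -> str:
    """Combine the search tips above into a single query string."""
    parts = []
    if words:
        parts.append(words)
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')  # quote marks ask for the exact phrase
    if site:
        parts.append(f"site:{site}")       # restrict results to one website
    if inurl:
        parts.append(f"inurl:{inurl}")     # word must appear in the address
    return " ".join(parts)


def search_url(query: str) -> str:
    """Build the address of a Google results page for the query."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = build_query("kidneys", site="www.diabetes.org.uk")
    print(query)
    print(search_url(query))
```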

 

Google spreadsheets timestamp – US to UK date format settings

tl;dr
File » Spreadsheet settings
In Locale (in general tab) change to United Kingdom, adjust time settings if necessary

If you use Google Forms the chances are that your data will go to a Google Spreadsheet, which you can view at your leisure. Each time someone fills in the form a new record (row) is created in the spreadsheet and the time they did it is added in the timestamp column with the date.

Google defaults to American settings so will generally show US date format (month/day/year).

If you want to make it UK date format (day/month/year) do the following: Open the spreadsheet, click on File, then Spreadsheet settings… then change ‘United States’ to ‘United Kingdom’ and click the blue Save settings button. The spreadsheet will refresh and the timestamps will now be UK style.
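
If you have already exported the data and are stuck with US-style timestamps, the same swap can be sketched in Python. This assumes the timestamp text looks like 3/6/2016 23:30:50 (month/day/year then a 24-hour time); that layout is my assumption, so check it against your own export. The function name is hypothetical.

```python
from datetime import datetime

US_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumed layout of the exported timestamp
UK_FORMAT = "%d/%m/%Y %H:%M:%S"


def us_to_uk_timestamp(text: str) -> str:
    """Re-write a month/day/year timestamp as day/month/year."""
    return datetime.strptime(text, US_FORMAT).strftime(UK_FORMAT)


if __name__ == "__main__":
    # 6 March 2016 written the US way, then the UK way
    print(us_to_uk_timestamp("3/6/2016 23:30:50"))  # 06/03/2016 23:30:50
```

Changing the Locale setting is still the easier route, since it fixes the spreadsheet itself and not just a copy of the data.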

1. File / Spreadsheet settings…

Screen Shot 2016-03-06 at 23.30.50

2. Locale – change United States to United Kingdom, click Save settings.

Screen Shot 2016-03-06 at 23.31.08

3. What the setting looks like with United Kingdom (I got carried away taking screenshots).

Screen Shot 2016-03-06 at 23.31.24

 

 

How to embed Google forms in WordPress.com websites

While you can link to a Google form and people can go off and fill it in, it’s quite nice to have it seamlessly appearing within your page using a bit of code that Google provides. Here’s how, for people using WordPress.com websites (it’s possibly the same for WordPress.org self-hosted ones but I’ve never used one so don’t know).

A note on iframes
While it’s true that WordPress.com sites don’t support iframes, the iframe code that you’ll collect from your form is automatically converted to a WordPress-approved shortcode the minute you paste it into your blog. It’ll look like this.

iframeconverted

1. Find the embed code

1a. ‘Old’ Google Forms

Look for File in the menu

Screen Shot 2016-02-28 at 12.38.19

Click on it

Screen Shot 2016-02-28 at 12.39.27

Select Embed…

Screen Shot 2016-02-28 at 12.39.36

Copy the iframe src text for embedding.

1b. New Google Forms

They’ve hidden it in the Send button (took me a while to realise this cos I thought the send button would annoyingly ‘send’ a blank copy of the form to the email address associated with it, but it doesn’t).

Screen Shot 2016-02-28 at 12.43.33

The ‘Send via’ bit has several options, look for the angle brackets for the embed code

Screen Shot 2016-02-28 at 12.43.42

Once you click on the angle brackets icon the iframe link will appear for you to copy.
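
To show what is inside that embed code, here is a small Python sketch that pulls the src address out of a pasted iframe snippet using the standard library's HTML parser. The sample snippet, including EXAMPLE_FORM_ID and the width and height, is a made-up placeholder and not a real form; the class and function names are mine.

```python
from html.parser import HTMLParser


class IframeSrcFinder(HTMLParser):
    """Collect the src attribute of every <iframe> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def iframe_sources(embed_code: str) -> list:
    """Return the list of src addresses found in the embed code."""
    finder = IframeSrcFinder()
    finder.feed(embed_code)
    finder.close()
    return finder.sources


if __name__ == "__main__":
    # Hypothetical embed code, only to illustrate the general shape.
    sample = (
        '<iframe src="https://docs.google.com/forms/d/e/EXAMPLE_FORM_ID/viewform?embedded=true" '
        'width="640" height="480">Loading</iframe>'
    )
    print(iframe_sources(sample))
```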

Screen Shot 2016-02-28 at 12.43.51

2. Insert the embed code

On WordPress.com you can use either the VISUAL (how the post will end up looking) or the TEXT (underlying html) to add or embed different types of links. For iframes you need to use the TEXT view to add it in, and when you return to the VISUAL there’s a high chance that a small square area will be blocked out which is where your form will appear when your website page is published.

Note that with future editing it’s entirely possible that WordPress.com will automatically convert the iframe code info into a Google shortcode – if you have one of those and want to move it around do this in the VISUAL and not the TEXT editing window.

From experience of using Blogger, you would add the iframe directly into the html editing window there too.

As I’m using the free version of WordPress.com you’ll probably see some terrible advert below, sorry about that.