Occasional workaround for reading US websites which are skittish about EU visitors, GDPR and cookies

tl;dr version
Search for the URL itself, read the cached copy.


Recently there has been a spate of American news sites returning a page that says the content isn’t available to me since I’m in the UK / EU and, because I’m subject to some unspecified horror to do with the GDPR and cookies, the website is worried about me seeing it and hopes I might just go away.

About 90 per cent of the time this problem is rapidly solved by searching for and reading Google’s cache of the page. The appearance may be a little different but the text is usually there and perfectly readable. Here’s an example of how to do this.

Yesterday I wanted to read the awful story about a young black woman who died after it was assumed she’d not be able to pay for the ambulance service that she needed. Her mother had found her slumped in the bath after she’d collapsed with a suspected stroke. She’d given birth via C section a few days before.

Here’s the address I clicked on (via a tweet)

https://www.wpbf.com/article/mom-of-woman-who-died-claims-medics-assumed-daughter-couldn-t-afford-ambulance-ride/22558170

On clicking the link the page said

Fig 1. “Sorry, this content is not available in your region.”

Try this – it doesn’t always work though
The next stage is to copy that address / URL (the wpbf.com bit next to the green padlock). The quickest way is to put the cursor into the address bar, which should automatically select the URL (if not, Ctrl+A will do that). Then press Ctrl+C to copy, open a new tab with your preferred search engine, paste (Ctrl+V) into the search bar and search [see also: handy keyboard shortcuts].
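For anyone who would rather skip the copy-and-paste step, here is a minimal sketch of the same idea in Python: it turns a page address into a search link that looks for the address itself. The function name and the default search address are my own assumptions for illustration; any search engine that accepts a q= parameter should work the same way.

```python
from urllib.parse import quote_plus


def search_link_for(page_url, engine="https://www.google.com/search?q="):
    """Build a search link whose query is the page address itself.

    'engine' is an assumed default; swap in your preferred search engine.
    """
    # quote_plus escapes characters such as ':' and '/' so the whole
    # address travels safely inside the query string.
    return engine + quote_plus(page_url)


if __name__ == "__main__":
    print(search_link_for("http://www.diabetes.org.uk/kidneys"))
```

Opening the printed link in a browser should give the same results page as pasting the address into the search bar by hand.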

Fig 2. Search results returned after searching for the web address / URL itself

Ignore the top stories option. You might just about be able to make out a tiny little green arrowhead pointing downwards to the right of the green URL for this search result. That’s where Google hides the cache of its pages. Here’s a close-up.

Fig 3. Where to find cached copies of pages, if available

Clicking on the green arrow will bring up a menu saying ‘Cached’, and clicking on that usually, but not always, brings up the page you want – it did in this case too.

Fig 4. In this instance the cached copy was available and readable

The entire text is visible but for copyright reasons I’ll leave it at that. Here’s the link if you want to read it yourself, it’s a sobering read.

This is a very useful and more widely applicable trick
There are other cases (*cough*) where content isn’t shown to you, for all sorts of reasons unrelated to the GDPR. It is nearly always worth checking the cached version first before admitting defeat, asking a friend for a copy or reading a different newspaper’s story.

For the exceptionally patient
At the bottom of Fig 2 there’s a paragraph of text beneath the green URL and the green padlock. Google can nearly always read the page (whether there’s a cached version or not) even if you can’t. If you search for a phrase that appears there (put it in “quotation marks” when searching) then Google will show that phrase in the search results, often in context, which means it may show other bits of surrounding text. Frankly it takes ages, but by working your way through very slowly it may be possible to uncover a large portion or even the entirety of the otherwise hidden text (I’ve done it once to uncover and reference a quote for work).

Further reading
Google cache (& other search engines): finding deleted pages or seeing your words on the page in colour (this blog)


• Google is fiddling about with mobile search results, using ‘AMP’. Not sinister, bit annoying though

tl;dr
You might have noticed ‘amp’ appearing in mobile search engine results on Google. This began in Oct 2015 and makes mobile pages load much faster (effectively loaded from Google’s cached copy), but the page looks like it’s from Google, quite a few users who’ve noticed it have found it puzzling, and it’s a bit fiddly to share the ‘real’ address. Your device hasn’t been hacked and it’s not particularly sinister, but lots of web publishers are a bit ‘hmm’ about it and feel Google’s put its metaphorical bag on the seat next to it and taken up a bit more space.

Recently I was mildly alarmed / irritated to notice that a page I’d failed to open on iPhone Safari (that had nothing to do with Google) somehow had ‘Google’ at the top of the page, instead of ‘The Guardian’, and the URL had ‘amp’ in it. I briefly wondered if I’d been hacked or something exciting like that, but it turns out it’s nothing quite that sinister. This new amp thing is annoying plenty of people, though when it does work it can actually make pages load ridiculously fast (which is great). AMP stands for Accelerated Mobile Pages.

Before I discovered that, and while trying to open the Guardian article I retraced my steps which showed me that ‘AMP’ was appearing in a few of my search results, next to a lightning bolt, and I found that it wasn’t always that straightforward to remove it from the address, to get the right link*, because it seemed pretty well embedded into the address.

I’ve just recreated the experience, with an example that turned out to be fairly straightforward to edit (I was hoping to find the one that wasn’t but couldn’t remember what I’d originally searched in November).

A more recent mobile search was for the frequency of the chiltern radio beacon╚ and the search results included the following amp-containing URL: https://www.google.co.uk/amp/s/amp.theguardian.com/media/2008/jun/26/gcapmedia.radio. Deleting the ‘www.google.co.uk/amp/s/amp.’ part gave https://theguardian.com/media/2008/jun/26/gcapmedia.radio, which worked fine. Note that if the website doesn’t support https then you might have to delete that bit too to make it work. Or use a different search engine! I’m reluctant though, on principle 😉
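The hand edit described above can be sketched in Python. This is only a rough illustration, not how Google or any publisher does it. It rests on two assumptions of mine: that an ‘s/’ after ‘/amp/’ marks an https page, and that the publisher serves AMP pages from an ‘amp.’ subdomain, as the Guardian did in this example. The function name is made up.

```python
from urllib.parse import urlsplit


def strip_google_amp(url):
    """Best-effort conversion of a Google AMP link to the publisher's link."""
    parts = urlsplit(url)
    if not (parts.netloc.startswith("www.google.") and parts.path.startswith("/amp/")):
        return url  # not a Google AMP link, so leave it alone

    rest = parts.path[len("/amp/"):]

    # Assumption: a leading "s/" means the publisher's page is served over https.
    scheme = "http"
    if rest.startswith("s/"):
        scheme = "https"
        rest = rest[len("s/"):]

    # Assumption: the publisher uses an "amp." subdomain for AMP pages.
    if rest.startswith("amp."):
        rest = rest[len("amp."):]

    query = "?" + parts.query if parts.query else ""
    return f"{scheme}://{rest}{query}"


if __name__ == "__main__":
    print(strip_google_amp(
        "https://www.google.co.uk/amp/s/amp.theguardian.com/media/2008/jun/26/gcapmedia.radio"
    ))
```

Other publishers mark their AMP pages differently (a path segment or a query parameter, for instance), so a link that comes out of this still needs checking in a browser.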

In the replies to Deb’s tweet above someone has highlighted an applet that will return ‘canonical’ (for purposes of argument this just means correct^) URLs, though I’m afraid it’s on GitHub, which is beyond my technical skill.

Ardan (according to their bio) works for Google search.

Further reading
Google Helping Mobile Publishing? Some Publishers Are Not So Sure New York Times (1 January 2017)

Google will change AMP display to make it easier to find & share publishers’ direct URLs Change will be to the header in AMP content, expected in early 2017  Search Engine Land (21 December 2016)

Footnotes

╚ If you are not far from the Chiltern radio beacon (a non-directional radio beacon / aviation navigation aid) you can hear it emitting its Morse callsign (C -.-. H .... T - for Chiltern) on 277 kHz. I once found it by accident, was intrigued and wanted to find it again. It also features in a song.

*Related to this – if you’re sharing a link to Wikipedia from your phone please remember to delete the m otherwise you send readers on PCs to the mobile version (they can select the desktop version by scrolling to the end of the page, which is a bit of a faff). If you share the non-mobile version then people on mobile devices will be shown the mobile version anyway, and people on PCs will see the desktop version. I’ve no idea why computers can’t ‘de-resolve’ a mobile link to show the desktop version but… not yet it seems.

Compare and contrast these links below (if you’re reading on a mobile both will take you to the mobile site but you can select the desktop version at the bottom of the Wikipedia page).
https://en.m.wikipedia.org/wiki/Wikipedia
https://en.wikipedia.org/wiki/Wikipedia
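
Deleting the ‘m’ can also be sketched in Python. This is only an illustration: the function name is my own, and it assumes the mobile address has the shape language.m.wikipedia.org, as in the first link above.

```python
from urllib.parse import urlsplit, urlunsplit


def desktop_wikipedia(url):
    """Turn a mobile Wikipedia link into the ordinary one; leave others alone."""
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    # A mobile address looks like ["en", "m", "wikipedia", "org"].
    if len(labels) == 4 and labels[1] == "m" and labels[2:] == ["wikipedia", "org"]:
        labels.pop(1)  # drop the "m" label
    return urlunsplit(parts._replace(netloc=".".join(labels)))


if __name__ == "__main__":
    print(desktop_wikipedia("https://en.m.wikipedia.org/wiki/Wikipedia"))
```

A link that is already the desktop version passes through unchanged.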

^for a more technical definition of canonical url see https://en.wikipedia.org/wiki/Canonical_link_element

• Google cache (& other search engines): finding deleted pages or seeing your words on the page in colour

Search engines crawl and index webpages and save copies of them. This can be useful if a page has been deleted and you want to see what was last on it or if you need to take screenshots as evidence etc. Some search engines will also show you your search terms highlighted in different colours – this is useful in showing you the relevance of the page, ie whether or not your words are closely located in a paragraph or randomly scattered on the page.

When working at Diabetes UK I used Google’s cached pages for almost every search I ran until Google stopped providing this service to logged in users (!), though it’s still available if you log out, and on other search engines (see below). “If the page has the word diabetes in some side-bar or mentioned in passing (not useful to me, I want stuff about my search terms) this is immediately cued to me in a delightful display of colours.” (Source, my main blog).

1. Finding a deleted page on Google

Search for the page* but instead of clicking on the blue linked title in your search results click instead on the small green arrow next to the address (URL) and then choose the Cached option. If there’s no green arrow there might not be a cached version, but have a look at other cache options including the Wayback Machine.

[Screenshot: search result with the green arrow and Cached option]

If the page has since been deleted then the Cached version will give you the copy last saved by Google. Other search engines do similar things.
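
For the Wayback Machine option, here is a hedged sketch in Python of asking whether a saved copy exists. It uses the Internet Archive’s availability address as I understand it (https://archive.org/wayback/available?url=...). The shape of the reply (archived_snapshots, then closest, then url) is an assumption, so check it against the Archive’s own documentation. The function names are mine.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_link(page_url):
    """Build the address that asks whether a saved copy exists."""
    return API + "?" + urlencode({"url": page_url})


def closest_snapshot(reply):
    """Pull the saved copy's address out of a decoded reply, or return None.

    Assumed reply shape: {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}
    """
    closest = reply.get("archived_snapshots", {}).get("closest", {})
    if closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url):
    """Network call: fetch the reply and return the saved copy's address, if any."""
    with urlopen(availability_link(page_url), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

Calling lookup("http://www.diabetes.org.uk/kidneys") needs an internet connection and returns None if nothing has been saved.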

2. Seeing your search terms helpfully highlighted

Google no longer offers this to logged-in users (if you’re happy to try out browser add-ons and scripts there’s some advice in the link above) but other search engines do – Bing is one example. Here’s what a search result looks like and then what the highlighted page looks like.

[Screenshot: search result]

Cached page below showing highlighted search terms –

[Screenshot: cached page with highlighted search terms]

3. Finding words onscreen on any page

Even without the useful highlighting of cached copies you can still find your search terms on any page (website, Word document, PDF, spreadsheet) by using the Find option.

  • On a PC it’s Ctrl+F (or Edit menu, Find)
  • On Macs it’s cmd+F
  • On iPhones you can find a word in Safari by tapping the URL to highlight it and typing your word. Although this deletes the URL (it will return if you press Cancel, or you can copy it to paste back later) it will show you a range of options including, if you scroll down, any evidence that your word appears on that page. I’d agree that it’s not a very intuitive system.
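
If you have the text of a page saved, the highlighting idea can be imitated with a few lines of Python. This is a toy sketch and not how any search engine does it: the function name is made up and square brackets stand in for the colours.

```python
import re


def highlight(text, terms):
    """Wrap every occurrence of each search term in [brackets], ignoring case."""
    if not terms:
        return text
    # re.escape keeps characters like '.' or '+' in a term from being
    # treated as pattern syntax.
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    return pattern.sub(lambda match: f"[{match.group(0)}]", text)


if __name__ == "__main__":
    print(highlight("Diabetes can affect the kidneys.", ["diabetes", "kidneys"]))
```

Seeing the bracketed words bunched together or scattered gives the same quick sense of relevance as the coloured cached copy.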

*Search tips – obviously “words appearing on the page” is always a good search strategy, but you can also restrict your search to a particular site, eg site:www.diabetes.org.uk, or to addresses containing a word, eg inurl:diabetes. You can even search for the web address itself: in the example given in (1) you would type http://www.diabetes.org.uk/kidneys into Google’s search bar.
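
These search tips can be combined in one query. Here is a small Python sketch that assembles one. The function and parameter names are my own invention, and it only builds the text you would type into the search bar.

```python
def build_query(phrase=None, site=None, inurl=None):
    """Assemble a search query from an exact phrase and optional restrictions."""
    pieces = []
    if phrase:
        pieces.append(f'"{phrase}"')      # exact phrase, in quotation marks
    if site:
        pieces.append(f"site:{site}")     # restrict results to one website
    if inurl:
        pieces.append(f"inurl:{inurl}")   # word must appear in the address
    return " ".join(pieces)


if __name__ == "__main__":
    print(build_query(phrase="words appearing on the page", site="www.diabetes.org.uk"))
```

Note that the quotation marks need to be plain straight ones rather than curly ones.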