The menace of link rot

I have been writing a blog on financial markets for nearly a dozen years now, and it is distressing to find that when I look up one of my older posts on that blog, many of the links in that post no longer work. This phenomenon is known as link rot.

Recently, our Institute web site was reorganized, and we had to set up a fair amount of URL redirection and rewriting to keep the links working. To make sure that everything was fine on my webpage, I ran LinkChecker to verify that all internal links were working. Then, out of curiosity, I also ran a check on the external links in my blog, year by year:

  linkchecker --check-extern -r 1 --no-warnings --no-status URLyyyy

where URLyyyy is the URL that displays all blog posts for year yyyy. What I found was that there were no broken links in the blog posts of 2016 and only one broken link in 2015. Before that, there were 10-20 broken links in most years (and as many as 25-35 in some years) out of about 300-500 links each year. This suggests that roughly 5% of links are broken once a post is more than about two years old.
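
As a rough check on that 5% figure, the counts quoted above can be turned into bounds. This is only a back-of-the-envelope sketch: I have not recorded here which broken-link count goes with which yearly total, so the function below (a hypothetical helper, not part of LinkChecker) simply pairs the extremes to get the loosest possible range.

```python
def rate_range(broken_low, broken_high, links_low, links_high):
    """Return the (lowest, highest) broken-link fraction consistent with the ranges.

    The lowest rate pairs the fewest broken links with the most links;
    the highest rate pairs the most broken links with the fewest links.
    """
    lowest = broken_low / links_high
    highest = broken_high / links_low
    return lowest, highest


# Counts taken from the text: 10-20 broken links in most years,
# 25-35 in the worst years, out of about 300-500 links per year.
typical = rate_range(10, 20, 300, 500)
worst = rate_range(25, 35, 300, 500)

print(f"typical years: {typical[0]:.1%} to {typical[1]:.1%}")
print(f"worst years:   {worst[0]:.1%} to {worst[1]:.1%}")
```

The typical range brackets 5%, which is why I treat that number as a reasonable summary rather than a precise estimate.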

I could not see any pattern in the broken links. Links to government reports and court judgements on official websites were broken, presumably because these bodies reorganized their websites and changed the URLs. Links to working papers and other academic papers on academic websites were broken too, which makes me grateful to SSRN and arXiv for providing permanent URLs for academic papers. Some links to media sites were broken because the content went behind a paywall. I found many large financial institutions and exchanges in my list of broken links. It makes me wonder whether URL redirection and rewriting are really so hard in practice for large organizations that spend millions of dollars on their websites.

What can we do about this? One solution is the Internet Archive Wayback Machine. I was happy to find that I could access some of my broken links using the Wayback Machine; at least, it confirmed that the link was correct at the time of the blog entry. Another solution that I have adopted (not as systematically as I would like) is to convert the web pages to PDF and save them locally when I link to them. This way, at least I would not lose access to these pages when link rot sets in. Unfortunately, that does not help my readers.
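
For anyone who wants to look up broken links in the Wayback Machine in bulk rather than by hand, here is a minimal sketch. It assumes the Internet Archive's availability endpoint at https://archive.org/wayback/available and the JSON layout (archived_snapshots, then closest) as I understand them; please verify both against the current Internet Archive documentation before relying on this. The function names are my own and purely illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback Machine availability API.
WAYBACK_API = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Extract the archived URL from a decoded JSON response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, timestamp=None, timeout=30):
    """Query the API over the network and return the closest snapshot URL."""
    query = build_query(url, timestamp)
    with urllib.request.urlopen(query, timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling lookup() with a broken link and the date of the blog post (for example, lookup("example.com", "20100101")) should return the snapshot nearest to that date if one exists, and None otherwise.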


