My last blog post described how I migrated my web pages (except my Financial Markets blog) from my Institute’s server to my own server using Nikola. Legacy issues made the migration of the Financial Markets blog a more complex and time consuming process that I cover in this blog post. The Computing Blog on which this post appears is a totally different blog hosted only on WordPress, and there are no immediate plans to migrate this blog to my server.
I first set up my Financial Markets blog nearly two decades ago when the blogging ecosystem looked very different. My Institute ran a web server, and each faculty member was given one folder on the server to host their web pages, but the Institute did not provide a blogging platform. My web pages did have CGI access, but running a database like MySQL was probably too much to ask for. So, I chose a very simple open source Perl blogging software called Blosxom. It consisted of a single CGI program written in Perl (less than 600 Source Lines of Code), and installation consisted simply of copying this file to an appropriate folder on the web server. Each blog post lived in a separate HTML file: the timestamp of the blog entry came from the timestamp of the file, the post title came from the first line of the file, and the file name became the slug from which the URL of the blog post was derived. The blosxom community created plugins to extend the functionality; these too were just Perl scripts. Particularly useful was a comments plugin which allowed readers to post comments. The plugin stored the comments as text files on the web server. Customization of the look and feel of the blog was accomplished by writing CSS. In short, the file system served as the database for this blogging software.
Converting blosxom blog posts into Nikola needed only a small Python script. Nikola does not use a database, but nor does it use the file system as a database. Nikola uses a separate file for each blog post, and stores the metadata in a YAML block at the top of the file. The job of my Python script was to read the blosxom metadata from the file system and create the YAML block from that. I also took the opportunity to add tags to each blog post. After I had manually created a mapping from each blog post to a list of tags, the Python script could add these tags to the YAML metadata as well.
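The conversion described above can be sketched roughly as follows. This is a minimal illustration and not the script I actually used: the function names `front_matter` and `convert_entry`, the choice of metadata fields, and the comma-separated form of the tags line are all assumptions made for the example.

```python
import datetime
import json
from pathlib import Path


def front_matter(title, slug, when, tags=()):
    """Build a YAML metadata block from values that blosxom kept in the file system."""
    lines = [
        "---",
        # json.dumps gives a double-quoted string, which is also a valid YAML scalar
        f"title: {json.dumps(title)}",
        f"slug: {slug}",
        f"date: {when.strftime('%Y-%m-%d %H:%M:%S')} UTC",
    ]
    if tags:
        # Assumption: tags are written as one comma-separated line
        lines.append(f"tags: {', '.join(tags)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def convert_entry(path, tag_map):
    """Return the text of a Nikola post for one blosxom entry.

    tag_map is the manually created mapping from slug to a list of tags.
    """
    path = Path(path)
    # blosxom: the first line is the title, the rest is the HTML body
    title, _, body = path.read_text(encoding="utf-8").partition("\n")
    # blosxom: the file modification time is the post date
    when = datetime.datetime.fromtimestamp(
        path.stat().st_mtime, tz=datetime.timezone.utc
    )
    # blosxom: the file name (without extension) is the slug
    slug = path.stem
    return front_matter(title.strip(), slug, when, tag_map.get(slug, ())) + "\n" + body
```

Writing the returned text to a file in Nikola's posts folder would complete the step; the output file extension and folder depend on the Nikola configuration.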
Moving an entire website to another domain should be unproblematic if all the internal links in the website are relative links. But I realized that I had quite frequently used absolute links while linking to images, documents and other files on the old server. My Python script therefore had a couple of lines to convert all absolute links to relative links so that they would work on the new domain as well.
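A link rewrite of this kind might look like the sketch below. The old server address `old.example.edu/~someone` is a made-up placeholder, and the example produces root-relative links (starting with `/`) for simplicity, which is an assumption and may not match what a given site layout needs.

```python
import re

# Hypothetical address of the old folder on the Institute server
OLD_BASE = "old.example.edu/~someone"

# Match href= or src= attributes whose value starts with the old base URL
_OLD_LINK = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*["'])https?://""" + re.escape(OLD_BASE) + "/",
    re.IGNORECASE,
)


def relativize(html):
    """Replace absolute links to the old server with root-relative links."""
    return _OLD_LINK.sub(lambda m: m.group("attr") + "/", html)
```

Links to other sites are left untouched because the pattern only matches the old base address.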
Converting blosxom comments took a little more effort. The comments plugin stores comments as Perl expressions. Ironically, back in 2005, one of my reasons for choosing blosxom was that I was comfortable with Perl, but in the intervening years, I had shifted to R and Python, and now I wanted to write as little Perl as possible. My solution required only half a dozen lines of Perl to read the file, evaluate it as a Perl expression and output the resulting dictionary as a JSON string which could then be processed in Python.
In Nikola, I decided to use Isso for the comments for several reasons. First, it allowed me to self-host the commenting system easily instead of relying on an external provider like Disqus. Second, Isso uses SQLite which has a small footprint (it is a serverless in-process database engine). Third, all that Isso requires at the client end is the embedding of a single small JavaScript file. Isso allows comments to be imported from a JSON file, and the import format is a list of dictionaries. All that I needed to do to convert blosxom comments to Isso was to remap the keys of the dictionary coming from the Perl script, consolidate them all into a single list of dictionaries, and output the whole beast as a JSON file for Isso to consume.
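The remapping step could be sketched as below. The source keys (`name`, `url`, `comment`, `date`) are hypothetical stand-ins for whatever the comments plugin stored, and the target keys and the per-post grouping reflect my understanding of Isso's generic import format, so they should be checked against the Isso documentation before use.

```python
import json

# Hypothetical blosxom comment keys mapped to the keys assumed for Isso's import
KEY_MAP = {
    "name": "author",
    "url": "website",
    "comment": "text",
    "date": "created",
}


def remap(comment):
    """Rename the keys of one comment dictionary, dropping keys not in KEY_MAP."""
    return {new: comment[old] for old, new in KEY_MAP.items() if old in comment}


def build_import(comments_by_post):
    """Consolidate all comments into one list of dictionaries, one per post.

    comments_by_post maps the URL path of a post to the list of comment
    dictionaries obtained from the Perl script for that post.
    """
    threads = [
        {"id": uri, "comments": [remap(c) for c in comments]}
        for uri, comments in sorted(comments_by_post.items())
    ]
    return json.dumps(threads, indent=2)
```

The string returned by `build_import` can be written to a file and passed to Isso's import command.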
In my previous blog post I explained why I did not use any of Nikola’s ready-made themes, and instead wrote my own templates from scratch. I followed the same approach for the blog as well. I began with a bare-bones post.tmpl template to render the individual blog posts. This outputs a blog_header (the header common to all blog posts), blog_entry (the contents of the post using post.title and post.text) and blog_footer at the bottom of the blog. This template is all that is needed to create a blog provided we disable taxonomies, but then we do not get the main blog index, the archive index and the category index.
To get the various blog indices, we must enable taxonomies, and create several other templates. To keep these templates as minimalist as possible, I split the post.tmpl template into a set of macros for blog_header, blog_entry and blog_footer. Then I created the following relatively short and simple templates by taking the templates from one of the ready-made themes and stripping them down to their bare essentials:
- index.tmpl which loops through posts and inserts their contents using blog_entry to produce the main blog index.
- list_post.tmpl which loops through posts and inserts links to these posts.
  - The main use of this template is in archive.tmpl and tag.tmpl which produce year-wise archives and tag-wise lists of blog posts. These two templates consist of a single line {% extends 'list_post.tmpl' %}.
- list.tmpl which loops through years and inserts links to the index for each year to produce the main archive file.
- tags.tmpl which loops through tags and inserts links to the index for each tag to produce the tag index.
- comments_isso.tmpl needed only to enable isso comments.
- I also copied a few helper templates (index_helper.tmpl, archive_navigation_helper.tmpl and pagination_helper.tmpl) from one of the ready-made themes. These helper templates are imported by many of the above templates, and I was too lazy to try to eliminate them or at least strip them down.
Another issue that came up during migration related to the WordPress mirror of my Financial Markets blog. This mirror was set up around 2008 in response to the complaints about the uptime and response time of the Institute web server. Even though the uptime issues disappeared when the Institute web server moved to a high quality web hosting service, the WordPress mirror was retained because the social media integration of WordPress allowed blog posts to be automatically posted on Twitter and Facebook, and many readers came to my blog from these feeds and posted comments on the WordPress mirror. The mirror linked to the Institute server for image and document files, and all these links needed to be updated.
My solution to this problem was crude but effective. I logged in to my WordPress site and exported everything from the site. The exported data consists of a single XML file. Since this was a text file, I edited all the links there using a Python script to create an edited XML file. I tested it out by importing the edited XML file into a temporary WordPress site, and verifying a sample of links. Then I updated the main WordPress site by first deleting all posts and pages (there were no tags, categories or media to delete), and then importing the edited XML file into the empty site. It worked flawlessly.
The end result is that my Financial Markets blog has been successfully migrated to my server, and has also been correctly mirrored at WordPress.