Migrating blog with comments to Nikola

My last blog post described how I migrated my web pages (except my Financial Markets blog) from my Institute’s server to my own server using Nikola. Legacy issues made the migration of the Financial Markets blog a more complex and time-consuming process, which I cover in this post. The Computing Blog on which this post appears is a totally different blog hosted only on WordPress, and there are no immediate plans to migrate it to my server.

I first set up my Financial Markets blog nearly two decades ago when the blogging ecosystem looked very different. My Institute ran a web server, and each faculty member was given one folder on the server to host their web pages, but the Institute did not provide a blogging platform. My web pages did have CGI access, but running a database like MySQL was probably too much to ask for. So, I chose a very simple open source Perl blogging software called Blosxom. It consisted of a single CGI program written in Perl (less than 600 Source Lines of Code), and installation consisted simply of copying this file to an appropriate folder on the web server. Each blog post lived in a separate HTML file: the timestamp of the blog entry came from the timestamp of the file, the post title came from the first line of the file, and the file name became the slug from which the URL of the blog post was derived. The blosxom community created plugins to extend the functionality; these too were just Perl scripts. Particularly useful was a comments plugin which allowed readers to post comments. The plugin stored the comments as text files on the web server. Customization of the look and feel of the blog was accomplished by writing CSS. In short, the file system served as the database for this blogging software.

Converting blosxom blog posts into Nikola needed only a small Python script. Nikola does not use a database, but nor does it use the file system as a database. Nikola uses a separate file for each blog post, and stores the metadata in a YAML block at the top of the file. The job of my Python script was to read the blosxom metadata from the file system and create the YAML block from that. I also took the opportunity to add tags to each blog post. After I had manually created a mapping from each blog post to a list of tags, the Python script could add this too to the YAML metadata.
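The conversion described above can be sketched roughly as follows. This is an illustration rather than the script I actually used: the function name blosxom_to_nikola and the tags_by_slug mapping are made up for the example, the timestamp is written in UTC for simplicity, and the metadata fields (title, slug, date, tags) should be checked against the Nikola handbook.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def blosxom_to_nikola(src: Path, tags_by_slug: dict) -> str:
    """Return the text of a Nikola post built from one blosxom entry.

    tags_by_slug is a hypothetical, manually built mapping from slug
    to a list of tags.
    """
    lines = src.read_text(encoding="utf-8").splitlines()
    title = lines[0].strip()              # blosxom: first line is the title
    body = "\n".join(lines[1:]).strip()   # the rest is the HTML body
    slug = src.stem                       # blosxom: file name is the slug
    # blosxom: the file's modification time is the post's timestamp
    date = datetime.fromtimestamp(src.stat().st_mtime, tz=timezone.utc)
    tags = tags_by_slug.get(slug, [])

    # json.dumps produces a double-quoted string, which is also valid YAML
    # and protects titles that contain colons or quotes.
    header = [
        "---",
        f"title: {json.dumps(title)}",
        f"slug: {slug}",
        f"date: {date:%Y-%m-%d %H:%M:%S} UTC",
        f"tags: {json.dumps(', '.join(tags))}",
        "---",
    ]
    return "\n".join(header) + "\n\n" + body + "\n"
```

Running this over every file in the blosxom data folder and writing each result into Nikola's posts folder would complete the conversion.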

Moving an entire website to another domain should be unproblematic if all the internal links in the website are relative links. But I realized that I had quite frequently used absolute links while linking to images, documents and other files on the old server. My Python script therefore had a couple of lines to convert all absolute links to relative links so that they would work on the new domain as well.
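A minimal sketch of that link rewriting is shown here. The old server prefix OLD_BASE is a placeholder, not my real URL, and the example assumes that a root-relative replacement ("/") is what the new site needs; a different prefix can be passed in if the pages live elsewhere.

```python
import re

# Hypothetical prefix of the old server; substitute the real one.
OLD_BASE = "https://old.example.edu/~author/"


def make_links_relative(html: str, old_base: str = OLD_BASE,
                        replacement: str = "/") -> str:
    """Strip the old server prefix from href and src attributes."""
    # Match only URLs that start an href="..." or src="..." value,
    # so that mentions of the old URL in running text are left alone.
    pattern = re.compile(
        r"((?:href|src)=[\"'])" + re.escape(old_base), re.IGNORECASE
    )
    return pattern.sub(lambda m: m.group(1) + replacement, html)
```

Links to other domains do not match the prefix and pass through unchanged.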

Converting blosxom comments was a little more effort. The comments plugin stores comments as Perl expressions. Ironically, back in 2005, one of my reasons for choosing blosxom was that I was comfortable with Perl, but in the intervening years, I had shifted to R and Python, and now I wanted to write as little Perl as possible. My solution required only half a dozen lines of Perl to read the file, evaluate it as a Perl expression and output the resulting dictionary as a JSON string which could then be processed in Python.

In Nikola, I decided to use Isso for the comments for several reasons. First, it allowed me to self-host the commenting system easily instead of relying on an external provider like Disqus. Second, Isso uses SQLite which has a small footprint (it is a serverless in-process database engine). Third, all that Isso requires at the client end is the embedding of a single small JavaScript file. Isso allows comments to be imported from a JSON file, and the import format is a list of dictionaries. All that I needed to do to convert blosxom comments to Isso was to remap the keys of the dictionary coming from the Perl script, consolidate them all into a single list of dictionaries, and output the whole beast as a JSON file for Isso to consume.
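The remap-and-consolidate step might look something like the sketch below. The key names on both sides (name, url, comment, date, author, website, text, created, uri) and the flat layout are placeholders for illustration; the real names come from the comment files written by the blosxom plugin and from Isso's import documentation.

```python
import json

# Hypothetical mapping from blosxom comment keys to the importer's keys.
KEY_MAP = {
    "name": "author",
    "url": "website",
    "comment": "text",
    "date": "created",
}


def remap_comment(comment: dict, post_uri: str) -> dict:
    """Rename keys per KEY_MAP, drop unmapped keys, and record the post."""
    out = {new: comment[old] for old, new in KEY_MAP.items() if old in comment}
    out["uri"] = post_uri  # hypothetical key tying the comment to its post
    return out


def build_import(comments_by_post: dict) -> str:
    """Merge the comments of all posts into one JSON list of dictionaries."""
    merged = [
        remap_comment(comment, uri)
        for uri, comments in comments_by_post.items()
        for comment in comments
    ]
    return json.dumps(merged, indent=2, ensure_ascii=False)
```

Here comments_by_post maps each post's URI to the list of dictionaries obtained by parsing the JSON emitted by the Perl script for that post.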

In my previous blog post I explained why I did not use any of Nikola’s ready-made themes, and instead wrote my own templates from scratch. I followed the same approach for the blog as well. I began with a bare-bones post.tmpl template to render the individual blog posts. This outputs a blog_header (the header common to all blog posts), blog_entry (the contents of the post using post.title and post.text) and blog_footer at the bottom of the blog. This template is all that is needed to create a blog provided we disable taxonomies, but then we do not get the main blog index, the archive index and the category index.

To get the various blog indices, we must enable taxonomies, and create several other templates. To keep these templates as minimalist as possible, I split the post.tmpl template into a set of macros for blog_header, blog_entry and blog_footer. Then I created the following relatively short and simple templates by taking the templates from one of the ready-made themes and stripping them down to their bare essentials:

  • index.tmpl which loops through posts and inserts their contents using blog_entry to produce the main blog index.
  • list_post.tmpl which loops through posts and inserts links to these posts. The main use of this template is in archive.tmpl and tag.tmpl, which produce year-wise archives and tag-wise lists of blog posts. These two templates consist of a single line {% extends 'list_post.tmpl' %}.
  • list.tmpl which loops through years and inserts links to the index for each year to produce the main archive file.
  • tags.tmpl which loops through tags and inserts links to the index for each tag to produce the tag index.
  • comments_isso.tmpl which is needed only to enable Isso comments.
  • I also copied a few helper templates (index_helper.tmpl, archive_navigation_helper.tmpl and pagination_helper.tmpl) from one of the ready-made themes. These helper templates are imported by many of the above templates, and I was too lazy to try to eliminate them or at least strip them down.

Another issue that came up during migration related to the WordPress mirror of my Financial Markets blog. This mirror was set up around 2008 in response to the complaints about the uptime and response time of the Institute web server. Even though the uptime issues disappeared when the Institute web server moved to a high quality web hosting service, the WordPress mirror was retained because the social media integration of WordPress allowed blog posts to be automatically posted on Twitter and Facebook, and many readers came to my blog from these feeds and posted comments on the WordPress mirror. The mirror linked to the Institute server for image and document files, and all these links needed to be updated.

My solution to this problem was crude but effective. I logged in to my WordPress site and exported everything from the site. The exported data consists of a single XML file. Since this was a text file, I edited all the links there using a Python script to create an edited XML file. I tested it out by importing the edited XML file into a temporary WordPress site, and verifying a sample of links. Then I updated the main WordPress site by first deleting all posts and pages (there were no tags, categories or media to delete), and then importing the edited XML file into the empty site. It worked flawlessly.

The end result is that my Financial Markets blog has been successfully migrated to my server, and has also been correctly mirrored at WordPress.
