My approach to cloud services is “Head in the cloud, feet on the ground”. By “head in the cloud” I mean the effective use of cloud services to allow seamless access across multiple devices from anywhere. By “feet on the ground”, I mean a fully functional offline backup of the cloud so that all old data is usable even if the cloud is not available.
Head in the cloud
In my experience, backing things up to the cloud is the easier part of my “Head in the cloud, feet on the ground” strategy. I want my data in the cloud for three reasons:
- I do not want to lose important data if my hard disk crashes or I lose my phone.
- I might need some of this data when I am not carrying my laptop with me.
- I might want to work on the same data from multiple devices.
Using multiple cloud providers ensures some degree of redundancy so that if one provider fails or imposes unacceptable terms, I can quickly shift to another.
Sensitive files are always stored encrypted. I use 7-Zip to compress lots of files into a single archive and encrypt the archive. Both the cloud password and the archive password are quite strong (say 20 letters, numbers and punctuation characters with about 100 bits of entropy). These two layers of protection should be sufficient to thwart a casual intruder, though they would not of course protect my data against the resources of a nation state. But then a nation state would probably find ways to get hold of my laptop and grab the files from there.
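The entropy figure can be sanity-checked with a little arithmetic. The sketch below assumes that every character is drawn uniformly at random from the 94 printable ASCII letters, digits and punctuation marks; the function names are illustrative only. Under that assumption a 20 character password carries about 131 bits, so "about 100 bits" is a cautious estimate that leaves room for a smaller alphabet or a less than perfectly random choice.

```python
import math
import secrets
import string

# Assumed alphabet: 52 letters + 10 digits + 32 punctuation marks = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20, alphabet=ALPHABET):
    """Draw each character independently and uniformly from the alphabet."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def entropy_bits(length, alphabet_size):
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)
```

Calling `entropy_bits(20, len(ALPHABET))` gives roughly 131, and about 16 such characters are already enough to exceed 100 bits.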
Feet on ground
“Feet on ground” means a fully functional offline backup of the cloud. I regard this as absolutely critical for several reasons:
- Many cloud services die: for example, see this “virtual graveyard” of tens of discontinued products from just one provider (Google). The death of Google Reader in particular was instrumental in convincing me of the need for a “feet on ground” strategy. As I write this post, another cloud service that I used to rely on is being shut down: Yahoo Pipes.
- Many cloud services change their terms of service: privacy agreements may be changed; free services may become paid or freemium services.
- Data in the cloud can be accessed only using the interface supported by the provider. It is hard to run tools of our choice on the cloud data unless (a) the API allows it, and (b) we are willing to expend the time and effort to understand that API.
- There are situations where my internet connectivity is poor or non-existent and downloading large amounts of data from the cloud becomes impractical.
“Feet on ground” is by far the harder half of my “Head in the cloud, feet on the ground” strategy. It is not enough to have a complete data dump in a format which the cloud provider (or even a competing cloud provider) understands. “Feet on ground” means having data in a format that I can use immediately. In other words, I want offline data and an offline client that gives me convenient access to the data. It also means that the data backup must be done quite frequently (at least once a day).
To accomplish this objective, we have to find suitable methods of offline access to each cloud service that we use. Each service is different and needs its own method.
Offline email access
In this post, I will explain how I implement this strategy for one cloud service – email. Email is critical for several reasons:
- Email by itself accounts for a major part of my dependence on the cloud.
- A lot of other things can be converted into email. For example, SMS messages, call logs, WhatsApp chats can all be backed up from my phone as email. In a subsequent post, I will describe how I convert blog feeds into email.
At the same time, the online+offline email solution is relatively easy because:
- There are good email cloud services from all the major cloud service providers.
- There are plenty of good desktop email clients.
For example, at one time, I used Mozilla Thunderbird as my email client (set up to never delete mails from the server) and as long as I ran Thunderbird daily, I had a complete local backup of all my email. I did not have to use Thunderbird as my regular email client; I simply had to run it regularly to download mail. At any point, if the cloud was not accessible, I could launch Thunderbird and access all my old mail. The same method would work with any other desktop email client on any platform (Linux, Mac or Windows).
But running a GUI email client is a pain if one is not actually using it as an email client. AFAIK, it is not possible to run Thunderbird or any other GUI email client in the background (headless) to just download the mails and exit. Moreover, I found it difficult to remain within the size limits of Thunderbird’s mbox file format and decided to move to a maildir-based mail store. Maildir has practically no size limit because each email is in a separate file, unlike the mbox format where all emails in a single folder are physically in a single massive mbox file.
The solution that I have adopted works on Linux and might work on Mac as well, but would be non-trivial to replicate on Windows. I now use offlineimap to sync email between the cloud and an offline maildir repository. This is a pure command line tool that runs in the background automatically without any manual intervention and downloads emails regularly. Many of the Linux GUI desktop email clients (like Evolution and Balsa) will happily read email from a local maildir folder and so this provides a complete solution.
In most Linux distributions, offlineimap is available in the repositories and can be easily installed using your favourite package manager. Setting up offlineimap is also quite easy; it ships with a sample minimal configuration file which can be quickly edited to suit one’s needs:
# Sample minimal config file. Copy this to ~/.offlineimaprc and edit to
# get started fast.

[general]
accounts = Test

[Account Test]
localrepository = Local
remoterepository = Remote

[Repository Local]
type = Maildir
localfolders = ~/Test

[Repository Remote]
type = IMAP
remotehost = examplehost
remoteuser = jgoerzen
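One way to run the sync unattended (for example from a daily scheduled job) is a small wrapper around the offlineimap command line. This is only a sketch: the function names are made up for illustration, and it assumes that offlineimap is on the PATH and that the `-o` (run once), `-u quiet` (quiet interface), `-c` (config file) and `-a` (account list) options behave as documented for your installed version.

```python
import shutil
import subprocess


def build_sync_command(config_path=None, account=None):
    """Build the argument list for a single, quiet offlineimap run."""
    command = ["offlineimap", "-o", "-u", "quiet"]
    if config_path:
        command += ["-c", config_path]  # use a config other than ~/.offlineimaprc
    if account:
        command += ["-a", account]  # sync only the named account(s)
    return command


def run_sync(config_path=None, account=None):
    """Run one sync and return offlineimap's exit code."""
    if shutil.which("offlineimap") is None:
        raise FileNotFoundError("offlineimap was not found on the PATH")
    result = subprocess.run(build_sync_command(config_path, account), check=False)
    return result.returncode
```

With the sample configuration above, `run_sync(account="Test")` would perform one pass over the Test account and return; a non-zero return value suggests that the sync needs attention.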
Once offlineimap has downloaded the mail, many desktop email clients can read it. My current choice of email client is very minimalist – notmuchmail with either alot or emacs as the front end. The primary reason for this choice is the fantastic search capability of notmuchmail. In my experience, it is far superior to that of any of the webmail services and any of the desktop email clients that I have used. It also scales very well: I have been using it on a maildir folder with about 150,000 emails (≈ 40 GB) without any difficulties, and it is reported to be working nicely even with millions of emails (for example, here is a user with an antiquated Pentium 4 machine managing 8 million emails with notmuchmail).
An important benefit of a command line tool to download email is that I am not wedded to any particular email client. Years from now, if I decide to use a different client, I can do so by merely pointing it to this local maildir folder. In fact, I can use multiple email clients on the same folder at the same time.
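Because a maildir is just a directory of message files, even a few lines of script can act as a rudimentary "client". The sketch below uses Python's standard `mailbox` module to list senders and subjects; the function name is illustrative, and the folder path in the usage note is an assumption about how the local repository is laid out.

```python
import mailbox


def list_subjects(maildir_path):
    """Return sorted (sender, subject) pairs for every message in a maildir."""
    # create=False makes a wrong path fail loudly instead of creating an empty maildir.
    box = mailbox.Maildir(maildir_path, create=False)
    try:
        return sorted(
            (str(message.get("From", "")), str(message.get("Subject", "")))
            for message in box
        )
    finally:
        box.close()
```

For instance, if offlineimap has stored the inbox under `~/Test/INBOX` (an assumed folder name), `list_subjects(os.path.expanduser("~/Test/INBOX"))` returns one pair per message after `import os`. The script only reads the folder, so it should coexist with other clients in the same way that multiple email clients do.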
Another advantage that I have gained from the offline email solution is that I have a single searchable repository of all my email from many different email addresses spanning over two decades including a lot of email from pre-webmail email servers.
The offline email solution makes possible a major privacy protection that I have not yet implemented. Since I have a complete offline email repository, it is possible to delete all but the last few months of email from the cloud. The idea is that if the webmail is compromised, the hacker does not get access to many years of email. By limiting the amount of email that a hacker can read, we also limit their ability to compromise other accounts by using social engineering techniques.
Offline access to other cloud services
Offline access to some cloud services like calendar is relatively easy. For example, Google Calendar provides a private address for each calendar that is created there. Using this private address, any command line downloading tool like cURL or wget or the Python Requests module can download the calendar as an ics file. Almost all major calendar software supports the iCalendar format and therefore setting up a “Head in the cloud, feet on the ground” calendar is quite easy.
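A minimal sketch of such a calendar download follows. It uses the standard library's `urllib.request` rather than the Requests module so that it needs nothing beyond Python itself; the function name, the sanity check and the temporary ".part" file are illustrative choices, and the private address must be supplied by you.

```python
import urllib.request
from pathlib import Path


def download_calendar(ics_url, destination, timeout=30):
    """Download an iCalendar file and save it, returning the number of bytes written."""
    with urllib.request.urlopen(ics_url, timeout=timeout) as response:
        data = response.read()
    # Basic sanity check so that an error page does not overwrite a good backup.
    if not data.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"BEGIN:VCALENDAR"):
        raise ValueError("response does not look like an iCalendar file")
    destination = Path(destination)
    partial = destination.with_name(destination.name + ".part")
    partial.write_bytes(data)
    partial.replace(destination)  # swap in the new copy only after a full download
    return len(data)
```

Run once a day with the calendar's private address and a local path such as `calendar.ics`, this keeps an offline copy that calendar software can open directly.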
But some cloud services are much more challenging. In a subsequent blog post, I will talk of building a “feet on ground” feed reader – a process that requires getting a number of tools to work together.