I have long desired a personal accounting software that can be populated with data from downloaded bank statements and other source documents with minimal manual intervention.
GnuCash is a widely used open source software with enough capability to be used by small businesses. However, its ability to import data from other formats is very limited. It can import from
Quicken, but since
Quicken is not a proper double entry software, this facility is not particularly useful. So even if I convert a downloaded bank statement into accounting entries, I still have to manually input these into
GnuCash. That is more work than I am willing to do.
A couple of months back, I discovered Plain Text Accounting and in particular
Ledger. The brilliant idea behind this class of software is that the entire accounting data resides in a plain text file as a series of journal entries. Importing data into
Ledger is just a question of writing text files. It is hard to imagine anything simpler than this. The only reports that
Ledger produces are again plain text and parsing them is not difficult to anybody familiar with regular expressions.
Suddenly, personal accounting became something that could be done relatively painlessly. I do not participate in the underground (black money) economy and so all my income flows through my bank account. Most of my large expenses are paid for through credit cards, cheques or online transfers. All of this is visible in my bank statements that I can download from the web sites of my banks or in credit card statements that also come to me by email. If I buy or sell stocks or mutual funds, I get statements by email from the broker, or the mutual fund or from the depository. The only thing that is left out is the relatively small expenses incurred in cash. Since I am too lazy to manually record and keep track of these, I use the simple accounting policy of treating cash withdrawals as expenses even though they may not yet have been spent. This may appear strange to corporate accountants, but is not unreasonable. If a company gets 5,000 copies of its letterhead printed, it expenses them in the same period even though 4,000 of those may still be in stock. I so no reason to treat pieces of paper printed by the central bank any differently. Accountants who might want to object can be silenced with one magic word – “materiality”. In any case, my personal accounting is not subject to any accounting standards.
So I choose to base all my personal accounting only on the statements received from banks and other various financial firms. These statements come in many different formats – XLS, TXT, HTML and PDF. All these formats other than PDF can be read by a spreadsheet program which can then save them as CSV text files. PDF files are harder, but over a period of time, I have figured out that I can extract the text from these file and then use regular expressions to parse this text into CSV files. (After trying various PDF conversion tools I have found that good old
ghostscript does the best job: even if the PDF file is ill-formed or slightly damaged,
gs -dBATCH -dNOPAUSE -sDEVICE=txtwrite will produce usable text.)
I decided to use Python to manage the whole process from parsing bank statements to creating the journal entries to running ledger and generating balance sheets and other statements. CSV files can be read into a Python
pandas DataFrame which can be sliced and diced in many ways quite easily. It is quite straightforward to use a series of regular expressions to determine the journal entries for each transaction. For example, the regular expression
ATM WDL identifies cash withdrawals in one of my bank accounts. To identify online purchases , I use a long regular expression, a part of which reads as follows:
(INB Flipkart)|(INB Avenues)|(INB Indigo)|(INB Amazon)||(INB Makemytrip)|(INB EBAY)
Using such techniques and some manual intervention, each line item in each source document is turned into a journal entry. I choose to represent a journal entry as a Python
list whose first element is the date and narration of the entry and the remaining elements are the line items of the entry each of which consists of an account name and amount (a Python
tuple). The whole journal is simply a list of such journal entries. This internal representation will be turned into a
Ledger input text file at a later stage.