Making the mundane interesting (a work tale)

Published 2024-11-26

tag(s): #programming #emacs

At work we have this newish system that ingests data daily, using different types of files as input.
I set up a couple of processes in my first months here that make sure the same data that goes into the old-school RDBMS also flows to the new, fancy cloud setup.

And now that this thing has been running for a while and seems stable, it was time to also migrate the historical data. A little over 10 years of it.
After checking with everyone involved, we determined that the best way to do it was to generate input files in the same format as the daily ones.

#1: Little proud moment

I exported the tables from the old system and from the new one. The idea was to remove from the generated files any data that had already been ingested and superseded.
And I had a little proud moment with Datum, since I was able to export a bunch of big files[1] without a hiccup. It even took less time than I would have expected.

For the record, the real winners here are pyodbc and Python's csv module, since they are doing all the work.
But I have been programming long enough to know that Datum's own code could have screwed up the work of either module, or added its own inefficiencies. And it didn't. So I am happy that I was wise enough to get out of the way.
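
For the curious, "getting out of the way" can look roughly like the sketch below. This is not Datum's actual code, just a minimal illustration of streaming a query result into a CSV file in batches, so a 9 GB table never has to fit in memory. The function name export_query and the batch size are made up for the example. It only relies on the standard DB-API cursor methods (execute, fetchmany, description), which pyodbc cursors provide.

```python
import csv


def export_query(cursor, sql, out_path, batch_size=10_000):
    """Stream the result of `sql` into a CSV file, one batch at a time.

    `cursor` is any DB-API cursor (pyodbc in the post; hypothetical here).
    Returns the number of data rows written.
    """
    cursor.execute(sql)
    rows_written = 0
    # newline="" is what the csv module documentation asks for on output files
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        # cursor.description has one entry per column; the name comes first
        writer.writerow([column[0] for column in cursor.description])
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break  # an empty batch means the result set is exhausted
            writer.writerows(batch)
            rows_written += len(batch)
    return rows_written
```

Because nothing in it is pyodbc-specific, the same function can be tried against an in-memory sqlite3 database before pointing it at the real thing.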

#2: The return of Common Lisp

A couple years ago, at Starz, I started replacing Python with Common Lisp. Purely for one-off scripts or tools that I built for my own use.
It was a great experience! And I became a little more proficient with CL. I feel I really learn a language only when I use it to solve "real" problems. No matter how mundane the problem seems at first sight, there's always some hidden challenge. Even more so when you are a beginner.

In this case, I started the diffing scripts using Python, which is the main language for my team at work. And it was fine, but while I was letting the first of these multi-GB files process, I figured I could give CL a try.[2]

So from the third file onward, the diffs and writing of cleaned up files ran in CL.
And it was awesome. First, because I love Lisps :) There's something to the simplicity of their syntax that hits my brain in just the right way.
And also, it was quite a bit faster than the Python version. But please don't draw any big conclusions from this, as neither script was particularly optimized (since they are one-offs).
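
To give an idea of what these diffing scripts do, here is a rough Python sketch. It is not the script I ran at work: the function names and the idea of identifying a row by a list of key columns are assumptions made for the example. It loads the keys of the rows already in the new system into a set, then streams the old export and keeps only the rows whose key is not there.

```python
import csv


def load_keys(path, key_columns):
    """Collect the key tuple of every row in a CSV export."""
    with open(path, newline="", encoding="utf-8") as fh:
        return {tuple(row[c] for c in key_columns) for row in csv.DictReader(fh)}


def write_unseen_rows(source_path, ingested_path, out_path, key_columns):
    """Copy rows from source_path whose key is absent from ingested_path.

    Only the keys of the ingested file are held in memory; the source
    file is read and written row by row. Returns the number of rows kept.
    """
    seen = load_keys(ingested_path, key_columns)
    kept = 0
    with open(source_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if tuple(row[c] for c in key_columns) not in seen:
                writer.writerow(row)
                kept += 1
    return kept
```

The sketch assumes both files have a header row and that the set of keys fits in memory, which is a much smaller ask than holding the whole file.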

#3: A toolbox well equipped

The weekend after I had to deal with Azure Blob Storage for the first time, I wrote azcli, a quick proto-package that wraps az. It supports listing files, uploads, and downloads.
It is pretty crude, but ever since I wrote it, I have rarely needed the browser to work with blob storage. So it is a win in my book.

And it came in handy once again, as I was wrapping up my work with the (in)famous files right around the end of the day, while working from home. Instead of babysitting each upload, I set up Emacs to dolist over all the files, with a reasonable sleep time between them. The uploads are async, and I didn't want to choke Emacs, the AZ CLI, or my connection. After all, it was going to run unattended, at night!
It was done after a couple hours :)
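
The actual loop was a few lines of Emacs Lisp on top of my wrapper. The same idea, translated to Python to match the rest of the examples, might look like the sketch below. Everything in it is illustrative: staggered_launch and build_command are hypothetical names, and the command each upload runs is left to the caller.

```python
import subprocess
import time


def staggered_launch(paths, build_command, delay_seconds,
                     launch=subprocess.Popen, sleep=time.sleep):
    """Start one background process per path, pausing between launches.

    build_command(path) returns the argument list to run for that file.
    launch and sleep are parameters so the loop can be tried without
    starting real processes or actually waiting.
    """
    started = []
    for index, path in enumerate(paths):
        if index > 0:
            sleep(delay_seconds)  # give the previous upload a head start
        # Popen returns right away, so the upload runs in the background
        started.append(launch(build_command(path)))
    return started
```

Here build_command would return whatever az invocation uploads a single file in your setup. The pause does not wait for an upload to finish, it only spaces out the starts, which was enough to keep things calm overnight.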

I know we Emacsers have a tendency to shave too many yaks, so I reckon there is a bit of an art to not overdo it.
But lately, the packages and functions I have for my own use have come in super handy in random tasks, just like this one. So yay for yak shaving.

Footnotes
  1. The biggest one was 9 GB.
  2. I guess I could have started working on another task, but this was less of a context switch than picking something completely different.
