Published 2024-11-26
tag(s): #programming #emacs
At work we have this newish system that ingests data daily, using different types of files as
input.
I set up a couple of processes in my first months here that make sure the same data that goes
into the old school RDBMS also flows to the new, fancy, cloud setup.
And now that this thing has been running for a while and seems stable, it was time to also
migrate the historical data. A little over 10 years of it.
After checking with everyone involved, it was determined that the best way to do it was to
generate input files in the same format as the daily ones.
I exported the tables from the old system, and the new one. The idea was to remove from the
generated files data that was already ingested and superseded.
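A minimal sketch of that clean-up idea, not the actual script: it assumes both exports are CSV files that share some key columns, and it keeps only the rows whose key is missing from the new system. The column names (`id`, `day`, `value`) and the helper names are made up for illustration, and the real rules for what counts as "superseded" are not covered here.

```python
import csv
import io


def load_keys(rows, key_fields):
    """Collect the key tuple of every row already present in the new system."""
    return {tuple(row[f] for f in key_fields) for row in rows}


def drop_ingested(old_rows, ingested_keys, key_fields):
    """Yield only the rows whose key is not already in the new system."""
    for row in old_rows:
        if tuple(row[f] for f in key_fields) not in ingested_keys:
            yield row


# Tiny in-memory stand-ins for the two exports (hypothetical columns).
old_export = io.StringIO(
    "id,day,value\n1,2014-01-01,a\n2,2014-01-02,b\n3,2014-01-03,c\n"
)
new_export = io.StringIO("id,day,value\n2,2014-01-02,b\n")

key_fields = ("id", "day")
ingested = load_keys(csv.DictReader(new_export), key_fields)
kept = list(drop_ingested(csv.DictReader(old_export), ingested, key_fields))
```

Holding only the keys of the new system in a set keeps memory use down, while the old export is streamed row by row, which matters once the files get into the multi-GB range.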
And I had a little proud moment with Datum, since I was able to export a bunch of big
files[1] without a hiccup. It even took less time than I would have expected.
For the record, the real winners here are pyodbc and Python's csv module, since they are
doing all the work.
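To give an idea of how little glue is needed between a database cursor and the csv module, here is a rough sketch. It is not Datum's code: `export_query` and the `readings` table are hypothetical, and sqlite3 stands in for pyodbc so the example runs anywhere. The function only relies on the DB-API cursor methods (`execute`, `description`, `fetchmany`), which pyodbc cursors also provide.

```python
import csv
import os
import sqlite3
import tempfile


def export_query(cursor, query, out_path, batch_size=10_000):
    """Stream a query result to a CSV file in batches; return the row count."""
    cursor.execute(query)
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # The first element of each description entry is the column name.
        writer.writerow([col[0] for col in cursor.description])
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
    return total


# Demo against an in-memory SQLite table (stand-in for the real RDBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)
out_path = os.path.join(tempfile.mkdtemp(), "readings.csv")
exported = export_query(
    conn.cursor(), "SELECT id, value FROM readings ORDER BY id", out_path, batch_size=2
)
with open(out_path, newline="", encoding="utf-8") as handle:
    lines = list(csv.reader(handle))
```

Fetching in batches instead of calling `fetchall` keeps memory flat no matter how big the table is, which is most of what "getting out of the way" means here.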
But I have been programming long enough to know that Datum's own code could have screwed up
the work of either module, or added its own inefficiencies. And it didn't. So I am happy that
I was wise enough to get out of the way.
A couple years ago, at Starz, I started replacing Python with Common Lisp. Purely for one-off
scripts or tools that I built for my own use.
It was a great experience!!! And I became a little more proficient with CL. I feel I really
learn a language only when I use it to solve "real" problems. It doesn't matter how mundane
the problem seems at first sight, there's always some hidden challenge. Even more so when you
are a beginner.
In this case, I started the diffing scripts using Python, which is the main language for my team at work. And it was fine, but while I was letting the first of these multi-GB files process, I figured I could give CL a try.[2]
So from the third file onward, the diffs and writing of cleaned up files ran in CL.
And it was awesome. First, because I love Lisps :) There's something to the
simplicity of their syntax that hits my brain in just the right way.
And also, it was quite a bit faster than the Python version. But please don't draw any big
conclusions from this, as neither script was particularly optimized (since they are one-offs).
The weekend after I had to deal with Azure Blob Storage for the first time, I wrote a quick
wrapper for az, the proto-package azcli. It supports listing files, uploads, and downloads.
It is pretty crude, but ever since I wrote it, I rarely needed the browser to work with blob
storage. So it is a win in my book.
And it came in handy once again, as I was wrapping up my work with the (in)famous files right
around the end of the day, and working from home. Instead of babysitting each upload, I set up
Emacs to dolist over all the files, with a reasonable sleep time between them. The uploads
are async, and I didn't want to choke either Emacs, or the AZ CLI, or my connection. After
all, it was going to run unattended, at night!
It was done after a couple hours :)
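The real thing was an Emacs Lisp dolist on top of azcli, but the same pacing idea can be sketched in Python for anyone not living in Emacs. Everything here is an assumption for illustration: the helper names, the account and container names, and the 90 second pause. The command uses the `az storage blob upload` subcommand with its `--account-name`, `--container-name`, `--name` and `--file` options, and leaves authentication out entirely.

```python
import subprocess
import time
from pathlib import Path


def upload_command(path, account, container):
    """Build the az CLI call that uploads one file as a blob."""
    return [
        "az", "storage", "blob", "upload",
        "--account-name", account,
        "--container-name", container,
        "--name", Path(path).name,
        "--file", str(path),
    ]


def upload_all(paths, account, container, pause_seconds=60,
               start=subprocess.Popen, sleep=time.sleep):
    """Kick off one upload per file, pausing between launches."""
    paths = list(paths)
    started = []
    for index, path in enumerate(paths):
        # Popen returns immediately, so each upload runs in the background.
        started.append(start(upload_command(path, account, container)))
        if index < len(paths) - 1:
            sleep(pause_seconds)
    return started


# Dry run: record the commands and pauses instead of calling az or sleeping.
launched, pauses = [], []
upload_all(
    ["/tmp/part-01.csv", "/tmp/part-02.csv"],
    "myaccount",   # hypothetical storage account
    "history",     # hypothetical container
    pause_seconds=90,
    start=launched.append,
    sleep=pauses.append,
)
```

Passing `start` and `sleep` as parameters makes it possible to rehearse the whole batch without touching the network, which is a nice property for something meant to run unattended.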
I know we Emacsers have a tendency to shave too many yaks, so I reckon there is a bit of an
art to not overdo it.
But lately, the packages and functions I have for my own use have come in super handy in
random tasks, just like this one. So yay for yak shaving.