Published 2024-11-13
tag(s): #failures #programming
I am waiting for a deploy to finish. I completed a task while waiting, but I don't think I can complete another one before it's done, and then I remembered that I have a website I can use to complain! Yay, venting!
Will preface this rambling by saying that I have been on the other end of this: you sometimes make
decisions that later turn out to have really bad trade-offs. Sometimes things are
forced on you: use this tool, because we paid for it / I used it before / it has support.
And sometimes you make a decision without even realizing it, and then it is too late to change.
I feel I have to have a preface, because I know I can be cruel when I criticize something.[1] That doesn't mean I don't sympathize with, or understand, the owner(s) of this mess.
This is a deploy for database tools. The way the project is structured, you have scripts for
your objects: tables, procedures, etc. One per entity. There are also separate scripts to insert
data into reference tables and configuration tables that set up processes in the DB.
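To make that concrete, here is a minimal sketch of the layout. The file names, table, and columns are made up for illustration; they are not the actual project:

```sql
-- tables/customer_status.sql  (hypothetical: one script per object)
CREATE TABLE customer_status (
    status_code VARCHAR(20)  PRIMARY KEY,
    description VARCHAR(200) NOT NULL
);

-- data/customer_status.sql  (hypothetical: a separate script seeds the reference table)
INSERT INTO customer_status (status_code, description) VALUES
    ('ACTIVE',   'Customer is active'),
    ('DISABLED', 'Customer has been disabled');
```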
I am not sure I agree with separating the structure declaration from the data. But I don't
think that is "bad", just a matter of taste.
But then, for the sake of the deploy tool, you have to build a script that has all the things
you are adding in your release, in one gigantic .sql file, plus some crafty
code to delete older versions of the same tables and whatnot.
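That "crafty code" is, roughly, the usual drop-and-recreate pattern so the script can be run repeatedly. A made-up sketch of what one of these release files ends up looking like (the exact syntax depends on the engine, and this is not the project's actual tooling):

```sql
-- release/0042_add_customer_status.sql  (hypothetical: everything for the release in one file)

-- drop the previous version so the script can be re-run
DROP TABLE IF EXISTS customer_status;

-- the table definition again, duplicated from tables/customer_status.sql
CREATE TABLE customer_status (
    status_code VARCHAR(20)  PRIMARY KEY,
    description VARCHAR(200) NOT NULL
);

-- the reference data again, duplicated from data/customer_status.sql
INSERT INTO customer_status (status_code, description) VALUES
    ('ACTIVE',   'Customer is active'),
    ('DISABLED', 'Customer has been disabled');
```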
That re-runnable part is somewhat common: scripts you can run as many times as needed. But the duplication is so annoying. If you want to change, say, a column name, you have to remember to change it in both places. Which, hey, is my fault for forgetting, and if I worked in this codebase more often I would probably remember.[2] But having scripts that easily run into 500-line territory, and keeping them in sync with at least four other files (usually more) during development, is pretty messy.
OK, you have your scripts ready, you think they are in sync. Commit & push, go to GitHub and
run the action to deploy[3]. Now you have to wait until it finishes
running.
As spoiled in the post title, a deploy takes between 25 and 45 minutes to complete, although
the one that prompted this post finished a few minutes ago at 50 mins. They do get slower over
time.
Lower numbers mean that something is wrong in the script, but it can fail as late as literally the
last minute. And then you have to scour the logs, which are super noisy, to identify the
actual error, which is buried between tooling messages about exit codes and "error"
messages that simply say "something went wrong" (thank heavens for M-x occur to
deal with this part).
Why does it take so long? Because it builds a whole database from scratch. Which is good in some ways... well, only in one sense: testing your changes in a clean state. But the nature of the tooling means that if someone else adds a release script, and you didn't pull their changes into yours, your deploy will fail after 30+ minutes of running, even if everything in it is correct.
Which is why I was in such a bad mood just now :) and why I am venting. Because I lost 30 minutes just waiting, only because another team member added a script and gave it a lower version number than mine. But he had no way of knowing I had a release script with a higher version number in my branch. And if two people are working at the same time, whoever merges first will screw up the merge to the main branch for the other. Which I think is crazy.
I have no idea how they arrived at this solution.
I reckon having your own database to test against is handy, and even somewhat impressive from a tech
POV, since it recreates everything from zero.
But as more and more release scripts are added, deploying becomes so slow as to negate the
benefit. And you would think by now this would have been addressed by the team that owns this
project, but here we are.
And also, as more people work on the project, we keep stepping on each other's toes as scripts
are added, with no way of knowing other than constantly pulling each other's code. And if
some merge were to go wrong, or a script with an error is merged (which is wrong and bad, but
it can, and has, happened), then that breaks the deployments for everyone else.
Which you will find out about after 25+ minutes of waiting, if you can interpret the
error messages in the logs and realize this one wasn't on you.
I hate how this works.[4]