Published 2024-11-13
tag(s): #failures #programming
I am waiting for a deploy to finish. I completed a task while waiting, but I don't think I can complete another one before it's done, and then I remembered that I have a website I can use to complain! Yay, venting!
Will preface this rambling by saying that I have been on the other end of this: you sometimes make
decisions that later turn out to have really bad trade-offs. Sometimes things are
forced on you: use this tool, because we paid for it / I used it before / it has support.
And sometimes you make a decision without even realizing it, and then it is too late to change.
I feel I have to have a preface, because I know I can be cruel when I criticize something.[1] That doesn't mean I don't sympathize with, or understand, the owner(s) of this mess.
This is a deploy for database tools. The way the project is structured, you have scripts for
your objects: tables, procedures, etc. One per entity. There are also separate scripts to insert
data into reference tables and configuration tables that set up processes in the DB.
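To make that concrete, here is a minimal sketch of the layout. The file names, table, and columns are made up for illustration; they are not the actual project:

```sql
-- tables/customer_status.sql  (hypothetical: one script per object)
CREATE TABLE customer_status (
    status_code VARCHAR(20)  PRIMARY KEY,
    description VARCHAR(200) NOT NULL
);

-- data/customer_status.sql  (hypothetical: a separate script seeds the reference table)
INSERT INTO customer_status (status_code, description) VALUES
    ('ACTIVE',   'Customer is active'),
    ('DISABLED', 'Customer has been disabled');
```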
I am not sure I agree with separating the structure declaration from the data. But I don't
think that is "bad", just a matter of taste.
But then, for the sake of the deploy tool, you have to build a script that has all the things
you are adding in your release, in one gigantic .sql file, plus some crafty
code to delete older versions of the same tables and whatnot.
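That "crafty code" is, roughly, the usual drop-and-recreate pattern so the script can be run repeatedly. A made-up sketch of what one of these release files ends up looking like (the exact syntax depends on the engine, and this is not the project's actual tooling):

```sql
-- release/0042_add_customer_status.sql  (hypothetical: everything for the release in one file)

-- drop the previous version so the script can be re-run
DROP TABLE IF EXISTS customer_status;

-- the table definition again, duplicated from tables/customer_status.sql
CREATE TABLE customer_status (
    status_code VARCHAR(20)  PRIMARY KEY,
    description VARCHAR(200) NOT NULL
);

-- the reference data again, duplicated from data/customer_status.sql
INSERT INTO customer_status (status_code, description) VALUES
    ('ACTIVE',   'Customer is active'),
    ('DISABLED', 'Customer has been disabled');
```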
That re-runnable part is somewhat common: scripts you can run as many times as needed. But the duplication is so annoying. If you want to change, say, a column name, you have to remember to change it in both places. Which, hey, is my fault for forgetting, and if I worked in this codebase more often I would probably remember.[2] But having scripts that easily run into 500-line territory, and keeping them in sync with at least four other files (usually more) during development, is pretty messy.
OK, you have your scripts ready, you think they are in sync. Commit & push, go to GitHub and
run the action to deploy[3]. Now you have to wait until it finishes
running.
As spoiled in the post title, a deploy takes between 25 and 45 minutes to complete, although
the one that prompted this post finished a few minutes ago at 50 mins. They do get slower over
time.
Lower numbers mean that something is wrong in the script, but it can fail as late as literally the
last minute. And then you have to scour the logs, which are super noisy, to identify the
actual error, which is buried between tooling messages about exit codes and "error"
messages that simply say "something went wrong" (thank heavens for M-x occur to
deal with this part).
Why does it take so long? Because it builds a whole database from scratch. Which is good in some ways... well, only in one sense: testing your changes in a clean state. But the nature of the tooling means that if someone else adds a release script, and you didn't pull their changes into yours, your deploy will fail after 30+ minutes of running, even if everything in it is correct.
Which is why I was in such a bad mood just now :) and why I am venting. Because I lost 30 minutes just waiting, only because another team member added a script and gave it a lower version number than mine. But he had no way of knowing I had a release script with a higher version number in my branch. And if two people are working at the same time, whoever merges first will screw up the merge to the main branch for the other. Which I think is crazy.
I have no idea how they arrived at this solution.
I reckon having your own database to test against is handy, and even somewhat impressive from a tech
POV, since it recreates everything from zero.
But as more and more release scripts are added, deploying becomes so slow as to negate the
benefit. And you would think by now this would have been addressed by the team that owns this
project, but here we are.
And also, as more people work on the project, we keep stepping on each other's toes as scripts
are added, with no way of knowing other than constantly pulling each other's code. And if
some merge were to go wrong, or a script with an error is merged (which is wrong and bad, but
it can, and has, happened), then that breaks the deployments for everyone else.
Which you will find out about after 25+ minutes of waiting, if you can interpret the
error messages in the logs and realize this one wasn't on you.
I hate how this works.[4]