Messing up at work @ pablobm

30 April 2017 by Pablo Brasero Moreno

Messing up at work

#work #failure

Many years ago, I wrote (bad) code that sent a single marketing email, repeatedly, to addresses in a subscribers list. Imagine your inbox filling up with copies of the same email because some idiot got their code wrong. On the bright side, I realised what I had done quickly enough that most recipients only received 4 copies of the email (I think). At the time, I had 3 years of experience as a backend web developer.

How did you even manage that?

These were the ingredients:

A web application that, among other things, sent marketing emails.
At an admin’s request, a background process was created that sent all emails.
The process would loop through a subscribers list and send an email to each entry.
The background processing library would re-run a job if an exception was raised at any point during its execution.
At least one address in the subscribers list was malformed enough to cause the email library to raise an exception.

So there you are. Email sending job runs, fails halfway through the list, tries again from the very beginning. As a result, all addresses that got an email will get another one. Neat (not).
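
To make the mechanics concrete, here is a minimal sketch of that failure mode in Python. It is not the original code: the mailer, the error class, the retry loop and the limit of 4 attempts are all made up for illustration, standing in for the real email and background processing libraries.

```python
class MalformedAddressError(Exception):
    """Stand-in for whatever the email library raised (hypothetical)."""


def send_email(address, outbox):
    # Hypothetical mailer: rejects anything without an "@".
    if "@" not in address:
        raise MalformedAddressError(address)
    outbox.append(address)


def send_campaign(subscribers, outbox):
    # The whole list is a single job, with no error handling per email.
    for address in subscribers:
        send_email(address, outbox)


def run_with_retries(job, max_attempts):
    # Mimics a background library that re-runs a failed job from scratch.
    for _ in range(max_attempts):
        try:
            job()
            return True
        except Exception:
            continue
    return False


subscribers = [
    "ana@example.com",
    "bob@example.com",
    "not-an-address",
    "cat@example.com",
]
outbox = []
finished = run_with_retries(
    lambda: send_campaign(subscribers, outbox), max_attempts=4
)

# How many copies each subscriber ended up with.
copies = {address: outbox.count(address) for address in subscribers}
```

Every attempt emails the addresses that come before the malformed one, then blows up and starts over. Those addresses get one copy per attempt, and anyone listed after the bad address gets nothing at all.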

What should have happened instead (technically)

Some techniques that would have avoided this spring to mind:

Make sure to rescue exceptions when sending each email.
Create one background process for each email in the list, so that one failure doesn’t affect the rest.
Rescue exceptions for the whole thing just in case.
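
As a rough sketch of how these ideas could fit together, again in Python (where "rescuing" is spelled try/except): one unit of work per address, with errors caught around each send. The queue, the mailer and every name here are assumptions for illustration, not the API of any real library.

```python
from collections import deque


class MalformedAddressError(Exception):
    """Stand-in for whatever the email library raised (hypothetical)."""


def send_email(address, outbox):
    # Hypothetical mailer: rejects anything without an "@".
    if "@" not in address:
        raise MalformedAddressError(address)
    outbox.append(address)


def enqueue_campaign(subscribers):
    # One job per address, so a failure only ever concerns that address.
    return deque(subscribers)


def run_jobs(jobs, outbox):
    failed = []
    while jobs:
        address = jobs.popleft()
        try:
            send_email(address, outbox)
        except Exception as error:
            # Record the failure and move on; nothing already sent is redone.
            failed.append((address, repr(error)))
    return failed


subscribers = [
    "ana@example.com",
    "bob@example.com",
    "not-an-address",
    "cat@example.com",
]
outbox = []
failed = run_jobs(enqueue_campaign(subscribers), outbox)
```

Here each valid address receives exactly one email, and the malformed one ends up in a list that somebody can look at later. Catching every exception is a blunt tool that can hide real bugs, so the failures should at least be logged or reported somewhere visible.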

And of course, write specs/tests to make sure things are working the way you expect. All pretty reasonable, really.
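
A test for this case does not need to be long. The following is only a sketch, written as a plain pytest-style function; `send_campaign` and the fake mailer are hypothetical names, not the code from that job.

```python
def send_campaign(subscribers, send_email):
    """Send to every subscriber; return the addresses that failed."""
    failed = []
    for address in subscribers:
        try:
            send_email(address)
        except Exception:
            failed.append(address)
    return failed


def test_malformed_address_does_not_cause_duplicates():
    sent = []

    def fake_send_email(address):
        # Fake mailer standing in for the real library (hypothetical).
        if "@" not in address:
            raise ValueError(address)
        sent.append(address)

    failed = send_campaign(
        ["ana@example.com", "oops", "bob@example.com"], fake_send_email
    )

    # Each valid address gets exactly one email, in order.
    assert sent == ["ana@example.com", "bob@example.com"]
    # The bad address is reported instead of aborting the run.
    assert failed == ["oops"]
```

Passing the mailer in as an argument is a design choice made here so that the test can swap in a fake one without sending anything for real.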

Whose fault was it?

It’s natural to feel bad when you mess up building software. However, software engineering is fraught with difficulty and cannot be one person’s job. When this happened, I had written all the code myself, as I was pretty much the Software Development Department at that job. There were no standup meetings, pair programming or code review because there wasn’t anyone I could have them with. I was told what was needed, and I implemented and deployed it. This made me a single point of failure.

When individuals become a single point of failure, the mistake has already been made. Humans are not perfect and will make mistakes.

Aftermath

Fortunately for me, my line manager reacted pretty well and was understanding. Other people, directly affected by this, were less impressed, but I didn’t need to worry about that too much.

Now, it’s easy to invoke the ghost of Imposter Syndrome, and say that we needn’t worry about our ability and should simply keep going. It worked out for me at the time, but I wonder what could have happened, or what has happened and will still happen to other people in similar circumstances but in a less favourable environment.

Also: a moment to check my privilege. I’m a white male. Consider how many of my lot out there may have screwed up in a similar fashion with no repercussions. Consider how many in a different demographic may have been penalised after a similar event, because of bias, unconscious or otherwise.

What this can teach us

It’s not your fault

If you get in trouble for something like this, start looking for a new job: that environment is not conducive to your growth as a professional or a person. Having said that, I understand this is easier said than done. Not everyone enjoys circumstances where making this jump is comfortable, or even possible, regardless of their ability.

You might beat yourself up about it. Don’t. Share it with other people, both your loved ones and peers in the industry (who may be one and the same!). A local tech meetup can be a good place to exchange experiences and find that other people also have their own botch-up stories.

Find out how others would deal with this

Experiences like this make for a good interview question, for both sides of the conversation:

As an interviewer: tell us about a time you messed up. What did you learn from it?
As an interviewee: tell me about a time when somebody messed up here. How was that dealt with?

Fix the process, not the people

If something like this happens on your watch, ask yourself: how did the development process fail? What should be changed to avoid it repeating? What can the team learn from this experience?

In closing

You could read this as “There ain’t no such thing as individual failure in a software development team”. TANSTAIFIASDT. Catchy!