The Ultimate Guide to Debugging

How to Effectively Debug Software Failures and Bugs: From Investigation to Resolution.

Jan 04, 2024

Hello, friend!

First of all, sorry for being late this week. I've decided to publish my stories on Thursdays instead of Sundays.

This week, I'm bringing you a very interesting topic that is dear to my heart, especially because it took me years to get comfortable with it, and I'm still working on improving my debugging skills.

Debugging has always been a challenging part of software engineering. I have seen people who love wearing the detective hat, and people who hate this hat.

I’d estimate most of what we do as software engineers is debugging, maintaining existing systems, and fixing problems. So, it’s vital to be a good problem solver and debugger. To get there, follow the simple steps below…

Step #1: Observe and identify the error symptoms.

The first thing you need to do is identify the error symptoms. Don't worry about the cause now.

Ensure you have observed what's gone wrong by doing the following:

Checking the logs.
Checking the metrics.
Checking any fired alarms or tickets.

This step is crucial in giving you a full picture of what went wrong.
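To make the log check concrete, here is a minimal sketch that counts error messages and notes when each one first appeared. The log format, the `summarize_symptoms` helper, and the sample lines are all assumptions for illustration; your logs will almost certainly look different.

```python
import re
from collections import Counter

# Assumed log format (hypothetical): "<timestamp> <LEVEL> <message>"
LOG_LINE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")

def summarize_symptoms(lines, levels=("ERROR", "CRITICAL")):
    """Count messages at the given levels and record when each first appeared."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or match["level"] not in levels:
            continue
        msg = match["msg"]
        counts[msg] += 1
        first_seen.setdefault(msg, match["ts"])
    # Most frequent symptom first.
    return [(msg, n, first_seen[msg]) for msg, n in counts.most_common()]

# Made-up sample data, only to show the shape of the output.
sample_logs = [
    "2024-01-04T10:00:01Z INFO request served",
    "2024-01-04T10:00:02Z ERROR payment gateway timeout",
    "2024-01-04T10:00:03Z ERROR payment gateway timeout",
    "2024-01-04T10:00:04Z WARNING slow query",
    "2024-01-04T10:00:05Z ERROR cart not found",
]

for msg, n, ts in summarize_symptoms(sample_logs):
    print(f"{n}x {msg} (first seen {ts})")
```

The point is only to get a quick list of what is failing and since when, without yet guessing why.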

Step #2: Document.

Start documenting what went wrong and what you have observed. This keeps the information easy to access, whether you need it in the short term or the long term.

Ensure this documentation is stored in a location accessible to everyone on your team.
Alternatively, if your company uses an incident tool, opening an incident where you can add notes and documents would be ideal for future reference.

Remember, this step doesn't just end here; it starts here. From now on, start documenting everything you find during the investigation for swift and easy access.

Don't worry too much about beautifying your documentation if you're in a rush; you can always refine it later.

Step #3: If it’s an urgent issue, fix the symptom and unblock the pipeline.

If you are dealing with an urgent matter that needs to be unblocked, then prioritize fixing the symptom as soon as you can.

In this scenario, you are like a doctor: if the patient is out of breath, you first address the breathing issue before investigating the cause.

Here's what you need to do, in the same order:

Fix the symptom first and get everything back to normal.
Then proceed with investigating the root cause.

Step #4: If you can, replicate the error scenario with log lines.

By now, you should start investigating the root cause.

Trying to replicate the error scenario is a good starting point to trace down what happened exactly.

However, this step is optional if you feel that the metrics, logs, or alarms already provide enough information to build a full picture.

Sometimes, replicating the error scenario can be challenging, or it may not be worth the effort or time required.

In such cases, I'd recommend adding log lines or trying to replicate the error using tests.

You'll need to decide which type of tests (unit, integration, system, etc.) are appropriate based on the use case. Proceed to the next step for more details…
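As a minimal sketch of the log-line suggestion above, this is roughly what temporary diagnostic logging around a suspicious function could look like with Python's standard `logging` module. The `apply_discount` function and the `checkout` logger name are hypothetical; only the pattern of logging inputs, suspicious conditions, and outputs matters.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("checkout")  # hypothetical module name

def apply_discount(total, discount_percent):
    """Hypothetical function under suspicion; the log lines are the point."""
    # Log the inputs so a failing call can be matched to what it received.
    logger.debug("apply_discount called: total=%r discount_percent=%r", total, discount_percent)
    if not 0 <= discount_percent <= 100:
        # Flag the suspicious condition instead of silently continuing.
        logger.warning("discount_percent out of range: %r", discount_percent)
    discounted = total * (1 - discount_percent / 100)
    logger.debug("apply_discount result: %r", discounted)
    return discounted

apply_discount(200, 150)
```

Logging the arguments with `%r` keeps the difference between, say, `"150"` and `150` visible, which is often exactly the kind of detail you are hunting for.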

Step #5: Add breakpoints, logging, and tests.

If replicating the error proves challenging, proceed with the investigation by:

Writing log lines in the different parts of the code you suspect.
Adding breakpoints to the code.
Implementing tests (unit, integration, system, etc.) that replicate the error scenario. Choose the type of test based on the specific use case.
Utilizing debugging tools in your IDE or the programming language’s CLI debugging tool to examine the code steps using breakpoints.
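Here is a small sketch of a test that replicates an error scenario, using Python's built-in `unittest`. The `average_latency` function and the empty-input failure are invented for illustration; in a real investigation the test would call your own code with the inputs you observed in the logs.

```python
import unittest

def average_latency(samples):
    """Hypothetical function under investigation: it fails on an empty list."""
    return sum(samples) / len(samples)

class AverageLatencyRegressionTest(unittest.TestCase):
    def test_reproduces_crash_on_empty_input(self):
        # Pins down the reported failure: an empty batch raises ZeroDivisionError.
        with self.assertRaises(ZeroDivisionError):
            average_latency([])

    def test_normal_input_still_works(self):
        self.assertEqual(average_latency([10, 20, 30]), 20)

# Run the tests programmatically so this also works inside a script or notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageLatencyRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once the bug is fixed, the first test can be changed to assert the new, correct behavior, so it stays around as a regression test. For the breakpoint part, Python 3.7+ has a built-in `breakpoint()` call that drops you into the `pdb` debugger at that line; most IDEs offer the same thing with a click in the gutter.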

If for any reason you can't write tests for that particular use case, then proceed to the next step.

Step #6: If none of the above is useful in your case, then divide the code and do a binary search.

Consider dividing the code base into partitions and adding log lines to these sections.

Base your division on the errors and observations you've noted.

This should allow you to focus on certain partitions while ignoring others, which is the same approach taken in a binary search.
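The idea can be sketched on a toy example. Assume the code under suspicion can be viewed as a sequence of stages, the input is known to be good, the final output is known to be bad, and once the data goes bad it stays bad. The stage list, the `first_bad_stage` helper, and the check are all hypothetical; in a real code base the "check" is usually a log line or an assertion you add at the midpoint.

```python
def run_stages(stages, data, upto):
    """Run the first `upto` stages and return the intermediate result."""
    for stage in stages[:upto]:
        data = stage(data)
    return data

def first_bad_stage(stages, data, is_ok):
    """Return the index of the first stage whose output fails `is_ok`.

    Assumes the input passes `is_ok`, the full pipeline output fails it,
    and once the data goes bad it stays bad.
    """
    # Invariant: output after `lo` stages is ok, output after `hi` stages is bad.
    lo, hi = 0, len(stages)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_ok(run_stages(stages, data, mid)):
            lo = mid   # the problem is in the second half
        else:
            hi = mid   # the problem is in the first half
    return hi - 1

# Hypothetical pipeline: one stage introduces negative numbers.
stages = [
    lambda xs: [x + 1 for x in xs],
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x - 10 for x in xs],  # the culprit
    lambda xs: sorted(xs),
    lambda xs: [x * 3 for x in xs],
]

def all_non_negative(xs):
    return all(x >= 0 for x in xs)

culprit = first_bad_stage(stages, [1, 2, 3], all_non_negative)
print(f"first bad stage index: {culprit}")
```

Each check halves the area left to inspect, so even a long chain of steps needs only a handful of checks. If the suspect is a commit rather than a code path, `git bisect` applies the same idea to history.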

Step #7: Write down actions and learnings.

At this stage, you should have a document detailing all the steps and findings of your investigation.

It's important to record this information, as it can be very helpful to both your future self and your colleagues.

I've been lucky to work with colleagues who document their investigations. This saves a lot of time when the same or a similar problem appears again.

Also, it's beneficial to note any actions that could prevent such issues in the future.

If it's a minor action, go ahead and address it. If not:

Document it
Mark it as urgent
Make sure it's addressed by the team at the earliest opportunity.

Remember to share knowledge about the incident you've been debugging.
This builds your credibility in the team as someone who takes responsibility.

Bonus Points

Categorize the bugs

When managing a large number of error reports, organizing them into categories of related bugs is super useful.

Often, bugs in the same category share similar causes or display similar patterns.

Tackling one bug in a particular category can lead to the fix of others.
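One rough way to do this grouping automatically is to normalize each error message into a "signature" and bucket reports by it. This is only a sketch under simple assumptions: the report IDs, the messages, and the normalization rules (replace hex addresses and numbers with placeholders) are made up, and real reports usually need rules tuned to your own error formats.

```python
import re
from collections import defaultdict

def signature(message):
    """Normalize a message so reports differing only in IDs or numbers group together."""
    normalized = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # memory addresses
    normalized = re.sub(r"\d+", "<n>", normalized)            # IDs, durations, counts
    return normalized.strip().lower()

def categorize(reports):
    """Group (report_id, message) pairs by their normalized signature."""
    groups = defaultdict(list)
    for report_id, message in reports:
        groups[signature(message)].append(report_id)
    return dict(groups)

# Made-up reports for illustration.
reports = [
    ("BUG-1", "Timeout after 30s calling orders service"),
    ("BUG-2", "Timeout after 45s calling orders service"),
    ("BUG-3", "User 1042 not found"),
    ("BUG-4", "User 77 not found"),
    ("BUG-5", "Null pointer at 0x7ffe12"),
]

for sig, ids in categorize(reports).items():
    print(f"{sig}: {', '.join(ids)}")
```

Five reports collapse into three categories here, which makes it easier to pick one representative bug per category to investigate first.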

Rubber ducking

Explaining things out loud, whether to yourself or a colleague, is always useful.

With complex bugs, pair programming can be particularly effective. It allows you to gain different perspectives on the issue and often leads to a more prompt resolution.

Avoid trying to be a hero. My general rule of thumb is that if I can't solve a problem within two hours, I ask someone to pair with me so we can debug the issue together.

When two people collaborate on the same problem, it gets fixed much quicker.

Take breaks

Debugging is a mentally demanding process. So it's important to take breaks to avoid mental burnout.

Overworking your brain benefits no one.

If you're working on an open incident, make sure to inform others when you're taking a break.

Neuroscience also suggests that breaks are beneficial for your subconscious, as it continues to work on solving the problem in the background.

Ensure your break involves time away from screens. Ideally, take a walk to refresh your mind.

Watch out for confirmation bias

Confirmation bias has been a thorn in my side for years (and I still suffer from it from time to time).

I've been cursed by the habit of rushing to conclusions prematurely. That's just how my mind works.

If you're like me, you might struggle with keeping your brain in check to avoid jumping to conclusions too soon.

I've found the 5 Whys method very useful. By asking 'why' five times (sometimes more), you should be able to drill down to the root cause.

In conclusion, navigating the deep oceans of debugging is more than just a technical journey; it's a personal challenge that teaches patience, persistence, and perspective.

The satisfaction of unraveling a particularly stubborn issue is an exceptional feeling.

It's not just about fixing code, but also about growing as a problem solver and a team player.

Remember, every bug solved is a story to tell and a lesson learned, contributing to your growth.

Thank you for reading! I hope you enjoyed it. Let me know by hitting the like button ❤️, which helps others find it on Substack, and share it to spread the love!

I’d be honored if you tell me some more tips and techniques you use for debugging. Let’s learn together!

Best of luck in your next bug investigation!

Talk soon,

— Basma

Chemil

Jan 27, 2024

Great write up!

Something I’d add to the “Take breaks” section is to get some sleep. So many times we end up figuring out a solution to a bug while sleeping. It helps a lot!

Fran Soto

Jan 6, 2024

I like the emphasis you put on documentation, Basma.

If we reach the point where we need debugging, it's often because there's no documentation around this kind of issue.

It would be a waste to fix the issue without leaving the knowledge base better for the next time.

An Engineer's Echo