
008: Complete Ownership During Incidents

Old School Burke
Photo by Alex Shute / Unsplash

There’s a constant temptation in our software engineering world to treat incidents as someone else’s problem. When your service experiences downtime because of an infra hiccup, it’s easy to say, “This is Infra’s problem,” and then sit back. But if you’re the service owner, you can’t just rent responsibility; you need to own it.

The Homeowner and the Plumbing

Imagine you’re a homeowner. One day, you notice water pooling in your living room. You call a plumber, and she patches up the leak. But even after the repair, you wouldn’t simply ignore the incident (hopefully). You’d want to know what caused the leak, whether there’s hidden damage, and what you can do to prevent it from happening again.

As a service owner, your infra team might be the experts fixing the “plumbing,” but you’re still the homeowner. It’s not enough to receive an incident report and then push it away like someone else’s problem. Instead, ask yourself:

  • What exactly went wrong?
  • How did this failure affect our service and users?
  • What steps can we take on our end to mitigate similar risks?

The team should still own the high-level narrative. This isn’t about micromanaging other experts; it’s about understanding the broader impact so you can drive improvements.

Full-Spectrum Accountability: Owning Every Phase

You don’t need to become an infra expert, but you do need a clear blueprint of your system. A good dashboard should do more than just alert you to an issue. It should hint at the “what” and the “why” behind it. That insight is what lets you ask the right questions and work effectively.

Even when the infra team takes the lead on an incident, your role as a service owner is critical throughout the entire incident lifecycle. Let’s break it down:

  • Detection & Triaging Phase: Make sure there’s clear communication about how the issue impacts the overall service and customer experience. Ask for context beyond the technical details. Think of it as checking how the “leak” is affecting your home.
  • Mitigation Phase: Stay engaged by monitoring key metrics and making sure that the updates capture both technical progress and business impact. Again, this isn’t about micromanaging, but about bridging the gap between technical fixes and the customer experience.
  • Post-Incident: Participate actively in a blameless postmortem. Where possible, ask targeted questions such as “What triggered this incident?” and “How can we adjust our processes or monitoring to prevent a recurrence?” Attend any operations review meeting where the incident is presented, or at the very least present the incident to your own org as the service owner.

So, to conclude: when an incident occurs, don’t simply pass the buck. Instead, engage and drive your team’s learning from it.

Actionable Steps for Managers

  • Understand what normal looks like for your service’s metrics. Regularly review your dashboards and identify deviations early. Look for patterns that signal systemic weaknesses.
  • Set aside some time after each incident to debrief with your team. Use these sessions to connect the technical details with the impact on users.
  • During postmortems, challenge your team with questions like, “How did this impact our users?” and “What can we do differently next time?” Engage actively so that every stakeholder is invested in the service’s long-term resilience.
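
If you want a feel for what “knowing your normal” means in practice, here is a minimal Python sketch. It is only an illustration, not how any particular dashboard or monitoring tool works: the function name `find_deviations`, the three-standard-deviation threshold, and the latency numbers are all made up for the example.

```python
from statistics import mean, stdev


def find_deviations(baseline, recent, threshold=3.0):
    """Return (index, value) pairs from `recent` that sit more than
    `threshold` standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return [(i, v) for i, v in enumerate(recent) if v != mu]
    return [
        (i, v)
        for i, v in enumerate(recent)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical p95 latency samples in milliseconds (illustrative only).
baseline_latency_ms = [120, 118, 125, 122, 119, 121, 124, 120]
recent_latency_ms = [121, 123, 180, 119]

print(find_deviations(baseline_latency_ms, recent_latency_ms))
# → [(2, 180)]
```

The point is not the statistics. A fixed threshold like this is crude, and real traffic has daily and weekly patterns it ignores. What matters is that you, as the owner, have an explicit idea of what normal is, so a deviation prompts a question rather than a shrug.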

Own Your Service

In conclusion, own your service entirely, please. All the parts – including the messy parts. Especially the messy parts.

Don’t let others carry the entire weight of an incident that happened to your service. Your engagement is essential. When an incident occurs, don’t simply pass the buck. Embrace the responsibility that comes with being a service owner. Understand the basics, work alongside your experts, and drive improvements that extend beyond a quick fix.

Because building resilient systems is a team effort.

P.S. Got a story about an incident where owning your role made all the difference? Drop me an email. Your experience could help someone else take that crucial next step!


Reference reads for owner vs renter mentality:

Owner or Renter Mentality
In a recent blog post by Dan Ryan, Ryan Search and Consulting, Are Your Staff members “Owners or Renters”?, great leaders find a powerful metaphor about “owners or renters” and the mindset, attitud…
Ownership is a requirement
One of the cultural values at Tecton is “be an owner, not a renter.” We’ve iterated on these values as the company has grown, but this particular value has been set in stone since the day I joined.…