Your hero-engineer might be your biggest operational risk!

The 2am hero - asset or a warning sign?


There is a type of engineer almost every company has.

The person who gets pulled into every serious incident, knows which service is actually failing just by looking at the dashboards, and can SSH in at 2am and somehow stabilise the system before leadership and stakeholders wake up.

Most organisations say they aim to build resilient systems, but what they actually reward is the resilient engineer.


Hero-driven reliability

Some systems are reliable not because they are well designed, but because one person refuses to let them fail. That person knows the edge cases that appear in no documentation, holds a mental model of the production system that exists nowhere else, and gets paged more often because they're the only one who can interpret the incident.

In a regulated industry like fintech, this is alarming not just operationally, but also from a regulatory standpoint. A system whose resilience depends on one person's availability or attention is not an operationally stable system.


What happens to the team around the hero?

Teams with a permanent hero on demand eventually lose their investigative depth. It does not happen all at once, but through small decisions, each of which seems sensible and reasonable at the time.

Someone starts to defer to the hero before fully investigating a problem themselves; someone else stops reading the architecture documentation because they know they can just ask.

The hero, to their credit, usually continues helping. They are often generous people who genuinely want to solve problems. But the accumulation of these small deferrals creates something structural: the team quietly replaces documentation with a person.

Then there is another team dynamic nobody likes talking about: the quiet resentment building up in competent engineers who aren't the hero, but who know that with better systems, planning, and knowledge distribution, the team and the system wouldn't need constant saving. This rarely surfaces as direct conflict. Instead it shows up as disengagement, and eventually as those engineers deciding that the team is not a place where they can grow.


Hero identity - the psychological angle

Not every hero engineer is protecting knowledge intentionally. In fact, it usually starts from a healthy place: they care deeply, move quickly, feel personally responsible for outcomes, and genuinely want the system and the team to succeed.

The organisation repeatedly rewards that behaviour: their visibility increases, leadership's trust grows, and their opinions start carrying more weight. After some time:

Some engineers stop experiencing it as a burden, and start experiencing it as an identity.

Once someone becomes professionally recognised as “the person who saves things,” sharing knowledge can start to feel like giving away status and relevance, and distributing it becomes strangely threatening.

Organisations that do not recognise this dynamic will keep treating it as a knowledge-management problem, when it is really a status and recognition problem that better documentation tooling alone cannot solve.


A version darker than individual psychology

Some companies are structured in such a way that certain systems are genuinely under-resourced for the load they are expected to carry. The technical debt is real, but the team is too small to address it. Incident rates are high, and so is the gap between "what the system needs" and "what the organisation has allocated".

In these environments, heroism is not a cultural problem but a resource problem. The arrangement works until it doesn't, because people leave, or just quietly stop caring.

Another major structural issue is building a prestige system around operational suffering. Calm systems do not generate much visibility, and nobody applauds a migration where nothing interesting happened. Once an organisation starts emotionally rewarding firefighting more than prevention, it is one step closer to permanent operational chaos.


It's not always a RED FLAG

In early-stage startups, some degree of heroism is unavoidable, even necessary. For the first 2-3 years of a company, systems are immature, teams are small, well-defined processes do not exist yet, and the company is still trying to capture as much of the market as it can.

In such environments, someone stepping up to hold things together through force of will is not dysfunction; it is survival.

The problem arises when companies never evolve beyond that stage psychologically, even after they have evolved operationally.

And even in mature systems, genuine crises warrant genuine recognition. An extraordinary incident handled with extraordinary effort deserves public acknowledgment. The problem is not the recognition.

The moment heroism shifts from exceptional to expected, from "this person did something remarkable" to "this is how we operate", something has gone wrong.


What does a mature engineering culture look like?

Good engineering organisations still recognise heroic effort, and they should.

Real incidents happen, systems fail unexpectedly, and these are the moments where intervention genuinely protects customers and the business. But a mature culture treats heroism as:

An exception in defence of the business, not a cultural identity.

As the system matures, incidents become rarer, recovery becomes predictable, operational knowledge is distributed, and, perhaps more importantly, the people who prevent or put out fires are valued as much as the people who build the system.


The uncomfortable part

Organisations that want to fix this usually start by talking about documentation, runbooks, on-call rotations and so on. These things matter a great deal, but there is another question to think about: what does your reward system look like?

If the engineer who quietly prevented three incidents last quarter received less visibility than the engineer who dramatically resolved one, the company has already set a precedent for which behaviour it chooses to incentivise.

Some organisations subconsciously prefer heroes because fixing systems structurally is expensive, politically difficult, and invisible.



Beyond Technical Engineering

Part 1 of 1

A series on the human side of engineering: hiring, onboarding, team culture, decision-making under pressure, and the leadership patterns that separate struggling teams from high-performing ones.