#004B – Project Aegis: The Answer Key

Case Study

Jan 18

Written By OH

If you read Project Aegis and felt slightly uncomfortable, that’s the point.

Aegis is the kind of program where risk management exists, but doesn’t function. The register grows, meetings & reporting happen… and then reality hits the team in the face during the first migration rehearsal.

This follow-up gives you a structured way to think about the situation—what matters, what doesn’t, what to do first, and why.

A risk register alone is not risk management. Risk management is when risks change decisions.

First: What Aegis is really about

This case study is not mainly about “risk process quality.”

It’s about this:

people "tracking" risks but not developing & funding responses
leadership resisting “red” even when facts are red
vendors soft-pedaling reality until the issue becomes unavoidable
a PM caught between transparency and political survival
and a program with a public/regulatory deadline that removed room for denial

It’s a governance and behavior problem wearing a risk-management skeleton costume. Proper Halloween! :)

1) What went wrong in risk management implementation?

The simple diagnosis

Risk management became an administrative activity instead of a decision-making tool.

What that looked like in Aegis

Single workshop at the start → nothing sustained.
Generic risks (“data migration complexity”) → no actionable detail.
“Owners” assigned → but they don’t act like owners: no actions taken, no decisions driven.
Responses = “Monitor” / “TBD” → no response plan and no accountability.
Register size grows → usefulness diminishes.
Status reporting becomes image management instead of real control & course correction.

The deeper issue, aka "the real killer"

Risk management wasn’t integrated into how the program was run:

not tied to schedule buffers
not tied to contingency budgets
not tied to triggers (“if X happens, we do Y”)
not tied to governance thresholds (“if this threshold is crossed, we escalate”)

If your risks don’t create actions, owners, funding, and decisions—your risk process is only a reporting exercise with no added value.
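To make that test concrete, here is a minimal sketch of a register check that flags entries which are still "reporting only". The `RiskEntry` fields, the placeholder list, and the sample entries are all hypothetical, invented for illustration; they are not taken from the Aegis register.

```python
from dataclasses import dataclass
from typing import Optional

# Responses that signal "no plan yet" (illustrative set, not a standard).
PLACEHOLDER_RESPONSES = {"", "tbd", "monitor"}


@dataclass
class RiskEntry:
    title: str
    owner: Optional[str] = None    # a named person with capacity to act
    response: str = "TBD"          # what will actually be done
    funded: bool = False           # budget/staffing explicitly approved
    trigger: Optional[str] = None  # "if X happens, we do Y"


def gaps(entry: RiskEntry) -> list[str]:
    """Return the reasons this entry is still a reporting-only item."""
    found = []
    if not entry.owner:
        found.append("no named owner")
    if entry.response.strip().lower() in PLACEHOLDER_RESPONSES:
        found.append("no response plan")
    if not entry.funded:
        found.append("response not funded")
    if not entry.trigger:
        found.append("no trigger defined")
    return found


# Hypothetical register entries.
register = [
    RiskEntry("Data migration complexity", owner="Head of Data", response="Monitor"),
    RiskEntry(
        "Rehearsal defect rate stays above gate",
        owner="Migration lead",
        response="Add second reconciliation team",
        funded=True,
        trigger="Defect rate above gate after cycle 2",
    ),
]

for entry in register:
    print(entry.title, "->", gaps(entry) or "actionable")
```

The first entry has an owner on paper but fails the other three checks, which is roughly the Aegis pattern. The code is beside the point: the idea is that "actionable" can be checked rather than argued about.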

2) What failures must be addressed now?

When you’re the PM in the middle of this mess, you don’t have time to fix “everything.” Prioritize & Execute.

If you can only fix 2–3 things fast, focus on what stops the bleeding:

#1: Stop calling it a risk if it’s already an issue

The migration rehearsal failed. Defect rates and rework cycles have measurable impact. That’s no longer “possible future uncertainty.”

Call it what it is: a live issue with delivery impact.

Why it matters:

risks can be ignored
issues force decisions and actions

This is often the moment where governance either becomes real—or collapses into politics & image control.

#2: Force funded response planning for the top risks/issues

Not for all the risks. That’s noise.
For the few items on your critical path (migration, environments, performance, regulatory reporting), you need:

an owner with capacity to own & act
a response plan with dates and resources
an explicit funding / staffing decision
agreed triggers

Why it matters:

“Mitigation TBD” is not a mitigation
unfunded mitigation is mostly wishful thinking

#3: Align governance on decision thresholds

Aegis is stuck because everyone is afraid of being first to say “red.”

So you need a clear definition of what constitutes:

“amber”
“red”
a re-baseline discussion
SteerCo/governance board communication

Why it matters:

people fear opinion-based red flags
thresholds make the conversation objective

Red isn’t a mood. Red is a fact, reflecting a breach of agreed thresholds.
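A minimal sketch of what "agreed thresholds" can look like once they are written down. The metrics and numbers below are hypothetical; the real values would have to be agreed by the sponsor and the governance board, and the case does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    amber_at: float  # at or above this value the status is amber
    red_at: float    # at or above this value the status is red


def rag_status(value: float, t: Threshold) -> str:
    """Map a measured value to a status using pre-agreed thresholds."""
    if value >= t.red_at:
        return "red"
    if value >= t.amber_at:
        return "amber"
    return "green"


SEVERITY = {"green": 0, "amber": 1, "red": 2}


def overall(statuses: list[str]) -> str:
    """The overall status is the worst individual status."""
    return max(statuses, key=SEVERITY.__getitem__)


# Hypothetical thresholds, for illustration only.
slip = Threshold("critical-path slip (days)", amber_at=5, red_at=10)
defects = Threshold("open severity-1 migration defects", amber_at=3, red_at=8)

statuses = [rag_status(12, slip), rag_status(4, defects)]
print(statuses)           # ['red', 'amber']
print(overall(statuses))  # red
```

Nobody has to be the first to "feel" red here. The measured value either crosses the agreed line or it doesn't.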

What can wait (for later)

Once delivery is stabilized, you can rebuild the broader framework:

cleaner risk taxonomy / RBS
improved tool hygiene
regular workshop cadence across all workstreams; set the frequency and stick to it
enhanced reporting visuals
deeper quantitative analysis beyond top risks

But don’t start there. These are foundational, but they won’t stop immediate bleeding like the migration crisis.

If you do start there, you’ll end up managing the process while the program burns. Damage control/stabilization comes first.

3) How to classify and handle the migration situation right now

This is the moment where many programs die: when leadership tries to keep the language soft, despite the reality.

Classification - This is no longer a “migration risk”. It is a migration issue with schedule impact, plus a clear risk of regulatory noncompliance if mishandled.

Who needs to know (realistically)

At minimum:

Sponsor
Head of Data / migration leadership
Portfolio Director (because cross-program impact)
Vendors (because capacity and contract levers matter)
PMO / governance function (because reporting must match reality)

If a regulator is involved as a milestone owner (as the case suggests), then you must also be prepared for:

audit questions about when you knew
what you did about it
and whether reporting was accurate and transparent

Status reporting

The status should reflect the reality you can prove, not the optimism you hope for.

If the dress rehearsal is infeasible under current capacity, that should be visible as:

red on rehearsal readiness
amber/red on go-live confidence

But you can avoid chaos by reporting in a structured way:

Facts (what happened, what the data shows)
Impact (timeline, cost, compliance risk)
Options (what you can do next; it's very important you don't come to the table empty-handed)
Decision needed (what leadership must choose)

Re-baseline immediately or wait?

In practice:

You don’t re-baseline based on panic.
You also don’t delay until the project is dead.

The right move is often:

quick technical assessment (~ 1–2 weeks)
create 2–3 viable recovery scenarios
then re-baseline once leadership selects a path

Don’t re-baseline blindly.

But don’t pretend the old plan is still real.

4) What actions would you take in the next 2–4 weeks as the PM?

Here’s a realistic order that balances delivery first, governance second:

Step 1 — Run a short, direct situational assessment (delivery reality)

This is not “yet another meeting.” It’s a factual reset:

how many migration cycles are realistically needed?
what defect burn-down rate is achievable?
what data quality gates must be met?
what environments and test datasets are missing?
what’s the critical path now, not in last month’s schedule?

Expected outcome:

a credible picture of “where we really are”
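The burn-down question in particular is simple arithmetic, and it helps to show it. A rough sketch, assuming constant weekly fix and discovery rates (real rates vary cycle to cycle); every number below is hypothetical and not taken from the case.

```python
import math


def weeks_to_clear(open_defects: int, fixed_per_week: float, found_per_week: float) -> float:
    """Weeks until the defect backlog reaches zero at a constant net rate."""
    net_burn = fixed_per_week - found_per_week
    if net_burn <= 0:
        return math.inf  # the backlog never shrinks at these rates
    return math.ceil(open_defects / net_burn)


# Hypothetical figures, for illustration only.
open_defects = 120
weeks_available = 4  # assumed time left before the dress rehearsal

needed = weeks_to_clear(open_defects, fixed_per_week=30, found_per_week=10)
print(f"weeks needed: {needed}, weeks available: {weeks_available}")
print("feasible" if needed <= weeks_available else "not feasible at current capacity")
```

With these made-up numbers the backlog needs 6 weeks against 4 available. That gap is the kind of fact that belongs in the status report and feeds the recovery options.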

Step 2 — Build three recovery options (not ten - analysis paralysis is real)

Examples:

Option A: crash migration capacity (cost + secondary risks)
Option B: slip dress rehearsal / move go-live (political cost)
Option C: reduce scope / phased migration (risk trade-off)

Expected outcome:

leadership has something real to decide on

Step 3 — Secure decisions on capacity and funding

This is the key point where programs either recover or stay stuck.

You need:

named resources
agreed funding
committed vendor actions (and contract levers if needed)

Expected outcome:

mitigation becomes a real plan, not a ppt slide

Step 4 — Communicate in a controlled way (no surprises, no panic)

You don’t dump raw fear onto the client or the board.

You present:

reality + response + decision points

Expected outcome:

trust maintained even while reporting gets worse

Step 5 — Establish “minimum viable risk control”

Not a full risk process rebuild yet—just enough to stop the same failure mode repeating:

focus on top 10 risks/issues only
risk owners with actual authority and accountability
weekly review linked to delivery plan
clear triggers for escalation

Expected outcome:

risk management starts influencing delivery decisions again

5) What would you change going forward?

Once the program is stabilized, this is how you stop Aegis from repeating itself:

Governance

define risk thresholds (what triggers pre-defined actions)
define escalation rules & rights (who can force a decision); these should have been in the risk management plan from the very start
require funded responses for critical path risks

Workshops

risk workshops are not “kickoff events”
run them by workstream & across workstreams regularly; high-impact risks often hide in the grey zone between workstreams

Culture

“red = incompetence” cultural mindset needs to be dismantled. Decisively.

Contingency and buffers

stop using “5% generic contingency”
link contingency to real risk exposure; CCPM (Critical Chain Project Management) can guide you here
protect buffers like you protect schedule baselines
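One common way to link contingency to exposure is expected monetary value: probability times cost impact, summed over the risks that matter. The sketch below uses that technique as an assumption (the case does not prescribe it, and it is not CCPM buffer sizing); the risks, probabilities, and budget are hypothetical.

```python
# (risk, probability of occurring, cost impact if it occurs) - hypothetical values
risks = [
    ("Extra migration cycle needed", 0.6, 400_000),
    ("Test environment delivered late", 0.3, 250_000),
    ("Regulatory reporting rework", 0.2, 500_000),
]
budget = 5_000_000  # hypothetical program budget


def expected_exposure(items) -> float:
    """Sum of probability x impact across the listed risks."""
    return sum(probability * impact for _, probability, impact in items)


exposure = expected_exposure(risks)
flat_contingency = 0.05 * budget

print(f"risk-based exposure: {exposure:,.0f}")
print(f"flat 5% contingency: {flat_contingency:,.0f}")
print(f"shortfall:           {max(exposure - flat_contingency, 0):,.0f}")
```

With these made-up inputs, three risks alone carry an expected exposure of about 415,000 against a flat contingency of 250,000. The exact figure matters less than the fact that the contingency now traces back to named risks, so it can be defended and released deliberately.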

Integration with planning

risks must map to:

critical path tasks
dependencies
milestone readiness criteria

otherwise they remain “background noise”

If risk management doesn’t change the schedule, staffing, funding, or scope— it’s not management. It’s documentation.

The point of Aegis

Project Aegis failed not because people didn’t know the risks existed.

It failed because the organization created a system where:

acknowledging risks has a political cost
and ignoring them has no immediate penalty

At least not until reality arrives.

Tags: Case Study, Project Management, Risk Management, Project Recovery

OH www.ondrejhloch.com
