Zero-Downtime Migration: What 92% Test Coverage Means.

Author: Philipp Eiselt
Topic: Platform Delivery, ITSM Migration
Published: February 2026
Read time: 10 min

When we reported 92% test coverage in our steering committee, the response was almost always the same: a nod, a tick against the relevant line, and a move to the next agenda point. The number sounded good. But very few people in that room understood what it actually meant, or more importantly, what it did not mean.

The migration context.

The project was a full replacement of the IT service management platform (ServiceNow out, Jira Service Management in) for an organisation running 24/7 operations across multiple sites. The existing ServiceNow instance had been live for several years and had accumulated significant customisation: custom workflows, integrations with ERP and monitoring systems, and a large backlog of historical ticket data that business continuity requirements meant we could not simply abandon.

The constraint that made this hard was that we could not take the service desk offline. IT operations does not stop for a migration. Engineers needed to be able to log incidents, escalate, and track resolutions throughout go-live weekend. Any gap in that capability was not a technical failure; it was a business continuity failure. The go-live plan had to account for a live environment from hour one.

How we built the 92%.

Test coverage, on a migration like this, is not the same thing as unit test coverage in a software development context. We were not measuring lines of code. We were measuring scenarios. Specifically: of all the things this platform is expected to do in live operation, what proportion of them have we verified will work correctly in the new environment before we switch?

We started by building a scenario inventory from three sources: the existing ServiceNow workflow documentation (incomplete, as it always is), a series of working sessions with the service desk team and key users, and a review of the last twelve months of ticket data to identify volume patterns and edge cases. That gave us a list of roughly 340 scenarios, ranging from "engineer logs a P1 incident via the portal" to "monitoring system auto-creates a ticket and routes it to the correct assignment group via an API integration."

We then categorised each scenario by two dimensions: criticality (what breaks if this fails?) and complexity (how many system components are involved?). High criticality, high complexity scenarios got the most testing effort and multiple test cycles. Low criticality, low complexity scenarios got single-pass verification. The 92% figure meant that 92% of scenarios had been verified at least to the standard appropriate for their risk category. The remaining 8% were either edge cases with manual fallback procedures, or integration scenarios that could only be fully tested in production.
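The risk-weighted verification standard described above can be sketched in code. This is an illustrative model, not tooling from the project: the scenario names, the two-level scale, and the three-cycle requirement for the highest-risk category are assumptions chosen to match the description, not documented values.

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    LOW = 1
    HIGH = 2

@dataclass
class Scenario:
    name: str
    criticality: Level   # what breaks if this fails?
    complexity: Level    # how many system components are involved?
    cycles_passed: int   # verified test cycles completed so far

    @property
    def required_cycles(self) -> int:
        # High-criticality, high-complexity scenarios get multiple test
        # cycles; everything else gets single-pass verification.
        # (3 is an illustrative number, not from the article.)
        if self.criticality is Level.HIGH and self.complexity is Level.HIGH:
            return 3
        return 1

    @property
    def verified(self) -> bool:
        return self.cycles_passed >= self.required_cycles

def coverage(inventory: list[Scenario]) -> float:
    """Share of scenarios verified to the standard for their risk category."""
    return sum(1 for s in inventory if s.verified) / len(inventory)
```

The point of the model is that "coverage" here counts a scenario only once it meets the bar appropriate to its risk category, so a high-risk scenario with a single passing cycle still counts against the number.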

What happened on go-live day.

The cutover ran across a weekend. We went live on Sunday morning with a phased approach: read-only access to the old system for 72 hours while the new system took all new ticket creation. The first eight hours were the highest risk window: we had teams standing by on every major integration point, with pre-agreed rollback triggers if specific failure thresholds were crossed.
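The pre-agreed rollback triggers can be sketched as a simple threshold check that the standby teams evaluate against live metrics. The metric names and threshold values below are hypothetical; the article does not state which signals or limits were actually used.

```python
# Hypothetical rollback thresholds, agreed before cutover.
# None of these names or values are from the actual go-live plan.
ROLLBACK_TRIGGERS = {
    "ticket_creation_failure_rate": 0.05,  # >5% of new tickets failing
    "integration_error_rate": 0.10,        # >10% of integration calls failing
    "p1_queue_untouched_minutes": 30.0,    # P1 queue unworked for >30 min
}

def crossed_triggers(observed: dict[str, float]) -> list[str]:
    """Return the names of any pre-agreed triggers whose thresholds
    the observed metrics have crossed; metrics not reported count as 0."""
    return [name for name, limit in ROLLBACK_TRIGGERS.items()
            if observed.get(name, 0.0) > limit]
```

Writing the triggers down as explicit thresholds before the cutover is the point: it turns a high-pressure judgment call during the risk window into a pre-agreed mechanical check.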

Three issues emerged in the first 48 hours. Two were in the 8% we had flagged: a monitoring integration that behaved differently under real production load than in our test environment, and a custom escalation workflow built on ServiceNow-specific logic that did not translate directly. Both had documented workarounds. The third was genuinely unexpected: a browser compatibility issue with the new portal that affected users on an older version of a specific internal browser. It was minor, but it had not appeared in any of our test scenarios because we had not tested against that browser version.

Overall: zero major service interruptions, no data loss, no P1 incidents attributable to the migration. The ITSM error rate dropped 20% within the first month as the new platform's routing logic outperformed the customised ServiceNow workflows. By any measure that mattered, the go-live was successful.

What the number actually tells you.

The 92% figure was not a guarantee. It was a structured argument that we had done the right work in the right places, and that the residual risk was understood and manageable. That is a completely different claim from "everything will work." The value of the coverage number was not the number itself; it was the process of building it. The scenario inventory forced a conversation between the technical team and the business about what actually had to work, and that conversation surfaced assumptions that would otherwise have remained invisible until go-live.

If you are running a migration and someone asks you what your test coverage is, the right answer is not a percentage. The right answer starts with: here are the scenarios we defined as critical, here is how we tested them, and here is what we have planned for the things we could not test in advance. That answer takes longer to give. It is also the only one that is honest.

Philipp Eiselt

Independent consultant in IT Portfolio Management, PMO & Governance, and Digital Transformation. Based in APAC, working globally.
