Maintenance strategy is often treated as a narrative: a policy statement, a set of principles on a slide, or a consultant’s diagram that shows run-to-fail, time-based, and condition-based tiers. In live operations, none of that changes outcomes until it becomes something the enterprise can execute: explicit rules, master data, schedules, spares logic, and evidence that work was actually done the way the strategy assumes. From an architecture viewpoint, the gap between “we have a strategy” and “the plant behaves that way” is almost always an engineering problem disguised as a culture problem. If the strategy cannot be expressed as data and process, it will not survive contact with shift changeovers, contractor turnover, or a busy outage season.
This article is written from a Technical Architect viewpoint: the focus is how intent flows into systems, data, and controls, not only into slide decks. It sets out how to harden maintenance strategy so it is traceable, testable, and operable, without turning the organisation into a bureaucracy. It sits alongside thinking on asset criticality and ISO 55001 as a lived system: strategy is the bridge between risk appetite in the boardroom and the work order someone closes on night shift.
Why “strategy on paper” fails in the field
Most failures are not philosophical. Teams do not ignore the stated strategy because they dislike reliability engineering. They ignore it because three conditions are missing.
First, the strategy is not disambiguated at asset-class level. “Condition based where justified” sounds reasonable until two engineers disagree on what “justified” means for the same pump train. Without class-level defaults and explicit exceptions, every decision becomes a debate.
Second, the CMMS or EAM layer does not encode the strategy. If strategy lives in a PDF and work plans live in tribal knowledge, the system of record describes a generic world, not your risk choices. Planners then optimise for throughput and noise, not for the failure modes you care about.
Third, there is no closed loop between failures, near misses, and strategy review. A maintenance strategy that never changes after major incidents is a decorative document. Architecture here means feedback paths: incident data, bad actors, and cost of work must flow back into a governed review, not only into a monthly operations report.
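To make the loop concrete, here is a minimal sketch of one feedback path, assuming a simple failure event extract. The field names and the threshold are illustrative assumptions, not a product schema; the point is that repeat failures raise a review flag automatically instead of waiting for someone to notice a trend in a monthly report.

```python
from collections import Counter

# Hypothetical feedback trigger: asset classes with repeat failures inside a
# review window become candidates for a governed strategy review.
def review_candidates(failure_events: list[dict], threshold: int = 3) -> list[str]:
    """Return asset classes whose failure count crosses the review threshold."""
    counts = Counter(event["asset_class"] for event in failure_events)
    return [cls for cls, n in counts.items() if n >= threshold]
```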
Define strategy as decision rules, not slogans
A useful strategy answers, for each asset class, the same compact questions:
- What failure modes matter for safety, environment, production, and compliance?
- What is the default intervention pattern before functional failure (inspection frequency, condition monitoring, fixed-time overhaul, run-to-fail with contingency)?
- What evidence is required to prove the intervention happened, and to what standard?
- What changes when context changes (redundancy lost, duty cycle uprated, seasonal mode)?
Those answers should read like a specification, not a vision statement. If reliability engineering cannot translate policy into rules that a planner can apply without calling them, the strategy is not finished.
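To show what “reads like a specification” can mean in practice, here is a minimal sketch of class-level rules expressed as data. Every type and field name below is hypothetical; the point is that each of the questions above becomes a field a planner can read and a system can validate.

```python
from dataclasses import dataclass, field

@dataclass
class InterventionRule:
    failure_mode: str          # e.g. "bearing wear", "seal leak"
    consequence: str           # "safety" | "environment" | "production" | "compliance"
    treatment: str             # "condition_monitoring" | "inspection" | "fixed_time" | "run_to_fail"
    interval_days: int | None  # None for run-to-fail or purely condition-driven work
    evidence: str              # what proves the intervention happened, and to what standard

@dataclass
class AssetClassStrategy:
    asset_class: str
    rules: list[InterventionRule]
    # What changes when context changes, e.g. redundancy lost or duty uprated.
    context_overrides: dict[str, str] = field(default_factory=dict)

pump_strategy = AssetClassStrategy(
    asset_class="centrifugal_pump",
    rules=[
        InterventionRule("bearing wear", "production", "condition_monitoring", None,
                         "vibration spectrum attached to the work order"),
        InterventionRule("seal leak", "environment", "inspection", 30,
                         "signed inspection checklist with leak rate recorded"),
    ],
    context_overrides={"redundancy_lost": "treat as next criticality band up"},
)
```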
Tie rules to criticality and consequence
Criticality is the prioritisation lens; maintenance strategy is the treatment plan. When the two are developed separately, you get high-criticality assets maintained like low-criticality ones because the PM catalogue was inherited from a legacy implementation. The fix is not more meetings. It is a controlled matrix that maps criticality bands and dominant failure modes to approved treatments, then propagates that mapping into master data and job plans. Organisations that treat solution design as part of strategy definition usually catch mismatches earlier, before tens of thousands of work orders encode the wrong defaults.
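A minimal sketch of such a matrix follows, with illustrative bands, failure modes, and treatment names. The deliberate failure on a missing entry is the point: gaps should surface as governed exceptions, not as silent planner choices.

```python
# Hypothetical controlled matrix: (criticality band, dominant failure mode) -> treatment.
TREATMENT_MATRIX: dict[tuple[str, str], str] = {
    ("A", "bearing wear"): "condition_monitoring",
    ("A", "seal leak"):    "fixed_time_overhaul",
    ("B", "bearing wear"): "periodic_inspection",
    ("C", "bearing wear"): "run_to_fail_with_spares",
}

def approved_treatment(criticality_band: str, failure_mode: str) -> str:
    """Return the approved default, or fail loudly so the gap is escalated."""
    try:
        return TREATMENT_MATRIX[(criticality_band, failure_mode)]
    except KeyError:
        raise LookupError(
            f"No approved treatment for band {criticality_band!r} / "
            f"mode {failure_mode!r}: route to the strategy owner for a decision."
        )
```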
Encode the strategy where work is actually planned
Execution systems only behave as well as the data model allows. Practically, that means:
- Asset classifications that reflect how you think about risk and failure physics, not only accounting codes.
- Job plans and routes that spell out method, tolerances, and required measurements, not only “inspect per OEM”.
- Materials and spares policies aligned to the same risk logic (what sits on site, what is regional, what is bought on failure).
- Prioritisation and scheduling rules that make high consequence work visible when capacity is constrained.
None of this replaces judgement on unusual events. It reduces the number of decisions that require heroics. Good architecture also avoids “shadow maintenance”: parallel spreadsheets and messaging channels that prove the official process is too slow or too vague. If people route around the system, treat that as a signal that the encoded strategy is incomplete or untrusted.
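One way to detect that drift early is a periodic conformance sweep over a CMMS extract, comparing what the system encodes against the approved matrix. The record layout below is an assumption for illustration; in practice the fields would come from your EAM export or API.

```python
from typing import Iterable

def find_strategy_gaps(assets: Iterable[dict],
                       matrix: dict[tuple[str, str], str]) -> list[dict]:
    """Flag assets whose encoded PM treatment differs from the approved default."""
    gaps = []
    for asset in assets:
        key = (asset["criticality_band"], asset["dominant_failure_mode"])
        expected = matrix.get(key)
        if expected is not None and asset["encoded_treatment"] != expected:
            gaps.append({
                "asset_id": asset["asset_id"],
                "expected": expected,
                "actual": asset["encoded_treatment"],
            })
    return gaps
```

Run on a schedule, a list like this turns “the CMMS does not match the strategy” from an anecdote into a work queue.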
Governance that is light enough to use
Heavy governance kills adoption; absent governance kills consistency. A workable middle layer for strategy execution includes:
- A single owner for the strategy document set (often the asset management or reliability lead) with clear authority to publish versioned changes.
- A change-control path for PMs, routes, and BOMs that mirrors the risk of the change (high consequence assets get more scrutiny than low; see the sketch after this list).
- Defined data stewards for equipment records and failure coding so post-work analytics remains trustworthy.
- A quarterly or half-yearly review triggered by KPI thresholds and by significant events, not only by the calendar.
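As a sketch of what risk-proportionate change control can look like, with tier names and approver lists as placeholder assumptions to be set by the strategy owner:

```python
# Hypothetical approval tiers keyed by criticality band: more scrutiny where
# the consequence of a bad change is higher.
APPROVAL_TIERS: dict[str, list[str]] = {
    "A": ["reliability_lead", "operations_manager"],  # high consequence: two approvers
    "B": ["reliability_lead"],                        # medium: one approver
    "C": [],                                          # low: the steward publishes directly
}

def required_approvers(criticality_band: str) -> list[str]:
    """Default to the cautious path when the band is unknown."""
    return APPROVAL_TIERS.get(criticality_band, ["reliability_lead"])
```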
ISO’s asset management guidance family stresses alignment between objectives, plans, and operational controls. You do not need to chase certification to borrow that discipline: treat maintenance strategy as part of the management system, with the same seriousness as financial controls on large projects.
Measure whether the strategy is real
Pick a small set of operational tests that reflect intent, for example:
- Percentage of time-based PMs completed inside the planned window for your highest consequence class (sketched after this list).
- Percentage of work orders on critical classes with a structured problem code and failure mode where applicable.
- Repeat failure rate on assets where the strategy promises proactive treatment.
- Mean time to restore for scenarios you claim you can tolerate (run-to-fail is only acceptable if recovery is engineered and rehearsed).
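As an illustration of the first test, here is a sketch that assumes a simple work order extract; the column names are hypothetical, not a CMMS schema.

```python
from datetime import datetime

def pct_pms_in_window(work_orders: list[dict], band: str = "A") -> float:
    """Share of time-based PMs on the given band closed inside their planned window."""
    pms = [wo for wo in work_orders
           if wo["work_type"] == "PM" and wo["criticality_band"] == band]
    if not pms:
        return 0.0
    on_time = sum(
        1 for wo in pms
        if wo["actual_finish"] is not None
        and wo["window_start"] <= wo["actual_finish"] <= wo["window_end"]
    )
    return 100.0 * on_time / len(pms)

# Example: one band-A PM closed inside its window -> 100.0
wos = [{"work_type": "PM", "criticality_band": "A",
        "window_start": datetime(2024, 5, 1), "window_end": datetime(2024, 5, 7),
        "actual_finish": datetime(2024, 5, 3)}]
print(pct_pms_in_window(wos))
```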
If metrics stay flat while you insist the strategy improved, either the metrics are wrong or the strategy never reached the line. Both are architecture and data problems before they are motivational ones.
Closing the loop without boiling the ocean
You do not need a perfect asset register to begin. You do need a credible sequence: stabilise identity and class for the worst consequence assets first, align treatments to the agreed maintenance strategy, then widen scope as data quality and planner capacity allow. That sequencing mirrors sensible modernisation paths for enterprise work management estates: prove control where risk concentrates, then scale patterns that have already been shown to work in your own plants.
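A minimal sketch of that wave logic, assuming a consequence score already exists on each record and a wave size tuned to planner capacity (both are placeholders):

```python
def cleanup_waves(assets: list[dict], wave_size: int = 200) -> list[list[dict]]:
    """Order assets by consequence (worst first) and cut the backlog into
    waves sized to what planners can absorb alongside normal work."""
    ranked = sorted(assets, key=lambda a: a["consequence_score"], reverse=True)
    return [ranked[i:i + wave_size] for i in range(0, len(ranked), wave_size)]
```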
If you are refreshing how work is directed across regions, it helps to read how other teams frame sequencing between core Maximo Manage processes and adjacent analytics layers. Our commentary on sequencing MAS applications after Manage is written from a suite perspective, but the underlying idea is the same: get the operational backbone truthful before you invest in layers that amplify bad signals.