Are you a bit slow like me and sometimes have to stop and think to differentiate your key assurance metrics like your MTTRs from your MTBFs?
If so, I thought this useful diagram from researchgate.net might help.
The metrics are:
MTBF (Mean Time Between Failures) – the average elapsed time between failures of a system, service or device. It’s the basic measure of availability / reliability of the system / service / device. The higher, the better.
MTTR (Mean Time to Repair) – generally used to denote the average time to close a trouble ticket (to repair a failed system / service / device). It’s the basic measure of corrective action efficiency. The lower, the better.
Some also use MTTR to mean Mean Time to Recover / Resolve (i.e. MTTD + MTTR in the diagram above) or Mean Time to Respond (MTTD in the diagram above, the time to acknowledge an event and create a ticket). See why I get confused?
MTTD (Mean Time to Detect / Diagnose) – the average time taken from when an event is first generated and timestamped to when the NOC detects / diagnoses the cause and generates a ticket. The lower, the better.
MTTF (Mean Time to Failure) – the average system / service / device up-time.
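If numbers help more than definitions, here's a minimal Python sketch that works out all four metrics from a made-up incident log. The timestamps, the `incidents` list and the `assurance_metrics` function are hypothetical, invented purely for illustration. It assumes MTTD runs from the event timestamp to ticket creation, MTTR runs from ticket creation to service restoration (so MTTD + MTTR is the full outage), MTTF is the up-time between one restoration and the next failure, and MTBF is measured from the start of one failure to the start of the next. Your own tools may slice these differently.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log, oldest first.
# Each entry: (event_raised, ticket_created, service_restored)
incidents = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 30), datetime(2024, 1, 1, 2, 0)),
    (datetime(2024, 1, 3, 2, 0), datetime(2024, 1, 3, 2, 10), datetime(2024, 1, 3, 3, 0)),
    (datetime(2024, 1, 6, 3, 0), datetime(2024, 1, 6, 3, 20), datetime(2024, 1, 6, 5, 0)),
]


def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


def assurance_metrics(incidents):
    """Return MTTD, MTTR, MTTF and MTBF in hours (needs at least two incidents)."""
    # Consecutive incident pairs, used for the between-failure metrics.
    pairs = list(zip(incidents, incidents[1:]))
    return {
        # Event raised -> ticket created (assumed meaning of "detect").
        "MTTD": mean(hours(ticket - raised) for raised, ticket, _ in incidents),
        # Ticket created -> service restored (assumed meaning of "repair").
        "MTTR": mean(hours(restored - ticket) for _, ticket, restored in incidents),
        # Up-time: service restored -> next event raised.
        "MTTF": mean(hours(nxt[0] - cur[2]) for cur, nxt in pairs),
        # Start of one failure -> start of the next failure.
        "MTBF": mean(hours(nxt[0] - cur[0]) for cur, nxt in pairs),
    }


metrics = assurance_metrics(incidents)
for name, value in metrics.items():
    print(f"{name}: {value:.2f} h")
```

With this sample data, the gap between MTBF and MTTF is just the average outage (detect plus repair) of the incidents that have a following failure, which is one way to remember how the pieces fit together.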