Getting confused by key Assurance metrics?

Are you a bit slow like me and sometimes have to stop and think to differentiate your key assurance metrics like your MTTRs from your MTBFs?

If so, I thought this useful diagram from researchgate.net might help

The metrics are:

MTBF (Mean Time Between Failures) – the average elapsed time between failures of a system, service or device. It’s the basic measure of availability / reliability of the system / service / device. The higher, the better.

MTTR (Mean Time to Repair) – generally used to denote the average time to close a trouble ticket (to repair a failed system / service / device). It’s the basic measure of corrective action efficiency. The lower, the better.

Some also use MTTR as a Mean Time to Recover / Resolve (ie MTTD + MTTR in the diagram above) or Mean Time to Respond (MTTD in the diagram above to acknowledge an event and create a ticket). See why I get confused?

MTTD (Mean Time to Detect / Diagnose) – the average time taken from when an event is first generated and timestamped to when the NOC detects / diagnoses the cause and generates a ticket. The lower, the better.

MTTF (Mean Time to Failure) – the average system / service / device up-time

Read the Passionate About OSS Blog for more or Subscribe to the Passionate About OSS Blog by Email

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.