Getting confused by key Assurance metrics?

Are you a bit slow like me and sometimes have to stop and think to differentiate your key assurance metrics like your MTTRs from your MTBFs?

If so, I thought this useful diagram from researchgate.net might help

The metrics are:

MTBF (Mean Time Between Failures) – the average elapsed time between failures of a system, service or device. It’s the basic measure of availability / reliability of the system / service / device. The higher, the better.

MTTR (Mean Time to Repair) – generally used to denote the average time to close a trouble ticket (to repair a failed system / service / device). It’s the basic measure of corrective action efficiency. The lower, the better.

Some also use MTTR as a Mean Time to Recover / Resolve (ie MTTD + MTTR in the diagram above) or Mean Time to Respond (MTTD in the diagram above to acknowledge an event and create a ticket). See why I get confused?

MTTD (Mean Time to Detect / Diagnose) – the average time taken from when an event is first generated and timestamped to when the NOC detects / diagnoses the cause and generates a ticket. The lower, the better.

MTTF (Mean Time to Failure) – the average system / service / device up-time

If you’d like some further assistance on how to make use of these metrics and how to calculate them you can find some additional info over here at Resco.net

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.