Yep, this is the third part, so that might suggest that there were two lead-up articles prior to this one. Well, you’d be right:
- The first proposed that most of what we refer to as “service assurance” is really only “network infrastructure” assurance.
- The second then looked at the constraints we face in trying to reverse-engineer “network infrastructure” assurance into data that will allow us to assure customer services.
I should also point out that both posts, like today’s, were inspired by an interesting white paper from the Netrounds team titled, “Reimagining Service Assurance in the Digital Service Provider Era.”
Today we’ll discuss the approach/es to overcome the constraints described in yesterday’s post.
As shown via the inserted blue row in Table 6 below (source: Netrounds), a proposed solution is to use active measurements that reflect the end-to-end user experience.
The blue row only talks about the real-time monitoring of “synthetic user traffic,” in the table below. However, there are at least two other active measurement techniques that I can think of:
- We can monitor real user traffic if we use port-mirroring techniques
- We can also apply techniques such as TR-069 to collect real-time customer service meta data
Note: There are strengths and weaknesses of each of the three approaches, but we won’t dive into that here. Maybe another time.
You may recall in yesterday’s post that we couldn’t readily ask service-related questions of our traditional systems or data. Excitingly though, active measurement solutions do allow us to ask more customer-centric questions, like those shown in the orange box below. We can start to collect metrics that do relate directly to what the customer is paying for (eg real data throughput rates on a storage backup service). We can start to monitor real SLA metrics, not just proxy / vanity metrics (like device up-time).
Interestingly, I’ve only had the opportunity to use one vendor’s active measurement solutions so far (one synthetic transaction tool and one port-mirror tool). [The vendor is not Netrounds’ I should add. I haven’t seen Netround’s solution yet, just their insightful white paper]. Figure 3 actually does a great job of articulating why the other vendor’s UI (user interface) and APIs are currently lacking.
Whilst they do collect active metrics, the UI doesn’t allow the user to easily ask important service health questions of the data like in the orange box. Instead, the user has to dig around in all the metrics and make their own inferences. Similarly the APIs don’t allow for the identification of events (eg threshold crossing) or automatic push of notifications to external systems.
This leaves a gap in our ability to apply self-healing (automated resolution) and resolution prior to failure (prediction) algorithms like discussed in yesterday’s post. Excitingly, it can collect service-centric data. It just can’t close the loop with it yet!
More on the data tomorrow!Read the Passionate About OSS Blog for more or Subscribe to the Passionate About OSS Blog by Email