Inability to serve the market (eg offerings, capacity, etc)
Inability to operate network assets profitably
In that article, we looked closely at a human factor and how current trends of open-source, Agile and microservices might actually exacerbate it. In yesterday’s article we looked at market-serving factors for us to investigate and monitor.
But let’s look at point 3 today. The profitability factors we could consider that reduce the chances of the big boss getting fired are:
Ability to see revenues in near-real-time (revenues are relatively easy to collect, so we use these numbers a lot. Much harder are profitability measures because of the shared allocation of fixed costs)
Ability to see cost breakdown (particularly which parts of the technical solution are most costly, such as what device types / topologies are failing most often)
Ability to measure profitability by product type, customer, etc
Are there more profitable or cost-effective solutions available
Is there greater profitability that could be unlocked by simplification
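To show why profitability is harder to see than revenue, here is a minimal sketch of profit by product once shared fixed costs are allocated. The product names, the figures and the choice of revenue share as the allocation driver are all assumptions made for illustration; a real operator might allocate by traffic, port count or some other driver, and would get different answers.

```python
def profit_by_product(products, shared_fixed_costs):
    """Allocate shared fixed costs pro rata to revenue, then compute profit."""
    total_revenue = sum(p["revenue"] for p in products.values())
    result = {}
    for name, p in products.items():
        # Assumption: revenue share is the allocation driver.
        allocated = shared_fixed_costs * p["revenue"] / total_revenue
        result[name] = p["revenue"] - p["direct_cost"] - allocated
    return result


# Hypothetical monthly figures (in $k), purely for illustration.
products = {
    "broadband": {"revenue": 600, "direct_cost": 300},
    "voice": {"revenue": 300, "direct_cost": 200},
    "iptv": {"revenue": 100, "direct_cost": 90},
}

profits = profit_by_product(products, shared_fixed_costs=200)
for name, profit in sorted(profits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {profit:>8.1f}")
```

In this made-up example the smallest product looks profitable on direct costs alone, but slips into loss once it carries its share of the fixed costs, which is exactly the kind of visibility the list above is asking for.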
I’m currently reading a book entitled, “Jony Ive. The genius behind Apple’s greatest products.”
I’d like to share a paragraph with you from it (and probably expect a few more in coming days):
“…Apple’s internal culture heavily favored the engineers within the product groups. The design process was engineering driven. In the early days of Frog Design, the engineers had bent over backward to help implement the design team’s ambitions, but now the power had shifted. The different engineering groups gave their products in development to Brunner’s group, who were expected to merely “skin” them.
Brunner wanted to shift the power from engineering to design. He started thinking strategically… The idea was to get ahead of the engineering groups and start to make Apple more of a design-driven company rather than a marketing or engineering one.”
That’s an unbelievably insightful conclusion Robert Brunner made. If he wanted to turn Apple into a design-driven company, then he’d have to prepare design concepts that looked further into the future than where the engineers were up to. Products like the iPod and iPad are testimony that Brunner’s strategy worked.
We face the same situation in OSS today. The power of product development tends to lie with engineering, ie the developers. I have huge admiration for the very clever and very talented engineers who create amazing products for us to use, buuutttttt…….
I just have one reservation – is there a single OSS company that is design-driven? A single one that’s making intuitive, effective, beautiful experiences for their users? Of course engineering holds power over design in OSS – how many OSS vendors even have a dedicated design department???
Let me give a comparison (albeit a slightly unfair one). Both of my children were reasonably adept at navigating their way around our iPad (for multiple use cases) by the age of three. What would the equivalent “intuition age” be for navigating our OSS?
If you’re a product manager, have you ever tried it? Have you ever considered benchmarking it (or an equivalent usability metric) and seeing what you could do to improve it for your OSS products?
When I first started the Passionate About OSS site / blog many years ago, I was lucky to get a handful of views per day. It’s grown by many multiples since then, fortunately.
The launch of The Blue Book OSS/BSS Vendor Directory generated some exciting metrics yesterday. The directory alone came within 5 pageviews of the highest count we’ve ever seen on PassionateAboutOSS.com (and PAOSS is up to nearly 2,500 posts now). That total appeared in only a 14-hour window because we didn’t go live with The Directory or metric collection until ~10am local time! The graphs are indicating that we should easily exceed PAOSS’s best ever count today.
If you were one of the many viewers who popped in from all around the world to look at The Directory, thank you! If you have any suggested improvements, we’d love to hear from you as we’re sure to be making many further tweaks in coming days/months.
But the most interesting fact about the launch yesterday was that a job posting appeared on UpWork to scrape all the data we’ve presented. On our very first day!! In fact a gentleman in the US reviewed bids and awarded the UpWork job all within about 14 hours of go-live.
That’s positive news because it means that at least one person must’ve thought the data was useful. 🙂
It provides a comprehensive directory of over 400 suppliers that produce OSS, BSS and/or related network management tools. Company details, product details and functionality classifications are included.
Every network operator has a unique set of needs from their operational software – software that includes OSS (Operational Support Systems), BSS (Business Support Systems), NMS (Network Management Systems) and the many other related tools.
To service those many and varied needs, a large number of different products have been created by some very clever developers. But it’s a highly fragmented market. There are literally hundreds of product options out there and they all have different capabilities.
If you’re a typical buyer, how many of those products are you familiar with? Five? Ten? Fifty? How do you know whether the best-fit product or supplier is within the list you already know? Perhaps the best-fit is actually amongst the hundreds of other products and suppliers you’re not familiar with yet. How much time do you have to research each one and distill down to a short-list of possible candidates to service your specific needs? Where do you start? Lots of web searches? There has to be an easier way.
What if you’re a seller? These products tend to have lengthy life-cycles once they’ve been installed so it might be years before a prospect actually enters the buying phase. Yet there are so many prospects out there at different phases of their buying windows. There are bound to be some live ones at any time that suit your capabilities. The challenge for you as a supplier is how to make those prospects aware of you. You don’t have the time to establish trusted relationships with hundreds, perhaps even thousands, of buyers across the globe (or maybe just within your region/s). Wouldn’t you love to be presented with qualified prospects who are in (or nearing) their buying window?
Well we at Passionate About OSS have created The Blue Book OSS/BSS Vendor Directory to simplify the task of bringing buyers and sellers together. With over 400 suppliers listed (and climbing), we provide a single, comprehensive repository for searching, matching and connecting. The tools allow you to do it yourself, or we can help you using the approaches we’ve developed, used and refined over the years.
Now just click on “Directory” to start your journey of searching, matching and connecting (and updating your listing if you’re a supplier).
We’ve spoken at length about TM Forum’s, “Time to kill the RFP? Reinventing IT procurement for the 2020s,” report so far this week. We’ve also spoken about the feeling that the OSS/BSS RFP (Request For Proposal) still has relevance in some situations… as long as it’s more of a lighter-touch than most. We’ve spoken about a more pragmatic approach that aims to find best available fit (for key objectives through stages of filtering) rather than perfect fit (for all requirements through detailed analyses). And I should note that “best available fit” includes measurement against these three contrarian procurement KPIs ahead of the traditional ones.
Yesterday’s post discussed how we get to a short list with minimal involvement of buyers and sellers, with the promise that we’d discuss the detailed analysis stage today.
It’s where we do use an RFP, but with thought given to the many pain-points cited so brilliantly by Mark Newman and team in the abovementioned TM Forum report.
The RFP provides the mechanism to firm up pricing and architecture, but is also closely tied to a PoC (Proof of Concept) demonstration. The RFP helps to prioritise the order in which PoCs are performed. PoCs tend to be very time consuming for buyer and seller. So if there’s a clear leader from the paper studies so far, then they will demonstrate first.
If there’s not a clear difference, or if the prime candidate’s demonstration identified significant gaps, then additional PoCs are run.
Next steps are to form the more detailed designs, commercials / contracts and ratify that the business case still holds up.
In yesterday’s post, I also promised to share our “starting-point” procurement methodology. I say starting point because each buyer situation is different and we tend to customise it to each buyer’s needs. It’s useful for starting discussions.
The overall methodology diagram is shown below:
A few key notes here:
The process looks much heavier than it really is… if you use traditional procurement processes as an indicator
We have existing templates for all the activities marked in yellow
The activity marked in blue partially represents the project we’re getting really excited to introduce to you tomorrow
Having to get into significant discussions with vendors (yet)
Gathering all your stakeholders together to prepare a detailed list of requirements
We’ll call this “the long list,” which might consist of 5-20 suppliers. We use this evaluation technique (which we’ll share more about on Monday) to ensure we’ve looked at the broad market of suppliers rather than just the few the buyer already knows.
The next step we follow helps us to get to a much smaller list, which we’ll call “the short list.”
For this, we do need to contact vendors (the long list) and we do need to prepare a list of requirements to add to the objectives and key workflows we’ve previously identified. The requirements won’t need to be detailed, but will still probably number into the 100s – some from our pick-list, others customised to each client’s needs.
Then we engage in what we refer to as an EOI (Expression of Interest) phase. Our EOIs are not just a generic market capability analysis like many buyers conduct. Ours seek indicative vendor compliance (to objectives and requirements) and indicative pricing based on the dimensions we supply. We’ve refined this model over the years to make it quite quick and (relatively) easy for vendors to respond to.
Using compliance to measure suitability and indicative pricing to plug in to our long-term TCO (Total Cost of Ownership) model, the long list usually becomes a clear short list of 1-5 very quickly.
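The filtering step can be sketched roughly as follows. This is not our actual model, just a simplified illustration: the vendor names, criteria, weights, the 0.7 compliance threshold and the undiscounted five-year TCO are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EoiResponse:
    vendor: str                   # hypothetical vendor name
    compliance: dict[str, float]  # criterion -> 0.0 (none) to 1.0 (full)
    upfront_cost: float           # indicative one-off price
    annual_cost: float            # indicative recurring price per year


def compliance_score(resp, weights):
    """Weighted average compliance, between 0 and 1."""
    total_weight = sum(weights.values())
    return sum(w * resp.compliance.get(c, 0.0) for c, w in weights.items()) / total_weight


def simple_tco(resp, years):
    """Undiscounted total cost of ownership over the evaluation horizon."""
    return resp.upfront_cost + resp.annual_cost * years


def short_list(responses, weights, years=5, min_compliance=0.7, max_size=5):
    """Keep vendors above the compliance threshold, cheapest TCO first."""
    qualified = [r for r in responses if compliance_score(r, weights) >= min_compliance]
    qualified.sort(key=lambda r: simple_tco(r, years))
    return [r.vendor for r in qualified[:max_size]]


# Hypothetical criteria weights and EOI responses.
weights = {"service_activation": 3, "inventory": 2, "reporting": 1}
responses = [
    EoiResponse("Vendor A", {"service_activation": 1.0, "inventory": 0.5, "reporting": 1.0}, 500_000, 100_000),
    EoiResponse("Vendor B", {"service_activation": 0.5, "inventory": 0.5, "reporting": 0.0}, 200_000, 50_000),
    EoiResponse("Vendor C", {"service_activation": 1.0, "inventory": 1.0, "reporting": 0.5}, 300_000, 120_000),
]

print(short_list(responses, weights))
```

Note that Vendor B is the cheapest option here but drops out on compliance, so price alone never decides the short list.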
Now we can get into detailed discussions with a very small number of best-fit suppliers without having wasted much time of buyer or seller.
You may have noticed that we’ve run a series of posts about OSS/BSS procurement, and about the RFP process by association.
One of the first steps in the traditional procurement process is preparing a strategy and detailed set of requirements.
As TM Forum’s, “Time to kill the RFP? Reinventing IT procurement for the 2020s,” report describes:
“Before an RFP can be issued, the CSP’s IT or network team must produce a document detailing the strategy for implementing a technology or delivering a service, which is a lengthy process because of the number of stakeholders involved and the need to describe requirements in a way that satisfies them all.”
The problem with most requirements documents, the ones I’ve seen at least, is that they tend to get down into a deep, deep level of detail. And when it’s down in that level of detail, contrasting opinions from different stakeholders can make it really difficult to reach agreement. Have you ever been in a room with many high-value (and high cost) stakeholders spending days debating the semantics (and wording) of requirements? Every stakeholder group needs a say and needs to be heard.
The theory is that you need a great level of detail to evaluate supplier offerings for best-fit. Well, maybe, but not in the initial stages.
First things first – I seek to find out what’s really important for the organisation. That rarely comes from a detailed requirements spreadsheet, but by determining the things that are done most often and/or add the most value to the buyer’s organisation. I use persona mapping, long-tail and perhaps whale-curve mapping approaches to determine this.
Persona mapping means identifying all the groups within the buyer’s organisation that need to interact with the OSS/BSS (current and proposed). Then sitting with each group to determine what they need to achieve, who they need to interact with and what their workflows look like. That also gives a chance for all groups to be heard.
From this, we can collaboratively determine some high-level evaluation criteria, maybe only 15-20 to start with. You’d be surprised at how quickly these 15-20 criteria can help with initial supplier filtering.
Armed with the initial 15-20 evaluation criteria and the project we’re getting excited to launch on Monday, we can get to a relevant list of possible suppliers quite quickly. It allows us to do a broad market search to compile a list of suppliers, not just from the 5-10 suppliers the buyer already knows about, but from the 400+ suppliers/products available on the market. And we don’t even have to ask the suppliers to fill out any lengthy requirement response spreadsheets / forms yet.
We’ll continue the discussion over the next two days. We’ll also share our procurement methodology pack on Sunday.
There’s no doubt the current stereotypical RFP approach to procurement is broken. It needs to be done differently. That’s why we have been doing it differently with customers for years now (another hint regarding a project we’re getting excited to announce this Monday).
The TM Forum report is really powerful and well worth a read. There are a few additional (and somewhat random) thoughts that go through my head when considering the death of the RFP:
The TM Forum report is primarily coming at the problem from the perspective of a carrier that is constantly steering the development of its own systems, as implied through this quote, “The fundamental problem with the RFP process is that in a fast-paced technology environment, where cloud and software are fast becoming preferred options, it is difficult for CSPs to describe in lengthy, written documents what they want and need. The processes are simply too complex and cumbersome to support modern, Agile methods of working.”
That perspective is particularly applicable for some buyers, ones that have committed to having significant developer resources available to build exactly what they want. That could be in the form of in-house developers, contract developers, long-term panel arrangements with suppliers or similar
Others, perhaps such as utilities, enterprise and some telcos want to focus on their core business and delegate OSS/BSS configuration and customisation to third-parties.
Some of those rely on COTS (commercial off the shelf) software to leverage the benefits of innovation, cost and development time that have been spread across multiple customers. Their budgets simply don’t allow for custom-built solutions
COTS products, whether delivered on-prem or through cloud service models, are almost never going to be a perfect fit for a buyer’s needs. They’re designed to generically suit many buyers, so a certain amount of bloat becomes part of the trade-off
In recent weeks, I’ve seen two entirely in-house developed OSS/BSS. They fit their organisations like a glove and there’s almost no bloat at all. In fact it would be almost impossible for a COTS solution to replace what they’ve built. In both cases it’s taken a decade of ongoing development to get to that position. Most buyers don’t have that amount of time to get it right though unfortunately
Commercial realities imply a pragmatic approach is taken to procurement – which product/s provide default capability that best aligns with the buyer’s most important objectives.
RFPs often get bogged down at the far right-hand side of the long-tail of requirements (where impact tends to be negligible), or in trying to completely re-sculpt the solution to be the perfect fit (that it’s unlikely to ever be)
In my experience at least, the best-fit (not perfect fit) solution, or very short list of solutions, usually becomes apparent fairly quickly [we’ll share more about how we do that tomorrow]. It’s then just a case of testing objectives, assumptions and gaps (eg via a proof-of-concept) and getting to a mutually beneficial commercial agreement
As one respondent in the TM Forum report put it, “The RFP glorifies the process, not the outcome.” A healthy dose of outcome-driven pragmatism helps to reduce glorification of the RFP process
With so much fragmentation in the OSS/BSS market already (there are over 400 suppliers in our vendor directory), the talent pool of creators is thinly spread. Many of those 400 have duplicated functionality, which isn’t great for the industry’s overall progress. Custom development for each different buyer spreads the talent pool even further… unless buyers can get economies of development scale through shared platforms like ONAP
In summary, I love the concept of avoiding massive procurement events. But I can’t help thinking the RFP still fits in there somewhere for many buyers… as long as we ensure we glorify the outcomes and de-emphasise the process. It’s just that we use RFPs like a primitive instrument and inflict blunt-force trauma, rather than using surgical precision.
Earlier this year, the TM Forum published a really insightful report called, “Time to kill the RFP? Reinventing IT procurement for the 2020s.” There are so many layers to the OSS/BSS procurement discussion and Mark Newman and team have done a fantastic job of capturing them. We’ll expand on a few of those layers in a series of posts this week.
For example, section 2 articulates the typical RFI / RFP / RFQ approach. It’s clear to see why the typical approach is flawed. Yesterday’s post pondered whether procurement events are flawed from the initial KPIs that are set by buyers. Today we’ll take a look at the process that follows.
Two quotes from the TM Forum report frame some of the challenges with RFPs from buyer and seller viewpoints respectively: QUOTE 1 (Buyer-side) – “CSPs normally distribute RFPs to a group of three to eight suppliers. These are most likely existing suppliers, previous vendors or companies the CSP is aware of through its own technology scouting. Suppliers are likely to include systems integrators who rely on other vendors to fulfill elements of the contract, and CSPs tend to invite bidders offering a range of options.
For example, they may invite a supplier that is likely to offer a good price, one that is a ‘safe’, low-risk option, and the incumbent supplier, which in many cases the CSP is looking to replace.
The document itself is likely to be several hundred pages long, a large portion of it comprising details of technology requirements, with suppliers asked to specify whether they comply with each requirement.”
The question I’d ask about this process is how does the CSP choose 3-8 out of the 400+ vendors that supply the OSS/BSS market? Does their “own technology scouting” adequately discount the hundreds of others that could potentially be best-fit for their needs?
QUOTE 2 (Seller-side) – “We were holed up in our hotel for a month working feverishly on different aspects of the bid. We had 15 people there in total, and we were asked to come in for meetings with five different teams. The meetings go on and on, and you really have no idea when they’re going to finish.”
Let’s do the sums on this situation. 15 people x 25 days x $1500 per day (a round figure that includes accommodation, meals, etc) = $562,500. That’s over half a million dollars just for the seller-side of the post-RFP evaluation phase. Now let’s say there were 4 sellers going through this. At a similar spend each, that’s $2.25 million of combined seller-side cost for a single procurement event. [Just a small aside here – reading between the lines, do you suspect the buyer was taking the seller on a journey into the minutiae or focusing on what will move the needle for them? Re-read that through the lens of yesterday’s contrasting KPI perspectives]
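If you’d like to plug in your own numbers, the sums are easy to reproduce. The people, days and day-rate below come from the example above; the assumption that all four sellers spend a similar amount is mine.

```python
# Figures from the example: 15 people, 25 working days, $1,500 per person per day.
PEOPLE = 15
DAYS = 25
DAY_RATE_USD = 1_500

# Assumption: four sellers each run a similar bid effort.
SELLERS = 4

cost_per_seller = PEOPLE * DAYS * DAY_RATE_USD
cost_all_sellers = cost_per_seller * SELLERS

print(f"Per seller:  ${cost_per_seller:,}")
print(f"All sellers: ${cost_all_sellers:,}")
```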
You can see exactly why Mark has proposed that it’s, “Time to kill the RFP,” at least in its traditional form. These two quotes lobby hard for the death penalty. More on that tomorrow!
Also note that another hint was contained above in the lead-up to a project launch on Monday that we’re really excited about.
You may’ve noticed that things have been a little quiet on this blog in recent weeks. We’ve been working on a big new project that we’ll be launching here on PAOSS on Monday. We can’t reveal what this project is just yet, but we can let you in on a little hint. It aims to help overcome one of the biggest problem areas faced by those in the comms network space.
Further clues will be revealed in this week’s series of posts.
The industry we work in is worth tens of billions of dollars annually. We rely on that investment to fund the OSS/BSS projects (and ops/maintenance tasks) that keep many thousands of us busy. Obviously those funds get distributed by project sponsors in the buyers’ organisations. For many of the big projects, sponsors are obliged to involve the organisation’s procurement team.
That’s a fairly obvious path. But I often wonder whether the next step on that path is full of contradictions and flaws.
Do you agree with me that the 3 KPIs sponsors expect from their procurement teams are:
Negotiate the lowest price
Eliminate as many risks as possible
Create a contract to manage the project by
If procurement achieves these 3 things, sponsors will generally be delighted. High-fives for the buyers that screw the vendor prices right down. Seems pretty obvious right? So where’s the contradiction? Well, let’s look at these same 3 KPIs from a different perspective – a more seller-centric perspective:
I want to win the project, so I’ll set a really low price, perhaps even loss-leader. However, our company can’t survive if our projects lose money, so I’ll be actively generating variations throughout the project
Every project of this complexity has inherent risks, so if my buyer is “eliminating” risks, they’re actually just pushing risks onto me. So I’ll use any mechanisms I can to push risks back on my buyer to even the balance again
We all know that complex projects throw up unexpected situations that contracts can’t predict (except with catch-all statements that aim to push all risk onto sellers). We also both know that if we manage the project by contractual clauses and interpretations, then we’re already doomed to fail (or are already failing by the time we start to manage by contract clauses)
My 3 contrarian KPIs to request from procurement are:
Build relationships / trust – build a framework and environment that facilitates a mutually beneficial, long-lasting buyer/seller relationship (ie procurement gets judged on partnership length ahead of cost reduction)
Develop a team – build a framework and environment that allows the buyer-seller collective to overcome risks and issues (ie mutual risk mitigation rather than independent risk deflection)
Establish clear and shared objectives – ensure both parties are completely clear on how the project will make the buyer’s organisation successful. Then both constantly evolve to deliver benefits that outweigh costs (ie focus on the objectives rather than clauses – don’t sweat the small stuff (or purely technical stuff))
Yes, I know they’re idealistic and probably unrealistic. Just saying that the current KPI model tends to introduce flaws from the outset.
“From watching ESPN, I’d learned about the power of information bombardment. ESPN strafes its viewers with an almost hysterical amount of data and details. Scrolling boxes. Panels. Bars. Graphics. Multi-angle camera perspectives. When exposed to a surfeit of data, men tend to feel more masculine and in command. Do most men bother to decipher these boxes, panels, bars and graphics? No – but that’s not really the point.”
Martin Lindstrom, in his book, “Small Data.”
I’ve just finished reading Small Data, a fascinating book that espouses forensic analysis of the lives of users (ie small data) rather than using big data methods to identify market opportunities. I like the idea of applying both approaches to our OSS products. After all, we need to make them more intuitive, endearing and ultimately, effective.
The quote above struck a chord in particular. Our OSS GUIs (user interfaces) can tend towards the ESPN model, can’t they? The following paraphrasing doesn’t seem completely at odds with most of the OSS that we interact with – “[the OSS] strafes its viewers with an almost hysterical amount of data and details.”
And if what Lindstrom says is an accurate psychological analysis, does it mean:
The OSS GUIs we’re designing help make their developers “feel more masculine and in command” or
Our OSS operators “feel more masculine and in command” or
Intriguingly, does the feeling of being more masculine and in command actually help or hinder their effectiveness?
I find it fascinating that:
Our OSS/BSS form a multi billion dollar industry
Our OSS/BSS are the beating heart of the telecoms industry, being wholly responsible for operationalising the network assets that so much capital is invested in
So little effort is invested in making the human-to-OSS interface far more effective than it is today
I keep hearing operators bemoan the complexities and challenges of wrangling their OSS, yet only hear “more functionality” being mentioned by vendors, never “better usability”
Maybe the last point comes from me being something of a rarity. Almost every one of the thousands of people I know in OSS either works for the vendor/supplier or the customer/operator. Conversely, I’ve represented both sides of the fence and often even sit in the middle acting as a conduit between buyers and sellers. Or am I just being a bit precious? Do you also spot the incongruence of that last point on a regular basis?
Whether you’re buy-side or sell-side, would you love to make your OSS more effective? Let us know and we can discuss some of the optimisation techniques that might work for you.
Seems this post from last week has triggered some really interesting debate – Is your service assurance really service assurance?? (Part 5). It was a post that looked into collecting end-to-end service metrics rather than our traditional method of collecting network device events/metrics and trying to reverse-engineer to form a service-level perspective.
Thought I’d give you an update. I’m thinking along the following lines, but admit that I don’t have it all worked out by any means yet:
We need the concept of a span, as OpenTelemetry has between microservices (in a way, it’s like tracking the nearest neighbour each packet gets pushed to).
Note that for us a span is on a service-by-service basis between nodes, not just a network link-by-link basis between nodes
We need to be able to measure the real-time metrics of the performance of each span as well as any events/faults impacting them
One challenge (one of probably many) is how to avoid flooding the data/management planes. Possibly a telemetry beacon at each node that’s aggregating performance/events of each packet passed for each service?? But what aggregation-window / cache-size to use? Still too impossibly huge to process except with ridiculously low sampling rates??
By chaining the spans we get a real-time, end-to-end trace of services and the performance (and real-time snapshot of service-by-service resource usage in a packet-switched network)
How to efficiently get the beacon data to a centralised logging/management point? Send beacons via management plane? Send via data plane? Take an approach similar to Netflow / IPFIX-style protocols?
How to store data for a short period (ie for real-time analysis/reporting) as well as for long periods. Due to volumes, we’d have to apply aging policies to the data, but it would still be valuable for the purpose of mid and long-term SLA, network health, optimisation, capacity management, etc
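To make the span-chaining idea a little more concrete, here’s a rough sketch. It is not OpenTelemetry’s API or data model; the Span fields, node names and metrics are all hypothetical. It also assumes each service follows a single path (no load-balancing across parallel links) and that every span has carried at least one packet.

```python
from dataclasses import dataclass


@dataclass
class Span:
    service_id: str
    src_node: str
    dst_node: str
    latency_ms: float
    packets_sent: int
    packets_lost: int


def chain_spans(spans, service_id, ingress):
    """Order one service's spans hop by hop, starting at the ingress node."""
    by_src = {s.src_node: s for s in spans if s.service_id == service_id}
    chain, node = [], ingress
    while node in by_src:
        span = by_src.pop(node)  # pop so a routing loop cannot spin forever
        chain.append(span)
        node = span.dst_node
    return chain


def end_to_end(chain):
    """Sum latency and compound per-span delivery ratios along the chain."""
    if not chain:
        return None
    delivered = 1.0
    for s in chain:
        delivered *= 1 - s.packets_lost / s.packets_sent
    return {
        "path": [chain[0].src_node] + [s.dst_node for s in chain],
        "latency_ms": sum(s.latency_ms for s in chain),
        "loss_ratio": 1 - delivered,
    }


# Hypothetical beacon data, deliberately out of order and mixed across services.
spans = [
    Span("svc-1", "C", "D", 3.0, 1000, 0),
    Span("svc-2", "A", "C", 4.0, 500, 5),
    Span("svc-1", "A", "B", 2.0, 1000, 0),
    Span("svc-1", "B", "C", 5.0, 1000, 10),
]

trace = end_to_end(chain_spans(spans, "svc-1", ingress="A"))
print(trace["path"], trace["latency_ms"], round(trace["loss_ratio"], 4))
```

The sketch sidesteps the hard questions above (sampling rates, aggregation windows, transport of the beacon data), but it shows how per-service spans would roll up into an end-to-end view rather than a link-by-link one.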
As you can see, there are still so many wide-open questions about the feasibility of the concept. But getting feedback from multiple very clever people who read this blog is definitely helping! Thank you!!
I also just stumbled upon OpenTelemetry, an open source project designed to capture traces / metrics / logs from apps / microservices. It intrigued me because just as you have the concept of traces / metrics / logs for apps, you similarly have traces / metrics / logs for networks.
In the network world, we’re good at getting metrics / logs / events, but not very good at getting trace data (ie end-to-end service chains) as described earlier in this blog series. And if we can’t monitor traces, we can’t easily interpret a customer’s experience whilst they’re using their network service. We currently do “service assurance” by reverse-engineering logs / events, which seems a bit backward to me.
Take a closer look at the OpenTelemetry link above, which provides an overview of how their team is going to gather application telemetry. With increasing software-ification of our networks (eg SDN / NFV) and the use of microservices / NaaS / APIs in our management stacks, could this actually be our path to the holy grail of service assurance (ie capturing trace data – network service telemetry)?? Is it data plane? Is it control / management plane? Is it something in between?
Note: The “active measurements” approach described in part 3 is slightly compromised in current form, which is why I’m so intrigued by the potential of extending the concepts of OpenTelemetry into our software / virtual networks.
I’d really love your take on this one because I’m sure there are many elements to this that I haven’t thought through yet. Please leave your thoughts on the viability of the approach.
Below are three insightful tables from the Netrounds white paper:
Table 1 looks at the typical components (systems) that service assurance is comprised of. But more interestingly, it looks at the types of questions / challenges each traditional system is designed to resolve. You’ll have noticed that none of them directly answer any service quality questions (except perhaps inventory systems, which can be prone to having sketchy associations between services and the resources they utilise).
Table 2 takes a more data-centric approach. This becomes important when we look at the big picture here – ensuring reliable and effective delivery of customer services. Infrastructure failures are a fact of life, so improved service assurance models of the future will depend on automated and predictive methods… which rely on algorithms that need data. Again, we notice an absence of service-related data sets here (apart from Inventory again). You can see the constraints of the traditional data collection approach, can’t you?
Table 3 instead looks at the goals of an ideal service-centric assurance solution. The traditional systems / data are convenient but clearly don’t align well to those goals. They’re constrained by what has been presented in tables 1 and 2. Even the highly touted panaceas of AI and ML are likely to struggle against those constraints.
What if we instead start with Table 3’s assurance of customer services in mind and work our way back? Or even more precisely, what if we start with an objective of perfect availability and performance of every customer service?
That might imply self-healing (automated resolution) and resolution prior to failure (prediction) as well as resilience. But let’s first give our algorithms (and dare I say it, AI/ML techniques) a better chance of success.
Working back – What must the data look like? What should the systems look like? What questions should these new systems be answering?
I just came across an interesting white paper from the Netrounds team titled, “Reimagining Service Assurance in the Digital Service Provider Era.” You can find a copy here. It’s well worth a read, so much so that I’ll unpack a few of the concepts it contains in a series of articles this week.
It rightly points out that, “Alarms and fault management are what most people think of when hearing the term service assurance. Classical service assurance systems do fall into this category, as they collect indicators from network devices (such as traps, syslog messages and telemetry data) and try to pinpoint faulty devices and interfaces that need fixing.”
This takes us into the rabbit-hole of what exactly is a service (a rabbit-hole that this article partly covers). But let’s put that aside for a moment and consider a service as being an end-to-end “thing” that a customer uses (and pays for, and therefore assumes will behave as “they” expect).
To borrow again from Netrounds, “… we must be able to measure and report on service KPIs in order to accurately measure network service quality from the end user, or customer, perspective. The KPIs should correspond to the service that the customer is paying for. For example, internet access services should measure network KPIs like loss, latency, jitter, and DNS and HTTP response times; a storage backup service should measure data throughput rate; IPTV should measure video frame loss, video buffer underrun events and channel zapping time; and VoIP should measure Mean Opinion Score (MOS).”
There’s just one problem with traditional assurance measuring techniques (eg traps, syslog messages). They are only an indirect proxy for the customer’s experience (and expectations) with the service they’re paying for. Traditional techniques just report on the links in the chain rather than the integrity of the entire length of chain. We have to look at each broken link and attempt to determine whether the chain’s integrity is actually impaired (considering the “meshing” that protects modern service chains). And if there is impairment, to then determine whose chain is impacted, in what way, and what priority needs to be given to its repair.
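To make the chain-versus-links distinction a little more concrete, here's a minimal Python sketch. It assumes a hypothetical model where a service is an ordered chain of segments and each segment is protected by one or more redundant links (the "meshing"). The names (`Service`, `is_impaired`, `impacted_services`), the link identifiers and the priority scheme are all illustrative assumptions, not taken from any product or from the Netrounds paper.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical end-to-end service: an ordered chain of segments.

    Each segment is a set of redundant link IDs (the "meshing").
    """
    name: str
    priority: int  # assumption: lower number = more urgent
    segments: list[set[str]]

    def is_impaired(self, failed_links: set[str]) -> bool:
        # The chain breaks only if some segment has lost ALL of its links.
        return any(segment <= failed_links for segment in self.segments)


def impacted_services(services, failed_links):
    """Return the impaired services, most urgent first."""
    impaired = [s for s in services if s.is_impaired(failed_links)]
    return sorted(impaired, key=lambda s: s.priority)


services = [
    Service("voip-gold", 1, [{"a1", "a2"}, {"b1"}]),
    Service("iptv-std", 2, [{"a1", "a2"}, {"c1", "c2"}]),
]

# One link fails in a protected segment: an alarm fires, but no chain is broken.
print([s.name for s in impacted_services(services, {"a1"})])
# An unprotected link also fails: only the service that depends on it is impaired.
print([s.name for s in impacted_services(services, {"a1", "b1"})])
```

Even in this toy form, the link alarms alone tell us nothing until they are mapped back onto each service's chain, and that mapping is only as good as the service-to-resource associations we hold.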
If we’re being completely honest, the customer doesn’t care about the chain links, or even their MOS, only that they couldn’t understand what the person at the other end of the VoIP line was trying to communicate with them.
Exacerbating this further, increasing dependency on cloud and virtualised resources means that more chain links fall outside our domain of visibility.
So, this thing that we’ve called service assurance for the last few decades might actually be a misnomer. We’ve definitely been monitoring the health of network devices and infrastructure (the links), but we tend to only be able to manage services (the chain) through reverse-engineering – by inference, brute force and wizardry.
Is there another way? Let’s dig further in tomorrow’s post.
The diagram below attempts to demonstrate the concept visually, in the form of three important sliders.
When it comes to the technical delivery, it makes sense that most of the responsibility falls upon the supplier. They obviously have the greater know-how from building and implementing their own products. However, and despite what some clients expect, you’ll notice that the slider isn’t all the way to the left. The client can’t just “throw the hand grenade over the fence” and expect the supplier to build the solution in isolation. The client needs to be involved to ensure the solution is configured to their unique requirements. This covers factors such as network types, service types, process models, naming conventions, personas supported, integrations, approvals, etc.
Unfortunately, organisational change is an afterthought far too often on OSS projects. Not only that, but the client often expects the supplier to handle that too. They expect that slider to fall far to the left as well. In my opinion, this is completely unrealistic. In most cases, the supplier simply doesn’t have the knowledge of, or influence over, the individuals within the client’s organisation. That’s why the middle slider falls mostly towards the right-hand (client) side. Not all the way though, because the supplier will have suggestions / input / training based on learnings from past implementations. BTW, the link above also describes an important perspective shift to help the org change aspect of OSS transformation.
And lastly, the success of a project relies on the strength of the relationship throughout, but also far beyond, the initial implementation. You’d expect that most OSS implementations will have a useful life of many years. Due to the complexity of OSS transformations, clients want to stay with the same supplier for long periods because they don’t want to endure a change-out. As in any relationship, trust plays an important role. The relationship clearly has to be beneficial to both parties. Unfortunately, three factors often doom OSS relationships from the outset.
Firstly, the sliders above show my unbiased perspective of the weight of responsibility on a generic OSS project. If each party has a vastly different expectation of slider positioning, then the project can be off to a difficult (but all-too-common) start.
Secondly, the nature of the vendor selection process can also gnaw away at trust quite quickly. The client wants an as-low-as-possible cost in the contract (obviously). The supplier wants to win the bid, so they keep costs as low as possible, often hoping to make up the difference through the inevitable variations that happen on these complex projects.
And thirdly, the complexity of these projects means challenges almost always arise, which can lead to cynicism being hurled across the fence by both parties.
You may be wondering why the third slider isn’t perfectly centred between both. You may claim that significant responsibility for humility, fairness and forgiveness lies with each participant to ensure a long-lasting, trusted relationship. I’d agree with you on that, but I’d also argue that the supplier carries slightly more responsibility as they (usually) hold slightly more of the balance of power. They know the client doesn’t want to endure another OSS change-out project any time soon, so the client generally has more to lose from a relationship breakdown. Unfortunately, I’ve seen this leveraged by vendors too many times.
Do you agree/disagree with these observations? I’d love to hear your thoughts.
Oh, and if you ever need an independent third-party to help set the right balance of expectations across these sliders on your project, you’re welcome to call upon Passionate About OSS to assist.
I was speaking with a friend today about an old OSS assurance product that is undergoing a refresh and investment after years of stagnation.
He indicated that it was to come with about 20 out-of-the-box adaptors for data collection. I found that interesting because it was replacing a product that probably had in excess of 100 adaptors. Seemed like a major backward step… until my friend pointed out the types of adaptor in this new product iteration – Splunk, AWS, etc.
Our OSS no longer collect data directly from the network. We have web-scaled processes sucking everything out of the network / EMS, aggregating it and transforming / indexing / storing it. Then, like any other IT application, our OSS just collect what we need from a data set that has already been consolidated and homogenised.
I don’t know why I’d never thought about it like this before (ie building an architecture that doesn’t even consider connecting to the multitude of network / device / EMS types). In doing so, we lose the direct connection to the source, but we also reduce our integration tax load (directly to the OSS at least).
This is the third part of a series describing a really exciting analysis I’ve just finished.
Part 1 described how we can turn simple log files into a Sankey diagram that shows real-life process flows (not just a theoretical diagram drawn by BAs and SMEs), like below:
Part 2 described how the logs are broken down into a design tree and how we can assign weightings to each branch based on the data stored in the logs, as below:
I’ve already had lots of great feedback in relation to the Part 1 blog, especially from people who’ve had challenges capturing as-is process. The feedback has been greatly appreciated so I’m looking forward to helping them draw up their flow-charts on the way to helping optimise their process flows.
But that’s just the starting point. Today’s post is where things get really exciting (for me at least). Today we build on part 2 and not just record weightings, but use them to assist future decisions.
We can use the decision tree to “predict forward” and help operators / algorithms make optimal decisions whilst working towards process completion. We can use a feedback loop to steer an operator (or application) down the most optimal branches of the tree (and/or avoid the fall-out variants).
This allows us to create a closed-loop, self-optimising, Decision Support System (DSS), as follows:
Using log data alone, we can perform decision optimisation based on “likelihood of success” or “time to complete” as per the weightings table. If supplemented with additional data, the weightings table could also allow decisions to be optimised by “cost to complete” or many other factors.
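As a rough illustration of the "predict forward" idea, here's a minimal Python sketch of choosing the next branch from a weightings table. The table layout, the figures in it and the function name `best_branch` are all assumptions made up for this example. The real weightings table from the analysis may well be structured differently.

```python
# Hypothetical weightings table: for each decision point (DP), the branches
# taken from it, with figures derived from process logs. All values invented.
WEIGHTINGS = {
    "DP1": {
        "DP2": {"count": 900, "successes": 855, "avg_minutes": 12.0},
        "DP3": {"count": 100, "successes": 99, "avg_minutes": 30.0},
    },
}


def best_branch(weightings, current_dp, optimise_for="likelihood_of_success"):
    """Pick the next state from current_dp according to the chosen objective."""
    branches = weightings[current_dp]
    if optimise_for == "likelihood_of_success":
        # Highest observed success ratio wins.
        return max(
            branches,
            key=lambda b: branches[b]["successes"] / branches[b]["count"],
        )
    if optimise_for == "time_to_complete":
        # Shortest observed average duration wins.
        return min(branches, key=lambda b: branches[b]["avg_minutes"])
    raise ValueError(f"unknown objective: {optimise_for}")


print(best_branch(WEIGHTINGS, "DP1"))                      # DP3 (0.99 vs 0.95)
print(best_branch(WEIGHTINGS, "DP1", "time_to_complete"))  # DP2 (12 vs 30 min)
```

Note how the two objectives recommend different branches from the same table. Adding a "cost to complete" objective would just mean one more column and one more `if` clause.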
The model has the potential to be used in “real-time” mode, using the constant stream of process logs to continually refine and adapt. For example:
If the long-term average of a process path is 1 minute, but there’s currently a problem with it and that path is failing, then another path (one that is otherwise slightly less optimised over the long-term) could be used until the first path is repaired
An operator happens to choose a new, more optimal path than has ever been identified previously (the delta function in the diagram). It then sets a new benchmark and informs the new approach via the DSS (Darwinian selection)
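The two examples above could be sketched along these lines in Python. This is only one possible approach, under my own assumptions: each path keeps a long-term average plus a short rolling window of recent outcomes, and a path is treated as "currently failing" when its recent success rate drops below a threshold. The class and function names (`PathStats`, `choose_path`, `register_new_path`), the window size and the threshold are all hypothetical.

```python
from collections import deque


class PathStats:
    """Long-term average plus a short window of recent outcomes for one path."""

    def __init__(self, long_term_minutes, window=5):
        self.long_term_minutes = long_term_minutes
        self.recent = deque(maxlen=window)  # True = completed, False = failed

    def record(self, succeeded):
        self.recent.append(succeeded)

    def is_healthy(self, min_success_rate=0.5):
        if not self.recent:
            return True  # no recent evidence, so trust the long-term figure
        return sum(self.recent) / len(self.recent) >= min_success_rate


def choose_path(paths):
    """Prefer the fastest long-term path among those currently healthy."""
    healthy = {name: p for name, p in paths.items() if p.is_healthy()}
    candidates = healthy or paths  # if everything is failing, consider all
    return min(candidates, key=lambda name: candidates[name].long_term_minutes)


def register_new_path(paths, name, observed_minutes):
    """Add a newly observed path so it can become the new benchmark."""
    paths[name] = PathStats(observed_minutes)


paths = {"A": PathStats(1.0), "B": PathStats(1.5)}
print(choose_path(paths))        # A: fastest over the long term

for _ in range(5):
    paths["A"].record(False)     # path A starts failing
print(choose_path(paths))        # B: used until A is repaired

for _ in range(5):
    paths["A"].record(True)      # path A recovers
register_new_path(paths, "C", 0.8)
print(choose_path(paths))        # C: an operator found a faster route
```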
If you’re wondering how the DSS could be implemented, I can envisage a few ways:
Using existing RPA (Robotic Process Automation) tools [which are particularly relevant if the workflow box in the diagram above crosses multiple different applications (not just a single monolithic OSS/BSS)]
Providing a feedback path into the functionality of the OSS/BSS and its GUI
Via notifications (eg email, Slack, etc) to operators
Via a simple, more manual process like flow diagrams, work instructions, scorecards or similar
This visualisation is exciting because it shows how your processes are actually flowing (or not), as opposed to the theoretical process diagrams that are laboriously created by BAs in conjunction with SMEs. It also shows which branches in the flow are actually being utilised and where inefficiencies are appearing (and are therefore optimisation targets).
Some people have wondered how simple activity logs can be used to show the Sankey diagrams. Hopefully the diagram below helps to describe this. You scan the log data looking for variants / patterns of flows and overlay those onto a map of decision states (DPs). In the diagram above, there are only 3 DPs, but 303 different variants (sounds implausible, but there are many variants that do multiple loops through the 3 states and are therefore considered to be a different variant).
The numbers / weightings you see on the Sankey diagram are the number* of instances (of a single flow type) that have transitioned between two DPs / states.
* Note that this is not the same as the count value that appears in the Weightings table. We’ll get to that in tomorrow’s post when we describe how to use the weightings data for decision support.
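For anyone wondering what "scanning the log data for variants" might look like in practice, here's a minimal Python sketch. The log rows are invented, and it rests on a simplifying assumption of mine: every transition between two states is counted, so an instance that loops contributes to the same link more than once. The actual analysis may count differently.

```python
from collections import Counter, defaultdict

# Invented sample log rows: (flow_instance_id, activity, start_timestamp)
LOG = [
    ("ORD-1", "DP1", "2020-01-01T09:00:00"),
    ("ORD-1", "DP2", "2020-01-01T09:05:00"),
    ("ORD-1", "DP3", "2020-01-01T09:20:00"),
    ("ORD-2", "DP1", "2020-01-01T10:00:00"),
    ("ORD-2", "DP2", "2020-01-01T10:10:00"),
    ("ORD-2", "DP1", "2020-01-01T10:30:00"),  # loops back to DP1
    ("ORD-2", "DP2", "2020-01-01T10:40:00"),
    ("ORD-2", "DP3", "2020-01-01T11:00:00"),
    ("ORD-3", "DP1", "2020-01-02T08:00:00"),
    ("ORD-3", "DP3", "2020-01-02T08:15:00"),  # skips DP2
]


def variants_and_transitions(log):
    """Return (variant counts, state-to-state transition counts)."""
    by_instance = defaultdict(list)
    for instance_id, activity, started in log:
        by_instance[instance_id].append((started, activity))

    variants = Counter()
    transitions = Counter()
    for events in by_instance.values():
        # ISO-8601 timestamps sort chronologically as plain strings.
        path = tuple(activity for _, activity in sorted(events))
        variants[path] += 1
        transitions.update(zip(path, path[1:]))
    return variants, transitions


variants, transitions = variants_and_transitions(LOG)
print(len(variants))                 # 3 distinct variants across 3 states
print(transitions[("DP1", "DP2")])   # 3 (one from ORD-1, two from ORD-2)
```

The `transitions` counter is what would feed the link widths on a Sankey diagram, while `variants` shows how a small number of states can produce many distinct flow patterns once loops and skipped steps are included.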
In your travels, I don’t suppose you’ve ever come across anyone having challenges to capture and/or optimise their as-is OSS/BSS process flows? Once or twice?? 🙂
Well I’ve just completed an analysis that I’m really excited about. It’s something I’ve been thinking about for some time, but have just finished proving on the weekend. I thought it might have relevance to you too. It quickly helps to visualise as-is process and identify areas to optimise.
The method takes activity logs (eg from OSS, ITIL, WFM, SAP or similar) and turns them into a process diagram (a Sankey diagram) like below with real instance volumes. Much better than a theoretical process map designed by BAs and SMEs don’t you think?? And much faster and more accurate too!!
A theoretical process map might just show a sequence of 3 steps, but the diagram above has used actual logs to show what’s really occurring. It highlights governance issues (skipped steps) and inefficiencies (ie the various loops) in the process too. Perfect for process improvement.
But more excitingly, it proves a path towards real-time “predict-forward” decision support without having to get into the complexities of AI. More has been included in the analysis!
If this is of interest to you, let me know and I’ll be happy to walk you through the full analysis. Or if you want to know how your real as-is processes perform, I’d be happy to help turn your logs into visuals like the one above.
PS1. You might think you need a lot of fields to prepare the diagrams above. The good news is the only mandatory fields would be something like:
Flow type – eg Order type, project type or similar (only required if the extract contains multiple flow types mixed together. The diagram above represents just one flow type)
Flow instance identifier – eg Order number, project number or similar (the diagram above was based on data that had around 600,000 flow instances)
Activity identifier – eg Activity name (as per the 3 states in the diagram above), recorded against each flow instance. Note that they will ideally be an enumerated list (ie from a finite pick-list)
Timestamps – Start/end timestamp on each activity instance
If the log contains other details, such as the name of the operator who completed each activity, that can help add richness, but it’s not mandatory.
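To show how little is needed, here's a minimal Python sketch that loads a CSV extract and checks for the mandatory fields listed above. The column names (`flow_instance_id`, `activity`, `start`, `end`, `flow_type`, `operator`) and the function name `load_activity_log` are my own assumptions for illustration. Your extract will almost certainly use different headings.

```python
import csv
import io
from datetime import datetime

# Hypothetical column names for the mandatory fields.
MANDATORY = {"flow_instance_id", "activity", "start", "end"}


def load_activity_log(text):
    """Parse a CSV extract, failing early if a mandatory column is absent."""
    reader = csv.DictReader(io.StringIO(text))
    missing = MANDATORY - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing mandatory columns: {sorted(missing)}")
    rows = []
    for row in reader:
        rows.append({
            # Only needed when several flow types are mixed in one extract.
            "flow_type": row.get("flow_type") or "default",
            "flow_instance_id": row["flow_instance_id"],
            "activity": row["activity"],
            "start": datetime.fromisoformat(row["start"]),
            "end": datetime.fromisoformat(row["end"]),
            # Optional richness, eg who completed the activity.
            "operator": row.get("operator") or None,
        })
    return rows


SAMPLE = """flow_instance_id,activity,start,end
ORD-1,DP1,2020-01-01T09:00:00,2020-01-01T09:05:00
ORD-1,DP2,2020-01-01T09:05:00,2020-01-01T09:20:00
"""

rows = load_activity_log(SAMPLE)
print(len(rows), rows[0]["activity"], rows[0]["end"] - rows[0]["start"])
```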
PS2. The main objective of the analysis was to test concepts raised in the following blog posts: