Only do the OSS that only you can do

A friend of mine has a great saying, “only do what only you can do.”

Do you think that this holds true for the companies undergoing digital transformation? Banks are now IT companies. Insurers are IT companies. Car manufacturers are now IT companies. Telcos are, well, some are IT companies.

We’ve spoken before about the skill transformations that need to happen within telcos if they’re to become IT companies. Some are actively helping their workforce to become more developer-centric. Some of the big telcos that I’ve been assisting in the last few years are embarking on bold Agile-led IT transformations. They’re cutting more of their own code and managing their own IT developments.

That’s exciting news for all of us in OSS. Even if it loses the name OSS in future, telcos will still need software that efficiently operationalises their networks. We have the overlapping skills in software, networks, business and operations.

But I wonder about the longevity of the in-house approach unless we come focus clearly on the first quote above. If all development is brought in-house, we end up with a lot of duplication across the industry. I’m not really sure that it makes sense doing all the heavy-lifting of all custom OSS tools when the heavy-lifting has already been done elsewhere.

It’s the old ebb and flow between in-house and outsourced OSS.

In my very humble opinion, it’s not just a choice between in-house and outsourced that matters. The more important decisions are around choosing to only develop the tools in-house that only you can do (ie the strategic differentiators).

A single glass of pain or single pane of glass??

Is your OSS a single pane of glass, or a single glass of pain?

You can tell I’m being a little flippant here. People often (perhaps idealistically) talk about OSS as being the single pane of glass (SPOG) to manage a network.

I say “idealistically” for a couple of reasons:

  1. There are usually many personas who interact with an OSS, each with vastly different user interface (UI) needs
  2. There is usually more than one OSS product in a client’s OSS suite, often from different vendors, with varying levels of integration

Where a single pane of glass can be a true ambition is as a consolidated health-status dashboard / portal, Invariably, this portal is used by executive / leader / manager personas who want to quickly see a single-screen health status that covers all networks and/or parts of the OSS suite. When things go wrong, this portal becomes the single glass of pain.

These single panes tend to be heavily customised for each organisation as every one has a unique set of metrics-that-matter. For those designing these panes, the key is to not just include vanity metrics, but to show information that the leader can action.

But the interesting perspective here is whether the single glass of pain is even relevant within your organisation’s culture. It’s just my opinion, but I prefer for coal-face workers to be empowered to make rapid recovery actions rather than requiring direction from up high in the org-chart. Coal-face workers generally have different tools with UIs that *should* help them monitor, manage and repair super-efficiently.

To get back to the “idealistic” comment above, each OSS UI needs to be fit-for-purpose for each unique persona (eg designers, product owners, network operations, etc). To me this implies that there is no single pane of glass…

I should caveat that by citing the example of an OSS search interface, something I’ve yet to see in OSS… although that’s just a front end to dozens of persona-specific panes of glass.

An OSS without the shackles of topology

It’s been nearly two decades since I designed my first root-cause analysis (RCA) rule. It was completely reliant on network topology – more specifically, it relied on a network hierarchy to determine which alarms could be suppressed.

I had a really interesting discussion today with some colleagues who are using much more modern RCA techniques. I was somewhat surprised, but not surprised at all in hindsight, that their Machine Learning engine doesn’t even use topology data. It just looks at events and tries to identify patterns.

That’s a really interesting insight that hadn’t dawned on me before. But it’s an exciting one because it effectively unshackles our fault management tools from data quality perfection in our inventory / asset databases. It also possibly lessens the need for integrations that share topological data.

Equally interesting, the ML engine had identified over 4,000 patterns, but only a dozen had been codified and put into use so far. In other words, the machine was learning, but humans still needed to get involved in the process to confirm that the machine had learned correctly.

Makes me wonder whether the ML pre-seeding technique we discussed in an earlier post might actually be useful for confirmations at a greater scale than the team had achieved with 12 of 4000+ to date.

The standard approach is to let the ML loose and identify patterns. This is the reactive approach. The ML reacts to the alarms that are pushed up from the network. It looks at alarms and determines what the root cause is based on historical data. A human then has to check that the root cause is correct by reverse engineering the alarm stream (just like a network operator used to do before RCA tools came along) and comparing. If the comparison is successful, the person then approves this pattern.

My proposed alternate approach is the proactive method. If we proactively trigger a fault (e.g. pull a patch lead, take a port down, etc), we start from a position of already knowing what the root cause is. This has three benefits:
1. We can check if the ML’s interpretation of root cause is right
2. We’ve proactively seeded the ML’s data with this root cause example
3. We categorically know what the root cause is, unlike the reactive mode which only assumes the operator has correctly diagnosed the root cause

Then we just have to figure out a whole bunch of proactive failures to test safely. Where to start? Well, I’d speak with the NOC operators to find out what their most common root causes are and look to trigger those events.

More tomorrow on intentionally triggering failures in production systems.

Mythical OSS beasts – feature removal releases

Life can be improved by adding, or by subtracting. The world pushes us to add, because that benefits them. But the secret is to focus on subtracting…

No amount of adding will get me where I want to be. The adding mindset is deeply ingrained. It’s easy to think I need something else. It’s hard to look instead at what to remove.

The least successful people I know run in conflicting directions, drawn to distractions, say yes to almost everything, and are chained to emotional obstacles.

The most successful people I know have a narrow focus, protect against time-wasters, say no to almost everything, and have let go of old limiting beliefs.”
Derek Sivers, here.

I’m really curious here. Have you ever heard of an OSS product team removing a feature? Nope?? Me either!

I’ve seen products re-factored, resulting in changes to features. I’ve also seen products obsoleted and their replacements not offer all of the same features. But what about a version upgrade to an existing OSS product that has features subtracted? That never happens does it?? The adding mindset is deeply ingrained.

So let’s say we do want to go on a subtraction drive and remove some of the clutter from our OSS. I know plenty of OSS GUIs where subtraction is desperately needed BTW! But how do we know what to remove?

I have no data to back this up, but I would guess that almost every OSS would have certain functions that are not used, by any of their customers, in a whole year. That functionality was probably built for a specific use-case for a specific customer that no longer has relevance. Perhaps for a service type that is no longer desired by the market or a network type that will never be used again.

Question is, does your OSS have profiling instrumentation that allows you to measure what functionality is and isn’t used across your whole client base?

Can your products team readily produce a usage profile graph like the following that shows a list of functions (x-axis) by the number of times each function is used (y-axis) in a given time window? Per client? Across all clients?
Long-tail of OSS functionality use

Leave us a comment below if you’ve ever seen this type of profiling instrumentation (not for code optimisation, but for identifying client utilisation levels) and/or systematic feature subtraction initiatives.

BTW. I should make the distinction that just because a function hasn’t been used in a while, doesn’t mean it should automatically be removed. Some functionality (eg data loaders) might be rarely used, but important to retain.

The use of drones by OSS

The last few days have been all about organisational structuring to support OSS and digital transformations. Today we take a different tack – a more technical diversion – onto how drones might be relevant to the field of OSS.

A friend recently asked for help to look into the use of drones in his archaeological business. This got me to thinking about how they might apply in cross-over with OSS.

I know they’re already used to perform really accurate 3D cable route / corridor surveying. Much cooler than the old surveyor diagrams on A1 sheets from the old days. Apparently experts in the field can even tell if there’s rock in the surveyed area by looking at the vegetation patterns, heat and LIDAR scans.

But my main area of interest is in the physical inventory. With accurate geo-tagging available on drones and the ability to GPS correct the data, it seems like a really useful technique for getting outside plant (OSP) data into OSS inventory systems. Or geo-correcting data for brownfields assets.
Drone-based cable corridor surveys
Have you heard of drone-based OSP asset identification and mapping data being fed into inventory systems yet? I haven’t, but it seems like the logical next step. Do you know anyone who has started to dabble in this type of work? If you do, please send me a note as I’d love to be introduced.

Once loaded into the inventory system, with 3d geo-location, we then have the ability to visualise the OSP data with augmented reality solutions.

And other applications for drone technology?

Speeding up your OSS transition from PoC to PROD

In yesterday’s article, we discussed 7 models for achieving startup-like efficiency on large OSS transformations.

One popular approach is to build a proof-of-concept or sandpit quickly on cloud hosting or in lab environments. It’s fast for a number of reasons including reduced number of approvals, faster activation of infrastructure, reduced safety checks (eg security, privacy, etc), minimised integration with legacy systems and many other reasons. The cloud hosting business model is thriving for all of these reasons.

However, it’s one thing to speed up development of an OSS PoC and another entirely to speed up deployment to a PROD environment. As soon as you wish to absorb the PoC-proven solution back into PROD, all the items listed above (eg security sign-offs) come back into play. Something that took days/weeks to stand up in PoC now takes months to productionise.

Have you noticed that the safety checks currently being used were often defined for the old world? They often aren’t designed with transition from cloud to PROD in mind. Similarly, the culture of design cross-checks and approvals can also be re-framed (especially when the end-to-end solution crosses multiple different business units). Lastly, and way outside my locus of competence, is in re-visiting security / privacy / deployment / etc models to facilitate easier transition.

One consideration to make is just how much absorption is required. For example, there are examples of services being delivered to the large entity’s subscribers by a smaller, external entity. The large entity then just “clips-the-ticket,” gaining a revenue stream with limited involvement. But the more common (and much more challenging) absorption model is for the partner to fold the solution back into the large entity’s full OSS/BSS stack.

So let’s consider your opportunity in terms of the absorption continuum that ranges between:

clip-the-ticket (minimally absorbed) <-----------|-----------> folded-in (fully absorbed)

Perhaps it’s feasible for your opportunity to fit somewhere in between (partially absorbed)? Perhaps part of that answer resides in the cloud model you decide to use (public, private, hybrid, cloud-managed private cloud) as well as the partnership model?

Modularity and reduced complexity (eg integrations) are also a factor to consider (as always).

I haven’t seen an ideal response to the absorption challenge yet, but I believe the solution lies in re-framing corporate culture and technology stacks. We’ll look at that in more detail tomorrow.

How about you? Have you or your organisation managed to speed up your transition from PoC to PROD? What techniques have you found to be successful?

How to bring your art and your science to your OSS

In the last two posts, we’ve discussed repeatability within the field of OSS implementation – paint-by-numbers vs artisans and then resilience vs precision in delivery practices.

Now I’d like you to have a think about how those posts overlay onto this quote by Karl Popper:
Non-reproducible single occurrences are of no significance to science.”

Every OSS implementation is different. That means that every one is a non-reproducible single occurrence. But if we bring this mindset into our OSS implementations, it means we’re bringing artisinal rather than scientific method to the project.

I’m all for bringing more art, more creativity, more resilience into our OSS projects.

I’m also advocating more science though too. More repeatability. More precision. Whilst every OSS project may be different at a macro level, there are a lot of similarities in the micro-elements. There tends to be similarities in sequences of activities if you pay close attention to the rhythms of your projects. Perhaps our products can use techniques to spot and leverage similarities too.

In other words, bring your art and your science to your OSS. Please leave a comment below. I’d love to hear the techniques you use to achieve this.

All OSS products are excellent. So where’s the advantage?

“You don’t get differential advantage from your products, it’s from the way you speak to and relate to your customers . All products are excellent these days.”

The quote above paraphrases Malcolm McDonald from a podcast about his book, “Malcolm McDonald on Value Propositions: How to Develop Them, How to Quantify Them.”

This quote had nothing to do with OSS specifically, but consider for a moment how it relates to OSS.

Consider also in relation to the diagram below.
Long-tail features

Let’s say the x-axis on this graph shows a list of features within any given OSS product. And the y-axis shows a KPI that measures the importance of each feature (eg number of uses, value added by using that feature, etc).

As Professor McDonald indicates, all OSS products are excellent these days. And all product vendors know what the most important features are. As a result, they all know they must offer the features that appear on the left-side of the image. Since all vendors do the left-side, it seems logical to differentiate by adding features to the far-right of the image, right?

Well actually, there’s almost no differential advantage at the far-right of the image.

Now what if we consider the second part of Prof McDonald’s statement on differential advantage, “…it’s from the way you speak to and relate to your customers.”

To me it implies that the differential advantage in OSS is not in the products, but in the service wrapper that is placed around it. You might be saying, “but we’re an OSS product company. We don’t want to get into services.” As described in this earlier post, there are two layers of OSS services.

One of the layers mentioned is product-related services (eg product installation, product configuration, product release management, license management, product training, data migration, product efficiency / optimisation, etc). None of these items would appear as features on the long-tail diagram above. Perhaps as a result, it’s these items that are often less than excellent in vendor offerings. It’s often in these items where complexity, time, cost and risk are added to an OSS project, increasing stress for clients.

If Prof McDonald is correct and all OSS products are excellent, then perhaps it’s in the services wrapper where the true differential advantage is waiting to be unlocked. This will come from spending more time relating to customers than cutting more code.

What if we take it a step further? What if we seek to better understand our clients’ differential advantages in their markets? Perhaps this is where we will unlock an entirely different set of features that will introduce new bands on the left-side of the image. I still feel that amazing OSS/BSS can give carriers significant competitive advantage in their marketplace. And the converse can give significant competitive disadvantage!

Are you desperately seeking to increase your OSS‘s differential advantage? Contact us at Passionate About OSS to help map out a way.

Can OSS/BSS assist CX? We’re barely touching the surface

Have you ever experienced an epic customer experience (CX) fail when dealing a network service operator, like the one I described yesterday?

In that example, the OSS/BSS, and possibly the associated people / process, had a direct impact on poor customer experience. Admittedly, that 7 truck-roll experience was a number of years ago now.

We have fewer excuses these days. Smart phones and network connected devices allow us to get OSS/BSS data into the field in ways we previously couldn’t. There’s no need for printed job lists, design packs and the like. Our OSS/BSS can leverage these connected devices to give far better decision intelligence in real time.

If we look to the logistics industry, we can see how parcel tracking technologies help to automatically provide status / progress to parcel recipients. We can see how recipients can also modify their availability, which automatically adjusts logistics delivery sequencing / scheduling.

This has multiple benefits for the logistics company:

  • It increases first time delivery rates
  • Improves the ability to automatically notify customers (eg email, SMS, chatbots)
  • Decreases customer enquiries / complaints
  • Decreases the amount of time the truck drivers need to spend communicating back to base and with clients
  • But most importantly, it improves the customer experience

Logistics is an interesting challenge for our OSS/BSS due to the sheer volume of customer interaction events handled each day.

But it’s another area that excites me even more, where CX is improved through improved data quality:

  • It’s the ability for field workers to interact with OSS/BSS data in real-time
  • To see the design packs
  • To compare with field situations
  • To update the data where there is inconsistency.

Even more excitingly, to introduce augmented reality to assist with decision intelligence for field work crews:

  • To provide an overlay of what fibres need to be spliced together
  • To show exactly which port a patch-lead needs to connect to
  • To show where an underground cable route goes
  • To show where a cable runs through trayway in a data centre
  • etc, etc

We’re barely touching the surface of how our OSS/BSS can assist with CX.

Waiting for the disaster to invest in the data

Have you seen OSS tools where the applications are brilliant but consigned to failure by bad data? I definitely have! I call it the data death spiral. It’s a well known fact in the industry that bad data can ruin an OSS. You know it. I know it. Everyone knows it.

But how many companies do you know that invest in data quality? I mean truly invest in it.

The status quo is not to invest in the data, but the disaster. That is the disaster caused by the data!

Being a data nerd, it boggles my brain to understand why that is. My only assumption to date is that we don’t adequately measure the cost of quality. Or more to the point, what the cost impact is resulting from bad data.

I recently attempted to model the cost of quality. My model focuses on the ripple-out impacts from poor PNI (Physical Network Inventory) quality data alone. Using conservative numbers, the cost of quality is in the millions for the first carrier I applied it to.

Why do you think operators wait for the disaster before investing in the data? What alternate techniques do you use to focus attention, and investment, on the data?

Where an absence of OSS data can still provide insights

The diagram below has some parallels with OSS. The story however is a little long before it gets to the OSS part, so please bear with me.

null

The diagram shows analysis the US Navy performed during WWII on where planes were being shot. The theory was that they should be reinforcing the areas that received the most hits (ie the wing-tips, central part of the body, etc as per the diagram).

Abraham Wald, a statistician, had a completely different perspective. His rationale was that the hits would be more uniform and the blank sections on the diagram above represent the areas that needed reinforcement. Why? Well, the planes that were hit there never made it home for analysis.

In OSS, this is akin to the device or EMS that has failed and is unable to send any telemetry data. No alarms are appearing in our alarm lists for those devices / EMS because they’re no longer capable of sending any.

That’s why we use heartbeat mechanisms to confirm a system is alive (and capable of sending alarms). These come in the form of pollers, pingers or just watching for other signs of life such as network traffic of any form.

In what other areas of OSS can an absence of data demonstrate an insight?

The Rolls Royce vision of OSS

Yesterday’s post mentioned the importance of setting a future vision as part of your MVP delivery strategy.

As Steve Blank said here, Founders act like the “minimum” part is the goal. Or worse, that every potential customer should want it. In the real world not every customer is going to get overly excited about your minimum feature set…You’re selling the vision and delivering the minimum feature set to visionaries not everyone.”

Yesterday’s post promised to give you an example of an exciting vision. Not just any vision, the Rolls-Royce version of a vision.

We’ve all seen examples of customers wanting a Rolls-Royce OSS solution. Here’s a video that’s as close as possible to Rolls-Royce’s own vision of an OSS solution.

The OSS Minimum Feature Set is Not The Goal

This minimum feature set (sometimes called the “minimum viable product”) causes lots of confusion. Founders act like the “minimum” part is the goal. Or worse, that every potential customer should want it. In the real world not every customer is going to get overly excited about your minimum feature set. Only a special subset of customers will and what gets them breathing heavy is the long-term vision for your product.

The reality is that the minimum feature set is 1) a tactic to reduce wasted engineering hours (code left on the floor) and 2) to get the product in the hands of early visionary customers as soon as possible.

You’re selling the vision and delivering the minimum feature set to visionaries not everyone.”
Steve Blank here.

A recent blog series discussed the use of pilots as an OSS transformation and augmentation change agent.
I have the need for OSS speed
Re-framing an OSS replacement strategy
OSS transformation is hard. What can we learn from open source?

Note that you can replace the term pilot in these posts with MVP – Minimum Viable Product.

The attraction in getting an MVP / pilot version of your OSS into the hands of users is familiarity and momentum. The solution becomes more tangible and therefore needs less documentation (eg architecture, designs, requirement gathering, etc) to describe foreign concepts to customers. The downside of the MVP / pilot is that not every customer will “get overly excited about your minimum feature set.”

As Steve says, “Only a special subset of customers will and what gets them breathing heavy is the long-term vision for your product.” The challenge for all of us in OSS is articulating the long-term vision and making it compelling…. and not just leaving the product in its pilot state (we’ve all seen this happen haven’t we?)

We’ll provide an example of a long-term vision tomorrow.

PS. I should also highlight that the maximum feature set also isn’t the goal either.

The layers of ITIL redundancy

Today’s is something of a heretical post, especially for the believers in ITIL. In the world of OSS, we look to build in layers of resiliency and not layers of redundancy.

The following diagram and subsequent text in italics describes a typical ITIL process and is all taken from https://www.computereconomics.com/article.cfm?id=1074

Example of relationship between ITIL incidents, problems, and changes

The sequence of events as shown in Figure 1 is as follows:

  • At TIME = 0, an External Event is detected by the Incident Management process. This could be as simple as a customer calling to say that service is unavailable or it could be an automated alert from a system monitoring device.The incident owner logs and classifies this as incident i2. Then, the incident owner tries to match i2 to known errors, work-arounds, or temporary fixes, but cannot find a match in the database.
  • At TIME = 1, the incident owner dispatches a problem request to the Problem Management process anticipating a work-around, temporary fix, or other assistance. In doing so, the incident owner has prompted the creation of Problem p2.
  • At TIME = 2, the problem owner of p2 returns the expected temporary fix to the incident owner of i2.  Note that both i2 and p2 are active and exist simultaneously. The incident owner for i2 applies the temporary fix.
  • In this case, the work-around requires a change request.  So, at Time = 3, the incident owner for i2 initiates change request, c2.
  • The change request c2 is applied successfully, and at TIME = 4, c2 is closed. Note that for a while i2, p2 and c2 all exist simultaneously.
  • Because c2 was successful, the incident owner for i2 can now confirm that the incident is resolved. At TIME = 5, i2 is closed. However, p2 remains active while the problem owner searches for a permanent fix. The problem owner for p2 would be responsible for implementing the permanent fix and initiating any necessary change requests.

But I look at it slightly differently. At their root, why do Incident Management, Problem Management and Change Management exist? They’re all just mechanisms for resolving a system* health issue. If we detect an event/s and fix it, we don’t have to expend all the effort of flicking tickets around.

Thinking within the T2R paradigm of trouble -> incident -> problem -> change -> resolve holds us back. If we can skip the middle steps and immediately associate a resolution with the event/s, we get a whole lot more efficient. If we can immediately relate a trigger with a reaction, we can also get rid of the intermediate ticket flickers and the processing cycle time.

So, the NOC of the future surely requires us to build a trigger -> reaction recommendation engine (and body of knowledge). That’s a more powerful tool to supply to our NOC operators than incidents, problems and change requests. (Easier to write about than to actually solve though of course)

OSS transformation is hard. What can we learn from open source?

Have you noticed an increasing presence of open-source tools in your OSS recently? Have you also noticed that open-source is helping to trigger transformation? Have you thought about why that might be?

Some might rightly argue that it is the cost factor. You could also claim that they tend to help resolve specific, but common, problems. They’re smaller and modular.

I’d argue that the reason relates to our most recent two blog posts. They’re fast to install and they’re easy to run in parallel for comparison purposes.

If you’re designing an OSS can you introduce the same concepts? Your OSS might be for internal purposes or to sell to market. Either way, if you make it fast to build and easy to use, you have a greater chance of triggering transformation.

If you have a behemoth OSS to “sell,” transformation persuasion is harder. The customer needs to rally more resources (funds, people, time) just to compare with what they already have. If you have a behemoth on your hands, you need to try even harder to be faster, easier and more modular.

I have the need for OSS speed

You already know that speed is important for OSS users. They / we don’t want to wait for minutes for the OSS to respond to a simple query. That’s obvious right? The bleeding obvious.

But that’s not what today’s post is about. So then, what is it about?

Actually, it follows on from yesterday’s post about re-framing of OSS transformation.  If a parallel pilot OSS can be stood up in weeks then it helps persuasion. If the OSS is also fast for operators to learn, then it helps persuasion.  Why is that important? How can speed help with persuasion?

Put simply:

  • It takes x months of uncertainty out of the evaluators’ lives
  • It takes x months of parallel processing out of the evaluators’ lives
  • It also takes x months of task-switching out of the evaluators’ lives
  • Given x months of their lives back, customers will be more easily persuaded

It also helps with the parallel bake-off if your pilot OSS shows a speed improvement.

Whether we’re the buyer or seller in an OSS pilot, it’s incumbent upon us to increase speed.

You may ask how. Many ways, but I’d start with a mass-simplification exercise.

Re-framing an OSS replacement strategy

Friday’s post posed a re-framing exercise that asked you (whether customer, seller or integrator) to run a planning exercise as if you MUST offer a money-back guarantee on your OSS (whether internal or external). It’s designed to force a change in mindset from risk mitigation to risk removal.

We have another re-framing exercise for you today.

As we all know, incumbent OSS can be really difficult to replace / usurp. It becomes a massive exercise for a customer to change the status quo. And when you’re on the team that’s trying to instigate change (again whether you’re internal or external to the OSS customer organisation), you want to minimise the barriers to change.

The ideal replacement approach is to put a parallel pilot in place (which also bears some similarity with the strangler fig analogy). Unfortunately the pilot approach doesn’t get used as often as it could because pilot implementation projects tend to take months to stand up. This implies significant effort and cost, which in turn implies a major procurement event needs to occur.

If the parallel pilot could be stood-up in days or a couple of weeks, then it becomes a more useful replacement persuasion strategy.

So today’s re-framing exercise is to ask yourself what you could do to stand up a pilot version of your OSS in only days/weeks and at very little cost?

Let me add an extra twist to that exercise. When I say stand up the OSS in days/weeks, I also mean to hand over to the users, which means that it has to be intuitive enough for operators to begin using with almost no training. Don’t forget that the parallel solution is unlikely to have additional resources to operate it. It’s likely that the same workforce will need to operate incumbent and pilot, performing a comparison.

So, what you could do to stand up a pilot version of your OSS in only days/weeks, at very little cost and with almost immediate take-up by users?

What’s the one big factor holding back your OSS? And the exercise to reduce it

We’ve talked about some of the emotions we experience in the OSS industry earlier this week, the trauma of OSS and anxiety relating to OSS.

To avoid these types of miserable feelings, it’s human nature to seek to limit them. We over-analyse, we over-specify, we over-engineer, we over-document, we over-contract, we over-react, we over-estimate (nah, actually we almost never over-estimate do we?), we over-resource (well, actually, we don’t seem to do that very often either). Anyway, you get the “over” idea.

What is the one big factor that leads to all of these overs? What is the one big factor that makes our related costs and delivery times become overs too?

Have you guessed yet?

The answer is…… drum-roll please…… RISK.

Let’s face it. OSS projects are as full as a centipede’s sock drawer when it comes to risk. The customer carries risks, the supplier carries risk, the integrators carry risk, the sponsors carry risk, the end-users carry risk, the implementers carry risk. What a burden! And it is a burden that impacts in many ways, as indicated in the triple constraint of OSS projects.

Anyone who’s done more than a few OSS projects knows there are many risks and they tend to respond by going into over-mode (ie all the overs mentioned above). That’s a clever strategy. It’s called risk mitigation.

But today’s post isn’t about risk mitigation. It takes a contrarian approach. Let me explain.

Have you noticed how many companies build risk reduction techniques into their sales models? Phrases like “money-back guarantee” abound. This technique is designed to remove most of the risk for the customer and also remove the associated barrier to purchase. To be fair, it might not actually be a case of removing the risk, but directing all of the risk onto the seller. Marketers call it risk reversal.

I’m sure you’re thinking, “well that’s fine for high-volume, low-cost products like burgers or books, but not so easy for complex, customised solutions like OSS.” I hear you!

I’m not actually asking you to offer a money-back guarantee for your OSS, although Passionate About OSS does offer that all the way from our products through to our high-end consultancy services.

What I am asking you to do (whether customer, seller or integrator) is to run a planning exercise as if you MUST offer a money-back guarantee. What that forces is a change of mindset from risk mitigation to risk removal. It forces consideration of what are the myriad risks “in the system” (for customer, seller and integrator) and how can they be removed? Here are a few risk planning suggestions FWIW.

Set the following challenge for your analysts and engineers – Don’t come to me with a business case for the one-million-and-first feature to add, but prove your brilliance by showing me the business case for the risks you will remove. Risk reduction rather than feature-add or cost-out business cases.

Let me know what you discover and what your results are.

OSS data that’s even more useless than useless

About 6-8 years ago, I was becoming achingly aware that I’d passed well beyond an information overload (I-O) threshold. More information was reaching my brain each day than I was able to assimilate, process and archive. What to do?

Well, I decided to stop reading newspapers and watching the news, in fact almost all television. I figured that those information sources were empty calories for the brain. At first it was just a trial, but I found that I didn’t miss it much at all and continued. Really important news seemed to find me at the metaphorical water-cooler anyway.

To be completely honest, I’m still operating beyond the I-O threshold, but at least it’s (arguably) now a more healthy information diet. I’m now far more useless at trivia game shows, which could be embarrassing if I ever sign up as a contestant on “Who Wants to be a Millionaire.” And missing out on the latest news sadly makes me far less capable of advising the Queen on how to react to Meghan Markle’s latest royal “atrocity.” The crosses we bear.

But I’m digressing markedly (and Markle-ey) from what this blog is all about – O.S.S.

Let me ask you a question – Is your OSS data like almost everybody else’s (ie also in I-O mode)?

Seth Godin recently quoted 3 rules of data:
First, don’t collect data unless it has a non-zero chance of changing your actions.
Second, before you seek to collect data, consider the costs of processing that data.
Third, acknowledge that data collected isn’t always accurate, and consider the costs of acting on data that’s incorrect.”

If I remove the double-negative from rule #1 – Only collect data if it has even the slightest chance of changing your actions.

Most people take the perspective that we might as well collect everything because storage is just getting so cheap (and we should keep it, not because we ever use it, but just in case our AI tools eventually find some relevance locked away inside it).

In the meantime, pre-AI (rolls eyes), Seth’s other two rules provide further sanity to the situation. Storing data is cheap, except where it has to be timely and accurate enough to make decisive, reliable actions on.

So, let me go back to the revised quote in bold. How much of the data in your OSS database / data-lake / data-warehouse / etc has even the slightest chance of changing your actions? As a percentage??

I suspect a majority is never used. And as most of it ages, it becomes even more useless than useless. One wonders, why are we storing it then?

Zero Touch Assurance – ZTA (part 3)

This is the third in a series on ZTA, following on from yesterday’s post that suggested intentionally triggering events to allow the accumulation of a much larger library of historical network data.

Today we’ll look at the impact of data collection on our ability to achieve ZTA and refer back to part 1 in the series too.

  1. Monitoring – There is monitoring the events that happen in the network and responding manually
  2. Post-cognition – There is monitoring events that happen in the network, comparing them to past events and actions (using analytics to identify repeating patterns), using the past to recommend (or automate) a response
  3. Pre-cognition – There is identifying events that have never happened in the network before, yet still being able to provide a recommended / automated response

In my early days of OSS projects, it was common that network performance data would be collected at 15 minute intervals at best. Sometimes even less if it put too much load on the processor of any given network devices. That was useful for long and medium term trend analysis, but averaging across the 15 minute period meant that significant performance events could be missed. Back in those days it was mostly Stage 1 – Monitoring. Stage 2, Post-cognition, was unsophisticated at a system level (eg manually adjusting threshold-crossing event levels) so post-cognition relied on talented operators who could remember similar events in the past.

If we want to reach the goal of ZTA, we have to drastically reduce measurement / polling / notification intervals. Ideally, we want near-real-time data collection across the following dimensions:

  • To extract (from the device/EMS)
  • To transform / normalise the data (different devices may use different counter models for example)
  • To load
  • To identify patterns (15 minute poll cycles disguise too many events)
  • To compare with past patterns
  • To compare with past responses / results
  • To recommend or automate a response

I’m sure you can see the challenge here. The faster the poll cycle, the more data that needs to be processed. It can be a significant feat of engineering to process large data volumes at near-real-time speeds (streaming analytics) on large networks.