Two concepts to help ease long-standing OSS problems

There’s a famous Zig Ziglar quote that goes something like, “You can have everything in life you want, if you will just help enough other people get what they want.”

You could safely assume that this was written for the individual reader, but there is some truth in it within the OSS context too. For the OSS designer, builder or integrator, does the statement, “You can have everything in your OSS you want, if you will just help enough other people get what they want,” apply?

We often think only about the O in OSS – Operations people – when looking for whom to help. But OSS/BSS have the ability to impact far more widely than just the Ops team/s.

The halcyon days of OSS were probably in the 1990s to early 2000s, when the term OSS/BSS was at its most sexy and exciting. The big telcos were excitedly spending hundreds of millions of dollars. Those projects were huge… and hugely complex… and hugely fun!

With that level of investment, there was the expectation that the OSS/BSS would help many people. And they did. But the lustre has come off somewhat since then. We’ve helped sooooo many people, but perhaps didn’t help enough people enough. Just speak with anybody involved with an OSS/BSS stack and you’ll hear hints of a large gap that exists between their current state and a desired future state.

Do you mind if I ask two questions?

  1. When you reflect on your OSS activities, do you focus on the technology, the opportunities or the problems?
  2. Do you look at the local, day-to-day activities or the broader industry?

I tend to find myself focusing on the problems – how to solve them within the daily context of customer challenges, but also on the broader industry problems when I take the time to reflect, such as when writing these blogs.

The part I find interesting is that we still face most of the same problems today that we did back in the 1990s-2000s. The same sources of risk. We’ve done a fantastic job of helping many people get what they want in their day-to-day activities (the incremental). We still haven’t cracked the big challenges though. That’s why I wrote the OSS Call for Innovation, to articulate what lies ahead of us.

It’s why I’m really excited about two of the concepts we’ve discussed this week:

Auto-releasing chaos monkeys to harden your network (CT/IR)

In earlier posts, we’ve talked about using Netflix’s chaos monkey approach as a way of getting to Zero Touch Assurance (ZTA). The chaos monkeys intentionally trigger faults in the network as a means of ensuring resilience. Not just for known degradation / outage events, but for unknown events too.

I’d like to introduce the concept of CT/IR – Continual Test / Incremental Resilience. Analogous to CI/CD (Continuous Integration / Continuous Delivery) before it, CT/IR is a method to systematically and programmatically test the resilience of the network, then ensure that resilience is continually improving.

The continual, incremental improvement in resilience potentially comes via multiple feedback loops (see the sketch after this list):

  1. Ideally, the existing resilience mechanisms work around or overcome any degradation or failure in the network
  2. The continual triggering of faults in the network will provide additional seed data for AI/ML tools to learn from and improve upon, especially for root-cause analysis (noting that in the case of CT/IR the root cause is certain – we KNOW the cause, because we triggered it – rather than reverse-engineered from what the cause may have been)
  3. We can program the network to overcome the problem (eg turn up extra capacity, re-engineer traffic flows, change configurations, etc). By the way, having the NaaS that we spoke about yesterday provides greater programmability for the network
  4. We can implement systematic programs / projects to fix endemic faults or weak spots in the network *
  5. We can perform regression tests to constantly stress-test the network as it evolves through network augmentation, new device types, etc
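To make CT/IR a little more concrete, here’s a minimal sketch of what one test cycle might look like. It’s illustrative only – the fault catalogue and the `trigger_fault` / `service_recovered` hooks are hypothetical placeholders for whatever programmable (NaaS-style) interfaces your network and service probes expose:

```python
import random
import time

# Hypothetical fault catalogue - in practice, built with the NOC team,
# starting from the most common known root-causes.
FAULT_CATALOGUE = [
    {"id": "port-down", "target": "edge-router-07"},
    {"id": "link-degrade", "target": "core-link-12"},
]

def trigger_fault(fault):
    """Placeholder: inject the fault via the network's programmable API."""
    print(f"Injecting {fault['id']} on {fault['target']}")

def service_recovered(fault, timeout_s=300):
    """Placeholder: poll service-level probes until recovery or timeout."""
    return True  # replace with real probes (digital twin / PSUP first!)

def run_ct_ir_cycle(results_log):
    """One CT/IR cycle: inject a known fault, record whether the existing
    resilience mechanisms coped, and keep the labelled outcome as seed
    data for the AI/ML feedback loops described above."""
    fault = random.choice(FAULT_CATALOGUE)
    started = time.time()
    trigger_fault(fault)
    recovered = service_recovered(fault)
    results_log.append({
        "root_cause": fault["id"],   # certain, because we triggered it
        "recovered": recovered,
        "recovery_secs": time.time() - started,
    })
    return recovered  # failures feed the fix-the-weak-spots program
```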

Now, you may argue that no carrier in their right mind will allow intentional faults to be triggered. So that’s where we unleash the chaos monkeys on our digital twin technology and/or PSUP (Production Support) environments at first, then on our production network once we’ve developed enough trust in it.

I live in Australia, which suffers from severe bushfires every summer. Our fire-fighters spend a lot of time back-burning during the cooler months to reduce flammable material and therefore the severity of summer fires. Occasionally the back-burns get out of control, causing problems. But they’re still done for the greater good. The same principle could apply to unleashing chaos monkeys on a production network… once you’re confident in your ability to control the problems that might follow.

* When I say network, I’m referring not only to the physical and logical network, but also to support functions such as EMS (Element Management Systems), NCM (Network Configuration Management tools), backup/restore mechanisms, service order replay processes in the event of an outage, OSS/BSS, NaaS, etc.

Is your OSS squeaking like an un-oiled bearing?

Network operators spend huge amounts on building and maintaining their OSS/BSS every year. There are many reasons they invest so heavily, but in most cases it can be distilled back to one thing – improving operational efficiency.

And our OSS/BSS definitely do improve operational efficiency, but there are still so many sources of friction. They’re squeaking like un-oiled bearings. Here are just a few of the common sources:

  1. First-time Installation
  2. Identifying best-fit tools
  3. Procurement of new tools
  4. Update / release processes
  5. Continuous data quality / consistency improvement
  6. Navigating to all features through the user interface
  7. Non-intuitive functionality / processes
  8. So many variants / complexity that end-users take years to attain expert-level capability
  9. Integration / interconnect
  10. Getting new starters up to speed
  11. Getting proficient operators to expertise
  12. Unlocking actionable insights from huge data piles
  13. Resolving the root-cause of complex faults
  14. Onboarding new customers
  15. Productionising new functionality
  16. Exception and fallout handling
  17. Access to supplier expertise to resolve challenges

The list goes far deeper than that too. The challenge for many OSS product teams, for any number of reasons, is that their focus is on adding new features rather than reducing friction in what already exists.

The challenge for product teams is diagnosing where the friction and risks are for their customers / stakeholders. How do you get that feedback?

  • Every vendor has a product support team, so that’s a useful place to start, both in terms of what’s generating the most support calls and in terms of first-hand feedback from customers
  • Do you hold user forums on a regular basis, where you get many of your customers together to discuss their challenges, your future roadmap, new improvements / features?
  • Does your process “flow” data show where the sticking points are for operators?
  • Do you conduct gemba walks with your customers?
  • Do you have a program of ensuring all developers spend at least a few days a year interacting directly with customers on their site/s?
  • Do you observe areas of difficulty when delivering training?
  • Do you go out of your way to ask your customers / stakeholders questions that are framed around their pain-points, not just framed within the context of your existing OSS?
  • Do you conduct customer surveys? More importantly, do you conduct surveys through an independent third-party?

On the last dot-point, I’ve been surprised at some of the profound insights end-users have shared with me when I’ve been conducting these reviews as the independent interviewer. I’ve tended to find answers are more open / honest when being delivered to an independent third-party than if the supplier asks directly. If you’d like assistance running a third-party review, leave us a note on the contact page. We’d be delighted to assist.

Would you hire a furniture maker as an OSS CEO?

Well, would you hire a furniture maker as CEO of an OSS vendor?

At face value, it would seem to be an odd selection, right? There doesn’t seem to be much commonality between furniture and OSS, does there? It seems about as likely as hiring a furniture maker to be CEO of a car maker.

Oh wait. That did happen.

Ford Motor Company made just such a decision last year when appointing Jim Hackett, a furniture industry veteran, as its CEO. Whether the appointment proves successful or not, it’s interesting that Ford made the decision. But why? To focus on user experience and design as its next big differentiator. Clever line of thinking, Bill Ford!!

I’ve prepared a slightly light-hearted table comparing cars and OSS. They’re worth comparing because both are complex feats of human engineering:

| # | Comparison criteria | Car | OSS |
|---|---------------------|-----|-----|
| 1 | Primary objective | Transport passengers between destinations | Operationalise and monetise a comms network |
| 2 | Claimed “business” justification | Personal freedom | Reducing the cost of operations |
| 3 | Operation of common functionality without conscious thought (developed through years of operator practice) | Steering, changing gears, indicating | Hmmm??? Depends on which salesperson or operator you speak with |
| 4 | Error detection and current-state monitoring | Warning lights and instrument cluster/s | Alarm lists, performance graphs |
| 5 | Key differentiator for customers (1970s) | Engine size | Database / CPU size |
| 6 | Key differentiator for customers (2000s) | Gadgets / functions / cup-holders | Functionality |
| 7 | Key differentiator for customers (2020+) | User experience; self-driving; connected car (car as an “experience platform”) | User experience?? Zero-touch assurance? Connected OSS (ie OSS as an “experience platform”)??? |

I’d like to focus on three key areas next:

  1. Item 3
  2. Item 4 and
  3. The transition between items 6 and 7

Item 3 – operating on auto-pilot

If we reference against item 1, the primary objective, experienced operators of cars can navigate from point A to point B with little conscious thought. Key activities such as steering, changing gears and indicating can be done almost as a background task by our brains whilst doing other mental processing (talking, thinking, listening to podcasts, etc).

Experienced operators of OSS can do primary objectives quickly, but probably not on auto-pilot. There are too many “levers” to pull, too many decisions to make, too many options to choose from, for operators to background-process key OSS activities. The question is, could we re-architect to achieve key objectives more as background processing tasks?

Item 4 – error detection and monitoring

In a car, error detection is also a background task; operators are only notified of critical alerts (eg engine light, fuel tank empty, etc). In an OSS, error detection is not a background task. We need full-time staff monitoring all the alarms and alerts popping up on our consoles! Sometimes they scroll off the page too fast for us to even contemplate.

In a car, monitoring is kept to the bare essentials (speedo, tacho, fuel gauge, etc). In an OSS, we tend to be great at information overload – we have a billion graphs and are never sure which ones, or which thresholds, actually allow us to operate our “vehicle” effectively. So we show them all.

Transitioning from current to future-state differentiators

In cars, we’ve finally reached peak-cup-holders. Manufacturers know they can no longer differentiate from competitors just by having more cup-holders (at least, I think this claim is true). They’ve also realised that even entry-level cars have an astounding list of features that are only supplementary to the primary objective (see item 1). They now know it’s not the amount of functionality, but how seamlessly and intuitively the users interact with the vehicle on end-to-end tasks. The car is now seen as an extension of the user’s phone rather than vice versa, unlike the recent past.

In OSS, I’ve yet to see a single cup holder (apart from the old gag about CD trays). Vendors, mark that down – cup holders could be a good differentiator. But seriously, I’m not sure we realise that the OSS arms race of features is no longer the differentiator. Intuitive end-to-end user experience can be a huge differentiator amongst the sea of complex designs, user interfaces and processes currently available. But nobody seems to be talking about this. Go to any OSS event and we only hear from engineers talking about features. Where are the UX experts talking about innovative new ways for users to interact with machines to achieve primary objectives (see item 1)?

That’s not to say the functionality arms race is a completely dead differentiator. In cars, there is a horizon of next-level features that can be true differentiators, like self-driving or hover-cars. Likewise in OSS, incremental functionality increases aren’t differentiators, but any vendor that can not just discuss, but actually produce, next-level capabilities like zero-touch assurance (ZTA) and automated O2A (Order to Activate) will definitely hold a competitive advantage.

Hat tip to Jerry Useem, whose article in The Atlantic provided the idea seed for this OSS post.

Unleashing the chaos monkeys on your OSS

I like to compare OSS projects with chaos theory. A single butterfly flapping its wings (eg a conversation with the client) can have unintended consequences that cause a tornado (eg the client’s users refusing to use a new OSS).

The day-to-day operation of a network and its management tools can be similarly sensitive to seemingly minor inputs. We can never predict or test for every combination of knock-on effects. This means that forecasting the future is impossible and failure is inevitable.

If we take these two statements to be true, it perhaps changes the way we engineer our OSS.

How many production OSS (and/or related EMS) do you know of whose operators have to tiptoe around the edges for fear of causing a meltdown? Conversely, how many do you know whose operators would quite happily trigger failures with confidence, knowing that their solution is robust and will recover without perceptibly impacting customers?

How many of you could confidently trigger scheduled or unscheduled outages of various types on your production OSS to introduce the machine learning seeding technique discussed yesterday?

Would you be prepared to unleash the chaos monkeys on your OSS / network like Netflix is prepared to do on its production systems?

Most OSS are designed for known errors, and mechanisms are put in place to prevent them. Instead, I wonder whether we should design systems on the assumption that failure is inevitable, making recovery both rapid and automated.

It’s a subtle shift in thinking. Reduce the test scenarios that might lead to OSS failure, and increase the number of intentional OSS failures to test for recovery.
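As a thought experiment, that inverted mindset might look something like the toy supervisor below. The `health_check` and `recover` hooks are hypothetical – the point is simply that the design effort goes into detecting failure quickly and automating the way back, rather than enumerating every possible failure up front:

```python
import time

def supervise(component, health_check, recover, interval_s=10):
    """Toy recovery loop: assume failure is inevitable rather than
    preventable, so detect it quickly and automate the recovery path."""
    while True:
        if not health_check(component):
            print(f"{component} unhealthy - starting automated recovery")
            recover(component)  # eg restart, fail over, replay service orders
        time.sleep(interval_s)
```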

PS. Oh, and you’d rightly argue that a telco is very different from Netflix. There’s a lot more complexity in the networks, especially the legacy stacks. Many a telco would NEVER let anyone intentionally cause even the slightest degradation / failure in the network. This is where digital twin technology potentially comes into play.

An OSS without the shackles of topology

It’s been nearly two decades since I designed my first root-cause analysis (RCA) rule. It was completely reliant on network topology – more specifically, it relied on a network hierarchy to determine which alarms could be suppressed.

I had a really interesting discussion today with some colleagues who are using much more modern RCA techniques. I was somewhat surprised, but not surprised at all in hindsight, that their Machine Learning engine doesn’t even use topology data. It just looks at events and tries to identify patterns.

That’s a really interesting insight that hadn’t dawned on me before. But it’s an exciting one because it effectively unshackles our fault management tools from data quality perfection in our inventory / asset databases. It also possibly lessens the need for integrations that share topological data.

Equally interesting, the ML engine had identified over 4,000 patterns, but only a dozen had been codified and put into use so far. In other words, the machine was learning, but humans still needed to get involved in the process to confirm that the machine had learned correctly.

Makes me wonder whether the ML pre-seeding technique we discussed in an earlier post might actually be useful for confirming patterns at a greater scale than the 12 of 4,000+ the team has achieved to date.

The standard approach is to let the ML loose and identify patterns. This is the reactive approach. The ML reacts to the alarms that are pushed up from the network. It looks at alarms and determines what the root cause is based on historical data. A human then has to check that the root cause is correct by reverse engineering the alarm stream (just like a network operator used to do before RCA tools came along) and comparing. If the comparison is successful, the person then approves this pattern.
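I don’t know exactly what my colleagues’ ML engine does internally, but a toy version of topology-free pattern identification might look like this: bucket alarms into short time windows and count which alarm types repeatedly co-occur – no inventory or topology data required. The data shapes are assumptions for illustration:

```python
from collections import Counter
from itertools import combinations

def mine_alarm_patterns(alarms, window_s=60, min_support=5):
    """Toy topology-free pattern miner. `alarms` is a list of
    (timestamp_seconds, alarm_type) tuples; returns alarm-type pairs
    that co-occur in the same time window at least `min_support` times."""
    windows = {}
    for ts, alarm_type in alarms:
        windows.setdefault(int(ts // window_s), set()).add(alarm_type)

    pair_counts = Counter()
    for types in windows.values():
        for pair in combinations(sorted(types), 2):
            pair_counts[pair] += 1

    # Candidate patterns only - a human still confirms before codifying.
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_support]
```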

My proposed alternative approach is the proactive method (sketched below). If we proactively trigger a fault (e.g. pull a patch lead, take a port down, etc), we start from a position of already knowing what the root cause is. This has three benefits:
1. We can check if the ML’s interpretation of root cause is right
2. We’ve proactively seeded the ML’s data with this root cause example
3. We categorically know what the root cause is, unlike the reactive mode which only assumes the operator has correctly diagnosed the root cause
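In sketch form, the proactive method just wraps a deliberate fault trigger around the same alarm collection, so every captured pattern arrives pre-labelled with a certain root cause. Again, `trigger_fault` and `collect_alarms` are hypothetical hooks:

```python
import time

def seed_root_cause_example(trigger_fault, collect_alarms, fault, observe_s=120):
    """Proactive seeding: inject a known fault (in a twin / PSUP environment),
    capture the resulting alarm burst, and store the pair as a training
    example whose root cause is known with certainty."""
    trigger_fault(fault)            # eg pull a patch lead, take a port down
    time.sleep(observe_s)           # let the alarm cascade develop
    alarms = collect_alarms(last_n_seconds=observe_s)
    return {
        "root_cause": fault,                   # known, not assumed
        "alarm_pattern": sorted(set(alarms)),  # what the ML should associate
    }
```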

Then we just have to figure out a whole bunch of proactive failures to test safely. Where to start? Well, I’d speak with the NOC operators to find out what their most common root causes are and look to trigger those events.

More tomorrow on intentionally triggering failures in production systems.

Mythical OSS beasts – feature removal releases

“Life can be improved by adding, or by subtracting. The world pushes us to add, because that benefits them. But the secret is to focus on subtracting…

No amount of adding will get me where I want to be. The adding mindset is deeply ingrained. It’s easy to think I need something else. It’s hard to look instead at what to remove.

The least successful people I know run in conflicting directions, drawn to distractions, say yes to almost everything, and are chained to emotional obstacles.

The most successful people I know have a narrow focus, protect against time-wasters, say no to almost everything, and have let go of old limiting beliefs.”
Derek Sivers, here.

I’m really curious here. Have you ever heard of an OSS product team removing a feature? Nope?? Me neither!

I’ve seen products re-factored, resulting in changes to features. I’ve also seen products obsoleted and their replacements not offer all of the same features. But what about a version upgrade to an existing OSS product that has features subtracted? That never happens does it?? The adding mindset is deeply ingrained.

So let’s say we do want to go on a subtraction drive and remove some of the clutter from our OSS. I know plenty of OSS GUIs where subtraction is desperately needed BTW! But how do we know what to remove?

I have no data to back this up, but I would guess that almost every OSS has certain functions that aren’t used by any of its customers in a whole year. That functionality was probably built for a specific use-case for a specific customer that no longer has relevance. Perhaps for a service type that is no longer desired by the market, or a network type that will never be used again.

Question is, does your OSS have profiling instrumentation that allows you to measure what functionality is and isn’t used across your whole client base?

Can your products team readily produce a usage profile graph like the following that shows a list of functions (x-axis) by the number of times each function is used (y-axis) in a given time window? Per client? Across all clients?
[Chart: Long-tail of OSS functionality use]
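If your product doesn’t have that instrumentation, a first cut needn’t be elaborate. Here’s a hedged sketch of the idea – a decorator that counts feature invocations per client so a long-tail chart like the one above can be plotted. All names are illustrative:

```python
from collections import Counter
from functools import wraps

usage_counts = Counter()  # (client_id, feature_name) -> invocation count

def track_usage(feature_name):
    """Count every invocation of a feature, per client, so the product team
    can plot functions (x-axis) against usage (y-axis) across the client base."""
    def decorator(func):
        @wraps(func)
        def wrapper(client_id, *args, **kwargs):
            usage_counts[(client_id, feature_name)] += 1
            return func(client_id, *args, **kwargs)
        return wrapper
    return decorator

@track_usage("bulk_data_loader")
def bulk_data_loader(client_id, records):
    ...  # rarely used, but a low count doesn't automatically mean "remove"
```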

Leave us a comment below if you’ve ever seen this type of profiling instrumentation (not for code optimisation, but for identifying client utilisation levels) and/or systematic feature subtraction initiatives.

BTW, I should make the distinction that just because a function hasn’t been used in a while doesn’t mean it should automatically be removed. Some functionality (eg data loaders) might be rarely used, but important to retain.

The humbling experience of OSS superstars

“Years ago, I was so confident, and so naive. I was so sure that I was right and everyone else was wrong.
Unfortunately I was lucky and got successful, so that kept me ignorant of my shortcomings.
I sold my company, felt ready to do something new, and started to learn. But the more I learned, the more I realized how little I knew, and how dumb-lucky I had been. I continued learning until I felt like an absolute idiot…
So I’m glad my old confidence is gone, because it thought I was right, and maybe even great.”
Derek Sivers here.

Confidence is a fascinating thing in OSS. There are so many facets to it. One person can rightfully build the confidence that comes from being the best in the world at one facet, but completely lack confidence in many of the other facets. Or be a virtuoso within their project / environment but be out of their depth in another organisation’s environment.

Yesterday’s post discussed how much of a bottleneck I became on my first OSS project. Most members of that project team had a reliance on the information / data that I knew how to find and interpret. Most members of the project team had problems that “my” information / data would help them to solve.

As they say, knowledge is power and I had the confidence that came with having that knowledge / power. My ego thrived on the importance of having that knowledge / power. That ego was built on helping the team solve many of the huge number of problems we faced. But as with most ego plays, I was being selfish in retrospect.

Having access to all of the problems, and helping to solve them, meant I was learning at a rapid rate (by my limited standards). That was exciting. Realising there were so many problems to solve in this field probably helped spawn my passion for OSS.

With the benefit of hindsight, I was helping, but also hindering in almost equal measures. Like Derek, “the more I learned, the more I realized how little I knew, and how dumb-lucky I had been.” To be honest, I’ve always been aware of just how little I know across the many facets of OSS. But the more I learn, the more I realise just how big the knowledge chasm is. And with the speed of change currently afoot, the chasm is only getting wider. Not just for me, but for all of us I suspect. Do you feel it too?

Also like Derek, my old naive confidence is gone. A humbler confidence now replaces it, one derived from many humbling experiences and failures (ie learning experiences) along the way.

There are superstars on every OSS project team. You’ve worked with many of them no doubt. But let me ask you a question. If you think back on those superstars, how many have also been bottlenecks on your projects? How many have thrived on the importance of being the bottleneck? Conversely, how many have mastered the art of not just being the superstar performer, but also the on-field leader who brings others into the game? The one who can make an otherwise dysfunctional OSS team function more cohesively.

Leave a comment below if you’d like to share a story of your experience dealing with OSS superstars (or being the superstar). What have you learned from that experience?

I was a huge bottleneck on my first OSS project

I became a problematic bottleneck on my first OSS project. It didn’t start that way, but it definitely ended that way. And I’ve been thinking ever since about how I could’ve managed that better.

I started out as a network subject matter expert but wasn’t a bottleneck in that role. However, the next two functions I absorbed were the source of the problem.

The first additional role was in becoming the unofficial document librarian. Most of the documents coming into our organisation came through me. Being inquisitive, I’d review each document and try to apply it to what my colleagues and I were trying to achieve. When the team had an information void, they’d come to me with the problem and I’d not just point them to the relevant document/s but dive into helping to solve the problem.

The next role was assisting to model network data into the OSS database. This morphed into becoming responsible for all of the data in the database. In those days, I didn’t have a Minimum Viable Data (MVD) mindset. Instead it was an ingest-it-all-and-figure-out-how-to-use-it-later mentality. When the team had a data void, they’d come to me with the problem and I’d not just point them to the relevant data and what it meant but dive into helping to solve their problem/s.

You can see how this is leading to being a bottleneck can’t you?

I was effectively asking for all problems to be re-routed through me. Every person on the project (except possibly the project admins) relied on documentation and data. I averaged 85-hour weeks for about 2.5 years on that project, but still didn’t get close to servicing all the requests. Great as a learning exercise. Not great for the project.

Twenty years on, how would I do it better? Well, let me ask first, how would you do it better?

You possibly have many more ideas, but the two I’d like to leave you with are:

  • Figure out ways to make teaching more repeatable and learning more self-directed
  • Closely aligned, and more importantly, ask leading questions that help others solve their own problems

It still feels less helpful not to dive in and solve the problem yourself, but it undoubtedly improves overall team efficiency and growth.

Oh, and by the way, if you’re just starting out in OSS and want to speed up your own development into becoming an OSS linchpin – find your way into the document librarian and/or data management roles. After all these years on OSS projects, I still think these are the best places to launch into the learning curve from.

Speeding up your OSS transition from PoC to PROD

In yesterday’s article, we discussed 7 models for achieving startup-like efficiency on large OSS transformations.

One popular approach is to build a proof-of-concept or sandpit quickly on cloud hosting or in lab environments. It’s fast for a number of reasons, including a reduced number of approvals, faster activation of infrastructure, reduced safety checks (eg security, privacy, etc), minimised integration with legacy systems and many others. The cloud hosting business model is thriving for all of these reasons.

However, it’s one thing to speed up development of an OSS PoC and another entirely to speed up deployment to a PROD environment. As soon as you wish to absorb the PoC-proven solution back into PROD, all the items listed above (eg security sign-offs) come back into play. Something that took days/weeks to stand up in PoC now takes months to productionise.

Have you noticed that the safety checks currently in use were often defined for the old world? They often aren’t designed with the transition from cloud to PROD in mind. Similarly, the culture of design cross-checks and approvals can also be re-framed (especially when the end-to-end solution crosses multiple different business units). Lastly, and way outside my locus of competence, there’s the task of re-visiting security / privacy / deployment / etc models to facilitate an easier transition.

One consideration to make is just how much absorption is required. For example, there are examples of services being delivered to the large entity’s subscribers by a smaller, external entity. The large entity then just “clips-the-ticket,” gaining a revenue stream with limited involvement. But the more common (and much more challenging) absorption model is for the partner to fold the solution back into the large entity’s full OSS/BSS stack.

So let’s consider your opportunity in terms of the absorption continuum that ranges between:

clip-the-ticket (minimally absorbed) <-----------|-----------> folded-in (fully absorbed)

Perhaps it’s feasible for your opportunity to fit somewhere in between (partially absorbed)? Perhaps part of that answer resides in the cloud model you decide to use (public, private, hybrid, cloud-managed private cloud) as well as the partnership model?

Modularity and reduced complexity (eg fewer integrations) are also factors to consider (as always).

I haven’t seen an ideal response to the absorption challenge yet, but I believe the solution lies in re-framing corporate culture and technology stacks. We’ll look at that in more detail tomorrow.

How about you? Have you or your organisation managed to speed up your transition from PoC to PROD? What techniques have you found to be successful?

How to bring your art and your science to your OSS

In the last two posts, we’ve discussed repeatability within the field of OSS implementation – paint-by-numbers vs artisans and then resilience vs precision in delivery practices.

Now I’d like you to have a think about how those posts overlay onto this quote by Karl Popper:
“Non-reproducible single occurrences are of no significance to science.”

Every OSS implementation is different. That means that every one is a non-reproducible single occurrence. But if we bring this mindset into our OSS implementations, it means we’re bringing an artisanal rather than scientific method to the project.

I’m all for bringing more art, more creativity, more resilience into our OSS projects.

I’m also advocating more science though too. More repeatability. More precision. Whilst every OSS project may be different at a macro level, there are a lot of similarities in the micro-elements. There tends to be similarities in sequences of activities if you pay close attention to the rhythms of your projects. Perhaps our products can use techniques to spot and leverage similarities too.

In other words, bring your art and your science to your OSS. Please leave a comment below. I’d love to hear the techniques you use to achieve this.

How did your first OSS project make you feel?

Can you remember how you felt during the initial weeks of your first OSS project?

I can vividly recall how out of my depth I felt on my first OSS project. I was in my 20s and had relocated to a foreign country for the first time. I had a million questions (probably more actually). The challenges seemed immense (they were). I was working with talented and seasoned telco veterans who had led as many as 500 people, but had little OSS experience either. None of us had worked together before. We were all struggling. We were all about to embark on an incredibly steep learning curve, by far the steepest of my career to date.

Now through PAOSS I’m looking to create a new series of free e-books to help those starting out on their own OSS journey.

But this isn’t about my experiences. That would be a perspective limited to just one unique journey in OSS. No, this survey is all about you. I’d love to capture your feelings, experiences, insights and opinions to add much greater depth to the material.

Are you up for it? Are you ready to answer just five one-line questions to help the next generation of OSS experts?

We’ve created a survey below that closes on 23 March 2019. The best 5 responses will win a free copy of my physical book, “Mastering your OSS,” (valued at US$49.97+P/H).

Click on the link below to start the 5 question survey:

https://passionateaboutoss.com/how-did-your-first-oss-make-you-feel

Where an absence of OSS data can still provide insights

The diagram below has some parallels with OSS. The story, however, is a little long before it gets to the OSS part, so please bear with me.

[Diagram: bullet-hole distribution on returning WWII aircraft]

The diagram shows the analysis the US Navy performed during WWII of where returning planes had been hit. The theory was that they should reinforce the areas that received the most hits (ie the wing-tips, the central part of the body, etc, as per the diagram).

Abraham Wald, a statistician, had a completely different perspective. His rationale was that hits would actually be spread fairly uniformly across the whole plane, so the blank sections of the diagram represent the areas that needed reinforcement. Why? The planes that were hit there never made it home to be analysed.

In OSS, this is akin to the device or EMS that has failed and is unable to send any telemetry data. No alarms are appearing in our alarm lists for those devices / EMS because they’re no longer capable of sending any.

That’s why we use heartbeat mechanisms to confirm a system is alive (and capable of sending alarms). These come in the form of pollers, pingers, or just watching for other signs of life, such as network traffic of any form.
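As a simple illustration (the device list and alarm hook are hypothetical, and the ping flags below are Linux-style), a heartbeat poller inverts the usual logic: the alarm is raised by the *absence* of a reply, because a silent device can’t report its own failure:

```python
import subprocess
import time

DEVICES = ["10.0.0.1", "10.0.0.2"]  # illustrative management addresses

def is_alive(host, timeout_s=2):
    """One ICMP echo. Linux-style ping flags; adjust for other platforms."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        capture_output=True,
    )
    return result.returncode == 0

def heartbeat_loop(raise_alarm, interval_s=60):
    while True:
        for host in DEVICES:
            if not is_alive(host):
                # The absence of data IS the insight: synthesise the alarm
                # the dead device / EMS can no longer send for itself.
                raise_alarm(host, "heartbeat-lost")
        time.sleep(interval_s)
```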

In what other areas of OSS can an absence of data demonstrate an insight?

Of OSS bosses and teachers

“We need more teachers and less bosses.”
Lee Cockerell

The quote from Lee could be bordering on condescending to some people. Hmmm…. Sorry about that.

But let me ask you a couple of questions:

  1. Of all your years in OSS (or in industry in general), who are the 5 people who have been the most valuable to you (and your customers)?
  2. Of those 5 people, which of the following three tags would you assign to them: a) teacher, b) boss, c) implementer?

From my experiences and list of top five, I would classify only one as a boss. The other four are dual-tagged as teachers and implementers. That makes me wonder whether the two go hand in glove. That is, is the ability to implement as vital to being a good teacher as being a good communicator / teacher is to becoming a better implementer?

Based on my list (a tiny sample size I admit), it seems we need four times as many OSS teacher/implementers as we need bosses. 🙂

Another question. Do you aspire to be (or are you already) a boss, a teacher or an implementer?

Identifying the fault-lines that trigger OSS churn

“Most people slog through their days in a dark funk. They almost never get to do anything interesting or go to interesting places or meet interesting people. They are ignored by marketers who want them to buy their overpriced junk and be grateful for it. They feel disrespected, unappreciated and taken for granted. Nobody wants to take the time to listen to their fears, dreams, hopes and needs. And that’s your opening.”
John Carlton

Whilst the quote above may relate to marketing, it also has parallels in the build and run phases of an OSS project. We talked about the trauma of OSS yesterday, where the OSS user feels so much trauma with their current OSS that they’re willing to go through the trauma of an OSS transformation. Clearly, a procurement event must be preceded by a significant trauma!

Sometimes that trauma has its roots in the technical, where the existing OSS just can’t do (or be made to do) the things that are most important to the OSS user. Or it can’t do them reliably, at scale, in time, cost-effectively, or without significant risk / change. That’s a big factor certainly.

However, the churn trigger appears to more often be a human one. The users feel disrespected, unappreciated and taken for granted. But here’s an interesting point that might surprise some users – the suppliers also often feel disrespected, unappreciated and taken for granted.

I have the privilege of working on both sides of the equation, often even as the intermediary between both sides. Where does the blame lie? Where do the fault-lines originate? The reasons are many and varied of course, but like a marriage breakup, it usually comes down to relationships.

Where the communication method is through hand-grenades being thrown over the fence (eg management by email and by contractual clauses), results are clearly going to follow a deteriorating arc. Yet many OSS relationships structurally start from a position of us and them – the fence is erected – from day one.

Coming from a technical background, it took me far too deep into my career to come to this significant realisation – the importance of relationships, not just the quest for technical perfection. The need to listen to both sides’ fears, dreams, hopes and needs.

Zero Touch Assurance – ZTA (part 1)

A couple of years ago, we published a series on pre-cognitive OSS based on the following quote by Ben Evans about three classes of search/discovery:

  1. There is giving you what you already know you want (Amazon, Google)
  2. There is working out what you want (Amazon and Google’s aspiration)
  3. And then there is suggesting what you might want (Heywood Hill).

Today, I look to apply a similar model towards the holy grail of OSS – Zero Touch Assurance (ZTA).

  1. Monitoring – There is monitoring the events that happen in the network and responding manually
  2. Post-cognition – There is monitoring events that happen in the network, comparing them to past events and actions (using analytics to identify repeating patterns), and using the past to recommend (or automate) a response
  3. Pre-cognition – There is identifying events that have never happened in the network before, yet still being able to provide a recommended / automated response

The third step, pre-cognition, is where the supposed holy grail lies. It’s where everyone talks about the prospect of AI solving all of our problems. It seems we’re still a way off this happening.

But I wonder whether the actual ZTA solution might be more of a brute-force version of step 2 – post-cognition?
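By brute force, I mean something like the toy lookup below: represent each incident as a set of alarm types, keep a very large library of past (or CT/IR-seeded) incidents with known responses, and match new events against it. The data structures are purely illustrative:

```python
def jaccard(a, b):
    """Similarity between two sets of alarm types (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend_response(current_alarms, incident_library, threshold=0.6):
    """Brute-force post-cognition: find the most similar past incident and
    reuse its proven response; anything below the threshold is a
    never-seen-before event, so escalate to a human."""
    signature = set(current_alarms)
    best = max(incident_library,
               key=lambda inc: jaccard(signature, set(inc["alarms"])),
               default=None)
    if best and jaccard(signature, set(best["alarms"])) >= threshold:
        return best["response"]      # automated, learned from the past
    return "escalate-to-operator"    # the pre-cognition gap remains
```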

More on that tomorrow.

Do you have a nagging OSS problem you cannot solve?

On Friday, we published a post entitled, “Think for a moment…,” which posed the question of whether we might be better served looking back at our most important existing features and streamlining them, rather than inventing new features that have little impact.

Over the weekend, a promotional email landed in my inbox from Nightingale Conant. It is completely unrelated to OSS, yet the steps outlined below seem to be a fairly good guide for identifying what to reinvent within our existing OSS.

Go out and talk to as many people [customers or potential] as you can, and ask them the following questions:
1. Do you have a nagging problem you cannot solve?
2. What is that problem?
3. Why is solving that problem important to you?
4. How would solving that problem change the quality of your life?
5. How much would you be willing to pay to solve that problem?

Note: Before you ask these questions, make sure you let the people know that you’re not going to sell them anything. This way you’ll get quality answers.
After you’ve talked with numerous people, you’ll begin to see a pattern. Look for the common problems that you believe you could solve. You’ll also know how much people are willing to pay to have their problem solved and why.

I’d be curious to hear back from you. Do those first 4 questions identify a pattern that relates to features you’ve never heard of before or features that your OSS already performs (albeit perhaps not as optimally as it could)?

Nobody dabbles at dentistry

“There are some jobs that are only done by accredited professionals.
And then there are most jobs, jobs that some people do for fun, now and then, perhaps in front of the bathroom mirror.
It’s difficult to find your footing when you’re a logo designer, a comedian or a project manager. Because these are gigs that many people think they can do, at least a little bit.”
Seth Godin here.

I’d love to plagiarise the entire post from Seth above, but instead suggest you have a look at the other pearls of wisdom he shares in the link above.

So where does OSS fit in Seth’s thought model? Well, you don’t need an accreditation like a dentist does. Most of the best I’ve met haven’t had any OSS certifications to speak of.

Does the layperson think they can do an OSS role? Most people have never heard of OSS, so I doubt they would believe they could do a role as readily as they could see themselves being logo designers. But the best I’ve met have often come from fields other than telco / IT / network-ops.

One of my earliest OSS projects was for a new carrier in a country that had just deregulated. They were the second fixed-line carrier in this country and tried to poach staff from the incumbent. Few people crossed over. To overcome this lack of experience, the carrier built an OSS team that consisted of a mathematician, an architect, an automotive engineer, a really canny project manager and an assortment of other non-telco professionals.

The executives of that company clearly felt they could develop a team of experts (or just had no choice but to try). The strategy didn’t work out very well for them. It didn’t work out very well for us either. We were constantly trying to bring their team up to speed on the fundamentals in order to use the tools we were developing / delivering (remembering that, this being one of my first OSS projects, I was still desperately trying to bring myself up to speed – still am, for that matter).

As Seth also states, “If you’re doing one of these non-dentist jobs, the best approach is to be extraordinarily good at it. So much better than an amateur that there’s really no room for discussion.” That needs hunger (a hungriness to learn without an accredited syllabus).

It also needs humility though. Even the most extraordinarily good OSS proponents barely scratch the surface of all there is to learn about OSS. It’s the breadth of sub-expertise that probably explains why there is no single accreditation that covers all of OSS.

How to build a personal, cloud-native OSS sandpit

As a project for 2019, we’re considering the development of a how-to training course that provides a step-by-step guide to build your own OSS sandpit to play with. It will be built around cloud-native and open-source components. It will be cutting-edge and micro-scaled (but highly scalable in case you want to grow it).

Does this sound like something you’d be interested in hearing more about?

Like or comment if you’d like us to keep you across this project in 2019.

I’d love to hear your thoughts on what the sandpit should contain. What would be important to you? We already have a set of key features identified, but will refine it based on community feedback.

OSS answers that are simple but wrong vs complex but right

“We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills…”
John F Kennedy

Let’s face it. The business of running a telco is complex. The business of implementing an OSS is complex. The excitement about working in our industry probably stems from the challenges we face, but the impact we can make if/when we overcome them.

The cartoon below tells a story about telco and OSS consulting (I’m ignoring the “Science vs everything else” box for the purpose of this post, focusing only on the simple vs complex sign-post).

[Cartoon: Simple vs Complex]

I was recently handed a brochure from a consulting firm that outlined a step-by-step transformation approach for comms service providers of different categories. It described quarter-by-quarter steps to transform across OSS, BSS, networks, etc. Simple!

The problem with their prescriptive model was that they’d developed a stereotype for each of the defined carrier categories. By stepping through the model and comparing against some of my real clients, it was clear that their transformation approaches weren’t close to aligning to any of those clients’ real situations.

Every single assignment and customer has its own unique characteristics, its own nuances across many layers – nuances that in some cases are never even visible to an outsider / consultant. Trying to prepare generic but prescriptive transformation models like this would seem to be a futile exercise.

I’m all for trying to bring repeatable methodologies into consulting assignments, but they can only act as general guidelines that need to be moulded to local situations. I’m all for bringing simplification approaches to consultancies too, as reflected by the number of posts that are categorised as “Simplification” here on PAOSS. We sometimes make things too complex, so we can simplify, but this definitely doesn’t imply that OSS or telco transformations are simple. There is no one-size-fits-all approach.

Back to the image above, there’s probably another missing arrow – Complex but wrong! And perhaps another answer with no specific path – Simple, but helpful in guiding us towards the summit / goal.

I can understand why telcos get annoyed with us consultants telling them how they should run their business, especially consultants who show no empathy for the challenges faced.

But more on that tomorrow!