This OSS is different to what I’m used to

OSS implementations / transformations are always challenging. Stakeholders seem to easily get their heads around the fact that there will be technical challenges (even if they / we can’t always get their head around the actual changes initially).

When a vendor / integrator is charged with doing an OSS implementation, the client (perhaps rightly) expects the vendor / SI to lead the technical implementation and guide the client through any challenges. It’s the, “Over to you!” client mentality at times.

However, it’s the change management challenges that are often overlooked and/or underestimated (by client and vendor / integrator alike). It’s far less realistic for a client to delegate these activities and challenges to the vendor / SI. The vendor / SI simply doesn’t have the reach or influence within the client’s organisation (unless they’re long-term trusted partners). Just doing a 2 week training course at the end of the implementation rarely works.

Now, if you do represent the client, change management starts all the way back at the start of the project – from the time we start to gather current and desired future state, including process and persona mappings.

At that time we can put ourselves in the shoes of each person impacted by OSS change and consider, “If your current normal is exactly what you need, then different isn’t worth exploring” (a Seth Godin quote).

How many times have you heard about operators bypassing their sophisticated new OSS and reverting back to their old spreadsheets (thus keeping an offline store of data that would be valuable to be stored in the OSS)?

Interestingly though, if you approach those same people before the OSS implementation and ask them whether their as-is spreadsheet model gives them exactly what they need, you will undoubtedly get some great insights (either yes it is and here’s why
or not it’s not because…).

You have a stronger position of influence with these operators if you involve and listen pre-implementation than enforcing change afterwards.

To again quote Seth, they’re not always, “hesitant about this new idea because it’s a risky, problematic, defective idea… [but] because it’s simply different than [they’re] used to.”

A modern twist on OSS architecture

I was speaking with a friend today about an old OSS assurance product that is undergoing a refresh and investment after years of stagnation.

He indicated that it was to come with about 20 out of the box adaptors for data collection. I found that interesting because it was replacing a product that probably had in excess of 100 adaptors. Seemed like a major backward step… until my friend pointed out the types of adaptor in this new product iteration – Splunk, AWS, etc.

Of course!!

Our OSS no longer collect data directly from the network anymore. We have web-scaled processes sucking everything out of the network / EMS, aggregating it and transforming / indexing / storing it. Then, like any other IT application, our OSS just collect what we need from a data set that has already been consolidated and homogenised.

I don’t know why I’d never thought about it like this before (ie building an architecture that doesn’t even consider connecting to the the multitude of network / device / EMS types). In doing so, we lose the direct connection to the source, but we also reduce our integration tax load (directly to the OSS at least).

I’m really excited by a just-finished OSS analysis (part 3)

This is the third part of a series describing a really exciting analysis I’ve just finished.

Part 1 described how we can turn simple log files into a Sankey diagram that shows real-life process flows (not just a theoretical diagram drawn by BAs and SMEs), like below:

Part 2 described how the logs are broken down into a design tree and how we can assign weightings to each branch based on the data stored in the logs, as below:
OSS Decision Tree Analysis

I’ve already had lots of great feedback in relation to the Part 1 blog, especially from people who’ve had challenges capturing as-is process. The feedback has been greatly appreciated so I’m looking forward to helping them draw up their flow-charts on the way to helping optimise their process flows.

But that’s just the starting point. Today’s post is where things get really exciting (for me at least). Today we build on part 2 and not just record weightings, but use them to assist future decisions.

We can use the decision tree to “predict forward” and help operators / algorithms make optimal decisions whilst working towards process completion. We can use a feedback loop to steer an operator (or application) down the most optimal branches of the tree (and/or avoid the fall-out variants).

This allows us to create a closed-loop, self-optimising, Decision Support System (DSS), as follows:

Note: Diagram sourced from https://passionateaboutoss.com/closing-the-loop-to-make-better-decisions, where further explanation is provided

Using log data alone, we can perform decision optimisation based on “likelihood of success” or “time to complete” as per the weightings table. If supplemented with additional data, the weightings table could also allow decisions to be optimised by “cost to complete” or many other factors.

The model has the potential to be used in “real-time” mode, using the constant stream of process logs to continually refine and adapt. For example:

  • If the long-term average of a process path is 1 minute, but there’s currently a problem with and that path is failing, then another path (one that is otherwise slightly less optimised over the long-term), could be used until the first path is repaired
  • An operator happens to choose a new, more optimal path than has ever been identified previously (the delta function in the diagram). It then sets a new benchmark and informs the new approach via the DSS (Darwinian selection)

If you’re wondering how the DSS could be implemented, I can envisage a few ways:

  1. Using existing RPA (Robotic Process Automation) tools [which are particularly relevant if the workflow box in the diagram above crosses multiple different applications (not just a single monolithic OSS/BSS)]
  2. Providing a feedback path into the functionality of the OSS/BSS and it’s GUI
  3. Via notifications (eg email, Slack, etc) to operators
  4. Via a simple, more manual process like flow diagrams, work instructions, scorecards or similar
  5. You can probably envisage other methods

I’m really excited by a just-finished OSS analysis (part 2)

As the title suggests, this is the second part in a series describing a process flow visualisation, optimisation and decision support methodology that uses simple log data as input.

Yesterday’s post, part 1 in the series, showed the visualisation aspect in the form of a Sankey flow diagram.

This visualisation is exciting because it shows how your processes are actually flowing (or not), as opposed to the theoretical process diagrams that are laboriously created by BAs in conjunction with SMEs. It also shows which branches in the flow are actually being utilised and where inefficiencies are appearing (and are therefore optimisation targets).

Some people have wondered how simple activity logs can be used to show the Sankey diagrams. Hopefully the diagram below helps to describe this. You scan the log data looking for variants / patterns of flows and overlay those onto a map of decision states (DPs). In the diagram above, there are only 3 DPs, but 303 different variants (sounds implausible, but there are many variants that do multiple loops through the 3 states and are therefore considered to be a different variant).

OSS Decision Tree Analysis

The numbers / weightings you see on the Sankey diagram are the number* of instances (of a single flow type) that have transitioned between two DPs / states.

* Note that this is not the same as the count value that appears in the Weightings table. We’ll get to that in tomorrow’s post when we describe how to use the weightings data for decision support.

I’m really excited by a just-finished OSS analysis

In your travels, I don’t suppose you’ve ever come across anyone having challenges to capture and/or optimise their as-is OSS/BSS process flows? Once or twice?? 🙂

Well I’ve just completed an analysis that I’m really excited about. It’s something I’ve been thinking about for some time, but have just finished proving on the weekend. I thought it might have relevance to you too. It quickly helps to visualise as-is process and identify areas to optimise.

The method takes activity logs (eg from OSS, ITIL, WFM, SAP or similar) and turns them into a process diagram (a Sankey diagram) like below with real instance volumes. Much better than a theoretical process map designed by BAs and SMEs don’t you think?? And much faster and more accurate too!!

OSS Sankey process diagram

A theoretical process map might just show a sequence of 3 steps, but the diagram above has used actual logs to show what’s really occurring. It highlights governance issues (skipped steps) and inefficiencies (ie the various loops) in the process too. Perfect for process improvement.

But more excitingly, it proves a path towards real-time “predict-forward” decision support without having to get into the complexities of AI. More has been included in the analysis!

If this is of interest to you, let me know and I’ll be happy to walk you through the full analysis. Or if you want to know how your real as-is processes perform, I’d be happy to help turn your logs into visuals like the one above.

PS1. You might think you need a lot of fields to prepare the diagrams above. The good news is the only mandatory fields would be something like:

  1. Flow type – eg Order type, project type or similar (only required if the extract contains multiple flow types mixed together. The diagram above represents just one flow type)
  2. Flow instance identifier – eg Order number, project number or similar (the diagram above was based on data that had around 600,000 flow instances)
  3. Activity identifier – eg Activity name (as per the 3 states in the diagram above), recorded against each flow instance. Note that they will ideally be an enumerated list (ie from a finite pick-list)
  4. Timestamps – Start/end timestamp on each activity instance

If the log contains other details such as the name of the operator who completed each activity, that can help add richness, but not mandatory.

PS2. The main objective of the analysis was to test concepts raised in the following blog posts:

Are modern OSS architectures well conceived?

Whatever is well conceived is clearly said,
And the words to say it flow with ease
.”
Nicolas Boileau-Despréaux
.

I’d like to hijack this quote and re-direct it towards architectures. Could we equally state that a well conceived architecture can be clearly understood? Some modern OSS/IT frameworks that I’ve seen recently are hugely complex. The question I’ve had to ponder is whether they’re necessarily complex. As the aphorism states, “Everything should be made as simple as possible, but not simpler.”

Just take in the complexity of this triptych I prepared to overlay SDN, NFV and MANO frameworks.

Yet this is only a basic model. It doesn’t consider networks with a blend of PNF and VNF (Physical and Virtual Network Functions). It doesn’t consider closed loop assurance. It doesn’t consider other automations, or omni-channel, or etc, etc.

Yesterday’s post raised an interesting concept from Tom Nolle that as our solutions become more complex, our ability to make a basic assessment of value becomes more strained. And by implication, we often need to upskill a team before even being able to assess the value of a proposed project.

It seems to me that we need simpler architectures to be able to generate persuasive business cases. But it poses the question, do they need to be complex or are our solutions just not well enough conceived yet?

To borrow a story from Wikiquote, “Richard Feynman, the late Nobel Laureate in physics, was once asked by a Caltech faculty member to explain why spin one-half particles obey Fermi Dirac statistics. Rising to the challenge, he said, “I’ll prepare a freshman lecture on it.” But a few days later he told the faculty member, “You know, I couldn’t do it. I couldn’t reduce it to the freshman level. That means we really don’t understand it.

Making a basic assessment of OSS value

“…as technology gets more complicated, it becomes more difficult for buyers to acquire the skills needed to make even a basic assessment of value. Without such an assessment, it’s hard to get a project going, and in particular hard to get one going the right way.”
Tom Nolle
.

Have you noticed that over the last few years, OSS choice has proliferated, making project assessment more challenging? Previously, the COTS (Commercial Off-the-Shelf) product solution dominated. That was already a challenge because there are hundreds to choose from (there are around 400 on our vendors page alone). But that’s just the tip of the iceberg.

We now also have choices to make across factors such as:

  • Building OSS tools with open-source projects
  • An increasing amount of in-house development (as opposed to COTS implementations by the product’s vendors)
  • Smaller niche products that need additional integration
  • An increase in the number of “standards” that are seeking to solve traditional OSS/BSS problems (eg ONAP, ETSI’s ZSM, TM Forum’s ODA, etc, etc)
  • Revolutions from the IT world such as cloud, containerisation, virtualisation, etc

As Tom indicates in the quote above, the diversity of skills required to make these decisions is broadening. Broadening to the point where you generally need a large team to have suitable skills coverage to make even a basic assessment of value.

At Passionate About OSS, we’re seeking to address this in the following ways:

  • We have two development projects underway (more news to come)
    • One to simplify the vendor / product selection process
    • One to assist with up-skilling on open-source and IT tools to build modern OSS
  • In addition to existing pages / blogs, we’re assembling more content about “standards” evolution, which should appear on this blog in coming days
  • Use our “Finding an Expert” tool to match experts to requirements
  • And of course there are the variety of consultancy services we offer ranging from strategy, roadmap, project business case and vendor selection through to resource identification and implementation. Leave us a message on our contact page if you’d like to discuss more

The OSS “out of control” conundrum

Over the years in OSS, I’ve spent a lot of my time helping companies create their OSS / BSS strategies and roadmaps. Sometimes clients come from the buy side (eg carriers, utilities, enterprise), other times clients come from the sell side (eg vendors, integrators). There’s one factor that seems to be most commonly raised by these clients, and it comes from both sides.

What is that one factor? Well, we’ll come back to what that factor is a little later, but let’s cover some background first.

OSS / BSS covers a fairly broad estate of functionality:
OSS and BSS overlaid onto the TAM

Even if only covering a simplified version of this map, very few suppliers can provide coverage of the entire estate. That infers two things:

  1. Integrations; and
  2. Relationships

If you’re from the buy-side, you need to manage both to build a full-function OSS/BSS suite. If you’re from the sell-side, you’re either forced into dealing with both (reactive) or sometimes you can choose to develop those to bring a more complete offering to market (proactive).

You will have noticed that both are double-ended. Integrations bring two applications / functions together. Relationships bring two organisations together.

This two-ended concept means there’s always a “far-side” that’s outside your control. It’s in our nature to worry about what’s outside our control. We tend to want to put controls around what we can’t control. Not only that, but it’s incumbent on us as organisation planners to put mitigation strategies in place.

Which brings us back to the one factor that is raised by clients on most occasions – substitution – how do we minimise our exposure to lock-in with an OSS product / service partner/s if our partnership deteriorates?

Well, here are some thoughts:

  1. Design your own architecture with product / partner substitution in mind (and regularly review your substitution plan because products are always evolving)
  2. Develop multiple integrations so that you always have active equivalency. This is easier for sell-side “reactives” because their different customers will have different products to integrate to (eg an OSS vendor that is able to integrate with four different ITSM tools because they have different customers with each of those variants)
  3. Enhance your own offerings so that you no longer require the partnership, but can do it yourself
  4. Invest in your partnerships to ensure they don’t deteriorate. This is the OSS marriage analogy where ongoing mutual benefits encourage the relationship to continue.

Can you solve the omni-channel identity conundrum for OSS/BSS?

For most end-customers, the OSS/BSS we create are merely back-office systems that they never see. The closest they get are the customer portals that they interact with to drive workflows through our OSS/BSS. And yet, our OSS/BSS still have a big part to play in customer experience. In times where customers can readily substitute one carrier for another, customer service has become a key differentiator for many carriers. It therefore also becomes a priority for our OSS/BSS.

Customers now have multiple engagement options (aka omni-channel) and form factors (eg in-person, phone, tablet, mobile phone, kiosk, etc). The only options we used to have were a call to a contact centre / IVR (Interactive Voice Response), a visit to a store, or a visit from an account manager for business customers. Now there are websites, applications, text messages, multiple social media channels, chatbots, portals, blogs, etc. They all represent different challenges as far as offering a seamless customer experience across all channels.

I’ve just noticed TM Forum’s “Omni-channel Guidebook” (GB994), which does a great job at describing the challenges and opportunities. For example, it explains the importance of identity. End-users can only get a truly seamless experience if they can be uniquely identified across all channels. Unfortunately, some channels (eg IVR, website) don’t force end-users to self-identify.

The Ovum report, “Optimizing Customer Service in a Multi Channel World, March 2011” indicates that around 74% of customers use 3 channels or more for engaging customer service. In most cases, it’s our OSS/BSS that provide the data that supports a seamless experience across channels. But what if we have no unique key? What if the unique key we have (eg phone number) doesn’t uniquely identify the different people who use that contact point (eg different family members who use the same fixed-line phone)?

We could use personality profiling across these channels, but we’ve already seen how that has worked out for Cambridge Analytica and Facebook in terms of customer privacy and security.

I’d love to hear how you’ve done cross-channel identity management in your OSS/BSS. Have you solved the omni-channel identity conundrum?

PS. One thing I find really interesting. The whole omni-channel thing is about giving customers (or potential customers) the ability to connect via the channel they’re most comfortable with. But there’s one glaring exception. When an end-user decides a phone conversation is the only way to resolve their issue (often after already trying the self-service options), they call the contact centre number. But many big telcos insist on trying to deflect as many calls as possible to self-service options (they call it CVR – call volume reduction), because contact centre staff are much more expensive per transaction than the automated channels. That seems to be an anti-customer-experience technique if you ask me. What are your thoughts?

The 3 states of OSS consciousness

The last four posts have discussed how our OSS/BSS need to cope with different modes of working to perform effectively. We started off with the thread of “group flow,” where multiple different users of our tools can work cohesively. Then we talked about how flow requires a lack of interruptions, yet many of the roles using our OSS actually need constant availability (ie to be constantly interrupted).

From a user experience (UI/UX) perspective, we need an awareness of the state the operator/s needs to be in to perform each step of an end-to-end process, be it:

  • Deep think or flow mode – where the operator needs uninterrupted time to resolve a complex and/or complicated activity (eg a design activity)
  • Constant availability mode – where the operator needs to quickly respond to the needs of others and therefore needs a stream of notifications / interruptions (eg network fault resolutions)
  • Group flow mode – where a group of operators need to collaborate effectively and cohesively to resolve a complex and/or complicated activity (eg resolve a cross-domain fault situation)

This is a strong argument for every OSS/BSS supplier to have UI/UX experts on their team. Yet most leave their UI/UX with their coders. They tend to take the perspective that if the function can be performed, it’s time to move on to building the next function. That was the same argument used by all MP3 player suppliers before the iPod came along with its beautiful form and function and dominated the market.

Interestingly, modern architectural principles potentially make UI/UX design more challenging. With old, monolithic OSS/BSS, you at least had more control over end-to-end workflows (I’m not suggesting we should go back to the monoliths BTW). These days, you need to accommodate the unique nuances / inconsistencies of third-party modules like APIs / microservices.

As Evan Linwood incisively identified, ” I guess we live in the age of cloud based API providers, theoretically enabling loads of pre-canned integration patterns but these may not be ideal for a large service provider… Definitely if the underlying availability isn’t there, but could also occur through things like schema mismanagement across multiple providers? (Which might actually be an argument for better management / B/OSS, rather than against the use of microservices!

Am I convincing any of you to hire more UI/UX resources? Or convincing you to register for UI/UX as your next training course instead of learning a ninth programming language?

Put simply, we need your assistance to take our OSS from this…
Old MP3 player

To this…
iPod

OSS work practices that are repulsive

I believe in the principle that deep work and constant availability are repulsive concepts (in the magnetic sense).”
Tyler Mumford
in comment 2 to this post.

This blogging thing really amazes me at times. I’m regularly left shocked at the serendipitous connections that form when writing posts. Take today’s post. I did a web search looking for the thread of an idea that had no relation at all to yesterday’s post. But of the millions of possible authors that could’ve come up in the search, the article I read first was by Cal Newport. The same Cal Newport as quoted in yesterday’s post. The two articles weren’t even from the same domain (BBC.com vs calnewport.com).

Not only that but the quote above from Tyler Mumford, in serendipitous response to Cal’s article, perfectly articulates what I was struggling to describe to close out yesterday’s post. Deep work and constant availability are indeed repulsive (ie mutually exclusive). Yet both exist within the activities performed using our OSS!!

Think about that for a moment.

There are some tasks that require constant availability (think about the NOC operators who have to respond urgently to any degradation in their network’s health).
There are other tasks that require deep work (think about the NOC operators who have to identify the root-cause of a really gnarly and catastrophic fault).

But the OSS user interfaces we build do little to separate them. The processes we design don’t consider their repulsiveness. Even the way we resource our OSS implementation projects suffers from this magnetic repulsion.

As an OSS implementer, I’ve always found it interesting that clients struggle to provide suitable expertise to steer the build, to ensure it’s configured precisely for their needs. I often quote the old parable of “you get back what you put in.” I still believe the saying, but there’s more to it than that.

An OSS implementation team needs significant input from the most knowledgeable end-users. They provide the local context, the tribal knowledge. But the most knowledgeable end-users are also most valuable at performing BAU (business as usual) tasks [assuming you’re transforming an OSS whilst still maintaining an existing network]. But I’ve rarely seen a client get the balance right between providing expertise to the “build” and “run” streams in parallel. Even rarer have I seen a client expert who can quickly task-switch between build and run activities. It seems to be much more effective if client expert/s can be seconded to work on the OSS project team with few BAU activities. Tyler’s quote above helps to explain why.

Build mode requires deep work, for the most part (eg coding, process design, solution architecture, data mapping, etc). Run mode tends to require constant availability, with a few key exceptions (eg network designs, root-cause identification, etc). The two require separation.

So perhaps the parable should be, “you get back what you put in and separate out.” 🙂

Completing an OSS design, going inside, going outside, going Navy SEAL

Our most recent post last week discussed the research organisations like DARPA (Defense Advanced Research Projects Agency) and Google are investing into group flow for the purpose of group effectiveness. It cites the cost of training ($4.25m) each elite Navy SEAL and their ability to operate as if choreographed in high pressure / noise environments.

We contrasted this with the mechanisms used in most OSS that actually prevent flow-state from occurring. Today I’m going to dive into the work that goes into creating a new design (to activate a customer), and how our current OSS designs / processes inhibit flow.

Completely independently of our post, BBC released an article last week discussing how deep focus needs to become a central pillar of our future workplace culture.

To quote,

“Being switched on at all times and expected to pick things up immediately makes us miserable, says [Cal] Newport. “It mismatches with the social circuits in our brain. It makes us feel bad that someone is waiting for us to reply to them. It makes us anxious.”

Because it is so easy to dash off a quick reply on email, Slack or other messaging apps, we feel guilty for not doing so, and there is an expectation that we will do it. This, says Newport, has greatly increased the number of things on people’s plates. “The average knowledge worker is responsible for more things than they were before email. This makes us frenetic. We should be thinking about how to remove the things on their plate, not giving people more to do…

Going cold turkey on email or Slack will only work if there is an alternative in place. Newport suggests, as many others now do, that physical communication is more effective. But the important thing is to encourage a culture where clear communication is the norm.

Newport is advocating for a more linear approach to workflows. People need to completely stop one task in order to fully transition their thought processes to the next one. However, this is hard when we are constantly seeing emails or being reminded about previous tasks. Some of our thoughts are still on the previous work – an effect called attention residue.”

That resonates completely with me. So let’s consider that and look into the collaboration process of a stylised order activation:

  1. Customer places an order via an order-entry portal
  2. Perform SQ (Service Qualification) and Credit Checks, automated processes
  3. Order is broken into work order activities (automated process)
  4. Designer1 picks up design work order activity from activity list and commences outside plant design (cables, pits, pipes). Her design pack includes:
    1. Updating AutoCAD / GIS drawings to show outside plant (new cable in existing pit/pipe, plus lead-in cable)
    2. Updating OSS to show splicing / patching changes
    3. Creates project BoQ (bill of quantities) in a spreadsheet
  5. Designer2 picks up next work order activity from activity list and commences active network design. His design pack includes:
    1. Allocation of CPE (Customer Premises Equipment) from warehouse
    2. Allocation of IP address from ranges available in IPAM (IP address manager)
    3. Configuration plan for CPE and network edge devices
  6. FieldWorkTeamLeader reviews inside plant and outside plant designs and allocates to FieldWorker1. FieldWorker1 is also issued with a printed design pack and the required materials
  7. FieldWorker1 commences build activities and finds out there’s a problem with the design. It indicates splicing the customer lead-in to fibres 1/2, but they appear to already be in use

So, what does FieldWorker1 do next?

The activity list / queue process has worked reasonably well up until this step in the process. It allowed each person to work autonomously, stay in deep focus and in the sequence of their own choosing. But now, FieldWorker1 needs her issue resolved within only a few minutes or must move on to her next job (and next site). That would mean an additional truck-roll, but also annoying the customer who now has to re-schedule and take an additional day off work to open their house for the installer.

FieldWorker1 now needs to collaborate quickly with Designer1, Designer2 and FieldWorkTeamLeader. But most OSS simply don’t provide the tools to do so. The go-forward decision in our example draws upon information from multiple sources (ie AutoCAD drawing, GIS, spreadsheet, design document, IPAM and the OSS). Not only that, but the print-outs given to the field worker don’t reflect real-time changes in data. Nor do they give any up-stream context that might help her resolve this issue.

So FieldWorker1 contacts the designers directly (and separately) via phone.

Designer1 and Designer2 have to leave deep-think mode to respond urgently to the notification from FieldWorker1 and then take minutes to pull up the data. Designer1 and Designer2 have to contact each other about conflicting data sets. Too much time passes. FieldWorker1 moves to her next job.

Our challenge as OSS designers is to create a collaborative workspace that has real-time access to all data (not just the local context as the issue probably lies in data that’s upstream of what’s shown in the design pack). Our workspace must also provide all participants with the tools to engage visually and aurally – to choreograph head-office and on-site resources into “group flow” to resolve the issue.

Even if such tools existed today, the question I still have is how we ensure our designers aren’t interrupted from their all-important deep-think mode. How do we prevent them from having to drop everything multiple times a day/hour? Perhaps the answer is in an organisational structure – where all designers have to cycle through the Design Support function (eg 1 day in a fortnight), to take support calls from field workers and help them resolve issues. It will give designers a greater appreciation for problems occurring in the field and also help them avoid responding to emails, slack messages, etc when in design mode.

 

Stealing Fire for OSS (part 2)

Yesterday’s post talked about the difference between “flow state” and “office state” in relation to OSS delivery. It referenced a book I’m currently reading called Stealing Fire.

The post mainly focused on how the interruptions of “office state” actually inhibit our productivity, learning and ability to think laterally on our OSS. But that got me thinking that perhaps flow doesn’t just relate to OSS project delivery. It also relates to post-implementation use of the OSS we implement.

If we think about the various personas who use an OSS (such as NOC operators, designers, order entry operators, capacity planners, etc), do our user interfaces and workflows assist or inhibit them to get into the zone? More importantly, if those personas need to work collaboratively with others, do we facilitate them getting into “group flow?”

Stealing Fire suggests that it costs around $500k to train each Navy SEAL and around $4.25m to train each elite SEAL (DEVGRU). It also describes how this level of training allows DEVGRU units to quickly get into group flow and function together almost as if choreographed, even in high-pressure / high-noise environments.

Contrast this with collaborative activities within our OSS. We use tickets, emails, Slack notifications, work order activity lists, etc to collaborate. It seems to me that these are the precise instruments that prevent us from getting into flow individually. I assume it’s the same collectively. I can’t think back to any end-to-end OSS workflows that seem highly choreographed or seamlessly effective.

Think about it. If you experience significant rates of process fall-out / error, then it would seem to indicate an OSS that’s not conducive to group flow. Ditto for lengthy O2A (order to activate) or T2R (trouble to resolve) times. Ditto for bringing new products to market.

I’d love to hear your thoughts. Has any OSS environment you’ve worked in facilitated group flow? If so, was it the people and/or the tools? Alternatively, have the OSS you’ve used inhibited group flow?

PS. Stealing Fire details how organisations such as Google and DARPA are investing heavily in flow research. They can obviously see the pay-off from those investments (or potential pay-offs). We seem to barely even invest in UI/UX experts to assist with the designs of our OSS products and workflows.

Stealing fire for OSS

I’ve recently started reading a book called Stealing Fire: How Silicon Valley, the Navy SEALs, and Maverick Scientists Are Revolutionizing the Way We Live and Work. To completely over-generalise the subject matter, it’s about finding optimal performance states, aka finding flow. Not the normal topic of conversation for here on the PAOSS blog!!

However, the book’s content has helped to make the link between flow and OSS more palpable than you might think.

In the early days of working on OSS delivery projects, I found myself getting into a flow state on a daily basis – achieving more than I thought capable, learning more effectively than I thought capable and completely losing track of time. In those days of project delivery, I was lucky enough to get hours at a time without interruptions, to focus on what was an almost overwhelming list of tasks to be done. Over the first 5-ish years in OSS, I averaged an 85 hour week because I was just so absorbed by it. It was the source from where my passion for OSS originated. Or was it??

The book now has me pondering a chicken or egg conundrum – did I become so passionate about OSS that I could get into a state of flow or did I only become passionate about OSS because I was able to readily get into a state of flow with it? That’s where the book provides the link between getting in the zone and the brain chemicals that leave us with a feeling of ecstasis or happiness (not to mention the addictive nature of it). The authors describe this state of consciousness as Selflessness, Timelessness, Effortlessness, and Richness, or STER for short. OSS definitely triggered STER for me,, but chicken or egg??

Having spent much of the last few years embedded in big corporate environments, I’ve found a decreased ability to get into the same flow state. Meetings, emails, messenger pop-ups, distractions from surrounding areas in open-plan offices, etc. They all interrupt. It’s left me with a diminishing opportunity to get in the zone. With that has come a growing unease and sense of sub-optimal productivity during “office hours.” It was increasingly disheartening that I could generally only get into the zone outside office hours. For example, whilst writing blogs on the train-trip or in the hours after the rest of my family was asleep.

Since making the concerted effort to leave that “office state,” I’ve been both surprised and delighted at the increased productivity. Not just that, but the ability to make better lateral connections of ideas and to learn more effectively again.

I’d love to hear your thoughts on this in the comments section below. Some big questions for you:

  1. Have you experienced a similar productivity gap between “flow state” and “office state” on your OSS projects?
  2. Have you had the same experience as me, where modern ways of working seem to be lessening the long chunks of time required to get into flow state?
  3. If yes, how can our sponsor organisations and our OSS products continue to progress if we’re increasingly working only in office state?

ONAP’s fourth release, Dublin, now available

ONAP Doubles-Down on Deployments, Drives Commercial Activity Across Open Source Networking Stack with ‘Dublin’ Release.

ONAP’s fourth release, Dublin, brings an uptick in commercial activity –  including new deployment plans from major operators (including Deutsche Telekom, KDDI, Swisscom, Telecom Italia, and Telstra) and ONAP-based products and solutions from more than a dozen leading vendors – and has become the focal point for industry alignment around management and orchestration of the open networking stack, standards, and more.

Combined with the availability of ONAP Dublin, the addition of new members (Aarna Networks, Loodse, the LIONS Center at Pennsylvania State University, Matrixx Software, VoerEir AB, and XCloud Networks) continues LFN’s global drumbeat of ecosystem growth for accelerated development and adoption of open source and open standards-based networking technologies.

“It’s great to see such robust ecosystem growth with new deployments, new commercial adoption, and new members,” said Arpit Joshipura, general manager, Networking, Orchestration, Edge & IoT, the Linux Foundation. “ONAP is now a focal point for industry alignment around MANO, conformance and verification, and standards collaboration. Dublin specifically brings 5G network automation for secure, standards-aligned global deployments on any cloud of any size or location.”

“Beyond the technical accomplishments, Dublin highlights the maturity of our ONAP Community,” said Catherine Lefèvre, ONAP TSC Chair. “The relationship between carriers and vendors has grown even stronger through cooperation in many areas, including development, security and integration. For example, Swisscom and Samsung played significant roles in this release. Their collaboration with other carriers and vendors highlights the ‘innovate together’ spirit that prevails within the ONAP community. Swisscom drove the broadband service use case, collaborating with member vendors of the ONAP open source community in development and testing. Samsung performed penetration tests that identified new requirements that were taken up as a priority by the ONAP Security Subcommittee led by Orange.”

End-User Deployments Drive Commercial Activity with ONAP Dublin
Telcos and vendors alike announced new production deployments of ONAP during the Dublin release cycle. Major operators leverage ONAP to enhance consumer mobility services (AT&T) and monitor the quality of network management system access across several European countries (Orange). Concurrently, carriers including Bringcom, China Mobile, China Telecom, Deutsche Telekom, KT, Reliance Jio, Swisscom Turk Telecom,Telstra, and TIM conduct testing, PoCs, or trials that may result in additional production deployments by the end of 2019.

On the vendor side, Aarna Networks, Amdocs, BOCO, Huawei, and ZTE announced a pure-play ONAP distribution or products based on ONAP. In addition, new demos and support services were made available by Accenture, Ampere, Arris, Ciena,  Ericsson, iconectiv, Netsia, Nokia, Pantheon, Ribbon, Rift, Tech Mahindra, and Wipro.

The latest deployments signal ONAP’s continued growth among end users.

Additional updates in ONAP Dublin include:

  • New and Enhanced Blueprints:  
    • Dublin introduces a new residential connectivity blueprint, Broadband Service (BBS), to demonstrate multi-gigabit residential connectivity over PON using ONAP. 
    • The multi-release 5G blueprint adds enhancements to PNF support, performance management, fault management (PM, FM) monitoring, homing using the physical cell ID (PCI), and progress on modeling to support end-to-end network slicing in subsequent releases.
    • The CCVPN blueprint now includes dynamic addition of services and bandwidth on-demand.
  • OVP Enhancements: In April, an expanded OPNFV Verification Program (OVP) was launched that includes VNF verification through publicly-available VNF compliance test tooling based on requirements developed within the ONAP community. While OVP checks against industry-wide requirements, it does not check VNF compliance against operator-specific requirements (e.g. VM flavors, dataplane acceleration technologies, and so on). For this reason, Dublin adds a Vendor Software Product (VSP) compliance check in SDC to fill this gap.
  • Standards Alignment: Illustrating the significance of ONAP both as a reference architecture and reference code for an automation platform, LF Networking collaborates with standards bodies (e.g. 3GPP, ETSI NFV ISG, ETSI ZSM ISG, MEF, and TM Forum) to provide reference architectures for standards development. (For more information on how open source and open standards are collaborating, watch this keynote panel discussion from Open Networking Summit North America).

More details on ONAP Dublin are available at this link. ONAP’s next release, El Alto, is expected later this year and will include minor requirement updates focused on S3P, among other enhancements.

Lightning strikes in OSS

Operators have developed many unique understandings of what impacts the health of their networks.

For example, mobile operators know that they have faster maintenance cycles in coastal areas than they do in warm, dry areas (yes, due to rust). Other operators have a high percentage of faults that are power-related. Others are impacted by failures caused by lightning strikes.

Near-real-time weather pattern and lightning strike data is now readily accessible, potentially for use by our OSS.

I was just speaking with one such operator last week who said, “We looked at it [using lightning strike data] but we ended up jumping at shadows most of the time. We actually started… looking for DSLAM alarms which will show us clumps of power failures and strikes, then we investigate those clumps and determine a cause. Sometimes we send out a single truck to collect artifacts, photos of lightning damage to cables, etc.”

That discussion got me wondering about what other lateral approaches are used by operators to assure their networks. For example:

  1. What external data sources do you use (eg meteorology, lightning strike, power feed data from power suppliers or sensors, sensor networks, etc)
  2. Do you use it in proactive or reactive mode (eg to diagnose a fault or to use engineering techniques to prevent faults)
  3. Have you built algorithms (eg root-cause, predictive maintenance, etc) to utilise your external data sources
  4. If so, do those algorithms help establish automated closed-loop detect and response cycles
  5. By measuring and managing, has it created quantifiable improvements in your network health

I’d love to hear about your clever and unique insight-generation ideas. Or even the ideas you’ve proposed that haven’t been built yet.

282 million reasons for increased OSS/BSS scrutiny

The hotel group Marriott International has been told by the UK Information Commissioner’s Office that it will be fined a little over £99 million (A$178 million) over a data breach that occurred in December last year…
This is the second fine for data breaches announced by the ICO on successive days. On Monday, it said British Airways would be fined £183.39 million (A$329.1 million) for a data breach that occurred in September 2018
.”
Sam Varghese of ITwire.

The scale of the fines issued to Marriott and BA is mind-boggling.

Here’s a link to the GDPR (General Data Protection Regulation) fine regime and determination process. Fines can be issued by GDPR policing agencies of up to €20 million, or 4% of the worldwide annual revenue of the prior financial year, whichever is higher.

Determination is based on the following questions:

  1. Nature of infringement: number of people affected, damaged they suffered, duration of infringement, and purpose of processing
  2. Intention: whether the infringement is intentional or negligent
  3. Mitigation: actions taken to mitigate damage to data subjects
  4. Preventative measures: how much technical and organizational preparation the firm had previously implemented to prevent non-compliance
  5. History: (83.2e) past relevant infringements, which may be interpreted to include infringements under the Data Protection Directive and not just the GDPR, and (83.2i) past administrative corrective actions under the GDPR, from warnings to bans on processing and fines
  6. Cooperation: how cooperative the firm has been with the supervisory authority to remedy the infringement
  7. Data type: what types of data the infringement impacts; see special categories of personal data
  8. Notification: whether the infringement was proactively reported to the supervisory authority by the firm itself or a third party
  9. Certification: whether the firm had qualified under approved certifications or adhered to approved codes of conduct
  10. Other: other aggravating or mitigating factors may include financial impact on the firm from the infringement

The two examples listed above provide 282 million reasons for governments to police data protection more stringently than they do today. The regulatory pressure is only going to increase right? As I understand it, these processes are only enforced in reactive mode currently. What if the regulators become move to proactive mode?

Question for you – Looking at #7 above, do you think the customer information stored in your OSS/BSS is more or less “impactful” than that of Marriott or British Airways?

Think about this question in terms of the number of daily interactions you have with hotels and airlines versus telcos / ISPs. I’ve stayed in Marriott hotels for over a year in accumulated days. I’ve boarded hundreds of flights. But I can’t begin to imagine how many of my data points the telcos / ISP could potentially collect every day. It’s in our OSS/BSS data stores where those data points are most likely to end up.

Do you think our OSS/BSS are going to come under increasing GDPR-like scrutiny in coming years? Put it this way, I suspect we’re going to become more familiar with risk management around the 10 dot points above than we have been in the past.

The great OSS squeeeeeeze

TM Forum’s Open Digital Architecture (ODA) White Paper begins with the following statement:

Telecoms is at a crucial turning point. The last decade has dealt a series of punishing blows to an industry that had previously enjoyed enviable growth for more than 20 years. Services that once returned high margins are being reduced to commodities in the digital world, and our insatiable appetite for data demands continuous investment in infrastructure. On the other hand, communications service providers (CSPs) and their partners are in an excellent position to guide and capitalize on the next wave of digital revolution.

Clearly, a reduction in profitability leads to a reduction in cash available for projects – including OSS transformation projects. And reduced profitability almost inevitably leads executives to start thinking about head-count reduction too.

As Luke Clifton of Macquarie Telecom observed here, “Telstra is reportedly planning to shed 1,200 people from its enterprise business with many of these people directly involved in managing small-to-medium sized business customers. More than 10,000 customers in this segment will no longer have access to dedicated Account Managers, instead relegated to being managed by Telstra’s “Digital Hub”… Telstra, like the big banks once did, is seemingly betting that customers won’t leave them nor will they notice the downgrade in their service. It will be interesting to see how 10,000 additional organisations will be managed through a Digital Hub.
Simply put, you cannot cut quality people without cutting the quality of service. Those two ideals are intrinsically linked
…”

As a fairly broad trend across the telco sector, projects and jobs are being cut, whilst technology change is forcing transformation. And as suggested in Luke’s “Digital Hub” quote above, it all leads to increased expectations on our OSS/BSS.

Pressure is coming at our OSS from all angles, and with no signs of abating.

To quote Queen, “Pressure. Pushing down on me.Pressing down on you.”

So it seems to me there are only three broad options when planning our OSS roadmaps:

  1. We learn to cope with increased pressure (although this doesn’t seem like a viable long-term option)
  2. We reduce the size (eg functionality, transaction volumes, etc) of our OSS footprint [But have you noticed that all of our roadmaps seem expansionary in terms of functionality, volumes, technologies incorporated, etc??]
  3. We look beyond the realms of traditional OSS/BSS functionality (eg just servicing operations) and into areas of opportunity

TM Forum’s ODA White Paper goes on to state, “The growth opportunities attached to new 5G ecosystems are estimated to be worth over $580 billion in the next decade.
Servicing these opportunities requires transformation of the entire industry. Early digital transformation efforts focused on improving customer experience and embracing new technologies such as virtualization, with promises of wide-scale automation and greater agility. It has become clear that these ‘projects’ alone are not enough. CSPs’ business and operating models, choice of technology partners, mindset, decision-making and time to market must also change.
True digital business transformation is not an easy or quick path, but it is essential to surviving and thriving in the future digital market.”

BTW. I’m not suggesting 5G is the panacea or single opportunity here. My use of the quote above is drawing more heavily on the opportunities relating to digital transformation. Not of the telcos themselves, but digital transformation of their customers. If data is the oil of the 21st century, then our OSS/BSS and telco assets have the potential to be the miners and pipelines of that oil.

If / when our OSS go from being cost centres to revenue generators (directly attributable to revenue, not the indirect attribution by most OSS today), then we might feel some of the pressure easing off us.

Step-by-step guide to build a systematic root-cause analysis (RCA) pipeline

Fault / Alarm management tools have lots of strings to their functionality bows to help operators focus in on the target/s that matter most. ITU-T’s recommendation X.733 provided an early framework and common model for classification of alarms. This allowed OSS vendors to build a standardised set of filters (eg severity, probable cause, etc). ITU-T’s recommendation M.3703 then provided a set of guiding use cases for managing alarms. These recommendations have been around since the 1990’s (or possibly even before).

Despite these “noise reduction” tools being readily available, they’re still not “compressing” event lists enough in all cases.

I imagine, like me, you’ve heard many customer stories where so many new events are appearing in an event list each day that the NOC (network operations centre) just can’t keep up. Dozens of new events are appearing on the screen, then scrolling off the bottom of it before an operator has even had a chance to stop and think about a resolution.

So if humans can’t keep up with the volume, we need to empower machines with their faster processing capabilities to do the job. But to do so, we first have to take a step away from the noise and help build a systematic root-cause analysis (RCA) pipeline.

I call it a pipeline because there are generally a lot of RCA rules that are required. There are a few general RCA rules that can be applied “out of the box” on a generic network, but most need to be specifically crafted to each network.

So here’s a step-by-step guide to build your RCA pipeline:

  1. Scope – Identify your initial target / scope. For example, what are you seeking to prioritise:
    1. Event volume reduction to give the NOC breathing space to function better
    2. Identifying “most important” events (but defining what is most important)
    3. Minimising SLA breaches
    4. etc
  2. Gather Data – Gather incident and ticket data. Your OSS is probably already doing this, but you may need to pull data together from various sources (eg alarms/events, performance, tickets, external sources like weather data, etc)
  3. Pattern Identification – Pattern identification and categorisation of incidents. This generally requires a pattern identification tool, ideally supplied by your alarm management and/or analytics supplier
  4. Prioritise – Using a long-tail graph like below, prioritise pattern groups by the following (and in line with item #1 above):
      1. Number of instances of the pattern / group (ie frequency)
      2. Priority of instances (ie urgency of resolution)
      3. Number of linked incidents (ie volume)
      4. Other technique, such as a cumulative/blended metric

  5. Gather Resolution Knowledge – Understand current NOC approaches to fault-identification and triage, as well as what’s important to them (noting that they may have biases such as managing to vanity metrics)
  6. Note any Existing Resolutions – Identify and categorise any existing resolutions and/or RCA rules (if data supports this)
  7. Short-list Remaining Patterns – Overlay resolution pattern on long-tail (to show which patterns are already solved for). then identify remaining priority patterns on the long-tail that don’t have a resolution yet.
  8. Codify Patterns – Progressively set out to identify possible root-cause by analysing cause-effect such as:
    1. Topology-based
    2. Object hierarchy
    3. Time-based ripple
    4. Geo-based ripple
    5. Other (as helped to be defined by NOC operators)
  9. Knowledge base – Create a knowledge base that itemises root-causes and supporting information
  10. Build Algorithm / Automation – Create an algorithm for identifying root-cause and related alarms. Identify level of complexity, risks, unknowns, likelihood, control/monitoring plan for post-install, etc. Then build pilot algorithm (and possibly roll-back technique??). This might not just be an RCA rule, but could also include other automations. Automations could include creating a common problem and linking all events (not just root cause event but all related events), escalations, triggering automated workflows, etc
  11. Test pilot algorithm (with analytics??)
  12. Introduce algorithm into production use – But continue to monitor what’s being suppressed to
  13. Repeat – Then repeat from steps 7 to 12 to codify the next most important pattern
  14. Leading metrics – Identify leading metrics and/or preventative measures that could precede the RCA rule. Establish closed-loop automated resolution
  15. Improve – Manage and maintain process improvement

RIP John Reilly

Sadly, I’ve just heard about the passing of John Reilly, a distinguished fellow of the TM Forum and giant of our industry.

As the fellowship suggests, John was a significant contributor to some of the TM Forum’s most significant and enduring works.

Unfortunately I never had the chance to meet John in person, but he was always happy to field my questions regarding the nuances of TAM, SID and eTOM. Not just field the questions, but respond with articulate and understandable summaries. John was also kind enough to be a contributor to one of my e-books.

John, you will be missed by me and a select group of friends who also sought your counsel.