Actually, maybe a few related questions.
As you’ve seen from recent articles such as this, I’m increasingly interested in the cross-over worlds of comms and energy. The two are becoming ever more tightly linked, largely due to the energy appetite of generative AI (I’ve seen estimates that a GenAI search consumes around 10x the energy of an equivalent traditional search).
This makes me ponder a bunch of questions about OSS/BSS code efficiency. OSS and BSS code bases tend to be large and complex, with a lot of computation applied to the many different types of transactions flowing through an OSS/BSS “factory.” At that scale, energy and processing efficiency can have a bigger impact than we might initially think.
But before putting some questions to my dev friends, let me share a couple of stories that have made me re-think the importance of code efficiency in OSS and BSS.
Story #1 – A staggering number of pizza boxes
A tier-one operator had gone through an extensive evaluation process to find the best-fit observability solution for its network, eventually narrowing the field to two remaining bidders with relatively similar functional capabilities. The big-name brand proposed an infrastructure footprint of 110 x 1RU pizza boxes to process all the telemetry coming from the carrier’s network. By comparison, the winning bidder had shown that it could process the same volume of telemetry data on only a handful of VMs (virtual machines).
Think about the ramifications of that for a moment:
- How much space do you think those pizza-box compute modules would take up in the carrier’s DC? Space that could otherwise be used for revenue-generating opportunities (eg selling co-lo space or hosting services to customers), or that could defer significant CAPEX-heavy DC expansion projects?
- How much energy do you think all that compute was consuming (and costing)?
- How many more potential points of failure would be added by that much infrastructure?
- How much more monitoring and maintenance would be required to keep all those appliances running?
- etc.
And the kicker? Because the winning bidder’s solution was so much more computationally efficient, it also delivered much faster visibility of what was happening in the network (ie it processed the logs much faster, reducing the time between when an event happened in the network and when it became visible in operator consoles for actioning).
Story #2 – A staggering number of “free” servers
I recently heard a story about a carrier that wanted to do call-trace analysis across its expansive mobile network. As many of you probably already know, call-trace data can often run to many terabytes per day. It’s a fire-hose of information from which you might only be looking for a specific cup-full of insight.
One of this carrier’s canny vendors understood that it would take a lot of compute to process that much information, especially since the vendor also knew that its processing engine was computationally inefficient compared with other solutions. The vendor also knew the carrier didn’t have the CAPEX to fund that much infrastructure. So the “generous” vendor gave the carrier 450 servers, free of charge, to crunch the call-trace data. [By comparison, I believe other solutions can deliver similar functionality with only a handful of servers.]
Sounds like a good deal for the carrier to be given 450 free servers, doesn’t it? However, this vendor knows how to play the long game. I suspect it has also relied on the carrier not performing any TCO (Total Cost of Ownership) modelling.
The vendor is going to charge hefty support and maintenance fees on those servers and applications. It’s going to charge a premium on professional services to derive call-trace insights (as a black-box contract). And then there’s the cost of all the space and power needed to run so many servers, as per the prior example.
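To put rough numbers on that last point, here’s a back-of-envelope sketch. All the figures below are purely illustrative assumptions on my part (average draw per server, PUE, electricity price), not data from the story:

```python
# Back-of-envelope power cost of the "free" servers.
# All figures are illustrative assumptions, not data from the story.
servers = 450
avg_watts_per_server = 500      # assumed average draw per server
pue = 1.5                       # assumed DC overhead (cooling, power distribution)
price_per_kwh = 0.15            # assumed electricity price, USD

kw = servers * avg_watts_per_server * pue / 1000      # continuous load in kW
kwh_per_year = kw * 24 * 365                          # annual energy
cost_per_year = kwh_per_year * price_per_kwh          # annual electricity bill

print(f"{kw:.0f} kW continuous, {kwh_per_year / 1e6:.2f} GWh/yr, "
      f"~US${cost_per_year:,.0f}/yr in electricity alone")
```

Under those assumptions you land somewhere around 340 kW of continuous load and a few hundred thousand dollars a year in electricity, before you even get to the support, maintenance and professional-services fees. Compare that with a handful of servers delivering the same outcome and the “free” hardware starts to look a lot less free.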
Now, the questions
So, with those stories in mind, here are some questions for my dev friends:
- If I have access to an OSS/BSS code base (eg I work for an ISV – independent software vendor – or a carrier that produces the code):
  - Can I easily capture metrics about (and/or heat-map) the energy consumption of various parts of it? Are there existing tools that do this? (see the measurement sketch after this list)
  - Can today’s AI code co-pilots (or similar tools) audit the code base to find ways to improve its efficiency?
  - After new functionality is implemented, do OSS/BSS devs often / rarely / never come back and look at improving the computational efficiency of the code (without prompting from clients saying certain functionality is too slow)?
- If I don’t have access to the code (eg I work for a carrier that uses proprietary software from third-party suppliers and must perform an audit without being privy to what’s inside the black box of code):
  - Can I easily determine the energy use of my existing OSS/BSS?
  - Can I readily compare the energy efficiency of competing OSS/BSS solutions in a bake-off situation?
  - Can I measure and/or heat-map the energy consumption of my entire OSS/BSS environment, which may consist of many different products / modules from various vendors, as well as integration frameworks such as ESBs and data lakes?
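On the first of those questions, the most accessible starting point I’m aware of (on Linux hosts with Intel/AMD CPUs at least) is the processor’s RAPL energy counters, exposed through the kernel’s powercap interface; as far as I know, tools such as perf, Scaphandre and CodeCarbon build on the same counters. Below is a minimal sketch, assuming a Linux host where `/sys/class/powercap/intel-rapl:0` exists and is readable (recent kernels require root); `process_batch` is just a hypothetical stand-in for whatever OSS/BSS routine you want to profile, not a real API.

```python
import time
from pathlib import Path

# Intel RAPL exposes a cumulative package-energy counter (in microjoules) via
# the Linux powercap interface. Reading it before and after a block of work
# gives a rough energy figure for that block. Caveats: the counter is
# socket-wide (so run on an otherwise quiet host), recent kernels make
# energy_uj root-readable only, and the value wraps at max_energy_range_uj.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")

def _read_uj(name: str) -> int:
    return int((RAPL_DIR / name).read_text())

def measure(fn, *args, **kwargs):
    """Run fn once and return (result, joules, seconds)."""
    wrap = _read_uj("max_energy_range_uj")
    e0, t0 = _read_uj("energy_uj"), time.monotonic()
    result = fn(*args, **kwargs)
    e1, t1 = _read_uj("energy_uj"), time.monotonic()
    delta_uj = e1 - e0 if e1 >= e0 else (wrap - e0) + e1  # handle counter wrap
    return result, delta_uj / 1e6, t1 - t0

if __name__ == "__main__":
    # process_batch is a hypothetical stand-in for an OSS/BSS
    # transaction-processing routine you want to heat-map.
    def process_batch(n: int) -> int:
        return sum(i * i for i in range(n))

    _, joules, secs = measure(process_batch, 10_000_000)
    print(f"~{joules:.1f} J over {secs:.2f} s (~{joules / secs:.0f} W package draw)")
```

For the black-box scenarios, my guess is the same idea applies one level up: host-level power telemetry (RAPL aggregated per server, IPMI/DCMI power readings, or smart-PDU metrics) captured over a like-for-like bake-off workload should let you compare competing OSS/BSS stacks without ever seeing their code.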
To be honest, I’d never really considered the implications of these questions before. I’d assumed that one OSS would be relatively similar in energy consumption to any other, so it wouldn’t be worth the effort to evaluate or optimise. I’m now beginning to re-think that assumption, and I’d love to hear from you what the state of the art is for auditing OSS/BSS code for efficiency.