How to Architect Your OSS/BSS/NMS Security Framework


Our OSS / BSS manage some of the world’s most vital comms infrastructure, don’t they? That makes them pretty important assets to protect from cyber-intrusion. Therefore, security is a key, but often underestimated, component of any OSS / BSS project. It can’t be an afterthought.

Let me start by saying I’m no security expert. However, I have worked with quite a few experts tasked with securing our OSS projects and picked up a few ideas along the way. I’ll share some of those ideas in this article.


1. The OSS/BSS Stack (TMN Pyramid)

Let’s start by using the layers of the TMN Pyramid as a guide:


Figure 1 – TMN Pyramid

  • The Network Element Layer (NEL) is the heart of the Active Network. The Active Network represents the network devices (eg switches, routers, muxes, etc) that carry customer traffic, as well as the connectivity (logical and physical) between those network nodes. The Active Network is arguably the most important part of the network to secure, partly because the customer traffic it carries needs to be protected from outage, and partly because the sheer volume of customer connection points represents a huge number of potential attack points.
  • The EMS / NMS (Element / Network Management Systems) tend to handle the management of individual network devices and their connectivity. They also act as “feeders” of information to the OSS layers above them.
    • Element Management Systems (EMS) / EML layer are typically vendor-specific and/or single-domain and provide functionalities like device configurations, local connectivity, monitoring, and local troubleshooting. Because they’re often single-domain and single vendor, they can have quite specific proprietary monitoring and management practices / functionality. 
    • NMS (Network Management System) / NML layer oversees the entire network infrastructure, including FCAPS (fault, configuration, accounting, performance and security) management across different network domains. It serves as the bridge between high-level service objectives and lower-level network resources. This is where network equipment, usually from multiple vendors and/or domains, is stitched together.
  • The OSS (Operations Support System) / SML layer is responsible for managing telecom services delivered to customers. It includes service provisioning, quality assurance and customer experience management, ensuring that services meet agreed SLAs (Service Level Agreements) and that the networks are managed effectively. It is also responsible for managing the field workforce that looks after the network and associated services.
  • The BSS (Business Support System) / BML or topmost layer focuses on business objectives, strategy, and financial considerations. It ensures that network operations align with business goals such as revenue generation, cost reduction, and service quality improvements.


2. Security Trust Zones / Realms

Now we can consider how each of these pieces needs to operate securely. Security starts with how you segment and segregate your network and related systems. The aim of segmentation / segregation is to restrict malicious access to sensitive data / systems.

The diagram below shows a highly simplified three-realm design.


Figure 2 – Simplified Security Trust Zones for OSS/BSS

Starting at the bottom:

  1. The operator’s Active Network realm – the network that carries live customer traffic and is managed by the CSP / operator (noting, though, that in some cases these networks may be virtual and/or leased rather than owned). As mentioned above, it comprises the routers, switches, muxes, etc that make up the network, so this zone needs to be highly secure. Customers connect to the Active Network at the edge of the organisation’s network, often via CPE (Customer Premises Equipment), NTUs (Network Termination Units) or similar.
  2. The operator’s Corporate / Enterprise realm – the network that houses the organisation’s corporate IT assets (core DNS, NTP stratum-2 servers, mail servers, SIEM services, but often also the OSS or parts of it such as ticketing, workflow, CMDB, ERP, etc). This is where most corporate staff engage with core business services like desktop tools and much more. If network operations staff need to connect to the Corporate / Enterprise realm but also reach into the Active Network realm, an air-gap is usually established by the SCP between the two, which is then bridged through technologies like Citrix, RDP (Remote Desktop Protocol) or similar.
  3. The Cloud / Internet realm – the external networks / infrastructure utilised by the organisation that are outside the organisation’s direct control. This includes Internet services (like email), which many corporate users rely on, of course. However, it may also include some important components of your OSS/BSS stack if they are provided as public cloud services (eg BSS is increasingly delivered via SaaS models, a common software supply model these days).
  4. You’ll also notice the all-important Security Control Points (SCPs), such as firewalls, that provide segregation between the zones.

In all likelihood, your security trust model will contain more than these three zones, but these should be the absolute minimum. There are other possibilities such as Demilitarized Zones (DMZ), Management Zones and more.

In fact, it’s quite likely that your organisation already has a trust model that you need to design your OSS/BSS stack around. This can be really challenging and time-consuming, which is why the questions for your security / infrastructure teams (see section 11) need to be asked early in your OSS/BSS transformation process!
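To make the default-deny role of the SCPs concrete, here is a minimal sketch, assuming a hypothetical three-zone model and an illustrative set of permitted protocols/ports for each direction through an SCP. It is not a firewall configuration; it simply shows that any cross-zone flow that isn't explicitly listed (including a direct Cloud-to-Active-Network path, which has no SCP at all in this model) is denied.

```python
from dataclasses import dataclass

# Hypothetical zone names for the simplified three-realm model.
CLOUD, CORPORATE, ACTIVE = "cloud", "corporate", "active_network"


@dataclass(frozen=True)
class Flow:
    src_zone: str
    dst_zone: str
    protocol: str  # illustrative labels, eg "https", "ssh", "snmp"
    port: int


# Illustrative SCP rule set, keyed by (source zone, destination zone).
# Anything not listed is denied. There is deliberately no entry for
# CLOUD <-> ACTIVE: that traffic must be brokered via the Corporate realm.
SCP_RULES = {
    (CLOUD, CORPORATE): {("https", 443)},
    (CORPORATE, CLOUD): {("https", 443)},
    (CORPORATE, ACTIVE): {("ssh", 22), ("snmp", 161), ("netconf", 830)},
    (ACTIVE, CORPORATE): {("syslog", 514), ("snmp-trap", 162)},
}


def is_permitted(flow: Flow) -> bool:
    """Default-deny check for cross-zone flows through a Security Control Point."""
    if flow.src_zone == flow.dst_zone:
        # Intra-zone traffic is out of scope for this sketch.
        return True
    allowed = SCP_RULES.get((flow.src_zone, flow.dst_zone), set())
    return (flow.protocol, flow.port) in allowed
```

Keying the rules by direction matters: in this sketch the Active Network can push syslog and SNMP traps up to the Corporate realm, but it cannot open SSH sessions back into it.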


3. What OSS/BSS Components go Where?

Once we have the security trust zones identified (whether the simplified ones indicated above or your existing model), we have to determine where our OSS / BSS / management stack should reside.

The diagram below provides a high-level, initial sense of where things go. However, there are many considerations that could see a different architecture.

Figure 3 – Simplified Security Trust Zones with OSS/BSS Components Shown

The Active Network should be segregated from the Corporate / Enterprise network so that it can continue to provide service to customers even if the connection between them is lost (or intentionally severed if a security breach is identified). It should also be segregated so that no customers can access the EMS/NMS or corporate systems.

But this is where things get interesting. The Active Network and our Network Management stack rely on Shared Services such as DNS (Domain Name System), NTP (Network Time Protocol), Identity / Access Management (IAM/UAM/PAM), Network Access Control (NAC), Patch Management, Anti-Virus, AD / SSO and more. These shared services tend to be housed in Corporate / Enterprise realms. If we want the Active Network to be able to operate in complete standalone mode, then we need to give special consideration to the shared services architectures.
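One way to reason about standalone operation is to list the shared services the Active Network depends on and check whether each one has an instance inside the Active Network zone itself. The sketch below does exactly that, using hypothetical service names and placements; it only reports the gaps, it doesn't decide how to close them.

```python
# Hypothetical inventory: shared services the Active Network needs in order to
# keep running, and the zones where an instance of each service currently lives.
REQUIRED_BY_ACTIVE_NETWORK = {"ntp", "dns", "aaa", "syslog"}

SERVICE_INSTANCES = {
    "ntp": {"active_network", "corporate"},     # eg stratum-1 locally, stratum-2 in corporate
    "dns": {"corporate"},
    "aaa": {"corporate"},                       # eg TACACS+ / RADIUS servers
    "syslog": {"active_network", "corporate"},  # local collectors plus central SIEM feed
}


def standalone_gaps(required, instances, zone="active_network"):
    """Return (sorted) the required services with no instance inside `zone`,
    ie what would stop working if the zone were cut off from the others."""
    return sorted(svc for svc in required if zone not in instances.get(svc, set()))


print(standalone_gaps(REQUIRED_BY_ACTIVE_NETWORK, SERVICE_INSTANCES))  # ['aaa', 'dns']
```

A gap isn't automatically a defect. For AAA, for instance, some operators accept local break-glass accounts on the devices rather than a second AAA cluster, but that should be a deliberate decision rather than an accident.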

Another important consideration is the coloured lines between components:

  • The orange lines represent the active network connectivity.
  • The aqua lines represent the management network or DCN (Data Communication Network). This network is responsible for facilitating communications between all the layers in the TMN pyramid, as well as connections to shared services. You’ll have noticed that the management network traverses trust zones, so it needs to be very carefully planned. It also provides access to high-value assets (ie the platforms that monitor and manage the active network), so it’s important that it remains highly secure. The DCN often comes in two types (or a hybrid of both):
    • In-band, where management traffic traverses the same links as other corporate traffic, but is generally segmented via a management VLAN; or
    • Out-of-band (OOB), where management traffic has its own dedicated network, which provides a level of protection and segregation.
| Zone / Trust realm | Core operational systems (examples) | Supporting infrastructure & security tooling | Primary security objectives |
|---|---|---|---|
| Cloud OSS/BSS (off-net) | SaaS OSS modules (inventory, orchestration, assurance); Cloud BSS (CRM, billing, order management); AI/ML analytics work-loads & data lakes; Remote vendor portals & licence servers | Cloud IAM & MFA; API gateways / WAF; CASB & DDoS scrubbing; Zero-trust network access (ZTNA) brokers | Keep sensitive network data off the public Internet; Encrypt all north-south API traffic; Segregate tenants & environments |
| Security Control Point 1 (cloud ↔ DC) | Next-gen firewall set, IDS/IPS, reverse proxy, SSL/TLS termination, SASE | Monitoring taps feeding SIEM; Inline malware sandbox; Automated compliance policy engines | Enforce least-privilege flows between SaaS and on-prem; Break/inspect TLS where policy permits |
| Corporate / Enterprise realm (on-net) | On-prem OSS stacks & NMS managers; Ticketing, workflow, CMDB, ERP, BI; Source-of-truth repositories (IPAM, GIS) | Corporate AD/LDAP & RADIUS; Core DNS, NTP stratum-2 servers; Mail & chat servers; Patch-management & software repo; Bastion / jump servers with PAM, NAC | Protect corporate and management data from the active network (but still allow key roles such as NOC to access mail and internet to perform their tasks); Authenticate & log all privileged sessions |
| Security Control Point 2 (DC ↔ Active network) | Dual-homed bastion hosts; Protocol firewalls (SNMP/NETCONF/gRPC whitelists); Data diodes / one-way gateways where high assurance needed | TACACS+ / RADIUS AAA servers; Central syslog collectors | Prevent lateral movement from IT to OT; Rate-limit config pushes & software loads |
| Active Network realm (on-net) | Network elements (routers, switches, RAN, OLT, DWDM); EMS/element managers & SDN controllers; Out-of-band DCN routers & terminal servers | Stratum-1 NTP, PTP, GNSS references; SCADA / environmental sensors; Local syslog / telemetry collectors; Auto-rollback config vault; Inline encryption for high-value links | Preserve real-time performance; Deny unauthorised config change; Meet deterministic timing for services; Prevent customer access to management layers |

  • The location of the OSS / BSS is interesting. They have to interface with the active network and EMS / NMS, but they also usually have to interface with corporate systems like data warehouses, reporting tools, etc. They’re so critical to managing the Active Network that they need to be highly secure. However, they also often need connections to external networks, especially if the OSS or BSS components are hosted by SaaS (Software as a Service) providers.
  • That means they could be placed inside the Active Network realm or even have their own special Central Management realm. In other cases, different components of the OSS / BSS might be spread across different realms.
  • Dedicated Network Operations Centre (NOC) operator terminals tend to connect inside the Active Network, but may need internet and email access.
  • We may have some shared services, such as NTP and DNS, that are needed in both zones and therefore have to bridge between them. Parent-child relationships (eg stratum-2 NTP servers in the Corporate / Enterprise realm synchronising to stratum-1 references in the Active Network) may dictate which zone or zones their components must reside in.

Note that we also have to consider the systems (eg user portals, asset management systems, etc) that our OSS / BSS need to interface to, and where they reside in the trust model.

Aside: Traditionally, we’ve focused on perimeter defence, with authenticated users granted authorised access to a broad collection of resources. We now see a trend towards more remote users and cloud-based assets outside the enterprise-owned boundary in our OSS architectures. There’s currently debate around whether zero-trust architectures are required to segment more holistically, restricting lateral movement within a network by assuming an attacker is already present on it.
NIST’s Zero Trust Architecture publication (SP 800-207) discusses this emerging approach in more detail.
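To illustrate the difference, here is a minimal sketch of a per-request access decision in the zero-trust spirit (NIST SP 800-207 calls the component that makes such decisions a policy decision point). The user names, resources and the two posture inputs are hypothetical; the point is that the requester’s network zone is recorded but never consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user: str
    mfa_verified: bool      # assumed to come from the identity provider
    device_compliant: bool  # assumed to come from NAC / endpoint posture checks
    source_zone: str        # logged for audit, but deliberately NOT trusted
    resource: str


# Hypothetical per-resource entitlements.
ENTITLEMENTS = {
    "nms-gui": {"alice", "bob"},
    "ems-core-routers": {"alice"},
}


def decide(req: AccessRequest) -> bool:
    """Grant access only on verified identity, healthy device and explicit
    entitlement. Being 'inside' a trusted zone earns nothing by itself."""
    if not (req.mfa_verified and req.device_compliant):
        return False
    return req.user in ENTITLEMENTS.get(req.resource, set())
```

Under a pure perimeter model, bob connecting from inside the Active Network might reach the core-router EMS simply by being there; in this sketch he is refused because he holds no entitlement for it.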


4. Personas: Who Interacts with What?

The people who interact with the various different components in the OSS/BSS stack will also impact the zone design. Here are just a few thoughts, but your specific case might be quite different:

| Persona | Zones typically accessed | Access path / tooling | Typical activities |
|---|---|---|---|
| NOC operators | Corporate / Enterprise + (indirect) Active network | OSS/NMS GUI via corporate LAN or VPN → jump server → EMS | Alarm triage, topology views, config pushes, performance troubleshooting |
| Field workforce | Corporate / Enterprise + Active network | Ruggedised laptop/tablet → LTE/NB-IoT or local console → EMS/NE | On-site fault repair, cable tracing, commissioning, access to network designs and/or alarm/ticket information |
| Network designers | Corporate / Enterprise | Inventory & planning tools, SDN simulators | Capacity design, route selection, what-if modelling |
| OSS architects | Corporate / Enterprise + Cloud | Direct corporate LAN, cloud IDE, CI/CD pipelines | Solution blueprinting, data-model governance, automation strategy |
| OSS suppliers (on-prem) | Cloud + Corporate / Enterprise (via project VLAN) + Active Network (via management network) | Time-boxed VPN + PAM to staging & prod | Implementation, upgrade, break-fix support, patch/release management |
| OSS suppliers (SaaS) | Cloud OSS/BSS | Cloud console + API telemetry, secured by ZTNA | DevOps, road-map releases, SaaS health monitoring |
| Customers / wholesale partners | Cloud portals (DMZ) | Web/SAML portal or B2B API | Order status, self-service reporting, ticket initiation |
| Contractors / consultants | Corporate / Enterprise (limited) | Federated identity, bastion host | Requirements workshops, documentation, training |
| Security operations (SOC) | Corporate / Enterprise + All SCPs | SIEM, NDR, SOAR dashboards | Threat hunting, incident response, compliance evidence |
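The access paths in the table can be treated as rules and checked mechanically. The sketch below encodes three of the rows under hypothetical identifiers, and assumes (as the NOC row suggests) that any path into the Active Network must first pass through a broker hop such as a jump server or PAM session. Field-workforce paths over LTE or a local console would need a rule of their own.

```python
# Hypothetical encoding of three rows from the persona table: the zones each
# persona may reach, plus the broker hops that must precede the Active Network.
PERSONA_ZONES = {
    "noc_operator": {"corporate", "active_network"},
    "network_designer": {"corporate"},
    "oss_supplier_onprem": {"cloud", "corporate", "active_network"},
}
BROKER_HOPS = {"jump_server", "pam_session"}


def path_allowed(persona: str, path: list[str]) -> bool:
    """Check a path such as ["corporate", "jump_server", "active_network"].
    Every zone on it must be permitted for the persona, and the Active Network
    may only be entered after a broker hop."""
    allowed = PERSONA_ZONES.get(persona, set())
    brokered = False
    for hop in path:
        if hop in BROKER_HOPS:
            brokered = True
        elif hop not in allowed:
            return False
        elif hop == "active_network" and not brokered:
            return False
    return True
```

Unknown personas get an empty allowance, so they are refused everywhere by default.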


5. Restricting Access to OSS / BSS systems and data

We want to control, at the level of individual users, who has access to which systems and data in our OSS / BSS stack.

The Security Trust model also impacts the architectures of Identity Management (Directory Services like Active Directory), User Access Management (UAM), Privileged Access Management (PAM) and Network Access Control (NAC) solutions and how they control access to our OSS / BSS. 

They serve three purposes:

  • To provide fine-grained management of access to privileged / restricted data and systems within our OSS / BSS
  • To simplify the administrative overhead of managing user access to our OSS / BSS by defining group-based user access policies
  • To log the activities of individual users whilst they use the OSS/BSS and related systems / networks

Most OSS / BSS allow user authentication via Directory Services these days. Most, but not all, also allow roles / privileges to be assigned via Directory Services. RBAC (Role-Based Access Control) policy, on the other hand, is typically defined within our OSS / BSS applications. It controls which functions users / groups can perform via permission management.
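A minimal sketch of that split, assuming hypothetical AD group names and application roles: the directory decides which roles a user holds (via group membership), while the OSS / BSS application decides what each role is allowed to do.

```python
# Hypothetical mapping held in (or derived from) the Directory Service:
# which OSS/BSS roles each directory group confers.
GROUP_TO_ROLES = {
    "GRP-OSS-NOC": {"alarm_viewer", "alarm_handler"},
    "GRP-OSS-PROVISIONING": {"service_designer"},
    "GRP-OSS-ADMIN": {"oss_admin"},
}

# Hypothetical RBAC policy defined inside the OSS/BSS application itself.
ROLE_TO_PERMISSIONS = {
    "alarm_viewer": {"alarm:read"},
    "alarm_handler": {"alarm:ack", "ticket:create"},
    "service_designer": {"service:design", "inventory:read"},
    "oss_admin": {"user:manage", "config:write"},
}


def permissions_for(groups: set[str]) -> set[str]:
    """Resolve a user's directory groups to the union of the permissions
    granted by every role those groups map to."""
    perms: set[str] = set()
    for group in groups:
        for role in GROUP_TO_ROLES.get(group, set()):
            perms |= ROLE_TO_PERMISSIONS.get(role, set())
    return perms


def can(groups: set[str], permission: str) -> bool:
    return permission in permissions_for(groups)
```

Because access is granted to groups rather than individuals, moving a NOC operator into or out of `GRP-OSS-NOC` in the directory is the only change needed, which is the administrative saving described above.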

At a high level, we have:

  • NAC – Governs who can connect to the network, from where, and under what conditions
    Gate 1 – “Are you allowed to enter?”
  • PAM – Governs what privileged actions users can take once connected, including device access, session timing, and credentials
    Gate 2 – “Are you allowed to perform sensitive operations?”
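The two gates can be sketched as two independent checks that must both pass. Everything here is hypothetical: NAC is reduced to a device-posture flag plus an allowed connection zone, and PAM to a single time-boxed grant for one action on one target, in the spirit of dynamic, just-in-time privileged access.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def nac_admit(device_compliant: bool, zone: str, allowed_zones: set[str]) -> bool:
    """Gate 1 (NAC): may this device, connecting from this zone, join the network?"""
    return device_compliant and zone in allowed_zones


@dataclass(frozen=True)
class PrivilegedGrant:
    """Gate 2 (PAM): a time-boxed grant for one action on one target."""
    user: str
    target: str
    action: str
    start: datetime
    duration: timedelta

    def permits(self, user: str, target: str, action: str, at: datetime) -> bool:
        in_window = self.start <= at < self.start + self.duration
        return in_window and (user, target, action) == (self.user, self.target, self.action)


def authorise(device_compliant: bool, zone: str, allowed_zones: set[str],
              grant: PrivilegedGrant, user: str, target: str, action: str,
              at: datetime) -> bool:
    """Both gates must pass; if Gate 1 fails, Gate 2 is never consulted."""
    return nac_admit(device_compliant, zone, allowed_zones) and grant.permits(user, target, action, at)
```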

This video from CyberArk provides a walkthrough of how dynamic privileged access management can grant very specific access.

For central user administration purposes, it’s ideal that the Directory Service can pass role-based information to our OSS / BSS.


6. OSS / BSS Data Security

The first step in the data security process is to identify categories of data such as unclassified, confidential, secret, etc.

We then need to consider what security mechanisms need to be applied to each category. There are four main OSS / BSS data security considerations:

  1. Data Anonymisation / Privacy – is the process of removing / redacting / encrypting personally identifiable information from the data sets stored in our OSS / BSS (particularly the latter). Our solutions need to store personal data such as names, addresses, contact details, billing details, etc, but we can use techniques to limit how widely that data can be accessed. For example, we may use a tightly restricted system to store personal details, together with a non-identifiable code (eg LocationID or ServiceID) for use by our other, more widely accessed tools (eg PNI / LNI). Strictly speaking, this is pseudonymisation rather than anonymisation, because the restricted system can still link each code back to the person
  2. Encryption of data at rest – is the process of encrypting the large stores of data used by our OSS / BSS, whether in a local database used by each application or in centralised data warehouses
  3. Encryption of data in transit – is the process of encrypting data as it transits between components within your OSS/BSS stack (and possibly beyond). Techniques such as VPNs and IPsec can be used. As we increasingly see OSS / BSS built as web-based applications, we’re using encrypted connections (eg HTTPS over TLS) to protect our data; the older SSL protocols are deprecated and should be disabled
  4. Physical security – is the process of restricting physical access to data stores (eg locked cabinets, facilities access management, etc). This isn’t always within our control as an OSS / BSS project team.
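The LocationID / ServiceID approach in item 1 can be sketched with a keyed hash (HMAC-SHA256) from Python’s standard library. This is an assumption about how such a code might be generated, not a prescribed method: the secret key and the lookup table would live only in the tightly restricted system, while the widely accessed tools only ever see the derived ID.

```python
import hashlib
import hmac

# Restricted store: the only place an ID can be linked back to a person.
# In practice this would be a tightly access-controlled system, not a dict.
restricted_vault: dict[str, str] = {}


def pseudonymous_id(secret_key: bytes, personal_record: str, prefix: str = "SVC") -> str:
    """Derive a stable, non-identifying ID from personal data with HMAC-SHA256.
    Unlike a plain hash, it cannot be recomputed by guessing names or
    addresses unless the secret key is also known."""
    digest = hmac.new(secret_key, personal_record.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:16]}"  # 64 bits: collisions unlikely at typical volumes


def register(secret_key: bytes, personal_record: str) -> str:
    """Store the personal record in the restricted vault and return the ID
    that other, more widely accessed tools (eg PNI / LNI) should use."""
    sid = pseudonymous_id(secret_key, personal_record)
    restricted_vault[sid] = personal_record
    return sid
```

A keyed hash is used rather than a plain SHA-256 because personal data such as names and addresses is guessable: anyone holding an unkeyed hash could simply hash candidate addresses and compare.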


7. Real-time Security Logging / Monitoring

Ensure all systems in the management stack (OSS, BSS, NMS, EMS, the network, out-of-band management, etc) are logging to a central SIEM (Security Information and Event Management) tool. Oh, and don’t do what I saw one big bank do. They had so many hits occurring on their IPS / IDS tool alone that they left it sitting in the corner, unmonitored and in the too-hard basket. By having the tools, they’d ticked their compliance box, but there was no checkbox asking them to actually look at the results or respond to the incidents identified!!

This logging traffic is carried over the management network (the aqua links in Figure 3).
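The bank anecdote suggests two simple checks worth automating: are all expected management-stack sources actually reaching the SIEM, and are alerts actually being acknowledged? A sketch, assuming the inputs (last-seen timestamps per source, and alert records with hypothetical `id` / `acknowledged` / `raised` fields) have already been exported from whichever SIEM you use:

```python
from datetime import datetime, timedelta


def silent_sources(expected: set[str], last_seen: dict[str, datetime], now: datetime,
                   max_silence: timedelta = timedelta(hours=1)) -> list[str]:
    """Management-stack sources (eg OSS, BSS, NMS, EMS, OOB gear) that have not
    reached the SIEM within max_silence, or have never reached it at all."""
    stale = []
    for source in sorted(expected):
        seen = last_seen.get(source)
        if seen is None or now - seen > max_silence:
            stale.append(source)
    return stale


def untriaged_alerts(alerts: list[dict], now: datetime,
                     sla: timedelta = timedelta(minutes=30)) -> list[str]:
    """IDs of alerts still unacknowledged after the triage SLA: the
    'left in the corner' failure mode from the anecdote above."""
    return [a["id"] for a in alerts if not a["acknowledged"] and now - a["raised"] > sla]
```

The thresholds (one hour of silence, a 30-minute triage SLA) are placeholders; the useful part is that a source going quiet is treated as a finding in its own right.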


8. Patch Management

Software patch management is theoretically one of the simplest security management techniques to implement. It ensures you have the latest, hopefully most secure, version of all software.

OSS / BSS / Management stacks tend to have many, many different components, not just at the obvious application level but also operating systems and third-party software (eg runtime environments, databases, application servers, message buses, antivirus software, syslog, etc).

Patch management is often well maintained by IT teams within the Corporate / Enterprise trust zone discussed above. They have Internet access to download patches, and tools to help push updates out. However, the Active Network zone shouldn’t have direct access to the Internet, so routine patching there can be difficult to implement and/or easily overlooked. Some software components also reside on servers that are rarely logged into, so their patches can slip through the cracks.

The other problem is that OSS / BSS applications are often heavily customised, making it hard to follow a standard upgrade path. I’ve seen OSS / BSS that haven’t been patched for years, even for something as simple as a Java runtime environment, because patching it causes the OSS / BSS to fail.
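Even where patching can’t be automated, knowing which components lag behind is a start, particularly on those rarely-visited Active Network servers. A minimal sketch, assuming a hypothetical per-host inventory of component versions and a minimum patched baseline (which, for the Active Network, might be published from an offline mirror):

```python
def parse_version(version: str) -> tuple[int, ...]:
    """'17.0.9' -> (17, 0, 9). Assumes purely numeric dotted versions; real
    package versions (eg with build suffixes) need a more careful parser."""
    return tuple(int(part) for part in version.split("."))


def below_baseline(installed: dict[str, str], baseline: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Compare one host's component versions with a minimum patched baseline.
    Returns {component: (installed, required)} for every component that lags."""
    gaps = {}
    for component, required in baseline.items():
        have = installed.get(component)
        if have is None:
            continue  # component not installed on this host
        if parse_version(have) < parse_version(required):
            gaps[component] = (have, required)
    return gaps
```

Versions are compared as integer tuples rather than strings, because as strings "17.0.10" would sort before "17.0.9". The sketch only reports laggards; for a heavily customised OSS, each gap still needs regression testing before it can be closed.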


9. Security Testing / Hardening

Your organisation probably already has standards and checklists in place to ensure that all of your IT assets are as secure as possible. Your OSS / BSS environments are just one of those assets. However, as the “manager of managers” of your Active Network, the OSS / BSS is probably more important to secure than most.  

Your organisation might also insist that all applications, including the OSS / BSS, are built on a hardened Standard Operating Environment (SOE). However, some suppliers provide OSS / BSS as appliances, built on their own environments. These then have to go through a hardening process in alignment with your corporate IT standards.

If you’re using a vendor-supplied, off-the-shelf application, it’s quite common for it to ship with default admin accounts on the application and database. These make it easier for the implementation team to navigate their way around the solution while building it. However, one of the first steps in a hardening process is to rename or disable these built-in accounts.

As “manager of managers,” your OSS / BSS’s primary purpose is to collect (or request) information from a variety of sources. Some of these sources reside in the Active Network; others reside in the Corporate Network or elsewhere. As such, careful consideration needs to be given to which ports / protocols are allowed. Some systems will come pre-configured with default / open settings, but these should be restricted to only the necessary protocols, such as SNMP, HTTPS, SSH, FTPS and/or similar.
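A simple way to catch default / open settings is to compare what a host is actually listening on against an explicit allow-list. The sketch below assumes a hypothetical allow-list for an OSS collector host; the port numbers are the usual IANA defaults, but your own hardening standard should define the real list.

```python
# Hypothetical allow-list for an OSS collector host, using the usual
# IANA default ports: SSH 22/tcp, HTTPS 443/tcp, SNMP traps 162/udp,
# FTPS (implicit TLS) 990/tcp. Adjust to your own hardening standard.
ALLOWED_LISTENERS = {("tcp", 22), ("tcp", 443), ("udp", 162), ("tcp", 990)}


def unexpected_listeners(observed: set[tuple[str, int]]) -> list[tuple[str, int]]:
    """Listening (protocol, port) pairs not on the allow-list, sorted for
    stable reporting. 'observed' would come from a port scan or host audit."""
    return sorted(observed - ALLOWED_LISTENERS)
```

For example, a host that still exposes Telnet (23/tcp) or TFTP (69/udp) from its factory defaults would have both reported.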

Speaking of SNMP, its original design was inherently insecure because it uses a primitive method of authentication: clear-text community strings to secure access to the management plane. Only version 3 of SNMP (ie SNMPv3) can authenticate and encrypt payloads, so it should be used wherever possible. Some of you may have legacy device types that predate SNMPv3, though. US-CERT Alert TA17-156A (Reducing the Risk of SNMP Abuse) provides suggestions to minimise exposure to SNMP abuse.
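Finding where SNMPv1/v2c is still configured can be partly automated by scanning device configuration backups. The sketch below assumes Cisco IOS-style `snmp-server community` and `snmp-server group ... v3 noauth|auth|priv` lines purely for illustration; other vendors use different syntax, so the patterns are assumptions to adapt. Findings deliberately never echo the community strings, since those are secrets.

```python
import re

# Illustrative patterns for Cisco IOS-style configuration lines; other vendors
# use different syntax, so treat these regexes as assumptions to adapt.
COMMUNITY = re.compile(r"^snmp-server community \S+(?: (RO|RW))?", re.IGNORECASE)
V3_GROUP = re.compile(r"^snmp-server group (\S+) v3 (noauth|auth|priv)\b", re.IGNORECASE)


def audit_snmp(config_text: str) -> list[str]:
    """Flag SNMPv1/v2c community strings and SNMPv3 groups without privacy
    (encryption). Community strings themselves are never echoed in findings."""
    findings = []
    for line in (raw.strip() for raw in config_text.splitlines()):
        match = COMMUNITY.match(line)
        if match:
            access = (match.group(1) or "unspecified").upper()
            findings.append(f"SNMPv1/v2c community configured (access: {access}); clear-text, move to SNMPv3")
            continue
        match = V3_GROUP.match(line)
        if match and match.group(2).lower() != "priv":
            findings.append(f"SNMPv3 group {match.group(1)} uses '{match.group(2)}'; prefer authPriv ('priv')")
    return findings
```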

Also consider the environment on which you’re performing your security testing. As described in this post about OSS / BSS environments and test transitions, you’ll probably have multiple environments: PROD environments that are connected to the live Active Network devices, and non-PROD environments that are connected to test lab devices and/or simulators. Where should you perform your penetration / security testing? Probably not on PROD, because you want to ensure the solution is already secure before letting it loose into Production. But you also want the test environment to be as PROD-like as possible. You could use PRE-PROD (ie a state before a solution is cut over to PROD), before it’s fully connected to the Active Network. Or you could use the most PROD-like lower environment (eg Staging).

One other thing when conducting security tests and hardening: penetration testing often breaks things by injecting malicious code / data. Ensure you take a backup of any environment so you can roll back to a working state after conducting your pen-tests.


10. Prod vs Non-Prod Security

In large-scale OSS programmes, the production environment is invariably the project’s critical-path anchor. Every security zone layout, firewall rule and fail-over scenario must survive architecture governance, risk review and often several rounds of penetration testing. That rigour is essential for regulatory compliance, security and availability, yet it routinely adds months to the schedule. This includes lead-times for secure network zones, dual-site connectivity and security / architecture / change-board approvals, which can even exceed the development effort itself. While production remains the “gold copy”, its very weight slows feedback loops and postpones value realisation.

Our projects need to move faster, and there are a lot of activities that can be done while the PROD environments are coming online. This is where non-PROD, pre-PROD or even sandpit environments become a strategic lever for achieving project momentum.

Non-production tiers (DEV, TEST, PRE-PROD) are used to inject velocity. Standing them up on an isolated cloud tenancy or hosted lab lets the team launch a pilot OSS stack within days rather than months. It allows the team to demonstrate early wins to sponsors (such as gathering operational telemetry, designing data models, experimenting with operational processes and much more), which de-risks later design choices. These non-PROD environments are generally designed to expedite the transition of the OSS/BSS tools into the PROD environments when they become available later.

The guiding principle of non-PROD environments is to make them representative, not identical. They should aim to provide as much coverage of requirements as possible, whilst also acknowledging that they can’t be a perfect replica of PROD. To be representative of PROD, we may make fast-tracking decisions such as using containerised versions of the OSS, a cut-down SD-WAN core, synthetic EMS feeds, emulated/simulated lab environments and masked or generated customer data. Together, these produce behaviour close enough for functional, performance and data-migration rehearsals without exposing sensitive information. Depersonalising data helps address ISO/IEC 27002 controls (eg 8.11 Data masking and 8.33 Test information) and GDPR obligations, while avoiding the lengthy approvals associated with live datasets.

Because this pilot realm is air-gapped (physically or via one-way gateways), much of the heavy security stack (pen-tests, PAM vaults, SIEM connectors) can be deferred. Typically, a minimal perimeter of jump hosts, role-based access control and pipeline-driven image signing is sufficient. We may even need to decide whether to connect to existing physical / virtual lab environments or build separate ones, so that the pilot remains entirely air-gapped for the Pilot Phase of the OSS/BSS transformation.

That minimalist posture is deliberate: every unnecessary control weakens the “speed premium” these sandboxes are meant to deliver. 

Finally, a disciplined promote-path turns non-PROD momentum into PROD-ready artefacts. Infrastructure-as-Code templates, configuration baselines and automated test suites authored in DEV advance unchanged through TEST and PRE-PROD, accumulating security hardening and audit evidence at each stage. When the production platform eventually arrives, the OSS payload is already proven – installation becomes an execution step rather than an exploration exercise, trimming weeks from cut-over and sharply reducing rollback risk.
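The “advance unchanged, accumulate evidence” idea can be expressed as a promotion gate. The sketch below uses hypothetical stage names and evidence labels, and only pins a SHA-256 digest of the artefact recorded when it left DEV; a real pipeline would more typically verify a cryptographic signature, but the shape of the check is the same.

```python
import hashlib

STAGES = ["DEV", "TEST", "PRE-PROD", "PROD"]

# Hypothetical evidence each stage must contribute before promotion onwards.
REQUIRED_EVIDENCE = {
    "DEV": {"unit_tests"},
    "TEST": {"integration_tests", "vuln_scan"},
    "PRE-PROD": {"pen_test", "security_signoff"},
}


def digest(artefact: bytes) -> str:
    return hashlib.sha256(artefact).hexdigest()


def can_promote(stage: str, artefact: bytes, approved_digest: str, evidence: set[str]) -> bool:
    """Allow promotion out of `stage` only if the artefact is byte-for-byte the
    one approved when it left DEV, and all evidence required so far exists."""
    if stage not in REQUIRED_EVIDENCE:
        return False  # PROD (or an unknown stage) has nowhere further to go
    if digest(artefact) != approved_digest:
        return False  # artefact changed en route: rebuild and restart the path
    needed: set[str] = set()
    for s in STAGES[: STAGES.index(stage) + 1]:
        needed |= REQUIRED_EVIDENCE[s]
    return needed <= evidence
```

Evidence is cumulative, so promoting out of PRE-PROD requires the DEV and TEST evidence as well as its own; anything edited by hand along the way fails the digest check and has to go back through the path.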

To design your Pilot Environment, you may like to go back to Figure 3 above and determine what components you need to include / exclude / simulate and in what trust model.


11. Questions for your Security / Infra Leads

As the OSS delivery team is generally dependent upon their organisation’s security and infrastructure teams for stand-up of PROD (and even non-PROD) environments, the following list of questions might help you to identify key requirements for building your Pilot environment and getting it ready for an OSS/BSS/NMS/NE build.

  1. What PROD and non-PROD hosting options already exist (physical lab, virtual sand-box, cloud tenancy), and who owns them?
  2. Are there any existing network lab environments (physical or virtual)?
  3. Are there any documented operating procedures or “run books” for interacting with those environments?
  4. Are there any existing non-PROD environments where our OSS/BSS/NMS/NE builds can be hosted? If so, what mechanisms must be followed (eg ordering hosting, design reviews, budget allocation, etc)?
  5. Given the number of trust zones the OSS/BSS/NMS/NE stack will traverse, what sort of lead times would we typically expect on PROD and non-PROD infrastructure builds, with all the associated approvals? More granularly, what are the typical lead-times for each governance gate (architecture board, security design review, change-control window)?
  6. What governance gates must a new non-PROD build pass (design review boards, security architecture forum, budget committee)?
  7. What design constraints apply to non-PROD (max VM sizes, specific OS baselines, mandatory monitoring agents, cloud region limits)?
  8. Can / should we build our OSS/BSS/NMS/NE lab totally air-gapped from the organisation’s current PROD and non-PROD infrastructure? If not, why not (eg will it delay transition into PROD later)?
  9. What zones or security realms will Production components occupy, and what traffic is permitted between them?
  10. Are there any mandated design obligations (eg TACACS+, RADIUS, privileged access, data-residency or sovereignty rules, etc)?
  11. What level of simulated network traffic is permitted (protocol stubs only, full control packets, real NE images)?
  12. Are there existing budget lines or project codes we can charge the lab against, or must we raise a new CAPEX request?
  13. Who will operate the environment day-to-day (deployment squad, central LabOps, managed service partner)?
  14. Are there any retention or disposal policies that cover the lab once the pilot is finished?
  15. Which information-classification tier will pilot data fall under, and what minimum control set is therefore mandatory?
  16. Can we use fully depersonalised production extracts, or must we synthetically generate datasets for the pilot?
  17. What network segmentation will separate DEV, TEST and PRE-PROD from the live management plane?
  18. Will the pilot sit in an existing sandbox VPC/VLAN or require a new air-gapped tenancy?
  19. Are outbound internet connections permitted for package repositories, or will we need an offline artefact mirror?
  20. What identity source will handle admin authentication for the pilot (local IdP, test AD or corporate SSO)?
  21. What telemetry must be forwarded to the central log servers / SIEM, and what can be stored locally until roll-off?
  22. Will the simulated network lab be allowed to originate real control traffic, or must it stay protocol-stub only?
  23. Do we need formal change-records in the CRB and/or ITSM for every pilot redeploy, or is a single agile change authorised upfront?
  24. What physical or logical access must field engineers have to the lab, and how will that be brokered securely?
  25. What is the escalation path if pilot telemetry triggers security alarms that would normally page the on-call SOC?
  26. At what point does the pilot environment need to integrate with corporate infra (eg backup, identity and monitoring tooling) to prevent last-minute surprises?
  27. Does Production need an out-of-band DCN for element management and break-glass access?
  28. What else should we be asking you?
  29. I’m sure you have many, many others!


12. Useful Security Standards

The following is a list of security standards that I’ve used in the past:

As I mentioned at the start, I’m far from being an expert in the field of network or data security. I’d love to get your feedback if I’m missing anything important!!