Assurance Reimagined for Telecom Operations Teams

5G 
Dec 1, 2022

5G brings huge changes to telecom networks, from open radio networks to ultra-low latencies to customizable network slices and more. Communication service providers (CSPs) are banking on these innovations to disrupt the value chain for digital services, and the role they play within it. But it’s the engineers staffing telco Network Operations Centers (NOCs) who must grapple with what this disruption means in practice.

For those in the trenches, open and disaggregated cloud-native architectures aren’t just buzzwords. They’re massive changes to the way telco networks behave and what a telco infrastructure even is. Before the industry can succeed in this transition—before we can deliver on 5G’s incredible promise—we need to reimagine day-to-day operations for a 5G world. That process starts with assuring 5G networks and services. 

A Need for New Approaches

The move to 5G represents a radical departure from the past. Until now, telco networks were generally composed of self-contained, self-managed pieces of (mostly hardware) infrastructure. With 5G, virtualization and disaggregation are no longer optional. They’re baked into the 3GPP specification itself. Suddenly, network engineers must contend with an infrastructure that’s exponentially more complex.

Network functions are now implemented as containerized microservices, running in a Kubernetes-orchestrated service mesh spanning multiple clouds. The network also includes many more vendors, each releasing more frequent software updates, which creates an environment that changes continuously. It shouldn’t be surprising that assurance tools built for simpler, more static conditions can’t keep up with this larger, dynamic environment. Legacy assurance approaches struggle with:

  • 5G service-level agreements (SLAs): With network slicing, operators can offer services customized for ultra-low latency, Internet of Things (IoT) connectivity, and other unique application requirements under more profitable SLAs. But to deliver on those SLAs, they need to be able to detect service degradations before they affect customers. 
  • Highly dynamic networks: CSPs have long used passive assurance to monitor different components of a service, but this approach is no longer sufficient. How can you instrument a static monitoring infrastructure for a network with exponentially more physical and virtual connections that are constantly in flux? Mapping out the service impact when specific components, or combinations of components, change becomes nearly impossible. Passive assurance also depends on live customer traffic, so any intervention happens after a problem has already impacted users and live services.
  • Isolated network and service assurance: Traditionally, assurance responsibilities have been split between network assurance (for physical and logical infrastructure) and service-level assurance (for applications). That won’t work when operators must continuously monitor 5G slices for stringent SLA targets. Operators need real-time, end-to-end visibility across all network and application layers, and the ability to test services holistically—prior to activation, once they’re live, and anytime the infrastructure changes. 
  • Sporadic issues: Problems affecting a handful of users, or one or two sites or types of traffic, are notoriously difficult to diagnose. In 5G environments, the problem gets even harder as infrastructure and services move closer to the edge to serve smaller, concentrated groups of users who may be covered by an SLA. Additionally, modern infrastructures are so resilient, and so good at self-healing and adapting around issues, that they can quickly obscure underlying problems. To application-level service assurance systems, everything continues running normally. But in the underlying infrastructure, the network may have lost a path, reducing fault tolerance and redundancy. Left unaddressed, those problems can eventually become single points of failure and turn into massive outages.

Introducing Active Assurance

Today, CSPs are adopting an approach better suited to dynamic 5G infrastructures: active assurance. Active assurance injects synthetic traffic into the network to mimic user behavior, using the same authentication, following the same network paths, and running the same applications as real end-user devices. This gives CSPs a firsthand view of the service experience, even in highly dynamic environments.

Unlike passive monitoring, active assurance doesn’t require a large, dedicated monitoring infrastructure. Instead, it uses cloud-native virtual test agents that can be dynamically allocated and scaled along with continually changing 5G networks.
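At its simplest, a test agent sends its own traffic toward a service endpoint and times the response. The Python sketch below is a minimal stand-in for that idea, assuming the service exposes a reachable TCP port: it times TCP handshakes and records failed attempts as loss. The function name `probe_tcp_connect` and its parameters are illustrative, not part of any vendor’s product, and a production agent would also reproduce the real authentication and application flows described above.

```python
import socket
import time
from typing import List, Optional


def probe_tcp_connect(host: str, port: int, count: int = 5,
                      timeout_s: float = 1.0,
                      interval_s: float = 0.0) -> List[Optional[float]]:
    """Open `count` TCP connections to host:port and time each handshake.

    Returns one entry per attempt: the connect time in milliseconds,
    or None if the attempt failed or timed out (treated as a lost probe).
    """
    results: List[Optional[float]] = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            # create_connection completes the TCP three-way handshake
            # before returning, so the elapsed time covers one round trip.
            with socket.create_connection((host, port), timeout=timeout_s):
                elapsed_ms = (time.perf_counter() - start) * 1000.0
            results.append(elapsed_ms)
        except OSError:
            # Refused, unreachable, or timed out: record as loss.
            results.append(None)
        if interval_s:
            # Optional pacing between probes so the test stays lightweight.
            time.sleep(interval_s)
    return results
```

For example, `probe_tcp_connect("service.example.net", 443, count=10)` would return ten handshake times in milliseconds, with `None` marking each lost probe; `service.example.net` is a placeholder host.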

By running active tests both automatically and on demand for troubleshooting, CSPs can continually test:

  • Data performance (throughput and latency) 
  • Network function performance for specific traffic types and loads
  • Audio quality, simulating actual voice conversations 
  • Performance of any application, from anywhere in the network
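Raw probe results only become useful once they are reduced to the metrics an SLA is written against. The sketch below, again assuming probe results given in milliseconds with `None` for a lost probe, computes mean latency, jitter, and loss, then compares them with a hypothetical `SlaTarget`. Jitter here is simply the mean absolute difference between consecutive samples, a simplification rather than the smoothed interarrival-jitter estimator defined in RFC 3550.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class SlaTarget:
    # Hypothetical per-service targets; real SLAs define their own metrics.
    max_mean_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float


def summarize(samples: List[Optional[float]]) -> Dict[str, float]:
    """Reduce raw probe results (ms, or None for a lost probe) to KPIs."""
    if not samples:
        raise ValueError("no probe samples")
    received = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(received)) / len(samples)
    if not received:
        # Nothing came back: report infinite latency/jitter so every check fails.
        return {"mean_latency_ms": float("inf"),
                "jitter_ms": float("inf"),
                "loss_pct": loss_pct}
    # Simplified jitter: mean absolute change between consecutive samples.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {"mean_latency_ms": mean(received),
            "jitter_ms": mean(diffs) if diffs else 0.0,
            "loss_pct": loss_pct}


def sla_violations(kpis: Dict[str, float], target: SlaTarget) -> List[str]:
    """List which KPIs exceed their SLA thresholds (empty list = compliant)."""
    violations = []
    if kpis["mean_latency_ms"] > target.max_mean_latency_ms:
        violations.append("latency")
    if kpis["jitter_ms"] > target.max_jitter_ms:
        violations.append("jitter")
    if kpis["loss_pct"] > target.max_loss_pct:
        violations.append("loss")
    return violations
```

With samples `[10.0, 12.0, None, 11.0]`, for instance, `summarize` reports 11.0 ms mean latency, 1.5 ms jitter, and 25% loss.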

Active testing can integrate into the Continuous Integration/Continuous Deployment (CI/CD) toolchains operators use to manage software updates in 5G networks. As network engineers spin up a new service, the system can automatically add test profiles for that service into active assurance, as part of the service catalog.

Active Assurance Advantages

Passive monitoring will continue to play a role in the NOC. But by augmenting traditional tools with active testing, CSPs can:

  • Keep pace with dynamic infrastructure: Workloads, paths, and virtual infrastructure components in 5G networks might exist one minute and disappear the next. With active testing, the assurance infrastructure follows the data path wherever it leads.
  • Maintain SLAs in changing networks: When first implementing active assurance, operators will typically program the system with specific thresholds to meet SLA targets. Over time, however, these systems can use artificial intelligence and machine learning (AI/ML) capabilities to baseline what healthy behavior looks like, and then continually adapt as the network changes. The assurance system can continually and autonomously adjust itself to maintain SLA compliance, without human operators having to re-key thresholds.
  • Quickly diagnose issues: Some assurance platforms also use ML capabilities to assist in issue identification and resolution. While this was once a “nice-to-have” capability, it’s essential in highly complex and dynamic 5G networks. For example, operators could never staff enough human engineers to identify degradation as it occurs in an IoT deployment with thousands of actively communicating endpoints. But ML tools continuously performing statistical analyses can. When the system does identify early signs of degradation, it can run diagnostic tests to isolate the source of the issue. This reduces the need for human intervention, accelerates troubleshooting, and in many cases, allows operators to fix problems before they impact customers and SLAs. 
  • Isolate sporadic issues: With the ability to follow any path in the network end to end, active testing is excellent at sub-segmenting the infrastructure and correlating anomalies with higher-layer applications. This capability is essential for supporting SLAs for services like robotic automation, high-speed trading, or low-latency gaming. But it’s also helpful for sporadic issues, where it can isolate underlying problems even as the network adapts around them, and help CSPs ensure that seemingly minor issues don’t become massive outages. 
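Commercial platforms use their own AI/ML models for baselining, and the article does not describe them. As a much simpler statistical stand-in, the sketch below keeps an exponentially weighted moving average (EWMA) of the mean and variance for each network segment, flags a new measurement whose z-score exceeds a threshold, and ranks the flagged segments so the likeliest culprit surfaces first. The segment names, `alpha`, warm-up length, and threshold are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple


class EwmaBaseline:
    """Exponentially weighted running mean and variance of one metric."""

    def __init__(self, alpha: float = 0.1, warmup: int = 20):
        self.alpha = alpha      # weight of the newest sample
        self.warmup = warmup    # samples to collect before scoring
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def score(self, x: float) -> float:
        """z-score of x against the current baseline (0.0 during warm-up)."""
        if self.n < self.warmup or self.var <= 0.0:
            return 0.0
        return (x - self.mean) / math.sqrt(self.var)

    def update(self, x: float) -> None:
        """Fold x into the baseline using the incremental EWMA update."""
        if self.n == 0:
            self.mean = x
        else:
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        self.n += 1


def localize(baselines: Dict[str, EwmaBaseline],
             sample: Dict[str, float],
             z_threshold: float = 4.0) -> List[Tuple[str, float]]:
    """Score one round of per-segment latencies, then update the baselines.

    Only upward deviations (latency getting worse) are flagged.
    Returns (segment, z-score) pairs above the threshold, worst first.
    """
    flagged = []
    for segment, value in sample.items():
        base = baselines.setdefault(segment, EwmaBaseline())
        z = base.score(value)
        if z > z_threshold:
            flagged.append((segment, z))
        base.update(value)
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Because every sample is folded back into its baseline, a lasting change in a segment’s behavior gradually becomes the new normal, while a sudden spike is still flagged the first time it appears.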

Getting Back to Basics

The buzz around 5G isn’t hype. Emerging telco networks promise to change the world in ways that the limitations of earlier network technologies made impossible. With the ability to customize services for new levels of capacity, performance, and reliability, 5G networks can fuel a new generation of transformative innovations. But it’s not a foregone conclusion that they will.

Before CSPs can deliver on futuristic use cases, they’ll need to make sure they’re covering the basics—like being able to verify that services are performing as expected and reliably detecting problems as they arise. With active testing, they can build assurance that’s as dynamic and flexible as 5G networks.

Click here to learn more about Spirent Communications on everything RF.

Contributed by: