What is an autonomously managed network and how does it differ from SDN? Huawei, together with the consulting company IDC, studied the criteria for evaluating the network infrastructure in terms of its ability to maintain its own operation without the help of an administrator.
What do customers want from a data center network infrastructure? It should, of course, be efficient, reliable, and easy to maintain. It would be wonderful if the network configured and maintained itself. Modern SDN controllers can do more and more, but how can their level of automation be assessed, and how should this autonomy be classified?
To answer these questions, we turned to the consulting company IDC and asked them to conduct a study whose results would help characterize the autonomous management of a particular network and evaluate the effectiveness of such an implementation. Colleagues from IDC responded to our proposal and came to some interesting conclusions.
We should start with the context, namely the total digitalization that is sweeping the world. It requires modernization of both infrastructure and workflows. And the driving force behind this transformation is cloud technology.
Meanwhile, the cloud should not be viewed simply as a place to perform workloads. It is also a specific approach to work that involves a high level of automation. According to IDC analysts, we are entering an "era of multiple innovations." Companies are investing in technologies such as artificial intelligence, the Internet of Things, blockchain, and natural interfaces. But the ultimate goal is precisely the autonomy of systems and infrastructures. This is the context in which the prospects for data center networks should be evaluated.
The diagram shows the process of network automation, which is divided into several successive steps. It starts with the command line interface and the creation of scripts. The next step is the emergence of network fabrics to improve speed and performance. Next come SDN controllers and virtualization tools. This stage also introduces tools for orchestration and automation of data center networks.
A qualitatively new level is the move to intent-based networking. But the goal of this progress is to create a fully autonomous network controlled by artificial intelligence. All market participants are considering this goal in one way or another.
What is network autonomy and how do we measure it? IDC has proposed a six-level model that allows you to accurately attribute a particular solution to one or another level of autonomy.
- Level 0. At this stage, the network is managed only through manual processes throughout the life cycle of the network. The network is not automated.
- Level 1. Network management is still predominantly manual throughout the network lifecycle.
- Level 2. Partial automation appears in some scenarios and is combined with standard policy analysis and management tools.
- Level 3. "Conditional Automation." The system already knows how to give recommendations and instructions, accepted or rejected by the operator.
- Level 4. The network is largely automated and autonomous. It is controlled by declarative methods based on intent. The operator only receives event notifications and makes decisions about accepting or rejecting network recommendations.
- Level 5. The network is fully automated and autonomous throughout its lifecycle. It is capable of independently enforcing policies, troubleshooting, and restoring services.
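To make the scale easier to reason about, the six levels above can be written down as an ordered enumeration. This is only a minimal sketch: the member names and the `operator_role` helper are illustrative labels chosen here, not IDC terminology, and the one-line summaries paraphrase the list above.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """The six levels listed above; member names are hypothetical labels."""
    L0_MANUAL = 0
    L1_MOSTLY_MANUAL = 1
    L2_PARTIAL = 2
    L3_CONDITIONAL = 3
    L4_HIGH = 4
    L5_FULL = 5


def operator_role(level: AutonomyLevel) -> str:
    """Summarize what the operator still does at a given level."""
    if level <= AutonomyLevel.L1_MOSTLY_MANUAL:
        return "performs most or all tasks manually"
    if level == AutonomyLevel.L2_PARTIAL:
        return "uses automation in selected scenarios"
    if level == AutonomyLevel.L3_CONDITIONAL:
        return "accepts or rejects system recommendations"
    if level == AutonomyLevel.L4_HIGH:
        return "states intent, receives notifications, reviews recommendations"
    return "no routine involvement"


if __name__ == "__main__":
    for level in AutonomyLevel:
        print(int(level), operator_role(level))
```

Because the levels are ordered, two solutions can be compared directly, for example `AutonomyLevel.L5_FULL > AutonomyLevel.L3_CONDITIONAL`.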
So what are the main challenges faced by a company innovating in data center networks? According to IDC surveys of IT experts, the top two are aligning the level of network automation with that of compute and storage automation, and ensuring flexibility, that is, the ability of the network to support mixed workloads and environments.
In third place is the problem of automating the network infrastructure, which, as is most often the case, is assembled from the products of different vendors. Here you need a management tool that can bring together the whole "zoo" of solutions and make it work in accordance with the required level of autonomy. Yet 90% of those surveyed agree that autonomy is a goal for their organization.
IDC research shows that autonomous network management is a hot trend that in one way or another concerns up to half of all companies developing their IT infrastructure.
Let’s look at financial services companies as an example of digital transformation. Offline sales have declined radically over the past year, and financial institutions have been among the first to react.
Companies quickly shifted a significant portion of their activity to apps by organizing digital sales there. This made it possible to compensate for the drop in the offline channel in a short period of time and maintain revenue. At the same time, automation made it possible to minimize the level of mistakes made by company employees and noticeably accelerate a significant part of business processes.
At the same time, innovations in customer service entailed increasing the complexity of the IT infrastructure and the frequency of changes made to it. Up to 50 percent of the complex problems now being reported in data centers are in one way or another due to limitations of both the network resources themselves and the resources of the administration team.
Employees spend most of their time performing routine operations, although the workload associated with the implementation of new services is constantly growing. They require testing, checking for interoperability with other services, etc. Any implementation carries the risk of destroying what is already working. As a result, staff gets overwhelmed.
Perhaps this explains the following figure: up to 40% of complex problems in data centers are caused by human error. Any changes in the network, such as the launch of new applications or the deployment of services, require a lot of attention and multiple checks, for which there is not always enough time. The result can be a serious accident in the data center.
How long does it take to solve a problem? Our data shows that on average it takes almost 80 minutes just to detect a problem. And these faults are not always related to physical devices. They can be at the level of protocols, service availability, etc.
The bottom line is that network support works day and night, but is still a target for numerous complaints. There would be no basis for many of them if the data center network gained some autonomy.
Let’s go back to IDC’s classification of autonomy levels. Here is a list of the capabilities that the network must exhibit at each of these levels. The Huawei Autonomous Driving Network (ADN) solution meets all the requirements of Level 3. It can maintain its operation fully automatically, including starting and stopping processes, configuring equipment, and so on. In addition, our ADN is fully compliant with the awareness criterion, getting real-time information about the status of devices, processes, applications and services.
In partially automatic mode, ADN is able to analyze what is happening in the network, identifying the causes of events and offering recommendations for fixing them. By 2023, we plan to add a feedback function to ADN’s capabilities.
The management system will learn how to deal with problems in the network using practices that have proven effective in other similar infrastructures, including those owned by other companies.
According to our roadmap, by 2028 we will have a system fully compliant with Level 5 autonomy.
So what will be the effect of implementing autonomous network management? Let’s start with the network design. When using Huawei Autonomous Driving Network, the customer does not need to manually create the architecture or design and configure the devices. The system only asks for the number of devices and links of a certain bandwidth to be used. It then automatically assembles the network infrastructure and offers it as a complete solution. The customer immediately gets a fully operational data center fabric.
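As a rough illustration of designing from a device count and a link bandwidth, the sketch below generates the link list for a fabric. It assumes a two-tier leaf-spine topology in which every leaf connects to every spine; the text above does not say which topology ADN builds, and `build_fabric` and the device names are hypothetical, not part of any Huawei API.

```python
from itertools import product
from typing import List, Tuple


def build_fabric(spines: int, leaves: int, link_gbps: int) -> List[Tuple[str, str, int]]:
    """Return (spine, leaf, bandwidth) links for a full-mesh leaf-spine fabric.

    Assumption: leaf-spine with one link per spine/leaf pair.
    """
    if spines < 1 or leaves < 1 or link_gbps <= 0:
        raise ValueError("device counts and bandwidth must be positive")
    spine_names = [f"spine{i}" for i in range(1, spines + 1)]
    leaf_names = [f"leaf{i}" for i in range(1, leaves + 1)]
    # Every spine is paired with every leaf.
    return [(s, l, link_gbps) for s, l in product(spine_names, leaf_names)]


if __name__ == "__main__":
    for link in build_fabric(spines=2, leaves=4, link_gbps=100):
        print(link)
```

A real design step would also cover addressing, routing and device configuration; the sketch shows only how little input is needed to derive the physical layout.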
But it’s not enough to get a network infrastructure. It needs to keep virtual machines, applications and other processes running, each of which has its own bandwidth requirements. An autonomous network can analyze the load and make recommendations on the optimal organization of information flows.
During operation, ADN continuously checks the traffic flow by, among other things, detecting the mutual influence of different services on each other. This allows the quality of the network to be improved in real-time by removing the bottlenecks that occur.
Optimization is carried out continuously. If the system detects service degradation, it immediately informs the operator, who only needs to make a predetermined decision. If, for instance, ADN notices the degradation of an optical module, it counts the number of affected processes and offers to activate a reserve channel.
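The optical module example can be sketched as a small recommendation function. This is a simplified illustration, not ADN's actual logic: the `Link` structure, the use of received optical power as the health metric, and the -12 dBm threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    name: str
    rx_power_dbm: float                 # assumed health metric for the optical module
    flows: List[str] = field(default_factory=list)
    backup: Optional[str] = None        # reserve channel name, if one exists


def recommend(link: Link, min_rx_power_dbm: float = -12.0) -> Optional[dict]:
    """Return a recommendation for the operator, or None if the link is healthy."""
    if link.rx_power_dbm >= min_rx_power_dbm:
        return None
    action = f"activate reserve channel {link.backup}" if link.backup else None
    return {
        "link": link.name,
        "affected": len(link.flows),    # number of affected processes
        "action": action,               # the operator accepts or rejects this
    }


if __name__ == "__main__":
    degraded = Link("eth1", -15.0, ["db-sync", "web", "backup-job"], backup="ch-2")
    print(recommend(degraded))
```

The function only proposes an action; applying it remains the operator's decision, which matches the conditional automation described for Level 3.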
All of the above features enable ADN to play a critical role in saving time for the network support staff by freeing them up for higher-level tasks.
The strength of the Huawei Autonomous Driving Network is that it is not just software that can be installed and maintained. The system implements a three-tiered model, with the base layer already located at the processor level of the switching and routing end devices. These hardware and software elements perform tasks of data collection and analysis, as well as switching of streams and frames. Equipped with such a processor, the switch transmits real-time information to the software platform, which in our case is iMaster NCE.
It is the architecture of our ADN that sets it apart from other comparable products. Integration with hardware elements allows for unique depth of analysis, making it possible to implement processes of automatic network design setup, network device installation, etc. For example, you can create a "virtual twin" of the application and verify the service in the existing infrastructure. The result will be a detailed report that includes a list of potential trouble spots.
It remains to be noted that ADN is a service-oriented solution, making extensive use of cloud capabilities. We mentioned above that at the fifth level of autonomy, the network must be able to use troubleshooting algorithms shaped by the experience of other customers and industry experts. It is from the cloud that ADN will soon learn to get solutions for certain network problems identified based on signatures.
The approaches used to create ADN bring to mind once again our 1-3-5 principle: any problem on the network should be detected in one minute, localized in three minutes, and fixed in five minutes.
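The 1-3-5 principle lends itself to a simple check against incident records. The sketch below assumes that each target applies to the duration of its own phase rather than to cumulative time since the fault began; the principle as stated above does not say which, and the `Incident` structure is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

# Targets in minutes, taken from the 1-3-5 principle.
TARGETS = {"detect": 1.0, "localize": 3.0, "fix": 5.0}


@dataclass
class Incident:
    detect_min: float     # time to detect the problem
    localize_min: float   # time to localize it after detection
    fix_min: float        # time to fix it after localization


def check_1_3_5(incident: Incident) -> Dict[str, bool]:
    """Report, per phase, whether the incident met its target."""
    durations = {
        "detect": incident.detect_min,
        "localize": incident.localize_min,
        "fix": incident.fix_min,
    }
    return {phase: durations[phase] <= limit for phase, limit in TARGETS.items()}


if __name__ == "__main__":
    # Detection time of 80 minutes mirrors the average quoted earlier in the article.
    print(check_1_3_5(Incident(detect_min=80, localize_min=2, fix_min=4)))
```

Run against the roughly 80-minute average detection time mentioned earlier, the detection phase alone misses its target by a wide margin, which is the gap the principle is meant to close.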
To summarize: ADN is, of course, the successor to the solutions laid down in SDN. SDN was a necessary stage in the development of the technology, but it had some shortcomings. First, the use of software-defined networks meant manual initial configuration of devices. Second, error detection also fell on the shoulders of network support specialists. Third, with SDN there was no question of automatically applying recovery scenarios derived from a cloud knowledge base. In creating its ADN solution, Huawei wanted to free its customers from these tasks so that they can focus on what really needs attention.