Over the last few IETF meetings, the “YANG-Push & Kafka integration” initiative has kept growing, with packed-room side meetings on the subject. In Nov 2022, I started explaining the vision behind this ambitious project in the “YANG Push + Apache Kafka + Semantic = Network Visibility for Analytics” blog post. As this initiative gains influence, visibility, and implementation experience, this new blog post explains the latest developments and how the numerous IETF drafts created along the way relate to each other. Without looking at the big picture, some of the IETF drafts mentioned below might, at first glance, appear completely independent. However, each of them solves a particular problem in the YANG-Push with Kafka approach.
Let’s start with some background factual information:
- The networking industry has been evolving towards the Data Mesh principles (Data Mesh Principles in the Networking World), recognizing the importance of preserving data semantics along the toolchain, from collection up to analytics.
- Apache Kafka, the distributed event streaming platform for high-performance data pipelines, streaming analytics, and data integration, is used in many, if not most, operator environments these days.
- YANG [RFC 7950] has become the data modeling language for network configuration and monitoring. If you are not convinced, look at the numerous YANG module implementations documented in the YANG catalog.
- YANG-based telemetry is required for closed-loop automation. My four-year-old “IETF YANG-push or Openconfig streaming telemetry is the best” blog post debated the two solutions, and I’ll soon blog about why, in my opinion, YANG-Push is the superior solution.
- The next big networking challenge is “self-managing networks”. While we know the objectives, THE challenge behind “self-managing networks” is to realize this vision. Have a look at the service assurance for intent-based networking work (concluded with two IETF RFCs), and follow the Digital Map modelling work (in progress).
Now, let’s review the network analytics architecture (initiated by Thomas Graf), which integrates the YANG-Push exporters, a data collector, a YANG schema registry, Apache Kafka itself, and a Time Series Database (TSDB).
Let me cut & paste the problem statement from my previous blog post: Typically, we stream model-driven telemetry from routers to a YANG-Push receiver. From there, the data is forwarded through a message broker (Apache Kafka being the de facto standard in most operator environments) to a time series database for storage, where analytics tools can query and make sense of all the data. In other words, extracting information from data. What is the issue today? At the end of the data pipeline, where the data scientists look at the data, the data semantics are most of the time lost. Yes, they could ask the network architects some questions (assuming they know which questions to ask, which is already difficult), but the end goal is to rely on automation. We have spent a great deal of time specifying the semantics of YANG objects, with specific data types: how can we preserve these semantics along the data pipeline?
In step 1 (from the picture), where YANG-Push [RFC 8641] notifications are streamed from the router to the data collector, a couple of improvements have already been proposed:
- draft-ietf-netconf-udp-notif: specifies a UDP-based transport for YANG-Push notifications, with the objective of providing a lightweight approach that enables higher export frequencies with less performance impact on the publisher and receiver processes than the already established notification mechanisms. Note that the draft-ietf-netconf-udp-client-server draft specifies two YANG 1.1 modules to support the configuration of UDP clients and UDP servers, either standalone or in conjunction with the configuration of other protocol layers.
- draft-ietf-netconf-distributed-notif: specifies extensions to YANG notification subscriptions that allow metrics to be published directly from the processors on the line cards to the data collector, while the subscription is still maintained on the route processor of a distributed forwarding system. Combined with the previous draft, this behavior is more aligned with the IPFIX way of streaming records over UDP directly from the line cards.
- draft-ahuang-netconf-notif-yang: while developing a prototype, Alex Huang realized that the structure of NETCONF notifications is defined in [RFC 5277] using an XSD, but that no YANG module defines the structure of the notification message sent by a server when the message is encoded in YANG-JSON [RFC 7951] or YANG-CBOR [RFC 9254]. Obviously, the architecture should support JSON and, in the future, CBOR (for its binary encoding efficiency), hence this quick and obvious draft.
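To make step 1 a bit more concrete, here is a minimal Python sketch of what a receiver does with a UDP-notif datagram: split off the fixed header, then read the eventTime from a YANG-JSON payload. The 12-byte header layout (version, S flag, media type, header length, message length, observation domain ID, message ID) and the media-type codes are my reading of recent revisions of draft-ietf-netconf-udp-notif, and they may still change while it is a draft; header options (such as segmentation) are skipped, not interpreted. The envelope key defaults to ietf-restconf:notification, the JSON wrapper RFC 8040 uses for RESTCONF notifications; a module defined by draft-ahuang-netconf-notif-yang may name it differently, so treat that key as an assumption.

```python
import json
import struct
from dataclasses import dataclass

# Media-type codes as read from recent draft revisions (assumption, may change).
MEDIA_TYPES = {1: "yang-json", 2: "yang-xml", 3: "yang-cbor"}

# Fixed header: 1 byte (Ver/S/MT), 1 byte header length, 2 bytes message length,
# 4 bytes observation domain ID, 4 bytes message ID; network byte order, 12 bytes.
FIXED_HEADER = struct.Struct("!BBHII")


@dataclass
class UdpNotifHeader:
    version: int
    private_media_type: bool
    media_type: int
    header_len: int
    message_len: int
    observation_domain_id: int
    message_id: int


def parse_udp_notif(datagram: bytes) -> tuple[UdpNotifHeader, bytes]:
    """Split one UDP-notif datagram into its fixed header and its payload."""
    if len(datagram) < FIXED_HEADER.size:
        raise ValueError("datagram shorter than the 12-byte fixed header")
    first, hdr_len, msg_len, odid, msg_id = FIXED_HEADER.unpack_from(datagram)
    header = UdpNotifHeader(
        version=first >> 5,                      # top 3 bits
        private_media_type=bool(first & 0x10),   # S flag
        media_type=first & 0x0F,                 # low 4 bits
        header_len=hdr_len,
        message_len=msg_len,
        observation_domain_id=odid,
        message_id=msg_id,
    )
    if not FIXED_HEADER.size <= hdr_len <= msg_len <= len(datagram):
        raise ValueError("inconsistent header or message length")
    # Any options sit between byte 12 and header_len; this sketch skips them.
    return header, datagram[hdr_len:msg_len]


def event_time(payload: bytes, envelope_key: str = "ietf-restconf:notification") -> str:
    """Return eventTime from a YANG-JSON notification envelope (media type 1 only)."""
    return json.loads(payload)[envelope_key]["eventTime"]
```

Keeping the header parsing separate from the payload decoding mirrors the split in the drafts themselves: the transport draft only frames the message, while the encoding (JSON today, CBOR later) determines how the envelope is read.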
In step 2, the data collector (in this case, the pmacct collector developed by Paolo Lucente) must be able to decode the YANG-Push message, for which it needs the specific set of YANG modules required to decode the telemetry message. This set includes the YANG module itself, the imported YANG module(s), and the augmenting YANG modules… and recursion obviously plays a role here. A typical NMS would solve this problem by querying all YANG modules from the router, which is a lengthy process, too costly for near real-time telemetry. Doing the query in advance is not practical either, as the data collection system never knows in advance which router vendor/type/OS, or which YANG-Push message, it will receive.
- draft-lincla-netconf-yang-library-augmentation: in her “YANG-Push Integration into Apache Kafka” final thesis, Zhuoyao Lin concluded that we can solve this problem by documenting all YANG module dependencies in a revised ietf-yang-library [RFC 8525] YANG module. This draft proposes augmenting ietf-yang-library with the list of YANG module augmentations, allowing the collector to retrieve only the required YANG modules from the YANG-Push exporter and saving precious minutes when detecting network incidents.
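The recursion mentioned above boils down to a transitive closure over two kinds of edges: “module X imports Y” and “module X is augmented by Z”. The following minimal Python sketch computes that closure with a breadth-first walk. The IMPORTS and AUGMENTED_BY tables are a hypothetical, hand-written snapshot (acme-if-ext is an invented vendor module), not real ietf-yang-library output; AUGMENTED_BY stands for the kind of information the augmentation draft proposes to expose.

```python
from collections import deque
from itertools import chain

# Hypothetical snapshot: what each module imports (learned from module sources).
IMPORTS = {
    "ietf-interfaces": ["ietf-yang-types"],
    "ietf-ip": ["ietf-interfaces", "ietf-inet-types", "ietf-yang-types"],
    "ietf-inet-types": [],
    "ietf-yang-types": [],
    "acme-if-ext": ["ietf-interfaces"],  # hypothetical vendor module
}

# Hypothetical "augmented-by" information: which modules augment a given module.
AUGMENTED_BY = {
    "ietf-interfaces": ["ietf-ip", "acme-if-ext"],
}


def required_modules(root: str, imports: dict, augmented_by: dict) -> set[str]:
    """Breadth-first transitive closure over import and augmented-by edges."""
    needed = {root}
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for dep in chain(imports.get(module, []), augmented_by.get(module, [])):
            if dep not in needed:  # the 'needed' set also guards against cycles
                needed.add(dep)
                queue.append(dep)
    return needed
```

Called with ietf-interfaces as the subscribed module, the sketch returns all five modules. With an empty AUGMENTED_BY table it returns only ietf-interfaces and ietf-yang-types, so a collector would lack the ietf-ip definitions needed to decode the IP address nodes that ietf-ip places under the interfaces subtree.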
In steps 3, 4, 5, and 6 (see the picture), the YANG schema registry ensures that the Kafka producer and consumer agree on the schema and its version for each outbound message before the data is actually sent. On that front, there is also some innovation.
- draft-tgraf-netconf-notif-sequencing: when the NETCONF event notification message is forwarded from the receiver to another system, such as the TSDB, the transport context is lost, since it is not part of the NETCONF event notification message metadata. As a result, the TSDB can neither associate the message with the publishing process (the exporting router) nor detect message loss or reordering. The current workaround is to preserve the exporter’s transport source IP address (assuming it’s THE unique ID we want for the exporter!!!) and sequence numbers, and then encode this information along the toolchain (which impacts the semantic readability of the message, btw). Therefore, this draft specifies an augmentation of the NETCONF event notifications, implemented by the streaming router, with sysName and sequenceNumber. sysName identifies the hostname from which the YANG-Push messages were streamed, while sequenceNumber helps detect loss across the two messaging systems.
- draft-ietf-netconf-yang-notifications-versioning: in live networks, it’s common to have routers running different OS versions, hence supporting different YANG module sets and/or revisions. And we know that semantics can change between YANG module revisions, as discussed in the IETF Non-Backwards-Compatible (NBC) changes and semantic versioning (Semver) work. Therefore, this draft proposes an extension that adds the YANG module revision and semantic version to the YANG-Push subscription state change notifications, directly in the router. That way, the collector learns this information directly from YANG-Push, and the YANG schema registry can encode this important piece of information, solving the issue for the TSDB and analytics, where this information has to be known anyway.
- draft-tgraf-netconf-yang-push-observation-time: exactly as with the previous draft, the YANG schema registry misses one important piece of information: when the YANG objects were observed in the network. This observation timestamp is required to correlate network data among different network telemetry planes (YANG-Push, IPFIX, and BMP) or among different YANG-Push subscription types. With [draft-tgraf-netconf-notif-sequencing], the delay between the YANG notification export and its arrival at the downstream system storing the data can be measured. With the network observation timestamping described in this draft, the delay between the network observation and the data export by the YANG-Push publisher process can be measured as well, extending the delay measurement scope from the network observation all the way to data storage.
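Here is a minimal Python sketch of what a downstream consumer could do once an exporter identity, a sequence number, and an observation timestamp travel with each notification: count gaps and out-of-order arrivals per exporter, and split the end-to-end delay into its observation-to-export and export-to-storage parts. The Notification field names are hypothetical, chosen for illustration, and are not the YANG leaf names defined in the drafts; the sketch also assumes sequence numbers never wrap around.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Notification:
    # Hypothetical field names; the drafts define their own YANG leaves.
    sys_name: str               # exporter identity (sysName), not the source IP
    sequence_number: int        # per-exporter sequence number (assumed no wrap)
    observation_time: datetime  # when the router observed the data
    event_time: datetime        # when the YANG-Push publisher exported it
    stored_time: datetime       # when the downstream system stored it


class PipelineMonitor:
    """Tracks loss/reordering per sysName and splits the end-to-end delay."""

    def __init__(self) -> None:
        self.last_seq: dict[str, int] = {}
        self.lost: dict[str, int] = {}
        self.out_of_order: dict[str, int] = {}

    def observe(self, n: Notification) -> dict[str, timedelta]:
        last = self.last_seq.get(n.sys_name)
        if last is not None and n.sequence_number > last + 1:
            # Gap: the numbers between 'last' and this one are missing so far.
            gap = n.sequence_number - last - 1
            self.lost[n.sys_name] = self.lost.get(n.sys_name, 0) + gap
        elif last is not None and n.sequence_number <= last:
            # Late or duplicate message. It is not subtracted from 'lost' here;
            # a real collector would use a reordering window.
            self.out_of_order[n.sys_name] = self.out_of_order.get(n.sys_name, 0) + 1
        if last is None or n.sequence_number > last:
            self.last_seq[n.sys_name] = n.sequence_number
        return {
            "observation_to_export": n.event_time - n.observation_time,
            "export_to_storage": n.stored_time - n.event_time,
            "observation_to_storage": n.stored_time - n.observation_time,
        }
```

Keying the counters on sysName rather than on the transport source IP reflects the motivation of the sequencing draft: the exporter identity survives being forwarded through Kafka instead of having to be re-encoded along the toolchain.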
A couple of points regarding the open-source implementation, which is organized per draft and component:
- For draft-ietf-netconf-udp-notif, there is a UDP-notif mock generator and a UDP-notif collector.
- The Confluent Schema Registry has been extended to accept YANG modules, but this is still work in progress.
- libyangpush, an external library for pmacct, enables the communication between collector and schema registry.
- A demo of libyangpush was presented at the IETF 117 Hackathon. It showed a working schema registry collaborating with libyangpush to process YANG-Push messages and create the corresponding schemas.
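For readers less familiar with schema registries, here is a minimal Python sketch of the handshake behind steps 3 to 6: the producer registers a schema once, gets back an ID, and prefixes every Kafka message with that ID, so the consumer can fetch the matching schema. The framing (a zero magic byte followed by a 4-byte big-endian schema ID) is the wire format Confluent’s serializers use for Avro and JSON Schema; whether the YANG extension of the registry and libyangpush frame messages the same way is an assumption here. ToyRegistry is a hypothetical in-memory stand-in for the registry’s REST API.

```python
import struct

MAGIC_BYTE = 0                  # Confluent wire-format magic byte
PREFIX = struct.Struct(">BI")   # magic byte + 4-byte big-endian schema ID


class ToyRegistry:
    """Hypothetical in-memory stand-in for a schema registry."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._schemas: dict[int, str] = {}

    def register(self, schema_text: str) -> int:
        # Same schema text -> same ID, so repeated registration is harmless.
        if schema_text not in self._ids:
            new_id = len(self._ids) + 1
            self._ids[schema_text] = new_id
            self._schemas[new_id] = schema_text
        return self._ids[schema_text]

    def lookup(self, schema_id: int) -> str:
        return self._schemas[schema_id]


def frame(schema_id: int, payload: bytes) -> bytes:
    """Producer side: prefix the serialized message with its schema ID."""
    return PREFIX.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Consumer side: recover the schema ID and the serialized message."""
    if len(message) < PREFIX.size or message[0] != MAGIC_BYTE:
        raise ValueError("not in Confluent wire format")
    _, schema_id = PREFIX.unpack_from(message)
    return schema_id, message[PREFIX.size:]
```

Returning the existing ID for an already-known schema roughly mirrors how Confluent’s registry behaves when the same schema is registered again under a subject; it is also what lets the consumer resolve every message to exactly one schema version, which is where the versioning and observation-time information discussed above would be captured.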
One more important piece of information: the brand-new IETF Network Management Operations (NMOP) Working Group (WG).
In a nutshell, this is an operator-driven WG focusing on existing and anticipated operational issues arising from the near-term deployment of network management technologies, and considering potential solutions or workarounds for those issues. The current topics of focus for the working group are:
1. NETCONF/YANG Push integration with Apache Kafka & time series databases
2. Anomaly detection and incident management
3. Issues related to deployment/usage of YANG topology modules (e.g., to model a Digital Map)
4. Consider/plan an approach for updating RFC 3535-bis (collecting updated operator requirements for IETF network management solutions).
For the first WG topic in particular, there is now an official IETF venue: this new Working Group. I invite you to subscribe to the NMOP mailing list, and to participate on the list or at the IETF hackathon.
The partners in crime (sorry in advance if I forgot some: there are way more people in this extended team): Thomas Graf, Marco Tollini, Ahmed Elhassany, Wanting Du, Leonardo Rodoni, Paolo Lucente, Camilo Cardona, Pierre Francois, Alex Huang Feng, Maxence Younsi, Olga Havel, Jean Quilbeuf, Zhuoyao LIN