Description of problem: Need non-packet counters like pmd stats to be published via ovsdb
Aaron, I'm reviewing this BZ. Can you explain in a bit more detail why you want these in the DB and how often they should be updated? I'm asking because we have other BZs where people complain about inconsistencies between the "real-time" stats and the data in the DB. Also, what kind of data do you want in there: just PMD stats (I guess the output of 'ovs-appctl dpif-netdev/pmd-perf-show', which includes a timestamp and duration)? Anything else?
The initial RFE filed by Franck Baudin is specific to OVS-DPDK. Telco customers need stats about PMD threads to properly balance workloads across OSP compute nodes. Back then, they had to call ovs-appctl and parse its textual output to retrieve those stats for OVS-DPDK. "This prevents monitoring systems like collectd to publish those statistics, as collectd is extracting ovsdb stats. So if you want a meaningful OVS-DPDK metric collection, you need those stats. But so far, customers have not even tried to set-up such a monitoring chain." (Franck, 2023-08-30).

With Franck's move to OCP, the current contacts for this RFE are Christophe Fontaine <cfontain> and Robin Jarry <rjarry>.

Their current discussions in this context revolve around
(a) how to expose additional stats related to DPDK and OVS-DPDK via the DPDK telemetry socket,
(b) how to expose stats about OVN's logical ports (not OVS or OVS-DPDK metrics) from ovn-controller,
(c) how to export stats to Prometheus for use in RHOSP's Service Telemetry Framework (STF).

The socket discussion from (a) is closely related to this RFE. Exposing stats via OVSDB has downsides:
(I) Stats would always be pushed to OVSDB, regardless of whether they are actually needed.
(II) Stats would be pushed at predefined intervals, which might not match the granularity the user needs.
(III) OSP's current architecture with OVN/OVS already has scalability issues, because every state change, such as a VM powering on or off, causes updates to OVN's southbound DB. Frequently updating it with additional stats could worsen this bottleneck. (A workaround under discussion is adding a third ovsdb instance, besides OVN's NB DB and SB DB, solely for aggregating stats.)

In contrast, a socket-based approach like the DPDK telemetry socket does not suffer from these issues. Robin shared an example script which shows how to use the DPDK telemetry socket, i.e. query its JSON API to export live traffic stats for DPDK:

https://git.sr.ht/~rjarry/dotfiles/tree/main/item/bin/dpdk-port-stats.py

He also shared a dummy exporter which shows how to export data from DPDK's telemetry socket to Prometheus, e.g. for future use in STF:

https://github.com/rjarry/dpdk/blob/main/usertools/prometheus-dpdk-exporter.py

However, "a caveat about hardware stats is that very few DPDK drivers expose per RXQ statistics. You can only rely on software statistics (as OVS does for auto rebalancing) to have per RXQ stats" (Robin, 2023-08-30). Moreover, patches exposing OVS-DPDK specific stats through the DPDK telemetry socket could meet resistance from the DPDK community if they do not apply to DPDK in general.

A generic OVS telemetry socket would allow us to expose, for example, all stats related to netdevs etc. in a machine-readable way, similar to what "ovs-appctl dpif-netdev/pmd-perf-show" does for humans.

How and where to implement a similar socket for OVS(-DPDK) will require an upstream discussion with the OVS community. Questions in this context would be:
* Is there community interest in machine-readable stats?
* Who are potential users of OVS or OVS-DPDK stats?
* How do we want to implement a socket for exposing non-DPDK-specific OVS stats?
* Which stats get exposed through the socket?
* What would the API look like? A JSON API or a binary API such as Cap'n Proto or Protocol Buffers?
* What would be exposed through the DPDK telemetry socket, and what through the generic OVS socket?
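For readers unfamiliar with the DPDK telemetry socket, here is a minimal Python sketch of the query-and-export flow described above; it is not taken from Robin's scripts. It assumes the telemetry v2 protocol (a SOCK_SEQPACKET UNIX socket that first sends an info message containing max_output_len), the default socket path /var/run/dpdk/rte/dpdk_telemetry.v2, and that the "/ethdev/stats" reply is a flat object whose per-queue counters are lists. The dpdk_* metric names are hypothetical. Per Robin's caveat, the per-queue (q_*) values may be zero on drivers that do not expose per-RXQ statistics.

```python
import json
import socket

# Assumed default path of the DPDK telemetry v2 socket for the default runtime
# directory and file prefix ("rte"); adjust for other --file-prefix values.
TELEMETRY_SOCKET = "/var/run/dpdk/rte/dpdk_telemetry.v2"


def query_telemetry(command, path=TELEMETRY_SOCKET):
    """Send one telemetry command (e.g. "/ethdev/stats,0") and return the parsed reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET) as sock:
        sock.connect(path)
        # On connect, the server first sends an info message that includes
        # the maximum size of the replies it will send.
        info = json.loads(sock.recv(1024).decode())
        sock.send(command.encode())
        return json.loads(sock.recv(info["max_output_len"]).decode())


def ethdev_stats_to_prometheus(port_id, reply):
    """Turn an assumed "/ethdev/stats" reply into Prometheus text-format sample lines."""
    lines = []
    for name, value in sorted(reply["/ethdev/stats"].items()):
        if isinstance(value, list):
            # Per-queue counters (q_ipackets, ...) arrive as one list entry per queue.
            for queue, count in enumerate(value):
                lines.append(f'dpdk_{name}{{port="{port_id}",queue="{queue}"}} {count}')
        else:
            lines.append(f'dpdk_{name}{{port="{port_id}"}} {value}')
    return lines
```

On a host running a DPDK application, print("\n".join(ethdev_stats_to_prometheus(0, query_telemetry("/ethdev/stats,0")))) would print one sample line per counter of port 0. Keeping the socket I/O separate from the conversion makes the conversion testable without a running DPDK process.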
(In reply to Jakob Meng from comment #3)
> A generic OVS telemetry socket would allow us to expose, for example, all
> stats related to netdevs etc. in a machine-readable way, similar to what
> "ovs-appctl dpif-netdev/pmd-perf-show" does for humans.

I do not think OVS needs another socket for statistics, we already have a defined API, which we should use. The Python library has some good examples of its use, https://github.com/openvswitch/ovs/tree/master/python/ovs.
If I remember correctly, Adrian was looking at making some of the existing outputs machine-readable; maybe we can extend this to statistics? From the RFE list:

│ 1916259 │ e--- │ P:--,PM:0 │ amorenoz │ openvswitch2.13 │ ASSIGNED │ [RFE] Output ofproto/trace in machine-readable format │
│ 1922269 │ e--- │ P:--,PM:0 │ amorenoz │ openvswitch2.13 │ ASSIGNED │ [RFE] Print flows in machine-readable format │

Adding yet another API for communicating with OVS has been tried by others and was not well received, because it means maintaining multiple API implementations.
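Assuming the "defined API" here means the unixctl JSON-RPC interface that ovs-appctl itself talks to, a raw client needs only the standard library. The sketch below is a hypothetical helper, not part of OVS; the ovs.unixctl.client module in the Python library linked above is likely the better choice in practice. The socket path is an assumption (typically <rundir>/ovs-vswitchd.<pid>.ctl). Note that the result is the same human-readable text ovs-appctl prints, so this removes the CLI wrapper but not the text parsing.

```python
import json
import socket


def connect_unixctl(path):
    """Connect to a daemon's unixctl socket (path such as <rundir>/ovs-vswitchd.<pid>.ctl is assumed)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock


def unixctl_call(sock, method, params, req_id=0):
    """Send one JSON-RPC request over a connected stream socket and return its result."""
    request = {"method": method, "params": params, "id": req_id}
    sock.sendall(json.dumps(request).encode())
    decoder = json.JSONDecoder()
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("socket closed before a complete reply arrived")
        buf += chunk
        try:
            # Don't assume newline framing: keep reading until one complete
            # JSON object can be decoded from the accumulated bytes.
            reply, _ = decoder.raw_decode(buf.decode())
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue
        if reply.get("error") is not None:
            raise RuntimeError(reply["error"])
        return reply["result"]
```

Usage, with a hypothetical PID: unixctl_call(connect_unixctl("/var/run/openvswitch/ovs-vswitchd.1234.ctl"), "dpif-netdev/pmd-perf-show", []) returns the command's text output as a string.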
(In reply to Aaron Conole from comment #0)
> Description of problem:
>
> Need non-packet counters like pmd stats to be published via ovsdb

I think the description of the problem leans towards a possible solution. Considering we already have these PMD stats, IMHO the problem is: "We need more statistics (like PMD stats) in a machine-readable format."

Two alternatives have been suggested: exposing them in OVSDB and using the JSON-RPC UNIX socket. While OVSDB makes sense for stats that are tightly related to existing OVSDB objects (like Interfaces), that might not be true for things like PMD threads or dpif-specific data such as revalidation, upcalls, etc. Also, the update interval is configurable, but this setting is global; we would have to split it.

Supporting JSON through the JSON-RPC UNIX socket would have the benefit of granularity and flexibility. However, having two interfaces for statistics might be confusing, so if this option is chosen, we could consider moving some or all current statistics to this interface.
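For comparison with the OVSDB alternative: per-interface counters are already published in the statistics column of the Interface table, refreshed according to the single, global other_config:stats-update-interval setting in the Open_vSwitch table. The sketch below is a hypothetical helper assuming the RFC 7047 wire format; it builds a "transact" request selecting those columns and decodes the ["map", [[key, value], ...]] encoding in the reply. Sending the request to ovsdb-server (for example over its db.sock UNIX socket, path assumed) is omitted. PMD threads have no table of their own, which is the gap described above.

```python
import json


def interface_stats_request(req_id=0):
    """Build an RFC 7047 "transact" request selecting Interface names and statistics."""
    op = {"op": "select", "table": "Interface", "where": [],  # empty "where" matches all rows
          "columns": ["name", "statistics"]}
    return json.dumps({"method": "transact", "params": ["Open_vSwitch", op], "id": req_id})


def decode_map(value):
    """Convert an OVSDB map ["map", [[k, v], ...]] into a Python dict."""
    tag, pairs = value
    if tag != "map":
        raise ValueError(f"expected an OVSDB map, got {tag!r}")
    return {k: v for k, v in pairs}


def interface_stats_from_reply(reply):
    """Return {interface_name: {counter: value}} from the reply to the request above."""
    rows = reply["result"][0]["rows"]
    return {row["name"]: decode_map(row["statistics"]) for row in rows}
```

Note that nothing in the reply says when the counters were sampled; the consumer only learns that they are at most one update interval old.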
(In reply to Eelco Chaudron from comment #4)
> I do not think OVS needs another socket for statistics, we already have a
> defined API, which we should use. The Python library has some good examples
> of its use, https://github.com/openvswitch/ovs/tree/master/python/ovs.

Eelco, are you referring to the JSON-RPC API here?
We discussed various approaches to retrieving statistics from OVS and OVS-DPDK, in particular a UNIX socket, OVSDB queries and JSON output from the ovs-xxx tools, with Eelco Chaudron, Ilya Maximets, Kevin Traynor and Robin Jarry.

Robin favored the UNIX socket approach over OVSDB because it would allow retrieving a specific set of metrics in real time. Ilya countered that OVSDB is already used to store statistics and that they are kept in memory; only configuration options are written to disk. He argued that users should change their queries to not ask for raw counters and instead leave the computation of aggregated metrics to OVS, which Robin opposed as inflexible. The UNIX socket approach was rejected by Eelco and Ilya because the existing JSON-RPC based API, which the other ovs-xxx utilities consume, should be used instead. Querying OVSDB was opposed by Robin because its data could be up to five seconds old (depending on the configured update interval), while he needs reliably timed metrics. Currently, OVSDB does not report a timestamp, so the age of the statistics is unknown. However, a timestamp could easily be added to the OVSDB schema.

Initially, Ilya rejected the idea of providing (machine-readable) JSON output instead of (human-readable) plain text in the ovs-xxx tools, because he was afraid that a user-facing JSON API would require OVS developers to define a stable API. Robin reassured Ilya that the JSON output would not have to be frozen, because in the worst case, breaking changes would only break telemetry. Any machine-readable API is better than parsing plain text, even if the output is not guaranteed to be stable, not even across minor releases. Without the need to commit to a stable API for JSON output, Ilya agreed to follow this approach.

We agreed to develop a proof of concept showing how JSON output could be implemented for the ovs-xxx tools. As examples, we would start with the commands 'ovs-appctl dpctl/show -s' and 'ovs-appctl dpif-netdev/pmd-perf-show'.
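To make the timestamp argument concrete: a rate computed from two OVSDB reads is only accurate if the consumer knows when OVS actually sampled each counter. The sketch below assumes a hypothetical producer-side timestamp field (OVSDB has none today). With a constant 10,000 packets/s, a 5 s refresh interval and counters starting at 0 at t=0, polls at t=4.9 s and t=10.1 s see values written at t=0 and t=10, so dividing by the poll interval (5.2 s) yields about 19,231 packets/s instead of 10,000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One counter reading; `timestamp` (seconds) is a hypothetical producer-side field."""
    timestamp: float
    value: int


def rate_per_second(older, newer):
    """Per-second rate between two readings of a monotonically increasing counter."""
    elapsed = newer.timestamp - older.timestamp
    if elapsed <= 0:
        raise ValueError("samples must have strictly increasing timestamps")
    return (newer.value - older.value) / elapsed


# Producer timestamps (when OVS wrote the values) give the true rate.
true_rate = rate_per_second(Sample(0.0, 0), Sample(10.0, 100_000))
# Consumer poll times for the same two values overestimate it.
skewed_rate = rate_per_second(Sample(4.9, 0), Sample(10.1, 100_000))
```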
A first proof of concept for ovs-appctl with JSON output has been proposed:
https://patchwork.ozlabs.org/project/openvswitch/patch/20231012100425.14467-1-jmeng@redhat.com/
It uses 'ovs-appctl dpif/show' as an example instead of 'ovs-appctl dpctl/show -s' and 'ovs-appctl dpif-netdev/pmd-perf-show', because 'dpif/show' is much shorter and easier to implement and test.
A second proof of concept for ovs-appctl with JSON output, following a different approach, has been proposed:
https://patchwork.ozlabs.org/project/openvswitch/patch/20231020093954.1410995-1-jmeng@redhat.com/
Compared to the previous POC, this patch adds a global option for JSON output instead of implementing it for each command separately.
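The difference between the two POCs can be illustrated with a toy dispatcher, written in Python rather than OVS's C, in which the output format is parsed once as a global option and each command only declares whether it can produce JSON. This is a hypothetical sketch of the pattern, not the code in either patch: register, run_command, the supports_json flag and the JSON layout of the toy dpif/show are all made up for illustration.

```python
import json

# Hypothetical registry: command name -> (handler, supports_json).
COMMANDS = {}


def register(name, supports_json=False):
    """Decorator that records a command handler and whether it can emit JSON."""
    def wrap(handler):
        COMMANDS[name] = (handler, supports_json)
        return handler
    return wrap


def run_command(name, args, fmt="text"):
    """Dispatch a command; `fmt` is validated once here instead of in every command."""
    if fmt not in ("text", "json"):
        raise ValueError(f"unknown output format {fmt!r}")
    handler, supports_json = COMMANDS[name]
    if fmt == "json" and not supports_json:
        raise ValueError(f"command {name!r} does not support JSON output")
    result = handler(args, fmt)
    # JSON-capable handlers return plain data; serialization happens in one place.
    return json.dumps(result) if fmt == "json" else result


@register("dpif/show", supports_json=True)
def dpif_show(args, fmt):
    # Toy data only; the real command reports datapaths, bridges and ports.
    data = {"netdev@ovs-netdev": {"hit": 0, "missed": 0}}
    if fmt == "json":
        return data
    return "\n".join(f"{dp}: hit:{s['hit']} missed:{s['missed']}" for dp, s in data.items())
```

With this shape, commands can gain JSON support one at a time while the option parsing and serialization stay shared.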
Closing this ticket; further enhancements are tracked in FDP-847.