Description of problem: Earlier we were able to scrape OVS metrics, such as CPU and memory utilization, into Prometheus. With the recent change, OVS runs on the host itself and these metrics are no longer exported.
Expected results: The OVS metrics are available in Prometheus.
Since in previous versions we had OVS metrics, we should consider this a regression.
Note that this is not specific to ovnkube; it applies to SDN too.
Part of the ovn-kubernetes solution is https://github.com/ovn-org/ovn-kubernetes/pull/2723. We still need to work out with upstream whether to enable the metrics as part of ovnkube-node (which we already expose through kube-rbac-proxy) or in a separate container running the standalone upstream metrics executable. A separate daemonset that both SDN and OVN could use is possible, but it means a lot more work and bookkeeping (including a different image). Since the code isn't large, it can simply be duplicated for SDN.
This was enabled for ovn-kubernetes by https://github.com/openshift/cluster-network-operator/pull/1393 and the core functionality was present in ovnkube since the beginning of 4.11.
To be clear, this has been enabled and available in 4.11 ovnkube/CNO since late April 2022. The PR just wasn't linked to this bug.
Tested and verified in 4.11.0-rc.4:

    sh-4.4# curl 127.0.0.1:29105/metrics | grep -E 'process_virtual_memory_bytes|process_cpu_seconds_total|process_virtual_memory_max_bytes|process_resident_memory_bytes'
    # HELP ovs_db_process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE ovs_db_process_cpu_seconds_total counter
    ovs_db_process_cpu_seconds_total 1.63
    # HELP ovs_db_process_resident_memory_bytes Resident memory size in bytes.
    # TYPE ovs_db_process_resident_memory_bytes gauge
    ovs_db_process_resident_memory_bytes 3.2751616e+07
    # HELP ovs_db_process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE ovs_db_process_virtual_memory_bytes gauge
    ovs_db_process_virtual_memory_bytes 9.6464896e+07
    # HELP ovs_db_process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
    # TYPE ovs_db_process_virtual_memory_max_bytes gauge
    ovs_db_process_virtual_memory_max_bytes 1.8446744073709552e+19
    # HELP ovs_vswitchd_process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE ovs_vswitchd_process_cpu_seconds_total counter
    ovs_vswitchd_process_cpu_seconds_total 25.24
    # HELP ovs_vswitchd_process_resident_memory_bytes Resident memory size in bytes.
    # TYPE ovs_vswitchd_process_resident_memory_bytes gauge
    ovs_vswitchd_process_resident_memory_bytes 1.95637248e+08
    # HELP ovs_vswitchd_process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE ovs_vswitchd_process_virtual_memory_bytes gauge
    ovs_vswitchd_process_virtual_memory_bytes 7.06281472e+08
    # HELP ovs_vswitchd_process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
    # TYPE ovs_vswitchd_process_virtual_memory_max_bytes gauge
    ovs_vswitchd_process_virtual_memory_max_bytes 1.8446744073709552e+19
    sh-4.4# exit
    exit

    [weliang@weliang ~]$ oc get clusterversion
    NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-rc.4   True        False         24m     Cluster version is 4.11.0-rc.4
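For repeated checks, the same filter can be scripted. The following is a minimal Python sketch, not part of ovnkube or CNO. It assumes it runs somewhere the endpoint from the test (127.0.0.1:29105/metrics) is reachable over plain HTTP, as it was from the node shell, and the names fetch_metrics and parse_ovs_process_metrics are hypothetical. It only handles unlabelled samples like the ones in the transcript, not the full Prometheus text format.

```python
import re
import urllib.request
from typing import Dict

# Hypothetical helper: assumes the plain-HTTP endpoint used in the manual test
# is reachable from where this runs (e.g. a shell on the node).
def fetch_metrics(url: str = "http://127.0.0.1:29105/metrics", timeout: float = 5.0) -> str:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Same selection as the grep in the test: OVS DB and vswitchd process CPU/memory metrics.
OVS_PROCESS_METRIC = re.compile(
    r"^ovs_(db|vswitchd)_process_"
    r"(cpu_seconds_total|resident_memory_bytes|virtual_memory_bytes|virtual_memory_max_bytes)$"
)

def parse_ovs_process_metrics(text: str) -> Dict[str, float]:
    """Return {metric_name: value} for the unlabelled OVS process metrics.

    Sketch only: skips '# HELP' / '# TYPE' comment lines and ignores
    labelled series (anything with '{' in the first token).
    """
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        parts = line.split()
        if len(parts) < 2 or "{" in parts[0]:
            continue  # malformed or labelled sample, out of scope here
        name, value = parts[0], parts[1]
        if OVS_PROCESS_METRIC.match(name):
            metrics[name] = float(value)
    return metrics
```

On a 4.11 node, parse_ovs_process_metrics(fetch_metrics()) should return the same eight ovs_db_* and ovs_vswitchd_* values that the grep selects in the transcript.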
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069