Bug 2002868
| Summary: | Node exporter not able to scrape OVS metrics | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Mohit Sheth <msheth> |
| Component: | Networking | Assignee: | Dan Williams <dcbw> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | spasquie, trozet, weliang, zzhao |
| Version: | 4.9 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.11.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | perfscale-ovn | ||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 10:37:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Mohit Sheth
2021-09-09 21:43:00 UTC
Since in previous versions we had OVS metrics, we should consider this a regression. Note that this is not specific to ovnkube, but applies to SDN too.

Part of the ovn-kubernetes solution is https://github.com/ovn-org/ovn-kubernetes/pull/2723 . Still need to work out with upstream whether we enable the metrics as part of ovnkube-node (which we already expose with kube-rbac-proxy) or whether it's done as part of another container with the standalone upstream metrics executable. It could be a different daemonset that both SDN and OVN can use, but that's a lot more work/book-keeping (different image too) and the code isn't huge, so it can be duplicated for SDN as well.

This was enabled for ovn-kubernetes by https://github.com/openshift/cluster-network-operator/pull/1393 and the core functionality has been present in ovnkube since the beginning of 4.11.

To be clear, this has been enabled & available in 4.11 ovnkube/CNO since late April 2022. Just missed the tie between PR and bug.

Tested and verified in 4.11.0-rc.4:
sh-4.4# curl 127.0.0.1:29105/metrics | grep -E 'process_virtual_memory_bytes|process_cpu_seconds_total|process_virtual_memory_max_bytes|process_resident_memory_bytes'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 31380 0 31380 0 0 747k 0 --:--:-- --:--:-- --:--:-- 766k
# HELP ovs_db_process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE ovs_db_process_cpu_seconds_total counter
ovs_db_process_cpu_seconds_total 1.63
# HELP ovs_db_process_resident_memory_bytes Resident memory size in bytes.
# TYPE ovs_db_process_resident_memory_bytes gauge
ovs_db_process_resident_memory_bytes 3.2751616e+07
# HELP ovs_db_process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE ovs_db_process_virtual_memory_bytes gauge
ovs_db_process_virtual_memory_bytes 9.6464896e+07
# HELP ovs_db_process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE ovs_db_process_virtual_memory_max_bytes gauge
ovs_db_process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP ovs_vswitchd_process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE ovs_vswitchd_process_cpu_seconds_total counter
ovs_vswitchd_process_cpu_seconds_total 25.24
# HELP ovs_vswitchd_process_resident_memory_bytes Resident memory size in bytes.
# TYPE ovs_vswitchd_process_resident_memory_bytes gauge
ovs_vswitchd_process_resident_memory_bytes 1.95637248e+08
# HELP ovs_vswitchd_process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE ovs_vswitchd_process_virtual_memory_bytes gauge
ovs_vswitchd_process_virtual_memory_bytes 7.06281472e+08
# HELP ovs_vswitchd_process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE ovs_vswitchd_process_virtual_memory_max_bytes gauge
ovs_vswitchd_process_virtual_memory_max_bytes 1.8446744073709552e+19
sh-4.4# exit
exit
[weliang@weliang ~]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-rc.4 True False 24m Cluster version is 4.11.0-rc.4
[weliang@weliang ~]$
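For anyone who wants to repeat the check above without reading the grep output by eye, here is a minimal Python sketch. It is not part of the fix. It assumes the endpoint returns the Prometheus text format seen in the transcript and only handles unlabeled `name value` samples. The helper names (`parse_samples`, `process_metrics`) and the `SAMPLE` excerpt are made up for illustration; the values in `SAMPLE` are copied from the output above.

```python
import re

# Same alternation the grep -E command in the transcript uses.
PROCESS_METRIC_PATTERN = re.compile(
    r"process_virtual_memory_bytes|process_cpu_seconds_total"
    r"|process_virtual_memory_max_bytes|process_resident_memory_bytes"
)

# Excerpt of the verified 4.11.0-rc.4 output (illustrative subset only).
SAMPLE = """\
# HELP ovs_db_process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE ovs_db_process_cpu_seconds_total counter
ovs_db_process_cpu_seconds_total 1.63
# TYPE ovs_vswitchd_process_cpu_seconds_total counter
ovs_vswitchd_process_cpu_seconds_total 25.24
# TYPE ovs_vswitchd_process_virtual_memory_max_bytes gauge
ovs_vswitchd_process_virtual_memory_max_bytes 1.8446744073709552e+19
"""


def parse_samples(text):
    """Return {metric_name: value} for unlabeled 'name value' sample lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            # Second field is not a number, so this is not a plain sample line.
            continue
    return samples


def process_metrics(text):
    """Keep only the samples whose name matches the grep pattern."""
    return {
        name: value
        for name, value in parse_samples(text).items()
        if PROCESS_METRIC_PATTERN.search(name)
    }


if __name__ == "__main__":
    for name, value in sorted(process_metrics(SAMPLE).items()):
        print(name, value)
```

When capturing real output to feed into something like this, `curl -s` suppresses the progress meter, so it does not mix with the metric lines on the terminal.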
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069 |