Bug 2111588 - [4.10z] Export OVS metrics
Summary: [4.10z] Export OVS metrics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.10.z
Assignee: Tim Rozet
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On: 2111587
Blocks:
Reported: 2022-07-27 14:58 UTC by Tim Rozet
Modified: 2022-11-09 10:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2111587
Environment:
Last Closed: 2022-11-09 10:51:02 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 1599 | 0 | None | open | Bug 2111588: Enable OVS metrics for ovnk | 2022-10-21 21:12:05 UTC
Github | openshift ovn-kubernetes pull 1215 | 0 | None | open | Bug 2111588: [release-4.10] export OVS metrics | 2022-07-30 01:00:24 UTC
Red Hat Product Errata | RHBA-2022:7298 | 0 | None | None | None | 2022-11-09 10:51:05 UTC

Comment 2 Weibin Liang 2022-08-15 14:39:09 UTC
https://github.com/openshift/ovn-kubernetes/pull/1215 should be included in https://amd64.ocp.releases.ci.openshift.org/releasestream/4.10.0-0.nightly/release/4.10.0-0.nightly-2022-08-12-014102

Followed the testing steps in https://bugzilla.redhat.com/show_bug.cgi?id=2111587#c2, but verification failed in 4.10.0-0.nightly-2022-08-15-084939: no ovs_db or ovs_vswitch metrics are returned from any of the three ovnkube-master pods.

[weliang@weliang ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-08-15-084939   True        False         32m     Cluster version is 4.10.0-0.nightly-2022-08-15-084939
[weliang@weliang ~]$ oc exec ovnkube-master-5lblp -- curl 127.0.0.1:29105/metrics | grep -i "ovs_db\|ovs_vswitch"
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36031    0 36031    0     0   748k      0 --:--:-- --:--:-- --:--:--  748k
[weliang@weliang ~]$ oc exec ovnkube-master-hsk2b -- curl 127.0.0.1:29105/metrics | grep -i "ovs_db\|ovs_vswitch"
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 36247    0 36247    0     0   599k      0 --:--:-- --:--:-- --:--:--  599k
[weliang@weliang ~]$ oc exec ovnkube-master-xb6gw -- curl 127.0.0.1:29105/metrics | grep -i "ovs_db\|ovs_vswitch"
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 35977    0 35977    0     0   516k      0 --:--:-- --:--:-- --:--:--  516k

# OVN-related metrics can be retrieved:
[weliang@weliang ~]$ oc exec ovnkube-master-xb6gw -- curl 127.0.0.1:29105/metrics | grep -i "ovn_northd"
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 35971    0 35971    0     0   675k      0 --:--:-- --:--:-- --:--:--  675k
# HELP ovn_northd_build_flows_ctx_95th_percentile 
# TYPE ovn_northd_build_flows_ctx_95th_percentile gauge
ovn_northd_build_flows_ctx_95th_percentile 0
# HELP ovn_northd_build_flows_ctx_long_term_avg 
# TYPE ovn_northd_build_flows_ctx_long_term_avg gauge
ovn_northd_build_flows_ctx_long_term_avg 0
# HELP ovn_northd_build_flows_ctx_maximum 
# TYPE ovn_northd_build_flows_ctx_maximum gauge
ovn_northd_build_flows_ctx_maximum 0
# HELP ovn_northd_build_flows_ctx_minimum 
# TYPE ovn_northd_build_flows_ctx_minimum gauge
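
An empty grep result like the ones above is easier to read as a count, without curl's progress meter and the "Defaulted container" notice. A minimal sketch, assuming the same pod name, port, and current project as in the session above; count_ovs_samples is a hypothetical helper, not part of oc or ovn-kubernetes, and the demo line is sample input only:

```shell
#!/bin/sh
# Hypothetical helper: count sample lines (not "# HELP" / "# TYPE" comments)
# whose metric name starts with ovs_db_ or ovs_vswitchd_.
count_ovs_samples() {
  # grep -c prints 0 and exits non-zero when nothing matches;
  # "|| true" keeps that case from looking like a command failure.
  grep -c -E '^ovs_(db|vswitchd)_' || true
}

# Offline demo: one OVN sample and one OVS sample in Prometheus text format.
printf '%s\n' \
  'ovn_northd_build_flows_ctx_maximum 0' \
  'ovs_db_process_open_fds 27' | count_ovs_samples

# Against a cluster (assumption: pod name and port as in the session above):
#   oc exec ovnkube-master-xb6gw -c northd -- curl -s 127.0.0.1:29105/metrics | count_ovs_samples
```

Here -c names the container explicitly and curl -s suppresses the progress meter; a count of 0 would correspond to the failed verification in this comment.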

Comment 5 Weibin Liang 2022-10-31 14:30:24 UTC
Tested and verified in 4.10.0-0.nightly-2022-10-31-100546

[weliang@weliang ~]$ oc exec ovnkube-master-25c4f -- curl 127.0.0.1:29105/metrics | grep -i "ovs_db\|ovs_vswitch"
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 51351    0 51351    0     0   659k      0 --:--:-- --:--:-- --:--:--  651k
# HELP ovs_db_process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE ovs_db_process_cpu_seconds_total counter
ovs_db_process_cpu_seconds_total 2.91
# HELP ovs_db_process_max_fds Maximum number of open file descriptors.
# TYPE ovs_db_process_max_fds gauge
ovs_db_process_max_fds 1024
# HELP ovs_db_process_open_fds Number of open file descriptors.
# TYPE ovs_db_process_open_fds gauge
ovs_db_process_open_fds 27
# HELP ovs_db_process_resident_memory_bytes Resident memory size in bytes.
# TYPE ovs_db_process_resident_memory_bytes gauge
ovs_db_process_resident_memory_bytes 2.4072192e+07
# HELP ovs_db_process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE ovs_db_process_start_time_seconds gauge
ovs_db_process_start_time_seconds 1.66722386917e+09
# HELP ovs_db_process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE ovs_db_process_virtual_memory_bytes gauge
ovs_db_process_virtual_memory_bytes 8.7834624e+07
# HELP ovs_db_process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE ovs_db_process_virtual_memory_max_bytes gauge
ovs_db_process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP ovs_vswitchd_bridge A metric with a constant '1' value labeled by bridge name present on the instance.
# TYPE ovs_vswitchd_bridge gauge
ovs_vswitchd_bridge{bridge="br-ex"} 1
ovs_vswitchd_bridge{bridge="br-int"} 1
# HELP ovs_vswitchd_bridge_flows_total Represents the number of OpenFlow flows on the OVS bridge.
# TYPE ovs_vswitchd_bridge_flows_total gauge
ovs_vswitchd_bridge_flows_total{bridge="br-ex"} 35
ovs_vswitchd_bridge_flows_total{bridge="br-int"} 2553
# HELP ovs_vswitchd_bridge_ports_total Represents the number of OVS ports on the bridge.
# TYPE ovs_vswitchd_bridge_ports_total gauge
ovs_vswitchd_bridge_ports_total{bridge="br-ex"} 3
ovs_vswitchd_bridge_ports_total{bridge="br-int"} 27
# HELP ovs_vswitchd_bridge_reconfigure Number of times OVS bridges were reconfigured.
# TYPE ovs_vswitchd_bridge_reconfigure gauge
ovs_vswitchd_bridge_reconfigure 477
# HELP ovs_vswitchd_bridge_total Represents total number of OVS bridges on the system.
# TYPE ovs_vswitchd_bridge_total gauge
ovs_vswitchd_bridge_total 2
# HELP ovs_vswitchd_dp A metric with a constant '1' value labeled by datapath name present on the instance.
# TYPE ovs_vswitchd_dp gauge
ovs_vswitchd_dp{datapath="ovs-system",type="system"} 1
# HELP ovs_vswitchd_dp_flows_lookup_hit Represents number of packets matching the existing flows while processing incoming packets in the datapath.
# TYPE ovs_vswitchd_dp_flows_lookup_hit gauge
ovs_vswitchd_dp_flows_lookup_hit{datapath="ovs-system"} 1.1543825e+07
# HELP ovs_vswitchd_dp_flows_lookup_lost number of packets destined for user space process but subsequently dropped before  reaching  userspace.
# TYPE ovs_vswitchd_dp_flows_lookup_lost gauge
ovs_vswitchd_dp_flows_lookup_lost{datapath="ovs-system"} 0
# HELP ovs_vswitchd_dp_flows_lookup_missed Represents the number of packets not matching any existing flow  and require  user space processing.
# TYPE ovs_vswitchd_dp_flows_lookup_missed gauge
ovs_vswitchd_dp_flows_lookup_missed{datapath="ovs-system"} 62196
# HELP ovs_vswitchd_dp_flows_total Represents the number of flows in datapath.
# TYPE ovs_vswitchd_dp_flows_total gauge
ovs_vswitchd_dp_flows_total{datapath="ovs-system"} 451
# HELP ovs_vswitchd_dp_if_total Represents the number of ports connected to the datapath.
# TYPE ovs_vswitchd_dp_if_total gauge
ovs_vswitchd_dp_if_total{datapath="ovs-system"} 25
# HELP ovs_vswitchd_dp_masks_hit Represents the total number of masks visited for matching incoming packets.
# TYPE ovs_vswitchd_dp_masks_hit gauge
ovs_vswitchd_dp_masks_hit{datapath="ovs-system"} 3.6366695e+07
# HELP ovs_vswitchd_dp_masks_hit_ratio Represents the average number of masks visited per packet the  ratio between hit and total number of packets processed by the datapath.
# TYPE ovs_vswitchd_dp_masks_hit_ratio gauge
ovs_vswitchd_dp_masks_hit_ratio{datapath="ovs-system"} 3.13
# HELP ovs_vswitchd_dp_masks_total Represents the number of masks in a datapath.
# TYPE ovs_vswitchd_dp_masks_total gauge
ovs_vswitchd_dp_masks_total{datapath="ovs-system"} 60
# HELP ovs_vswitchd_dp_packets_total Represents the total number of packets datapath processed which is the sum of hit and missed.
# TYPE ovs_vswitchd_dp_packets_total gauge
ovs_vswitchd_dp_packets_total{datapath="ovs-system"} 1.1606021e+07
# HELP ovs_vswitchd_dp_total Represents total number of datapaths on the system.
# TYPE ovs_vswitchd_dp_total gauge
ovs_vswitchd_dp_total 1
# HELP ovs_vswitchd_dpif_execute Number of times the OpenFlow actions were executed in userspace on behalf of the datapath.
# TYPE ovs_vswitchd_dpif_execute gauge
ovs_vswitchd_dpif_execute 62254
# HELP ovs_vswitchd_dpif_flow_del Number of times flows were deleted from the datapath (Linux kernel datapath module).
# TYPE ovs_vswitchd_dpif_flow_del gauge
ovs_vswitchd_dpif_flow_del 53107
# HELP ovs_vswitchd_dpif_flow_flush Number of times flows were flushed from the datapath (Linux kernel datapath module).
# TYPE ovs_vswitchd_dpif_flow_flush gauge
ovs_vswitchd_dpif_flow_flush 1
# HELP ovs_vswitchd_dpif_flow_get Number of times flows were retrieved from the datapath (Linux kernel datapath module).
# TYPE ovs_vswitchd_dpif_flow_get gauge
ovs_vswitchd_dpif_flow_get 26
# HELP ovs_vswitchd_dpif_flow_put Number of times flows were added to the datapath (Linux kernel datapath module).
# TYPE ovs_vswitchd_dpif_flow_put gauge
ovs_vswitchd_dpif_flow_put 56001
# HELP ovs_vswitchd_dpif_port_add Number of times a netdev was added as a port to the dpif.
# TYPE ovs_vswitchd_dpif_port_add gauge
ovs_vswitchd_dpif_port_add 73
# HELP ovs_vswitchd_dpif_port_del Number of times a netdev was removed from the dpif.
# TYPE ovs_vswitchd_dpif_port_del gauge
ovs_vswitchd_dpif_port_del 98
# HELP ovs_vswitchd_handlers_total Represents the number of handlers thread. This thread reads upcalls from dpif, forwards each upcall's packet and possibly sets up a kernel flow as a cache.
# TYPE ovs_vswitchd_handlers_total gauge
ovs_vswitchd_handlers_total 2
# HELP ovs_vswitchd_hw_offload Represents whether netdev flow offload to hardware is enabled or not -- false(0) and true(1).
# TYPE ovs_vswitchd_hw_offload gauge
ovs_vswitchd_hw_offload 0
# HELP ovs_vswitchd_interface_collisions_total The total number of packet collisions transmitted by Open vSwitch interface(s).
# TYPE ovs_vswitchd_interface_collisions_total gauge
ovs_vswitchd_interface_collisions_total 0
# HELP ovs_vswitchd_interface_resets_total The number of link state changes observed by Open vSwitch interface(s).
# TYPE ovs_vswitchd_interface_resets_total gauge
ovs_vswitchd_interface_resets_total 2
# HELP ovs_vswitchd_interface_rx_dropped_total The total number of received packets dropped by Open vSwitch interface(s).
# TYPE ovs_vswitchd_interface_rx_dropped_total gauge
ovs_vswitchd_interface_rx_dropped_total 8
# HELP ovs_vswitchd_interface_rx_errors_total The total number of received packets with errors by Open vSwitch interface(s).
# TYPE ovs_vswitchd_interface_rx_errors_total gauge
ovs_vswitchd_interface_rx_errors_total 0
# HELP ovs_vswitchd_interface_tx_dropped_total The total number of transmitted packets dropped by Open vSwitch interface(s).
# TYPE ovs_vswitchd_interface_tx_dropped_total gauge
ovs_vswitchd_interface_tx_dropped_total 2
# HELP ovs_vswitchd_interface_tx_errors_total The total number of transmitted packets with errors by Open vSwitch interface(s).
# TYPE ovs_vswitchd_interface_tx_errors_total gauge
ovs_vswitchd_interface_tx_errors_total 0
# HELP ovs_vswitchd_netlink_overflow Netlink messages dropped by the daemon due to buffer overflow.
# TYPE ovs_vswitchd_netlink_overflow gauge
ovs_vswitchd_netlink_overflow 0
# HELP ovs_vswitchd_netlink_received Number of netlink messages received by the kernel.
# TYPE ovs_vswitchd_netlink_received gauge
ovs_vswitchd_netlink_received 234375
# HELP ovs_vswitchd_netlink_recv_jumbo Number of netlink messages that were received fromthe kernel were more than the allocated buffer.
# TYPE ovs_vswitchd_netlink_recv_jumbo gauge
ovs_vswitchd_netlink_recv_jumbo 62017
# HELP ovs_vswitchd_netlink_sent Number of netlink message sent to the kernel.
# TYPE ovs_vswitchd_netlink_sent gauge
ovs_vswitchd_netlink_sent 272044
# HELP ovs_vswitchd_ofproto_dpif_expired Number of times the flows were removed for reasons - idle timeout, hard timeout, flow delete,  group delete, meter delete, or eviction.
# TYPE ovs_vswitchd_ofproto_dpif_expired gauge
ovs_vswitchd_ofproto_dpif_expired 0
# HELP ovs_vswitchd_ofproto_flush Number of times the flows from all of ofproto's flow tables were flushed.
# TYPE ovs_vswitchd_ofproto_flush gauge
ovs_vswitchd_ofproto_flush 1
# HELP ovs_vswitchd_ofproto_packet_out Number of times the controller injected the packet into the kernel datapath.
# TYPE ovs_vswitchd_ofproto_packet_out gauge
ovs_vswitchd_ofproto_packet_out 44
# HELP ovs_vswitchd_ofproto_recv_openflow Number of times an OpenFlow message was handled.
# TYPE ovs_vswitchd_ofproto_recv_openflow gauge
ovs_vswitchd_ofproto_recv_openflow 16436
# HELP ovs_vswitchd_ofproto_reinit_ports Number of times all the OpenFlow ports were reinitialized.
# TYPE ovs_vswitchd_ofproto_reinit_ports gauge
ovs_vswitchd_ofproto_reinit_ports 0
# HELP ovs_vswitchd_packet_in Specifies the number of times ovs-vswitchd has handled the packet-ins on behalf of kernel datapath.
# TYPE ovs_vswitchd_packet_in gauge
ovs_vswitchd_packet_in 61695
# HELP ovs_vswitchd_packet_in_drop Specifies the number of times the ovs-vswitchd has dropped the packet-ins due to resource constraints.
# TYPE ovs_vswitchd_packet_in_drop gauge
ovs_vswitchd_packet_in_drop 0
# HELP ovs_vswitchd_process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE ovs_vswitchd_process_cpu_seconds_total counter
ovs_vswitchd_process_cpu_seconds_total 47.28
# HELP ovs_vswitchd_process_max_fds Maximum number of open file descriptors.
# TYPE ovs_vswitchd_process_max_fds gauge
ovs_vswitchd_process_max_fds 65535
# HELP ovs_vswitchd_process_open_fds Number of open file descriptors.
# TYPE ovs_vswitchd_process_open_fds gauge
ovs_vswitchd_process_open_fds 79
# HELP ovs_vswitchd_process_resident_memory_bytes Resident memory size in bytes.
# TYPE ovs_vswitchd_process_resident_memory_bytes gauge
ovs_vswitchd_process_resident_memory_bytes 1.9288064e+08
# HELP ovs_vswitchd_process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE ovs_vswitchd_process_start_time_seconds gauge
ovs_vswitchd_process_start_time_seconds 1.66722386941e+09
# HELP ovs_vswitchd_process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE ovs_vswitchd_process_virtual_memory_bytes gauge
ovs_vswitchd_process_virtual_memory_bytes 5.415936e+08
# HELP ovs_vswitchd_process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE ovs_vswitchd_process_virtual_memory_max_bytes gauge
ovs_vswitchd_process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP ovs_vswitchd_pstream_open Specifies the number of time passive connections were opened for the remote peer to connect.
# TYPE ovs_vswitchd_pstream_open gauge
ovs_vswitchd_pstream_open 5
# HELP ovs_vswitchd_rconn_discarded Specifies the number of messages that have been dropped because the send queue had to be flushed because of reconnection.
# TYPE ovs_vswitchd_rconn_discarded gauge
ovs_vswitchd_rconn_discarded 0
# HELP ovs_vswitchd_rconn_overflow Specifies the number of messages that have been dropped because of the queue overflow.
# TYPE ovs_vswitchd_rconn_overflow gauge
ovs_vswitchd_rconn_overflow 0
# HELP ovs_vswitchd_rconn_queued Specifies the number of messages that have been queued because it couldn’t be sent using the underlying virtual connection to OpenFlow devices.
# TYPE ovs_vswitchd_rconn_queued gauge
ovs_vswitchd_rconn_queued 9462
# HELP ovs_vswitchd_rconn_sent Specifies the number of messages that have been sent to the underlying virtual connection (unix, tcp, or ssl) to OpenFlow devices.
# TYPE ovs_vswitchd_rconn_sent gauge
ovs_vswitchd_rconn_sent 9462
# HELP ovs_vswitchd_revalidators_total Represents the number of revalidators thread. This thread processes datapath flows, updates OpenFlow statistics, and updates or removes them if necessary.
# TYPE ovs_vswitchd_revalidators_total gauge
ovs_vswitchd_revalidators_total 2
# HELP ovs_vswitchd_stream_open Specifies the number of attempts to connect to a remote peer (active connection).
# TYPE ovs_vswitchd_stream_open gauge
ovs_vswitchd_stream_open 1
# HELP ovs_vswitchd_tc_policy Represents the policy used with HW offloading -- none(0), skip_sw(1), and skip_hw(2).
# TYPE ovs_vswitchd_tc_policy gauge
ovs_vswitchd_tc_policy 0
# HELP ovs_vswitchd_txn_aborted Specifies the number of times the OVSDB  transaction has been aborted.
# TYPE ovs_vswitchd_txn_aborted gauge
ovs_vswitchd_txn_aborted 0
# HELP ovs_vswitchd_txn_error Specifies the number of times the OVSDB transaction has errored out.
# TYPE ovs_vswitchd_txn_error gauge
ovs_vswitchd_txn_error 0
# HELP ovs_vswitchd_txn_incomplete Specifies the number of times the OVSDB transaction did not complete and the client had to re-try.
# TYPE ovs_vswitchd_txn_incomplete gauge
ovs_vswitchd_txn_incomplete 767
# HELP ovs_vswitchd_txn_success Specifies the number of times the OVSDB transaction has successfully completed.
# TYPE ovs_vswitchd_txn_success gauge
ovs_vswitchd_txn_success 597
# HELP ovs_vswitchd_txn_try_again Specifies the number of times the OVSDB transaction failed and the client had to re-try.
# TYPE ovs_vswitchd_txn_try_again gauge
ovs_vswitchd_txn_try_again 0
# HELP ovs_vswitchd_txn_unchanged Specifies the number of times the OVSDB transaction resulted in no change to the database.
# TYPE ovs_vswitchd_txn_unchanged gauge
ovs_vswitchd_txn_unchanged 1120
# HELP ovs_vswitchd_txn_uncommitted Specifies the number of times the OVSDB transaction were uncommitted.
# TYPE ovs_vswitchd_txn_uncommitted gauge
ovs_vswitchd_txn_uncommitted 0
# HELP ovs_vswitchd_vconn_open Specifies the number of attempts to connect to an OpenFlow Device.
# TYPE ovs_vswitchd_vconn_open gauge
ovs_vswitchd_vconn_open 0
# HELP ovs_vswitchd_vconn_received Specifies the number of messages received from the OpenFlow Device.
# TYPE ovs_vswitchd_vconn_received gauge
ovs_vswitchd_vconn_received 17421
# HELP ovs_vswitchd_vconn_sent Specifies the number of messages sent to the OpenFlow Device.
# TYPE ovs_vswitchd_vconn_sent gauge
ovs_vswitchd_vconn_sent 11783
# HELP ovs_vswitchd_xlate_actions Number of times an OpenFlow actions were translated into datapath actions.
# TYPE ovs_vswitchd_xlate_actions gauge
ovs_vswitchd_xlate_actions 522730
# HELP ovs_vswitchd_xlate_actions_oversize Number of times the translated OpenFlow actions into a datapath actions were too big for a netlink attribute.
# TYPE ovs_vswitchd_xlate_actions_oversize gauge
ovs_vswitchd_xlate_actions_oversize 0
# HELP ovs_vswitchd_xlate_actions_too_many_output Number of times the number of datapath actions were more than what the kernel can handle reliably.
# TYPE ovs_vswitchd_xlate_actions_too_many_output gauge
ovs_vswitchd_xlate_actions_too_many_output 0
[weliang@weliang ~]$ 
[weliang@weliang ~]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-10-31-100546   True        False         22m     Cluster version is 4.10.0-0.nightly-2022-10-31-100546
[weliang@weliang ~]$
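
The help text for ovs_vswitchd_dp_packets_total describes it as the sum of hit and missed, which allows a quick consistency check on the values above. A minimal sketch; check_dp_packets is a hypothetical helper, and it assumes a single datapath, as in this output:

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text from stdin and check that
# dp_packets_total equals lookup_hit + lookup_missed.
# Assumes one datapath; with several, the last value of each metric wins.
check_dp_packets() {
  awk '
    $1 ~ /^ovs_vswitchd_dp_flows_lookup_hit[{]/    { hit = $2 + 0 }
    $1 ~ /^ovs_vswitchd_dp_flows_lookup_missed[{]/ { missed = $2 + 0 }
    $1 ~ /^ovs_vswitchd_dp_packets_total[{]/       { total = $2 + 0 }
    END {
      if (hit + missed == total) {
        printf "ok %.0f\n", total
      } else {
        printf "mismatch %.0f != %.0f\n", hit + missed, total
        exit 1
      }
    }'
}

# The three samples from the output above.
printf '%s\n' \
  'ovs_vswitchd_dp_flows_lookup_hit{datapath="ovs-system"} 1.1543825e+07' \
  'ovs_vswitchd_dp_flows_lookup_missed{datapath="ovs-system"} 62196' \
  'ovs_vswitchd_dp_packets_total{datapath="ovs-system"} 1.1606021e+07' |
  check_dp_packets
# prints "ok 11606021"
```

The same scrape is also consistent with the reported ovs_vswitchd_dp_masks_hit_ratio: 36366695 / 11606021 is approximately 3.13.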

Comment 8 errata-xmlrpc 2022-11-09 10:51:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.40 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7298

