
Bug 1821965

Summary: [Debugging] Add pinctrl statistics
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Mark Michelson <mmichels>
Component: OVN
Assignee: Mohammad Heib <mheib>
Status: CLOSED ERRATA
QA Contact: Ehsan Elahi <eelahi>
Severity: high
Docs Contact:
Priority: high
Version: FDP 20.C
CC: ctrautma, mheib, rkhan
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-02-24 17:47:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mark Michelson 2020-04-07 23:49:29 UTC
Create a way of reporting the number of packets that have been sent to ovn-controller via a controller() action. This can also report the actions taken by OVN, whether there were errors, and which datapaths punted the packets to ovn-controller.

If the command can be centralized, that would be fantastic. However, if that is not feasible, we can instead have the command be at the ovn-controller layer.

Comment 1 Mark Michelson 2021-10-25 18:28:33 UTC
Update based on OVN team meeting 25 October, 2021:

Initially, I suggested using the OVS coverage API to add statistics for each type of incoming packet-in that pinctrl handles. This would allow "coverage/show" to reveal the hot spots in pinctrl, so we could tell what might be causing OVN to perform poorly.

During the meeting today, though, the team came up with an alternate solution, since having 20-something #defines for the different coverage counters would be unpalatable. Instead, what we can do is add coverage counters for the following sections:

1) process_packet_in(): Seeing this increase rapidly would tell us that ovn-controller is having to handle a great many packets.
2) notify_pinctrl_main(): Seeing this increase rapidly would tell us that pinctrl is waking up the main thread consistently.

Then, instead of adding coverage counters to each individual type of packet-in, we can instead add more debug-level logging across these functions. Specifically, the debug logs should state what type of message is being handled (e.g. DHCP, IGMP, DNS), and where the message was received from (source IP/MAC and OpenFlow port).

The idea is that an admin might notice OVN being slow, so they check the coverage counters. If they see one of the two coverage counters increasing rapidly, they can then enable debug logging and see what the culprit is. This way, they could see, for instance, that a certain VM is spamming DNS requests or something.
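
The triage flow described above might look roughly like the following from a shell on the affected host. This is only a sketch and not part of the proposed change: the function name `pinctrl_opcode_tally` is made up for illustration, and the log format is an assumption (each relevant debug line is assumed to contain the text "pinctrl received" and a token of the form `opcode=NAME`). The control socket path differs per host, so it is left as a placeholder in the comments.

```shell
#!/bin/sh
# Hypothetical helper: tally pinctrl packet-in debug lines by opcode so the
# noisiest message type appears first.
# Assumption: relevant lines contain "pinctrl received" and "opcode=NAME".
pinctrl_opcode_tally() {
    log_file=$1
    grep 'pinctrl received' "$log_file" \
        | sed -n 's/.*opcode=\([A-Za-z0-9_]*\).*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Possible use on a host with a running ovn-controller (left commented out
# because it needs a live daemon; the "grep pinctrl" filter assumes the
# counter names contain "pinctrl"):
#   ovs-appctl -t /var/run/ovn/ovn-controller.<pid>.ctl coverage/show | grep pinctrl
#   ovs-appctl -t /var/run/ovn/ovn-controller.<pid>.ctl vlog/set dbg
#   pinctrl_opcode_tally /var/log/ovn/ovn-controller.log
```

Each output line is a count followed by an opcode name, so a single opcode dominating the list points at the kind of traffic being punted most often.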

Comment 2 Mohammad Heib 2021-11-18 11:02:43 UTC
Suggested solution submitted upstream:
https://patchwork.ozlabs.org/project/ovn/patch/20211118105406.508257-1-mheib@redhat.com/

@mmichels, could you please take a look at this change and check whether it meets the requirements of this BZ, or whether anything more needs to be added?

Comment 6 Ehsan Elahi 2022-01-17 16:08:42 UTC
Verified with a network of the following topology:

------router------
|       |        |
|       |        |
ls1    ls2      ls3
|       |        |
|       |        |
vm1    vm2      vm3

Reproduced on:
[root@bz_1821965 ~]# rpm -qa |grep -E 'ovn|openvswitch'
openvswitch2.15-2.15.0-53.el8fdp.x86_64
ovn-2021-central-21.09.1-23.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
ovn-2021-host-21.09.1-23.el8fdp.x86_64
ovn-2021-21.09.1-23.el8fdp.x86_64

[root@bz_1821965 ~]# ls /var/run/ovn/
ovn-controller.1273217.ctl  ovn-controller.pid  ovnnb_db.ctl  ovnnb_db.pid  ovnnb_db.sock  ovn-northd.1273050.ctl  ovn-northd.pid  ovnsb_db.ctl  ovnsb_db.pid  ovnsb_db.sock

[root@bz_1821965 ~]# ovs-appctl -t /var/run/ovn/ovn-controller.1273217.ctl vlog/set dbg

[root@bz_1821965 ~]# cat /var/log/ovn/ovn-controller.log | grep NXT_PACKET_IN2 | grep table_id=10 | wc -l
0
[root@bz_1821965 ~]# cat /var/log/ovn/ovn-controller.log | grep "pinctrl received  packet-in" | grep opcode=PUT_ARP | grep OF_Table_ID=10 | wc -l
0
[root@bz_1821965 ~]# ovs-ofctl dump-flows br-int table=10 | grep arp | grep controller | grep -v n_packets=0 | wc -l
1

Verified on:
[root@bz_1821965 ~]# rpm -qa |grep -E 'ovn|openvswitch'
ovn-2021-host-21.12.0-11.el8fdp.x86_64
openvswitch2.15-2.15.0-53.el8fdp.x86_64
ovn-2021-21.12.0-11.el8fdp.x86_64
ovn-2021-central-21.12.0-11.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch

[root@bz_1821965 ~]# ls /var/run/ovn/
ovn-controller.741050.ctl  ovn-controller.pid  ovnnb_db.ctl  ovnnb_db.pid  ovnnb_db.sock  ovn-northd.740806.ctl  ovn-northd.pid  ovnsb_db.ctl  ovnsb_db.pid  ovnsb_db.sock
[root@bz_1821965 ~]# ovs-appctl -t /var/run/ovn/ovn-controller.741050.ctl vlog/set dbg

[root@bz_1821965 ~]# cat /var/log/ovn/ovn-controller.log | grep NXT_PACKET_IN2 | grep table_id=10 | wc -l
11
[root@bz_1821965 ~]# cat /var/log/ovn/ovn-controller.log | grep "pinctrl received  packet-in" | grep opcode=PUT_ARP | grep OF_Table_ID=10 | wc -l
3
[root@bz_1821965 ~]# ovs-ofctl dump-flows br-int table=10 | grep arp | grep controller | grep -v n_packets=0 | wc -l
1
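
For repeated runs, the log check above could be wrapped in a small function. This is a sketch only: the name `count_pinctrl_packet_in` is hypothetical, and it relies on the same substrings as the grep commands in this comment ("pinctrl received  packet-in" with two spaces, `opcode=`, and `OF_Table_ID=`), which are taken from the log output of the verified build.

```shell
#!/bin/sh
# Hypothetical helper: count pinctrl debug lines for one opcode and one
# OpenFlow table. Matching is by plain substring, like the original greps,
# so table 10 would also match a line mentioning table 100.
count_pinctrl_packet_in() {
    log_file=$1
    opcode=$2
    table_id=$3
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable in scripts that run under "set -e".
    grep -F 'pinctrl received  packet-in' "$log_file" \
        | grep -F "opcode=$opcode" \
        | grep -c -F "OF_Table_ID=$table_id" || true
}

# Example (commented out; needs the real log file):
#   count_pinctrl_packet_in /var/log/ovn/ovn-controller.log PUT_ARP 10
```

A non-zero result on the fixed build, compared with 0 on the older build, corresponds to the difference shown in the two transcripts.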

Comment 8 errata-xmlrpc 2022-02-24 17:47:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0674

Comment 9 Red Hat Bugzilla 2023-09-15 00:30:53 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days