Bug 1821965 - [Debugging] Add pinctrl statistics
Summary: [Debugging] Add pinctrl statistics
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 20.C
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mohammad Heib
QA Contact: Ehsan Elahi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-07 23:49 UTC by Mark Michelson
Modified: 2023-09-15 00:30 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-02-24 17:47:39 UTC
Target Upstream Version:
Embargoed:


Attachments: (none)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-564 0 None None None 2021-10-21 12:27:46 UTC
Red Hat Product Errata RHBA-2022:0674 0 None None None 2022-02-24 17:47:57 UTC

Description Mark Michelson 2020-04-07 23:49:29 UTC
Create a way of reporting the number of packets that have been sent to ovn-controller via a controller() action. This can also report the actions taken by OVN, whether there were errors, and which datapaths punted the packets to ovn-controller.

If the command can be centralized, that would be fantastic. However, if that is not feasible, we can instead have the command be at the ovn-controller layer.

Comment 1 Mark Michelson 2021-10-25 18:28:33 UTC
Update based on the OVN team meeting of 25 October 2021:

Initially, I suggested using the OVS coverage API to add statistics for each type of incoming packet-in that pinctrl handles. This would allow "coverage/show" to illustrate the hot spots in pinctrl, so we could see what might be causing OVN to perform poorly.

During the meeting today, though, the team came up with an alternate solution, since having 20-something #defines for the different coverage counters would be unpalatable. Instead, what we can do is add coverage counters for the following sections:

1) process_packet_in(): Seeing this increase rapidly would tell us that ovn-controller is having to handle a great many packets.
2) notify_pinctrl_main(): Seeing this increase rapidly would tell us that pinctrl is constantly waking up the main thread.

Then, instead of adding coverage counters to each individual type of packet-in, we can instead add more debug-level logging across these functions. Specifically, the debug logs should state what type of message is being handled (e.g. DHCP, IGMP, DNS), and where the message was received from (source IP/MAC and OpenFlow port).

The idea is that an admin might notice OVN being slow, so they check the coverage counters. If they see one of the two coverage counters increasing rapidly, they can then enable debug logging to find the culprit. This way, they could see, for instance, that a certain VM is spamming DNS requests.
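The two-step workflow described above can be sketched in shell. The counter names in the grep pattern and the "pinctrl" vlog module name are assumptions for illustration; the exact names landed with the upstream patch, so check "coverage/show" output on your build first:

```shell
# Step 1: check the coverage counters on the running ovn-controller.
# The grep pattern is an assumption; list all counters first if unsure.
ovn-appctl -t ovn-controller coverage/show | grep -Ei 'pinctrl|packet.?in'

# Step 2: if a counter is climbing, enable debug logging for the pinctrl
# module only (module:destination:level syntax), then watch the log.
ovn-appctl -t ovn-controller vlog/set pinctrl:file:dbg
tail -f /var/log/ovn/ovn-controller.log | grep 'pinctrl received'
```

Scoping vlog/set to a single module keeps the log volume manageable compared to the global "vlog/set dbg" used later in the verification transcript.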

Comment 2 Mohammad Heib 2021-11-18 11:02:43 UTC
A suggested solution has been submitted upstream:
https://patchwork.ozlabs.org/project/ovn/patch/20211118105406.508257-1-mheib@redhat.com/

@mmichels, could you please take a look at this change and see whether it answers this BZ's requirements or whether anything more needs to be added?

Comment 6 Ehsan Elahi 2022-01-17 16:08:42 UTC
Verified with a network of the following topology:

------router------
|       |        |
|       |        |
ls1    ls2      ls3
|       |        |
|       |        |
vm1    vm2      vm3

Reproduced on 
[root@bz_1821965 ~]# rpm -qa |grep -E 'ovn|openvswitch'
openvswitch2.15-2.15.0-53.el8fdp.x86_64
ovn-2021-central-21.09.1-23.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
ovn-2021-host-21.09.1-23.el8fdp.x86_64
ovn-2021-21.09.1-23.el8fdp.x86_64

[root@bz_1821965 ~]# ls /var/run/ovn/
ovn-controller.1273217.ctl  ovn-controller.pid  ovnnb_db.ctl  ovnnb_db.pid  ovnnb_db.sock  ovn-northd.1273050.ctl  ovn-northd.pid  ovnsb_db.ctl  ovnsb_db.pid  ovnsb_db.sock

[root@bz_1821965 ~]# ovs-appctl -t /var/run/ovn/ovn-controller.1273217.ctl vlog/set dbg

[root@bz_1821965 ~]# cat /var/log/ovn/ovn-controller.log | grep NXT_PACKET_IN2 | grep table_id=10 | wc -l
0
[root@bz_1821965 ~]# cat /var/log/ovn/ovn-controller.log | grep "pinctrl received  packet-in" | grep opcode=PUT_ARP | grep OF_Table_ID=10 | wc -l
0
[root@bz_1821965 ~]# ovs-ofctl dump-flows br-int table=10 | grep arp | grep controller | grep -v n_packets=0 | wc -l
1

Verified on
[root@bz_1821965 ~]# rpm -qa |grep -E 'ovn|openvswitch'
ovn-2021-host-21.12.0-11.el8fdp.x86_64
openvswitch2.15-2.15.0-53.el8fdp.x86_64
ovn-2021-21.12.0-11.el8fdp.x86_64
ovn-2021-central-21.12.0-11.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch

[root@bz_1821965 ~]# ls /var/run/ovn/
ovn-controller.741050.ctl  ovn-controller.pid  ovnnb_db.ctl  ovnnb_db.pid  ovnnb_db.sock  ovn-northd.740806.ctl  ovn-northd.pid  ovnsb_db.ctl  ovnsb_db.pid  ovnsb_db.sock
[root@bz_1821965 ~]# ovs-appctl -t /var/run/ovn/ovn-controller.741050.ctl vlog/set dbg

[root@bz_1821965 ~]# cat /var/log/ovn/ovn-controller.log | grep NXT_PACKET_IN2 | grep table_id=10 | wc -l
11
[root@bz_1821965 ~]# cat /var/log/ovn/ovn-controller.log | grep "pinctrl received  packet-in" | grep opcode=PUT_ARP | grep OF_Table_ID=10 | wc -l
3
[root@bz_1821965 ~]# ovs-ofctl dump-flows br-int table=10 | grep arp | grep controller | grep -v n_packets=0 | wc -l
1

Comment 8 errata-xmlrpc 2022-02-24 17:47:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0674

Comment 9 Red Hat Bugzilla 2023-09-15 00:30:53 UTC
The needinfo request[s] on this closed bug have been removed, as they had been unresolved for 500 days.

