Currently, the sos report lacks some information that might be useful to troubleshoot hw-offloading problems.

The following logs should be added:
1) ovs hw-offloaded dp flows: BZ 1824854 might have already done that. But consider backporting it to the rhel 8.2 sos package
2) tc filter (including hw/sw stats) on all interfaces
3) devlink information (at least "param show" and "eswitch show")
(In reply to Adrián Moreno from comment #0)
> Currently, the sos report lacks some information that might be useful to
> troubleshoot hw-offloading problems.
>
> The following logs should be added:
> 1) ovs hw-offloaded dp flows: BZ 1824854 might have already done that. But
> consider backporting it to the rhel 8.2 sos package

That is https://github.com/sosreport/sos/pull/2051 , right? Currently planned for RHEL8.4 due to rebase. We might add it to 8.3.z depending on severity/priority (as a classical z-stream bug), but why 8.2 and some EUS, please?

> 2) tc filter (including hw/sw stats) on all interfaces

Something like adding a line after:
https://github.com/sosreport/sos/blob/master/sos/report/plugins/networking.py#L205 ?

(or in pseudo-code:

for i in $(ls /sys/class/net/); do
  tc filter $i # collect these outputs
done

) ?

> 3) devlink information (at least "param show" and "eswitch show")

Could you be more specific (i.e. whole commands to be called, e.g. in a pseudocode like above)? When should these commands be called? (or also in networking plugin)?
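For illustration, the pseudo-code loop above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the helper names `list_interfaces` and `tc_filter_commands` are hypothetical (they are not part of sos), the exact `tc` arguments are a guess at what the pseudo-code intends, and the directory check is there because /sys/class/net can also contain plain files such as bonding_masters.

```python
import os


def list_interfaces(sysfs_net="/sys/class/net"):
    """Return interface names found under sysfs_net (hypothetical helper).

    Only entries that resolve to directories are kept, so plain files
    such as bonding_masters are skipped.
    """
    if not os.path.isdir(sysfs_net):
        return []
    return sorted(
        name for name in os.listdir(sysfs_net)
        if os.path.isdir(os.path.join(sysfs_net, name))
    )


def tc_filter_commands(sysfs_net="/sys/class/net"):
    """Build one 'tc filter show' command string per interface."""
    # Assumption: "tc filter show dev <iface>" is what the pseudo-code means.
    return ["tc filter show dev " + dev for dev in list_interfaces(sysfs_net)]
```

The functions only build command strings; actually running and collecting them would be left to the plugin.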
Sorry I've not been very clear, my intention was to work on this myself (when I find some time) but if you are jumping on it, that's great!

(In reply to Pavel Moravec from comment #1)
> (In reply to Adrián Moreno from comment #0)
> > Currently, the sos report lacks some information that might be useful to
> > troubleshoot hw-offloading problems.
> >
> > The following logs should be added:
> > 1) ovs hw-offloaded dp flows: BZ 1824854 might have already done that. But
> > consider backporting it to the rhel 8.2 sos package
>
> That is https://github.com/sosreport/sos/pull/2051 , right?

Yes, that PR includes the logs I'm referring to.

> Currently
> planned for RHEL8.4 due to rebase. We might add it to 8.3.z depending on
> severity/priority (as a classical z-stream bug), but why 8.2 and some EUS,
> please?

The reason is mainly because we have enabled OvS tc hardware offloading on RHEL 8.2 and without the full offloaded datapath rules, it's quite difficult to debug any issue related to hw offload.

> > 2) tc filter (including hw/sw stats) on all interfaces
>
> Something like adding a line after:
> https://github.com/sosreport/sos/blob/master/sos/report/plugins/networking.py#L205 ?
>
> (or in pseudo-code:
>
> for i in $(ls /sys/class/net/); do
>   tc filter $i # collect these outputs
> done
>
> ) ?

Yes. However, I'd add a "-s" flag to the tc command to get the statistics as well.

> > 3) devlink information (at least "param show" and "eswitch show")
>
> Could you be more specific (i.e. whole commands to be called, e.g. in a
> pseudocode like above)? When should these commands be called? (or also in
> networking plugin)?

Sure, I don't have a system with the right devices handy but it would be something like:

# Commands that show information of all available devices
$ devlink param show
$ devlink dev info

# Per-device commands
$ for dev in $(devlink dev); do \
    devlink dev eswitch show $dev; \
  done

These commands show the chip/ASIC specific information for compatible switch devices, so I guess the network plugin sounds like the right place but I don't have enough knowledge of sos to have a strong opinion on this.
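As a rough illustration of the per-device loop above, here is a minimal Python sketch that turns the text printed by "devlink dev" into one "devlink dev eswitch show" command per device. The function name `eswitch_show_commands` and the sample device handles are hypothetical, and the sketch assumes "devlink dev" prints one device handle per line.

```python
def eswitch_show_commands(devlink_dev_output):
    """Build per-device eswitch commands from 'devlink dev' output.

    Assumption: each non-empty line of the output is one device handle.
    """
    commands = []
    for line in devlink_dev_output.splitlines():
        handle = line.strip()
        if not handle:
            # Skip blank lines so no command is built with an empty handle.
            continue
        commands.append("devlink dev eswitch show " + handle)
    return commands


if __name__ == "__main__":
    # Hypothetical sample output, not taken from a real system.
    sample = "pci/0000:03:00.0\npci/0000:03:00.1\n"
    for cmd in eswitch_show_commands(sample):
        print(cmd)
```

Keeping the parsing separate from command execution makes it easy to check against captured output from a machine that has the right hardware.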
(In reply to Adrián Moreno from comment #2)
> > > 3) devlink information (at least "param show" and "eswitch show")
> >
> > Could you be more specific (i.e. whole commands to be called, e.g. in a
> > pseudocode like above)? When should these commands be called? (or also in
> > networking plugin)?
>
> Sure, I don't have a system with the right devices handy but it would be
> something like:
>
> # Commands that show information of all available devices
> $ devlink param show
> $ devlink dev info
>
> # Per-device commands
> $ for dev in $(devlink dev); do \
>     devlink dev eswitch show $dev; \
>   done
>
> These commands show the chip/ASIC specific information for compatible switch
> devices, so I guess the network plugin sounds like the right place but I
> don't have enough knowledge of sos to have a strong opinion on this.

+1 to network plugin. These are a companion to ethtool commands, let's say.

Sample outputs for devlink commands now at http://pastebin.test.redhat.com/910915
Preliminary patch:

diff --git a/sos/report/plugins/networking.py.orig b/sos/report/plugins/networking.py
index 5bdb697..81315f4 100644
--- a/sos/report/plugins/networking.py.orig
+++ b/sos/report/plugins/networking.py
@@ -102,8 +102,16 @@ class Networking(Plugin):
             "ip neigh show nud noarp",
             "biosdevname -d",
             "tc -s qdisc show",
+            "devlink dev param show",
+            "devlink dev info",
         ])
 
+        devlinks = self.collect_cmd_output("devlink dev")
+        if devlinks['status'] == 0:
+            devlinks_list = devlinks['output'].splitlines()
+            for devlink in devlinks_list:
+                self.add_cmd_output("devlink dev eswitch show %s" % devlink)
+
         # below commands require some kernel module(s) to be loaded
         # run them only if the modules are loaded, or if explicitly requested
         # via --allow-system-changes option
@@ -139,7 +147,8 @@ class Networking(Plugin):
                 "ethtool -l " + eth,
                 "ethtool --phy-statistics " + eth,
                 "ethtool --show-priv-flags " + eth,
-                "ethtool --show-eee " + eth
+                "ethtool --show-eee " + eth,
+                "tc -s filter show dev " + eth
             ], tags=eth)
 
             # skip EEPROM collection by default, as it might hang or
Thanks Marcelo for the examples, that clarified some of my questions.

Upstream PR raised: https://github.com/sosreport/sos/pull/2383

As I still feel some uncertainty about the syntax of particular commands, please review the PR to check that I did it right.

Preliminarily, this will be available in RHEL8.5.
Hi Adrián,

we have identified this bugfix as important to verify, preferably on a real (non-mocked) environment. Could you please verify the bug against the sos-4.1-1.el8 package, or ask somebody from the knowledge domain to do so / ask for OtherQE?

Thanks in advance.
Hi Pavel, Sure, I'll see if I can get my hands on the right environment
Hi Pavel,

I've tested in a real environment and found that we're missing the qdisc name on the "tc filter show" command.

I was surprised to see that without specifying any qdisc the command returns nothing:

[heat-admin@overcloud-computeovshwoffload-0 ~]$ sudo tc filter show dev lxbond
[heat-admin@overcloud-computeovshwoffload-0 ~]$

While:

[heat-admin@overcloud-computeovshwoffload-0 ~]$ sudo tc filter show dev lxbond ingress
filter block 43 protocol 802.1Q pref 3 flower chain 0
filter block 43 protocol 802.1Q pref 3 flower chain 0 handle 0x1
  vlan_id 100
  vlan_ethtype ip
  dst_mac 01:00:5e:00:00:12
  src_mac 52:54:00:1d:fe:d2
  eth_type ipv4
  ip_flags nofrag
  not_in_hw
        action order 1: skbedit ptype host pipe
         index 3 ref 1 bind 1

        action order 2: mirred (Ingress Redirect to device br-link2) stolen
        index 7 ref 1 bind 1
        cookie 81ef8cc3de424beeef27009d5f38947e

filter block 43 protocol 802.1Q pref 3 flower chain 0 handle 0x2
[...]

It's confusing because "man tc" does show:

tc [ OPTIONS ] filter show dev DEV

However the "help" command shows:

[heat-admin@overcloud-computeovshwoffload-0 ~]$ sudo tc filter help
Usage: tc filter [ add | del | change | replace | show ] [ dev STRING ]
       tc filter [ add | del | change | replace | show ] [ block BLOCK_INDEX ]
       tc filter get dev STRING parent CLASSID protocol PROTO handle FILTERID pref PRIO FILTER_TYPE
       tc filter get block BLOCK_INDEX protocol PROTO handle FILTERID pref PRIO FILTER_TYPE
       [ pref PRIO ] protocol PROTO [ chain CHAIN_INDEX ]
       [ estimator INTERVAL TIME_CONSTANT ]
       [ root | ingress | egress | parent CLASSID ]
       [ handle FILTERID ] [ [ FILTER_TYPE ] [ help | OPTIONS ] ]
       tc filter show [ dev STRING ] [ root | ingress | egress | parent CLASSID ]
       tc filter show [ block BLOCK_INDEX ]

Where:
FILTER_TYPE := { rsvp | u32 | bpf | fw | route | etc. }
FILTERID := ... format depends on classifier, see there
OPTIONS := ... try tc filter add <desired FILTER_KIND> help
[heat-admin@overcloud-computeovshwoffload-0 ~]$

I'll send a patch to fix the man page.

In the meantime, I think this is what we need:

diff --git a/sos/report/plugins/networking.py b/sos/report/plugins/networking.py
index acfa027f..09075363 100644
--- a/sos/report/plugins/networking.py
+++ b/sos/report/plugins/networking.py
@@ -156,7 +156,7 @@ class Networking(Plugin):
                 "ethtool --phy-statistics " + eth,
                 "ethtool --show-priv-flags " + eth,
                 "ethtool --show-eee " + eth,
-                "tc -s filter show dev " + eth
+                "tc -s filter show dev " + eth + " ingress",
             ], tags=eth)
 
             # skip EEPROM collection by default, as it might hang or
Thanks for spotting it, raising upstream PR: https://github.com/sosreport/sos/pull/2550
We actually want both, because otherwise it will only show ingress filters.

[root@horizon ~]# ip link add veth1 type veth peer name veth2
[root@horizon ~]# tc qdisc show dev veth1
[root@horizon ~]# tc qdisc add dev veth1 ingress
[root@horizon ~]# tc filter add dev veth1 ingress matchall action drop
                                              vvvvvvvvv
[root@horizon ~]# tc qdisc add dev veth1 root handle 1: htb
[root@horizon ~]# tc filter add dev veth1 parent 1: handle 42 matchall action drop
                                                           ^^
[root@horizon ~]# tc qdisc show dev veth1
qdisc htb 1: root refcnt 2 r2q 10 default 0 direct_packets_stat 0 direct_qlen 1000
qdisc ingress ffff: parent ffff:fff1 ----------------
[root@horizon ~]# tc filter show dev veth1
filter parent 1: protocol all pref 49152 matchall chain 0
       ^^^^^^^^^^
filter parent 1: protocol all pref 49152 matchall chain 0 handle 0x2a   <--- 0x2a = 42
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 2 ref 1 bind 1

[root@horizon ~]# tc filter show dev veth1 ingress
filter protocol all pref 49152 matchall chain 0
filter protocol all pref 49152 matchall chain 0 handle 0x1
  not_in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1

Andrea can explain better the semantics around ingress/egress.
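To make the "both" concrete, here is a minimal Python sketch of how the two command forms from the session above could be generated per interface. The function name `tc_filter_show_commands` is hypothetical and this is not the code from the upstream PR. It only covers the two forms shown in this comment (no qualifier, and "ingress"), with the "-s" flag for statistics as suggested earlier in this bug.

```python
def tc_filter_show_commands(devices):
    """Return 'tc -s filter show' commands for each device.

    Two forms are built per device, mirroring the session above:
    one without a qualifier and one with 'ingress'.
    """
    commands = []
    for dev in devices:
        base = "tc -s filter show dev " + dev
        commands.append(base)               # form without a qualifier
        commands.append(base + " ingress")  # filters on the ingress qdisc
    return commands


if __name__ == "__main__":
    # veth1 is the interface used in the example session above.
    for cmd in tc_filter_show_commands(["veth1"]):
        print(cmd)
```

Other attachment points (for example egress, or a specific parent) would need additional forms, which are deliberately left out of this sketch.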
Thanks for the prompt feedback. I have updated the PR accordingly, let me know if this version is correct :) https://github.com/sosreport/sos/pull/2550/files
LGTM! Btw, I wanted to add a 'Reviewed-by:' tag but wasn't sure how, so I hit the 'approve' in github too. Please let me know if that wasn't appropriate.. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (sos bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:4388