Bug 2030462
| Summary: | Performance drop while Mirroring OVS Ports in an OVS HW Offload Scenario | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | coldford <coldford> | ||||||
| Component: | rhosp-openvswitch | Assignee: | OSP Team <rhos-maint> | ||||||
| Status: | VERIFIED --- | QA Contact: | OSP Team <rhos-maint> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 16.2 (Train) | CC: | atzin, broose, cfields, echaudro, fboboc, guilherme.giacon, hakhande, jfargen, jimm, john.choo, lariel, mburns, mleitner, nnarang, ptalbert, ralonsoh, rkhan, spower | ||||||
| Target Milestone: | z3 | Keywords: | Bugfix, OtherQA, TestOnly, Triaged | ||||||
| Target Release: | 16.2 (Train on RHEL 8.4) | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | Type: | Bug | |||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 1915308, 2037489, 2042651 | ||||||||
| Bug Blocks: | 2131355 | ||||||||
| Attachments: |
|
||||||||
|
Comment 5
Marcelo Ricardo Leitner
2021-12-08 22:56:53 UTC
As I don't have a timestamp to correlate, lets check all errors logged:
$ perf script | sed -n 's/.*netlink://p' | sort -u
No kallsyms or vmlinux with build-id 5019ef4e3f199fc0f67e3b59495e1cdb92ca2e07 was found
netlink_extack: msg=mlx5_core: devices are not on same switch HW, can't offload forwarding
netlink_extack: msg=mlx5_core: The offload action is not supported
netlink_extack: msg=mlx5_core: Unsupported key
Flow in question:
ufid:a9b6fbe4-a76a-421f-bd14-905f7fec31d5, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens3f0_48),packet_type(ns=0/0,id=0/0),eth(src=80:19:00:00:00:01,dst=00:00:00:00:44:44),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0x3,hlimit=0/0,frag=no), packets:981670521, bytes:4775302999578, used:0.000s, dp:tc, actions:set(tunnel(tun_id=0x0,src=172.20.16.13,dst=10.75.125.168,ttl=64,flags(key))),gre_sys,push_vlan(vid=610,pcp=0),bond1
gre_sys action is supported, as there's this flow using it and got offloaded:
ufid:8b3584e2-d624-4d2e-a629-bcbd79eeeffb, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(bond1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:44:44,dst=80:19:00:00:00:01),eth_type(0x8100),vlan(vid=610,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0x3,ttl=0/0,frag=no)), packets:984521022, bytes:4851832336920, used:0.160s, offloaded:yes, dp:tc, actions:pop_vlan,ens3f0_48,set(tunnel(tun_id=0x0,src=172.20.16.13,dst=10.75.125.168,ttl=64,flags(key))),gre_sys
So I'm thinking it's about the "Unsupported key" one. I don't know why, but the first is matching on:
ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0x3,hlimit=0/0,frag=no)
^^^^^^^^^^^^
tclass goes with ipv6 flow label, and:
FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_tags */
but:
mlx5/en_tc.c: __parse_cls_flower()
if (dissector->used_keys &
~(BIT(FLOW_DISSECTOR_KEY_META) |
BIT(FLOW_DISSECTOR_KEY_CONTROL) |
BIT(FLOW_DISSECTOR_KEY_BASIC) |
BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) |
BIT(FLOW_DISSECTOR_KEY_VLAN) |
BIT(FLOW_DISSECTOR_KEY_CVLAN) |
BIT(FLOW_DISSECTOR_KEY_IPV4_ADDRS) |
BIT(FLOW_DISSECTOR_KEY_IPV6_ADDRS) |
BIT(FLOW_DISSECTOR_KEY_PORTS) |
BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) |
BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) |
BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) |
BIT(FLOW_DISSECTOR_KEY_ENC_PORTS) |
BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) |
BIT(FLOW_DISSECTOR_KEY_TCP) |
BIT(FLOW_DISSECTOR_KEY_IP) |
BIT(FLOW_DISSECTOR_KEY_CT) |
BIT(FLOW_DISSECTOR_KEY_ENC_IP) |
BIT(FLOW_DISSECTOR_KEY_ENC_OPTS) |
BIT(FLOW_DISSECTOR_KEY_MPLS))) {
NL_SET_ERR_MSG_MOD(extack, "Unsupported key"); <-- our error msg
netdev_dbg(priv->netdev, "Unsupported key used: 0x%x\n", <-- [A]
dissector->used_keys);
return -EOPNOTSUPP;
}
FLOW_DISSECTOR_KEY_FLOW_LABEL is not listed as supported.
I see other flows using tclass=0/0x3, but they were handling broadcasts and outputting to multiple ports. They likely hit the error of not offloading because the interfaces are not on the same HW (different error).
Please enable that debug message and let's confirm that it is getting printed.
To enable it:
# echo 'func __parse_cls_flower +p' > /sys/kernel/debug/dynamic_debug/control
You should see the log in dmesg or journal, depending on your configs.
If confirmed, we need to understand why ovs is adding this match.
Expected code to be logged on the msg is: crash> px (int)FLOW_DISSECTOR_KEY_FLOW_LABEL $2 = 0xb <--- Realized during the mtg today that tclass=0/0x3 may be translated into ip_tos 0/0x3, which should be supported as I saw other flows using it and got offloaded. Hmmm. Still worth the test above, btw. The number in unsupported key msg can be translated with this table: http://pastebin.test.redhat.com/1014795 (stored forever) Please note that the extack would be logged only once and only when the flow is created in the datapath. I'm afraid we're not capturing the error due to test procedure. We need to wait for the hot flows to idle out and then try again. Only then the error will be logged. (In reply to Marcelo Ricardo Leitner from comment #7) > As I don't have a timestamp to correlate, lets check all errors logged: > > $ perf script | sed -n 's/.*netlink://p' | sort -u > No kallsyms or vmlinux with build-id > 5019ef4e3f199fc0f67e3b59495e1cdb92ca2e07 was found > netlink_extack: msg=mlx5_core: devices are not on same switch HW, can't > offload forwarding > netlink_extack: msg=mlx5_core: The offload action is not supported > netlink_extack: msg=mlx5_core: Unsupported key > > Flow in question: > ufid:a9b6fbe4-a76a-421f-bd14-905f7fec31d5, > skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0), > ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ens3f0_48),packet_type(ns=0/ > 0,id=0/0),eth(src=80:19:00:00:00:01,dst=00:00:00:00:44:44),eth_type(0x86dd), > ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0x3,hlimit=0/0, > frag=no), packets:981670521, bytes:4775302999578, used:0.000s, dp:tc, > actions:set(tunnel(tun_id=0x0,src=172.20.16.13,dst=10.75.125.168,ttl=64, > flags(key))),gre_sys,push_vlan(vid=610,pcp=0),bond1 > > gre_sys action is supported, as there's this flow using it and got offloaded: > ufid:8b3584e2-d624-4d2e-a629-bcbd79eeeffb, > skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0), > ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(bond1),packet_type(ns=0/0, > id=0/0),eth(src=00:00:00:00:44:44,dst=80:19:00:00:00:01),eth_type(0x8100), > vlan(vid=610,pcp=0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0. > 0.0/0.0.0.0,proto=0/0,tos=0/0x3,ttl=0/0,frag=no)), packets:984521022, > bytes:4851832336920, used:0.160s, offloaded:yes, dp:tc, > actions:pop_vlan,ens3f0_48,set(tunnel(tun_id=0x0,src=172.20.16.13,dst=10.75. > 125.168,ttl=64,flags(key))),gre_sys > > So I'm thinking it's about the "Unsupported key" one. I don't know why, but > the first is matching on: > ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0x3,hlimit=0/0,frag=no) > ^^^^^^^^^^^^ > > tclass goes with ipv6 flow label, and: > FLOW_DISSECTOR_KEY_FLOW_LABEL, /* struct flow_dissector_key_tags */ > > but: > mlx5/en_tc.c: __parse_cls_flower() > > if (dissector->used_keys & > ~(BIT(FLOW_DISSECTOR_KEY_META) | > BIT(FLOW_DISSECTOR_KEY_CONTROL) | > BIT(FLOW_DISSECTOR_KEY_BASIC) | > BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | > BIT(FLOW_DISSECTOR_KEY_VLAN) | > BIT(FLOW_DISSECTOR_KEY_CVLAN) | > BIT(FLOW_DISSECTOR_KEY_IPV4_ADDRS) | > BIT(FLOW_DISSECTOR_KEY_IPV6_ADDRS) | > BIT(FLOW_DISSECTOR_KEY_PORTS) | > BIT(FLOW_DISSECTOR_KEY_ENC_KEYID) | > BIT(FLOW_DISSECTOR_KEY_ENC_IPV4_ADDRS) | > BIT(FLOW_DISSECTOR_KEY_ENC_IPV6_ADDRS) | > BIT(FLOW_DISSECTOR_KEY_ENC_PORTS) | > BIT(FLOW_DISSECTOR_KEY_ENC_CONTROL) | > BIT(FLOW_DISSECTOR_KEY_TCP) | > BIT(FLOW_DISSECTOR_KEY_IP) | > BIT(FLOW_DISSECTOR_KEY_CT) | > BIT(FLOW_DISSECTOR_KEY_ENC_IP) | > BIT(FLOW_DISSECTOR_KEY_ENC_OPTS) | > BIT(FLOW_DISSECTOR_KEY_MPLS))) { > NL_SET_ERR_MSG_MOD(extack, "Unsupported key"); > <-- our error msg > netdev_dbg(priv->netdev, "Unsupported key used: 0x%x\n", > <-- [A] > dissector->used_keys); > return -EOPNOTSUPP; > } > > FLOW_DISSECTOR_KEY_FLOW_LABEL is not listed as supported. > I see other flows using tclass=0/0x3, but they were handling broadcasts and > outputting to multiple ports. They likely hit the error of not offloading > because the interfaces are not on the same HW (different error). > > Please enable that debug message and let's confirm that it is getting > printed. > To enable it: > # echo 'func __parse_cls_flower +p' > > /sys/kernel/debug/dynamic_debug/control > > You should see the log in dmesg or journal, depending on your configs. > If confirmed, we need to understand why ovs is adding this match. Hi Marcelo, mlx5 checks the keys first, before parsing and checking the actions so if there were other flows with this FLOW_LABEL key I would expect them to fail on the unsupported key check (before they fail on the forwarding check) (In reply to Jamie Fargen from comment #19) > We are now seeing these messages in /var/log/openvswitch/ovs-vswitchd.log. > > 2021-12-13T21:09:18.214Z|00022|netdev_offload_tc(handler2)|DBG|offloading > attribute icmpv6_type isn't supported > 2021-12-13T21:10:01.247Z|00006|netdev_offload_tc(handler10)|DBG|offloading > attribute icmpv6_type isn't supported > 2021-12-13T21:10:02.068Z|00007|netdev_offload_tc(handler10)|DBG|offloading > attribute icmpv6_type isn't supported > 2021-12-13T21:10:03.213Z|00003|netdev_offload_tc(handler26)|DBG|offloading > attribute icmpv6_type isn't supported Indeed currently OVS code blocks offloading rules that match ICMPv6 type: else if (key->dl_type == htons(ETH_TYPE_IPV6) && key->nw_proto == IPPROTO_ICMPV6) { if (mask->tp_src) { VLOG_DBG_RL(&rl, "offloading attribute icmpv6_type isn't supported"); return EOPNOTSUPP; } And that should be fine (to not be offloaded). ICMPs are supposed to be low traffic. Created attachment 1846641 [details]
net/mlx5: Set reformat action when needed for termination rules
Created attachment 1846642 [details]
net/mlx5: DR, Use FW API when updating FW-owned flow table
42b3d7b671b ("net/mlx5: Set reformat action when needed for termination rules") from v5.13-rc4
a01a43fa16e1 ("net/mlx5: DR, Use FW API when updating FW-owned flow table") from v5.15-rc1
attached and cleanly applied on 8.4 branch are probably the needed fix for the issue.
brew build link:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41973349
Need clone bugzilla for RHEL to bring in driver patches Customer is requesting hotfix and this with patches when included with RHEL 8.4z Increased priority and severity as per the urgency shown by VZ for this. Ping? Is this item solved, or are we waiting for some testing? If so, any ETA? |