Bug 2037214
| Summary: | Bond CNI: Bond types don't work correctly except active-backup | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | elevin |
| Component: | Documentation | Assignee: | kquinn |
| Status: | CLOSED DUPLICATE | QA Contact: | Nikita <nkononov> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.10 | CC: | cgoncalves, ealcaniz, liali, mmirecki, sscheink |
| Target Milestone: | --- | ||
| Target Release: | 4.12.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-09-20 12:14:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2039755 | ||
| Bug Blocks: | |||
Description
elevin
2022-01-05 09:18:50 UTC
Could you please provide detailed logs? How were the bond CNI and the bond itself configured in RHEL?

Franck pointed me to "man ip link" (https://man7.org/linux/man-pages/man8/ip-link.8.html):

state auto|enable|disable - set the virtual link state as seen by the specified VF. Setting to auto means a reflection of the PF link state, enable lets the VF to communicate with other VFs on this host even if the PF link state is down, disable causes the HW to drop any packets sent by the VF.

Marcin, perhaps the VF state is set to "enable". Per Franck's recommendation, we want it to default to "auto". A user-configurable option would be nice but is not a must-have for OCP 4.10.

"auto" is the default value of "link-state":
12: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 0c:42:a1:bc:f7:b1 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether b6:79:17:6b:0a:b0 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 1 link/ether 62:6b:a6:cf:56:f8 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 2 link/ether d6:5d:93:4e:c5:bd brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 3 link/ether 9e:8c:a5:54:75:0f brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 4 link/ether 52:a3:c0:2d:93:1d brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
Moreover, sriovnetwork doesn't change "link-state":
Spec:
Link State: disable
Network Namespace: bond-test
Resource Name: three
Spoof Chk: off
4: ens8f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 40:a6:b7:38:b4:e0 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether fa:ab:bb:63:76:87 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 1 link/ether ca:7d:c1:7f:4e:aa brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 2 link/ether 1a:57:27:b9:fa:9e brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 3 link/ether 06:34:c4:5d:70:d2 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 4 link/ether 96:07:cc:c4:b7:79 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
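Listings like the ones above can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the `vf N ...` line layout shown in this report (a MAC/broadcast part followed by comma-separated "name value" attributes); the function names `parse_vf_lines` and `vfs_not_auto` are hypothetical helpers, not part of any SR-IOV tooling.

```python
import re

# Matches one "vf N ..." line as printed under a PF by `ip link show`.
VF_LINE = re.compile(r"^\s*vf\s+(?P<index>\d+)\s+(?P<rest>.*)$")

# Two VF lines copied from the ens8f0 listing in this report.
SAMPLE = """\
4: ens8f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 40:a6:b7:38:b4:e0 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether fa:ab:bb:63:76:87 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 1 link/ether ca:7d:c1:7f:4e:aa brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
"""


def parse_vf_lines(ip_link_output):
    """Return {vf_index: {attribute_name: value}} for every VF line found."""
    vfs = {}
    for line in ip_link_output.splitlines():
        match = VF_LINE.match(line)
        if not match:
            continue  # PF header and PF link/ether lines are skipped
        attrs = {}
        # Everything after the first comma is a "name value" pair,
        # e.g. "spoof checking on" or "link-state auto".
        for field in match.group("rest").split(",")[1:]:
            name, _, value = field.strip().rpartition(" ")
            attrs[name] = value
        vfs[int(match.group("index"))] = attrs
    return vfs


def vfs_not_auto(ip_link_output):
    """Return the sorted indices of VFs whose link-state is not 'auto'."""
    parsed = parse_vf_lines(ip_link_output)
    return sorted(i for i, a in parsed.items() if a.get("link-state") != "auto")


if __name__ == "__main__":
    print(vfs_not_auto(SAMPLE))
```

The same parsed dictionary also exposes the `trust` and `spoof checking` values, so it can be reused to compare what the node reports against what the network definition requests.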
My last statement, that sriovnetwork doesn't change "link-state", is not correct.

Evgeny, thanks! Could you please file a new BZ against RHEL reporting this issue and set a dependency with this one?

Marcin, it is unlikely that the RHEL bond issue will be fixed before OCP 4.10 GA. The release notes and/or bond CNI documentation should highlight this known issue (Doc Type field).

Today, I did a basic test with mode 1,2,4,5,6 bonds over SR-IOV VFs, and didn't see any issue. Ping succeeded even after failover.

So, for BZ #2037214 and https://bugzilla.redhat.com/show_bug.cgi?id=2039755, if the containers are connected by an OVS/OVN virtual network, please make sure the ICMP traffic is not blocked by OVS/OVN rules after failover, as the source MAC will change after failover.

kernel: 4.18.0-305.el8.x86_64
NIC: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

(In reply to LiLiang from comment #6)
> Today, I did a basic test with mode 1,2,4,5,6 bonds over SR-IOV VFs, and
> didn't see any issue. ping could succeed even after failover.
>
> So, for BZ #2037214 and https://bugzilla.redhat.com/show_bug.cgi?id=2039755
> , if the containers are connected by OVS/OVN virtual network, please make
> sure the icmp traffic is not blocked by OVS/OVN rules after failover as the
> src mac will change after failover.
>
> kernel: 4.18.0-305.el8.x86_64
> NIC: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

I did those tests with a VM, not a container.

This VF configuration must be set via the "linkState" parameter in SR-IOV: https://docs.openshift.com/container-platform/4.10/networking/hardware_networks/configuring-sriov-net-attach.html#nw-sriov-network-object_configuring-sriov-net-attach

This needs to be added as a requirement to the bond CNI documentation page that Kevin Quinn is working on in https://github.com/openshift/openshift-docs/pull/47172

Carlos, "linkState" is auto by default.
"trust on" fix the issue https://docs.google.com/presentation/d/1GWLNMZl7oaVDCT7jmFl6qOsRfVR7CSoknluKP8JKqlc/edit#slide=id.g255339b51f_0_890 What if the VF is set to non-auto for a different workload, moved back to the VF pool and later added to a bond? Is the link state reset to "auto" or left unchanged? If left unchanged, the requirement that the link state must be "auto" is valid and should be documented. Evgeny `trust on` is needed to allow the bond inside the pod to change the mac address of the VF Carlos every time we move a vf back from a pod we restore its default mode. but agree with you better to document it Good, we are all in agreement. The doc request is to note that the SR-IOV VF link state must not be changed from the default "auto" value. This was replaced by the following bugs, relating to the specific modes: alb: https://bugzilla.redhat.com/show_bug.cgi?id=2109123 tlb: https://bugzilla.redhat.com/show_bug.cgi?id=2106906 *** This bug has been marked as a duplicate of bug 2109123 *** |