Bug 2224982
| Summary: | [ovs dpdk bonding]ovs-acppctl bond/show status is earlier than ovs-vsctl get interface link_state | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | mhou <mhou> |
| Component: | openvswitch | Assignee: | Mike Pattrick <mpattric> |
| openvswitch sub component: | ovs-dpdk | QA Contact: | mhou <mhou> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | medium | ||
| Priority: | unspecified | CC: | ctrautma, jhsiao, ktraynor |
| Version: | RHEL 9.0 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-07-25 02:23:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
*** Bug 2224981 has been marked as a duplicate of this bug. *** After check with Mike. Mike thinks this phenomenon is expected. The bond log message happens during the bridge update, so it could be delayed if something else on the bridge is taking too long. So neither are instantaneous. The DPDK log message is just on a 5 second timer. the bond log message starts off in the main run() function. run() keeps everything up to date and that means calling bond_run(). bond_run() updates which ports are active / inactive, up or down, and decides if flows need to be revalidated. These run in different threads, and because of that 5 second timer. You're a lot more likely to get the bond log message first. If you ran this test 100 times, I bet 75% of the time, the bond log message would come first. Because run() takes less than 5 seconds usually. may_enable is not the same as link status,It also includes things like lacp status. If corresponding switch port down, the lacp status will change. Then appctl check the bond change and display it. lacp status changed earlier than link flapping. That's why may_enable: false. ovs-vsctl get interface ens1f0np0 link_state is complex. OVS is setting that value in OVS DB. So there could be a delay there. |
Description of problem: create balance-slb port or balance-tcp port with lacp as active and bond-downdelay=60000 bond-updelay=60000. When try down active member, find may_enable status (ovs-appctl bond/show)changed before lisk_state changed(ovs-vsctl get interface) This issue occurred when ovs side configure lacp as active. Version-Release number of selected component (if applicable): openvswitch3.1-3.1.0-35.el9fdp test nic: # ethtool -i ens1f0np0 driver: mlx5_core version: 5.14.0-341.el9.x86_64 firmware-version: 16.34.1002 (MT_0000000080) expansion-rom-version: bus-info: 0000:12:00.0 supports-statistics: yes supports-test: yes supports-eeprom-access: no supports-register-dump: no supports-priv-flags: yes # ethtool -i ens1f1np1 driver: mlx5_core version: 5.14.0-341.el9.x86_64 firmware-version: 16.34.1002 (MT_0000000080) expansion-rom-version: bus-info: 0000:12:00.1 supports-statistics: yes supports-test: yes supports-eeprom-access: no supports-register-dump: no supports-priv-flags: yes How reproducible: always Steps to Reproduce: 1. create ovs dpdkbonding port. systemctl restart openvswitch ovs-appctl vlog/set console:lacp:dbg syslog:lacp:dbg file:lacp:dbg ovs-appctl vlog/set console:bond:dbg syslog:bond:dbg file:bond:dbg ovs-appctl vlog/set console:ovs_rcu:dbg syslog:ovs_rcu:dbg file:ovs_rcu:dbg ovs-appctl vlog/set console:memory:dbg syslog:memory:dbg file:memory:dbg ovs-vsctl set Open_vSwitch . other_config={} ovs-vsctl --if-exists del-br bondbridge ovs-vsctl --may-exist add-br bondbridge ovs-vsctl set int bondbridge mtu_request=9000 ovs-ofctl mod-port bondbridge bondbridge up ovs-vsctl add-bond bondbridge balance-slb ens1f1np1 ens1f0np0 lacp=active bond_mode=balance-slb -- set Interface ens1f1np1 type=dpdk options:dpdk-devargs=class=eth,mac=04:3f:72:b0:35:33 options:n_rxq=4 mtu_request=9000 -- set Interface ens1f0np0 type=dpdk options:dpdk-devargs=class=eth,mac=04:3f:72:b0:35:32 options:n_rxq=4 mtu_request=9000 -- set Port balance-slb bond-downdelay=60000 bond-updelay=60000 please note: use pci address as devargs won't aviod this issue. ovs-vsctl add-bond bondbridge balance-slb ens1f1np1 ens1f0np0 lacp=active bond_mode=balance-slb -- set Interface ens1f1np1 type=dpdk options:dpdk-devargs=0000:12:00.0 options:n_rxq=4 mtu_request=9000 -- set Interface ens1f0np0 type=dpdk options:dpdk-devargs=0000:12:00.1 options:n_rxq=4 mtu_request=9000 -- set Port balance-slb bond-downdelay=60000 bond-updelay=60000 2. check lacp status and bond status # ovs-appctl bond/show ---- balance-slb ---- bond_mode: balance-slb bond may use recirculation: no, Recirc-ID : -1 bond-hash-basis: 0 lb_output action: disabled, bond-id: -1 all members active: false updelay: 60000 ms downdelay: 60000 ms next rebalance: 5495 ms lacp_status: negotiated lacp_fallback_ab: false active-backup primary: <none> active member mac: 04:3f:72:b0:35:32(ens1f0np0) member ens1f0np0: enabled active member may_enable: true member ens1f1np1: enabled may_enable: true # ovs-appctl lacp/show ---- balance-slb ---- status: active negotiated sys_id: 04:3f:72:b0:35:32 sys_priority: 65534 aggregation key: 7 lacp_time: slow member: ens1f0np0: current attached port_id: 8 port_priority: 65535 may_enable: true actor sys_id: 04:3f:72:b0:35:32 actor sys_priority: 65534 actor port_id: 8 actor port_priority: 65535 actor key: 7 actor state: activity aggregation synchronized collecting distributing partner sys_id: b0:c5:3c:f6:36:c7 partner sys_priority: 32768 partner port_id: 265 partner port_priority: 32768 partner key: 56 partner state: activity aggregation synchronized collecting distributing member: ens1f1np1: current attached port_id: 7 port_priority: 65535 may_enable: true actor sys_id: 04:3f:72:b0:35:32 actor sys_priority: 65534 actor port_id: 7 actor port_priority: 65535 actor key: 7 actor state: activity aggregation synchronized collecting distributing partner sys_id: b0:c5:3c:f6:36:c7 partner sys_priority: 32768 partner port_id: 266 partner port_priority: 32768 partner key: 56 partner state: activity aggregation synchronized collecting distributing 3. down current active member on switch side. # swcfg port_down 9364 "eth1/3/1" spawn ssh admin.88.13 User Access Verification (admin.88.13) Password: Cisco Nexus Operating System (NX-OS) Software TAC support: http://www.cisco.com/tac Copyright (C) 2002-2022, Cisco and/or its affiliates. All rights reserved. The copyrights to certain works contained in this software are owned by other third parties and used and distributed under their own licenses, such as open source. This software is provided "as is," and unless otherwise stated, there is no warranty, express or implied, including but not limited to warranties of merchantability and fitness for a particular purpose. Certain components of this software are licensed under the GNU General Public License (GPL) version 2.0 or GNU General Public License (GPL) version 3.0 or the GNU Lesser General Public License (LGPL) Version 2.1 or Lesser General Public License (LGPL) Version 2.0. A copy of each such license is available at http://www.opensource.org/licenses/gpl-2.0.php and http://opensource.org/licenses/gpl-3.0.html and http://www.opensource.org/licenses/lgpl-2.1.php and http://www.gnu.org/licenses/old-licenses/library.txt. sw-cisco9364# \r ^ % Incomplete command at '^' marker. sw-cisco9364# terminal length 0 sw-cisco9364# configure t Enter configuration commands, one per line. End with CNTL/Z. sw-cisco9364(config)# interface eth1/3/1 sw-cisco9364(config-if)# shutdown sw-cisco9364(config-if)# end sw-cisco9364# show interface eth1/3/1 status -------------------------------------------------------------------------------- Port Name Status Vlan Duplex Speed Type -------------------------------------------------------------------------------- Eth1/3/1 -- disabled trunk auto auto QSFP-4SFP25G-CU3M sw-cisco9364# end sw-cisco9364# 4. check the members status flapping on ovs-switchd.log. wait for 2-3s, the port link status changed. 2023-07-24T06:55:32.678Z|01850|bond|INFO|member ens1f0np0: link state down 2023-07-24T06:55:32.678Z|01851|bond|INFO|member ens1f0np0: will be disabled if it stays down for 60000 ms 2023-07-24T06:55:35.660Z|01852|netdev_dpdk|DBG|Port 0 Link Down same situation can monitor on ovs-appctl bond/show and ovs-vsctl get interface ens1f0np0 link_state 5. confiure balance-slb port with lacp passive ovs-vsctl add-bond bondbridge balance-slb ens1f1np1 ens1f0np0 lacp=passive bond_mode=balance-slb -- set Interface ens1f1np1 type=dpdk options:dpdk-devargs=0000:12:00.0 options:n_rxq=4 mtu_request=9000 -- set Interface ens1f0np0 type=dpdk options:dpdk-devargs=0000:12:00.1 options:n_rxq=4 mtu_request=9000 -- set Port balance-slb bond-downdelay=60000 bond-updelay=60000 6. down slave interface ens1f0np0 and check ovs-vswitchd.lgo swcfg port_down 9364 "eth1/3/1" spawn ssh admin.88.13 User Access Verification (admin.88.13) Password: Cisco Nexus Operating System (NX-OS) Software TAC support: http://www.cisco.com/tac Copyright (C) 2002-2022, Cisco and/or its affiliates. All rights reserved. The copyrights to certain works contained in this software are owned by other third parties and used and distributed under their own licenses, such as open source. This software is provided "as is," and unless otherwise stated, there is no warranty, express or implied, including but not limited to warranties of merchantability and fitness for a particular purpose. Certain components of this software are licensed under the GNU General Public License (GPL) version 2.0 or GNU General Public License (GPL) version 3.0 or the GNU Lesser General Public License (LGPL) Version 2.1 or Lesser General Public License (LGPL) Version 2.0. A copy of each such license is available at http://www.opensource.org/licenses/gpl-2.0.php and http://opensource.org/licenses/gpl-3.0.html and http://www.opensource.org/licenses/lgpl-2.1.php and http://www.gnu.org/licenses/old-licenses/library.txt. sw-cisco9364# \r ^ % Incomplete command at '^' marker. sw-cisco9364# terminal length 0 sw-cisco9364# configure t Enter configuration commands, one per line. End with CNTL/Z. sw-cisco9364(config)# interface eth1/3/1 sw-cisco9364(config-if)# shutdown sw-cisco9364(config-if)# end sw-cisco9364# show interface eth1/3/1 status -------------------------------------------------------------------------------- Port Name Status Vlan Duplex Speed Type -------------------------------------------------------------------------------- Eth1/3/1 -- disabled trunk auto auto QSFP-4SFP25G-CU3M sw-cisco9364# end sw-cisco9364# 2023-07-24T07:27:44.443Z|02177|netdev_dpdk|DBG|Port 0 Link Down 2023-07-24T07:27:45.952Z|02178|bond|DBG|bond balance-slb: ens1f0np0 0kB (h88: 0kB), ens1f1np1 0kB 2023-07-24T07:27:54.635Z|02179|bond|INFO|member ens1f0np0: link state down 2023-07-24T07:27:54.635Z|02180|bond|INFO|member ens1f0np0: will be disabled if it stays down for 60000 ms Actual results: 1. When ovs is configured with lacp active, the member status check of the port is earlier than the link status of the port. Expected results: 1. link status flapping should earlier than member status check. Additional info: On beaker job, I found that the member state is earlier than the port state 30 seconds to flip https://beaker.engineering.redhat.com/recipes/14281186#task163400891 https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/07/80999/8099964/14281186/163400891/ovs-vswitchd.log 2023-07-23T21:27:51.472Z|00292|bond|DBG|bond balance-slb: ens1f0np0 1kB (h40: 0kB + h161: 0kB + h183: 0kB + h140: 0kB + h241: 0kB), ens1f1np1 0kB (h156: 0kB + h108: 0kB + h5: 0kB + h72: 0kB) 2023-07-23T21:28:01.935Z|00293|bond|DBG|bond balance-slb: ens1f0np0 0kB (h40: 0kB + h161: 0kB + h183: 0kB + h140: 0kB + h241: 0kB), ens1f1np1 0kB (h156: 0kB + h108: 0kB + h5: 0kB + h72: 0kB) 2023-07-23T21:28:03.937Z|00294|bond|INFO|member ens1f0np0: link state down 2023-07-23T21:28:03.937Z|00295|bond|INFO|member ens1f0np0: will be disabled if it stays down for 60000 ms 2023-07-23T21:28:11.951Z|00296|bond|DBG|bond balance-slb: ens1f0np0 0kB (h40: 0kB + h161: 0kB + h183: 0kB + h140: 0kB + h241: 0kB), ens1f1np1 0kB (h156: 0kB + h108: 0kB + h5: 0kB + h72: 0kB) 2023-07-23T21:28:21.961Z|00297|bond|DBG|bond balance-slb: ens1f0np0 0kB (h40: 0kB + h161: 0kB + h183: 0kB + h140: 0kB + h241: 0kB), ens1f1np1 0kB (h156: 0kB + h108: 0kB + h5: 0kB + h72: 0kB) 2023-07-23T21:28:31.971Z|00298|bond|DBG|bond balance-slb: ens1f0np0 0kB (h40: 0kB + h161: 0kB + h183: 0kB + h140: 0kB + h241: 0kB), ens1f1np1 0kB (h156: 0kB + h108: 0kB + h5: 0kB + h72: 0kB) 2023-07-23T21:28:33.247Z|00299|netdev_dpdk|DBG|Port 0 Link Down 2023-07-23T21:28:42.266Z|00300|bond|DBG|bond balance-slb: ens1f0np0 0kB (h40: 0kB + h161: 0kB + h183: 0kB + h140: 0kB + h241: 0kB), ens1f1np1 0kB (h156: 0kB + h108: 0kB + h5: 0kB + h72: 0kB) 2023-07-23T21:28:52.277Z|00301|bond|DBG|bond balance-slb: ens1f0np0 0kB (h183: 0kB + h40: 0kB + h161: 0kB + h140: 0kB + h241: 0kB), ens1f1np1 0kB (h156: 0kB + h108: 0kB + h5: 0kB + h72: 0kB) 2023-07-23T21:29:02.287Z|00302|bond|DBG|bond balance-slb: ens1f0np0 0kB (h183: 0kB + h40: 0kB + h161: 0kB), ens1f1np1 0kB (h156: 0kB + h108: 0kB) 2023-07-23T21:29:03.938Z|00303|bond|INFO|member ens1f0np0: disabled 2023-07-23T21:29:03.939Z|00304|bond|INFO|bond balance-slb: active member is now ens1f1np1