Bug 1859244
Summary: | Failure when modifying bridge multicast-snooping from 0 to 1 | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Fernando F. Mancera <ferferna> | ||||
Component: | kernel | Assignee: | Ivan Vecera <ivecera> | ||||
kernel sub component: | Bridge | QA Contact: | Fei Liu <feliu> | ||||
Status: | CLOSED ERRATA | Docs Contact: | |||||
Severity: | unspecified | ||||||
Priority: | unspecified | CC: | acardace, atragler, bgalvani, lrintel, network-qe, rkhan, sukulkar, thaller, till | ||||
Version: | 8.4 | ||||||
Target Milestone: | rc | ||||||
Target Release: | 8.4 | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
URL: | git://cnb.usersys.redhat.com/users/ivecera/rhel-8.git#bz1859244 | ||||||
Whiteboard: | |||||||
Fixed In Version: | kernel-4.18.0-242.el8 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2021-05-18 13:54:36 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 1855160 | ||||||
Attachments: |
|
Description
Fernando F. Mancera
2020-07-21 14:44:11 UTC
Created attachment 1701925: Shell script to reproduce the issue

    # uname -a
    Linux hp-dl388g8-04.rhts.eng.pek2.redhat.com 4.18.0-226.el8.x86_64 #1 SMP Wed Jul 15 07:40:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

    # nmcli -o con show br_test
    connection.id: br_test
    connection.uuid: 96c53115-7475-47e8-be16-23c1083a1838
    connection.type: bridge
    connection.interface-name: br_test
    connection.timestamp: 1595342459
    connection.autoconnect-slaves: 1 (yes)
    connection.lldp: disable
    802-3-ethernet.cloned-mac-address: AA:CF:A8:2D:6A:48
    802-3-ethernet.mtu: 1500
    ipv4.method: manual
    ipv4.addresses: 192.168.0.254/24
    ipv4.dhcp-client-id: mac
    ipv6.method: manual
    ipv6.addresses: 2000::a/64
    ipv6.addr-gen-mode: eui64
    ipv6.dhcp-duid: ll
    ipv6.dhcp-iaid: mac
    bridge.stp: no

The failure does not seem to reproduce when the device is first created or activated. It can be reproduced, however, by clearing multicast_snooping externally (via sysfs) and then reactivating the profile:

    [1595339576.9233] platform-linux: sysctl: setting 'net:/sys/class/net/br_test/bridge/multicast_snooping' to '1' (current value is '0')
    [1595339576.9233] platform-linux: sysctl: failed to set 'bridge/multicast_snooping' to '1': (17) File exists

With the attached script I am sometimes able to reproduce this. It does not always happen, but when it does you see:

    SETTING SNOOPING:
    ./rh1859244-reproducer.sh: line 24: echo: write error: File exists
    SETTING SNOOPING DONE 0

It seems the upstream kernel reworked this code heavily in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19e3a9c90c53479fecaa02307bf2db5ab8b3ffe3

Reassigning to Kernel for investigation.

The root cause is a race condition between br_multicast_set_hash_max() and br_multicast_toggle(). Writing the sysfs variable bridge/hash_max calls br_multicast_set_hash_max(), which calls br_mdb_rehash() to copy the old hash table into a new one. The hash table is protected by RCU, and the old one is freed only after a grace period.
If bridge/multicast_snooping is set (which calls br_multicast_toggle()) before that grace period has elapsed, the old hash table still exists and -EEXIST is returned. The mentioned commit removes the custom hash-table implementation in favor of the generic Linux rhashtable. After this refactoring the race condition is no longer possible.

Simplified reproducer:

    #!/bin/sh
    ip link add br_test type bridge
    ip link set up br_test

    n=1

    while true; do
        ip link set down br_test
        echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
        sleep 0.1
        ip link set up br_test
        echo 4096 > /sys/class/net/br_test/bridge/hash_max
        echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
        n=$((n+1))
        test "$n" -eq 100 && break
    done
    echo "Number of attempts: $n"
    ip link del br_test

Without proposed patch-set:

    [root@el8lat ~]# ./test.sh
    ./test.sh: line 13: echo: write error: File exists
    Number of attempts: 2

With proposed patch-set:

    [root@el8lat ~]# ./test.sh
    Number of attempts: 100

Reproduced with the reproducer from comment 5:

    ./test.sh
    ./test.sh: line 12: echo: write error: File exists
    Number of attempts: 2
    # nmcli -v
    nmcli tool, version 1.26.0-8.el8
    # uname -a
    Linux dell-per740-16.rhts.eng.pek2.redhat.com 4.18.0-240.el8.x86_64 #1 SMP Wed Sep 23 05:13:10 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

Patch(es) available on kernel-4.18.0-240.8.el8.dt4

    # uname -a
    Linux dell-per740-26.rhts.eng.pek2.redhat.com 4.18.0-240.8.el8.dt4.x86_64 #1 SMP Fri Oct 30 14:11:00 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
    # cat test.sh
    ip link add br_test type bridge
    ip link set up br_test

    n=1

    while true; do
        ip link set down br_test
        echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
        sleep 0.1
        ip link set up br_test
        echo 4096 > /sys/class/net/br_test/bridge/hash_max
        echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
        n=$((n+1))
        test "$n" -eq 100 && break
    done
    echo "Number of attempts: $n"
    ip link del br_test
    [root@dell-per740-26 ~]# ./test.sh
    Number of attempts: 100

Ran the bridge test case; didn't find any new issue.
Patch(es) available on kernel-4.18.0-242.el8

    # uname -r
    4.18.0-242.el8.x86_64
    [root@dell-per740-16 ~]# ./test.sh
    Number of attempts: 100
    [root@dell-per740-16 ~]# cat test.sh
    #!/bin/sh
    ip link add br_test type bridge
    ip link set up br_test

    n=1

    while true; do
        ip link set down br_test
        echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
        sleep 0.1
        ip link set up br_test
        echo 4096 > /sys/class/net/br_test/bridge/hash_max
        echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
        n=$((n+1))
        test "$n" -eq 100 && break
    done
    echo "Number of attempts: $n"
    ip link del br_test

Ran the bridge test case; didn't find any new issue.
https://beaker.engineering.redhat.com/jobs/4709353
https://beaker.engineering.redhat.com/jobs/4709354

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1578 |
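To check a host against the Fixed In Version recorded in this report (kernel-4.18.0-242.el8), a minimal sketch along the following lines may help. The helper name has_fix and its return codes are hypothetical, and it assumes release strings of the form 4.18.0-<build>[.<z>].el8..., comparing only the leading build number. It therefore cannot recognize test builds such as 4.18.0-240.8.el8.dt4 or any z-stream backport that carries the patch under a lower build number.

```shell
#!/bin/sh
# Sketch only: compare a RHEL 8 kernel release string against the
# "Fixed In Version" build from this report (kernel-4.18.0-242.el8).
# Assumption: the release looks like 4.18.0-<build>[.<z>...].el8...
FIXED_BUILD=242

# has_fix RELEASE (hypothetical helper)
#   returns 0 if the build number is >= FIXED_BUILD
#   returns 1 if it is lower
#   returns 2 if RELEASE is not a 4.18.0 el8 release string
has_fix() {
    release=$1
    case $release in
        4.18.0-*.el8*) ;;
        *) return 2 ;;
    esac
    build=${release#4.18.0-}   # strip the "4.18.0-" prefix
    build=${build%%.*}         # keep only the leading build number
    case $build in
        ''|*[!0-9]*) return 2 ;;   # build number is not purely numeric
    esac
    [ "$build" -ge "$FIXED_BUILD" ]
}

# Kernel releases that appear in this report.
for r in 4.18.0-226.el8.x86_64 4.18.0-240.el8.x86_64 4.18.0-242.el8.x86_64; do
    rc=0
    has_fix "$r" || rc=$?
    case $rc in
        0) echo "$r: at or past the fixed build" ;;
        1) echo "$r: older than the fixed build" ;;
        *) echo "$r: not a 4.18.0 el8 release string" ;;
    esac
done
```

To check the running system, the same helper can be called as has_fix "$(uname -r)". The simplified reproducer from this report remains the more direct test.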