Bug 1859244
| Summary: | Failure when modifying bridge multicast-snooping from 0 to 1 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Fernando F. Mancera <ferferna> | ||||
| Component: | kernel | Assignee: | Ivan Vecera <ivecera> | ||||
| kernel sub component: | Bridge | QA Contact: | Fei Liu <feliu> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | unspecified | ||||||
| Priority: | unspecified | CC: | acardace, atragler, bgalvani, lrintel, network-qe, rkhan, sukulkar, thaller, till | ||||
| Version: | 8.4 | Flags: | pm-rhel: mirror+ | ||||
| Target Milestone: | rc | ||||||
| Target Release: | 8.4 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| URL: | git://cnb.usersys.redhat.com/users/ivecera/rhel-8.git#bz1859244 | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | kernel-4.18.0-242.el8 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2021-05-18 13:54:36 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1855160 | ||||||
| Attachments: | |||||||
|
Description
Fernando F. Mancera
2020-07-21 14:44:11 UTC
Created attachment 1701925 [details]
Shell script to reproduce the issue

# uname -a
Linux hp-dl388g8-04.rhts.eng.pek2.redhat.com 4.18.0-226.el8.x86_64 #1 SMP Wed Jul 15 07:40:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

# nmcli -o con show br_test
connection.id: br_test
connection.uuid: 96c53115-7475-47e8-be16-23c1083a1838
connection.type: bridge
connection.interface-name: br_test
connection.timestamp: 1595342459
connection.autoconnect-slaves: 1 (yes)
connection.lldp: disable
802-3-ethernet.cloned-mac-address: AA:CF:A8:2D:6A:48
802-3-ethernet.mtu: 1500
ipv4.method: manual
ipv4.addresses: 192.168.0.254/24
ipv4.dhcp-client-id: mac
ipv6.method: manual
ipv6.addresses: 2000::a/64
ipv6.addr-gen-mode: eui64
ipv6.dhcp-duid: ll
ipv6.dhcp-iaid: mac
bridge.stp: no

It does not seem to reproduce initially when activating/creating the device. However, it can be reproduced by externally clearing multicast_snooping (via sysfs) and then reactivating the profile:

[1595339576.9233] platform-linux: sysctl: setting 'net:/sys/class/net/br_test/bridge/multicast_snooping' to '1' (current value is '0')
[1595339576.9233] platform-linux: sysctl: failed to set 'bridge/multicast_snooping' to '1': (17) File exists

With the attached script I am sometimes able to reproduce this. It does not always happen, but when it does you see:

SETTING SNOOPING: ./rh1859244-reproducer.sh: line 24: echo: write error: File exists
SETTING SNOOPING DONE 0

It seems the upstream kernel reworked this code heavily in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19e3a9c90c53479fecaa02307bf2db5ab8b3ffe3

Reassigning to Kernel for investigation.

The root cause is a race condition between br_multicast_set_hash_max() and br_multicast_toggle()...
Writing the sysfs variable bridge/hash_max calls br_multicast_set_hash_max(), which calls br_mdb_rehash() to copy the old hash table to a new one. The hash table is protected by RCU, and the old one is freed only after a grace period. If you then write bridge/multicast_snooping, which calls br_multicast_toggle(), before that grace period has elapsed, the old hash table still exists and -EEXIST is returned.
The mentioned commit removes the custom hash-table implementation in favor of the Linux generic rhashtable. After this refactoring, the mentioned race condition is no longer possible.
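If the explanation above holds, the -EEXIST failure on an unpatched kernel should be transient: once the RCU grace period has passed and the old table is freed, the same write would be expected to succeed. A minimal sketch of a userspace retry, under that assumption, is shown here. The helper name write_with_retry and its parameters are hypothetical and are not part of the kernel patch-set or of NetworkManager.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): write VALUE to TARGET,
# retrying a few times with a short pause if the write is rejected.
# Usage: write_with_retry TARGET VALUE [ATTEMPTS] [DELAY_SECONDS]
write_with_retry() {
    target=$1
    value=$2
    attempts=${3:-5}
    delay=${4:-0.1}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # echo returns non-zero when the kernel rejects the write,
        # for example with "File exists" (EEXIST).
        if echo "$value" > "$target" 2>/dev/null; then
            return 0
        fi
        # Assumption: a short pause gives the RCU grace period time to end.
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

On the bridge from this report it could be called as `write_with_retry /sys/class/net/br_test/bridge/multicast_snooping 1`. This only papers over the race; the actual fix is the kernel change discussed above.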
Simplified reproducer:
#!/bin/sh
ip link add br_test type bridge
ip link set up br_test
n=1
while true; do
ip link set down br_test
echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
sleep 0.1
ip link set up br_test
echo 4096 > /sys/class/net/br_test/bridge/hash_max
echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
n=$((n+1))
test "$n" -eq 100 && break
done
echo "Number of attempts: $n"
ip link del br_test
Without proposed patch-set:
[root@el8lat ~]# ./test.sh
./test.sh: line 13: echo: write error: File exists
Number of attempts: 2

With proposed patch-set:
[root@el8lat ~]# ./test.sh
Number of attempts: 100

Reproduced with the reproducer in comment 5:
./test.sh
./test.sh: line 12: echo: write error: File exists
Number of attempts: 2
# nmcli -v
nmcli tool, version 1.26.0-8.el8
# uname -a
Linux dell-per740-16.rhts.eng.pek2.redhat.com 4.18.0-240.el8.x86_64 #1 SMP Wed Sep 23 05:13:10 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

Patch(es) available on kernel-4.18.0-240.8.el8.dt4
# uname -a
Linux dell-per740-26.rhts.eng.pek2.redhat.com 4.18.0-240.8.el8.dt4.x86_64 #1 SMP Fri Oct 30 14:11:00 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
# cat test.sh
ip link add br_test type bridge
ip link set up br_test
n=1
while true; do
ip link set down br_test
echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
sleep 0.1
ip link set up br_test
echo 4096 > /sys/class/net/br_test/bridge/hash_max
echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
n=$((n+1))
test "$n" -eq 100 && break
done
echo "Number of attempts: $n"
ip link del br_test
[root@dell-per740-26 ~]# ./test.sh
Number of attempts: 100
Ran the bridge test cases, didn't find any new issues.
Patch(es) available on kernel-4.18.0-242.el8
# uname -r
4.18.0-242.el8.x86_64
[root@dell-per740-16 ~]# ./test.sh
Number of attempts: 100
[root@dell-per740-16 ~]# cat test.sh
#!/bin/sh
ip link add br_test type bridge
ip link set up br_test
n=1
while true; do
ip link set down br_test
echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
sleep 0.1
ip link set up br_test
echo 4096 > /sys/class/net/br_test/bridge/hash_max
echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
n=$((n+1))
test "$n" -eq 100 && break
done
echo "Number of attempts: $n"
ip link del br_test
Ran the bridge test cases, didn't find any new issues:
https://beaker.engineering.redhat.com/jobs/4709353
https://beaker.engineering.redhat.com/jobs/4709354
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1578 |
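To judge whether a given RHEL 8 host already runs a kernel at or after the Fixed In Version above (kernel-4.18.0-242.el8), a release-string comparison along the following lines may help. This is a sketch only: the helper name release_at_least is hypothetical, it relies on GNU sort's -V version ordering, and it compares release strings without inspecting the kernel itself. Test builds such as 4.18.0-240.8.el8.dt4 carry the patches but still sort below 242, and non-RHEL kernels follow a different numbering.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 sorts at or after $2
# under GNU sort's version ordering (sort -V).
release_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Release taken from the "Fixed In Version" field of this bug.
fixed=4.18.0-242.el8
running=$(uname -r)

if release_at_least "$running" "$fixed"; then
    echo "$running sorts at or after $fixed"
else
    echo "$running sorts before $fixed"
fi
```

Running the simplified reproducer remains the more direct check, since it exercises the race itself and does not depend on version numbering.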