Bug 1859244 - Failure when modifying bridge multicast-snooping from 0 to 1
Summary: Failure when modifying bridge multicast-snooping from 0 to 1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.4
Assignee: Ivan Vecera
QA Contact: Fei Liu
URL: git://cnb.usersys.redhat.com/users/iv...
Whiteboard:
Depends On:
Blocks: 1855160
Reported: 2020-07-21 14:44 UTC by Fernando F. Mancera
Modified: 2021-05-18 13:55 UTC
CC List: 9 users

Fixed In Version: kernel-4.18.0-242.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-18 13:54:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Shell script to reproduce the issue (2.68 KB, text/plain)
2020-07-21 15:09 UTC, Thomas Haller

Description Fernando F. Mancera 2020-07-21 14:44:11 UTC
Description of problem:

Changing the bridge multicast-snooping setting from 0 to 1 fails.

Version-Release number of selected component (if applicable):

NetworkManager 1.26.0

How reproducible:

High

Actual results:

multicast-snooping cannot be changed from 0 to 1.

Expected results:

multicast-snooping is changed from 0 to 1.

Comment 1 Thomas Haller 2020-07-21 15:09:57 UTC
Created attachment 1701925
Shell script to reproduce the issue

# uname -a
Linux hp-dl388g8-04.rhts.eng.pek2.redhat.com 4.18.0-226.el8.x86_64 #1 SMP Wed Jul 15 07:40:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


# nmcli -o con show br_test
connection.id:                          br_test
connection.uuid:                        96c53115-7475-47e8-be16-23c1083a1838
connection.type:                        bridge
connection.interface-name:              br_test
connection.timestamp:                   1595342459
connection.autoconnect-slaves:          1 (yes)
connection.lldp:                        disable
802-3-ethernet.cloned-mac-address:      AA:CF:A8:2D:6A:48
802-3-ethernet.mtu:                     1500
ipv4.method:                            manual
ipv4.addresses:                         192.168.0.254/24
ipv4.dhcp-client-id:                    mac
ipv6.method:                            manual
ipv6.addresses:                         2000::a/64
ipv6.addr-gen-mode:                     eui64
ipv6.dhcp-duid:                         ll
ipv6.dhcp-iaid:                         mac
bridge.stp:                             no


The problem does not seem to reproduce when the device is first created and activated. However, it can be reproduced by clearing multicast_snooping externally (via sysfs) and then reactivating the profile:

[1595339576.9233] platform-linux: sysctl: setting 'net:/sys/class/net/br_test/bridge/multicast_snooping' to '1' (current value is '0')
[1595339576.9233] platform-linux: sysctl: failed to set 'bridge/multicast_snooping' to '1': (17) File exists
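The manual reproduction steps described above can be sketched as a small shell function. This is a hypothetical helper, not the attached script: the function name reproduce_nm is made up, it assumes root and a NetworkManager bridge profile whose name matches its interface (br_test here), and the SYSFS_NET and NMCLI overrides exist only so the steps can be dry-run without a real bridge.

```shell
#!/bin/sh
# Hypothetical helper (not the attached reproducer): clear multicast_snooping
# via sysfs, reactivate the NetworkManager profile, then print the result.
# Assumes root and a bridge profile whose name matches the interface name.
# SYSFS_NET and NMCLI are overridable only to allow a dry run.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}
NMCLI=${NMCLI:-nmcli}

reproduce_nm() {
    br=$1
    attr="$SYSFS_NET/$br/bridge/multicast_snooping"

    # Clear multicast snooping behind NetworkManager's back.
    echo 0 > "$attr" || return 1

    # Reactivate the profile; NetworkManager then writes the profile's
    # multicast-snooping value (default: yes, i.e. 1) back to sysfs.
    "$NMCLI" connection up "$br" > /dev/null || return 1

    # Print the resulting value: 0 means the write did not take effect.
    cat "$attr"
}

# Usage (as root): reproduce_nm br_test
```

On an affected kernel the final value may still read 0, matching the "(17) File exists" failure in the log above.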

With the attached script I can sometimes reproduce this. It does not happen every time, but when it does, the output is:

  SETTING SNOOPING:
  ./rh1859244-reproducer.sh: line 24: echo: write error: File exists
  SETTING SNOOPING DONE
  0

It seems the upstream kernel reworked this code heavily in commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19e3a9c90c53479fecaa02307bf2db5ab8b3ffe3

Reassigning to Kernel for investigation.

Comment 3 Ivan Vecera 2020-09-30 17:47:11 UTC
The root cause is a race condition between br_multicast_set_hash_max() and br_multicast_toggle()...

Writing the sysfs attribute bridge/hash_max calls br_multicast_set_hash_max(), which calls br_mdb_rehash() to copy the old hash table into a new one. The hash table is protected by RCU, and the old one is freed only after an RCU grace period. If bridge/multicast_snooping is written (which calls br_multicast_toggle()) before that grace period has elapsed, the old hash table still exists and -EEXIST is returned.

The upstream commit mentioned above replaces the custom hash-table implementation with the kernel's generic rhashtable. After this refactoring, the race condition is no longer possible.
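Given the mechanism described above (the write fails only while the old hash table is still waiting for its RCU grace period), a possible stopgap on kernels without the fix is to retry the write after a short pause. The sketch below is a hypothetical workaround, not something proposed in this bug: the function name, the retry count (5), and the 0.1 s delay are arbitrary assumptions. It only narrows the window, since another hash_max write can start a new grace period; the kernel update remains the actual fix.

```shell
#!/bin/sh
# Hypothetical workaround sketch for kernels without the fix: retry the
# multicast_snooping write, pausing between attempts so the RCU grace
# period that frees the old hash table can elapse. The retry count and
# delay are arbitrary assumptions, not tuned values.
set_snooping_retry() {
    attr=$1
    value=$2
    tries=${3:-5}
    i=1
    while [ "$i" -le "$tries" ]; do
        # While the old table still exists, the write fails with EEXIST
        # ("File exists"); hide the shell's error message and try again.
        if { echo "$value" > "$attr"; } 2>/dev/null; then
            return 0
        fi
        sleep 0.1
        i=$((i + 1))
    done
    return 1
}

# Usage (as root):
#   set_snooping_retry /sys/class/net/br_test/bridge/multicast_snooping 1
```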

Simplified reproducer:

#!/bin/sh

ip link add br_test type bridge
ip link set up br_test

n=1
while true; do
        ip link set down br_test
        echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
        sleep 0.1
        ip link set up br_test
        echo 4096 > /sys/class/net/br_test/bridge/hash_max
        echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
        n=$((n+1))
        test "$n" -eq 100 && break
done

echo "Number of attempts: $n"

ip link del br_test

Comment 5 Ivan Vecera 2020-09-30 17:52:02 UTC
Without proposed patch-set:

[root@el8lat ~]# ./test.sh 
./test.sh: line 13: echo: write error: File exists
Number of attempts: 2

With proposed patch-set:

[root@el8lat ~]# ./test.sh 
Number of attempts: 100

Comment 6 Fei Liu 2020-10-14 07:56:55 UTC
Reproduced with the reproducer in comment 5:

./test.sh 
./test.sh: line 12: echo: write error: File exists
Number of attempts: 2


# nmcli -v
nmcli tool, version 1.26.0-8.el8
# uname -a
Linux dell-per740-16.rhts.eng.pek2.redhat.com 4.18.0-240.el8.x86_64 #1 SMP Wed Sep 23 05:13:10 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

Comment 7 Jan Stancek 2020-10-31 07:34:03 UTC
Patch(es) available on kernel-4.18.0-240.8.el8.dt4

Comment 8 Fei Liu 2020-11-02 07:08:20 UTC
# uname -a
Linux dell-per740-26.rhts.eng.pek2.redhat.com 4.18.0-240.8.el8.dt4.x86_64 #1 SMP Fri Oct 30 14:11:00 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
# cat test.sh
ip link add br_test type bridge
ip link set up br_test

n=1
while true; do
        ip link set down br_test
        echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
        sleep 0.1
        ip link set up br_test
        echo 4096 > /sys/class/net/br_test/bridge/hash_max
        echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
        n=$((n+1))
        test "$n" -eq 100 && break
done

echo "Number of attempts: $n"

ip link del br_test

[root@dell-per740-26 ~]# ./test.sh 
Number of attempts: 100


Ran the bridge test case; no new issues found.

Comment 9 Jan Stancek 2020-11-04 19:14:16 UTC
Patch(es) available on kernel-4.18.0-242.el8
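To check whether a given RHEL 8 kernel is at or after the first build listed as containing the fix (kernel-4.18.0-242.el8, per "Fixed In Version"), the release strings can be compared with GNU sort -V. This is a hypothetical helper with a made-up name; it only compares version strings, so test builds such as 4.18.0-240.8.el8.dt4 (which carried the patches) and any backports to older streams are not recognized as fixed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a RHEL 8 kernel release sorts at or after
# kernel-4.18.0-242.el8, the first build listed as containing the fix.
# Relies on GNU sort -V. Test builds (e.g. 4.18.0-240.8.el8.dt4) and
# backports to other streams are NOT detected.
FIXED_RELEASE=4.18.0-242.el8

kernel_has_fix() {
    release=${1:-$(uname -r)}
    # Version-sort both strings; if the fixed release comes first
    # (or they are equal), the given release is at least as new.
    lowest=$(printf '%s\n%s\n' "$FIXED_RELEASE" "$release" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_RELEASE" ]
}

# Usage:
#   if kernel_has_fix; then echo "fix included"; else echo "fix not included"; fi
```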

Comment 14 Fei Liu 2020-11-09 03:50:26 UTC
# uname -r
4.18.0-242.el8.x86_64

[root@dell-per740-16 ~]# ./test.sh 
Number of attempts: 100
[root@dell-per740-16 ~]# cat test.sh 
#!/bin/sh

ip link add br_test type bridge
ip link set up br_test

n=1
while true; do
        ip link set down br_test
        echo 0 > /sys/class/net/br_test/bridge/multicast_snooping
        sleep 0.1
        ip link set up br_test
        echo 4096 > /sys/class/net/br_test/bridge/hash_max
        echo 1 > /sys/class/net/br_test/bridge/multicast_snooping || break
        n=$((n+1))
        test "$n" -eq 100 && break
done

echo "Number of attempts: $n"

ip link del br_test

Ran the bridge test case; no new issues found.
https://beaker.engineering.redhat.com/jobs/4709353
https://beaker.engineering.redhat.com/jobs/4709354

Comment 16 errata-xmlrpc 2021-05-18 13:54:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1578

