Bug 2270062 - NetworkManager-1.46 breaks wifi with RTL8723BS wifi chips
Summary: NetworkManager-1.46 breaks wifi with RTL8723BS wifi chips
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 40
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Lubomir Rintel
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-03-18 11:30 UTC by Hans de Goede
Modified: 2024-03-28 20:07 UTC
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments

Description Hans de Goede 2024-03-18 11:30:39 UTC
After updating a small 2-in-1 with an RTL8723BS wifi chip to F40, wifi stopped working. While debugging this, I updated only NetworkManager from the latest F39 packages to F40 on a different 2-in-1 model, also with an RTL8723BS wifi chip, leaving the rest of the distro at F39, and wifi broke there too.

Downgrading NetworkManager to the F39 packages on the first 2-in-1 (the one updated to F40), while keeping the rest of the distro, including wpa_supplicant, at F40, restored wifi functionality.

Note that chances are this is an issue with the RTL8723BS kernel driver rather than with NM itself (the driver comes from drivers/staging). I guess one of the new wifi features in NM 1.46 may well be hitting a so far unused, broken code path or something like that.

I didn't notice anything useful in the logs during the failure. If you can give me instructions on how to gather more detailed debug logs, I'll gather those and we can go from there.

Also, is there a way to force NM not to use some of the wifi features that are new in 1.46? I would be happy to try that so that this can be pinned down to one specific new feature.

I'm also happy to build a patched NM package and give that a try.


Reproducible: Always

Comment 1 Beniamino Galvani 2024-03-18 12:22:38 UTC
This is possibly caused by https://fedoraproject.org/wiki/Changes/StableSSIDMACAddress

Please check if any of the steps in section "Upgrade/compatibility impact" helps (you need to restart NetworkManager after those changes).

Comment 2 Hans de Goede 2024-03-19 14:22:02 UTC
(In reply to Beniamino Galvani from comment #1)
> This is possibly caused by
> https://fedoraproject.org/wiki/Changes/StableSSIDMACAddress
> 
> Please check if any of the steps in section "Upgrade/compatibility impact"
> helps (you need to restart NetworkManager after those changes).

Thanks, that is indeed what triggers the problem. Doing:

touch /etc/NetworkManager/conf.d/22-wifi-mac-addr.conf

and then restarting NetworkManager works around the problem.
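The workaround above can be captured as a small shell sketch. It assumes, as the command implies, that Fedora 40 ships the stable-ssid default as a snippet named 22-wifi-mac-addr.conf in /usr/lib/NetworkManager/conf.d, and that an empty file with the same name in /etc/NetworkManager/conf.d masks it. mask_nm_snippet and its optional directory argument are hypothetical; the argument exists only so the helper can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch of the workaround (hypothetical helper, not part of NetworkManager).
# An empty file in /etc/NetworkManager/conf.d with the same name as the
# vendor snippet masks that snippet, and with it the stable-ssid default.
mask_nm_snippet() {
    name=$1
    dir=${2:-/etc/NetworkManager/conf.d}   # override dir; pass a scratch dir to test
    mkdir -p "$dir" || return 1
    : > "$dir/$name"                        # create, or truncate to, an empty file
}

# Typical use (as root), followed by a restart so NM rereads its config:
#   mask_nm_snippet 22-wifi-mac-addr.conf
#   systemctl restart NetworkManager
```

Truncating with ": >" instead of "touch" makes sure the override really is empty even if a file with that name already existed. If only one network is affected, a per-profile setting such as "nmcli connection modify <profile> wifi.cloned-mac-address permanent" should be a narrower alternative.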

Note that I have other devices with Intel wifi working without problems with the same access point, so I really suspect this is an issue with the rtl8723bs driver.

Note, FWIW, that connecting fails *really* quickly with this option enabled. Is it possible that the driver simply does not allow changing the MAC address and that this is treated as an error rather than falling back to using the global static MAC address?

What are the next steps in debugging this?

Comment 3 Beniamino Galvani 2024-03-19 15:48:23 UTC
Please increase the logging level of NM by setting "level=TRACE" in the [logging] section of /etc/NetworkManager/NetworkManager.conf. Also, add the "-ddd" argument to OTHER_ARGS in /etc/sysconfig/wpa_supplicant. Then reboot the system, reproduce the connect failure and attach the "journalctl -b" output. Thank you.
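The wpa_supplicant step can be scripted as below. This is a sketch that assumes /etc/sysconfig/wpa_supplicant contains a single line of the form OTHER_ARGS="..."; add_debug_flag is a hypothetical helper and relies on GNU sed's -i option. The commented usage also shows a conf.d drop-in (the name 90-trace.conf is made up) as an alternative to editing NetworkManager.conf directly, since snippets in that directory are merged into the main configuration.

```shell
#!/bin/sh
# Sketch: append wpa_supplicant's "-ddd" (most verbose debug output) to
# OTHER_ARGS. Assumes a line of the form OTHER_ARGS="..." in the target file.
add_debug_flag() {
    file=$1
    # Refuse to guess if the expected OTHER_ARGS="..." line is missing.
    grep -q '^OTHER_ARGS="' "$file" || return 1
    # Nothing to do if -ddd is already present, so the helper is idempotent.
    if grep -q '^OTHER_ARGS=".*-ddd' "$file"; then
        return 0
    fi
    sed -i 's/^OTHER_ARGS="\(.*\)"$/OTHER_ARGS="\1 -ddd"/' "$file"
}

# Typical use (as root), then reboot, reproduce, and collect the journal:
#   add_debug_flag /etc/sysconfig/wpa_supplicant
#   printf '[logging]\nlevel=TRACE\n' > /etc/NetworkManager/conf.d/90-trace.conf
#   journalctl -b > nm-wifi-debug.log
```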

Comment 5 Íñigo Huguet 2024-03-21 07:32:27 UTC
(In reply to Hans de Goede from comment #2)
> Note FWIW connecting fails *really* quickly with this option enabled. Is it
> possible that the driver simply does not allow changing the MAC address and
> that that is treated as an error rather than falling back to using the
> global static MAC address?

That's what we suspect.

> What are the next steps in debugging this?

Apart from providing the logs, please try changing the MAC address manually and tell us what happens:
    sudo ip link set DEVICE_NAME address A_VALID_MAC_ADDRESS

Even though this is a staging driver, if this turns out to be the cause, other drivers might fail the same way, so we need to handle that case.

Comment 6 Beniamino Galvani 2024-03-21 09:33:25 UTC
While waiting for the logs, a possible fix would be:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1898

Comment 8 Hans de Goede 2024-03-28 10:28:59 UTC
Sorry for being slow to reply. I have attached the requested logs now (as a private attachment because of included SSIDs).

> Apart from providing the logs, please try to change manually the MAC address and tell us what happens:
>   sudo ip link set DEVICE_NAME address A_VALID_MAC_ADDRESS

I've given this a try and it appears to succeed (no errors are printed and "echo $?" shows a 0 exit status), but running "ip link show wlan0" afterwards shows that the MAC address is unchanged. So I guess the driver does not support changing the MAC address and also fails to report an error such as -EOPNOTSUPP.
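A zero exit status from "ip link set" is therefore not proof that the address changed; the value has to be read back, which is also what NM does. A minimal sketch of that check, assuming the current address can be read from /sys/class/net/<dev>/address; set_mac_checked, read_mac and the SYSFS_NET override are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: change a MAC address and verify it by reading it back, since a
# zero exit status from "ip link set" does not prove the change took effect.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}   # hypothetical override for testing

# Print the current MAC address of interface $1, lowercased.
read_mac() {
    tr 'A-F' 'a-f' < "$SYSFS_NET/$1/address"
}

# Set MAC $2 on interface $1 (needs root), then compare with the read-back value.
# Returns 1 if "ip" itself fails, 2 if it "succeeds" but the address is unchanged.
set_mac_checked() {
    dev=$1
    want=$(printf '%s\n' "$2" | tr 'A-F' 'a-f')
    ip link set dev "$dev" address "$want" || return 1
    have=$(read_mac "$dev")
    if [ "$have" != "$want" ]; then
        echo "$dev: requested $want but device reports $have" >&2
        return 2
    fi
}

# Example (as root):
#   set_mac_checked wlan0 02:00:00:00:00:01; echo $?
```

On hardware behaving as described above, the example would be expected to print the mismatch message and return 2 rather than 0.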

Note that, judging from the logs, NM does notice that setting the new MAC address has failed; there are a number of

"(wlan0): set-hw-addr: new MAC address .... not successfully set"

messages in the log, both when setting the MAC address for scanning and when setting the stable-ssid one, but in the stable-ssid case this then seems to be treated as a fatal error?
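When going through a captured log (such as the "journalctl -b" output requested in comment 3), these messages can be picked out with a simple filter. The sketch below is based only on the message text quoted above, with the address part wildcarded; count_mac_set_failures and MAC_FAIL_RE are hypothetical names, and the exact wording may differ between NM versions.

```shell
#!/bin/sh
# Sketch: count NM's "MAC address not successfully set" messages in a log
# read from stdin. The pattern mirrors the message quoted above; adjust it
# if your NM version words the message differently.
MAC_FAIL_RE='set-hw-addr: new MAC address .* not successfully set'

# Print the number of matching lines; prints 0 (not an error) if none match.
count_mac_set_failures() {
    grep -c -e "$MAC_FAIL_RE" || true
}

# Example:
#   journalctl -b -u NetworkManager | count_mac_set_failures
#   journalctl -b -u NetworkManager | grep -e "$MAC_FAIL_RE"   # list them
```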

Comment 9 Beniamino Galvani 2024-03-28 13:03:20 UTC
If I'm looking at the right place, it seems the driver just pretends the MAC was changed without actually updating the kernel interface:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/rtl8723bs/os_dep/os_intfs.c?h=v6.8#n278

NM always fetches the MAC after changing it to see if it was set properly.

And you are right, the failure to change the MAC during the scan is treated as non-fatal, while it's fatal during association.

I think we have reached agreement in [1] to ignore the error when the MAC address policy comes from the global configuration.

[1] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1898

Comment 10 Hans de Goede 2024-03-28 20:07:12 UTC
(In reply to Beniamino Galvani from comment #9)
> If I'm looking at the right place, it seems the driver just pretends the MAC
> was changed without actually updating the kernel interface:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/rtl8723bs/os_dep/os_intfs.c?h=v6.8#n278

I think the intention there is to allow changing the MAC address before the device is first brought up, by replacing the MAC in the cached copy of the EEPROM/NVRAM.

I'll add some debug prints there when I can find more time to work on this. I assume the bup flag checked there will always be true when NM tries to change the MAC, so I think the function should return an error when padapter->bup is true and the memcpy is skipped. I'm not familiar enough with that code to make it actually change the MAC address.

Assuming the debug prints confirm this, I'll write a patch for the driver to at least return an error. Any preference for what the error should be? Given the padapter->bup check, I guess -EBUSY would make some sense?

> NM always fetches the MAC after changing it to see if it was set properly.

That explains things and that seems to be a good precaution.

