Bug 2240970 - 6.5.5-100.fc37.x86_64 wifi broken
Summary: 6.5.5-100.fc37.x86_64 wifi broken
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 37
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-09-27 12:30 UTC by William Bader
Modified: 2024-01-22 17:21 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-01-22 17:21:37 UTC
Type: ---
Embargoed:


Attachments
log file (839.69 KB, text/plain)
2023-09-27 12:40 UTC, William Bader
journalctl log file of second boot with 6.5.5-100.fc37.x86_64 (116.88 KB, text/plain)
2023-09-28 00:14 UTC, William Bader

Description William Bader 2023-09-27 12:30:18 UTC
1. Please describe the problem:

I updated to the 6.5.5 kernel with dnfdragora. On rebooting, wifi no longer works.
Rebooting back into 6.4.15-100.fc37.x86_64 restores wifi.

2. What is the Version-Release number of the kernel:

6.5.5

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

I can check if necessary. It worked in the final 6.4 kernel and does not work in 6.5.5


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Boot into 6.5.5

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

If I do this, can I get back to the normal stable kernels, or am I locked into Rawhide?

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

Not as far as I know.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

It didn't find the boot, so I copied from /var/log/messages with the bad 6.5 boot and then a good reboot into 6.4 to show the differences.
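
For anyone collecting the same logs: `journalctl --list-boots` prints the boots the journal still holds, and a negative offset passed to `-b` selects an earlier one (`-b -1` is the previous boot). A minimal sketch follows. `save_kernel_log` and the `JOURNALCTL` override are hypothetical names used only for illustration, and the sketch assumes the journal is stored persistently, so that earlier boots are still available.

```shell
# Hypothetical helper: write the kernel log of one boot to a file.
#   $1 = boot offset (0 = current boot, -1 = previous boot, ...)
#   $2 = output file (defaults to dmesg.txt)
save_kernel_log() {
    skl_boot="${1:-0}"
    skl_out="${2:-dmesg.txt}"
    # JOURNALCTL is an illustrative override hook; it defaults to the real tool.
    "${JOURNALCTL:-journalctl}" --no-hostname -k -b "$skl_boot" > "$skl_out"
}
```

Typical use would be `journalctl --list-boots` to find the offset of the bad boot, then `save_kernel_log -1 bad-boot.txt`.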

Reproducible: Always

Comment 1 William Bader 2023-09-27 12:40:03 UTC
Created attachment 1990814
log file

I couldn't find the bad boot with journalctl -b #, so I copied a section of /var/log/messages. It shows scripts rebuilding files like the initramfs after installing the new kernel with dnfdragora, then the boot into 6.5.5-100.fc37.x86_64 (where iwlwifi timed out), an unreported 'oops', and a successful reboot into 6.4.15-100.fc37.x86_64.

My laptop is a Lenovo ThinkPad T15p Gen 3 with an i7-12800H Alder Lake-H cpu, 64 GB RAM, 1 TB storage, 4k 15" screen, Iris Xe Graphics (96EU) 300/1400 MHz, and an NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile]

lspci returns
00:00.0 Host bridge: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
00:01.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
00:04.0 Signal processing controller: Intel Corporation Alder Lake Innovation Platform Framework Processor Participant (rev 02)
00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02)
00:06.2 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #2 (rev 02)
00:07.0 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #0 (rev 02)
00:0a.0 Signal processing controller: Intel Corporation Platform Monitoring Technology (rev 01)
00:0d.0 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller (rev 02)
00:0d.2 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0 (rev 02)
00:14.0 USB controller: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller (rev 01)
00:14.2 RAM memory: Intel Corporation Alder Lake PCH Shared SRAM (rev 01)
00:14.3 Network controller: Intel Corporation Alder Lake-P PCH CNVi WiFi (rev 01)
00:16.0 Communication controller: Intel Corporation Alder Lake PCH HECI Controller (rev 01)
00:16.3 Serial controller: Intel Corporation Alder Lake AMT SOL Redirection (rev 01)
00:1c.0 PCI bridge: Intel Corporation Device 51b8 (rev 01)
00:1c.5 PCI bridge: Intel Corporation Device 51bd (rev 01)
00:1f.0 ISA bridge: Intel Corporation Alder Lake PCH eSPI Controller (rev 01)
00:1f.3 Audio device: Intel Corporation Alder Lake PCH-P High Definition Audio Controller (rev 01)
00:1f.4 SMBus: Intel Corporation Alder Lake PCH-P SMBus Host Controller (rev 01)
00:1f.5 Serial bus controller: Intel Corporation Alder Lake-P PCH SPI Controller (rev 01)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (16) I219-LM (rev 01)
01:00.0 3D controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] (rev a1)
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
0a:00.0 SD Host controller: Genesys Logic, Inc GL9755 SD Host Controller

Comment 2 William Bader 2023-09-27 12:47:05 UTC
Also, the log in the previous attachment shows a recent issue with the nvidia kernel module, with lines like the ones below. They started appearing a few days ago, while a 6.4 kernel was installed, so this probably isn't the cause. I was going to wait it out, but maybe it has worse consequences on the 6.5 kernel.

Sep 27 14:12:02 scslaptop55 systemd[1]: nvidia-fallback.service - Fallback to nouveau as nvidia did not load was skipped because of a failed condition check (ConditionPathExists=!/sys/module/nvidia).

Sep 27 14:12:03 scslaptop55 kernel: NVRM: API mismatch: the client has the version 535.113.01, but#012NVRM: this kernel module has the version 535.104.05. Please#012NVRM: make sure that this kernel module and all NVIDIA driver#012NVRM: components have the same version.

Comment 3 Mark Pearson 2023-09-27 17:07:46 UTC
Hi

Thanks for flagging this - we'll see if we can reproduce and I've flagged it to Intel.
Could you try with the latest iwlwifi FW on linux-firmware please:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/iwlwifi-ty-a0-gf-a0-84.ucode

You'll need to install in /lib/firmware and rebuild the initramfs.
Just in case it's a FW vs kernel version mismatch issue

Thanks
Mark
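
A cautious way to follow the two steps above (install into /lib/firmware, rebuild the initramfs) is to keep a backup of any file that gets overwritten, so the change can be undone. The sketch below is an illustration, not a tested procedure for this bug: `fw_install` and `fw_revert` are hypothetical helper names, the firmware directory is a parameter that defaults to `/lib/firmware`, and it assumes the `.ucode` file has already been downloaded. Writing to `/lib/firmware` requires root.

```shell
# Hypothetical helper: copy a firmware file into place, backing up any
# existing file of the same name as <name>.bak.
#   $1 = path to the downloaded firmware file
#   $2 = firmware directory (defaults to /lib/firmware)
fw_install() {
    fi_src="$1"
    fi_dir="${2:-/lib/firmware}"
    fi_name="$(basename "$fi_src")"
    if [ -e "$fi_dir/$fi_name" ]; then
        cp -p "$fi_dir/$fi_name" "$fi_dir/$fi_name.bak" || return 1
    fi
    cp "$fi_src" "$fi_dir/$fi_name"
}

# Hypothetical helper: undo fw_install. Restores the backup if there is
# one, otherwise removes the file that was added.
#   $1 = firmware file name, $2 = firmware directory
fw_revert() {
    fr_name="$1"
    fr_dir="${2:-/lib/firmware}"
    if [ -e "$fr_dir/$fr_name.bak" ]; then
        mv "$fr_dir/$fr_name.bak" "$fr_dir/$fr_name"
    else
        rm -f "$fr_dir/$fr_name"
    fi
}
```

After either step, the initramfs for the kernel under test can be regenerated with `sudo dracut --force --kver 6.5.5-100.fc37.x86_64` (substitute the kernel version being tested).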

Comment 4 RedBearAK 2023-09-27 19:11:09 UTC
I just had a similar problem on an Acer Aspire 5 Slim, Ryzen 3700u system, where I've been using an Intel AX210 card that I installed as an upgrade from the original Qualcomm card. I've been running Fedora on the system with no issue with the same WiFi card since at least back to F36, and another almost identical laptop with a similar (possibly identical) Intel card back to at least F34, with no issue with the WiFi card. 

```sh
04:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)
```

I already rebooted back into 6.4.15 and it's working again. 

When I was in 6.5.5, there was no WiFi tab in GNOME Settings, indicating that the hardware wasn't being detected or enabled or initialized in such a way that would cause it to be seen by GNOME Settings as available hardware. I didn't dig into it any further than that, just rebooted to 6.4.15. 

I should say, this is on F38, and this is the first 6.5.x kernel version that has shown up for F38 during a DNF upgrade. I figured it would be solid by now after all the testing of the 6.5.x series on F39 beta.

Comment 5 William Bader 2023-09-27 20:12:49 UTC
(In reply to Mark Pearson from comment #3)

> Could you try with the latest iwlwifi FW on linux-firmware please:
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/iwlwifi-ty-a0-gf-a0-84.ucode
> 
> You'll need to install in /lib/firmware and rebuild the initramfs.
> Just in case it's a FW vs kernel version mismatch issue

I can try it if you tell me the commands to run and the commands to revert it if it doesn't work on 6.5 or 6.4.

I'm a little cautious because I'm traveling, and I'm stuck if I do something where simply booting an old kernel won't get me back on the wifi.
I temporarily set installonly_limit=0 in /etc/dnf/dnf.conf to keep the old kernels from being purged.
I've done kernel bisections before and I've tested kernels from koji, but that might not help if the problem is firmware.
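
The `installonly_limit` setting mentioned above is a single line in the `[main]` section of `/etc/dnf/dnf.conf`; per the dnf.conf documentation, a value of 0 means there is no limit on the number of installed kernels. A sketch of making that change from a script follows. `set_installonly_limit` is a hypothetical helper, and it assumes the file contains only a `[main]` section, so that appending a missing key at the end of the file is safe. Editing the real file requires root.

```shell
# Hypothetical helper: set installonly_limit in a dnf configuration file.
#   $1 = new limit (0 = unlimited)
#   $2 = config file (defaults to /etc/dnf/dnf.conf)
set_installonly_limit() {
    sil_limit="$1"
    sil_conf="${2:-/etc/dnf/dnf.conf}"
    if grep -q '^installonly_limit=' "$sil_conf"; then
        # Rewrite the existing line via a temporary file, then copy the
        # result back so the original file keeps its owner and mode.
        sil_tmp="$(mktemp)" || return 1
        sed "s/^installonly_limit=.*/installonly_limit=$sil_limit/" \
            "$sil_conf" > "$sil_tmp" && cat "$sil_tmp" > "$sil_conf"
        sil_rc=$?
        rm -f "$sil_tmp"
        return "$sil_rc"
    else
        # Key not present: append it (assumes [main] is the only section).
        printf 'installonly_limit=%s\n' "$sil_limit" >> "$sil_conf"
    fi
}
```

Calling `set_installonly_limit 3` afterwards would put back what is usually the shipped default.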

I have Mate Desktop with Fedora 37, which has NetworkManager in the panel, and when I booted 6.5, right-clicking on NetworkManager didn't show anything under Wi-Fi Networks. The line was either missing or grayed out. Similar to comment #4, it looked like the wifi hardware wasn't detected, so I rebooted to the last 6.4 kernel and looked at /var/log/messages. When I booted 6.5, I could tell that something was wrong because the splash screen stayed up for several minutes (which, I think, the log shows was due to a series of timeouts and retries).

Regards, William

Comment 6 William Bader 2023-09-28 00:14:16 UTC
Created attachment 1990876
journalctl log file of second boot with 6.5.5-100.fc37.x86_64

Someone reported the same problem with kernel 6.5.2 on ArchLinux and added that it happened only the first time booting, and after that it was ok. I have the link below.

https://bbs.archlinux.org/viewtopic.php?id=288765 (Wifi randomly lost after kernel upgrade to 6.5.2)

So then I tried rebooting to 6.5.5-100.fc37.x86_64 and it is working, although /var/log/messages shows a number of lines "kernel: iwlwifi 0000:00:14.3: WRT: Invalid buffer destination", so something still isn't right.
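
To compare boots, it can help to count how often that message appears in a given kernel log. A small sketch, where `count_wrt_errors` is a hypothetical name and the pattern is simply the message text quoted above:

```shell
# Hypothetical helper: count iwlwifi "WRT: Invalid buffer destination"
# lines in a kernel log read from standard input.
count_wrt_errors() {
    # grep -c prints the count but exits non-zero when it is 0, so the
    # exit status is normalised with "|| true".
    grep -c 'iwlwifi.*WRT: Invalid buffer destination' || true
}
```

For the current boot this would be used as `journalctl --no-hostname -k -b 0 | count_wrt_errors`.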

I attached the results of `journalctl --no-hostname -k`

Maybe the network problem hasn't been heavily reported because rebooting is a natural thing to try, and seems to fix the connection problem (although /var/log/messages still shows some errors).

Is it possible that the first boot of a 6.5 kernel updates firmware but something with the update leaves the device in a bad state for the rest of the boot?

The 6.5.5 kernel seems to have fixed the other issue with different nvidia versions in the kernel and modules.

Regards, William

Comment 7 RedBearAK 2023-09-28 01:04:21 UTC
Copied from a comment on the Fedora subreddit, posted by user "UsedToLikeThisStuff": 

I was able to run `echo -n 1 | sudo tee /sys/bus/pci/devices/$device/remove` where $device is the PCI ID of the Wi-Fi card, and then perform a rescan with `echo -n 1 | sudo tee /sys/bus/pci/devices/$pcibusdevice/rescan` where $pcibusdevice is the PCI bus device it was connected to, and it came back up right away. Subsequent reboots seem to have working Wi-Fi, but I don’t know if it’s a random thing or not.

My suspicion is that the new firmware causes the iwlwifi kmod to crash when the previous firmware was used last time it came up.
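
The remove/rescan sequence quoted above can be sketched as a small function. This is an illustration under assumptions, not a verified fix: `pci_reprobe` is a hypothetical name, the optional second argument exists only so the function can be pointed at a scratch directory for a dry run, and it writes to the global `/sys/bus/pci/rescan` file rather than the per-bridge `rescan` file the quoted comment uses. Writing to these sysfs files requires root.

```shell
# Hypothetical helper: remove a PCI device and rescan so it is re-probed.
#   $1 = PCI address including domain, e.g. 0000:00:14.3
#   $2 = sysfs root (defaults to /sys; override only for dry runs)
pci_reprobe() {
    pr_dev="$1"
    pr_sysfs="${2:-/sys}"
    pr_node="$pr_sysfs/bus/pci/devices/$pr_dev"
    if [ ! -e "$pr_node/remove" ]; then
        echo "no such PCI device: $pr_dev" >&2
        return 1
    fi
    # Detach the device from the kernel's PCI core.
    printf '1' > "$pr_node/remove" || return 1
    # Rescan all PCI buses so the device is rediscovered and its driver
    # is bound again.
    printf '1' > "$pr_sysfs/bus/pci/rescan"
}
```

On the ThinkPad from comment 1, where the wifi device is at 00:14.3, this would be run from a root shell as `pci_reprobe 0000:00:14.3`.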

.

Also from another user (maxb-od/maxbugod):  

Just do `rm /usr/lib/firmware/iwlwifi-ty-a0-gf-a0-83.ucode.xz`
It seems the new AX210 firmware is broken. You can always restore the firmware with
`dnf reinstall iwlax2xx-firmware`
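
A reversible variation on the `rm` above is to rename the file instead of deleting it, so that it can be put back without network access, which matters when wifi is the thing being debugged. A sketch with hypothetical function names follows; it relies on the kernel looking firmware up by exact file name, so a renamed copy is ignored. Renaming files under `/usr/lib/firmware` requires root.

```shell
# Hypothetical helper: take a firmware file out of use by renaming it.
#   $1 = full path to the firmware file
fw_set_aside() {
    fsa_file="$1"
    if [ ! -e "$fsa_file" ]; then
        echo "not found: $fsa_file" >&2
        return 1
    fi
    mv "$fsa_file" "$fsa_file.disabled"
}

# Hypothetical helper: undo fw_set_aside for the same path.
fw_put_back() {
    fpb_file="$1"
    if [ ! -e "$fpb_file.disabled" ]; then
        echo "nothing set aside for: $fpb_file" >&2
        return 1
    fi
    mv "$fpb_file.disabled" "$fpb_file"
}
```

For the file named above this would be `fw_set_aside /usr/lib/firmware/iwlwifi-ty-a0-gf-a0-83.ucode.xz` from a root shell, followed by a reboot, with `fw_put_back` on the same path to undo it.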

.

From comments on this post:  

https://www.reddit.com/r/Fedora/comments/16ttbms/f38_updated_to_kernel_655_intel_ax210_wifi_card/

Comment 8 William Bader 2023-09-28 08:58:55 UTC
It looks like it happens only on the first boot of 6.5. Later boots log "kernel: iwlwifi 0000:00:14.3: WRT: Invalid buffer destination" a few times, but wifi seems to work, so the original problem could be hard to reproduce. I didn't try the procedure in comment #7, but I have a /usr/lib/firmware/iwlwifi-ty-a0-gf-a0-83.ucode.xz provided by iwlwifi-mvm-firmware-20230919-1.fc37.noarch, so my situation matches that comment. I'm not sure, but I think that after the dnfdragora run that installed the 6.5.5 kernel, I ran dnfdragora again, and it installed a package with firmware.

Comment 9 Aoife Moloney 2023-11-23 01:50:40 UTC
This message is a reminder that Fedora Linux 37 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 37 on 2023-12-05.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '37'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 37 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 10 Aoife Moloney 2024-01-22 17:21:37 UTC
Fedora Linux 37 entered end-of-life (EOL) status on 2023-12-05.

Fedora Linux 37 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

