Bug 1848631

Summary: [BISECTED][ath9k_htc] kernel 5.6.19 regression with Atheros 9271: No WLAN after update
Product: [Fedora] Fedora
Reporter: Ali Akcaagac <aliakc>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 32
CC: acaringi, airlied, alexander.hesse85, bskeggs, caleb, davidjeremias82, hdegoede, ichavero, itamar, jarodwilson, jeremy, jforbes, jglisse, john.j5live, jonathan, josef, jpisaniello, kernel-maint, lgoncalv, linville, masami256, mchehab, mjg59, paulg-b, steved, y9t7sypezp
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2020-07-23 13:18:02 UTC
Type: Bug
Attachments:
Description   Flags
kernel.log    none
bisect.log    none

Description Ali Akcaagac 2020-06-18 16:18:36 UTC
Created attachment 1697986: kernel.log

1. Please describe the problem:
After updating to kernel 5.6.19 and rebooting, I ended up with no WLAN connection.


2. What is the Version-Release number of the kernel:
5.6.19 x86_64


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
Yes, it worked previously, with all kernels before the 5.6.19 update. I have an Atheros AR9271 connected to a USB hub, and it normally works: it gets detected, the firmware loads, and NetworkManager connects it to the WLAN. Done.

With the update to 5.6.19 the Atheros device is still detected and the firmware loads properly, but NO WLAN connection is made.

I had to downgrade to 5.6.18 again to get it working.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules that are not shipped directly with Fedora's kernel?:


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Steve 2020-06-18 18:17:05 UTC
Try removing and adding the wireless networking device.

> [   23.699050] usb 1-1.2: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested

There are some ath9k commits in 5.8, so if the above doesn't work, you could try testing with a 5.8 kernel:

https://koji.fedoraproject.org/koji/packageinfo?packageID=8

kernel-core and kernel-modules should be sufficient.

After downloading to an empty directory, install with:

# dnf install kernel*.rpm

Comment 2 Steve 2020-06-18 18:19:28 UTC
Clarification: Try removing and adding the wireless networking device [from the desktop networking app].

Comment 3 Ali Akcaagac 2020-06-18 18:53:02 UTC
Well, I removed and reinserted the module with rmmod/insmod on 5.6.19 and still have no WLAN. NetworkManager doesn't even show any Wi-Fi entries (the corresponding Wi-Fi menu entry doesn't appear either).

Same with:
kernel-5.8.0-0.rc1.20200617git69119673bd50.1.fc33.x86_64
kernel-core-5.8.0-0.rc1.20200617git69119673bd50.1.fc33.x86_64
kernel-modules-5.8.0-0.rc1.20200617git69119673bd50.1.fc33.x86_64

Atheros is basically DEAD because of one of the changes that made it into 5.6.19 (and later):

ath9k: Fix general protection fault in ath9k_hif_usb_rx_cb
ath9k: Fix use-after-free Read in ath9k_wmi_ctrl_rx
ath9k: Fix use-after-free Read in htc_connect_service
ath9k: Fix use-after-free Write in ath9k_htc_rx_msg
ath9k_htc: Silence undersized packet warnings
ath9x: Fix stack-out-of-bounds Write in ath9k_hif_usb_rx_cb

So one (or more) of these fixes causes the regression for me.

Comment 4 Ali Akcaagac 2020-06-18 19:21:38 UTC
I'd like to mention that this Atheros WLAN device is the following product:

Alfa AWUS036NHA

The one that made the rounds years ago in, e.g., the Snort community and beyond. One of the best devices around for accessing networks :)

So this is not hardware from the graveyard, even if the device is getting on in years ;)

Comment 5 Steve 2020-06-18 19:57:03 UTC
> Atheros is basically DEAD because of one of the changes that made it into 5.6.19 (and later)

Thanks for pointing that out. Since you have a reliable reproducer and two well-defined end-points, this would be an ideal application of kernel bisection.
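For readers unfamiliar with bisection, the workflow can be tried out on a throwaway repository first. The script below is a hypothetical, self-contained illustration (it is not what the reporter ran, and the repository, tag names and `status` file are invented for the demo): it creates one known-good commit, six follow-up commits standing in for the six ath9k changes, and lets `git bisect run` locate the commit that flips `status` from `ok` to `broken`. In the real case each step means building, installing and booting a kernel and checking whether the AR9271 associates, so it is usually done by hand; the equivalent starting point would be `git bisect start v5.6.19 v5.6.18`, optionally restricted with `-- drivers/net/wireless/ath/ath9k/`.

```shell
#!/bin/sh
# Hypothetical demo of `git bisect run` on a throwaway repository.
# Assumes only git and a POSIX shell; no kernel tree is involved.
set -e

repo=$(mktemp -d)
log=$(mktemp)
cd "$repo"
git init -q
git config user.name "bisect demo"
git config user.email "demo@example.invalid"
git config commit.gpgsign false

# Known-good end-point (plays the role of v5.6.18).
echo ok > status
git add status
git commit -q -m "baseline"
git tag known-good

# Six follow-up commits; commit 4 introduces the simulated regression.
for i in 1 2 3 4 5 6; do
    if [ "$i" -eq 4 ]; then
        echo broken > status
    fi
    echo "$i" > "change$i"
    git add -A
    git commit -q -m "change $i"
done
git tag known-bad   # plays the role of v5.6.19

# git bisect run treats exit status 0 as "good" and 1 as "bad",
# which is exactly what grep -q returns here.
git bisect start known-bad known-good > "$log"
git bisect run grep -q '^ok$' status >> "$log"

# In this linear history, refs/bisect/bad ends on the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $first_bad"
```

Run with `sh`, it should report `change 4`. With a kernel, the automated `grep` test is replaced by a manual `git bisect good` or `git bisect bad` after each boot.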

For the record:

$ git log --oneline --grep ath9k v5.6.18..v5.6.19
4eb946e960 ath9k: Fix general protection fault in ath9k_hif_usb_rx_cb
4254517ccd ath9x: Fix stack-out-of-bounds Write in ath9k_hif_usb_rx_cb
f3b4fef399 ath9k: Fix use-after-free Write in ath9k_htc_rx_msg
1bc633311a ath9k: Fix use-after-free Read in ath9k_wmi_ctrl_rx
771df971ab ath9k: Fix use-after-free Read in htc_connect_service
99e8e9ca13 ath9k_htc: Silence undersized packet warnings

$ git branch
* linux-5.6.y

Comment 6 Steve 2020-06-18 20:28:13 UTC
I suggest updating the bug summary to something like this:

[ath9k_htc] kernel 5.6.19 regression with Atheros 9271: No WLAN after update

For the record:

$ egrep -C0 '1-1.2|ath9k' kernel-log.txt 
[    2.540416] usb 1-1.2: new high-speed USB device number 4 using ehci-pci
[    2.653745] usb 1-1.2: New USB device found, idVendor=0cf3, idProduct=9271, bcdDevice= 1.08
[    2.653751] usb 1-1.2: New USB device strings: Mfr=16, Product=32, SerialNumber=48
[    2.653754] usb 1-1.2: Product: UB91C
[    2.653757] usb 1-1.2: Manufacturer: ATHEROS
[    2.653760] usb 1-1.2: SerialNumber: 12345
--
[   23.699050] usb 1-1.2: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[   23.702128] usbcore: registered new interface driver ath9k_htc
[   24.059330] usb 1-1.2: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[   24.311190] ath9k_htc 1-1.2:1.0: ath9k_htc: HTC initialized with 33 credits

Comment 7 Ali Akcaagac 2020-06-18 21:26:57 UTC
I've put the bisection on my to-do list for the upcoming weekend. Thanks for the feedback. I'll report back once I can isolate the offending *fix*.

Comment 8 Ali Akcaagac 2020-06-20 19:34:32 UTC
I was able to bisect it down to this:

4254517ccd ath9x: Fix stack-out-of-bounds Write in ath9k_hif_usb_rx_cb

After half a dozen recompiles I ended up with the following log (please see the attachment). I suggest reviewing the offending commit, or leaving it out until it is properly fixed.

A temporary solution would be to rebuild the Fedora kernel 5.6.19-301 (or later) with the above commit reverted.

Comment 9 Ali Akcaagac 2020-06-20 19:35:10 UTC
Created attachment 1698205: bisect.log

Comment 10 Ali Akcaagac 2020-06-20 19:37:07 UTC
Correction, the offending commit is this one:

    ath9k: Fix general protection fault in ath9k_hif_usb_rx_cb
    
    commit 2bbcaaee1fcbd83272e29f31e2bb7e70d8c49e05 upstream.

Comment 11 Ali Akcaagac 2020-06-20 20:57:22 UTC
Most likely related:

https://bugzilla.kernel.org/show_bug.cgi?id=208251

People there bisected down to the same culprit commit.

Comment 12 Steve 2020-06-20 23:29:54 UTC
Thanks for doing the bisection and for posting your results upstream.

I suggest putting "[BISECTED]" at the beginning of the bug summary and adding a link to the upstream bug in the "Links" section near the top of this bug report.

For the record, this is the commit in the mainline repo:

$ git describe --contains 2bbcaaee1fcbd83272e29f31e2bb7e70d8c49e05
v5.8-rc1~165^2~261^2~85^2~7
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2bbcaaee1fcbd83272e29f31e2bb7e70d8c49e05

The upstream report is against 5.4.47, which had the same* commit merged recently:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=b5c8896bc14f54e5c4dd5a6e42879f125b8abd2d

* Stable commit IDs may be different from mainline commit IDs.

Comment 13 Steve 2020-06-21 00:03:57 UTC
> A temporary solution would be to rebuild the Fedora kernel 5.6.19-301 (or later) with the above commit reverted.

Fedora kernels follow upstream kernels very closely. The 5.6.y stable branch has reached EOL*, so the next Fedora kernel release will probably be on the 5.7.y branch:

https://bodhi.fedoraproject.org/updates/?packages=kernel

* https://www.kernel.org/

Comment 14 Alex 2020-07-04 08:40:45 UTC
I just asked about this issue in a related thread on askfedora and was told that the update to kernel 5.7.x should have fixed it.

Well, it didn't for me. Should I file a new bug? Could I help by providing any logs?

Comment 15 Florian Sievert 2020-07-04 09:10:36 UTC
I can also confirm that upgrading to kernel 5.7.6 does not fix the issue for the AWUS036NHA here.

Comment 16 Ali Akcaagac 2020-07-04 11:24:58 UTC
(In reply to Alex from comment #14)
> Should I file a new bug?
Not needed, since the issue is already known. Many people bisected all affected kernels and ended up at the same single commit that causes the issue.

Further reading here:

https://bugzilla.kernel.org/show_bug.cgi?id=208251

https://lore.kernel.org/lkml/CAEJqkgjV8p6LtBV8YUGbNb0vYzKOQt4-AMAvYw5mzFr3eicyTg@mail.gmail.com/
https://lore.kernel.org/linux-wireless/87lfkff9qe.fsf@codeaurora.org/

The affected versions are 5.4.x, 5.6.19, 5.7.x and 5.8.0-rc3.

One of the maintainers has provided a patch that will hopefully be picked up soon by the mainline and stable kernels.

> I just asked about this issue in a related thread on askfedora and was told
> that the update to kernel 5.7.x should have fixed it.

Don't listen to askfedora. No one can just claim out of thin air that "some bug" has been fixed in "some upcoming kernel". It was most likely a generic answer suggesting you try a newer kernel in the hope that the issue had been solved.

Comment 17 Steve 2020-07-04 21:59:45 UTC
This appears to be the revert patch:

[v2] Revert "ath9k: Fix general protection fault in ath9k_hif_usb_rx_cb" 
https://patchwork.kernel.org/patch/11637341/

The upstream kernel git repos are the best place to look for kernel updates. These are searches for "ath9k":

mainline:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=ath9k

stable (5.7.y):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.7.y&qt=grep&q=ath9k

Comment 18 David Vásquez 2020-07-14 08:31:02 UTC
kernel-5.6.16-300 and same issue...

Comment 19 David Vásquez 2020-07-14 08:32:19 UTC
Sorry, kernel-5.7.8-200 same issue...

Comment 20 Hans de Goede 2020-07-14 10:00:19 UTC
I've sent an email to Greg Kroah-Hartman / stable.org asking for the offending commit to be fixed. I expect the revert to show up in 5.7.9.

Comment 21 Ali Akcaagac 2020-07-22 18:45:43 UTC
Confirming that the issue has been solved by the fix that made it into 5.7.9 (and other branches).

https://bugzilla.kernel.org/show_bug.cgi?id=208251#c29

Comment 22 Justin M. Forbes 2020-07-23 13:18:02 UTC
Thank you for confirming.