Bug 1846802 - ath10k_pci firmware crashed
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: linux-firmware
Version: 35
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: David Woodhouse
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1844085 2034244
Depends On:
Blocks:
Reported: 2020-06-14 15:06 UTC by James
Modified: 2023-09-18 00:21 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-12-13 15:15:17 UTC
Type: Bug
Embargoed:


Description James 2020-06-14 15:06:31 UTC
Description of problem:
Seen in dmesg, after the network connection died:

[63222.552529] ath10k_pci 0000:02:00.0: firmware crashed! (guid 45af115a-c256-436a-8fb0-417d66113942)
[63222.552535] ath10k_pci 0000:02:00.0: qca9377 hw1.1 target 0x05020001 chip_id 0x003821ff sub 1028:1810
[63222.552537] ath10k_pci 0000:02:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
[63222.553670] ath10k_pci 0000:02:00.0: firmware ver WLAN.TF.2.1-00021-QCARMSWP-1 api 6 features wowlan,ignore-otp crc32 42e41877
[63222.554091] ath10k_pci 0000:02:00.0: board_file api 2 bmi_id N/A crc32 8aedfa4a
[63222.554095] ath10k_pci 0000:02:00.0: htt-ver 3.56 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[63222.554299] ath10k_pci 0000:02:00.0: firmware register dump:
[63222.554302] ath10k_pci 0000:02:00.0: [00]: 0x05020001 0x00000000 0x809E1F2B 0x00000044
[63222.554304] ath10k_pci 0000:02:00.0: [04]: 0x809E1F2B 0x0040E9B8 0x00000000 0x00180207
[63222.554306] ath10k_pci 0000:02:00.0: [08]: 0xDEADC0DE 0x0000000C 0x00000023 0x00431664
[63222.554308] ath10k_pci 0000:02:00.0: [12]: 0x809FA4B5 0x0040E9A8 0xAA0A10CB 0x00018200
[63222.554310] ath10k_pci 0000:02:00.0: [16]: 0x0042A080 0x0042A0C8 0x0000001B 0x00000001
[63222.554311] ath10k_pci 0000:02:00.0: [20]: 0x8093853B 0x0040E988 0x0001810C 0x00000000
[63222.554313] ath10k_pci 0000:02:00.0: [24]: 0xFFFFFFDF 0x00080200 0x00407C40 0x00400000
[63222.554314] ath10k_pci 0000:02:00.0: [28]: 0x809FAF2D 0x0040E9E8 0x00000000 0x00000001
[63222.554316] ath10k_pci 0000:02:00.0: [32]: 0x00407E84 0x00429D60 0x00000003 0x00180207
[63222.554318] ath10k_pci 0000:02:00.0: [36]: 0x00000000 0x00000000 0x00000000 0x00000000
[63222.554319] ath10k_pci 0000:02:00.0: [40]: 0x00000000 0x00000000 0x00000000 0x00000000
[63222.554320] ath10k_pci 0000:02:00.0: [44]: 0x00000000 0x00000000 0x00000000 0x00000000
[63222.554322] ath10k_pci 0000:02:00.0: [48]: 0x00000000 0x00000000 0x00000000 0x00000000
[63222.554323] ath10k_pci 0000:02:00.0: [52]: 0x00000000 0x00000000 0x00000000 0x00000000
[63222.554325] ath10k_pci 0000:02:00.0: [56]: 0x00000000 0x00000000 0x00000000 0x00000000
[63222.554326] ath10k_pci 0000:02:00.0: Copy Engine register dump:
[63222.554343] ath10k_pci 0000:02:00.0: [00]: 0x00034400  11  11   3   3
[63222.554358] ath10k_pci 0000:02:00.0: [01]: 0x00034800  31  31  44  45
[63222.554373] ath10k_pci 0000:02:00.0: [02]: 0x00034c00  42  42 105 106
[63222.554387] ath10k_pci 0000:02:00.0: [03]: 0x00035000  18  18  20  18
[63222.554402] ath10k_pci 0000:02:00.0: [04]: 0x00035400 6273 6271 129  65
[63222.554417] ath10k_pci 0000:02:00.0: [05]: 0x00035800   0   0  64   0
[63222.554432] ath10k_pci 0000:02:00.0: [06]: 0x00035c00  27  27  16  16
[63222.554446] ath10k_pci 0000:02:00.0: [07]: 0x00036000   1   1   1   1
[63222.597663] ath10k_pci 0000:02:00.0: device has crashed during init
[63222.619693] ath10k_pci 0000:02:00.0: device has crashed during init
[63222.619696] ath10k_pci 0000:02:00.0: failed to wait for target init: -70
[63222.621385] ieee80211 phy0: Hardware restart was requested
[63222.891724] ath10k_pci 0000:02:00.0: device successfully recovered

Despite the last message, the device remained in a hosed state until a suspend/resume cycle.

Hardware is Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31), ids 168c:0042.

This may be sporadic or a regression -- never saw this with kernel 5.6.15.

Version-Release number of selected component (if applicable):
kernel-5.6.16-300.fc32.x86_64
linux-firmware-20200519-108.fc32.noarch
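
For anyone collecting data on this bug, here is a minimal sketch of how the crash lines above could be pulled out of a saved dmesg capture. The regular expressions are modelled only on the lines quoted in this description, and the function name `summarize_crashes` is made up for illustration; other kernels may format these messages differently.

```python
import re

# Patterns modelled on the dmesg lines quoted in the description (an assumption:
# other kernel versions may word these messages differently).
CRASH_RE = re.compile(r"ath10k_pci (\S+): firmware crashed! \(guid ([0-9a-f-]+)\)")
FW_RE = re.compile(r"ath10k_pci (\S+): firmware ver (\S+) ")
RECOVERED_RE = re.compile(r"ath10k_pci (\S+): device successfully recovered")


def summarize_crashes(log_text):
    """Return one dict per 'firmware crashed!' line found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = CRASH_RE.search(line)
        if m:
            events.append({"device": m.group(1), "guid": m.group(2),
                           "firmware": None, "recovered": False})
            continue
        if not events:
            continue  # nothing to attach follow-up lines to yet
        current = events[-1]
        m = FW_RE.search(line)
        if m and m.group(1) == current["device"]:
            current["firmware"] = m.group(2)
            continue
        m = RECOVERED_RE.search(line)
        if m and m.group(1) == current["device"]:
            current["recovered"] = True
    return events


SAMPLE = """\
[63222.552529] ath10k_pci 0000:02:00.0: firmware crashed! (guid 45af115a-c256-436a-8fb0-417d66113942)
[63222.553670] ath10k_pci 0000:02:00.0: firmware ver WLAN.TF.2.1-00021-QCARMSWP-1 api 6 features wowlan,ignore-otp crc32 42e41877
[63222.891724] ath10k_pci 0000:02:00.0: device successfully recovered
"""

for event in summarize_crashes(SAMPLE):
    print(event)
```

Note that `recovered` only records that the driver printed the recovery message. As described above, the device can still be unusable afterwards.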

Comment 1 Steeve McCauley 2020-07-30 12:35:41 UTC
I'm also getting this firmware crash on a Dell XPS 13; I believe it normally happens on 5GHz connections. Reconnecting usually fixes the problem, but occasionally I've had to reboot.

$ lspci -s 02:00.0 -k
02:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
	Subsystem: Bigfoot Networks, Inc. Killer 1435 Wireless-AC
	Kernel driver in use: ath10k_pci
	Kernel modules: ath10k_pci
$ cat /etc/redhat-release 
Fedora release 32 (Thirty Two)
$ rpm -qi linux-firmware
Name        : linux-firmware
Version     : 20200721
Release     : 110.fc32
Architecture: noarch
Install Date: Fri 24 Jul 2020 07:33:20 PM
Group       : Unspecified
Size        : 277141944
License     : GPL+ and GPLv2+ and MIT and Redistributable, no modification permitted
Signature   : RSA/SHA256, Wed 22 Jul 2020 07:36:14 AM, Key ID 6c13026d12c944d0
Source RPM  : linux-firmware-20200721-110.fc32.src.rpm
Build Date  : Wed 22 Jul 2020 03:49:04 AM
Build Host  : buildvm-a64-24.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : http://www.kernel.org/
Bug URL     : https://bugz.fedoraproject.org/linux-firmware
Summary     : Firmware files used by the Linux kernel
Description :
This package includes firmware files required for some devices to
operate.

$ dmesg

[225714.270905] ath10k_pci 0000:02:00.0: firmware crashed! (guid 94ad16e6-c2cb-4f77-af6a-cdaa4ae23905)
[225714.270916] ath10k_pci 0000:02:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:143a
[225714.270921] ath10k_pci 0000:02:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
[225714.272630] ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.4.4.1-00140-QCARMSWPZ-1 api 6 features wowlan,ignore-otp,mfp crc32 29eb8ca1
[225714.274006] ath10k_pci 0000:02:00.0: board_file api 2 bmi_id N/A crc32 4ac0889b
[225714.274012] ath10k_pci 0000:02:00.0: htt-ver 3.60 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[225714.284267] ath10k_pci 0000:02:00.0: failed to get memcpy hi address for firmware address 4: -16
[225714.284270] ath10k_pci 0000:02:00.0: failed to read firmware dump area: -16
[225714.284273] ath10k_pci 0000:02:00.0: Copy Engine register dump:
[225714.284285] ath10k_pci 0000:02:00.0: [00]: 0x00034400  11  11   3   3
[225714.284296] ath10k_pci 0000:02:00.0: [01]: 0x00034800  18  18 351 352
[225714.284307] ath10k_pci 0000:02:00.0: [02]: 0x00034c00  52  52  50  51
[225714.284317] ath10k_pci 0000:02:00.0: [03]: 0x00035000  18  18  20  18
[225714.284328] ath10k_pci 0000:02:00.0: [04]: 0x00035400 173 151 140  76
[225714.284339] ath10k_pci 0000:02:00.0: [05]: 0x00035800   0   0  64   0
[225714.284349] ath10k_pci 0000:02:00.0: [06]: 0x00035c00  20  20  20  20
[225714.284360] ath10k_pci 0000:02:00.0: [07]: 0x00036000   1   1   1   0
[225714.291033] ath10k_pci 0000:02:00.0: failed to read hi_board_data address: -28
[225714.349112] ieee80211 phy0: Hardware restart was requested
[225714.645002] ath10k_pci 0000:02:00.0: device successfully recovered
[225837.363645] wlp2s0: deauthenticating from 64:cc:22:4f:eb:6b by local choice (Reason: 3=DEAUTH_LEAVING)
[225846.372714] wlp2s0: authenticate with 64:cc:22:4f:eb:6b
[225846.416778] wlp2s0: send auth to 64:cc:22:4f:eb:6b (try 1/3)
[225846.417500] wlp2s0: authenticated
[225846.420260] wlp2s0: associate with 64:cc:22:4f:eb:6b (try 1/3)
[225846.421587] wlp2s0: RX AssocResp from 64:cc:22:4f:eb:6b (capab=0x1011 status=0 aid=2)
[225846.426283] wlp2s0: associated
[225846.426735] ath: EEPROM regdomain: 0x807c
[225846.426737] ath: EEPROM indicates we should expect a country code
[225846.426738] ath: doing EEPROM country->regdmn map search
[225846.426740] ath: country maps to regdmn code: 0x3a
[225846.426743] ath: Country alpha2 being used: CA
[225846.426744] ath: Regpair used: 0x3a
[225846.426747] ath: regdomain 0x807c dynamically updated by country element
[225846.459261] wlp2s0: Limiting TX power to 30 (30 - 0) dBm as advertised by 64:cc:22:4f:eb:

Comment 2 James 2020-08-18 20:17:09 UTC
Note I've changed to an Intel card, but I'll keep this open for others who've experienced it.

Comment 3 Steeve McCauley 2020-08-18 20:25:49 UTC
I've since heard that the wireless card (Killer) in the Dell XPS laptops is not great and that most people recommend replacing it with an Intel card. Unfortunately, on the laptop model I have (9370), the card is soldered onto the motherboard.

Comment 4 Peter Robinson 2020-10-23 14:30:50 UTC
*** Bug 1844085 has been marked as a duplicate of this bug. ***

Comment 5 Éric Brunet 2021-02-09 19:39:29 UTC
I also have a Dell XPS 13, and I have a similar problem with my wifi on my up-to-date Fedora 32.

I have seen the crash mentioned earlier, only on 5 GHz networks. I tried disabling Bluetooth and disabling power management on the card, but neither helped. I also tried upgrading the firmware to a newer version from the git repo; my logs now read

> kernel: ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.4.4.1-00241-QCARMSWPZ-1 api 6 features wowlan,ignore-otp,mfp crc32 2fb5425f

instead of

> kernel: ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.4.4.1-00157-QCARMSWPZ-1 api 6 features wowlan,ignore-otp,mfp crc32 90eebefb

I haven't seen the crash again yet, but the situation is no better. Now, I have events like this:

[ 1877.468640] ath10k_pci 0000:02:00.0: timed out waiting peer stats info
[ 1880.476656] ath10k_pci 0000:02:00.0: wmi command 90113 timeout, restarting hardware
[ 1880.476675] ath10k_pci 0000:02:00.0: could not request stats (-11)
[ 1880.476885] ath10k_pci 0000:02:00.0: could not request peer stats info: -108
[ 1880.477981] ath10k_pci 0000:02:00.0: could not request stats (-108)
[ 1880.478230] ath10k_pci 0000:02:00.0: could not request stats (-108)
[ 1880.537794] ieee80211 phy0: Hardware restart was requested
[ 1880.816130] ath10k_pci 0000:02:00.0: device successfully recovered

It says the device has recovered, but the connection is dead. In NetworkManager, the connection is still up, and some bytes seem to be exchanged with the access point, but I cannot load a web page, for instance. I need to disconnect and reconnect, and then things work... for a while.

The problem is not rare at all. Running the speed test at https://www.speedtest.net/ triggers it every time, for instance. This means I can reproduce the bug at will.

The 2.4 GHz networks seem to work fine, but it goes so much faster (when it works) on 5 GHz...

Is there anything I can do to help?
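
When comparing reports it helps to know which firmware build each log came from. The sketch below splits the `firmware ver` string from the two log lines quoted in this comment. It assumes the middle numeric field (00157, 00241) is a build number that increases with newer firmware, which this thread does not confirm; `parse_firmware_version` is a hypothetical helper name.

```python
import re

# Assumed layout: <branch>-<build>-<variant>, e.g. WLAN.RM.4.4.1-00241-QCARMSWPZ-1.
FW_VER_RE = re.compile(
    r"firmware ver (?P<branch>[A-Z0-9.]+)-(?P<build>\d+)-(?P<variant>\S+)"
)


def parse_firmware_version(line):
    """Return (branch, build, variant) from an ath10k 'firmware ver' line, or None."""
    m = FW_VER_RE.search(line)
    if m is None:
        return None
    return m.group("branch"), int(m.group("build")), m.group("variant")


OLD = ("kernel: ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.4.4.1-00157-QCARMSWPZ-1 "
       "api 6 features wowlan,ignore-otp,mfp crc32 90eebefb")
NEW = ("kernel: ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.4.4.1-00241-QCARMSWPZ-1 "
       "api 6 features wowlan,ignore-otp,mfp crc32 2fb5425f")

old_ver = parse_firmware_version(OLD)
new_ver = parse_firmware_version(NEW)
print(old_ver, new_ver)
# Only compare builds within the same branch and variant.
if old_ver[0] == new_ver[0] and old_ver[2] == new_ver[2]:
    print("newer build loaded:", new_ver[1] > old_ver[1])
```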

Comment 6 Éric Brunet 2021-03-01 18:40:08 UTC
At home, the wifi is served by my ISP's box (the ISP is Bouygtel, and the box is called a bbox). The bbox can be configured to have two wifi hotspots, one on 5GHz and the other on 2.4GHz. On my XPS 13, the wifi works perfectly on the 2.4GHz network, and crashes very quickly on the 5GHz network in its default configuration, as described above.

Today, I realized that the same computer was working flawlessly on the 5GHz wifi network of my workplace. So I investigated a bit, and found the following:

At home, I can choose the bandwidth for the wifi on the bbox. On the 2.4GHz network, I can choose between 20MHz and 40MHz, and on the 5GHz network, I can choose between 20MHz, 40MHz and 80MHz. The default was 80MHz, and with that setting my connection crashes very fast on the 5GHz network. However, if I choose a bandwidth of 20MHz or 40MHz, it seems to work nicely.

So, to sum up: 

I see this bug on my computer only on 5GHz network with a bandwidth of 80MHz.
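
To check from the laptop side whether a given connection matches this condition, the channel line printed by `iw dev <interface> info` can be parsed. This is a rough sketch: the sample text below is illustrative and not taken from this report, the line format is assumed to match what `iw` prints on the tester's system, and the helper names are invented.

```python
import re

# Assumed format of the channel line in `iw dev <iface> info` output.
CHANNEL_RE = re.compile(r"channel (\d+) \((\d+) MHz\), width: (\d+) MHz")


def parse_channel(iw_info_text):
    """Extract channel number, frequency and width (MHz) from iw info text."""
    m = CHANNEL_RE.search(iw_info_text)
    if m is None:
        return None
    channel, freq, width = (int(g) for g in m.groups())
    return {"channel": channel, "freq_mhz": freq, "width_mhz": width}


def matches_reported_trigger(info):
    """True for a 5GHz link using an 80MHz channel, the case described above."""
    return (info is not None
            and 5000 <= info["freq_mhz"] < 6000
            and info["width_mhz"] == 80)


# Illustrative sample only (hypothetical values).
SAMPLE_IW_INFO = """\
Interface wlp2s0
	type managed
	channel 36 (5180 MHz), width: 80 MHz, center1: 5210 MHz
"""

info = parse_channel(SAMPLE_IW_INFO)
print(info, matches_reported_trigger(info))
```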

Comment 7 Fedora Program Management 2021-04-29 16:30:57 UTC
This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 32 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Chandradeep Dey 2021-05-14 16:15:48 UTC
For me changing the frequency did nothing. However, changing the beacon interval from 100ms to 50ms worked. I have not seen the crash for days now.

Comment 9 Chandradeep Dey 2021-05-14 16:44:05 UTC
Also, @James can you change the version to Fedora 34 please?

Comment 10 Éric Brunet 2021-06-05 08:38:08 UTC
(In reply to Chandradeep Dey from comment #8)
> For me changing the frequency did nothing. However, changing the beacon
> interval from 100ms to 50ms worked. I have not seen the crash for days now.

I have upgraded my box to Fedora 34 and told my router to use the 80MHz bandwidth again, and I got the same crash again. The router is now back to 40MHz.

I am curious: how do you change the "beacon interval"? Is it at the router level or at the computer level? I'd like to try it to see if it fixes my problem too.

Comment 11 Chandradeep Dey 2021-06-05 13:28:37 UTC
(In reply to Éric Brunet from comment #10)
> I am curious: how do you change the "beacon interval" ? Is it at the router
> level, at the computer level ? I'd like to try to see if it fixes my problem
> too.
It was a setting in my router's "system parameters" menu.

Comment 12 Nick te Lindert 2021-07-04 18:38:22 UTC
Same laptop (XPS 13 9380), also with frequent firmware crashes. The problem is gone when you turn off powersave in NetworkManager.conf.

Comment 13 Nick te Lindert 2021-07-04 18:42:07 UTC
For your information, I added this to NetworkManager/conf.d/wifi.conf:

[connection]
# Values are 0 (use default), 1 (ignore/don't touch), 2 (disable) or 3 (enable).
wifi.powersave = 2
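
To double-check what a drop-in like this actually sets, the snippet can be read back with a generic INI parser. This is only a sketch: it assumes the file is close enough to INI syntax for Python's `configparser`, the meanings are copied from the comment line above, and `read_wifi_powersave` is a made-up helper.

```python
import configparser

# Meanings taken from the comment line in the snippet above.
POWERSAVE_MEANINGS = {
    0: "use default",
    1: "ignore/don't touch",
    2: "disable",
    3: "enable",
}


def read_wifi_powersave(conf_text):
    """Return (value, meaning) for wifi.powersave, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("connection", "wifi.powersave", fallback=None)
    if raw is None:
        return None
    value = int(raw)
    return value, POWERSAVE_MEANINGS.get(value, "unknown")


SAMPLE_CONF = """\
[connection]
# Values are 0 (use default), 1 (ignore/don't touch), 2 (disable) or 3 (enable).
wifi.powersave = 2
"""

print(read_wifi_powersave(SAMPLE_CONF))
```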

Comment 14 Éric Brunet 2021-07-04 18:52:25 UTC
Hi! Thanks for the suggestion.
I have an XPS 13 9380, and I was aware of that suggestion: I have had this setting in place since February, but it didn't suppress the crashes.

For me the crashes only happen on 5GHz networks, mostly with an 80MHz bandwidth, but I think I got some with a 40MHz bandwidth. (Not sure, though.) Sometimes, the firmware crash makes my whole computer freeze, and I need to reboot.

My workaround is to disable 5GHz networks, SSID by SSID (I don't know how to do it globally). For instance, for the eduroam SSID, I typed:

nmcli connection modify Goya 802-11-wireless.band "bg"

Éric
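
One possible way to apply the same workaround to every saved wifi profile is to generate the command per profile from the terse listing printed by `nmcli -t -f NAME,TYPE connection show`. The sketch below only builds the argument lists and does not run anything. Apart from "Goya", the profile names in the sample are hypothetical, and `band_limit_commands` is an invented helper.

```python
import re


def band_limit_commands(nmcli_terse_output, band="bg"):
    """Build one 'nmcli connection modify' argv per wifi profile.

    Expects lines of the form NAME:TYPE, as printed by
    `nmcli -t -f NAME,TYPE connection show`.
    """
    commands = []
    for line in nmcli_terse_output.splitlines():
        if not line.strip():
            continue
        # The type never contains ':', so split on the last colon.
        name, _, conn_type = line.rpartition(":")
        if conn_type != "802-11-wireless":
            continue
        # Terse mode backslash-escapes ':' and '\' inside values; undo that.
        name = re.sub(r"\\(.)", r"\1", name)
        commands.append(
            ["nmcli", "connection", "modify", name, "802-11-wireless.band", band]
        )
    return commands


# "Goya" comes from the comment above; the other names are hypothetical.
SAMPLE_LISTING = "\n".join([
    "Goya:802-11-wireless",
    "Wired connection 1:802-3-ethernet",
    "Home\\:5G:802-11-wireless",
])

for argv in band_limit_commands(SAMPLE_LISTING):
    print(argv)
```

Each list could then be passed to `subprocess.run` after reviewing it. The value "bg" restricts a profile to 2.4GHz, as in the command above.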

Comment 15 Nick te Lindert 2021-07-04 19:16:12 UTC
(In reply to Éric Brunet from comment #14)
> Hi! Thanks for the suggestion.
> I have an XPS 13 9380, and I was aware of that suggestion: I have had this
> setting in place since February, but it didn't suppress the crashes.
> 
> For me the crashes only happen for 5GHz networks, mostly for a 80MHz band,
> but I think I got some for a 40MHz band. (Not sure, though.) Sometimes, the
> firmware crash makes my whole computer freeze, and I need to reboot.
> 
> My workaround is to disable 5GHz networks, SSID by SSID (I don't know how to
> do it globally). For instance, for the eduroam ssid, I typed:
> 
> nmcli connection modify Goya 802-11-wireless.band "bg"
> 
> Éric

I am sorry to hear that, although I am not really surprised. I think this model has a lot of problems for a Linux-compatible system. (Sorry, a little off-topic.)

Comment 16 Chandradeep Dey 2021-07-05 21:39:00 UTC
(In reply to Nick te Lindert from comment #13)
> For your information i added this to NetworkManager/conf.d/wifi.conf
> 
> [connection]
> # Values are 0 (use default), 1 (ignore/don't touch), 2 (disable) or 3
> (enable).
> wifi.powersave = 2

I have tried this before and it didn't work. Weirdly enough, reducing the beacon interval is supposed to have a similar effect, letting the client sleep less, which did work for me.

Comment 17 Ben Bromley 2021-12-20 13:45:52 UTC
*** Bug 2034244 has been marked as a duplicate of this bug. ***

Comment 18 Ben Bromley 2021-12-20 13:50:28 UTC
Still having this issue on Fedora 35

Comment 19 falk 2022-02-17 09:43:31 UTC
Same problem with a Dell Vostro 5515:

03:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
	Subsystem: Dell Device 0310
	Kernel driver in use: ath10k_pci
	Kernel modules: ath10k_pci

journal:
ath10k_warn: 13 callbacks suppressed
Feb 17 10:39:22 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: timed out waiting peer stats info
Feb 17 10:39:27 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: wmi command 90113 timeout, restarting hardware
Feb 17 10:39:27 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: could not request stats (-11)
Feb 17 10:39:27 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: could not request peer stats info: -108
Feb 17 10:39:27 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: failed to read hi_board_data address: -16
Feb 17 10:39:30 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: failed to receive initialized event from target: 00000000
Feb 17 10:39:33 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: failed to receive initialized event from target: 00000000
Feb 17 10:39:33 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: failed to wait for target init: -110
Feb 17 10:39:33 fpx-laptop-gac kernel: ieee80211 phy0: Hardware restart was requested
Feb 17 10:39:33 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: could not request stats (-108)
Feb 17 10:39:33 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: device successfully recovered
Feb 17 10:39:40 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: wmi command 90124 timeout, restarting hardware
Feb 17 10:39:40 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: could not request peer stats info: -11
Feb 17 10:39:40 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: failed to read hi_board_data address: -16
Feb 17 10:39:43 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: failed to receive initialized event from target: 00000000
Feb 17 10:39:46 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: failed to receive initialized event from target: 00000000
Feb 17 10:39:46 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: failed to wait for target init: -110
Feb 17 10:39:46 fpx-laptop-gac kernel: ieee80211 phy0: Hardware restart was requested
Feb 17 10:39:46 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: could not request stats (-108)
Feb 17 10:39:46 fpx-laptop-gac kernel: ath10k_pci 0000:03:00.0: device successfully recovered

Is the Qualcomm Atheros QCA6174 a crappy device, or is it a lack of proper Linux driver support?

Comment 20 Daimar Stein 2022-02-23 20:16:04 UTC
Same issue here, on a Dell Inspiron 7567 with Qualcomm Atheros QCA6174 802.11ac on Fedora 35.
Weirdly enough Fedora is the only distro I remember having this problem with, but maybe that's because I didn't have a 5GHz capable router before switching to it.
In my case the crashes are completely random as well, but nothing I do short of a restart makes it work again.

journal:
fev 23 16:53:40 casper-3 kernel: ath10k_pci 0000:03:00.0: firmware crashed! (guid 7d4dc840-9a65-4402-b0e8-26c79bd50cd0)
fev 23 16:53:40 casper-3 kernel: ath10k_pci 0000:03:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1028:0310
fev 23 16:53:40 casper-3 kernel: ath10k_pci 0000:03:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
fev 23 16:53:40 casper-3 kernel: ath10k_pci 0000:03:00.0: firmware ver WLAN.RM.4.4.1-00157-QCARMSWPZ-1 api 6 features wowlan,ignore-otp,mfp crc32 90eebefb
fev 23 16:53:40 casper-3 kernel: ath10k_pci 0000:03:00.0: board_file api 2 bmi_id N/A crc32 706c395e
fev 23 16:53:40 casper-3 kernel: ath10k_pci 0000:03:00.0: htt-ver 3.60 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
fev 23 16:53:40 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to read firmware dump area: -28
fev 23 16:53:40 casper-3 kernel: ath10k_pci 0000:03:00.0: Copy Engine register dump:
fev 23 16:53:41 casper-3 kernel: ath10k_pci 0000:03:00.0: [00]: 0x00034400 4294967295 4294967295 4294967295 4294967295
fev 23 16:53:41 casper-3 kernel: ath10k_pci 0000:03:00.0: [01]: 0x00034800 4294967295 4294967295 4294967295 4294967295
fev 23 16:53:41 casper-3 kernel: ath10k_pci 0000:03:00.0: [02]: 0x00034c00 4294967295 4294967295 4294967295 4294967295
fev 23 16:53:41 casper-3 kernel: ath10k_pci 0000:03:00.0: [03]: 0x00035000 4294967295 4294967295 4294967295 4294967295
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: [04]: 0x00035400 4294967295 4294967295 4294967295 4294967295
fev 23 16:53:42 casper-3 kernel: ath10k_warn: 126 callbacks suppressed
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003583c: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x00035844: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x00035840: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x00035848: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: [05]: 0x00035800 4294967295 4294967295 4294967295 4294967295
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x00035c3c: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
fev 23 16:53:42 casper-3 kernel: ath10k_pci 0000:03:00.0: [06]: 0x00035c00 4294967295 4294967295 4294967295 4294967295
fev 23 16:53:43 casper-3 kernel: ath10k_pci 0000:03:00.0: [07]: 0x00036000 4294967295 4294967295 4294967295 4294967295
fev 23 16:53:44 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to read device register, device is gone
fev 23 16:53:44 casper-3 kernel: ath10k_pci 0000:03:00.0: failed to reset chip: -5
fev 23 16:53:44 casper-3 kernel: ath10k_pci 0000:03:00.0: Could not init hif: -5
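
The negative numbers at the end of many ath10k messages in this thread look like negated Linux errno values. Assuming that reading is right, the sketch below decodes them using a small table of the codes that appear in this report. The table lists standard Linux errno numbers; `decode_error` is a hypothetical helper.

```python
import re

# Standard Linux errno values for the codes seen in this thread.
LINUX_ERRNO = {
    5: ("EIO", "I/O error"),
    11: ("EAGAIN", "Try again"),
    16: ("EBUSY", "Device or resource busy"),
    28: ("ENOSPC", "No space left on device"),
    70: ("ECOMM", "Communication error on send"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches a trailing ": -110" or "(-11)" at the end of a log line.
CODE_RE = re.compile(r"(?:: |\()-(\d+)\)?$")


def decode_error(line):
    """Return (code, name, description) for a trailing error code, or None."""
    m = CODE_RE.search(line.rstrip())
    if m is None:
        return None
    code = int(m.group(1))
    name, text = LINUX_ERRNO.get(code, ("unknown", "not in table"))
    return code, name, text


for sample in (
    "ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110",
    "ath10k_pci 0000:03:00.0: Could not init hif: -5",
    "ath10k_pci 0000:02:00.0: could not request stats (-11)",
):
    print(decode_error(sample))
```

Separately, the value 4294967295 repeated in the Copy Engine dump above is 0xFFFFFFFF, i.e. every bit set, which is consistent with the later "device is gone" message.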

Comment 21 Mateus Rodrigues Costa 2022-05-14 12:35:54 UTC
Still present on Fedora 36, device is a Dell G5 5590.

Kernel version:

kernel-5.17.6-300.fc36.x86_64
kernel-core-5.17.6-300.fc36.x86_64
kernel-modules-5.17.6-300.fc36.x86_64
kernel-modules-extra-5.17.6-300.fc36.x86_64

Related logs:


```
mai 14 08:04:31 centauro kernel: xhci_hcd 0000:01:00.2: xHC error in resume, USBSTS 0x401, Reinit
mai 14 08:04:31 centauro kernel: usb usb3: root hub lost power or was reset
mai 14 08:04:31 centauro kernel: usb usb4: root hub lost power or was reset
mai 14 08:05:05 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:05 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:07 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001d5 at 0x0003543c: -110
mai 14 08:05:07 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001d7 at 0x0003543c: -110
mai 14 08:05:07 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001d9 at 0x0003543c: -110
mai 14 08:05:07 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001db at 0x0003543c: -110
mai 14 08:05:07 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001dd at 0x0003543c: -110
mai 14 08:05:07 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001df at 0x0003543c: -110
mai 14 08:05:08 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001e1 at 0x0003543c: -110
mai 14 08:05:08 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001e3 at 0x0003543c: -110
mai 14 08:05:11 centauro kernel: ath10k_warn: 11 callbacks suppressed
mai 14 08:05:11 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001f7 at 0x0003543c: -110
mai 14 08:05:12 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001f9 at 0x0003543c: -110
mai 14 08:05:12 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001fb at 0x0003543c: -110
mai 14 08:05:13 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001fd at 0x0003543c: -110
mai 14 08:05:13 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x000001ff at 0x0003543c: -110
mai 14 08:05:13 centauro kernel: ath10k_pci 0000:3d:00.0: timed out waiting peer stats info
mai 14 08:05:14 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000201 at 0x0003543c: -110
mai 14 08:05:14 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000203 at 0x0003543c: -110
mai 14 08:05:15 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000205 at 0x0003543c: -110
mai 14 08:05:16 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000207 at 0x0003543c: -110
mai 14 08:05:16 centauro kernel: ath10k_warn: 10 callbacks suppressed
mai 14 08:05:16 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x0000021d at 0x0003543c: -110
mai 14 08:05:17 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x0000021f at 0x0003543c: -110
mai 14 08:05:17 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000221 at 0x0003543c: -110
mai 14 08:05:17 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000223 at 0x0003543c: -110
mai 14 08:05:17 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000225 at 0x0003543c: -110
mai 14 08:05:17 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000227 at 0x0003543c: -110
mai 14 08:05:17 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000229 at 0x0003543c: -110
mai 14 08:05:17 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x0000022b at 0x0003543c: -110
mai 14 08:05:17 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x0000022d at 0x0003543c: -110
mai 14 08:05:18 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x0000022f at 0x0003543c: -110
mai 14 08:05:21 centauro kernel: ath10k_warn: 89 callbacks suppressed
mai 14 08:05:21 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:21 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:22 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:22 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:22 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:22 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:22 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:22 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:22 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:22 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00036044: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003482c: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xffffffe1 at 0x0003482c: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00034c2c: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xfffffffe at 0x00034c2c: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00034c34: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xfffff81f at 0x00034c34: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00034c2c: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xffffffe1 at 0x00034c2c: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003502c: -110
mai 14 08:05:27 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xfffffffe at 0x0003502c: -110
mai 14 08:05:29 centauro rpm-ostree[12057]: In idle state; will auto-exit in 64 seconds
mai 14 08:05:30 centauro systemd[1]: rpm-ostreed.service: Deactivated successfully.
mai 14 08:05:30 centauro audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rpm-ostreed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
mai 14 08:05:30 centauro systemd[1]: rpm-ostreed.service: Consumed 9min 8.919s CPU time.
mai 14 08:05:32 centauro kernel: ath10k_warn: 121 callbacks suppressed
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00035010: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xfffeffff at 0x00035010: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003504c: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x0000ffff at 0x0003504c: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003504c: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xffff0020 at 0x0003504c: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00035444: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003543c: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xfffc0000 at 0x00035400: -110
mai 14 08:05:32 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00002000 at 0x00035404: -110
mai 14 08:05:36 centauro kernel: ath10k_pci 0000:3d:00.0: failed to read device register, device is gone
mai 14 08:05:37 centauro kernel: ath10k_warn: 121 callbacks suppressed
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xffffffff at 0x00000800: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00034444: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003443c: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xfffff000 at 0x00034400: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0x00000010 at 0x00034404: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00034410: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xffff0100 at 0x00034410: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x00034410: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for write32 of 0xfffeffff at 0x00034410: -110
mai 14 08:05:37 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003444c: -110
mai 14 08:05:42 centauro kernel: ath10k_warn: 121 callbacks suppressed
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:42 centauro kernel: ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110
mai 14 08:05:43 centauro kernel: ath10k_pci 0000:3d:00.0: failed to read device register, device is gone
mai 14 08:05:43 centauro kernel: ieee80211 phy0: Hardware restart was requested
mai 14 08:05:46 centauro kernel: ath10k_pci 0000:3d:00.0: failed to read device register, device is gone
mai 14 08:05:46 centauro kernel: ath10k_pci 0000:3d:00.0: failed to reset chip: -5
mai 14 08:05:46 centauro kernel: ath10k_pci 0000:3d:00.0: Could not init hif: -5
mai 14 08:05:46 centauro kernel: ------------[ cut here ]------------
mai 14 08:05:46 centauro kernel: Hardware became unavailable during restart.
mai 14 08:05:46 centauro kernel: ath10k_pci 0000:3d:00.0: firmware crashed! (guid 76ce3956-909f-4f14-ae3a-f18ca42d21bd)
mai 14 08:05:46 centauro kernel: ath10k_pci 0000:3d:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1028:0310
mai 14 08:05:46 centauro kernel: ath10k_pci 0000:3d:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
```

I believe part of it is that the firmware failed to recover correctly from a power reset this specific time, and part of it is a bug in whatever causes the USB devices' power to be reset randomly on my system.
In this particular case I had not resumed from suspend (it sometimes also breaks on resume), and there have been several other times where it crashed randomly during normal use.

In fact, the USB hub lines appear at several points in that boot's logs; it just so happens that on this specific occasion the ath10k firmware also crashed:

Earlier on same boot:


```
mai 14 07:25:11 centauro kernel: xhci_hcd 0000:01:00.2: xHC error in resume, USBSTS 0x401, Reinit
mai 14 07:25:11 centauro kernel: usb usb3: root hub lost power or was reset
mai 14 07:25:11 centauro kernel: usb usb4: root hub lost power or was reset
mai 14 07:26:26 centauro kernel: xhci_hcd 0000:01:00.2: xHC error in resume, USBSTS 0x401, Reinit
mai 14 07:26:26 centauro kernel: usb usb3: root hub lost power or was reset
mai 14 07:26:26 centauro kernel: usb usb4: root hub lost power or was reset

```
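To see how often the hub resets line up with ath10k firmware crashes, the two message types can be counted from the kernel log. This is only a sketch, assuming the message strings shown in the logs above; the function name `count_reset_events` is made up for illustration:

```shell
# Hypothetical helper: reads kernel log text on stdin and prints how many
# USB root hub resets and ath10k firmware crashes it contains.
count_reset_events() {
    log=$(cat)
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    hub=$(printf '%s\n' "$log" | grep -c 'root hub lost power or was reset' || true)
    fw=$(printf '%s\n' "$log" | grep -c 'ath10k_pci .*firmware crashed!' || true)
    printf 'hub_resets=%s firmware_crashes=%s\n' "$hub" "$fw"
}

# Usage on a live system (kernel messages of the current boot):
#   journalctl -k -b | count_reset_events
```

Many hub resets with only one firmware crash would suggest the resets alone are not enough to take the card down.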

The ath10k device:

``` 
[mateusrc@centauro ~]$ sudo lspci -s 3d:00.0 -v
3d:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
	Subsystem: Dell Device 0310
	Flags: bus master, fast devsel, latency 0, IRQ 168, IOMMU group 21
	Memory at ed200000 (64-bit, non-prefetchable) [size=2M]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/8 Maskable+ 64bit-
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [148] Virtual Channel
	Capabilities: [168] Device Serial Number <snip>
	Capabilities: [178] Latency Tolerance Reporting
	Capabilities: [180] L1 PM Substates
	Kernel driver in use: ath10k_pci
	Kernel modules: ath10k_pci
```

It's present on lsusb, so that's why it's reset:


```
[mateusrc@centauro ~]$ lsusb -d 0cf3:e007
Bus 001 Device 005: ID 0cf3:e007 Qualcomm Atheros Communications 

```

The problem seems to have worsened recently (it went from happening once every few days to basically once or twice each day), quite around the same time I enabled `intel_iommu=on` on this machine. The problem already happened before that, though, as far back as the first Fedora version I ran on this machine (probably 32 or 33), and even with IOMMU enabled on those it wasn't as bad as it is now. I will try disabling the IOMMU later to rule it out anyway, but there was also a recent linux-firmware update which could be to blame too:

linux-firmware-20220411-131.fc36.noarch
linux-firmware-whence-20220411-131.fc36.noarch

(Do note that I was running an rpm-ostree transaction during the crash; maybe it stressed the wifi card so much that it decided to reset itself? The transaction managed to complete anyway, so I guess it had already done everything that needed internet access before the connection was lost.)

I will investigate the USB resets in the meantime but, since the crash is most likely caused by some power saving feature, the solution most likely to work for me is what is described in comment 12 and comment 13.

Comment 22 Peter Robinson 2022-05-14 13:49:02 UTC
> It's present on lsusb, so that's why it's reset:
>
> [mateusrc@centauro ~]$ lsusb -d 0cf3:e007
> Bus 001 Device 005: ID 0cf3:e007 Qualcomm Atheros Communications 

I suspect that's the bluetooth, not sure what you mean by "that's why it's reset"

> twice each day), quite around the same time I enabled `intel_iommu=on` on

Please investigate whether it was the iommu, I don't have either the Dell device nor the ath wifi module to recreate the problem.

> recent linux-firmware update which could be the to blame too:
> 
> linux-firmware-20220411-131.fc36.noarch
> linux-firmware-whence-20220411-131.fc36.noarch

That release had no updates whatsoever related to Atheros wireless, neither WiFi nor Bluetooth, so that's not to blame. If there were other updates that made it better or worse, please try to work out which, e.g. whether a specific kernel made it go from weekly to daily.

That said, there was a large update of ath10k/ath11k firmwares in the May release which may (or may not) help. That was submitted as an update a few hours ago.

Comment 23 Mateus Rodrigues Costa 2022-05-14 16:05:59 UTC
(In reply to Peter Robinson from comment #22)
> > It's present on lsusb, so that's why it's reset:
> >
> > [mateusrc@centauro ~]$ lsusb -d 0cf3:e007
> > Bus 001 Device 005: ID 0cf3:e007 Qualcomm Atheros Communications 
> 
> I suspect that's the bluetooth, not sure what you mean by "that's why it's
> reset"

Weird, I had believed that BT and Wifi were the same card, the driver download page on Dell website says it's the same driver at least and I vaguely remember the same info from Windows' driver manager too. Unless maybe only the BT part is separately shown for some reason? But, afaik, it's the same device.

By "that's why it's reset", I meant in the sense that due to these:

```
mai 14 08:04:31 centauro kernel: usb usb3: root hub lost power or was reset
mai 14 08:04:31 centauro kernel: usb usb4: root hub lost power or was reset
```

If I am hitting a different root cause in which some bug causes all my USB devices to randomly restart, the Wifi card being one of them could be related. (Sorry, I don't know enough about hardware, especially laptop hardware, to know whether I should assume that every Wifi card will appear there.)
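
One way to check what the USB device actually is would be to look at its interface descriptors. A minimal sketch, assuming `lsusb -v` prints the protocol name from its usb.ids database (the exact wording may differ between versions); the function name `is_bluetooth_usb` is made up for illustration:

```shell
# Hypothetical helper: reads `lsusb -v` output for one device on stdin and
# reports whether any interface declares the Bluetooth protocol.
is_bluetooth_usb() {
    if grep -Eq 'bInterfaceProtocol[[:space:]]+1[[:space:]]+Bluetooth'; then
        echo bluetooth
    else
        echo other
    fi
}

# Usage:
#   lsusb -v -d 0cf3:e007 | is_bluetooth_usb
```

If this prints `bluetooth`, the USB entry is only the Bluetooth half of the card, and the Wifi half is the PCI device at 3d:00.0 shown by lspci.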


> > twice each day), quite around the same time I enabled `intel_iommu=on` on
> 
> Please investigate whether it was the iommu, I don't have either the Dell
> device nor the ath wifi module to recreate the problem.
> 
> > recent linux-firmware update which could be the to blame too:
> > 
> > linux-firmware-20220411-131.fc36.noarch
> > linux-firmware-whence-20220411-131.fc36.noarch
> 
> That release had no updates what so ever related to Atheros wireless,
> neither WiFi nor bluetooth so that's not to blame, if there were other
> updates that made it better or worse please try and work out what, if a
> specific kernel made it go from weekly to daily.
> 
> That said there was a large update of ath10k/ath11k firmwares in the may
> release which may (or may not) help). That was submitted as an update a few
> hours ago.

The bug where my wifi card randomly dies has been present for at least as long as I have used Fedora on this machine. It even happened on my previous install (which went as far as Fedora 35 for a while) with iommu enabled, though without crashing so often. It's only with the current install, which has gone Fedora 35 -> Fedora 36 Beta -> Fedora 36 Stable, initially without iommu, that it really got more frequent after I enabled iommu 2 or 3 days ago.
I will also check whether there were any linux-firmware updates between when I was running the Beta and now but, yeah, iommu could be to blame. I will also test for a while with powersave disabled via NetworkManager while iommu is enabled, to be 100% sure whether it's a power saving bug.

---

Actually, as a fun fact, I am aware of a bug on Windows in certain Dell models (apparently it happens with other OEMs too) caused by the Bluetooth card (it seems Nvidia cards were also hit by it at some point), where a "WHEA-Logger" error with an id of 17 and the message "A corrected hardware error has occurred." is spammed to the Event Logs. No functionality is affected, though, and the system seems to be alright. I know about it because my old Dell laptop, which I gave to my mother, hits it on Windows 10, and I tried to figure out how to fix it.
It seems the fix (although it feels more like a workaround) for that bug is to go to Power Settings on Windows 10, open the advanced power plan settings, choose PCI Express -> Link State Power Management and set it to Off for both Battery and Plugged In. It could be some bad interaction between the chipset and the wireless card when power saving is on. I have only used Windows 10 on this machine briefly, so I never looked at the logs to check whether it is affected.

It could turn out that I'm hitting the same thing on Linux, which would cause the wireless card to restart (so either broken hardware, broken ath10k drivers or broken power saving). The 100% solution people found for that issue on Windows (since it seems to be very specifically related to BT) is to keep BT always off, although afaik this has to be the "hard" off (disabling it in the BIOS). Assuming it's the same bug, `wifi.powersave` would most likely be the equivalent of disabling "Link State Power Management", turning off power saving only for that card.
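
For reference, a sketch of how the `wifi.powersave` setting could be made persistent with a NetworkManager drop-in. The value 2 means "disable" (0 = use default, 1 = ignore, 3 = enable). The function name and the file name `99-wifi-powersave-off.conf` are arbitrary choices for illustration, not something NetworkManager requires:

```shell
# Writes a NetworkManager drop-in that disables Wi-Fi power saving.
# $1 is the target directory; it defaults to the system conf.d directory.
write_powersave_conf() {
    dir=${1:-/etc/NetworkManager/conf.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/99-wifi-powersave-off.conf" <<'EOF'
[connection]
wifi.powersave = 2
EOF
}

# Usage (as root):
#   write_powersave_conf && systemctl restart NetworkManager
```

Note that this only turns off 802.11 power saving for the Wifi connection; whether it also avoids the PCIe link power management behaviour described for Windows is an assumption that would need testing.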

If it's the same bug, then the difference would likely be that on Windows the Wifi doesn't crash but here it does; the crash seems to be in the firmware but could also be in the driver.

--

Also, looking here again, there are two issues.

First, the one both Daimar and I are having, which is likely related to the WHEA-Logger one on Windows. For reference, the previous laptop I mentioned above was an Inspiron 15 7572 and theirs is a Dell Inspiron 7567, so the two laptops are probably closely related and very likely to be similarly affected.

From both our logs there are those in common:

ath10k_pci 0000:3d:00.0: failed to wake target for read32 at 0x0003a028: -110 (several similar to those, along with some for write32 on mine)
ath10k_pci 0000:3d:00.0: failed to read device register, device is gone
ath10k_pci 0000:3d:00.0: failed to reset chip: -5
ath10k_pci 0000:3d:00.0: Could not init hif: -5

This points to the device no longer being available (it probably either lost power or at least the kernel can't use it anymore), and so of course a firmware crash is reported because the device can't be reached. This one is most likely a hardware fault.

The other issue is where the card crashes for some reason, a restart is requested and it recovers, which produces these lines:

[63222.621385] ieee80211 phy0: Hardware restart was requested
[63222.891724] ath10k_pci 0000:02:00.0: device successfully recovered

But it might or might not successfully reload the firmware and connect. When it comes to retrying the firmware load, it might not try again, it might try again and fail, or it might try again and succeed. The last is most likely what happened with Steeve, which is why he said he only needs to reboot sometimes. This one is most likely the firmware or the driver at fault.

So, it seems like it's two bugs which show the same symptom and similar logs... Not sure how to go from there, any ideas?

Comment 24 Éric Brunet 2022-05-14 21:31:43 UTC
(In reply to Peter Robinson from comment #22)

> That said there was a large update of ath10k/ath11k firmwares in the may
> release which may (or may not) help). That was submitted as an update a few
> hours ago.

Interesting. I downloaded and installed linux-firmware-20220509-132.fc35.noarch.rpm from bodhi on my up-to-date Fedora 35, so that I now have in the logs:

ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp crc32 bf907c7c

My Dell XPS 13 (9380) still crashes on 5GHz networks. However, it now recovers in a couple of seconds and the system is usable again after a small delay.
(Previously, a crash of ath10k made my system very unstable, I had to reboot.)
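
To confirm which firmware build the driver actually loaded after such an update, the version string can be pulled out of the kernel log. A minimal sketch, assuming the `firmware ver ... api` line format shown above; the function name `ath10k_fw_version` is made up for illustration:

```shell
# Hypothetical helper: reads kernel log text on stdin and prints the
# firmware version from each ath10k "firmware ver" line.
ath10k_fw_version() {
    sed -n 's/.*ath10k_pci .*firmware ver \([^ ]*\) api .*/\1/p'
}

# Usage:
#   journalctl -k -b | ath10k_fw_version | sort -u
```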

Here's an example of what I see in the logs when it crashes:


[  694.450198] ath10k_pci 0000:02:00.0: timed out waiting peer stats info
[  699.442355] ath10k_pci 0000:02:00.0: wmi command 90113 timeout, restarting hardware
[  699.442380] ath10k_pci 0000:02:00.0: could not request stats (-11)
[  699.448155] ath10k_pci 0000:02:00.0: could not request peer stats info: -108
[  699.509700] ieee80211 phy0: Hardware restart was requested
[  699.852430] ath10k_pci 0000:02:00.0: device successfully recovered


So, a little bit better, but still not working...

Comment 25 Chandradeep Dey 2022-05-15 04:23:49 UTC
(In reply to Éric Brunet from comment #24)
> My Dell XPS 13 (9380) still crashes on 5GHz networks. However it now recovers ina couple of seconds and the system is again usable after a small delay.

> So, a little bit better, but still not working...

Btw this is only on your system. I think most of us had the "device successfully recovered" bit even before this.

Comment 26 Éric Brunet 2022-05-15 07:53:56 UTC
Oh, I used to have the message "device successfully recovered", but it was usually not true: sometimes the wifi would not recover, or a completely different problem would appear (a problem with the mouse, for instance). Everything hinted at memory corruption. I had to reboot to make my computer usable again.

By the way, from the github repository for my card's firmware, https://github.com/kvalo/ath10k-firmware/tree/master/QCA6174/hw3.0/4.4.1 , I see

firmware-6.bin_WLAN.RM.4.4.1-00157-QCARMSWPZ-1   2 years ago
firmware-6.bin_WLAN.RM.4.4.1-00241-QCARMSWPZ-1   15 months ago
firmware-6.bin_WLAN.RM.4.4.1-00279-QCARMSWPZ-1   12 months ago
firmware-6.bin_WLAN.RM.4.4.1-00282-QCARMSWPZ-1   11 months ago
firmware-6.bin_WLAN.RM.4.4.1-00288-QCARMSWPZ-1   8 months ago

Fedora is now in the process of upgrading from firmware 157 to firmware 288. Why is there such a delay? Why are so many revisions skipped over?

Comment 27 Mateus Rodrigues Costa 2022-05-15 12:41:16 UTC
Just an update on my version of this bug.

Disabling powersave for the wireless card seems to have basically made it go away. I think udev ships a bunch of rules for quirks of devices, maybe it would make sense to request upstream to add one so my specific version of the card on a Dell laptop (probably also all the other Atheros cards affected by the WHEA-Logger id 17 error on Windows too) has power save by default?

Still, now I only have a few more related logs:

```
mai 14 21:11:48 centauro kernel: ath10k_pci 0000:3d:00.0: wmi service ready event not received
mai 14 21:11:48 centauro kernel: ath10k_pci 0000:3d:00.0: Could not init core: -110
mai 14 21:11:53 centauro kernel: ath10k_pci 0000:3d:00.0: wmi service ready event not received
mai 14 21:11:53 centauro kernel: ath10k_pci 0000:3d:00.0: Could not init core: -110
mai 14 21:12:25 centauro kernel: rfkill: input handler enabled
mai 14 21:12:26 centauro kernel: xhci_hcd 0000:01:00.2: xHC error in resume, USBSTS 0x401, Reinit
mai 14 21:12:26 centauro kernel: usb usb3: root hub lost power or was reset
mai 14 21:12:26 centauro kernel: usb usb4: root hub lost power or was reset
mai 14 21:12:30 centauro kernel: rfkill: input handler disabled
mai 14 21:12:38 centauro kernel: ath10k_pci 0000:3d:00.0: wmi service ready event not received
mai 14 21:12:38 centauro kernel: ath10k_pci 0000:3d:00.0: Could not init core: -110
mai 14 21:14:54 centauro kernel: xhci_hcd 0000:01:00.2: xHC error in resume, USBSTS 0x401, Reinit
mai 14 21:14:54 centauro kernel: usb usb3: root hub lost power or was reset
mai 14 21:14:54 centauro kernel: usb usb4: root hub lost power or was reset
mai 14 21:17:06 centauro kernel: xhci_hcd 0000:01:00.2: xHC error in resume, USBSTS 0x401, Reinit
mai 14 21:17:06 centauro kernel: usb usb3: root hub lost power or was reset
mai 14 21:17:06 centauro kernel: usb usb4: root hub lost power or was reset
```

These messages about wmi and "could not init core" seem to be new, but I don't think they really break my Wifi connection; they most likely just slow down the start of the driver or force a reconnection. At least I didn't notice anything. If anything, these might be improved by the firmware updates mentioned above.

There are also DMAR errors, likely due to IOMMU:

```
mai 14 20:55:33 centauro kernel: DMAR: DRHD: handling fault status reg 2
mai 14 20:55:33 centauro kernel: DMAR: [DMA Write NO_PASID] Request device [3d:00.0] fault addr 0xff9eb000 [fault reason 0x05] PTE Write access is not set
```

Although those seem to be rare and seem to not crash Wifi either.

---

So, the two bugs here, as far as I understand:

1) wireless card dies for some reason (likely lost power due to power save) -> wireless card device is completely gone -> firmware crashes because it can't find device -> wifi is gone, restart is required to get device back
2) firmware crashes for some reason -> wireless card crashes as well due to firmware crash -> a restart of the wireless card device is requested -> the wireless card device successfully recovers -> unknown what kernel or firmware does after, but usually you don't get wifi back

For 1, the powersave solution on NetworkManager (or probably as a udev rule) should work because it will prevent the card from dying and disappearing; even updated firmware likely wouldn't solve it, since the device would be gone.
For 2, maybe the firmware is the only solution, as it could prevent the actual crash currently present and could also improve how the card is brought up again after the crash. That might explain why Éric had a better time after installing the new firmware updates.
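
The two cases can be told apart from the kernel log alone. A rough sketch, assuming the message strings quoted in this bug; the function name and the labels it prints are made up for illustration:

```shell
# Hypothetical helper: reads kernel log text on stdin and prints which of
# the two failure modes described above it looks like.
classify_ath10k_failure() {
    log=$(cat)
    case $log in
        # Case 1: the device disappeared, so the restart cannot succeed.
        *'device is gone'*|*'Could not init hif'*) echo device-gone ;;
        # Case 2: the firmware crashed and the driver reports a recovery.
        *'device successfully recovered'*) echo recovered ;;
        # A crash with neither of the above markers in the log.
        *'firmware crashed!'*) echo crashed-unrecovered ;;
        *) echo none ;;
    esac
}

# Usage:
#   journalctl -k -b | classify_ath10k_failure
```

The "device gone" check comes first because in case 1 the log can also contain "Hardware restart was requested" without the restart ever succeeding.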

Comment 28 Mateus Rodrigues Costa 2022-05-15 12:43:17 UTC
(In reply to Mateus Rodrigues Costa from comment #27)
> the card on a Dell laptop (probably also all the other Atheros cards
> affected by the WHEA-Logger id 17 error on Windows too) has power save by
> default?

Sorry, small typo: I meant "has power saving *disabled* by default".

Comment 29 Ben Cotton 2022-11-29 16:48:52 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 30 Ben Cotton 2022-12-13 15:15:17 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 31 Red Hat Bugzilla 2023-09-18 00:21:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

