Bug 1897900 - Qualcomm QCA9377 firmware crash
Summary: Qualcomm QCA9377 firmware crash
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: linux-firmware
Version: 35
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: David Woodhouse
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-15 12:45 UTC by David
Modified: 2022-12-13 15:16 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-13 15:16:14 UTC
Type: Bug
Embargoed:


Attachments
dmesg output right after crash (150.63 KB, application/octet-stream)
2020-11-15 12:45 UTC, David
script for situation 2 (257 bytes, text/plain)
2020-11-15 12:46 UTC, David
dmesg output after boot (109.95 KB, application/octet-stream)
2020-11-17 10:24 UTC, David

Description David 2020-11-15 12:45:52 UTC
Created attachment 1729501 [details]
dmesg output right after crash

Description of problem:
My Dell's WiFi card often fails on resume from standby. The card is a Qualcomm 802.11ac chip, the QCA9377.
There are two main types of failure, one of which is recoverable by reinserting the WiFi module and rescanning the PCI bus. I've written a simple script to do so, which I'll attach after this post.

Version-Release number of selected component (if applicable):
The firmware is version 6. I've tried renaming it so that version 5 is used instead, but the issue never goes away completely; it only gets a little better. I tried this back on Fedora 30, gave up on Fedora 32 and have simply lived with the issue since.

How reproducible:
Roughly 1 failure in every 6-7 standby/resume cycles (currently on Fedora 33). Simply put the laptop into standby and resume it.

Steps to Reproduce:
1. Put the laptop (Dell Inspiron 15 5570) into standby
2. Resume from standby
3.

Actual results:
The WiFi connection is either:
1) resumed, with a noticeable delay. I call this the base case.
2) not resumed: it tries to connect indefinitely and periodically reports an authentication failure. This is the situation my script can solve.
3) lost to a complete firmware crash. The laptop hangs, everything stutters and I can no longer see any WiFi networks. The only working solution I've found is to reboot the PC. In the past it was so bad that it completely froze Fedora and I had to manually reset the laptop. That's when I decided to revert to firmware 5.

Expected results:
Other QCA chipsets all perform as expected, resuming connection and never crashing.

Additional info:
The WiFi card itself is perfectly functional hardware-wise and works fine under Windows.

Comment 1 David 2020-11-15 12:46:55 UTC
Created attachment 1729502 [details]
script for situation 2
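
The attached script is not reproduced on this page. As a rough sketch only, the recovery described in the report (reinserting the WiFi module and rescanning the PCI bus) might look like the following in Python. The module name ath10k_pci and the PCI address 0000:03:00.0 come from the kernel log and lspci output elsewhere in this report; the function names, the order of the steps and the dry_run flag are assumptions made for illustration, not the reporter's actual script.

```python
import subprocess
from pathlib import Path

MODULE = "ath10k_pci"        # driver name as it appears in the kernel log
DEVICE = "0000:03:00.0"      # PCI address as reported by lspci in this bug
SYSFS_PCI = Path("/sys/bus/pci")


def recovery_steps(module=MODULE, device=DEVICE):
    """Return the recovery actions, in order, as (kind, payload) tuples."""
    return [
        # Unload the driver so it releases the device.
        ("run", ["modprobe", "-r", module]),
        # Ask the kernel to drop the PCI device ...
        ("write", (SYSFS_PCI / "devices" / device / "remove", "1")),
        # ... and rediscover it with a bus rescan.
        ("write", (SYSFS_PCI / "rescan", "1")),
        # Load the driver again (harmless if udev already did so).
        ("run", ["modprobe", module]),
    ]


def recover(dry_run=True):
    """Print the steps, or execute them when dry_run is False (needs root)."""
    for kind, payload in recovery_steps():
        if dry_run:
            print(kind, payload)
        elif kind == "run":
            subprocess.run(payload, check=True)
        else:
            path, value = payload
            path.write_text(value)


if __name__ == "__main__":
    recover(dry_run=True)
```

Run as written, this only prints the four steps. Calling recover(dry_run=False) as root would actually unload the driver and rescan the bus, so it is best tried first on a machine where losing the connection does not matter.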

Comment 2 Peter Robinson 2020-11-15 15:11:24 UTC
What version of kernel are you running (rpm -q kernel) and linux-firmware (rpm -q linux-firmware) and provide the exact revision of the HW (lspci | grep -i wireless)

Comment 3 David 2020-11-15 16:17:32 UTC
(In reply to Peter Robinson from comment #2)
> What version of kernel are you running (rpm -q kernel) and linux-firmware
> (rpm -q linux-firmware) and provide the exact revision of the HW (lspci |
> grep -i wireless)

[dave997@localhost ~]$ rpm -q kernel
kernel-5.8.18-200.fc32.x86_64
kernel-5.8.16-200.fc32.x86_64
kernel-5.8.17-200.fc32.x86_64
kernel-5.8.18-300.fc33.x86_64

[dave997@localhost ~]$ rpm -q linux-firmware
linux-firmware-20201022-113.fc33.noarch

[dave997@localhost ~]$ lspci | grep -i wireless
03:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter (rev 31)

Here you are.

Comment 4 Peter Robinson 2020-11-15 17:05:59 UTC
Is this a regression or has it always been like this? The last time this firmware was updated by the vendor was Oct 2018.

Comment 5 David 2020-11-15 18:38:20 UTC
(In reply to Peter Robinson from comment #4)
> Is this a regression or has it always been like this? The last time this
> firmware was updated by the vendor was Oct 2018.

I first noticed this issue in mid 2018. It may well have been October and I just don't remember exactly. That would make sense given what you said and the fact that revision 5 was far more stable.
What I can tell you for sure is that I've had no peace since then. Some kernel versions mitigate the problem, others make it worse, but none has got rid of it entirely so far (5.8.18 has made it worse). I really hope this ends soon, since having to force power off my PC is not good at all.

If you think I should inform Qualcomm about this I will, but frankly I rely on your work far more than on theirs (I also think I already got in touch with them a while ago, with no success). I know the "sorry, we don't support Linux" policy and I don't like it: I've paid money for my hardware and I don't like people politely obliging me to use Windows because they need me to.

I'm at your disposal, Peter: tests, reports, whatever.

Comment 6 David 2020-11-17 10:24:10 UTC
Created attachment 1730085 [details]
dmesg output after boot

This morning, right after kernel update (5.9.8) I was greeted with no wireless card. Had to reboot, Dmesg for you.

Comment 7 Peter Robinson 2020-11-17 10:55:26 UTC
> This morning, right after kernel update (5.9.8) I was greeted with no
> wireless card. Had to reboot, Dmesg for you.

That sounds more like a kernel driver regression in the ath10k driver.

Comment 8 David 2020-11-17 11:53:31 UTC
(In reply to Peter Robinson from comment #7)
> > This morning, right after kernel update (5.9.8) I was greeted with no
> > wireless card. Had to reboot, Dmesg for you.
> 
> That sounds more like a kernel driver regression in the ath10k driver.

Yes, that may well be.

Comment 9 David 2021-02-01 11:59:42 UTC
Hi guys, I'm sorry to bother you again.
The issue is far from gone: the firmware (or whatever it is) has now started crashing while in use. As you can imagine, this can have serious consequences.
Below is a stack trace from the last crash. The system is up to date.

feb 01 12:49:20 localhost.localdomain kernel: cfg80211: Loading compiled-in X.509 certificates for regulatory database
feb 01 12:49:20 localhost.localdomain kernel: cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for write32 of 0x00000000 at 0x00034400: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for write32 of 0x00000000 at 0x00034404: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x00034410: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for write32 of 0xffff0000 at 0x00034410: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003444c: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for write32 of 0xffff0000 at 0x0003444c: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for write32 of 0x00000000 at 0x00034408: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for write32 of 0x00000000 at 0x0003440c: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x00034450: -110
feb 01 12:49:20 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for write32 of 0xffff0000 at 0x00034450: -110
feb 01 12:49:24 localhost.localdomain kernel: ------------[ cut here ]------------
feb 01 12:49:24 localhost.localdomain kernel: WARNING: CPU: 0 PID: 34912 at drivers/pci/msi.c:1075 __pci_enable_msi_range+0x489/0x4d0
feb 01 12:49:24 localhost.localdomain kernel: Modules linked in: ath10k_pci(+) ath10k_core mac80211 ath cfg80211 libarc4 uinput nfnetlink_queue nfnetlink_log ib_core snd_seq_dummy rfcomm ccm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_hdmi snd_soc_skl snd_hda_codec_realtek snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_codec_generic snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation snd_soc_core x86_pkg_temp_thermal snd_compress intel_powerclamp snd_pcm_dmaengine soundwire_cadence coretemp snd_hda_codec kvm_intel snd_hda_core ac97_bus kvm snd_hwdep ee1004 snd_seq iTCO_wdt intel_pmc_bxt
feb 01 12:49:24 localhost.localdomain kernel:  dell_laptop ledtrig_audio iTCO_vendor_support intel_rapl_msr mei_hdcp pktcdvd snd_seq_device irqbypass dell_smm_hwmon snd_pcm rapl uvcvideo intel_cstate btusb snd_timer btrtl intel_uncore btbcm dell_wmi btintel snd videobuf2_vmalloc pcspkr rtsx_usb_ms dell_smbios videobuf2_memops dcdbas bluetooth memstick videobuf2_v4l2 wmi_bmof dell_wmi_descriptor videobuf2_common i2c_i801 soundcore i2c_smbus videodev mei_me ecdh_generic rfkill mc joydev ecc mei processor_thermal_device ucsi_acpi idma64 intel_xhci_usb_role_switch typec_ucsi intel_rapl_common intel_pch_thermal roles intel_soc_dts_iosf typec int3403_thermal int3402_thermal acpi_pad int340x_thermal_zone intel_hid int3400_thermal acpi_thermal_rel sparse_keymap zram ip_tables amdgpu i915 hid_multitouch iommu_v2 rtsx_usb_sdmmc mmc_core gpu_sched ttm i2c_algo_bit crct10dif_pclmul crc32_pclmul crc32c_intel r8169 drm_kms_helper ghash_clmulni_intel cec serio_raw drm rtsx_usb i2c_hid video pinctrl_sunrisepoint wmi fuse
feb 01 12:49:24 localhost.localdomain kernel:  [last unloaded: cfg80211]
feb 01 12:49:24 localhost.localdomain kernel: CPU: 0 PID: 34912 Comm: modprobe Tainted: G        W         5.10.10-200.fc33.x86_64 #1
feb 01 12:49:24 localhost.localdomain kernel: Hardware name: Dell Inc. Inspiron 5570/0YDF7T, BIOS 1.2.3 05/15/2019
feb 01 12:49:24 localhost.localdomain kernel: RIP: 0010:__pci_enable_msi_range+0x489/0x4d0
feb 01 12:49:24 localhost.localdomain kernel: Code: 0f b6 f6 48 89 ef e8 f6 65 fd ff e9 b9 fd ff ff 31 f6 48 89 ef e8 07 da fd ff e9 ce fe ff ff 41 bc ea ff ff ff e9 e7 fb ff ff <0f> 0b 41 bc ea ff ff ff e9 da fb ff ff 45 89 cc e9 d2 fb ff ff 48
feb 01 12:49:24 localhost.localdomain kernel: RSP: 0018:ffffb70540e73bb0 EFLAGS: 00010202
feb 01 12:49:24 localhost.localdomain kernel: RAX: 0000000000000010 RBX: 0000000000000001 RCX: 0000000000000000
feb 01 12:49:24 localhost.localdomain kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff96d8c17d6000
feb 01 12:49:24 localhost.localdomain kernel: RBP: ffff96d8c17d6000 R08: 0000000000000001 R09: ffff96d8c5e084b8
feb 01 12:49:24 localhost.localdomain kernel: R10: ffffb70540e738d0 R11: ffffffffbcb44748 R12: 0000000000000008
feb 01 12:49:24 localhost.localdomain kernel: R13: ffff96d8c17d6000 R14: 0000000000000000 R15: ffff96d8c17d6000
feb 01 12:49:24 localhost.localdomain kernel: FS:  00007fc739e2f740(0000) GS:ffff96dc2f400000(0000) knlGS:0000000000000000
feb 01 12:49:24 localhost.localdomain kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
feb 01 12:49:24 localhost.localdomain kernel: CR2: 00007f608393cf53 CR3: 000000012bbd6002 CR4: 00000000003706f0
feb 01 12:49:24 localhost.localdomain kernel: Call Trace:
feb 01 12:49:24 localhost.localdomain kernel:  ? ath10k_pci_enable_legacy_irq+0x60/0x60 [ath10k_pci]
feb 01 12:49:24 localhost.localdomain kernel:  pci_enable_msi+0x16/0x30
feb 01 12:49:24 localhost.localdomain kernel:  ath10k_pci_probe+0x2f1/0x865 [ath10k_pci]
feb 01 12:49:24 localhost.localdomain kernel:  local_pci_probe+0x42/0x80
feb 01 12:49:24 localhost.localdomain kernel:  ? _cond_resched+0x16/0x40
feb 01 12:49:24 localhost.localdomain kernel:  pci_device_probe+0xd9/0x190
feb 01 12:49:24 localhost.localdomain kernel:  really_probe+0x205/0x460
feb 01 12:49:24 localhost.localdomain kernel:  driver_probe_device+0xe1/0x150
feb 01 12:49:24 localhost.localdomain kernel:  device_driver_attach+0xa1/0xb0
feb 01 12:49:24 localhost.localdomain kernel:  __driver_attach+0x8a/0x150
feb 01 12:49:24 localhost.localdomain kernel:  ? device_driver_attach+0xb0/0xb0
feb 01 12:49:24 localhost.localdomain kernel:  ? device_driver_attach+0xb0/0xb0
feb 01 12:49:24 localhost.localdomain kernel:  bus_for_each_dev+0x64/0x90
feb 01 12:49:24 localhost.localdomain kernel:  bus_add_driver+0x12b/0x1e0
feb 01 12:49:24 localhost.localdomain kernel:  driver_register+0x8b/0xe0
feb 01 12:49:24 localhost.localdomain kernel:  ? 0xffffffffc047b000
feb 01 12:49:24 localhost.localdomain kernel:  ath10k_pci_init+0x1f/0x1000 [ath10k_pci]
feb 01 12:49:24 localhost.localdomain kernel:  do_one_initcall+0x44/0x1d0
feb 01 12:49:24 localhost.localdomain kernel:  ? do_init_module+0x23/0x260
feb 01 12:49:24 localhost.localdomain kernel:  ? kmem_cache_alloc_trace+0xef/0x1e0
feb 01 12:49:24 localhost.localdomain kernel:  do_init_module+0x5c/0x260
feb 01 12:49:24 localhost.localdomain kernel:  __do_sys_init_module+0x12a/0x190
feb 01 12:49:24 localhost.localdomain kernel:  do_syscall_64+0x33/0x40
feb 01 12:49:24 localhost.localdomain kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
feb 01 12:49:24 localhost.localdomain kernel: RIP: 0033:0x7fc739f5f4be
feb 01 12:49:24 localhost.localdomain kernel: Code: 48 8b 0d bd 19 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8a 19 0c 00 f7 d8 64 89 01 48
feb 01 12:49:24 localhost.localdomain kernel: RSP: 002b:00007ffcfe0b12f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
feb 01 12:49:24 localhost.localdomain kernel: RAX: ffffffffffffffda RBX: 000055f6bd1dac80 RCX: 00007fc739f5f4be
feb 01 12:49:24 localhost.localdomain kernel: RDX: 000055f6bc57c6a6 RSI: 000000000001601e RDI: 000055f6bde66ff0
feb 01 12:49:24 localhost.localdomain kernel: RBP: 000055f6bde66ff0 R08: 000055f6bde66ff0 R09: 00007ffcfe0ad1be
feb 01 12:49:24 localhost.localdomain kernel: R10: 000055f6bd1da010 R11: 0000000000000246 R12: 000055f6bc57c6a6
feb 01 12:49:24 localhost.localdomain kernel: R13: 000055f6bd1dac10 R14: 000055f6bd1dac80 R15: 000055f6bd1de040
feb 01 12:49:24 localhost.localdomain kernel: ---[ end trace 5561cb8140a0f32a ]---
feb 01 12:49:24 localhost.localdomain kernel: amdgpu: VI should always have 2 performance levels
feb 01 12:49:24 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: pci irq legacy oper_irq_mode 1 irq_mode 0 reset_mode 0
feb 01 12:49:25 localhost.localdomain kernel: ath10k_warn: 147 callbacks suppressed
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:25 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x0003a028: -110
feb 01 12:49:27 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to read device register, device is gone
feb 01 12:49:27 localhost.localdomain kernel: ath10k_pci 0000:03:00.0: failed to reset chip: -5
feb 01 12:49:30 localhost.localdomain kernel: ath10k_pci: probe of 0000:03:00.0 failed with error -5

I'm desperate.
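
For readers of the log above: the trailing numbers are negative Linux errno values, where -110 is ETIMEDOUT and -5 is EIO. The following is a small sketch of how the ath10k_pci lines could be tallied by error code. The helper name summarize, the regular expression and the two-entry lookup table are illustrative assumptions and not part of any kernel or Fedora tooling.

```python
import re
from collections import Counter

# Linux errno values; only the two codes that appear in this log are listed.
LINUX_ERRNO = {
    5: "EIO (Input/output error)",
    110: "ETIMEDOUT (Connection timed out)",
}

# Matches an ath10k_pci line ending in ": -<code>" or "error -<code>".
ERR_RE = re.compile(r"ath10k_pci.*?(?::| error) -(\d+)\s*$")


def summarize(lines):
    """Count ath10k_pci log lines per decoded error code."""
    counts = Counter()
    for line in lines:
        match = ERR_RE.search(line)
        if match:
            code = int(match.group(1))
            counts[LINUX_ERRNO.get(code, f"errno {code}")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "kernel: ath10k_pci 0000:03:00.0: failed to wake target for read32 at 0x00034410: -110",
        "kernel: ath10k_pci 0000:03:00.0: failed to reset chip: -5",
        "kernel: ath10k_pci: probe of 0000:03:00.0 failed with error -5",
    ]
    for name, count in summarize(sample).items():
        print(f"{count:3d}  {name}")
```

Read this way, the log shows the driver repeatedly timing out while trying to wake the chip, then giving up with an I/O error once the device stops responding on the bus.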

Comment 10 Ben Cotton 2021-11-04 13:57:35 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Ben Cotton 2022-11-29 16:50:14 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 14 Ben Cotton 2022-12-13 15:16:14 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

