Bug 1429135

Summary: delayed crash in iwlwifi
Product: [Fedora] Fedora
Reporter: N. R. Hayre <nrhayre>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 25
CC: cz172638, dimitris.on.linux, extras-orphan, gansalmon, ichavero, itamar, jforbes, jonathan, kernel-maint, labbott, linuxwifi, linville, madhu.chinakonda, mchehab, than
Keywords: Reopened
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-4.9.13-201.fc25 kernel-4.10.9-100.fc24 kernel-4.10.9-200.fc25
Last Closed: 2017-04-14 22:19:03 UTC
Type: Bug
Attachments:
    kernel messages from most recent boot, ending with iwlwifi crash. (flags: none)

Description N. R. Hayre 2017-03-04 22:17:20 UTC
Created attachment 1260001
kernel messages from most recent boot, ending with iwlwifi crash.

Description of problem:

The Wi-Fi module crashes after some amount of time, usually less than a day. It also seems to happen much faster with certain wireless routers.

First message after failure is consistently:

    Mar 04 13:52:49 lepus kernel: iwlwifi 0000:04:00.0: Error sending STATISTICS_CMD: time out after 2000ms.
    Mar 04 13:52:49 lepus kernel: iwlwifi 0000:04:00.0: Timeout exiting D0i3

(As found in attached log.)
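
For anyone who wants to check their own logs for the same failure, here is a minimal sketch that filters kernel log lines for the two messages quoted above. The function name `find_iwlwifi_timeouts` and the regular expressions are illustrative assumptions derived only from those two lines; other iwlwifi failures may be worded differently.

```python
import re
from typing import Iterable, List

# Patterns modelled on the two log lines quoted in this report (assumption:
# the PCI address and command name vary, the rest of the wording does not).
SIGNATURES = (
    re.compile(r"iwlwifi \S+: Error sending \w+: time out after \d+ms"),
    re.compile(r"iwlwifi \S+: Timeout exiting D0i3"),
)


def find_iwlwifi_timeouts(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match either failure signature."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(sig.search(line) for sig in SIGNATURES)
    ]


sample = [
    "Mar 04 13:52:49 lepus kernel: iwlwifi 0000:04:00.0: "
    "Error sending STATISTICS_CMD: time out after 2000ms.",
    "Mar 04 13:52:49 lepus kernel: iwlwifi 0000:04:00.0: Timeout exiting D0i3",
]
for hit in find_iwlwifi_timeouts(sample):
    print(hit)
```

The same function could be fed the lines of a saved kernel log (for example, the output of `journalctl -k -b` written to a file) instead of the embedded sample.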

Version-Release number of selected component (if applicable):

The iwlwifi builds appear to be synced with kernel releases. This has been a problem from at least kernel 4.9.8 through the current release (4.9.12).

How reproducible:  Normal boot, connect to any wifi, wait 0-24 hours.

The kernel messages are pretty consistent, and the kernel log starting with the most recent boot is attached.

Comment 1 Emmanuel Grumbach 2017-03-07 21:12:34 UTC
Please disable IWLWIFI_PCIE_RTPM

Comment 2 Laura Abbott 2017-03-07 21:18:10 UTC
Is that a general recommendation across the board or just for testing?

Comment 3 Emmanuel Grumbach 2017-03-07 21:26:02 UTC
In general.

It is disabled by default:

config IWLWIFI_PCIE_RTPM
       bool "Enable runtime power management mode for PCIe devices"
       depends on IWLMVM && IWLWIFI_PCIE && PM && EXPERT
       default false
       help
         Say Y here to enable runtime power management for PCIe
         devices.  If enabled, the device will go into low power mode
         when idle for a short period of time, allowing for improved
         power saving during runtime. Note that this feature requires
         a tight integration with the platform. It is not recommended
         to enable this feature without proper validation with the
         specific target platform.


You really shouldn't enable that in a broad distro that will run on unknown platforms.
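
To see whether a given kernel build has this option enabled, one can inspect its config file. The following is a minimal sketch, assuming the config is installed as `/boot/config-<kernel release>` (the usual Fedora layout); the helper names `kconfig_value` and `running_kernel_config_path` are hypothetical and not part of any existing tool.

```python
import platform
import re
from typing import Optional


def kconfig_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of CONFIG_<option> found in kernel config text.

    Returns "n" for the conventional '# CONFIG_X is not set' line and
    None when the option does not appear at all.
    """
    name = "CONFIG_" + option
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return "n"
        match = re.fullmatch(re.escape(name) + r"=(.*)", line)
        if match:
            return match.group(1)
    return None


def running_kernel_config_path() -> str:
    """Assumed location of the running kernel's config file."""
    return "/boot/config-" + platform.release()


example = "CONFIG_IWLWIFI=m\nCONFIG_IWLWIFI_PCIE_RTPM=y\n"
print(kconfig_value(example, "IWLWIFI_PCIE_RTPM"))
```

On a real system the text would come from reading the file at `running_kernel_config_path()`; a result of `y` means the option is compiled in.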

Comment 4 Emmanuel Grumbach 2017-03-07 21:28:47 UTC
I also saw the dreaded:

Mar 04 13:52:49 lepus kernel: WARNING: CPU: 3 PID: 231 at drivers/net/wireless/intel/iwlwifi/pcie/trans.c:1864 iwl_trans_pcie_grab_nic_access+0xeb/0xf0 [iwlwifi]
Mar 04 13:52:49 lepus kernel: Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)

This is really a PCIe problem that we can't do much about. But disabling RTPM is always a good thing to do.

Comment 5 Laura Abbott 2017-03-07 21:41:32 UTC
That option was off in Rawhide/f26 but was still on in F24/F25. I turned it off on those branches. It should show up in the next build (sometime this week, I expect).

Comment 6 N. R. Hayre 2017-03-07 22:57:21 UTC
Thank you all for the prompt attention to this.

Indeed, after posting, I noticed that the crashes do not seem to happen on AC adapter power. A power-saving tool was toggling RTPM based on the power source, as in:

    echo 'on' > '/sys/bus/pci/devices/0000:04:00.0/power/control';

I have since removed all such tools, and I'm watching and controlling this setting with PowerTOP.
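
For reference, here is a minimal sketch for reading the current value of that sysfs setting instead of writing it. It assumes the standard runtime PM attributes `power/control` (which holds `on` or `auto`) and `power/runtime_status` under the device directory; the function name `read_runtime_pm` and the `root` parameter are hypothetical, and the sketch says nothing about how this setting relates to the IWLWIFI_PCIE_RTPM build option.

```python
from pathlib import Path
from typing import Dict, Optional

SYSFS_PCI = Path("/sys/bus/pci/devices")


def read_runtime_pm(address: str, root: Path = SYSFS_PCI) -> Dict[str, Optional[str]]:
    """Read power/control and power/runtime_status for one PCI device.

    A value of None means the attribute is missing or unreadable.
    """
    power_dir = root / address / "power"
    result: Dict[str, Optional[str]] = {}
    for attr in ("control", "runtime_status"):
        try:
            result[attr] = (power_dir / attr).read_text().strip()
        except OSError:
            result[attr] = None
    return result


# 0000:04:00.0 is the address of the wireless adapter in this report.
print(read_runtime_pm("0000:04:00.0"))
```

A `control` value of `on` keeps the device powered at all times, while `auto` allows the kernel to suspend it at runtime when idle.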

Is changing this setting equivalent to changing the kernel config param that Emmanuel mentions?

This all stems from a quest for longer battery life.  Regarding the statement that "... disabling RTPM is always a good thing to do," does that apply to all devices on PCI?  Is there any power benefit to RTPM when it doesn't cause errors?

I am unfamiliar with the ticket protocol here, but I would be fine with closing this, given the provided solution and the absence of further errors on my end.

Comment 7 Fedora Update System 2017-03-08 15:31:10 UTC
kernel-4.9.13-201.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-387ff46a66

Comment 8 Fedora Update System 2017-03-08 15:32:50 UTC
kernel-4.9.13-101.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2017-2e1f3694b2

Comment 9 Fedora Update System 2017-03-09 14:25:49 UTC
kernel-4.9.13-101.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-2e1f3694b2

Comment 10 Fedora Update System 2017-03-09 14:57:53 UTC
kernel-4.9.13-201.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-387ff46a66

Comment 11 Fedora Update System 2017-03-11 11:50:58 UTC
kernel-4.9.13-101.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2017-03-11 12:21:05 UTC
kernel-4.9.13-201.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Than Ngo 2017-04-04 15:26:10 UTC
Reopening the bug because this option is still enabled in the latest kernel-4.10.8-200.fc25.x86_64.

Comment 14 Justin M. Forbes 2017-04-04 15:53:34 UTC
Sorry, looks like this got missed in the rebase. It should be fixed in the 4.10.9 update when it pushes.

Comment 15 Fedora Update System 2017-04-10 23:26:54 UTC
kernel-4.10.9-200.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2017-3a9ec92dd6

Comment 16 Fedora Update System 2017-04-10 23:28:56 UTC
kernel-4.10.9-100.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2017-502cf68d68

Comment 17 Fedora Update System 2017-04-11 18:54:27 UTC
kernel-4.10.9-100.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-502cf68d68

Comment 18 Fedora Update System 2017-04-11 19:24:36 UTC
kernel-4.10.9-200.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-3a9ec92dd6

Comment 19 Fedora Update System 2017-04-14 22:19:03 UTC
kernel-4.10.9-100.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 20 Fedora Update System 2017-04-14 22:49:43 UTC
kernel-4.10.9-200.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.