Bug 523075
| Summary: | iwlwifi driver - warn_slowpath_fmt warning messages - 2.6.30.5-43.fc11.x86_64 | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | gene c <gjunk> | ||||||
| Component: | kernel | Assignee: | Stanislaw Gruszka <sgruszka> | ||||||
| Status: | CLOSED NEXTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | low | ||||||||
| Version: | 11 | CC: | itamar, kernel-maint, linville, madko, reinette.chatre | ||||||
| Target Milestone: | --- | Keywords: | Reopened | ||||||
| Target Release: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2010-06-21 06:26:04 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
gene c
2009-09-13 22:13:32 UTC
Here is the warning in the code:
/* If a Tx command is being handled and it isn't in the actual
* command queue then there a command routing bug has been introduced
* in the queue management code. */
if (WARN(txq_id != IWL_CMD_QUEUE_NUM,
"wrong command queue %d, sequence 0x%X readp=%d writep=%d\n",
txq_id, sequence,
priv->txq[IWL_CMD_QUEUE_NUM].q.read_ptr,
priv->txq[IWL_CMD_QUEUE_NUM].q.write_ptr)) {
iwl_print_hex_dump(priv, IWL_DL_INFO , rxb, 32);
return;
}
I ran some further tests - and it is indeed triggered by load - seems to be either direction too. I'm still running 2.6.30.5-43.fc11.x86_64. I'm trying to reproduce this on my computer, as far no luck. Gene c, what is your network configuration (abgn, encryption) ? How much data need to be transferred to trigger a warning? And just in case, do you use the newest firmware? If you think there is also any other information relevant to reproduce this bug, please provide it. Respinding to https://bugzilla.redhat.com/show_bug.cgi?id=523075#c3 Network is N Encryption is wpa2 personal firmware is iwl4965-firmware.noarch 228.61.2.24-1.fc11 How much data - not sure - I run rdiff backup to do backups. I do recall similar bug discussed on LKML (I think) and have vague recollections it may be fixed in 2.6.32. could this be related or no ? http://patchwork.kernel.org/patch/50172/ no sorry for noise - that trace looks quite different .. I can confirm this being fixed in 2.6.32.8 - at least after several days of testing I am unable to produce this with the newer kernel. Please reopen this - I have just got the same traceback with 2.6.32.8.58 (I did not see this with the 2.6.32.8-55 build tho I see no wifi changes between the 2) Course this happens the moment I closed the bug!!! Sigh ... I really thought it was fixed. The trace now is different however - included here: Feb 19 19:22:29 lap1 kernel: Pid: 2837, comm: chrome Not tainted 2.6.32.8-58.fc12.x86_64 #1 Feb 19 19:22:29 lap1 kernel: Call Trace: Feb 19 19:22:29 lap1 kernel: <IRQ> [<ffffffff81056348>] warn_slowpath_common+0x7c/0x94 Feb 19 19:22:29 lap1 kernel: [<ffffffff81056374>] warn_slowpath_null+0x14/0x16 Feb 19 19:22:29 lap1 kernel: [<ffffffffa0191ae2>] ___ieee80211_stop_tx_ba_session+0x7b/0x80 [mac80211] Feb 19 19:22:29 lap1 kernel: [<ffffffffa0191ccf>] sta_addba_resp_timer_expired+0x64/0x75 [mac80211] Feb 19 19:22:29 lap1 kernel: [<ffffffff810650b4>] run_timer_softirq+0x1c4/0x268 Feb 19 19:22:29 lap1 kernel: [<ffffffff81026b8e>] ? apic_write+0x16/0x18 Feb 19 19:22:29 lap1 kernel: [<ffffffff8105d964>] __do_softirq+0xe5/0x1a9 Feb 19 19:22:29 lap1 kernel: [<ffffffff810804ce>] ? tick_program_event+0x2a/0x2c Feb 19 19:22:29 lap1 kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30 Feb 19 19:22:29 lap1 kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86 Feb 19 19:22:29 lap1 kernel: [<ffffffff8105d7a2>] irq_exit+0x3b/0x7d Feb 19 19:22:29 lap1 kernel: [<ffffffff81459c1a>] smp_apic_timer_interrupt+0x86/0x94 Feb 19 19:22:29 lap1 kernel: [<ffffffff81012833>] apic_timer_interrupt+0x13/0x20 Feb 19 19:22:29 lap1 kernel: <EOI> Feb 19 19:22:29 lap1 kernel: ---[ end trace 4944414491987e6e ]--- Once this occurs the network in not useable - rmmod iwlagn ; modprobe iwlagn does bring te network back to life. Hi gene c Could you provide full log from system start (/var/log/messages or dmesg if it contains everything)? What is your AP model/firmware? Created attachment 395474 [details]
dmesg from fresh boot
As requested - this is the dmesg from a fresh boot.
AP is a Cisco-Linksys WRT160N ( Version 1 Hardware ) running the latest firmware which is
01/10/2008
Ver.1.02.2
Thanks
Oops sorry I was not clear, I meant full logs when this issue happens. I have access to WRT160 (not sure which version), I'm going to start use it. I know I asked you before, but want to do this again, any ideas how to reproduce that bug? Created attachment 395489 [details] messages log file including traceback 1) Log file attached. Please note that in this case, I rmmod' iwlagn and modprobe'd it to get it working again - you will see that in the log - The network was not useable right after the warn_slowpath_xxx stuff. 2) The traceback I'm getting now (with 2.6.32.8-58) is similar but not the same as the original I saw in 2.6.30. They both begin warn_slow_path_common(). 3) In the past - load (such as running a backup using rdiff-backup) would trigger this. 4) In this case reflected in this log (2.6.32.8) it was merely booting, logging in - nm connected - start up google-chrome and use it. Chrome had several tabs so may have caused a some network activity for sure - but less than a backup. 5) I had successfully run backups using 2.6.32.8-55 build and it did NOT trigger - as soon as I installed the -58 build the above problem happened (coincidence?) I'd suggest putting the wifi under headvy load - not sure if the encryption type plays a roll - i belive it is WPA2/personal AES (im not in front of the AP at the moment). Let me know how else I can help. These may be related maybe ? Googling warn_slowpath_common() brings up lots but many look different traces https://bugzilla.redhat.com/show_bug.cgi?id=556990 https://bugzilla.redhat.com/show_bug.cgi?id=502329 http://lkml.indiana.edu/hypermail/linux/kernel/0910.1/00250.html Here is a bit longer except from logs.
>Feb 19 19:22:28 lap1 kernel: iwlagn 0000:03:00.0: iwl_tx_agg_start on ra = 00:21:29:a3:3c:d0 tid = 0
>Feb 19 19:22:28 lap1 kernel: iwlagn 0000:03:00.0: Hardware error detected. Restarting.
>Feb 19 19:22:29 lap1 kernel: Registered led device: iwl-phy0::radio
>Feb 19 19:22:29 lap1 kernel: Registered led device: iwl-phy0::assoc
>Feb 19 19:22:29 lap1 kernel: Registered led device: iwl-phy0::RX
>Feb 19 19:22:29 lap1 kernel: Registered led device: iwl-phy0::TX
>Feb 19 19:22:29 lap1 kernel: iwlagn 0000:03:00.0: Stopping AGG while state not ON or starting
>Feb 19 19:22:29 lap1 kernel: iwlagn 0000:03:00.0: queue number out of range: 0, must be 7 to 14
>Feb 19 19:22:29 lap1 kernel: ------------[ cut here ]------------
>Feb 19 19:22:29 lap1 kernel: WARNING: at net/mac80211/agg-tx.c:150 ___ieee80211_stop_tx_ba_session+0x7b/0x80 [mac80211]()
>Feb 19 19:22:29 lap1 kernel: Hardware name: 6459CTO
>Feb 19 19:22:29 lap1 kernel: Modules linked in: fuse hidp rfcomm sco bridge stp llc bnep l2cap autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table iptable_mangle ipt_MASQUERADE iptable_nat nf_nat ip6t_RE
>JECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_seq snd_seq_device snd_pcm thinkpad_acpi ecb sdhci_pci sdhci mm
>c_core ricoh_mmc i2c_i801 snd_timer snd iwlagn btusb iwlcore bluetooth mac80211 e1000e cfg80211 rfkill joydev soundcore iTCO_wdt snd_page_alloc wmi iTCO_vendor_support sha256_generic cryptd aes_x86_64 aes_ge
>neric cbc dm_crypt dm_multipath firewire_ohci yenta_socket rsrc_nonstatic firewire_core crc_itu_t video output nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: microcode]
>Feb 19 19:22:29 lap1 kernel: Pid: 2837, comm: chrome Not tainted 2.6.32.8-58.fc12.x86_64 #1
>Feb 19 19:22:29 lap1 kernel: Call Trace:
>Feb 19 19:22:29 lap1 kernel: <IRQ> [<ffffffff81056348>] warn_slowpath_common+0x7c/0x94
>Feb 19 19:22:29 lap1 kernel: [<ffffffff81056374>] warn_slowpath_null+0x14/0x16
>Feb 19 19:22:29 lap1 kernel: [<ffffffffa0191ae2>] ___ieee80211_stop_tx_ba_session+0x7b/0x80 [mac80211]
>Feb 19 19:22:29 lap1 kernel: [<ffffffffa0191ccf>] sta_addba_resp_timer_expired+0x64/0x75 [mac80211]
>Feb 19 19:22:29 lap1 kernel: [<ffffffff810650b4>] run_timer_softirq+0x1c4/0x268
>Feb 19 19:22:29 lap1 kernel: [<ffffffff81026b8e>] ? apic_write+0x16/0x18
>Feb 19 19:22:29 lap1 kernel: [<ffffffff8105d964>] __do_softirq+0xe5/0x1a9
>Feb 19 19:22:29 lap1 kernel: [<ffffffff810804ce>] ? tick_program_event+0x2a/0x2c
>Feb 19 19:22:29 lap1 kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30
>Feb 19 19:22:29 lap1 kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86
>Feb 19 19:22:29 lap1 kernel: [<ffffffff8105d7a2>] irq_exit+0x3b/0x7d
>Feb 19 19:22:29 lap1 kernel: [<ffffffff81459c1a>] smp_apic_timer_interrupt+0x86/0x94
>Feb 19 19:22:29 lap1 kernel: [<ffffffff81012833>] apic_timer_interrupt+0x13/0x20
>Feb 19 19:22:29 lap1 kernel: <EOI>
>Feb 19 19:22:29 lap1 kernel: ---[ end trace 4944414491987e6e ]---
First we have, hardware error, then driver try to recover it, but firmware/driver is in some inconsistent state, and firmware give driver bad queue number.
Regarding this bug, I think we have to do better recovery (firmware reset) mechanism. I going to look at related code maybe I will have some propositions. Then I think we have to report this bug upstream in inelllinuxwireless.org bugzilla.
Thanks for your help - Is the 'hardware error' mean my hardware is bad - or this kind of thing happens ? Do i need to have the hardware replaced - which would be tricky unless windows logs an error to make lenovo happy enough to replace the parts. I'd be happy to test a new firmware too if that helps? gene There is something wrong with the hardware, but we should be able to recover that with firmware/driver. If for example windows driver cope well with the same hardware, linux should cope with it as well. I don't know if new testing firmware is available publicly. I thought you are using the newest f/w release, if no please update, http://intellinuxwireless.org/?n=Downloads (newest should be the same as in rpm). warn_slowpath_common() is part of every linux kernel warning, at least with some configuration options enabled, just ignore this part of calltrace. I seem unable to see this in windows - but perhaps I need to run windows longer. I am running the latest relesaed firmware - I found nothing newer than the fedora rpm. Thanks for your help. Hi Intel created a dozen of patches for driver recovery from fimware errors. I backported them to fedora. Could you please test below kernel and see if it helps in that issue: http://koji.fedoraproject.org/koji/taskinfo?taskID=2090837 I will try your build - what I have found fairly reliable is run a backup using rdiff-backup to a server over the wireless network. It usually lasts a while ... could take 15 mins up to an hour to die ... so make sure there's lots to backup. My disk is encrypted, so I suspect a backup from a non-encrypted disk would go a lot faster and possibly show the problem sooner - tho I do not know. g This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping i am no longer able to produce this on f13 kernel 2.6.33.5-124.fc13.x86_64 Ok gene c, thanks for info. |