Description of problem: After updating to 2.6.30.5 kernel I am getting the warning messages below from the iwl driver. It seems to be load related - ie when i do backups - tho I need more testing to verify this. Tainting is due to rpmfusion nvidia driver. Version-Release number of selected component (if applicable): kernel - 2.6.30.5-43.fc11.x86_64 wifi hardware is 4965 AGN How reproducible: Seems to be load related. Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: Log Inlcuded Sep 11 06:46:18 f11 kernel: ------------[ cut here ]------------ Sep 11 06:46:18 f11 kernel: WARNING: at drivers/net/wireless/iwlwifi/iwl-tx.c:1127 iwl_tx_cmd_complete+0x72/0x2e2 [iwlcore]() (Tainted: P ) Sep 11 06:46:18 f11 kernel: Hardware name: 6459CTO Sep 11 06:46:18 f11 kernel: wrong command queue 0, sequence 0x0 readp=182 writep=184 Sep 11 06:46:18 f11 kernel: Modules linked in: fuse hidp ipt_MASQUERADE rfcomm bridge stp llc bnep sco l2cap sunrpc autofs4 iptable_nat nf_nat iptable_mangle ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm_intel kvm uinput snd_hda_codec_analog nvidia(P) arc4 ecb iwlagn iwlcore snd_hda_intel snd_hda_codec thinkpad_acpi firewire_ohci video rfkill firewire_core snd_hwdep hwmon joydev crc_itu_t pcspkr btusb snd_pcm output sdhci_pci snd_timer yenta_socket sdhci lib80211 ricoh_mmc mac80211 e1000e snd cfg80211 mmc_core bluetooth i2c_i801 wmi iTCO_wdt soundcore iTCO_vendor_support rsrc_nonstatic snd_page_alloc i2c_core sha256_generic cbc cryptd aes_x86_64 aes_generic dm_crypt [last unloaded: microcode] Sep 11 06:46:18 f11 kernel: Pid: 0, comm: swapper Tainted: P 2.6.30.5-43.fc11.x86_64 #1 Sep 11 06:46:18 f11 kernel: Call Trace: Sep 11 06:46:18 f11 kernel: <IRQ> [<ffffffff81057104>] warn_slowpath_common+0x95/0xc3 Sep 11 06:46:18 f11 kernel: [<ffffffff810571bf>] warn_slowpath_fmt+0x50/0x66 Sep 11 06:46:18 f11 kernel: [<ffffffffa01f3087>] iwl_tx_cmd_complete+0x72/0x2e2 [iwlcore] Sep 11 06:46:18 f11 kernel: [<ffffffff81235a97>] ? swiotlb_unmap_page+0x2a/0x7d Sep 11 06:46:18 f11 kernel: [<ffffffffa0218c57>] iwl_rx_handle+0x2c8/0x3b3 [iwlagn] Sep 11 06:46:18 f11 kernel: [<ffffffff8106c502>] ? queue_work_on+0x59/0x74 Sep 11 06:46:18 f11 kernel: [<ffffffff8106c68d>] ? queue_work+0x32/0x48 Sep 11 06:46:18 f11 kernel: [<ffffffffa021ad75>] iwl_irq_tasklet+0x4c8/0x6ff [iwlagn] Sep 11 06:46:18 f11 kernel: [<ffffffffa01ec527>] ? iwl_isr+0x1e3/0x22c [iwlcore] Sep 11 06:46:18 f11 kernel: [<ffffffff810748e2>] ? ktime_get_ts+0x61/0x7d Sep 11 06:46:18 f11 kernel: [<ffffffff8105d12b>] tasklet_action+0x9f/0x11f Sep 11 06:46:18 f11 kernel: [<ffffffff8105db54>] __do_softirq+0xd2/0x1d2 Sep 11 06:46:18 f11 kernel: [<ffffffff8101323c>] call_softirq+0x1c/0x30 Sep 11 06:46:18 f11 kernel: [<ffffffff81014ba3>] do_softirq+0x5f/0xd7 Sep 11 06:46:18 f11 kernel: [<ffffffff8105d633>] irq_exit+0x66/0xb7 Sep 11 06:46:18 f11 kernel: [<ffffffff814986d8>] ? trace_hardirqs_off_thunk+0x3a/0x6c Sep 11 06:46:18 f11 kernel: [<ffffffff810143f8>] do_IRQ+0xc1/0xee Sep 11 06:46:18 f11 kernel: [<ffffffff81012a93>] ret_from_intr+0x0/0x16 Sep 11 06:46:18 f11 kernel: <EOI> [<ffffffff812a5dc3>] ? acpi_idle_enter_bm+0x26a/0x2b2 Sep 11 06:46:18 f11 kernel: [<ffffffff812a5dbc>] ? acpi_idle_enter_bm+0x263/0x2b2 Sep 11 06:46:18 f11 kernel: [<ffffffff813b34ef>] ? cpuidle_idle_call+0xa0/0xed Sep 11 06:46:18 f11 kernel: [<ffffffff81010f96>] ? cpu_idle+0xbf/0x10a Sep 11 06:46:18 f11 kernel: [<ffffffff814809da>] ? rest_init+0x7e/0x94 Sep 11 06:46:18 f11 kernel: [<ffffffff81758034>] ? start_kernel+0x412/0x433 Sep 11 06:46:18 f11 kernel: [<ffffffff817572d0>] ? x86_64_start_reservations+0xbb/0xd6 Sep 11 06:46:18 f11 kernel: [<ffffffff817573f0>] ? x86_64_start_kernel+0x105/0x128 Sep 11 06:46:18 f11 kernel: ---[ end trace 54777fc7f280841f ]---
Here is the warning in the code: /* If a Tx command is being handled and it isn't in the actual * command queue then there a command routing bug has been introduced * in the queue management code. */ if (WARN(txq_id != IWL_CMD_QUEUE_NUM, "wrong command queue %d, sequence 0x%X readp=%d writep=%d\n", txq_id, sequence, priv->txq[IWL_CMD_QUEUE_NUM].q.read_ptr, priv->txq[IWL_CMD_QUEUE_NUM].q.write_ptr)) { iwl_print_hex_dump(priv, IWL_DL_INFO , rxb, 32); return; }
I ran some further tests - and it is indeed triggered by load - seems to be either direction too. I'm still running 2.6.30.5-43.fc11.x86_64.
I'm trying to reproduce this on my computer, as far no luck. Gene c, what is your network configuration (abgn, encryption) ? How much data need to be transferred to trigger a warning? And just in case, do you use the newest firmware? If you think there is also any other information relevant to reproduce this bug, please provide it.
Respinding to https://bugzilla.redhat.com/show_bug.cgi?id=523075#c3 Network is N Encryption is wpa2 personal firmware is iwl4965-firmware.noarch 228.61.2.24-1.fc11 How much data - not sure - I run rdiff backup to do backups. I do recall similar bug discussed on LKML (I think) and have vague recollections it may be fixed in 2.6.32.
could this be related or no ? http://patchwork.kernel.org/patch/50172/
no sorry for noise - that trace looks quite different ..
I can confirm this being fixed in 2.6.32.8 - at least after several days of testing I am unable to produce this with the newer kernel.
Please reopen this - I have just got the same traceback with 2.6.32.8.58 (I did not see this with the 2.6.32.8-55 build tho I see no wifi changes between the 2) Course this happens the moment I closed the bug!!! Sigh ... I really thought it was fixed. The trace now is different however - included here: Feb 19 19:22:29 lap1 kernel: Pid: 2837, comm: chrome Not tainted 2.6.32.8-58.fc12.x86_64 #1 Feb 19 19:22:29 lap1 kernel: Call Trace: Feb 19 19:22:29 lap1 kernel: <IRQ> [<ffffffff81056348>] warn_slowpath_common+0x7c/0x94 Feb 19 19:22:29 lap1 kernel: [<ffffffff81056374>] warn_slowpath_null+0x14/0x16 Feb 19 19:22:29 lap1 kernel: [<ffffffffa0191ae2>] ___ieee80211_stop_tx_ba_session+0x7b/0x80 [mac80211] Feb 19 19:22:29 lap1 kernel: [<ffffffffa0191ccf>] sta_addba_resp_timer_expired+0x64/0x75 [mac80211] Feb 19 19:22:29 lap1 kernel: [<ffffffff810650b4>] run_timer_softirq+0x1c4/0x268 Feb 19 19:22:29 lap1 kernel: [<ffffffff81026b8e>] ? apic_write+0x16/0x18 Feb 19 19:22:29 lap1 kernel: [<ffffffff8105d964>] __do_softirq+0xe5/0x1a9 Feb 19 19:22:29 lap1 kernel: [<ffffffff810804ce>] ? tick_program_event+0x2a/0x2c Feb 19 19:22:29 lap1 kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30 Feb 19 19:22:29 lap1 kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86 Feb 19 19:22:29 lap1 kernel: [<ffffffff8105d7a2>] irq_exit+0x3b/0x7d Feb 19 19:22:29 lap1 kernel: [<ffffffff81459c1a>] smp_apic_timer_interrupt+0x86/0x94 Feb 19 19:22:29 lap1 kernel: [<ffffffff81012833>] apic_timer_interrupt+0x13/0x20 Feb 19 19:22:29 lap1 kernel: <EOI> Feb 19 19:22:29 lap1 kernel: ---[ end trace 4944414491987e6e ]---
Once this occurs the network in not useable - rmmod iwlagn ; modprobe iwlagn does bring te network back to life.
Hi gene c Could you provide full log from system start (/var/log/messages or dmesg if it contains everything)? What is your AP model/firmware?
Created attachment 395474 [details] dmesg from fresh boot As requested - this is the dmesg from a fresh boot. AP is a Cisco-Linksys WRT160N ( Version 1 Hardware ) running the latest firmware which is 01/10/2008 Ver.1.02.2 Thanks
Oops sorry I was not clear, I meant full logs when this issue happens. I have access to WRT160 (not sure which version), I'm going to start use it. I know I asked you before, but want to do this again, any ideas how to reproduce that bug?
Created attachment 395489 [details] messages log file including traceback 1) Log file attached. Please note that in this case, I rmmod' iwlagn and modprobe'd it to get it working again - you will see that in the log - The network was not useable right after the warn_slowpath_xxx stuff. 2) The traceback I'm getting now (with 2.6.32.8-58) is similar but not the same as the original I saw in 2.6.30. They both begin warn_slow_path_common(). 3) In the past - load (such as running a backup using rdiff-backup) would trigger this. 4) In this case reflected in this log (2.6.32.8) it was merely booting, logging in - nm connected - start up google-chrome and use it. Chrome had several tabs so may have caused a some network activity for sure - but less than a backup. 5) I had successfully run backups using 2.6.32.8-55 build and it did NOT trigger - as soon as I installed the -58 build the above problem happened (coincidence?) I'd suggest putting the wifi under headvy load - not sure if the encryption type plays a roll - i belive it is WPA2/personal AES (im not in front of the AP at the moment). Let me know how else I can help. These may be related maybe ? Googling warn_slowpath_common() brings up lots but many look different traces https://bugzilla.redhat.com/show_bug.cgi?id=556990 https://bugzilla.redhat.com/show_bug.cgi?id=502329 http://lkml.indiana.edu/hypermail/linux/kernel/0910.1/00250.html
Here is a bit longer except from logs. >Feb 19 19:22:28 lap1 kernel: iwlagn 0000:03:00.0: iwl_tx_agg_start on ra = 00:21:29:a3:3c:d0 tid = 0 >Feb 19 19:22:28 lap1 kernel: iwlagn 0000:03:00.0: Hardware error detected. Restarting. >Feb 19 19:22:29 lap1 kernel: Registered led device: iwl-phy0::radio >Feb 19 19:22:29 lap1 kernel: Registered led device: iwl-phy0::assoc >Feb 19 19:22:29 lap1 kernel: Registered led device: iwl-phy0::RX >Feb 19 19:22:29 lap1 kernel: Registered led device: iwl-phy0::TX >Feb 19 19:22:29 lap1 kernel: iwlagn 0000:03:00.0: Stopping AGG while state not ON or starting >Feb 19 19:22:29 lap1 kernel: iwlagn 0000:03:00.0: queue number out of range: 0, must be 7 to 14 >Feb 19 19:22:29 lap1 kernel: ------------[ cut here ]------------ >Feb 19 19:22:29 lap1 kernel: WARNING: at net/mac80211/agg-tx.c:150 ___ieee80211_stop_tx_ba_session+0x7b/0x80 [mac80211]() >Feb 19 19:22:29 lap1 kernel: Hardware name: 6459CTO >Feb 19 19:22:29 lap1 kernel: Modules linked in: fuse hidp rfcomm sco bridge stp llc bnep l2cap autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table iptable_mangle ipt_MASQUERADE iptable_nat nf_nat ip6t_RE >JECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 kvm_intel kvm uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_seq snd_seq_device snd_pcm thinkpad_acpi ecb sdhci_pci sdhci mm >c_core ricoh_mmc i2c_i801 snd_timer snd iwlagn btusb iwlcore bluetooth mac80211 e1000e cfg80211 rfkill joydev soundcore iTCO_wdt snd_page_alloc wmi iTCO_vendor_support sha256_generic cryptd aes_x86_64 aes_ge >neric cbc dm_crypt dm_multipath firewire_ohci yenta_socket rsrc_nonstatic firewire_core crc_itu_t video output nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: microcode] >Feb 19 19:22:29 lap1 kernel: Pid: 2837, comm: chrome Not tainted 2.6.32.8-58.fc12.x86_64 #1 >Feb 19 19:22:29 lap1 kernel: Call Trace: >Feb 19 19:22:29 lap1 kernel: <IRQ> [<ffffffff81056348>] warn_slowpath_common+0x7c/0x94 >Feb 19 19:22:29 lap1 kernel: [<ffffffff81056374>] warn_slowpath_null+0x14/0x16 >Feb 19 19:22:29 lap1 kernel: [<ffffffffa0191ae2>] ___ieee80211_stop_tx_ba_session+0x7b/0x80 [mac80211] >Feb 19 19:22:29 lap1 kernel: [<ffffffffa0191ccf>] sta_addba_resp_timer_expired+0x64/0x75 [mac80211] >Feb 19 19:22:29 lap1 kernel: [<ffffffff810650b4>] run_timer_softirq+0x1c4/0x268 >Feb 19 19:22:29 lap1 kernel: [<ffffffff81026b8e>] ? apic_write+0x16/0x18 >Feb 19 19:22:29 lap1 kernel: [<ffffffff8105d964>] __do_softirq+0xe5/0x1a9 >Feb 19 19:22:29 lap1 kernel: [<ffffffff810804ce>] ? tick_program_event+0x2a/0x2c >Feb 19 19:22:29 lap1 kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30 >Feb 19 19:22:29 lap1 kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86 >Feb 19 19:22:29 lap1 kernel: [<ffffffff8105d7a2>] irq_exit+0x3b/0x7d >Feb 19 19:22:29 lap1 kernel: [<ffffffff81459c1a>] smp_apic_timer_interrupt+0x86/0x94 >Feb 19 19:22:29 lap1 kernel: [<ffffffff81012833>] apic_timer_interrupt+0x13/0x20 >Feb 19 19:22:29 lap1 kernel: <EOI> >Feb 19 19:22:29 lap1 kernel: ---[ end trace 4944414491987e6e ]--- First we have, hardware error, then driver try to recover it, but firmware/driver is in some inconsistent state, and firmware give driver bad queue number. Regarding this bug, I think we have to do better recovery (firmware reset) mechanism. I going to look at related code maybe I will have some propositions. Then I think we have to report this bug upstream in inelllinuxwireless.org bugzilla.
Thanks for your help - Is the 'hardware error' mean my hardware is bad - or this kind of thing happens ? Do i need to have the hardware replaced - which would be tricky unless windows logs an error to make lenovo happy enough to replace the parts. I'd be happy to test a new firmware too if that helps? gene
There is something wrong with the hardware, but we should be able to recover that with firmware/driver. If for example windows driver cope well with the same hardware, linux should cope with it as well. I don't know if new testing firmware is available publicly. I thought you are using the newest f/w release, if no please update, http://intellinuxwireless.org/?n=Downloads (newest should be the same as in rpm). warn_slowpath_common() is part of every linux kernel warning, at least with some configuration options enabled, just ignore this part of calltrace.
I seem unable to see this in windows - but perhaps I need to run windows longer. I am running the latest relesaed firmware - I found nothing newer than the fedora rpm. Thanks for your help.
Hi Intel created a dozen of patches for driver recovery from fimware errors. I backported them to fedora. Could you please test below kernel and see if it helps in that issue: http://koji.fedoraproject.org/koji/taskinfo?taskID=2090837
I will try your build - what I have found fairly reliable is run a backup using rdiff-backup to a server over the wireless network. It usually lasts a while ... could take 15 mins up to an hour to die ... so make sure there's lots to backup. My disk is encrypted, so I suspect a backup from a non-encrypted disk would go a lot faster and possibly show the problem sooner - tho I do not know. g
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
i am no longer able to produce this on f13 kernel 2.6.33.5-124.fc13.x86_64
Ok gene c, thanks for info.