Bug 501117 - kernel oops on iwl3945
Status: CLOSED WORKSFORME
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: x86_64 Linux
Priority: low
Severity: medium
Assigned To: Stanislaw Gruszka
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2009-05-16 11:55 EDT by Lukas Bezdicka
Modified: 2009-08-14 04:56 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-08-14 04:56:04 EDT


Attachments
dmesg output from dumped kernel (43 bytes, text/plain)
2009-05-16 11:55 EDT, Lukas Bezdicka
contents of dmesg (63.23 KB, text/plain)
2009-05-16 13:27 EDT, Chuck Ebbert

Description Lukas Bezdicka 2009-05-16 11:55:48 EDT
Created attachment 344272
dmesg output from dumped kernel

How reproducible:
always

Steps to Reproduce:
1. Get a machine with Intel wifi (iwl3945).
2. while true; do rmmod iwl3945; modprobe iwl3945; done
3. The kernel oopses.
  
Actual results:
oops

Expected results:
Something ugly, but not a crash.
Comment 1 Chuck Ebbert 2009-05-16 13:27:56 EDT
Created attachment 344284
contents of dmesg
Comment 2 Chuck Ebbert 2009-05-16 13:48:34 EDT
/usr/src/debug/kernel-2.6.29/linux-2.6.29.x86_64/kernel/timer.c:932
ffffffff81052505:       4c 63 c0                movslq %eax,%r8
ffffffff81052508:       49 c1 e0 04             shl    $0x4,%r8
ffffffff8105250c:       4f 8b 0c 10             mov    (%r8,%r10,1),%r9
ffffffff81052510:       4f 8d 04 02             lea    (%r10,%r8,1),%r8
ffffffff81052514:       eb 13                   jmp    ffffffff81052529 <get_next_timer_interrupt+0x110>
/usr/src/debug/kernel-2.6.29/linux-2.6.29.x86_64/kernel/timer.c:934
ffffffff81052516:       49 8b 79 10             mov    0x10(%r9),%rdi
/usr/src/debug/kernel-2.6.29/linux-2.6.29.x86_64/kernel/timer.c:932
ffffffff8105251a:       4d 89 d9                mov    %r11,%r9
/usr/src/debug/kernel-2.6.29/linux-2.6.29.x86_64/kernel/timer.c:934
ffffffff8105251d:       4c 39 e7                cmp    %r12,%rdi
ffffffff81052520:       4c 0f 48 e7             cmovs  %rdi,%r12
/usr/src/debug/kernel-2.6.29/linux-2.6.29.x86_64/kernel/timer.c:932
ffffffff81052524:       bf 01 00 00 00          mov    $0x1,%edi
ffffffff81052529:       4d 8b 19                mov    (%r9),%r11   <====
ffffffff8105252c:       4d 39 c1                cmp    %r8,%r9

                index = slot = timer_jiffies & TVN_MASK;
                do {
===>                    list_for_each_entry(nte, varp->vec + slot, entry) {
                                found = 1;
                                if (time_before(nte->expires, expires))
                                        expires = nte->expires;
                        }
                        /* 
                         * Do we still search for the first timer or are 
                         * we looking up the cascade buckets ? 
                         */
                        if (found) {
                                /* Look at the cascade bucket(s)? */
                                if (!index || slot < index)
                                        break;
                                return expires;
                        }
                        slot = (slot + 1) & TVN_MASK;
                } while (slot != index);
Comment 3 Bug Zapper 2009-06-09 11:56:02 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 4 Stanislaw Gruszka 2009-07-16 11:16:57 EDT
I think we have a race in this patch:

linux-2.6-iwl3945-report-killswitch-changes-even-if-the-interface-is-down.patch

When the device is removed, iwl3945_pci_remove() first calls:

cancel_delayed_work_sync(&priv->rfkill_poll);

and then:

ieee80211_unregister_hw(priv->hw)
 -> iwl3945_mac_stop(struct ieee80211_hw *hw)
    -> queue_delayed_work(priv->workqueue, &priv->rfkill_poll,
                           round_jiffies_relative(2 * HZ));

So after the module is unloaded we can still have an armed timer, which accesses
data in the module's (now freed) memory region. The race is fixed in mainline by commit:

commit d552bfb65241a35d48e44ddb0d27e0454f579ab4
Author: Kolekar, Abhijeet <abhijeet.kolekar@intel.com>
Date:   Fri Dec 19 10:37:41 2008 +0800

    iwl3945: release resources before shutting down

The commit applies almost cleanly to the Fedora kernel sources. I tested it and can no longer reproduce the oops.
Comment 5 Stanislaw Gruszka 2009-07-17 04:21:58 EDT
Update: after some more testing I discovered we still have issues with modprobe and rmmod. When NetworkManager is running and I do:

while true; do modprobe iwl3945; rmmod iwl3945 ; done
Ctrl + C
modprobe iwl3945

a kernel BUG occurs:


 ------------[ cut here ]------------
kernel BUG at drivers/net/wireless/iwlwifi/iwl3945-base.c:3352!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/firmware/0000:03:00.0/loading
Modules linked in: iwl3945 fuse rfcomm bridge stp llc bnep sco l2cap sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm arc4 snd_timer yenta_socket rsrc_nonstatic snd soundcore ecb joydev mac80211 btusb bluetooth i2c_i801 lib80211 iTCO_wdt iTCO_vendor_support e1000e snd_page_alloc nsc_ircc irda crc_ccitt cfg80211 thinkpad_acpi hwmon pcspkr i915 drm i2c_algo_bit i2c_core video output [last unloaded: iwl3945]

Pid: 0, comm: swapper Not tainted (2.6.29.5my #1) 6369CTO
EIP: 0060:[<f8c5dff5>] EFLAGS: 00010097 CPU: 0
EIP is at iwl3945_irq_tasklet+0x499/0x7be [iwl3945]
EAX: e84b0000 EBX: e8418e60 ECX: ef8ae000 EDX: 00000000
ESI: e8419788 EDI: e84b6004 EBP: c08ffec8 ESP: c08ffe7c
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c08fe000 task=c0889350 task.ti=c08fe000)
Stack:
 e841b48c e841cde8 e8419758 0002f902 e841b498 e841b69c 801c122c e84193d4
 00000282 00000008 00000002 00010000 00000001 00000000 80000008 00000001
 e841ce98 e841ce9c c096fc00 c08ffee4 c0439a9c 00000000 c096c690 00000001
Call Trace:
 [<c0439a9c>] ? tasklet_action+0x8b/0xf7
 [<c0439f39>] ? __do_softirq+0x99/0x139
 [<c043a02b>] ? do_softirq+0x52/0x7e
 [<c043a196>] ? irq_exit+0x49/0x77
 [<c040b00e>] ? do_IRQ+0x97/0xad
 [<c0409b2c>] ? common_interrupt+0x2c/0x34
 [<c05b6d48>] ? acpi_idle_enter_bm+0x25f/0x2a9
 [<c06748e0>] ? cpuidle_idle_call+0x65/0x9d
 [<c04085f0>] ? cpu_idle+0x72/0x92
 [<c0705010>] ? rest_init+0x58/0x5a
Code: f0 00 0f 84 3e 01 00 00 8b 4e 08 85 c9 0f 84 28 01 00 00 8b 81 a8 00 00 00 66 8b 40 06 0f b6 d4 81 e2 bf 00 00 00 83 fa 04 74 04 <0f> 0b eb fe f6 c4 40 88 45 f0 8b bb 50 27 00 00 75 06 8a 45 f0
EIP: [<f8c5dff5>] iwl3945_irq_tasklet+0x499/0x7be [iwl3945] SS:ESP 0068:c08ffe7c
Comment 6 Stanislaw Gruszka 2009-07-17 05:40:00 EDT
This second oops is known in mainline and was reported here:
 
http://marc.info/?l=linux-wireless&m=123147215829854&w=2
Comment 7 Stanislaw Gruszka 2009-07-17 08:39:47 EDT
According to the mail thread for that bug report, these additional fixes are needed:

commit df833b1d73680f9f9dc72cbc3215edbbc6ab740d
Author: Reinette Chatre <reinette.chatre@intel.com>
Date:   Tue Apr 21 10:55:48 2009 -0700

    iwlwifi: DMA fixes


commit 8cd812bcda06645160b0b279e1a125271a73411c
Author: Winkler, Tomas <tomas.winkler@intel.com>
Date:   Fri Dec 19 10:37:43 2008 +0800

    iwl3945: use iwl_rb_status


The "iwlwifi: DMA fixes" commit is not a trivial patch, and it is hard to backport as is, without rebasing on top of the previous commits ... hmm.
Comment 8 Stanislaw Gruszka 2009-08-14 04:56:04 EDT
We applied

commit 638d0eb9197d1e285451f6594184fcfc9c2a5d44
Author: Chatre, Reinette <reinette.chatre@intel.com>
Date:   Mon Jan 19 15:30:24 2009 -0800

    iwl3945: add debugging for wrong command queue


plus some other patches, and I can no longer reproduce this bug with the newer F11 kernel-2.6.29.6-217.2.3.fc11. So I'm closing this bug. Please reopen it if you still have this problem.
