Bug 321521
Summary: | warm boot e100 x86_64 fc8 panic | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Doug Maxey <dwm>
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | urgent | Docs Contact: |
Priority: | low | |
Version: | 8 | CC: | cebbert, chref, fkooman, kth, wwoods
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | kernel-2.6.23.1-26.fc8 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2007-10-24 19:06:18 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 184121 | |
Attachments: | | |

Description
Doug Maxey
2007-10-06 18:44:31 UTC
Also some e100 problems here on my system (see attachments). The panic happens at the moment the init scripts start the network (/etc/init.d/network start). I can boot in single-user mode (using grub) and then use dhclient to get the network card to work (no panic!). Running /etc/init.d/network start panics the kernel. Cold boot doesn't fix it...

Created attachment 218531 [details]
photo of panic
Created attachment 218541 [details]
dmesg / lspci output
(In reply to comment #2)
> Created an attachment (id=218531) [edit]
> photo of panic
>

Can you reproduce with kernel option "vga=1" (50-line mode) and take a picture of that?

Hm, not with vga=1, because even in single user mode the resolution switches back. I used vga=0x317, that worked. It seems fixed in kernel-2.6.23-0.222.rc9.git4.fc8 though (wiee!), I'll attach the picture I made with kernel-2.6.23-0.220.rc9.git2.fc8 anyway...

Created attachment 220121 [details]
kernel-2.6.23-0.220.rc9.git2.fc8 picture of kernel panic
line 1008: BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));

Happened again with .222, but a little different signature.. Oh, I see, more traceback. :)

kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [1] SMP
CPU 1
Modules linked in: ipt_REJECT ipt_LOG ipt_recent nf_conntrack_ipv4 iptable_filter ip_tables nf_conntrack_ftp nf_conntrack_ipv6 xt_state nf_conntrack nfnetlink xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand dm_mirror dm_multipath dm_mod snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss k8temp snd_pcm e100 e1000 serio_raw hwmon mii snd_timer snd soundcore snd_page_alloc forcedeth i2c_nforce2 i2c_core button sg sr_mod cdrom pata_amd ata_generic sata_nv libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 1852, comm: ip Not tainted 2.6.23-0.222.rc9.git4.fc8 #1
RIP: 0010:[<ffffffff8112ec86>] [<ffffffff8112ec86>] __list_add+0x47/0x5b
RSP: 0018:ffff81007fc2bdd8 EFLAGS: 00010086
RAX: 0000000000000079 RBX: ffff810079980000 RCX: 0000000000006fce
RDX: ffff8100791f2000 RSI: ffffffff815740d9 RDI: 0000000000000000
RBP: 0000000000000096 R08: 0000000000000002 R09: ffffffff81038190
R10: ffffffff81038190 R11: 00000001810403a0 R12: ffff810079980980
R13: 0000000000000011 R14: 00000000fffbd77b R15: ffff81007996cdc0
FS: 00002aaaaaac4b00(0000) GS:ffff810003e17578(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000370aeccc00 CR3: 0000000078c16000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 1852, threadinfo ffff81007941a000, task ffff8100791f2000)
Stack: ffff810079980ab8 ffffffff811fa7f2 0000000000000000 0000000000000000
ffff810079980000 ffffffff8814e835 ffff810079d063b8 0000000000000000
0000000000000000 ffffffff81070be9 0000000000000011 ffffffff813e4680
Call Trace:
<IRQ> [<ffffffff811fa7f2>] __netif_rx_schedule+0x3d/0x81
[<ffffffff8814e835>] :e100:e100_intr+0x9c/0xa8
[<ffffffff81070be9>] handle_IRQ_event+0x1e/0x51
[<ffffffff81071f2a>] handle_fasteoi_irq+0x9a/0xd9
[<ffffffff8100e227>] do_IRQ+0xf1/0x162
[<ffffffff8100c146>] ret_from_intr+0x0/0xf
[<ffffffff8814de3d>] :e100:e100_enable_irq+0x17/0x48
[<ffffffff8814fc1d>] :e100:e100_poll+0x0/0x34d
[<ffffffff811fdc75>] net_rx_action+0xad/0x1bb
[<ffffffff8103d345>] __do_softirq+0x5e/0xe0
[<ffffffff8100cdfc>] call_softirq+0x1c/0x28
<EOI> [<ffffffff811fd408>] dev_open+0x4c/0x6e
[<ffffffff8100e0cb>] do_softirq+0x35/0xa0
[<ffffffff8103d22a>] local_bh_enable_ip+0xc9/0xf4
[<ffffffff811fd408>] dev_open+0x4c/0x6e
[<ffffffff811fb409>] dev_change_flags+0xaa/0x168
[<ffffffff8124189d>] devinet_ioctl+0x235/0x597
[<ffffffff811f050e>] sock_ioctl+0x1c8/0x1e5
[<ffffffff810af4a5>] do_ioctl+0x21/0x6b
[<ffffffff810af73c>] vfs_ioctl+0x24d/0x266
[<ffffffff810af7ae>] sys_ioctl+0x59/0x7b
[<ffffffff8100bbfe>] system_call+0x7e/0x83
Code: 0f 0b eb fe 48 89 7e 08 48 89 37 48 89 57 08 48 89 3a 5a c3
RIP [<ffffffff8112ec86>] __list_add+0x47/0x5b
RSP <ffff81007fc2bdd8>
Kernel panic - not syncing: Aiee, killing interrupt handler!

On powerpc, we can get into the debugger by enabling xmon=on. Other than enabling and compiling kgdb, anything like that on x64? From the serial port, that is...

And once again, cold boot was fine.

And again on .224, back to the original traceback

kernel BUG at include/linux/netdevice.h:1008!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: ipt_REJECT ipt_LOG ipt_recent nf_conntrack_ipv4 iptable_filter ip_tables nf_conntrack_ftp nf_conntrack_ipv6 xt_state nf_conntrack nfnetlink xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand dm_mirror dm_multipath dm_mod snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm k8temp hwmon e100 serio_raw snd_timer e1000 mii snd soundcore forcedeth snd_page_alloc i2c_nforce2 i2c_core button sr_mod cdrom sg pata_amd ata_generic sata_nv libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 1863, comm: ip Not tainted 2.6.23-0.224.rc9.git6.fc8 #1
RIP: 0010:[<ffffffff8814bf0c>] [<ffffffff8814bf0c>] :e100:e100_poll+0x2ef/0x34d
RSP: 0018:ffffffff81550ed8 EFLAGS: 00010046
RAX: 0000000000000046 RBX: 0000000000000246 RCX: 0000000000000000
RDX: ffff810079322000 RSI: 000000000000023e RDI: ffff8100796bca80
RBP: ffff8100796bc980 R08: ffff8100796bc000 R09: 0000000000000000
R10: ffffffff81594800 R11: 00000001810403a0 R12: 0000000000000000
R13: ffff8100796bc000 R14: ffff810078a2c012 R15: 0000000000000000
FS: 00002aaaaaac4b00(0000) GS:ffffffff813dd000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000370aeccc00 CR3: 00000000798e4000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 1863, threadinfo ffff810079b0e000, task ffff810079322000)
Stack: ffff8100793227d8 0000000000000000 ffffffff81550f54 00000010810542ff
0000000000000000 ffff810079318000 00ffffff811fdbea ffff8100796bc000
0000000000000000 ffff810003d8c058 ffff810003d8c000 00000000fffbd67f
Call Trace:
<IRQ> [<ffffffff811fdc3d>] net_rx_action+0xad/0x1bb
[<ffffffff8103d345>] __do_softirq+0x5e/0xe0
[<ffffffff8100cdfc>] call_softirq+0x1c/0x28
<EOI> [<ffffffff811fd3d0>] dev_open+0x4c/0x6e
[<ffffffff8100e0cb>] do_softirq+0x35/0xa0
[<ffffffff8103d22a>] local_bh_enable_ip+0xc9/0xf4
[<ffffffff811fd3d0>] dev_open+0x4c/0x6e
[<ffffffff811fb3d1>] dev_change_flags+0xaa/0x168
[<ffffffff81241865>] devinet_ioctl+0x235/0x597
[<ffffffff811f04d6>] sock_ioctl+0x1c8/0x1e5
[<ffffffff810af4b5>] do_ioctl+0x21/0x6b
[<ffffffff810af74c>] vfs_ioctl+0x24d/0x266
[<ffffffff810af7be>] sys_ioctl+0x59/0x7b
[<ffffffff8100bbfe>] system_call+0x7e/0x83
Code: 0f 0b eb fe 49 8d bd 00 02 00 00 e8 d4 2c fe f8 f0 41 0f ba
RIP [<ffffffff8814bf0c>] :e100:e100_poll+0x2ef/0x34d
RSP <ffffffff81550ed8>
Kernel panic - not syncing: Aiee, killing interrupt handler!

Still present in 2.6.23-6. Still only occurs on warm boot.

kernel BUG at include/linux/netdevice.h:1008!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: ipt_REJECT ipt_LOG ipt_recent nf_conntrack_ipv4 iptable_filter ip_tables nf_conntrack_ftp nf_conntrack_ipv6 xt_state nf_conntrack nfnetlink xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand dm_mirror dm_multipath dm_mod snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm k8temp snd_timer hwmon snd soundcore e1000 forcedeth e100 snd_page_alloc mii button i2c_nforce2 i2c_core sg sr_mod cdrom pata_amd ata_generic sata_nv libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 1880, comm: ip Not tainted 2.6.23-6.fc8 #1
RIP: 0010:[<ffffffff880f7f2c>] [<ffffffff880f7f2c>] :e100:e100_poll+0x2ef/0x34d
RSP: 0018:ffffffff81550ed8 EFLAGS: 00010046
RAX: 0000000000000016 RBX: 0000000000000246 RCX: 0000000000000000
RDX: ffff81010ee9c000 RSI: 00000000000005d6 RDI: ffff810117e94a80
RBP: ffff810117e94980 R08: ffff810117e94000 R09: ffffffff81557000
R10: ffffffff81594800 R11: 0000000000000000 R12: 0000000000000000
R13: ffff810117e94000 R14: ffff810115ce6012 R15: 0000000000000000
FS: 00002aaaaaac4b00(0000) GS:ffffffff813de000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000916d50 CR3: 0000000118d29000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 1880, threadinfo ffff81011762e000, task ffff81010ee9c000)
Stack: ffff81010ee9c7d8 0000000000000000 ffffffff81550f54 00000010810545cf
0000000000000000 ffff810117554000 00ffffff811fdf4e ffff810117e94000
0000000000000000 ffff810001067058 ffff810001067000 000000010003a94f
Call Trace:
<IRQ> [<ffffffff811fdfa1>] net_rx_action+0xad/0x1bb
[<ffffffff8103d345>] __do_softirq+0x5e/0xe0
[<ffffffff8100cdfc>] call_softirq+0x1c/0x28
<EOI> [<ffffffff811fd734>] dev_open+0x4c/0x6e
[<ffffffff8100e0cb>] do_softirq+0x35/0xa0
[<ffffffff8103d22a>] local_bh_enable_ip+0xc9/0xf4
[<ffffffff811fd734>] dev_open+0x4c/0x6e
[<ffffffff811fb735>] dev_change_flags+0xaa/0x168
[<ffffffff81241bdd>] devinet_ioctl+0x235/0x597
[<ffffffff811f083a>] sock_ioctl+0x1c8/0x1e5
[<ffffffff810af7ed>] do_ioctl+0x21/0x6b
[<ffffffff810afa84>] vfs_ioctl+0x24d/0x266
[<ffffffff810afaf6>] sys_ioctl+0x59/0x7b
[<ffffffff8100bbfe>] system_call+0x7e/0x83
Code: 0f 0b eb fe 49 8d bd 00 02 00 00 e8 e4 6f 03 f9 f0 41 0f ba
RIP [<ffffffff880f7f2c>] :e100:e100_poll+0x2ef/0x34d
RSP <ffffffff81550ed8>
Kernel panic - not syncing: Aiee, killing interrupt handler!

Created attachment 233621 [details]
patch fixes problem with e100 IRQ sharing
We observed the same panic on a Dell Dimension 5150 (E510), although not
limited to warm boots. We noticed that the following trace is possible:
- when starting the interface, e100_up() gets called
- it calls e100_hw_init(), which disables e100 IRQ generation
(e100_disable_irq())
- it registers the interrupt handler
- the interrupt handler (e100_intr()) gets called - this happens because the
IRQ line is shared with another device (in this case, the SATA controller)
- the interrupt handler examines the stat_ack register of the interface: even
though interrupts are disabled, an event is indicated and the interrupt handler
proceeds
- the interrupt handler calls netif_rx_schedule_prep(), which sets the
__LINK_STATE_RX_SCHED bit, and __netif_rx_schedule(), which adds the interface
to the poll list
- when the interrupt handler returns, e100_up() calls netif_poll_enable(), thus
clearing the __LINK_STATE_RX_SCHED bit
- now the NET RX softirq (net_rx_action) calls e100_poll(), which in turn calls
netif_rx_complete()
- netif_rx_complete() checks whether the __LINK_STATE_RX_SCHED bit is set and
triggers the panic
To avoid this situation, where the interrupt handler executes although e100
interrupts are disabled, we suggest the attached patch. It lets the interrupt
handler check the interrupt mask bit before proceeding with the interrupt
handling.
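
For readers who want to see the ordering problem in isolation, here is a small user-space model of the sequence described above. It is only a sketch and not the driver code or the attached patch: the class `Nic`, the function `bring_up`, the exception `PollStateError`, and the stale `stat_ack` value are all made up for illustration. The `check_mask` flag stands in for the proposed guard in the interrupt handler.

```python
class PollStateError(AssertionError):
    """Stands in for the kernel BUG_ON() at netdevice.h:1008 firing."""


class Nic:
    """Toy model of the interface state involved in the race (hypothetical)."""

    def __init__(self, check_mask):
        self.rx_sched = False         # models the __LINK_STATE_RX_SCHED bit
        self.irq_masked = True        # e100_hw_init() disabled device interrupts
        self.stat_ack = 0x40          # stale event bits; the value is illustrative
        self.check_mask = check_mask  # True models the proposed guard
        self.poll_list = []           # models the per-CPU poll list

    def intr(self):
        """Handler on the shared IRQ line; returns True if it claims the IRQ."""
        if self.check_mask and self.irq_masked:
            return False              # guard: our interrupts are masked, not ours
        if self.stat_ack == 0:
            return False              # nothing pending on this device
        self.stat_ack = 0             # acknowledge the event
        if not self.rx_sched:         # like netif_rx_schedule_prep()
            self.rx_sched = True
            self.poll_list.append(self)  # like __netif_rx_schedule()
        return True

    def poll_enable(self):
        self.rx_sched = False         # like netif_poll_enable() clearing the bit

    def rx_complete(self):
        """Like netif_rx_complete(): the bit must still be set here."""
        if not self.rx_sched:
            raise PollStateError("RX_SCHED bit not set in rx_complete")
        self.rx_sched = False


def bring_up(nic):
    """Replay the event order from the comment above."""
    nic.intr()                        # another device raises the shared IRQ
    nic.poll_enable()                 # e100_up() continues after the handler
    while nic.poll_list:              # like net_rx_action() walking the list
        nic.poll_list.pop(0).rx_complete()  # e100_poll() -> completion
    return "up"
```

Calling `bring_up(Nic(check_mask=False))` raises `PollStateError`, mirroring the panic, while `bring_up(Nic(check_mask=True))` returns normally because the handler declines the interrupt while the device is masked.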
*** Bug 340191 has been marked as a duplicate of this bug. ***

The current rawhide kernel seems to have a patch for this issue - does it work now?

Yes indeed. Working since -26