Bug 321521 - warm boot e100 x86_64 fc8 panic
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 8
Hardware: x86_64 Linux
Priority: low
Severity: urgent
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Duplicates: 340191
Blocks: F8KernelBlocker
Reported: 2007-10-06 14:44 EDT by Doug Maxey
Modified: 2007-11-30 17:12 EST (History)
Fixed In Version: kernel-2.6.23.1-26.fc8
Doc Type: Bug Fix
Last Closed: 2007-10-24 15:06:18 EDT


Attachments:
photo of panic (126.37 KB, image/jpeg)
2007-10-06 21:03 EDT, François Kooman
dmesg / lspci output (22.69 KB, application/x-download)
2007-10-06 21:03 EDT, François Kooman
kernel-2.6.23-0.220.rc9.git2.fc8 picture of kernel panic (169.21 KB, image/jpeg)
2007-10-08 16:10 EDT, François Kooman
patch fixes problem with e100 IRQ sharing (907 bytes, patch)
2007-10-21 11:03 EDT, Christof Efkemann

Description Doug Maxey 2007-10-06 14:44:31 EDT
Description of problem:
Rebooting after an update of f7.92 panics the kernel.

Version-Release number of selected component (if applicable):
2.6.23-0.217.rc9.git1.fc8

How reproducible:
100% on warm reboot.

A cold start seems to fix it.

Additional info:
kernel BUG at include/linux/netdevice.h:1008!
invalid opcode: 0000 [1] SMP 
CPU 1 
Modules linked in: nf_conntrack_ftp nf_conntrack_netbios_ns ipt_REJECT ipt_LOG
ipt_recent nf_conntrack_ipv4 iptable_filter ip_tables nf_conntrack_ipv6
xt_state nf_conntrack nfnetlink xt_tcpudp ip6t_REJECT ip6table_filter
ip6_tables x_tables ipv6 cpufreq_ondemand dm_mirror dm_multipath dm_mod
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss e1000 serio_raw e100 k8temp hwmon
mii snd_pcm snd_timer snd soundcore i2c_nforce2 forcedeth snd_page_alloc
i2c_core button sr_mod sg cdrom pata_amd ata_generic sata_nv libata sd_mod
scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 1864, comm: ip Not tainted 2.6.23-0.217.rc9.git1.fc8 #1
RIP: 0010:[<ffffffff88146f0c>]  [<ffffffff88146f0c>] :e100:e100_poll+0x2ef/0x34d
RSP: 0018:ffff81007fc27ed8  EFLAGS: 00010046
RAX: 0000000000000056 RBX: 0000000000000246 RCX: 0000000000000000
RDX: ffff810078e2c000 RSI: 000000000000066f RDI: ffff810079f8ca80
RBP: ffff810079f8c980 R08: ffff810079f8c000 R09: ffff81007fc2a000
R10: ffff81007fc10000 R11: 0000000000000001 R12: 0000000000000000
R13: ffff810079f8c000 R14: ffff81007a7ee012 R15: 0000000000000000
FS:  00002aaaab232b00(0000) GS:ffff810037cd4578(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaafa8c00 CR3: 00000000784f5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 1864, threadinfo ffff810078966000, task ffff810078e2c000)
Stack:  ffff810078e2c7d8 0000000000000000 ffff81007fc27f54 00000010810542fb
 0000000000000000 ffff8100784d4000 00ffffff811fcb3e ffff810079f8c000
 0000000000000000 ffff810003df7058 ffff810003df7000 00000000fffbd79d
Call Trace:
 <IRQ>  [<ffffffff811fcb91>] net_rx_action+0xad/0x1bb
 [<ffffffff8103d33e>] __do_softirq+0x4b/0xe0
 [<ffffffff8103d351>] __do_softirq+0x5e/0xe0
 [<ffffffff8100cdfc>] call_softirq+0x1c/0x28
 <EOI>  [<ffffffff811fc324>] dev_open+0x4c/0x6e
 [<ffffffff8100e0cb>] do_softirq+0x35/0xa0
 [<ffffffff8103d236>] local_bh_enable_ip+0xc9/0xf4
 [<ffffffff811fc324>] dev_open+0x4c/0x6e
 [<ffffffff811fa325>] dev_change_flags+0xaa/0x168
 [<ffffffff812407b1>] devinet_ioctl+0x235/0x597
 [<ffffffff811ef42a>] sock_ioctl+0x1c8/0x1e5
 [<ffffffff810af4b9>] do_ioctl+0x21/0x6b
 [<ffffffff810af750>] vfs_ioctl+0x24d/0x266
 [<ffffffff810af7c2>] sys_ioctl+0x59/0x7b
 [<ffffffff8100bbfe>] system_call+0x7e/0x83

Code: 0f 0b eb fe 49 8d bd 00 02 00 00 e8 d8 7c fe f8 f0 41 0f ba 
RIP  [<ffffffff88146f0c>] :e100:e100_poll+0x2ef/0x34d
 RSP <ffff81007fc27ed8>
Kernel panic - not syncing: Aiee, killing interrupt handler!
Comment 1 François Kooman 2007-10-06 21:01:04 EDT
I am also seeing e100 problems on my system (see attachments).

The panic happens at the moment the init scripts start the network
(/etc/init.d/network start). I can boot into single-user mode (via grub) and
then use dhclient to get the network card to work (no panic!). Running
/etc/init.d/network start panics the kernel.

A cold boot doesn't fix it here...
Comment 2 François Kooman 2007-10-06 21:03:11 EDT
Created attachment 218531
photo of panic
Comment 3 François Kooman 2007-10-06 21:03:34 EDT
Created attachment 218541
dmesg / lspci output
Comment 4 Chuck Ebbert 2007-10-08 15:16:27 EDT
(In reply to comment #2)
> Created an attachment (id=218531)
> photo of panic
> 

Can you reproduce with kernel option "vga=1" (50-line mode) and take a picture
of that?
Comment 5 François Kooman 2007-10-08 16:08:53 EDT
Hm, not with vga=1, because even in single-user mode the resolution switches
back. I used vga=0x317 instead, and that worked.

It seems to be fixed in kernel-2.6.23-0.222.rc9.git4.fc8 though (wiee!). I'll
attach the picture I made with kernel-2.6.23-0.220.rc9.git2.fc8 anyway...
Comment 6 François Kooman 2007-10-08 16:10:26 EDT
Created attachment 220121
kernel-2.6.23-0.220.rc9.git2.fc8 picture of kernel panic
Comment 7 Chuck Ebbert 2007-10-08 16:17:36 EDT
line 1008:
        BUG_ON(!test_bit(__LINK_STATE_RX_SCHED, &dev->state));
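
For readers unfamiliar with the check, a minimal userspace sketch of what that
assertion guards may help. This is not the kernel source: `fake_dev`,
`FAKE_RX_SCHED` and `rx_complete_ok()` are hypothetical stand-ins, and the
model assumes only that completing a poll requires the "scheduled" bit to be
set at that moment.

```c
#include <stdbool.h>

#define FAKE_RX_SCHED 0x1UL     /* hypothetical stand-in for __LINK_STATE_RX_SCHED */

struct fake_dev {
	unsigned long state;    /* hypothetical stand-in for dev->state */
};

/*
 * Models the completion step: returns false where the kernel would hit
 * BUG_ON(), otherwise clears the bit and returns true.
 */
bool rx_complete_ok(struct fake_dev *dev)
{
	if (!(dev->state & FAKE_RX_SCHED))
		return false;           /* bit already clear: this is the BUG case */
	dev->state &= ~FAKE_RX_SCHED;   /* normal completion clears the bit */
	return true;
}
```

In this model, completing twice in a row, or completing after something else
has cleared the bit, takes the failing path.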
Comment 8 Doug Maxey 2007-10-08 22:08:11 EDT
Happened again with .222, but a little different signature..  Oh, I see, more
traceback.  :)

kernel BUG at lib/list_debug.c:33!
invalid opcode: 0000 [1] SMP 
CPU 1 
Modules linked in: ipt_REJECT ipt_LOG ipt_recent nf_conntrack_ipv4
iptable_filter ip_tables nf_conntrack_ftp nf_conntrack_ipv6 xt_state
nf_conntrack nfnetlink xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter
ip6_tables x_tables ipv6 cpufreq_ondemand dm_mirror dm_multipath dm_mod
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss k8temp snd_pcm e100 e1000 serio_raw
hwmon mii snd_timer snd soundcore snd_page_alloc forcedeth i2c_nforce2 i2c_core
button sg sr_mod cdrom pata_amd ata_generic sata_nv libata sd_mod scsi_mod ext3
jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 1852, comm: ip Not tainted 2.6.23-0.222.rc9.git4.fc8 #1
RIP: 0010:[<ffffffff8112ec86>]  [<ffffffff8112ec86>] __list_add+0x47/0x5b
RSP: 0018:ffff81007fc2bdd8  EFLAGS: 00010086
RAX: 0000000000000079 RBX: ffff810079980000 RCX: 0000000000006fce
RDX: ffff8100791f2000 RSI: ffffffff815740d9 RDI: 0000000000000000
RBP: 0000000000000096 R08: 0000000000000002 R09: ffffffff81038190
R10: ffffffff81038190 R11: 00000001810403a0 R12: ffff810079980980
R13: 0000000000000011 R14: 00000000fffbd77b R15: ffff81007996cdc0
FS:  00002aaaaaac4b00(0000) GS:ffff810003e17578(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000370aeccc00 CR3: 0000000078c16000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 1852, threadinfo ffff81007941a000, task ffff8100791f2000)
Stack:  ffff810079980ab8 ffffffff811fa7f2 0000000000000000 0000000000000000
 ffff810079980000 ffffffff8814e835 ffff810079d063b8 0000000000000000
 0000000000000000 ffffffff81070be9 0000000000000011 ffffffff813e4680
Call Trace:
 <IRQ>  [<ffffffff811fa7f2>] __netif_rx_schedule+0x3d/0x81
 [<ffffffff8814e835>] :e100:e100_intr+0x9c/0xa8
 [<ffffffff81070be9>] handle_IRQ_event+0x1e/0x51
 [<ffffffff81071f2a>] handle_fasteoi_irq+0x9a/0xd9
 [<ffffffff8100e227>] do_IRQ+0xf1/0x162
 [<ffffffff8100c146>] ret_from_intr+0x0/0xf
 [<ffffffff8814de3d>] :e100:e100_enable_irq+0x17/0x48
 [<ffffffff8814fc1d>] :e100:e100_poll+0x0/0x34d
 [<ffffffff811fdc75>] net_rx_action+0xad/0x1bb
 [<ffffffff8103d345>] __do_softirq+0x5e/0xe0
 [<ffffffff8100cdfc>] call_softirq+0x1c/0x28
 <EOI>  [<ffffffff811fd408>] dev_open+0x4c/0x6e
 [<ffffffff8100e0cb>] do_softirq+0x35/0xa0
 [<ffffffff8103d22a>] local_bh_enable_ip+0xc9/0xf4
 [<ffffffff811fd408>] dev_open+0x4c/0x6e
 [<ffffffff811fb409>] dev_change_flags+0xaa/0x168
 [<ffffffff8124189d>] devinet_ioctl+0x235/0x597
 [<ffffffff811f050e>] sock_ioctl+0x1c8/0x1e5
 [<ffffffff810af4a5>] do_ioctl+0x21/0x6b
 [<ffffffff810af73c>] vfs_ioctl+0x24d/0x266
 [<ffffffff810af7ae>] sys_ioctl+0x59/0x7b
 [<ffffffff8100bbfe>] system_call+0x7e/0x83


Code: 0f 0b eb fe 48 89 7e 08 48 89 37 48 89 57 08 48 89 3a 5a c3 
RIP  [<ffffffff8112ec86>] __list_add+0x47/0x5b
 RSP <ffff81007fc2bdd8>
Kernel panic - not syncing: Aiee, killing interrupt handler!


Comment 9 Doug Maxey 2007-10-08 22:11:42 EDT
On powerpc, we can get into the debugger by enabling xmon=on.  Other than
enabling and compiling kgdb, anything like that on x64?
Comment 10 Doug Maxey 2007-10-08 22:14:28 EDT
From the serial port, that is...
Comment 11 Doug Maxey 2007-10-08 22:15:27 EDT
And once again, cold boot was fine.
Comment 12 Doug Maxey 2007-10-09 10:10:03 EDT
And again on .224, back to the original traceback


kernel BUG at include/linux/netdevice.h:1008!
invalid opcode: 0000 [1] SMP 
CPU 0 
Modules linked in: ipt_REJECT ipt_LOG ipt_recent nf_conntrack_ipv4
iptable_filter ip_tables nf_conntrack_ftp nf_conntrack_ipv6 xt_state
nf_conntrack nfnetlink xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter
ip6_tables x_tables ipv6 cpufreq_ondemand dm_mirror dm_multipath dm_mod
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm k8temp hwmon e100 serio_raw
snd_timer e1000 mii snd soundcore forcedeth snd_page_alloc i2c_nforce2 i2c_core
button sr_mod cdrom sg pata_amd ata_generic sata_nv libata sd_mod scsi_mod ext3
jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 1863, comm: ip Not tainted 2.6.23-0.224.rc9.git6.fc8 #1
RIP: 0010:[<ffffffff8814bf0c>]  [<ffffffff8814bf0c>] :e100:e100_poll+0x2ef/0x34d
RSP: 0018:ffffffff81550ed8  EFLAGS: 00010046
RAX: 0000000000000046 RBX: 0000000000000246 RCX: 0000000000000000
RDX: ffff810079322000 RSI: 000000000000023e RDI: ffff8100796bca80
RBP: ffff8100796bc980 R08: ffff8100796bc000 R09: 0000000000000000
R10: ffffffff81594800 R11: 00000001810403a0 R12: 0000000000000000
R13: ffff8100796bc000 R14: ffff810078a2c012 R15: 0000000000000000
FS:  00002aaaaaac4b00(0000) GS:ffffffff813dd000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000370aeccc00 CR3: 00000000798e4000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 1863, threadinfo ffff810079b0e000, task ffff810079322000)
Stack:  ffff8100793227d8 0000000000000000 ffffffff81550f54 00000010810542ff
 0000000000000000 ffff810079318000 00ffffff811fdbea ffff8100796bc000
 0000000000000000 ffff810003d8c058 ffff810003d8c000 00000000fffbd67f
Call Trace:
 <IRQ>  [<ffffffff811fdc3d>] net_rx_action+0xad/0x1bb
 [<ffffffff8103d345>] __do_softirq+0x5e/0xe0
 [<ffffffff8100cdfc>] call_softirq+0x1c/0x28
 <EOI>  [<ffffffff811fd3d0>] dev_open+0x4c/0x6e
 [<ffffffff8100e0cb>] do_softirq+0x35/0xa0
 [<ffffffff8103d22a>] local_bh_enable_ip+0xc9/0xf4
 [<ffffffff811fd3d0>] dev_open+0x4c/0x6e
 [<ffffffff811fb3d1>] dev_change_flags+0xaa/0x168
 [<ffffffff81241865>] devinet_ioctl+0x235/0x597
 [<ffffffff811f04d6>] sock_ioctl+0x1c8/0x1e5
 [<ffffffff810af4b5>] do_ioctl+0x21/0x6b
 [<ffffffff810af74c>] vfs_ioctl+0x24d/0x266
 [<ffffffff810af7be>] sys_ioctl+0x59/0x7b
 [<ffffffff8100bbfe>] system_call+0x7e/0x83


Code: 0f 0b eb fe 49 8d bd 00 02 00 00 e8 d4 2c fe f8 f0 41 0f ba 
RIP  [<ffffffff8814bf0c>] :e100:e100_poll+0x2ef/0x34d
 RSP <ffffffff81550ed8>
Kernel panic - not syncing: Aiee, killing interrupt handler!


Comment 13 Doug Maxey 2007-10-14 15:04:57 EDT
Still present in 2.6.23-6.  Still only occurs on warm boot.

kernel BUG at include/linux/netdevice.h:1008!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: ipt_REJECT ipt_LOG ipt_recent nf_conntrack_ipv4
iptable_filter ip_tables nf_conntrack_ftp nf_conntrack_ipv6 xt_state
nf_conntrack nfnetlink xt_tcpudp ip6t_ipv6header ip6t_REJECT ip6table_filter
ip6_tables x_tables ipv6 cpufreq_ondemand dm_mirror dm_multipath dm_mod
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm k8temp snd_timer
hwmon snd soundcore e1000 forcedeth e100 snd_page_alloc mii button i2c_nforce2
i2c_core sg sr_mod cdrom pata_amd ata_generic sata_nv libata sd_mod scsi_mod
ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 1880, comm: ip Not tainted 2.6.23-6.fc8 #1
RIP: 0010:[<ffffffff880f7f2c>]  [<ffffffff880f7f2c>] :e100:e100_poll+0x2ef/0x34d
RSP: 0018:ffffffff81550ed8  EFLAGS: 00010046
RAX: 0000000000000016 RBX: 0000000000000246 RCX: 0000000000000000
RDX: ffff81010ee9c000 RSI: 00000000000005d6 RDI: ffff810117e94a80
RBP: ffff810117e94980 R08: ffff810117e94000 R09: ffffffff81557000
R10: ffffffff81594800 R11: 0000000000000000 R12: 0000000000000000
R13: ffff810117e94000 R14: ffff810115ce6012 R15: 0000000000000000
FS:  00002aaaaaac4b00(0000) GS:ffffffff813de000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000916d50 CR3: 0000000118d29000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 1880, threadinfo ffff81011762e000, task ffff81010ee9c000)
Stack:  ffff81010ee9c7d8 0000000000000000 ffffffff81550f54 00000010810545cf
 0000000000000000 ffff810117554000 00ffffff811fdf4e ffff810117e94000
 0000000000000000 ffff810001067058 ffff810001067000 000000010003a94f
Call Trace:
 <IRQ>  [<ffffffff811fdfa1>] net_rx_action+0xad/0x1bb
 [<ffffffff8103d345>] __do_softirq+0x5e/0xe0
 [<ffffffff8100cdfc>] call_softirq+0x1c/0x28
 <EOI>  [<ffffffff811fd734>] dev_open+0x4c/0x6e
 [<ffffffff8100e0cb>] do_softirq+0x35/0xa0
 [<ffffffff8103d22a>] local_bh_enable_ip+0xc9/0xf4
 [<ffffffff811fd734>] dev_open+0x4c/0x6e
 [<ffffffff811fb735>] dev_change_flags+0xaa/0x168
 [<ffffffff81241bdd>] devinet_ioctl+0x235/0x597
 [<ffffffff811f083a>] sock_ioctl+0x1c8/0x1e5
 [<ffffffff810af7ed>] do_ioctl+0x21/0x6b
 [<ffffffff810afa84>] vfs_ioctl+0x24d/0x266
 [<ffffffff810afaf6>] sys_ioctl+0x59/0x7b
 [<ffffffff8100bbfe>] system_call+0x7e/0x83

Code: 0f 0b eb fe 49 8d bd 00 02 00 00 e8 e4 6f 03 f9 f0 41 0f ba
RIP  [<ffffffff880f7f2c>] :e100:e100_poll+0x2ef/0x34d
 RSP <ffffffff81550ed8>
Kernel panic - not syncing: Aiee, killing interrupt handler!
Comment 14 Christof Efkemann 2007-10-21 11:03:26 EDT
Created attachment 233621
patch fixes problem with e100 IRQ sharing

We observed the same panic on a Dell Dimension 5150 (E510), although there it
is not limited to warm boots. We noticed that the following sequence is possible:

- when starting the interface, e100_up() gets called
- it calls e100_hw_init(), which disables e100 IRQ generation
(e100_disable_irq())
- it registers the interrupt handler
- the interrupt handler (e100_intr()) gets called - this happens because the
IRQ line is shared with another device (in this case, the SATA controller)
- the interrupt handler examines the stat_ack register of the interface: even
though interrupts are disabled, an event is indicated and the interrupt handler
proceeds
- the interrupt handler calls netif_rx_schedule_prep(), which sets the
__LINK_STATE_RX_SCHED bit, and __netif_rx_schedule(), which adds the interface
to the poll list
- when the interrupt handler returns, e100_up() calls netif_poll_enable(), thus
clearing the __LINK_STATE_RX_SCHED bit
- now the NET RX softirq (net_rx_action) calls e100_poll(), which in turn calls
netif_rx_complete()
- netif_rx_complete() checks whether the __LINK_STATE_RX_SCHED bit is set and
triggers the panic

To avoid this situation, where the interrupt handler executes although e100
interrupts are disabled, we suggest the attached patch.  It lets the interrupt
handler check the interrupt mask bit before proceeding with the interrupt
handling.
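
The sequence above, and the effect of the suggested mask check, can be
replayed in a small userspace model. This is only a sketch under stated
assumptions, not the attached patch and not driver code: `fake_nic`,
`fake_intr()` and `bring_up_hits_bug()` are hypothetical names, the hardware
registers are reduced to booleans, and the steps are run strictly in the
order listed in this comment.

```c
#include <stdbool.h>

#define RX_SCHED 0x1UL          /* models __LINK_STATE_RX_SCHED */

struct fake_nic {
	unsigned long state;    /* models dev->state */
	bool on_poll_list;      /* models membership in the softirq poll list */
	bool irq_masked;        /* models the device's interrupt mask bit */
	bool event_pending;     /* models an event showing in stat_ack */
};

/* Shared-IRQ handler; check_mask selects the behaviour suggested above. */
bool fake_intr(struct fake_nic *nic, bool check_mask)
{
	if (check_mask && nic->irq_masked)
		return false;           /* our interrupts are disabled: not ours */
	if (!nic->event_pending)
		return false;           /* nothing indicated in stat_ack */
	if (!(nic->state & RX_SCHED)) { /* netif_rx_schedule_prep() */
		nic->state |= RX_SCHED;
		nic->on_poll_list = true;   /* __netif_rx_schedule() */
	}
	return true;
}

/* Replays the bring-up order; returns true if the BUG_ON would fire. */
bool bring_up_hits_bug(bool check_mask)
{
	struct fake_nic nic = { 0 };

	nic.irq_masked = true;          /* e100_hw_init() -> e100_disable_irq() */
	nic.event_pending = true;       /* event indicated despite the mask */
	fake_intr(&nic, check_mask);    /* shared line fires after registration */
	nic.state &= ~RX_SCHED;         /* netif_poll_enable() clears the bit */

	/* net_rx_action() only polls devices that are on the poll list. */
	if (nic.on_poll_list)
		return !(nic.state & RX_SCHED); /* check in netif_rx_complete() */
	return false;
}
```

Calling `bring_up_hits_bug(false)` follows the failing path described in the
list, while `bring_up_hits_bug(true)` returns early from the handler so the
device never lands on the poll list with its bit cleared.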
Comment 15 Chuck Ebbert 2007-10-21 11:34:54 EDT
*** Bug 340191 has been marked as a duplicate of this bug. ***
Comment 16 Will Woods 2007-10-24 12:51:00 EDT
The current rawhide kernel seems to have a patch for this issue - does it work now?
Comment 17 Doug Maxey 2007-10-24 14:54:15 EDT
Yes indeed.  Working since -26
