Bug 455691
Summary: | [5.2] kernel-debug panic | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Qian Cai <qcai> | ||||
Component: | kernel | Assignee: | Jiri Pirko <jpirko> | ||||
Status: | CLOSED NOTABUG | QA Contact: | Martin Jenner <mjenner> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | 5.2.z | CC: | agospoda, bruce.w.allan, jesse.brandeburg, john.ronciak, rkhan, vmayatsk | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2008-08-12 14:01:12 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Qian Cai
2008-07-17 05:56:44 UTC
Created attachment 312009 [details]
full boot log
Looks like I found a way to reproduce it. Boot a 32bit kernel-debug on a x86_64 machine first, it will hang at, ... Write protecting the kernel read-only data: 412k request_module: runaway loop modprobe binfmt-464c request_module: runaway loop modprobe binfmt-464c request_module: runaway loop modprobe binfmt-464c request_module: runaway loop modprobe binfmt-464c request_module: runaway loop modprobe binfmt-464c ... Then, remove 32bit kernel-debug and install a 64bit one, and then it finally failed in udev, + /sbin/udevsettle Unable to handle kernel paging request at ffffffff8405e2d9 RIP: [<ffffffff8405e2d9>] PGD 203067 PUD 205063 PMD 0 Oops: 0010 [1] SMP last sysfs file: /class/net/lo/type CPU 1 Modules linked in: floppy i6300esb pcspkr sg shpchp ide_cd cdrom i2c_i801 i2c_core e1000 serio_raw dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Pid: 1106, comm: pam_console_app Tainted: G M 2.6.18-92.1.6.el5debug #1 RIP: 0010:[<ffffffff8405e2d9>] [<ffffffff8405e2d9>] RSP: 0000:ffff810037c13ef8 EFLAGS: 00010282 RAX: ffff81003db41eb0 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 000000000000000e RSI: ffff810001000038 RDI: ffff81003db41eb0 RBP: ffff810002600d20 R08: ffff810037c8a6b0 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000158 R12: ffff81003cf9b6a0 R13: 0000000000000003 R14: 0000000000000014 R15: ffff81003cb7e540 FS: 00002b223bc063e0(0000) GS:ffff810037cdd300(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff8405e2d9 CR3: 000000003c88d000 CR4: 00000000000006e0 Process pam_console_app (pid: 1106, threadinfo ffff81003cc20000, task ffff81003cb7e540) Stack: ffffffff800a0a96 0000000000000282 ffff810002600e20 0000000000000000 0000000000000008 0000000000000001 ffffffff800a0b4e ffffffff80425fd0 ffffffff800971d2 ffffffff80425f90 0000000000000001 ffffffff80425fd0 Call Trace: <IRQ> [<ffffffff800a0a96>] __rcu_process_callbacks+0x140/0x1d5 [<ffffffff800a0b4e>] rcu_process_callbacks+0x23/0x43 [<ffffffff800971d2>] tasklet_action+0x6c/0xc3 [<ffffffff80012a0e>] __do_softirq+0x67/0xf3 [<ffffffff800613d0>] call_softirq+0x1c/0x28 [<ffffffff8007066a>] do_softirq+0x35/0xa0 [<ffffffff80060d3b>] apic_timer_interrupt+0x6b/0x70 <EOI> [<ffffffff8006a77f>] do_page_fault+0x424/0x835 [<ffffffff8000c021>] find_vma+0x23/0x5d [<ffffffff8006a7b1>] do_page_fault+0x456/0x835 [<ffffffff80028908>] do_filp_open+0x1c/0x38 [<ffffffff80013274>] poison_obj+0x26/0x2f [<ffffffff80032ff3>] cache_free_debugcheck+0x208/0x217 [<ffffffff80060e9d>] error_exit+0x0/0x96 Code: Bad RIP value. RIP [<ffffffff8405e2d9>] RSP <ffff810037c13ef8> CR2: ffffffff8405e2d9 <0>Kernel panic - not syncing: Fatal exception <1>Unable to handle kernel NULL pointer dereference at 0000000000000005 RIP: [<0000000000000005>] PGD 3fd90067 PUD 3d798067 PMD 0 Oops: 0010 [2] SMP last sysfs file: /class/net/lo/type CPU 1 Modules linked in: floppy i6300esb pcspkr sg shpchp ide_cd cdrom i2c_i801 i2c_core e1000 serio_raw dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Pid: 1106, comm: pam_console_app Tainted: G M 2.6.18-92.1.6.el5debug #1 RIP: 0010:[<0000000000000005>] [<0000000000000005>] RSP: 0000:ffff810037c13b90 EFLAGS: 00010002 RAX: ffff81003cb7e540 RBX: 00000000000000bf RCX: ffff810037c13e48 RDX: 0000000000000452 RSI: 00000000000000bf RDI: 00000000000000bf RBP: ffff810037c13bc0 R08: 0000000000000001 R09: 000000000000003f R10: ffff81003d5d4188 R11: ffffffff8016968b R12: 0000000000000005 R13: 00000000ffffff03 R14: 0000000000000010 R15: ffff81003cb7e540 FS: 00002b223bc063e0(0000) GS:ffff810037cdd300(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000005 CR3: 000000003c88d000 CR4: 00000000000006e0 Process pam_console_app (pid: 1106, threadinfo ffff81003cc20000, task ffff81003cb7e540) Stack: ffffffff8002426f ffffffff80299c38 ffffffff8405e2d9 ffff810037c13e48 ffffffff80060ccb ffff810037c13bc0 ffff81003d5d4188 ffffffff8016968b ffff81003d5d4188 000000000000003f 0000000000000001 00000000000000ff Call Trace: <IRQ> [<ffffffff8002426f>] smp_call_function_interrupt+0x66/0x84 [<ffffffff80060ccb>] call_function_interrupt+0x6b/0x70 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff80079800>] smp_send_stop+0x62/0x64 [<ffffffff800797e9>] smp_send_stop+0x4b/0x64 [<ffffffff80093b07>] panic+0x94/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff8006bc38>] bad_gs+0x26/0x2089 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff80150451>] __next_cpu+0x19/0x28 [<ffffffff8008de33>] find_busiest_group+0x20d/0x621 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff800a0a96>] __rcu_process_callbacks+0x140/0x1d5 [<ffffffff800a0b4e>] rcu_process_callbacks+0x23/0x43 [<ffffffff800971d2>] tasklet_action+0x6c/0xc3 [<ffffffff80012a0e>] __do_softirq+0x67/0xf3 [<ffffffff800613d0>] call_softirq+0x1c/0x28 [<ffffffff8007066a>] do_softirq+0x35/0xa0 [<ffffffff80060d3b>] apic_timer_interrupt+0x6b/0x70 <EOI> [<ffffffff8006a77f>] do_page_fault+0x424/0x835 [<ffffffff8000c021>] find_vma+0x23/0x5d [<ffffffff8006a7b1>] do_page_fault+0x456/0x835 [<ffffffff80028908>] do_filp_open+0x1c/0x38 [<ffffffff80013274>] poison_obj+0x26/0x2f [<ffffffff80032ff3>] cache_free_debugcheck+0x208/0x217 [<ffffffff80060e9d>] error_exit+0x0/0x96 Code: Bad RIP value. RIP [<0000000000000005>] RSP <ffff810037c13b90> CR2: 0000000000000005 <0>Kernel panic - not syncing: Fatal exception BUG: warning at kernel/panic.c:137/panic() (Tainted: G M ) Call Trace: <IRQ> [<ffffffff80093c52>] panic+0x1df/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff801aa752>] vt_console_print+0x213/0x229 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff8002426f>] smp_call_function_interrupt+0x66/0x84 [<ffffffff80060ccb>] call_function_interrupt+0x6b/0x70 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff80079800>] smp_send_stop+0x62/0x64 [<ffffffff800797e9>] smp_send_stop+0x4b/0x64 [<ffffffff80093b07>] panic+0x94/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff8006bc38>] bad_gs+0x26/0x2089 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff80150451>] __next_cpu+0x19/0x28 [<ffffffff8008de33>] find_busiest_group+0x20d/0x621 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff800a0a96>] __rcu_process_callbacks+0x140/0x1d5 [<ffffffff800a0b4e>] rcu_process_callbacks+0x23/0x43 [<ffffffff800971d2>] tasklet_action+0x6c/0xc3 [<ffffffff80012a0e>] __do_softirq+0x67/0xf3 [<ffffffff800613d0>] call_softirq+0x1c/0x28 [<ffffffff8007066a>] do_softirq+0x35/0xa0 [<ffffffff80060d3b>] apic_timer_interrupt+0x6b/0x70 <EOI> [<ffffffff8006a77f>] do_page_fault+0x424/0x835 [<ffffffff8000c021>] find_vma+0x23/0x5d [<ffffffff8006a7b1>] do_page_fault+0x456/0x835 [<ffffffff80028908>] do_filp_open+0x1c/0x38 [<ffffffff80013274>] poison_obj+0x26/0x2f [<ffffffff80032ff3>] cache_free_debugcheck+0x208/0x217 [<ffffffff80060e9d>] error_exit+0x0/0x96 BUG: warning at drivers/input/serio/i8042.c:846/i8042_panic_blink() (Tainted: G M ) Call Trace: <IRQ> [<ffffffff802015c8>] i8042_panic_blink+0x112/0x2a5 [<ffffffff80093bf8>] panic+0x185/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff801aa752>] vt_console_print+0x213/0x229 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff8002426f>] smp_call_function_interrupt+0x66/0x84 [<ffffffff80060ccb>] call_function_interrupt+0x6b/0x70 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff80079800>] smp_send_stop+0x62/0x64 [<ffffffff800797e9>] smp_send_stop+0x4b/0x64 [<ffffffff80093b07>] panic+0x94/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff8006bc38>] bad_gs+0x26/0x2089 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff80150451>] __next_cpu+0x19/0x28 [<ffffffff8008de33>] find_busiest_group+0x20d/0x621 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff800a0a96>] __rcu_process_callbacks+0x140/0x1d5 [<ffffffff800a0b4e>] rcu_process_callbacks+0x23/0x43 [<ffffffff800971d2>] tasklet_action+0x6c/0xc3 [<ffffffff80012a0e>] __do_softirq+0x67/0xf3 [<ffffffff800613d0>] call_softirq+0x1c/0x28 [<ffffffff8007066a>] do_softirq+0x35/0xa0 [<ffffffff80060d3b>] apic_timer_interrupt+0x6b/0x70 <EOI> [<ffffffff8006a77f>] do_page_fault+0x424/0x835 [<ffffffff8000c021>] find_vma+0x23/0x5d [<ffffffff8006a7b1>] do_page_fault+0x456/0x835 [<ffffffff80028908>] do_filp_open+0x1c/0x38 [<ffffffff80013274>] poison_obj+0x26/0x2f [<ffffffff80032ff3>] cache_free_debugcheck+0x208/0x217 [<ffffffff80060e9d>] error_exit+0x0/0x96 BUG: warning at drivers/input/serio/i8042.c:849/i8042_panic_blink() (Tainted: G M ) Call Trace: <IRQ> [<ffffffff802016b1>] i8042_panic_blink+0x1fb/0x2a5 [<ffffffff80093bf8>] panic+0x185/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff801aa752>] vt_console_print+0x213/0x229 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff8002426f>] smp_call_function_interrupt+0x66/0x84 [<ffffffff80060ccb>] call_function_interrupt+0x6b/0x70 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff80079800>] smp_send_stop+0x62/0x64 [<ffffffff800797e9>] smp_send_stop+0x4b/0x64 [<ffffffff80093b07>] panic+0x94/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff8006bc38>] bad_gs+0x26/0x2089 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff80150451>] __next_cpu+0x19/0x28 [<ffffffff8008de33>] find_busiest_group+0x20d/0x621 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff800a0a96>] __rcu_process_callbacks+0x140/0x1d5 [<ffffffff800a0b4e>] rcu_process_callbacks+0x23/0x43 [<ffffffff800971d2>] tasklet_action+0x6c/0xc3 [<ffffffff80012a0e>] __do_softirq+0x67/0xf3 [<ffffffff800613d0>] call_softirq+0x1c/0x28 [<ffffffff8007066a>] do_softirq+0x35/0xa0 [<ffffffff80060d3b>] apic_timer_interrupt+0x6b/0x70 <EOI> [<ffffffff8006a77f>] do_page_fault+0x424/0x835 [<ffffffff8000c021>] find_vma+0x23/0x5d [<ffffffff8006a7b1>] do_page_fault+0x456/0x835 [<ffffffff80028908>] do_filp_open+0x1c/0x38 [<ffffffff80013274>] poison_obj+0x26/0x2f [<ffffffff80032ff3>] cache_free_debugcheck+0x208/0x217 [<ffffffff80060e9d>] error_exit+0x0/0x96 BUG: warning at drivers/input/serio/i8042.c:851/i8042_panic_blink() (Tainted: G M ) Call Trace: <IRQ> [<ffffffff8020172e>] i8042_panic_blink+0x278/0x2a5 [<ffffffff80093bf8>] panic+0x185/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff801aa752>] vt_console_print+0x213/0x229 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff8002426f>] smp_call_function_interrupt+0x66/0x84 [<ffffffff80060ccb>] call_function_interrupt+0x6b/0x70 [<ffffffff8016968b>] vgacon_cursor+0x0/0x1a5 [<ffffffff80079800>] smp_send_stop+0x62/0x64 [<ffffffff800797e9>] smp_send_stop+0x4b/0x64 [<ffffffff80093b07>] panic+0x94/0x1f0 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff80017812>] release_console_sem+0x43/0x228 [<ffffffff8006bc38>] bad_gs+0x26/0x2089 [<ffffffff80068eed>] oops_end+0x64/0x66 [<ffffffff8006aac6>] do_page_fault+0x76b/0x835 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff80150451>] __next_cpu+0x19/0x28 [<ffffffff8008de33>] find_busiest_group+0x20d/0x621 [<ffffffff80060e9d>] error_exit+0x0/0x96 [<ffffffff800a0a96>] __rcu_process_callbacks+0x140/0x1d5 [<ffffffff800a0b4e>] rcu_process_callbacks+0x23/0x43 [<ffffffff800971d2>] tasklet_action+0x6c/0xc3 [<ffffffff80012a0e>] __do_softirq+0x67/0xf3 [<ffffffff800613d0>] call_softirq+0x1c/0x28 [<ffffffff8007066a>] do_softirq+0x35/0xa0 [<ffffffff80060d3b>] apic_timer_interrupt+0x6b/0x70 <EOI> [<ffffffff8006a77f>] do_page_fault+0x424/0x835 [<ffffffff8000c021>] find_vma+0x23/0x5d [<ffffffff8006a7b1>] do_page_fault+0x456/0x835 [<ffffffff80028908>] do_filp_open+0x1c/0x38 [<ffffffff80013274>] poison_obj+0x26/0x2f [<ffffffff80032ff3>] cache_free_debugcheck+0x208/0x217 [<ffffffff80060e9d>] error_exit+0x0/0x96 Machine is currently reserved and will be free hopefully next week. I tried to reproduce this on ibm-palmetto.rhts.bos.redhat.com host. I used cron to reboot machine every 5 minutes. I tried kernel-debug 2.6.18-92.1.10.el5 (for about 3 hours) and 2.6.18-92.1.6.el5 (for about 16 hours) and I had no luck. I suggest that this approach may lead to mentioned oops. Or am I wrong? I got a vmcore. Although backtrace slightly different from original reported, it still failed in /etc/rc.sysinit -> start_udev -> /sbin/udevsettle http://porkchop.devel.redhat.com/qa/qa/qcai/vmcores/vmcore-455691 + ^M/sbin/udevsettle <66<1>Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: ^M [<0000000000000028>]^M PGD 36a8b067 PUD 3618f067 PMD 0 ^M Oops: 0010 [1] SMP ^M last sysfs file: /class/usb_device/usbdev3.1/dev^M CPU 0 ^M Modules linked in: sg i2c_i801 shpchp i2c_core pcspkr i6300esb e1000 ide_cd cdrom serio_raw dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd^M Pid: 2078, comm: modprobe Tainted: G M 2.6.18-92.el5debug #1^M RIP: 0010:[<0000000000000028>] [<0000000000000028>]^M RSP: 0018:ffff81003458fa60 EFLAGS: 00010003^M RAX: 0000000000004208 RBX: ffff81003458fad8 RCX: 0000000000020000^M RDX: 0000000000004208 RSI: 0000000000020000 RDI: 0000000000000036^M RBP: ffffffff8029bd71 R08: 00000000ffffffff R09: 0000000000000020^M R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246^M R13: ffffffff8051e5a2 R14: 0000000000000029 R15: 0000000000000016^M FS: 00002b44650ab240(0000) GS:ffffffff8040c000(0000) knlGS:0000000000000000^M CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M CR2: 0000000000000028 CR3: 000000003d780000 CR4: 00000000000006e0^M Process modprobe (pid: 2078, threadinfo ffff81003458e000, task ffff810034fde1c0)^M Stack: 0000004488180cf2 ffffffff800cff31 ffff810037ffb178 0000000000000286^M ffffffff8000aa62 0000000000000000 000000440000e000 ffff810000353834^M 000200d000000000 0000000000000015 0000000000000015 00000000000000ef^M Call Trace:^M [<ffffffff800cff31>] zone_statistics+0x3e/0x6d^M [<ffffffff8000aa62>] get_page_from_freelist+0x232/0x450^M [<ffffffff800943b5>] printk+0x52/0xbd^M [<ffffffff8007bc14>] assign_irq_vector+0xb9/0xbf^M [<ffffffff8007c0da>] gsi_irq_sharing+0xf6/0xfd^M [<ffffffff8007d1e2>] physflat_cpu_mask_to_apicid+0x0/0x27^M [<ffffffff8007c940>] io_apic_set_pci_routing+0xff/0x261^M [<ffffffff8007cd8c>] mp_register_gsi+0x154/0x15e^M [<ffffffff80078ef2>] acpi_register_gsi+0x36/0x51^M [<ffffffff8019169d>] acpi_pci_irq_enable+0x13b/0x1e5^M [<ffffffff8015a83e>] pci_enable_device_bars+0x27/0x34^M [<ffffffff8015a869>] pci_enable_device+0x1e/0x3e^M [<ffffffff88171f3b>] :e1000:e1000_probe+0x1d/0xca3^M [<ffffffff8015c953>] pci_device_probe+0x100/0x180^M [<ffffffff801bd08e>] driver_probe_device+0x52/0xaa^M [<ffffffff801bd1ef>] __driver_attach+0x86/0xd4^M [<ffffffff801bd169>] __driver_attach+0x0/0xd4^M [<ffffffff801bca93>] bus_for_each_dev+0x43/0x6e^M [<ffffffff801bc6d6>] bus_add_driver+0x81/0x133^M [<ffffffff8015cb43>] __pci_register_driver+0x5b/0x7c^M [<ffffffff88194045>] :e1000:e1000_init_module+0x45/0x7b^M [<ffffffff800ade86>] sys_init_module+0xb1/0x1ee^M [<ffffffff80060116>] system_call+0x7e/0x83^M ^M ^M Code: Bad RIP value.^M RIP [<0000000000000028>]^M RSP <ffff81003458fa60>^M Looks like I am hitting another similar crash here, http://porkchop.devel.redhat.com/qa/qa/qcai/vmcores/vmcore-455691-2 + /sbin/udevsettle^M <66<1>Unable to handle kernel NULL pointer dereference at 0000000000000080 RIP: ^M [<0000000000000080>]^M PGD 3c95b067 PUD 3c947067 PMD 0 ^M Oops: 0010 [1] SMP ^M last sysfs file: /class/misc/mcelog/dev^M CPU 0 ^M Modules linked in: i2c_i801 ide_cd i6300esb shpchp e1000 cdrom i2c_core serio_raw dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd^M Pid: 666, comm: modprobe Tainted: G M 2.6.18-92.el5debug #1^M RIP: 0010:[<0000000000000080>] [<0000000000000080>]^M RSP: 0000:ffff81003c953c30 EFLAGS: 00010007^M RAX: 000000000000416c RBX: ffff81003c953ca8 RCX: 0000000000020000^M RDX: 000000000000416c RSI: 0000000000020000 RDI: 0000000000000036^M RBP: ffffffff8817ac80 R08: 00000000ffffffff R09: ffffffff80094105^M R10: ffffffff80094105 R11: ffff81003ed58a58 R12: 0000000000000246^M R13: ffffffff8051e5a2 R14: 0000000000000038 R15: ffff810037cfef60^M FS: 00002aad1cffe240(0000) GS:ffffffff8040c000(0000) knlGS:0000000000000000^M CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M CR2: 0000000000000080 CR3: 000000003c971000 CR4: 00000000000006e0^M Process modprobe (pid: 666, threadinfo ffff81003c952000, task ffff81003c950240)^M Stack: ffffffff804e3f20 0000000000000001 0000000000000002 ffff81003c953fd8^M ffff81003c952000 ffff81003c950240 ffffffff804e3f98 000000000000000d^M ffffffff800606a8 ffff81003ed58a58 ffff81003ed58a58 ffff81003ed58000^M Call Trace:^M [<ffffffff800606a8>] restore_args+0x0/0x30^M [<ffffffff800943b5>] printk+0x52/0xbd^M [<ffffffff8000cfc8>] __delay+0x8/0x10^M [<ffffffff88170819>] :e1000:e1000_release_eeprom+0x5f/0x80^M [<ffffffff88170d67>] :e1000:e1000_read_eeprom+0x2af/0x2c7^M [<ffffffff8816d9bb>] :e1000:e1000_probe+0xa9d/0xca3^M [<ffffffff8015c953>] pci_device_probe+0x100/0x180^M [<ffffffff801bd08e>] driver_probe_device+0x52/0xaa^M [<ffffffff801bd1ef>] __driver_attach+0x86/0xd4^M [<ffffffff801bd169>] __driver_attach+0x0/0xd4^M [<ffffffff801bca93>] bus_for_each_dev+0x43/0x6e^M [<ffffffff801bc6d6>] bus_add_driver+0x81/0x133^M [<ffffffff8015cb43>] __pci_register_driver+0x5b/0x7c^M [<ffffffff88153045>] :e1000:e1000_init_module+0x45/0x7b^M [<ffffffff800ade86>] sys_init_module+0xb1/0x1ee^M [<ffffffff80060116>] system_call+0x7e/0x83^M Code: Bad RIP value.^M RIP [<0000000000000080>]^M RSP <ffff81003c953c30>^M Looks like random crash to me. Let me know if you need vmcore of it. Scanning logical volumes^M Reading all physical volumes. This may take a while...^M Found volume group "VolGroup00" using metadata type lvm2^M Activating logical volumes^M Unable to handle kernel paging request at ffffffffbb00863e RIP: ^M [<ffffffffbb00863e>]^M PGD 1003067 PUD 1005063 PMD 0 ^M Oops: 0010 [1] SMP ^M last sysfs file: /block/ram0/dev^M CPU 0 ^M Modules linked in: dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd ata_piix libata sd_mod scsi_mod^M Pid: 926, comm: lvm Tainted: G M 2.6.18-92.1.6.el5debug #1^M RIP: 0010:[<ffffffffbb00863e>] [<ffffffffbb00863e>]^M RSP: 0018:ffff81000816fd20 EFLAGS: 00010246^M RAX: 0000000000000000 RBX: ffff8100081977a8 RCX: ffff81000816fddc^M RDX: ffffffff800e88e2 RSI: ffff81000816fddc RDI: ffff8100081977a8^M RBP: ffff8100081977a8 R08: 0000000000000002 R09: 0000000000000000^M R10: ffffffff8005c61e R11: ffff8100081226c0 R12: ffff81000816fddc^M R13: ffffffff800e88e2 R14: ffff81000217aa98 R15: ffff8100021c32f8^M FS: 00000000088e4880(0063) GS:ffffffff8040e000(0000) knlGS:0000000000000000^M CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b^M CR2: ffffffffbb00863e CR3: 000000000871d000 CR4: 00000000000006e0^M Process lvm (pid: 926, threadinfo ffff81000816e000, task ffff8100081226c0)^M Stack: ffffffff45131ceb ffffffff8005c61e ffff81000816fddc 0000000000000001^M ffffffff800e88e2 ffff81000217aa98 ffff8100021c32f8 ffffffff800e88e2^M ffffffff8005c62f ffff8100081a16c0 0000000000000000 ffff81000217aa98^M Call Trace:^M [<ffffffff8005c61e>] ifind+0x23/0xab^M [<ffffffff800e88e2>] bdev_test+0x0/0xf^M [<ffffffff800e88e2>] bdev_test+0x0/0xf^M [<ffffffff8005c62f>] ifind+0x34/0xab^M [<ffffffff800eed2a>] iget5_locked+0x6c/0x19b^M [<ffffffff800e784a>] bdev_set+0x0/0xb^M [<ffffffff800e80fd>] bdget+0x40/0x112^M [<ffffffff800e82de>] do_open+0x10f/0x313^M [<ffffffff800e873c>] blkdev_open+0x0/0x4f^M [<ffffffff800e873c>] blkdev_open+0x0/0x4f^M [<ffffffff800e875f>] blkdev_open+0x23/0x4f^M [<ffffffff8001f484>] __dentry_open+0xd9/0x1e2^M [<ffffffff80028916>] do_filp_open+0x2a/0x38^M [<ffffffff800683ef>] _spin_unlock+0x17/0x20^M [<ffffffff80016702>] get_unused_fd+0xf9/0x107^M [<ffffffff8001a65f>] do_sys_open+0x44/0xbe^M [<ffffffff80060116>] system_call+0x7e/0x83^M ^M ^M Code: Bad RIP value.^M RIP [<ffffffffbb00863e>]^M RSP <ffff81000816fd20>^M CR2: ffffffffbb00863e^M <0>Kernel panic - not syncing: Fatal exception^M Please disregard Comment #7. It is irrelevant here. It is possible to reproduce the original crash stack traces on 92.1.6.el5 Kernel, but seems there is no way to save a vmcore, because Kdump Kernel crashed as well, Unable to handle kernel paging request at ffffffff8405e2d9 RIP: [<ffffffff8405e2d9>] PGD 203067 PUD 205063 PMD 0 Oops: 0010 [1] SMP last sysfs file: /devices/pnp0/00:09/id CPU 1 Modules linked in: dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Pid: 622, comm: modprobe Tainted: G M 2.6.18-92.1.6.el5debug #1 RIP: 0010:[<ffffffff8405e2d9>] [<ffffffff8405e2d9>] RSP: 0000:ffff810037c13ef8 EFLAGS: 00010286 RAX: ffff81003e54ed50 RBX: 0000000000000003 RCX: 0000000000000000 RDX: 000000000000000c RSI: ffff810009000038 RDI: ffff81003e54ed50 RBP: ffff81000a600d20 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000158 R12: ffff81003d36beb0 R13: 0000000000000003 R14: 0000000000000007 R15: 00007fff1f53b8b7 FS: 00002af78b57edc0(0000) GS:ffff81003ff60300(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffff8405e2d9 CR3: 000000003cd3f000 CR4: 00000000000006e0 Process modprobe (pid: 622, threadinfo ffff81003fc18000, task ffff81003cd76580) Stack: ffffffff800a0a96 ffff810037cbc000 ffff81000a600e20 0000000000000000 000000000000000a 0000000000000001 ffffffff800a0b4e ffffffff80425fd0 ffffffff800971d2 ffffffff80425f90 0000000000000001 ffffffff80425fd0 Call Trace: <IRQ> [<ffffffff800a0a96>] __rcu_process_callbacks+0x140/0x1d5 [<ffffffff800a0b4e>] rcu_process_callbacks+0x23/0x43 [<ffffffff800971d2>] tasklet_action+0x6c/0xc3 [<ffffffff80012a0e>] __do_softirq+0x67/0xf3 [<ffffffff800613d0>] call_softirq+0x1c/0x28 [<ffffffff8007066a>] do_softirq+0x35/0xa0 [<ffffffff80060d3b>] apic_timer_interrupt+0x6b/0x70 <EOI> Code: Bad RIP value. RIP [<ffffffff8405e2d9>] RSP <ffff810037c13ef8> Linux version 2.6.18-92.1.6.el5debug (brewbuilder.redhat.com) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Fri Jun 20 03:07:33 EDT 2008 Command line: ro root=/dev/VolGroup00/LogVol00 console=tty0 console=ttyS0,115200 irqpoll maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=9476K@16384K memmap=120940K@26500K elfcorehdr=147440K memmap=24K#1048000K memmap=40K#1048024K BIOS-provided physical RAM map: BIOS-e820: 0000000000000100 - 000000000009b800 (usable) BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved) BIOS-e820: 0000000000100000 - 000000003ff70000 (usable) BIOS-e820: 000000003ff70000 - 000000003ff76000 (ACPI data) BIOS-e820: 000000003ff76000 - 000000003ff80000 (ACPI NVS) BIOS-e820: 000000003ff80000 - 0000000040000000 (reserved) BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ff800000 - 00000000ffc00000 (reserved) BIOS-e820: 00000000fffffc00 - 0000000100000000 (reserved) user-defined physical RAM map: user: 0000000000000000 - 00000000000a0000 (usable) user: 0000000001000000 - 0000000001941000 (usable) user: 00000000019e1000 - 0000000008ffc000 (usable) user: 000000003ff70000 - 000000003ff76000 (ACPI data) user: 000000003ff76000 - 000000003ff80000 (ACPI data) DMI present. No NUMA configuration found Faking a node at 0000000000000000-0000000008ffc000 Bootmem setup node 0 0000000000000000-0000000008ffc000 Memory for crash kernel (0x0 to 0x0) notwithin permissible range disabling kdump ACPI: PM-Timer IO Port: 0x1008 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 15:4 APIC version 20 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) Processor #1 15:4 APIC version 20 ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23 ACPI: IOAPIC (id[0x03] address[0xfec10000] gsi_base[24]) IOAPIC[1]: apic_id 3, version 32, address 0xfec10000, GSI 24-47 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Setting APIC routing to physical flat Using ACPI (MADT) for SMP configuration information Nosave address range: 00000000000a0000 - 0000000001000000 Nosave address range: 0000000001941000 - 00000000019e1000 Allocating PCI resources starting at 40000000 (gap: 3ff80000:c0080000) SMP: Allowing 2 CPUs, 0 hotplug CPUs Built 1 zonelists. Total pages: 31893 Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=tty0 console=ttyS0,115200 irqpoll maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=9476K@16384K memmap=120940K@26500K elfcorehdr=147440K memmap=24K#1048000K memmap=40K#1048024K Misrouted IRQ fixup and polling support enabled This may significantly impact system performance Initializing CPU#0 PID hash table entries: 512 (order: 9, 4096 bytes) Console: colour VGA+ 80x25 Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar ... MAX_LOCKDEP_SUBCLASSES: 8 ... MAX_LOCK_DEPTH: 30 ... MAX_LOCKDEP_KEYS: 2048 ... CLASSHASH_SIZE: 1024 ... MAX_LOCKDEP_ENTRIES: 8192 ... MAX_LOCKDEP_CHAINS: 8192 ... CHAINHASH_SIZE: 4096 memory used by lock dependency info: 1392 kB per task-struct memory footprint: 2160 bytes Dentry cache hash table entries: 16384 (order: 5, 131072 bytes) Inode-cache hash table entries: 8192 (order: 4, 65536 bytes) Checking aperture... Memory: 110836k/147440k available (2525k kernel code, 20220k reserved, 1623k data, 196k init) Calibrating delay using timer specific routine.. 2904.44 BogoMIPS (lpj=1452220) Security Framework v1.0.0 initialized SELinux: Initializing. selinux_register_security: Registering secondary module capability Capability LSM initialized as secondary Mount-cache hash table entries: 256 CPU: Trace cache: 12K uops, L1 D cache: 16K CPU: L2 cache: 1024K using mwait in idle threads. CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 lockdep: not fixing up alternatives. ACPI: Core revision 20060707 Using local APIC timer interrupts. result 6251239 Detected 6.251 MHz APIC timer. Brought up 1 CPUs testing NMI watchdog ... OK. Disabling vsyscall due to use of PM timer time.c: Using 3.579545 MHz WALL PM GTOD PM timer. time.c: Detected 1400.285 MHz processor. checking if image is initramfs... it is Freeing initrd memory: 3002k freed NET: Registered protocol family 16 No dock devices found. ACPI: bus type pci registered PCI: Using configuration type 1 ACPI: Interpreter enabled ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) PCI quirk: region 1000-107f claimed by ICH4 ACPI/GPIO/TCO PCI quirk: region 1180-11bf claimed by ICH4 GPIO PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.2 PCI: Transparent bridge - 0000:00:1e.0 ACPI: PCI Interrupt Link [LNKA] (IRQs *3 10 11 14 15), disabled. ACPI: PCI Interrupt Link [LNKB] (IRQs 3 10 11 14 15) *5, disabled. ACPI: PCI Interrupt Link [LNKC] (IRQs 3 10 11 14 15) *5, disabled. ACPI: PCI Interrupt Link [LNKD] (IRQs 3 10 *11 14 15), disabled. ACPI: PCI Interrupt Link [LNKE] (IRQs 3 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKF] (IRQs 3 10 11 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKG] (IRQs 3 10 *11 14 15), disabled. ACPI: PCI Interrupt Link [LNKH] (IRQs 3 10 11 14 15) *7, disabled. Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init pnp: PnP ACPI: found 12 devices usbcore: registered new driver usbfs usbcore: registered new driver hub PCI: Using ACPI for IRQ routing PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default PCI-GART: No AMD northbridge found. PCI: Bridge: 0000:00:03.0 IO window: 2000-2fff MEM window: d0100000-d01fffff PREFETCH window: 50000000-500fffff PCI: Bridge: 0000:00:1c.0 IO window: disabled. MEM window: disabled. PREFETCH window: disabled. PCI: Bridge: 0000:00:1e.0 IO window: disabled. MEM window: d0200000-d02fffff PREFETCH window: e0000000-efffffff NET: Registered protocol family 2 IP route cache hash table entries: 1024 (order: 1, 8192 bytes) TCP established hash table entries: 4096 (order: 6, 262144 bytes) TCP bind hash table entries: 2048 (order: 5, 131072 bytes) TCP: Hash tables configured (established 4096 bind 2048) TCP reno registered Simple Boot Flag at 0x38 set to 0x1 audit: initializing netlink socket (disabled) audit(1217838876.420:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) Initializing Cryptographic API ksign: Installing public key data Loading keyring - Added public key 5C9DD0FD4CC9C295 - User ID: Red Hat, Inc. (Kernel Module GPG key) io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) pci_hotplug: PCI Hot Plug PCI Core version: 0.5 ACPI: CPU1 (power states: C1[C1] C3[C3]) ACPI: Processor [CPU0] (supports 8 throttling states) ACPI: CPU0 (power states: C1[C1] C3[C3]) ACPI: Processor [CPU1] (supports 8 throttling states) Real Time Clock Driver v1.12ac Non-volatile memory driver v1.2 Linux agpgart interface v0.101 (c) Dave Jones agpgart: Detected an Intel i875 Chipset. agpgart: AGP aperture is 32M @ 0xd2000000 Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A 00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A RAMDISK driver initialized: 16 RAM disks of 16384K size 4096 blocksize Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx hda: CD-224E, ATAPI CD/DVD-ROM drive ide1: I/O resource 0x170-0x177 not free. ide1: ports already in use, skipping probe ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 ide-floppy driver 0.99.newide usbcore: registered new driver hiddev usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.6:USB HID core driver PNP: PS/2 Controller [PNP0303:KBC0,PNP0f13:MSE0] at 0x60,0x64 irq 1,12 serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 md: bitmap version 4.39 TCP bic registered Initializing IPsec netlink socket NET: Registered protocol family 1 NET: Registered protocol family 17 ACPI: (supports S0 S1 S4 S5) Freeing unused kernel memory: 196k freed Write protecting the kernel read-only data: 496k Mounting proc filesystem Mounting sysfs filesystem Creating /dev Creating initial device nodes Loading scsi_mod.ko module SCSI subsystem initialized Loading sd_mod.ko module Loading libata.ko module Loading ata_piix.ko module ata_piix 0000:00:1f.2: MAP [ IDE IDE P0 P1 ] GSI 16 sharing vector 0xA9 and IRQ 16 ACPI: PCI Interrupt 0000:00:1f.2[A] -> GSI 18 (level, low) -> IRQ 169 ata_piix 0000:00:1f.2: 0x1F0 IDE port busy scsi0 : ata_piix scsi1 : ata_piix ata1: DUMMY ata2: SATA max UDMA/133 bmdma 0x1478 irq 15 ata2.00: ATA-6: WDC WD400JD-75HKA1, 14.03G14, max UDMA/133 ata2.00: 78125000 sectors, multi 16: LBA ata2.00: configured for UDMA/133 Vendor: ATA Model: WDC WD400JD-75HK Rev: 14.0 Type: Direct-Access ANSI SCSI revision: 05 SCSI device sda: 78125000 512-byte hdwr sectors (40000 MB) sda: Write Protect is off SCSI device sda: drive cache: write back SCSI device sda: 78125000 512-byte hdwr sectors (40000 MB) sda: Write Protect is off SCSI device sda: drive cache: write back sda: sda1 sda2 sd 1:0:0:0: Attached scsi disk sda Loading jbd.ko module Loading ext3.ko module Loading dm-mod.ko module device-mapper: uevent: version 1.0.3 device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel Loading dm-mirror.ko module Loading dm-zero.ko module Loading dm-snapshot.ko module Creating Block Devices Creating block device ram0 Creating block device ram1 Creating block device ram10 Creating block device ram11 Creating block device ram12 Creating block device ram13 Creating block device ram14 Creating block device ram15 Creating block device ram2 Creating block device ram3 Creating block device ram4 Creating block device ram5 Creating block device ram6 Creating block device ram7 Creating block device ram8 Creating block device ram9 Creating block device sda Making device-mapper control node Scanning logical volumes Reading all physical volumes. This may take a while... Unable to handle kernel paging request at ffffffffbaf0863f RIP: [<ffffffffbaf0863f>] PGD 1003067 PUD 1005063 PMD 0 Oops: 0010 [1] SMP last sysfs file: /block/ram0/dev CPU 0 Modules linked in: dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd ata_piix libata sd_mod scsi_mod Pid: 924, comm: lvm Tainted: G M 2.6.18-92.1.6.el5debug #1 RIP: 0010:[<ffffffffbaf0863f>] [<ffffffffbaf0863f>] RSP: 0018:ffff810008123d80 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff810008198828 RCX: ffff810008123e0c RDX: ffffffff800e88e2 RSI: ffff810008123e0c RDI: ffff810008198828 RBP: ffff810008198828 R08: 0000000000000002 R09: 0000000000000000 R10: ffffffff800eed56 R11: ffff810008dd6580 R12: ffff810008123e0c R13: ffffffff800e88e2 R14: ffff81000217cf00 R15: ffff8100021c32f8 FS: 00000000163cc880(0063) GS:ffffffff8040e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffbaf0863f CR3: 0000000008728000 CR4: 00000000000006e0 Process lvm (pid: 924, threadinfo ffff810008122000, task ffff810008dd6580) Stack: ffffffff45231cea ffffffff800eed56 ffff8100081b3328 0000000000000000 ffff81000217cf00 ffff8100021c32f8 ffff810008123e0c ffffffff800e88e2 ffffffff800eed67 ffffffff800e784a 0000000000000000 0000000000000000 Call Trace: [<ffffffff800eed56>] iget5_locked+0x98/0x19b [<ffffffff800e88e2>] bdev_test+0x0/0xf [<ffffffff800eed67>] iget5_locked+0xa9/0x19b [<ffffffff800e784a>] bdev_set+0x0/0xb [<ffffffff800e873c>] blkdev_open+0x0/0x4f [<ffffffff800e80fd>] bdget+0x40/0x112 [<ffffffff800e864b>] bd_acquire+0x42/0xaa [<ffffffff800e874f>] blkdev_open+0x13/0x4f [<ffffffff8001f484>] __dentry_open+0xd9/0x1e2 [<ffffffff80028916>] do_filp_open+0x2a/0x38 [<ffffffff800683ef>] _spin_unlock+0x17/0x20 [<ffffffff80016702>] get_unused_fd+0xf9/0x107 [<ffffffff8001a65f>] do_sys_open+0x44/0xbe [<ffffffff80060116>] system_call+0x7e/0x83 Code: Bad RIP value. RIP [<ffffffffbaf0863f>] RSP <ffff810008123d80> CR2: ffffffffbaf0863f <0>Kernel panic - not syncing: Fatal exception So, both times kernel was crashed with 'modprobe e1000'. Upstream commit 3c34ac36ac1084e571ef9b6fb1d6a5b10ccc1fd0 looks very promising for me with regard to comment 2. With qcai we figured out that the best (most likely) way to reproduce this issue is using kernel-debug-2.6.18-92.1.6.el5.x86_64 on palmetto machine. To trigger the oops you must cold-reset the machine via RHTS web. Following boot has about 30% chance to oops. So I used 2.6.18-92.1.6.el5 and patched it with upstream patch from comment #10. no change, still gotting oopses. So can you bring up the kernel with out loading the e1000 driver and then load it manually after the system is up? Once the system is up without the driver loaded can the driver be loaded and unloaded multiple times? It seems very odd that only this one configuration is broke. So this only happens with the debug kernel and seems most common with kernel-debug-2.6.18-92.1.6.el5.x86_64? It would be interesting to blacklist the e1000 module and see if that changes any behavior, but it seems unlikely. You could also consider building the kernel with some of these options disabled if we think it is likely to be related to any options added for debug kernels: CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_RWSEMS=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_DEBUG_VM=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_SPINLOCK_SLEEP=y CONFIG_LOCK_STAT=y (In reply to comment #14) > So this only happens with the debug kernel and seems most common with > kernel-debug-2.6.18-92.1.6.el5.x86_64? Exactly. > > It would be interesting to blacklist the e1000 module and see if that changes > any behavior, but it seems unlikely. Same issues still appear when e1000 is blacklisted. So it ruled out e1000. Thanks guys and feel free to unCC yourself. > > You could also consider building the kernel with some of these options disabled > if we think it is likely to be related to any options added for debug kernels: > > CONFIG_DEBUG_SLAB=y > CONFIG_DEBUG_MUTEXES=y > CONFIG_DEBUG_RT_MUTEXES=y > CONFIG_DEBUG_RWSEMS=y > CONFIG_DEBUG_LOCK_ALLOC=y > CONFIG_PROVE_LOCKING=y > CONFIG_DEBUG_VM=y > CONFIG_DEBUG_SPINLOCK=y > CONFIG_DEBUG_SPINLOCK_SLEEP=y > CONFIG_LOCK_STAT=y I'm going to try this. When compiling kernel on palmetto, this (and similar) appeared: CC init/initramfs.o init/initramfs.c: In function ‘unpack_to_rootfs’: init/initramfs.c:441: internal compiler error: Segmentation fault Please submit a full bug report, with preprocessed source if appropriate. See <URL:http://bugzilla.redhat.com/bugzilla> for instructions. The bug is not reproducible, so it is likely a hardware or OS problem. make[1]: *** [init/initramfs.o] Error 1 make: *** [init] Error 2 Now I'm unable to compile kernel on this machine at all. Using original 2.6.18-92.el5. another try to compile kernel caused this: Red Hat Enterprise Linux Server release 5.2 (Tikanga) Kernel 2.6.18-92.el5 on an x86_64 ibm-palmetto.rhts.bos.redhat.com login: HARDWARE ERROR CPU 1: Machine Check Exception: 6 Bank 3: fe00000000010100 RIP 00:<00000000004b423b> TSC 7bd3ebdba2 ADDR 3a8 MISC 1d400c0802a80 This is not a software problem! Run through mcelog --ascii to decode and contact your hardware vendor HARDWARE ERROR CPU 0: Machine Check Exception: 6 Bank 3: fe00000000010100 RIP 00:<ffffffff80056c9a> TSC 7bd3ebde58 ADDR 3a8 MISC 1d400c0802a80 This is not a software problem! Run through mcelog --ascii to decode and contact your hardware vendor Kernel panic - not syncing: Machine check I also noticed that palmetto is according to bios info table running 3.4GHz P4 but sometimes this info says 1.4GHz or even 6GHz... |