Description of problem:
64-bit RHEL 4.7 won't boot if LSI mirror is resyncing

Version-Release number of selected component (if applicable):
mkinitrd-4.2.1.13-1
(or possibly: kernel-smp-2.6.9-78.0.1.EL.x86_64)
(and maybe: MPTBIOS-IME-5.10.02.04)

How reproducible:
Try to boot a 64-bit RHEL 4.7 server while its LSI mirror is resyncing.

Steps to Reproduce:
1. Start an LSI mirror resynchronization (e.g. via the "Synchronize Whole Mirror" link in the LSI BIOS utility)
2. Try to boot Redhat

Actual results:
Kernel crash (see output below).

Expected results:
Successful boot.

Additional info:
On 64-bit servers with two disks managed by an LSI RAID controller, configured as a single mirrored volume, RHEL 4.7 won't boot if the mirror is resynchronizing--there's an OOPS right after nash starts (full output below). I believe that we saw this on RHEL 4.6 as well, though I can't be certain. This is a fairly serious issue, since it effectively disables the server until the array is finished resyncing.

We've only been able to reproduce this on 64-bit servers so far (in particular, two Sun v40z servers and an IBM HS20-8843 blade). When I tried to reproduce it on a 32-bit blade server (an IBM HS20-8678), the blade was able to boot with no problems even when the disk mirror was resyncing.

Also, we tried booting a Sun v40z with just one disk and then hot-inserting the second disk, and the system booted fine and continued running fine after the disk was inserted and the array started resyncing. So it's not the case that Redhat has a problem running while an LSI RAID volume is resyncing in general--it's just during the boot sequence that it causes a problem.

Here's the output of the kernel crash (there's no netdump output or any other output, since the netdump service hasn't started when this crash occurs):

GRUB loading, please wait...
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Press any key to continue.
Red Hat nash version 4.2.1.13 starting
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
<ffffffffa0034d80>{:mptbase:mpt_base_reply+1624}
PML4 4000ca067 PGD 4000ce067 PMD 0
Oops: 0000 [1] SMP
CPU 7
Modules linked in: mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-78.0.1.ELsmp
RIP: 0010:[<ffffffffa0034d80>] <ffffffffa0034d80>{:mptbase:mpt_base_reply+1624}
RSP: 0000:00000101f8f9fe38 EFLAGS: 00010282
RAX: 00000000ffffffff RBX: 0000000000000001 RCX: 0000000000000246
RDX: 0000000000000000 RSI: 00000104000e0008 RDI: ffffffff803f64c0
RBP: 0000000000000003 R08: 00000000fffffffb R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 00000100fbe806fc
R13: 00000000000000ff R14: 00000104000e0000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffffffff8050d600(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000007f8fbe000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo 00000105f8f08000, task 00000100fbf55030)
Stack: 0000000000000000 000000000b000001 0000000100032853 00000100fbe806e0
       00000100fbe82800 0000000000000007 7461726765746e49 3a64696152206465
       20656d756c6f5620 4320737574617453
Call Trace:<IRQ> <ffffffffa002d638>{:mptbase:mpt_interrupt+1211}
       <ffffffff80112ff2>{handle_IRQ_event+41}
       <ffffffff8011326c>{do_IRQ+197}
       <ffffffff801108bf>{ret_from_intr+0}
       <EOI>
       <ffffffff8010e789>{default_idle+0}
       <ffffffff8010e7a9>{default_idle+32}
       <ffffffff8010e81c>{cpu_idle+26}

Code: 0f b7 42 08 77 05 80 cc 01 eb 03 80 e4 fe 41 f6 c7 04 66 89
RIP <ffffffffa0034d80>{:mptbase:mpt_base_reply+1624} RSP <00000101f8f9fe38>
CR2: 0000000000000008
<0>Kernel panic - not syncing: Oops

Badness in panic at kernel/panic.c:118
Call Trace:<IRQ> <ffffffff8013871e>{panic+527}
       <ffffffff80110c81>{apic_timer_interrupt+133}
       <ffffffff80111b7c>{oops_end+38}
       <ffffffff80111b97>{oops_end+65}
       <ffffffff80124aed>{do_page_fault+1125}
       <ffffffff80138f0e>{release_console_sem+369}
       <ffffffff8013913c>{vprintk+498}
       <ffffffff80110e1d>{error_exit+0}
       <ffffffffa0034d80>{:mptbase:mpt_base_reply+1624}
       <ffffffffa0034d2c>{:mptbase:mpt_base_reply+1540}
       <ffffffffa002d638>{:mptbase:mpt_interrupt+1211}
       <ffffffff80112ff2>{handle_IRQ_event+41}
       <ffffffff8011326c>{do_IRQ+197}
       <ffffffff801108bf>{ret_from_intr+0}
       <EOI>
       <ffffffff8010e789>{default_idle+0}
       <ffffffff8010e7a9>{default_idle+32}
       <ffffffff8010e81c>{cpu_idle+26}

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:987
Call Trace:<IRQ> <ffffffff802481d3>{i8042_panic_blink+238}
       <ffffffff801386cc>{panic+445}
       <ffffffff80110c81>{apic_timer_interrupt+133}
       <ffffffff80111b7c>{oops_end+38}
       <ffffffff80111b97>{oops_end+65}
       <ffffffff80124aed>{do_page_fault+1125}
       <ffffffff80138f0e>{release_console_sem+369}
       <ffffffff8013913c>{vprintk+498}
       <ffffffff80110e1d>{error_exit+0}
       <ffffffffa0034d80>{:mptbase:mpt_base_reply+1624}
       <ffffffffa0034d2c>{:mptbase:mpt_base_reply+1540}
       <ffffffffa002d638>{:mptbase:mpt_interrupt+1211}
       <ffffffff80112ff2>{handle_IRQ_event+41}
       <ffffffff8011326c>{do_IRQ+197}
       <ffffffff801108bf>{ret_from_intr+0}
       <EOI>
       <ffffffff8010e789>{default_idle+0}
       <ffffffff8010e7a9>{default_idle+32}
       <ffffffff8010e81c>{cpu_idle+26}

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:990
Call Trace:<IRQ> <ffffffff80248265>{i8042_panic_blink+384}
       <ffffffff801386cc>{panic+445}
       <ffffffff80110c81>{apic_timer_interrupt+133}
       <ffffffff80111b7c>{oops_end+38}
       <ffffffff80111b97>{oops_end+65}
       <ffffffff80124aed>{do_page_fault+1125}
       <ffffffff80138f0e>{release_console_sem+369}
       <ffffffff8013913c>{vprintk+498}
       <ffffffff80110e1d>{error_exit+0}
       <ffffffffa0034d80>{:mptbase:mpt_base_reply+1624}
       <ffffffffa0034d2c>{:mptbase:mpt_base_reply+1540}
       <ffffffffa002d638>{:mptbase:mpt_interrupt+1211}
       <ffffffff80112ff2>{handle_IRQ_event+41}
       <ffffffff8011326c>{do_IRQ+197}
       <ffffffff801108bf>{ret_from_intr+0}
       <EOI>
       <ffffffff8010e789>{default_idle+0}
       <ffffffff8010e7a9>{default_idle+32}
       <ffffffff8010e81c>{cpu_idle+26}

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992
Call Trace:<IRQ> <ffffffff802482ca>{i8042_panic_blink+485}
       <ffffffff801386cc>{panic+445}
       <ffffffff80110c81>{apic_timer_interrupt+133}
       <ffffffff80111b7c>{oops_end+38}
       <ffffffff80111b97>{oops_end+65}
       <ffffffff80124aed>{do_page_fault+1125}
       <ffffffff80138f0e>{release_console_sem+369}
       <ffffffff8013913c>{vprintk+498}
       <ffffffff80110e1d>{error_exit+0}
       <ffffffffa0034d80>{:mptbase:mpt_base_reply+1624}
       <ffffffffa0034d2c>{:mptbase:mpt_base_reply+1540}
       <ffffffffa002d638>{:mptbase:mpt_interrupt+1211}
       <ffffffff80112ff2>{handle_IRQ_event+41}
       <ffffffff8011326c>{do_IRQ+197}
       <ffffffff801108bf>{ret_from_intr+0}
       <EOI>
       <ffffffff8010e789>{default_idle+0}
       <ffffffff8010e7a9>{default_idle+32}
       <ffffffff8010e81c>{cpu_idle+26}
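
A note on reading the oops above: the last four words on the "Stack:" line look like ASCII text rather than addresses. The sketch below decodes them, assuming they are 64-bit little-endian values (which is how x86_64 stores them in memory). The names STACK_WORDS and decode_stack_words are made up for this illustration and are not part of any tool. The decoded text may hint that the driver was handling a RAID volume status message when it faulted, but that is a guess from the stack contents, not something confirmed here.

```python
import struct

# The last four 64-bit words from the "Stack:" line of the oops,
# copied verbatim from the console output.
STACK_WORDS = [
    "7461726765746e49",
    "3a64696152206465",
    "20656d756c6f5620",
    "4320737574617453",
]


def decode_stack_words(words):
    """Pack each hex word as a little-endian 64-bit integer and
    decode the resulting bytes as ASCII text."""
    raw = b"".join(struct.pack("<Q", int(word, 16)) for word in words)
    # Replace anything that is not valid ASCII instead of raising.
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    print(repr(decode_stack_words(STACK_WORDS)))
```

Running it prints 'Integrated Raid: Volume Status C', which is truncated because only four words of the string appear in the dump.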
Clarification: When I said "we tried booting a Sun v40z with just one disk and then hot-inserting the second disk" I meant that we hot-inserted the second disk *after* the system had finished booting.