Created attachment 394313 [details]
script crashing the kernel

Description of problem:
Running the crash.sh script causes the kernel to panic. I was able to reproduce this on two different x86_64 machines.

Version-Release number of selected component (if applicable):
RHEL5.5-Server-20100215.nightly_nfs-x86_64
kernel-2.6.18-187.el5
dmraid-1.0.0.rc13-60.el5

How reproducible:
always

Steps to Reproduce:
1. get data.tar.bz2 (13M, link in attachments)
2. run crash.sh (attached as well)

Actual results:
kernel panic

Expected results:
no kernel panic

Additional info:
Happens on updated RHEL 5.4 as well. Doesn't seem to happen on RHEL6-Alpha-3.
Created attachment 394314 [details] different kernel panic messages I caught during crashes
Created attachment 394315 [details] data for creation of dmraid device-mapper devices
Michal, I tested the metadata sample in question on a vanilla kernel without any OOPS. Access to the RAID set was fine. I assume this may be a loop device issue, because I based the VG on a real disk. Can you avoid using loop and see if you're able to reproduce on the el5 kernel?
I picked a machine at random (hp-dl360-04.rhts.eng.brq.redhat.com), created a new physical partition and used it instead of /dev/loop1, with the same result (follows):

rhel 5.4, kernel 2.6.18-185.el5

BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000
 printing eip:
f88cbfc3
*pde = 72d47067
Oops: 0000 [#1]
SMP
last sysfs file: /block/hda/removable
Modules linked in: dm_zero dm_snapshot autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac parport_pc lp parport scb2_flash mtdcore chipreg ide_cd floppy cdrom i2c_piix4 i2c_core pcspkr hpilo serio_raw tg3 dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<f88cbfc3>]    Not tainted VLI
EFLAGS: 00010282   (2.6.18-185.el5 #1)
EIP is at dm_io_client_destroy+0x3/0x1a [dm_mod]
eax: 00000000   ebx: 00000000   ecx: f7fff060   edx: c15bfe40
esi: f776c128   edi: edff2680   ebp: f776c000   esp: f5825bc0
ds: 007b   es: 007b   ss: 0068
Process dmraid (pid: 2719, ti=f5825000 task=f76ab000 task.ti=f5825000)
Stack: f776c0a4 f88f3a47 f776c0a4 00000001 f776c000 f88f3d02 f776c128 f88f6a3b
       00000002 0439bfff f34464c0 00000013 f8b6e080 f34464c8 00000000 00000495
       f88fb278 00000001 f776c128 f47f0a40 00008000 00000000 00000050 0439bfff
Call Trace:
 [<f88f3a47>] stripe_recover_free+0x4f/0x5a [dm_raid45]
 [<f88f3d02>] sc_exit+0x14/0x69 [dm_raid45]
 [<f88f6a3b>] raid_ctr+0xc4b/0x11c0 [dm_raid45]
 [<c04169d7>] smp_send_reschedule+0x51/0x53
 [<f88c9209>] dm_table_add_target+0x14e/0x27d [dm_mod]
 [<f88cad29>] table_load+0xcd/0x186 [dm_mod]
 [<f88cb79f>] ctl_ioctl+0x1f3/0x238 [dm_mod]
 [<f88cac5c>] table_load+0x0/0x186 [dm_mod]
 [<c0485e60>] do_ioctl+0x47/0x5d
 [<c04863c9>] vfs_ioctl+0x47b/0x4d3
 [<c0486469>] sys_ioctl+0x48/0x5f
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: 24 24 89 e0 89 e3 e8 ee f9 ff ff 89 f9 89 ea 31 c0 ff 74 24 2c ff 74 24 2c 53 56 e8 57 fe ff ff 83 c4 20 5b 5e 5f 5d c3 53 89 c3 <8b> 00 e8 5d f6 b8 c7 8b 43 04 e8 04 e7 ba c7 89 d8 5b e9 94 5a
EIP: [<f88cbfc3>] dm_io_client_destroy+0x3/0x1a [dm_mod] SS:ESP 0068:f5825bc0
 <0>Kernel panic - not syncing: Fatal exception
Analysis shows an unhandled error code path in the dm-raid45 target: it does not cope with a failing resource allocation during construction of the RAID mapping. This is unlikely to show up in the field, because this metadata format isn't in use on i386 systems anyway. The open question is why the path is being hit in the first place, because only the default number of stripes is being allocated and the test system has enough RAM (2GB). An error path fix will still let the mapping creation fail, but the OOPS will go away.
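For readers unfamiliar with this class of bug, here is a minimal user-space sketch of the pattern, not the actual patch. The trace above suggests that the constructor's cleanup path reached a destroy routine with a NULL pointer (eax is 00000000 at dm_io_client_destroy+0x3). All names below (`io_client`, `recovery_ctx`, `recovery_init`, `recovery_free`) are hypothetical stand-ins invented for illustration and are not the dm-raid45 code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a resource such as an I/O client. */
struct io_client {
    int pages;
};

/* Hypothetical context that a constructor fills in step by step. */
struct recovery_ctx {
    struct io_client *client;  /* stays NULL until creation succeeds */
    void *buffer;
};

/* Returns NULL on failure; pages <= 0 simulates a failed allocation. */
static struct io_client *io_client_create(int pages)
{
    struct io_client *c;

    if (pages <= 0)
        return NULL;
    c = malloc(sizeof(*c));
    if (c)
        c->pages = pages;
    return c;
}

/* Dereferences its argument, so it must never be handed NULL. */
static void io_client_destroy(struct io_client *c)
{
    c->pages = 0;
    free(c);
}

/* Cleanup that is safe to call on a partially constructed context. */
static void recovery_free(struct recovery_ctx *rc)
{
    if (rc->client) {          /* the guard an error path needs */
        io_client_destroy(rc->client);
        rc->client = NULL;
    }
    free(rc->buffer);          /* free(NULL) is a no-op */
    rc->buffer = NULL;
}

/* Constructor: on any failure, unwind and report an error. */
static int recovery_init(struct recovery_ctx *rc, int pages)
{
    rc->client = NULL;
    rc->buffer = NULL;

    rc->buffer = malloc(64);
    if (!rc->buffer)
        goto err;

    rc->client = io_client_create(pages);
    if (!rc->client)
        goto err;

    return 0;

err:
    recovery_free(rc);
    return -1;
}
```

Calling `recovery_init(&rc, 0)` in this sketch fails cleanly with -1 instead of crashing, which mirrors the intended outcome described above: the mapping creation still fails, but without an OOPS.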
Fix sent to rhkernel-list with subject "[RHEL5.5 PATCH] dm: raid45 target: constructor error path oops fix" -> POST.
in kernel-2.6.18-191.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html