Bug 565494 - "dmraid -ay" panics kernel
Summary: "dmraid -ay" panics kernel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 567605
Reported: 2010-02-15 13:35 UTC by michal novacek
Modified: 2010-03-30 07:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 567605 (view as bug list)
Environment:
Last Closed: 2010-03-30 07:44:25 UTC
Target Upstream Version:
Embargoed:


Attachments
script crashing the kernel (279 bytes, text/x-sh)
2010-02-15 13:35 UTC, michal novacek
different kernel panic messages I have catched during crashes (13.16 KB, text/plain)
2010-02-15 13:36 UTC, michal novacek
data for creation of dmraid device-mapper devices (47 bytes, text/plain)
2010-02-15 13:39 UTC, michal novacek


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0178 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update 2010-03-29 12:18:21 UTC

Description michal novacek 2010-02-15 13:35:14 UTC
Created attachment 394313 [details]
script crashing the kernel

Description of problem: running the crash.sh script causes the kernel to panic. I was able to reproduce this on two different x86_64 machines.

Version-Release number of selected component (if applicable):
RHEL5.5-Server-20100215.nightly_nfs-x86_64 
kernel-2.6.18-187.el5
dmraid-1.0.0.rc13-60.el5

How reproducible: always

Steps to Reproduce:
1. get data.tar.bz2 (13M, link in attachments)
2. run crash.sh (attached as well)

Actual results: kernel panic
 
Expected results: no kernel panic

Additional info: happens on updated RHEL 5.4 as well. Doesn't seem to happen on RHEL6-Alpha-3.

Comment 1 michal novacek 2010-02-15 13:36:46 UTC
Created attachment 394314 [details]
different kernel panic messages I have catched during crashes

Comment 2 michal novacek 2010-02-15 13:39:08 UTC
Created attachment 394315 [details]
data for creation of dmraid device-mapper devices

Comment 3 Heinz Mauelshagen 2010-02-16 17:35:16 UTC
Michal,

I tested the metadata sample in question on a vanilla kernel without any OOPS. Access to the RAID set was alright.

I assume this may be a loop device issue, because I based the VG on a real disk.

Can you avoid using loop and see if you're able to reproduce on the el5 kernel?

Comment 4 michal novacek 2010-02-17 16:05:38 UTC
I picked a machine at random (hp-dl360-04.rhts.eng.brq.redhat.com), created a new physical partition, and used it instead of /dev/loop1, with the same result (below):

RHEL 5.4, kernel 2.6.18-185.el5

BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000
 printing eip:
f88cbfc3
*pde = 72d47067
Oops: 0000 [#1]
SMP 
last sysfs file: /block/hda/removable
Modules linked in: dm_zero dm_snapshot autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ipv6 xfrm_nalgo crypto_api dm_mirror dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac parport_pc lp parport scb2_flash mtdcore chipreg ide_cd floppy cdrom i2c_piix4 i2c_core pcspkr hpilo serio_raw tg3 dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    0
EIP:    0060:[<f88cbfc3>]    Not tainted VLI
EFLAGS: 00010282   (2.6.18-185.el5 #1) 
EIP is at dm_io_client_destroy+0x3/0x1a [dm_mod]
eax: 00000000   ebx: 00000000   ecx: f7fff060   edx: c15bfe40
esi: f776c128   edi: edff2680   ebp: f776c000   esp: f5825bc0
ds: 007b   es: 007b   ss: 0068
Process dmraid (pid: 2719, ti=f5825000 task=f76ab000 task.ti=f5825000)
Stack: f776c0a4 f88f3a47 f776c0a4 00000001 f776c000 f88f3d02 f776c128 f88f6a3b 
       00000002 0439bfff f34464c0 00000013 f8b6e080 f34464c8 00000000 00000495 
       f88fb278 00000001 f776c128 f47f0a40 00008000 00000000 00000050 0439bfff 
Call Trace:
 [<f88f3a47>] stripe_recover_free+0x4f/0x5a [dm_raid45]
 [<f88f3d02>] sc_exit+0x14/0x69 [dm_raid45]
 [<f88f6a3b>] raid_ctr+0xc4b/0x11c0 [dm_raid45]
 [<c04169d7>] smp_send_reschedule+0x51/0x53
 [<f88c9209>] dm_table_add_target+0x14e/0x27d [dm_mod]
 [<f88cad29>] table_load+0xcd/0x186 [dm_mod]
 [<f88cb79f>] ctl_ioctl+0x1f3/0x238 [dm_mod]
 [<f88cac5c>] table_load+0x0/0x186 [dm_mod]
 [<c0485e60>] do_ioctl+0x47/0x5d
 [<c04863c9>] vfs_ioctl+0x47b/0x4d3
 [<c0486469>] sys_ioctl+0x48/0x5f
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: 24 24 89 e0 89 e3 e8 ee f9 ff ff 89 f9 89 ea 31 c0 ff 74 24 2c ff 74 24 2c 53 56 e8 57 fe ff ff 83 c4 20 5b 5e 5f 5d c3 53 89 c3 <8b> 00 e8 5d f6 b8 c7 8b 43 04 e8 04 e7 ba c7 89 d8 5b e9 94 5a 
EIP: [<f88cbfc3>] dm_io_client_destroy+0x3/0x1a [dm_mod] SS:ESP 0068:f5825bc0
 <0>Kernel panic - not syncing: Fatal exception

Comment 5 Heinz Mauelshagen 2010-02-22 12:19:43 UTC
Analysis shows that an error code path in the dm-raid45 target does not properly handle a failing resource allocation during construction of the RAID mapping.

This is unlikely to show up in the field, because such a metadata format isn't in use on an i386 system anyway. The question is why it is being hit in the first place, because only the default number of stripes is being allocated and the test system has enough RAM (2 GB).

An error path fix would still let the mapping creation fail but the OOPS will go away.
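
For illustration only, here is a minimal userspace sketch in C of the kind of constructor error path described above. It is not the actual patch: every name in it (io_client, raid_set, raid_set_ctr, raid_set_exit, the simulate_failure flag) is hypothetical and is not taken from the dm-raid45 source. It assumes the pattern suggested by the trace in comment 4, where cleanup running from the constructor reaches a destroy routine with a NULL pointer. The teardown checks the pointer before destroying it, so a failed allocation makes the constructor return an error instead of crashing.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from dm-raid45. */
struct io_client {
    int pages;
};

struct raid_set {
    struct io_client *recover_client; /* stays NULL if allocation fails */
};

/* Allocate a client; simulate_failure forces the allocation-failure case. */
struct io_client *io_client_create(int pages, int simulate_failure)
{
    struct io_client *c;

    if (simulate_failure)
        return NULL;

    c = malloc(sizeof(*c));
    if (c)
        c->pages = pages;
    return c;
}

/* Dereferences its argument, so it must never be handed NULL. */
void io_client_destroy(struct io_client *c)
{
    c->pages = 0;
    free(c);
}

/* Teardown that tolerates a partially constructed raid_set. */
void raid_set_exit(struct raid_set *rs)
{
    if (rs->recover_client) { /* the guard that avoids the NULL dereference */
        io_client_destroy(rs->recover_client);
        rs->recover_client = NULL;
    }
}

/* Constructor: on allocation failure, clean up and report the error. */
int raid_set_ctr(struct raid_set *rs, int simulate_failure)
{
    rs->recover_client = io_client_create(64, simulate_failure);
    if (!rs->recover_client) {
        raid_set_exit(rs); /* safe even though nothing was allocated */
        return -ENOMEM;
    }
    return 0;
}
```

With the guard in place, raid_set_ctr(&rs, 1) returns -ENOMEM and leaves rs.recover_client NULL, which matches the expectation that the mapping creation still fails but no longer oopses.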

Comment 7 Heinz Mauelshagen 2010-02-23 11:36:56 UTC
Fix sent to rhkernel-list with subject "[RHEL5.5 PATCH] dm: raid45 target: constructor error path oops fix" -> POST.

Comment 11 Jarod Wilson 2010-03-03 15:45:12 UTC
in kernel-2.6.18-191.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 18 errata-xmlrpc 2010-03-30 07:44:25 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

