Bug 161160 - Reproducible panic in mdadm multipathing
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: All
OS: Linux
Priority: medium
Severity: high
Assigned To: Doug Ledford
Depends On:
Blocks: 168424
Reported: 2005-06-20 17:20 EDT by Wendy Cheng
Modified: 2007-11-30 17:07 EST (History)
Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-03-15 11:07:10 EST

Attachments
patch submitted by IBM (3.12 KB, patch)
2005-07-21 16:09 EDT, Wendy Cheng

External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2006:0144
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 7
Last Updated: 2006-03-15 00:00:00 EST

Description Wendy Cheng 2005-06-20 17:20:08 EDT
Description of problem:

Two reproducible kernel oopses have been reported with mdadm multipathing: one on
i686 and one on IPF machines. With the 2.4.21-32.0.1.ELsmp kernel, the panic output is:

md0: former device sdi is unavailable, removing from array!
Unable to handle kernel NULL pointer dereference at virtual address 00000040
printing eip:
*pde = 35779001
*pte = 3c01c067
Oops: 0000
multipath netconsole usbserial lp parport autofs4 audit pool e1000 floppy sg
microcode loop lvm-mod keybdev mousedev hid input usb-uhci usbcore ext3 jbd qla
CPU:    1
EIP:    0060:[<f8b8859a>]    Not tainted
EFLAGS: 00010246

EIP is at multipath_run [multipath] 0x1ea (2.4.21-32.0.1.ELsmp/i686)
eax: d1210000   ebx: 00000000   ecx: 00000000   edx: f7caa294
esi: 00000000   edi: f7caa294   ebp: f7caa294   esp: f57cbd94
ds: 0068   es: 0068   ss: 0068
Process mdadm (pid: 4381, stackpage=f57cb000)
Stack: d1210000 00000000 000002c4 cf940000 c043fc80 c0440054 f7caa294 c043fc80
      f57cbde8 00000086 00000000 00000000 cf940000 f57ca000 f5c43000 00000001
      0000000a d1210000 00000000 c048135f 00007ca3 c0129553 00000282 00007ca3
Call Trace:   [<c0129553>] call_console_drivers [kernel] 0x63 (0xf57cbde8)
[<c0129883>] printk [kernel] 0x153 (0xf57cbe20)
[<c0217594>] device_size_calculation [kernel] 0x154 (0xf57cbe40)
[<c021786d>] do_md_run [kernel] 0x1dd (0xf57cbe6c)
[<c0129883>] printk [kernel] 0x153 (0xf57cbe88)
[<c0215a45>] bind_rdev_to_array [kernel] 0xa5 (0xf57cbea8)
[<c02186ed>] add_new_disk [kernel] 0x24d (0xf57cbec8)
[<c021928c>] md_ioctl [kernel] 0x38c (0xf57cbeec)
[<c0126154>] context_switch [kernel] 0xa4 (0xf57cbf60)
[<c01b2a3f>] tty_write [kernel] 0x14f (0xf57cbf68)
[<c016dbfe>] blkdev_ioctl [kernel] 0x3e (0xf57cbf80)
[<c0178756>] sys_ioctl [kernel] 0xf6 (0xf57cbf94)

Code: 8b 49 40 85 c9 0f 85 5f 02 00 00 8b 44 24 38 bf 01 00 00 00
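
The oops above can be cross-checked by hand. The following is a minimal sketch that decodes only the first instruction of the "Code:" line and combines it with the register dump. It assumes the "Code:" bytes start at the faulting EIP, as 2.4-era oops output usually does; the helper name decode_mov_disp8 is made up for this illustration and handles just the one addressing form needed here.

```python
# Decode the first instruction of the oops "Code:" line (32-bit x86).
# Only MOV r32, r/m32 (opcode 0x8b) with a [reg + disp8] operand is handled.

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code: bytes):
    """Return (dest_reg, base_reg, displacement) for `8b /r` with mod=01."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not MOV r32, r/m32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] without a SIB byte is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


# First bytes of the "Code:" line from the oops.
code = bytes.fromhex("8b 49 40 85 c9 0f 85 5f 02 00 00")

# Register values copied from the oops register dump.
registers = {"eax": 0xD1210000, "ebx": 0, "ecx": 0, "edx": 0xF7CAA294}

dest, base, disp = decode_mov_disp8(code)
fault_address = (registers[base] + disp) & 0xFFFFFFFF

print(f"mov {dest}, [{base}+{disp:#x}]")
print(f"faulting address: {fault_address:08x}")
```

Under those assumptions the instruction reads a field at offset 0x40 through the pointer held in ecx, and ecx is zero in the register dump, which is consistent with the reported NULL pointer dereference at virtual address 00000040.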

Version-Release number of selected component (if applicable):
All versions of RHEL 3 kernels up to the current RHN distribution

How reproducible:
Every time

Steps to Reproduce:
1. Connect the Linux box to SAN storage with multipath.
2. Create a LUN on the SAN storage and boot the system from it (SAN boot).
3. Create two more LUNs on the SAN storage, then reboot.

/dev/sda:  50GB (including /, /boot, swap partition)
/dev/sdb:  12GB
/dev/sdc:  1GB
/dev/sdd:  multipath device for /dev/sda
/dev/sde:  multipath device for /dev/sdb
/dev/sdf:  multipath device for /dev/sdc

4. Create a partition on /dev/sdc (multipath /dev/sdf) with parted, then assign
both paths to /dev/md0.
5. At the shell prompt, run: mdadm -C -lmp -n2 /dev/md0 /dev/sdc1 /dev/sdf1
6. Remove /dev/sdb and /dev/sde on the SAN storage, then reboot.

now the device names have changed:
previous /dev/sdc becomes /dev/sdb, and previous /dev/sdf becomes /dev/sdd.
/dev/sda:  50GB (including /, /boot, swap partition)
/dev/sdb:  1GB (previous /dev/sdc)
/dev/sdd:  multipath device for /dev/sda
/dev/sde:  multipath device for /dev/sdb (previous /dev/sdf)

7. After editing /etc/mdadm.conf, run "mdadm -As /dev/md0".

Actual result:
kernel oops.

Expected result:
no oops.

Additional Info:

--- /etc/mdadm.conf ---
DEVICE /dev/sd[abcdef][0-9]
ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdd1
Comment 1 Wendy Cheng 2005-06-20 17:53:01 EDT
Sorry, there was a typo in the "device names have changed" lines. They should be:

now the device names have changed:
previous /dev/sdc becomes /dev/sdb, and previous /dev/sdf becomes /dev/sdd.
/dev/sda:  50GB (including /, /boot, swap partition)
/dev/sdb:  1GB (previous /dev/sdc)
/dev/sdc:  multipath device for /dev/sda
/dev/sdd:  multipath device for /dev/sdb (previous /dev/sdf)
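
The renaming in the corrected listing above can be reproduced with a small model. This is only a sketch: it assumes sd names are handed out in discovery order, with every LUN on the first path named before any LUN on the second path. That is a simplification chosen because it matches the names in this report, not a statement about how the kernel always enumerates devices. The function name assign_sd_names and the LUN labels are hypothetical.

```python
import string


def assign_sd_names(luns, paths=2):
    """Name devices sda, sdb, ... in discovery order.

    All LUNs on path 0 are named first, then all LUNs on path 1.
    Returns a dict mapping (lun, path) -> device name.
    """
    names = {}
    letters = iter(string.ascii_lowercase)
    for path in range(paths):
        for lun in luns:
            names[(lun, path)] = "sd" + next(letters)
    return names


# LUNs are labelled by their size as given in the report.
before = assign_sd_names(["50GB", "12GB", "1GB"])
after = assign_sd_names(["50GB", "1GB"])  # the 12GB LUN has been removed

# The two paths to the 1GB LUN that backs /dev/md0.
md0_before = [before[("1GB", path)] for path in range(2)]
md0_after = [after[("1GB", path)] for path in range(2)]

print("md0 members before:", md0_before)
print("md0 members after: ", md0_after)
```

In this model the 1GB LUN moves from sdc and sdf to sdb and sdd once the 12GB LUN is gone, which is why the ARRAY line in /etc/mdadm.conf had to be edited to name /dev/sdb1 and /dev/sdd1 before assembling.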
Comment 11 Rene Klootwijk 2005-09-26 10:25:53 EDT
The same problem occurs when creating a multipath device on one system and
activating the multipath device on another system that has assigned different
device names to these LUNs. We require several multipath devices activated on
multiple systems for an Oracle10g RAC environment.
Comment 12 Doug Ledford 2005-09-26 17:28:00 EDT
This patch has passed my internal testing and the patch has been submitted
internally for review and possible inclusion in the next RHEL3 update release. 
I've also built a test kernel that has this patch included.  RPMs can be found
at http://people.redhat.com/dledford/st_tape_test/ and the kernel version that
includes this patch is 2.4.21-37.1.EL_st_tape_test3.
Comment 13 Rene Klootwijk 2005-09-27 03:16:08 EDT
Can you compile a hugemem version of the kernel?
Comment 14 Doug Ledford 2005-09-27 10:24:01 EDT
One is already present in the i686 directory.
Comment 17 Ernie Petrides 2005-10-07 22:12:57 EDT
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.5.EL).
Comment 27 Red Hat Bugzilla 2006-03-15 11:07:11 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.