Bug 161160
| Summary: | Reproducable panic in mdadm multipathing | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Wendy Cheng <nobody+wcheng> |
| Component: | kernel | Assignee: | Doug Ledford <dledford> |
| Status: | CLOSED ERRATA | | |
| Severity: | high | | |
| Priority: | medium | | |
| Version: | 3.0 | CC: | bjohnson, petrides, rene.klootwijk, rkenna, tao |
| Hardware: | All | | |
| OS: | Linux | | |
| Fixed In Version: | RHSA-2006-0144 | Doc Type: | Bug Fix |
| Last Closed: | 2006-03-15 16:07:10 UTC | | |
| Bug Blocks: | 168424 | | |
Sorry, there is a typo in the "now the device names have changed" lines of step 6. They should read:

now the device names have changed: previous /dev/sdc becomes /dev/sdb, and previous /dev/sdf becomes /dev/sdd.
/dev/sda: 50GB (including /, /boot, swap partition)
/dev/sdb: 1GB (previous /dev/sdc)
/dev/sdc: multipath device for /dev/sda
/dev/sdd: multipath device for /dev/sdb (previous /dev/sdf)

The same problem occurs when a multipath device is created on one system and then activated on another system that has assigned different device names to these LUNs. We require several multipath devices activated on multiple systems for an Oracle10g RAC environment.

This patch has passed my internal testing and has been submitted internally for review and possible inclusion in the next RHEL3 update release. I've also built a test kernel that includes this patch. RPMs can be found at http://people.redhat.com/dledford/st_tape_test/ and the kernel version that includes this patch is 2.4.21-37.1.EL_st_tape_test3.

Can you compile a hugemem version of the kernel?

One is already present in the i686 directory.

A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.5.EL).

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2006-0144.html
Description of problem:
Two reproducible kernel oopses have been reported with mdadm multipathing: one on i686 and one on IPF machines. With the 2.4.21-32.0.1.ELsmp kernel, the panic looks like this:

md0: former device sdi is unavailable, removing from array!
Unable to handle kernel NULL pointer dereference at virtual address 00000040
printing eip:
f8b8859a
*pde = 35779001
*pte = 3c01c067
Oops: 0000
multipath netconsole usbserial lp parport autofs4 audit pool e1000 floppy sg microcode loop lvm-mod keybdev mousedev hid input usb-uhci usbcore ext3 jbd qla 23
CPU: 1
EIP: 0060:[<f8b8859a>] Not tainted
EFLAGS: 00010246
EIP is at multipath_run [multipath] 0x1ea (2.4.21-32.0.1.ELsmp/i686)
eax: d1210000 ebx: 00000000 ecx: 00000000 edx: f7caa294
esi: 00000000 edi: f7caa294 ebp: f7caa294 esp: f57cbd94
ds: 0068 es: 0068 ss: 0068
Process mdadm (pid: 4381, stackpage=f57cb000)
Stack: d1210000 00000000 000002c4 cf940000 c043fc80 c0440054 f7caa294 c043fc80
       f57cbde8 00000086 00000000 00000000 cf940000 f57ca000 f5c43000 00000001
       0000000a d1210000 00000000 c048135f 00007ca3 c0129553 00000282 00007ca3
Call Trace:
[<c0129553>] call_console_drivers [kernel] 0x63 (0xf57cbde8)
[<c0129883>] printk [kernel] 0x153 (0xf57cbe20)
[<c0217594>] device_size_calculation [kernel] 0x154 (0xf57cbe40)
[<c021786d>] do_md_run [kernel] 0x1dd (0xf57cbe6c)
[<c0129883>] printk [kernel] 0x153 (0xf57cbe88)
[<c0215a45>] bind_rdev_to_array [kernel] 0xa5 (0xf57cbea8)
[<c02186ed>] add_new_disk [kernel] 0x24d (0xf57cbec8)
[<c021928c>] md_ioctl [kernel] 0x38c (0xf57cbeec)
[<c0126154>] context_switch [kernel] 0xa4 (0xf57cbf60)
[<c01b2a3f>] tty_write [kernel] 0x14f (0xf57cbf68)
[<c016dbfe>] blkdev_ioctl [kernel] 0x3e (0xf57cbf80)
[<c0178756>] sys_ioctl [kernel] 0xf6 (0xf57cbf94)
Code: 8b 49 40 85 c9 0f 85 5f 02 00 00 8b 44 24 38 bf 01 00 00 00

Version-Release number of selected component (if applicable):
All versions of RHEL 3 kernels up to the current RHN distribution (2.4.21-32.0.1.ELsmp).
How reproducible:
Every time

Steps to Reproduce:
1. Connect the Linux box to SAN storage with multipath.
2. Create a LUN on the SAN storage, and boot from SAN.
3. Create two more LUNs on the SAN storage, then reboot. e.g.
   /dev/sda: 50GB (including /, /boot, swap partition)
   /dev/sdb: 12GB
   /dev/sdc: 1GB
   /dev/sdd: multipath device for /dev/sda
   /dev/sde: multipath device for /dev/sdb
   /dev/sdf: multipath device for /dev/sdc
4. Create a partition on /dev/sdc (multipath /dev/sdf) with parted, then assign both paths to /dev/md0.
5. At the shell prompt, run: mdadm -C -lmp -n2 /dev/md0 /dev/sdc1 /dev/sdf1
6. Remove /dev/sdb and /dev/sde on the SAN storage, then reboot. Now the device names have changed: previous /dev/sdc becomes /dev/sdb, and previous /dev/sdf becomes /dev/sdd.
   /dev/sda: 50GB (including /, /boot, swap partition)
   /dev/sdb: 1GB (previous /dev/sdc)
   /dev/sdd: multipath device for /dev/sda
   /dev/sde: multipath device for /dev/sdb (previous /dev/sdf)
7. After editing /etc/mdadm.conf, run "mdadm -As /dev/md0".

Actual result:
Kernel oops.

Expected result:
No oops.

Additional Info:
--- /etc/mdadm.conf ---
DEVICE /dev/sd[abcdef][0-9]
ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdd1
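
The renaming in step 6 can be illustrated with a small sketch. It is only a simplified model, not how the kernel is implemented: it assumes /dev/sdX names are handed out sequentially in discovery order, and that all LUNs on the first path are discovered before those on the second path, which matches the listing in step 3. The labels `boot`, `lun12g`, and `lun1g` and the function `assign_sd_names` are hypothetical names introduced for illustration.

```python
from string import ascii_lowercase


def assign_sd_names(paths):
    """Map each (lun, path_index) to /dev/sdX in discovery order (simplified model)."""
    return {path: f"/dev/sd{ascii_lowercase[i]}" for i, path in enumerate(paths)}


# Discovery order from step 3: three LUNs on path 0, then the same LUNs on path 1.
before = [
    ("boot", 0), ("lun12g", 0), ("lun1g", 0),
    ("boot", 1), ("lun12g", 1), ("lun1g", 1),
]

# Step 6: the 12GB LUN is removed from the SAN, so both of its paths disappear.
after = [p for p in before if p[0] != "lun12g"]

names_before = assign_sd_names(before)
names_after = assign_sd_names(after)

# Show how each surviving path is renamed after the reboot.
for path in after:
    print(path, names_before[path], "->", names_after[path])
```

Under these assumptions the two paths to the 1GB LUN move from /dev/sdc and /dev/sdf to /dev/sdb and /dev/sdd, which is consistent with the `devices=/dev/sdb1,/dev/sdd1` line in the edited /etc/mdadm.conf.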