From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050512 Red Hat/1.7.8-1.1.3.1

Description of problem:
[First of all, yes, this isn't the latest kernel version for 2.1, or even close. This is a test system for one where we can't change the versions easily.]

A RHEL AS 2.1 box has two qla4010 iSCSI adapters (with qla4xxx-v3.22-2, the latest version), talking to an IBM DS300 iSCSI array.

/etc/raidtab:

raiddev /dev/md0
    raid-level multipath
    persistent-superblock 1
    nr-raid-disks 2
    device /dev/sdb1
    raid-disk 0
    device /dev/sdd1
    raid-disk 1
########################################
raiddev /dev/md1
    raid-level multipath
    persistent-superblock 1
    nr-raid-disks 2
    device /dev/sdc1
    raid-disk 0
    device /dev/sde1
    raid-disk 1

dmesg (the md part of it):

md: autorun ...
md: considering sdd1 ...
md: adding sdd1 ...
md: adding sdb1 ...
md: created md0
md: running: <sdd1><sdb1>
md: multipath personality registered as nr 7
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
multipath: device sdd1 operational as IO path 0
multipath: making IO path sdb1 a spare path (not in sync)
(checking disk 0)
multipath: array md0 active with 1 out of 1 IO paths (1 spare IO paths)
md: updating md0 RAID superblock on device
md: ... autorun DONE.
md: autorun ...
md: considering sde1 ...
md: adding sde1 ...
md: adding sdc1 ...
md: created md1
md: running: <sde1><sdc1>
md1: max total readahead window set to 124k
md1: 1 data-disks, max readahead per data-disk: 124k
multipath: making IO path sde1 a spare path (not in sync)
multipath: device sdc1 operational as IO path 0
(checking disk 0)
multipath: array md1 active with 1 out of 1 IO paths (1 spare IO paths)
md: updating md1 RAID superblock on device
md: ... autorun DONE.
md: mount(pid 302) used obsolete MD ioctl, upgrade your software to use new ictls.
md: mount(pid 302) used obsolete MD ioctl, upgrade your software to use new ictls.

So it runs them as active-passive. Sure.

We failed the active path (made an ext3, dd'ed lots of data, pulled the plug), and it started using the other one after a while, with lots of

messages.1:May 27 10:01:17 curtis kernel: I/O error: dev 08:41, sector 1053376
messages.1:May 27 10:01:17 curtis kernel: I/O error: dev 08:41, sector 1053496
messages.1:May 27 10:01:17 curtis kernel: I/O error: dev 08:41, sector 1053384
messages.1:May 27 10:01:17 curtis kernel: I/O error: dev 08:41, sector 1053504

Fine. So we re-plugged the old path, and did (according to http://docs.hp.com/en/B9903-90012/ch08s03.html)

raidsetfaulty -c raidtab /dev/md1 /dev/sde1

Which causes loads of

messages.1:May 27 10:02:47 curtis kernel: multipath: sdc1: redirecting sector 816312 to another IO path
messages.1:May 27 10:02:47 curtis kernel: multipath: sdc1: redirecting sector 816440 to another IO path
messages.1:May 27 10:02:47 curtis kernel: multipath: sdc1: redirecting sector 81

but after that, the command hangs, in the D state. Sysrq-t gives:

May 27 12:18:36 curtis kernel: raidsetfaulty D CC497E30 5000 4847 2386 (L-TLB)
May 27 12:18:36 curtis kernel: Call Trace: [__wait_on_buffer+118/160] __wait_on_buffer [kernel] 0x76 (0xcc497e44)
May 27 12:18:36 curtis kernel: Call Trace: [<c0146e66>] __wait_on_buffer [kernel] 0x76 (0xcc497e44)
May 27 12:18:36 curtis kernel: [wait_for_locked_buffers+132/176] wait_for_locked_buffers [kernel] 0x84 (0xcc497e88)
May 27 12:18:36 curtis kernel: [<c0147124>] wait_for_locked_buffers [kernel] 0x84 (0xcc497e88)
May 27 12:18:36 curtis kernel: [sync_buffers+53/64] sync_buffers [kernel] 0x35 (0xcc497eb0)
May 27 12:18:36 curtis kernel: [<c0147185>] sync_buffers [kernel] 0x35 (0xcc497eb0)
May 27 12:18:36 curtis kernel: [fsync_no_super+22/32] fsync_no_super [kernel] 0x16 (0xcc497edc)
May 27 12:18:36 curtis kernel: [<c0147266>] fsync_no_super [kernel] 0x16 (0xcc497edc)
May 27 12:18:36 curtis kernel: [blkdev_put+64/224] blkdev_put [kernel] 0x40 (0xcc497ef4)
May 27 12:18:36 curtis kernel: [<c014da70>] blkdev_put [kernel] 0x40 (0xcc497ef4)
May 27 12:18:36 curtis kernel: [__fput+43/208] __fput [kernel] 0x2b (0xcc497f0c)
May 27 12:18:36 curtis kernel: [<c0146b9b>] __fput [kernel] 0x2b (0xcc497f0c)
May 27 12:18:36 curtis kernel: [filp_close+158/176] filp_close [kernel] 0x9e (0xcc497f38)
May 27 12:18:36 curtis kernel: [<c01457ae>] filp_close [kernel] 0x9e (0xcc497f38)
May 27 12:18:36 curtis kernel: [put_files_struct+77/224] put_files_struct [kernel] 0x4d (0xcc497f5c)
May 27 12:18:36 curtis kernel: [<c011f1fd>] put_files_struct [kernel] 0x4d (0xcc497f5c)
May 27 12:18:36 curtis kernel: [do_exit+311/624] do_exit [kernel] 0x137 (0xcc497f78)
May 27 12:18:36 curtis kernel: [<c011fa47>] do_exit [kernel] 0x137 (0xcc497f78)
May 27 12:18:36 curtis kernel: [blkdev_ioctl+38/64] blkdev_ioctl [kernel] 0x26 (0xcc497f80)
May 27 12:18:36 curtis kernel: [<c014db56>] blkdev_ioctl [kernel] 0x26 (0xcc497f80)
May 27 12:18:36 curtis kernel: [sys_ioctl+599/672] sys_ioctl [kernel] 0x257 (0xcc497f94)
May 27 12:18:36 curtis kernel: [<c01558a7>] sys_ioctl [kernel] 0x257 (0xcc497f94)
May 27 12:18:36 curtis kernel: [sys_ioctl+659/672] sys_ioctl [kernel] 0x293 (0xcc497fa4)
May 27 12:18:36 curtis kernel: [<c01558e3>] sys_ioctl [kernel] 0x293 (0xcc497fa4)
May 27 12:18:36 curtis kernel: [system_call+51/56] system_call [kernel] 0x33 (0xcc497fc0)
May 27 12:18:36 curtis kernel: [<c01073e3>] system_call [kernel] 0x33 (0xcc497fc0)
May 27 12:18:36 curtis kernel:

(That's for the raidsetfaulty command; I have the rest of the dump if you want it.)

All kinds of stuff stopped working after that. We rebooted. The system came up fine with both paths.

/August.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.9-e.35

How reproducible:
Didn't try

Steps to Reproduce:
1. See above.
2.
3.

Additional info:
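For reference, the "dev 08:41" in the I/O error lines can be mapped back to a device name. This is only a minimal sketch: it assumes the 2.4 kernel prints the device as major:minor in hexadecimal, and that major 8 is the first SCSI disk major with 16 minors per disk. The function name decode_scsi_dev is made up for illustration.

```python
def decode_scsi_dev(dev: str) -> str:
    """Turn a 'MM:mm' hex device number (as in 'dev 08:41') into an sdXN name.

    Assumption: major 8 covers sda..sdp, with 16 minors per disk
    (minor 0 of each group is the whole disk, 1..15 are partitions).
    """
    major_hex, minor_hex = dev.split(":")
    major = int(major_hex, 16)
    minor = int(minor_hex, 16)
    if major != 8:
        raise ValueError(f"major {major} is not the first SCSI disk major (8)")
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    # Partition 0 means the whole disk, so no numeric suffix is appended.
    return name + (str(partition) if partition else "")


if __name__ == "__main__":
    print(decode_scsi_dev("08:41"))
```

Under those assumptions, 08:41 is major 8, minor 65, which works out to sde1, the same device passed to raidsetfaulty above.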
RHEL2.1 is currently accepting only critical security fixes. This issue is outside the current scope of support.