Description of problem:

We have a classical setup of dmraid over multipath. We're experiencing exactly what's specified here:
https://access.redhat.com/knowledge/solutions/48634

To resolve:
- stop all dmraid raids
- multipath -F
- multipath -ll

Result: all multipaths are ok then.

Version-Release number of selected component (if applicable):
mdadm-3.2.2-9.el6.x86_64
device-mapper-1.02.66-6.el6.x86_64
device-mapper-libs-1.02.66-6.el6.x86_64
device-mapper-event-libs-1.02.66-6.el6.x86_64
device-mapper-event-1.02.66-6.el6.x86_64
device-mapper-multipath-libs-0.4.9-46.el6.x86_64
device-mapper-multipath-0.4.9-46.el6.x86_64

How reproducible:
Always after a reboot; see also the attachment (/var/log/messages).

Actual results:
It seems half of the devices are used for dmraid, the other half for multipath (active/passive):

[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdc[1]
      104856504 blocks super 1.2 [2/1] [_U]

md126 : active (auto-read-only) raid1 sdb[1]
      104856504 blocks super 1.2 [2/1] [_U]

md127 : active (auto-read-only) raid1 sdd[1]
      104856504 blocks super 1.2 [2/1] [_U]

[root@ ~]# multipath -ll
VIRT-A20-2 (3600a0b80003314be00005c5d4f31e66a) dm-4 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:2 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:2 sdg 8:96  active ghost running
VIRT-A20-1 (3600a0b80003314be00005c5b4f31e646) dm-3 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:1 sdl 8:176 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:1 sdf 8:80  active ghost running
VIRT-A20-0 (3600a0b80003314be00005c594f31e627) dm-2 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:0 sdk 8:160 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:0 sde 8:64  active ghost running

Expected results:

[root@ ~]# mdadm --stop /dev/md125 /dev/md126 /dev/md127
mdadm: stopped /dev/md125
mdadm: stopped /dev/md126
mdadm: stopped /dev/md127
[root@ ~]# multipath
create: VIRT-B18-0 (3600a0b800029b4540000771a4f3207c4) undef STK,FLEXLINE 380
size=100G features='0' hwhandler='1 rdac' wp=undef
|-+- policy='round-robin 0' prio=6 status=undef
| `- 1:0:0:0 sdb 8:16  undef ready running
`-+- policy='round-robin 0' prio=1 status=undef
  `- 2:0:0:0 sdh 8:112 undef ghost running
create: VIRT-B18-1 (3600a0b800029b4540000771c4f3207e4) undef STK,FLEXLINE 380
size=100G features='0' hwhandler='1 rdac' wp=undef
|-+- policy='round-robin 0' prio=6 status=undef
| `- 1:0:0:1 sdc 8:32  undef ready running
`-+- policy='round-robin 0' prio=1 status=undef
  `- 2:0:0:1 sdi 8:128 undef ghost running
create: VIRT-B18-2 (3600a0b800029b4540000771e4f320807) undef STK,FLEXLINE 380
size=100G features='0' hwhandler='1 rdac' wp=undef
|-+- policy='round-robin 0' prio=6 status=undef
| `- 1:0:0:2 sdd 8:48  undef ready running
`-+- policy='round-robin 0' prio=1 status=undef
  `- 2:0:0:2 sdj 8:144 undef ghost running
[root@ ~]# multipath -ll
VIRT-B18-2 (3600a0b800029b4540000771e4f320807) dm-13 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:2 sdd 8:48  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:2 sdj 8:144 active ghost running
VIRT-B18-1 (3600a0b800029b4540000771c4f3207e4) dm-12 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:1 sdc 8:32  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:1 sdi 8:128 active ghost running
VIRT-B18-0 (3600a0b800029b4540000771a4f3207c4) dm-11 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:0 sdb 8:16  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:0 sdh 8:112 active ghost running
VIRT-A20-2 (3600a0b80003314be00005c5d4f31e66a) dm-4 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:2 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:2 sdg 8:96  active ghost running
VIRT-A20-1 (3600a0b80003314be00005c5b4f31e646) dm-3 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:1 sdl 8:176 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:1 sdf 8:80  active ghost running
VIRT-A20-0 (3600a0b80003314be00005c594f31e627) dm-2 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:0 sdk 8:160 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:0 sde 8:64  active ghost running

Additional info:

[root@ ~]# cat /etc/mdadm.conf
DEVICE /dev/mapper/*

(we tried with /dev/disk/by-id as well, and with or without the actual raid
configs in it, i.e. the output of mdadm --examine --scan)

[root@ ~]# dmsetup status
vg_00-lv_tmp: 0 4194304 linear
vg_00-lv_home: 0 16777216 linear
VIRT-A20-2: 0 209715200 multipath 2 0 1 0 2 1 A 0 1 0 8:192 A 0 E 0 1 0 8:96 A 0
vg_00-lv_usr: 0 8388608 linear
vg_00-lv_var: 0 8388608 linear
VIRT-A20-1: 0 209715200 multipath 2 0 1 0 2 1 A 0 1 0 8:176 A 0 E 0 1 0 8:80 A 0
VIRT-A20-0: 0 209715200 multipath 2 0 1 0 2 1 A 0 1 0 8:160 A 0 E 0 1 0 8:64 A 0
vg_00-lv_switch: 0 62914560 linear
vg_00-lv_swap: 0 4194304 linear
vg_00-lv_root: 0 4194304 linear
vg_00-lv_opt: 0 16777216 linear

[root@ ~]# dmsetup table
vg_00-lv_tmp: 0 4194304 linear 8:2 16779264
vg_00-lv_home: 0 16777216 linear 8:2 2048
VIRT-A20-2: 0 209715200 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:192 1 round-robin 0 1 1 8:96 1
vg_00-lv_usr: 0 8388608 linear 8:2 29362176
vg_00-lv_var: 0 8388608 linear 8:2 20973568
VIRT-A20-1: 0 209715200 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:176 1 round-robin 0 1 1 8:80 1
VIRT-A20-0: 0 209715200 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:160 1 round-robin 0 1 1 8:64 1
vg_00-lv_switch: 0 62914560 linear 8:2 37750784
vg_00-lv_swap: 0 4194304 linear 8:2 100665344
vg_00-lv_root: 0 4194304 linear 8:2 104859648
vg_00-lv_opt: 0 16777216 linear 8:2 109053952

[root@ ~]# more /etc/multipath.conf
blacklist {
#       devnode "cciss!c[0-9]d[0-9]*"
#       devnode "sda[0-9]*"
        wwid 3600508b1001cbe9603a597997098bbc3
}
defaults {
        polling_interval        15
        user_friendly_names     yes
        verbosity               2
}
multipaths {
        # B18 6540
        multipath {
                alias   VIRT-B18-2
                wwid    3600a0b800029b4540000771e4f320807
        }
        multipath {
                alias   VIRT-B18-1
                wwid    3600a0b800029b4540000771c4f3207e4
        }
        multipath {
                alias   VIRT-B18-0
                wwid    3600a0b800029b4540000771a4f3207c4
        }
        # A20 6540
        multipath {
                alias   VIRT-A20-1
                wwid    3600a0b80003314be00005c5b4f31e646
        }
        multipath {
                alias   VIRT-A20-2
                wwid    3600a0b80003314be00005c5d4f31e66a
        }
        multipath {
                alias   VIRT-A20-0
                wwid    3600a0b80003314be00005c594f31e627
        }
}
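Spelled out as a shell sketch, the manual recovery from the "To resolve" list above looks like this (md device names taken from the "Actual results"; the auto-assembled names may differ from boot to boot):

# Manual workaround sketch, not a fix:
mdadm --stop /dev/md125 /dev/md126 /dev/md127   # free the single-path member disks
multipath -F                                    # flush any half-built multipath maps
multipath                                       # re-create the maps on the freed paths
multipath -ll                                   # verify both paths per LUN are present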
Created attachment 570596 [details]
/var/log/messages

Lots of boot errors:

Mar 16 13:00:53 MYHOSTNAME kernel: Buffer I/O error on device sdj, logical block 0
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: [sdi] Sense Key : Illegal Request [current]
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: [sdi] <<vendor>> ASC=0x94 ASCQ=0x1
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00

and then (might be an indication of why things go wrong):

Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: multipath: version 1.3.0 loaded
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:0:0: rdac: LUN 0 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:0:1: rdac: LUN 1 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:0:2: rdac: LUN 2 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:1:0: rdac: LUN 0 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:0: rdac: LUN 0 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: rdac: LUN 1 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:2: rdac: LUN 2 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:1:0: rdac: LUN 0 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:1:1: rdac: LUN 1 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:1:2: rdac: LUN 2 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: rdac: device handler registered
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: md: raid1 personality registered for level 1
Mar 16 13:00:53 MYHOSTNAME kernel: bio: create slab <bio-1> at 1
Mar 16 13:00:53 MYHOSTNAME kernel: md/raid1:md126: active with 1 out of 2 mirrors
Mar 16 13:00:53 MYHOSTNAME kernel: md126: detected capacity change from 0 to 107373060096
Mar 16 13:00:53 MYHOSTNAME kernel: md126:
Mar 16 13:00:53 MYHOSTNAME kernel: md/raid1:md127: active with 1 out of 2 mirrors
Mar 16 13:00:53 MYHOSTNAME kernel: md127: detected capacity change from 0 to 107373060096
Mar 16 13:00:53 MYHOSTNAME kernel: md127:
Mar 16 13:00:53 MYHOSTNAME kernel: md/raid1:md125: active with 1 out of 2 mirrors
Mar 16 13:00:53 MYHOSTNAME kernel: md125: detected capacity change from 0 to 107373060096
Mar 16 13:00:53 MYHOSTNAME kernel: md125: unknown partition table
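The repeated "multipath: error getting device" messages mean device-mapper could not open a path device that something else had already claimed exclusively. A quick diagnostic sketch to see who holds a given path (sdb is just an example taken from this report; substitute any path device):

cat /proc/mdstat                # are md arrays assembled directly on sdX paths?
ls -l /sys/block/sdb/holders/   # sysfs shows which device currently claims sdb
dmsetup deps                    # block devices each device-mapper map depends on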
Btw, after the commands

[root@ ~]# mdadm --stop /dev/md125 /dev/md126 /dev/md127
mdadm: stopped /dev/md125
mdadm: stopped /dev/md126
mdadm: stopped /dev/md127
[root@ ~]# multipath
<snip>

the raid devices are there (and using dm-devices) after a scan:

[root@ ~]# mdadm --assemble --scan
mdadm: /dev/md/0 has been started with 2 drives.
mdadm: /dev/md/1 has been started with 2 drives.
mdadm: /dev/md/2 has been started with 2 drives.
[root@ ~]# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 dm-13[0] dm-4[1]
      104856504 blocks super 1.2 [2/2] [UU]

md1 : active raid1 dm-12[0] dm-3[1]
      104856504 blocks super 1.2 [2/2] [UU]

md0 : active raid1 dm-11[0] dm-2[1]
      104856504 blocks super 1.2 [2/2] [UU]

unused devices: <none>
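To confirm the superblocks really are visible on the multipath maps themselves (and not only on the underlying sdX paths), something like this could be run before the assemble (a sketch; map names are the aliases from this report's multipath.conf):

mdadm --examine /dev/mapper/VIRT-B18-0   # should print the 1.2 superblock
mdadm --examine --scan                   # should list all three arrays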
Update: it seems that an old module installed by somebody else caused this. Doing "dracut -o mpp" removed that module, and now the multipath is ok.

Now I just need to figure out why the mdraid is not starting, as I need to run "mdadm --assemble --scan" after boot for the raid devices to show up. Could it be that this /etc/mdadm.conf is not ok:

DEVICE /dev/mapper/*
ARRAY /dev/md/0 metadata=1.2 UUID=c83042c0:99b95f57:927e18ad:0a8008c8 name=MYHOSTNAME:0
ARRAY /dev/md/1 metadata=1.2 UUID=14789ac3:b7a1d7df:6055c282:cc3c8ee3 name=MYHOSTNAME:1
ARRAY /dev/md/2 metadata=1.2 UUID=41ac393c:03b17717:bb89d8b4:eeef4176 name=MYHOSTNAME:2
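One variation that could be tried (a sketch only; whether it helps depends on whether the multipath maps already exist by the time the mdadm scan runs) is narrowing DEVICE to the multipath aliases, so the raw sdX paths are never candidates for assembly:

# /etc/mdadm.conf sketch, reusing the aliases and UUIDs from this report:
DEVICE /dev/mapper/VIRT-*
ARRAY /dev/md/0 metadata=1.2 UUID=c83042c0:99b95f57:927e18ad:0a8008c8 name=MYHOSTNAME:0
ARRAY /dev/md/1 metadata=1.2 UUID=14789ac3:b7a1d7df:6055c282:cc3c8ee3 name=MYHOSTNAME:1
ARRAY /dev/md/2 metadata=1.2 UUID=41ac393c:03b17717:bb89d8b4:eeef4176 name=MYHOSTNAME:2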
Btw, during my tests I removed "rd_NO_MD" and "rd_NO_DM" from the kernel command line at boot; could this be the cause of my raid not being there after reboot?
mdraid rather than dmraid. Summary fixed.
(In reply to comment #5)
> Btw, during my tests I removed "rd_NO_MD" and "rd_NO_DM" from the kernel
> command line at boot; could this be the cause of my raid not being there
> after reboot?

Those disable MD raid and DM raid respectively. Because you want mdraid, could you try with the latter (rd_NO_DM) in place and report whether that fixes the issue?
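For the suggested test, the boot entry would keep rd_NO_DM but not rd_NO_MD, along these lines (an illustrative sketch only; the kernel image path and the remaining arguments are placeholders to be copied from the existing grub.conf entry):

# /boot/grub/grub.conf kernel line (illustrative):
kernel /vmlinuz-<version> ro root=/dev/mapper/vg_00-lv_root rd_NO_DM rd_NO_LUKS rd_LVM_LV=vg_00/lv_root rd_LVM_LV=vg_00/lv_swap quiet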
Sorry, I just noticed I had the old notation in place: nodmraid was on the kernel line. From the dmesg output:

[root@ ~]# dmesg | grep -i "command line"
Command line: ro root=/dev/mapper/vg_00-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_00/lv_root quiet SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_00/lv_swap KEYBOARDTYPE=pc KEYTABLE=be-latin1 nodmraid elevator=deadline
Kernel command line: ro root=/dev/mapper/vg_00-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_00/lv_root quiet SYSFONT=latarcyrheb-sun16 crashkernel=130M@0M rd_LVM_LV=vg_00/lv_swap KEYBOARDTYPE=pc KEYTABLE=be-latin1 nodmraid elevator=deadline

[root@ ~]# dmesg | grep dracut
dracut: dracut-004-256.el6
dracut: rd_NO_LUKS: removing cryptoluks activation
dracut: Starting plymouth daemon
dracut: rd_NO_DM: removing DM RAID activation
dracut: rd_NO_MDIMSM: no MD RAID for imsm/isw raids
dracut: Scanning devices sda2 for LVM logical volumes vg_00/lv_root vg_00/lv_swap
dracut: inactive '/dev/vg_00/lv_home' [8.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_tmp' [2.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_var' [4.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_usr' [4.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_swap' [2.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_root' [2.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_opt' [8.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_switch' [30.00 GiB] inherit
dracut: Mounted root filesystem /dev/mapper/vg_00-lv_root
dracut:
dracut: Switching root

Since rd_NO_DM and nodmraid are the same for dracut, that won't change a thing. But I believe I read somewhere that rd_NO_MD only postpones the init of the MD RAID until after the init phase has finished?

Anyway, it seems the following lines in rc.sysinit don't have any effect now:

# Start any MD RAID arrays that haven't been started yet
[ -r /proc/mdstat -a -r /dev/md/md-device-map ] && /sbin/mdadm -IRs

Manually running "/sbin/mdadm -IRs" didn't change anything either, but "/sbin/mdadm -ARs" does work ...
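If the initramfs contents are in doubt (as with the stray mpp module from comment 4), the image can be rebuilt and inspected, and the effective boot arguments re-checked; a sketch assuming the currently running kernel:

dracut -f /boot/initramfs-$(uname -r).img $(uname -r)    # rebuild the initramfs
lsinitrd /boot/initramfs-$(uname -r).img | grep -i mpp   # confirm the mpp module is gone
dmesg | grep -i "command line"                           # verify the boot arguments in use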
Since RHEL 6.3 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as an exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
According to Comment 4, the multipath issue was not actually a bug.