Bug 804058 - mdraid over multipath fails to set up multipath and raid correctly
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Red Hat Kernel QE team
Reported: 2012-03-16 08:43 EDT by Franky Van Liedekerke
Modified: 2015-09-30 12:41 EDT
Doc Type: Bug Fix
Last Closed: 2015-09-30 12:41:40 EDT

Attachments
/var/log/messages (525.95 KB, text/plain)
2012-03-16 08:46 EDT, Franky Van Liedekerke
Description Franky Van Liedekerke 2012-03-16 08:43:01 EDT
Description of problem:

We have a classical setup of dmraid over multipath. We're experiencing exactly what's described here:
https://access.redhat.com/knowledge/solutions/48634 
To resolve:
- stop all dmraid raids
- multipath -F
- multipath -ll

Result: all multipaths are OK afterwards
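
The steps listed above can be sketched as a small shell function. This is only an illustration under assumptions: the names `run`, `reset_multipath_under_md` and `DRY_RUN` are made up for this sketch, and the plain `multipath` call (which recreates the maps) is taken from the transcript later in this report rather than from the list itself. By default nothing is executed and the commands are only printed.

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# execute it when DRY_RUN is set to 0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the workaround: pass the md arrays to stop.
reset_multipath_under_md() {
    local md
    for md in "$@"; do
        run mdadm --stop "$md"    # release the path devices held by md
    done
    run multipath -F              # flush all unused multipath maps
    run multipath                 # recreate the maps
    run multipath -ll             # show the resulting topology
}
```

Usage, for example: `reset_multipath_under_md /dev/md125 /dev/md126 /dev/md127` to preview, then the same call with `DRY_RUN=0` as root to actually run it.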

Version-Release number of selected component (if applicable):
mdadm-3.2.2-9.el6.x86_64
device-mapper-1.02.66-6.el6.x86_64
device-mapper-libs-1.02.66-6.el6.x86_64
device-mapper-event-libs-1.02.66-6.el6.x86_64
device-mapper-event-1.02.66-6.el6.x86_64
device-mapper-multipath-libs-0.4.9-46.el6.x86_64
device-mapper-multipath-0.4.9-46.el6.x86_64


How reproducible:
Always after a reboot, see also the attachment (/var/log/messages)


Actual results:
It seems that half of the devices are claimed by dmraid and the other half by multipath (active/passive):

[root@ ~]# cat /proc/mdstat 
Personalities : [raid1] 
md125 : active (auto-read-only) raid1 sdc[1]
      104856504 blocks super 1.2 [2/1] [_U]
      
md126 : active (auto-read-only) raid1 sdb[1]
      104856504 blocks super 1.2 [2/1] [_U]
      
md127 : active (auto-read-only) raid1 sdd[1]
      104856504 blocks super 1.2 [2/1] [_U]

[root@ ~]# multipath -ll
VIRT-A20-2 (3600a0b80003314be00005c5d4f31e66a) dm-4 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:2 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:2 sdg 8:96  active ghost running
VIRT-A20-1 (3600a0b80003314be00005c5b4f31e646) dm-3 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:1 sdl 8:176 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:1 sdf 8:80  active ghost running
VIRT-A20-0 (3600a0b80003314be00005c594f31e627) dm-2 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:0 sdk 8:160 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:0 sde 8:64  active ghost running
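
The symptom shown above (md arrays assembled directly on sdX paths instead of on multipath maps) can be checked for mechanically. The following is a minimal sketch, not an official tool: the function name `md_on_raw_paths` is hypothetical, and it assumes the /proc/mdstat layout shown in this report, where members appear as name[slot].

```shell
#!/bin/bash
# List md arrays whose members are raw SCSI disks (sdX) rather than
# device-mapper nodes (dm-N). Reads /proc/mdstat by default; a saved copy
# can be passed as the first argument.
md_on_raw_paths() {
    awk '
        # Array lines look like: "md125 : active (auto-read-only) raid1 sdc[1]"
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            raw = ""
            for (i = 3; i <= NF; i++) {
                # Member fields have the form name[slot], e.g. sdc[1] or dm-13[0]
                if ($i ~ /^sd[a-z]+[0-9]*\[[0-9]+\]/) {
                    member = $i
                    sub(/\[.*/, "", member)
                    raw = raw (raw == "" ? "" : ",") member
                }
            }
            if (raw != "") print $1 ": " raw
        }
    ' "${1:-/proc/mdstat}"
}
```

An array that is listed here sits on a single path; an array built on multipath maps (dm-N members) produces no output.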


Expected results:
[root@ ~]# mdadm --stop /dev/md125 /dev/md126 /dev/md127
mdadm: stopped /dev/md125
mdadm: stopped /dev/md126
mdadm: stopped /dev/md127
[root@ ~]# multipath
create: VIRT-B18-0 (3600a0b800029b4540000771a4f3207c4) undef STK,FLEXLINE 380
size=100G features='0' hwhandler='1 rdac' wp=undef
|-+- policy='round-robin 0' prio=6 status=undef
| `- 1:0:0:0 sdb 8:16  undef ready running
`-+- policy='round-robin 0' prio=1 status=undef
  `- 2:0:0:0 sdh 8:112 undef ghost running
create: VIRT-B18-1 (3600a0b800029b4540000771c4f3207e4) undef STK,FLEXLINE 380
size=100G features='0' hwhandler='1 rdac' wp=undef
|-+- policy='round-robin 0' prio=6 status=undef
| `- 1:0:0:1 sdc 8:32  undef ready running
`-+- policy='round-robin 0' prio=1 status=undef
  `- 2:0:0:1 sdi 8:128 undef ghost running
create: VIRT-B18-2 (3600a0b800029b4540000771e4f320807) undef STK,FLEXLINE 380
size=100G features='0' hwhandler='1 rdac' wp=undef
|-+- policy='round-robin 0' prio=6 status=undef
| `- 1:0:0:2 sdd 8:48  undef ready running
`-+- policy='round-robin 0' prio=1 status=undef
  `- 2:0:0:2 sdj 8:144 undef ghost running
[root@ ~]# multipath -ll
VIRT-B18-2 (3600a0b800029b4540000771e4f320807) dm-13 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:2 sdd 8:48  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:2 sdj 8:144 active ghost running
VIRT-B18-1 (3600a0b800029b4540000771c4f3207e4) dm-12 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:1 sdc 8:32  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:1 sdi 8:128 active ghost running
VIRT-B18-0 (3600a0b800029b4540000771a4f3207c4) dm-11 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 1:0:0:0 sdb 8:16  active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 2:0:0:0 sdh 8:112 active ghost running
VIRT-A20-2 (3600a0b80003314be00005c5d4f31e66a) dm-4 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:2 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:2 sdg 8:96  active ghost running
VIRT-A20-1 (3600a0b80003314be00005c5b4f31e646) dm-3 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:1 sdl 8:176 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:1 sdf 8:80  active ghost running
VIRT-A20-0 (3600a0b80003314be00005c594f31e627) dm-2 STK,FLEXLINE 380
size=100G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=6 status=active
| `- 2:0:1:0 sdk 8:160 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 1:0:1:0 sde 8:64  active ghost running




Additional info:
[root@ ~]# cat /etc/mdadm.conf 
DEVICE /dev/mapper/*

(we tried with /dev/disk/by-id as well, and with or without the actual raid configs in it, i.e. the output of mdadm --examine --scan)

[root@ ~]# dmsetup status
vg_00-lv_tmp: 0 4194304 linear 
vg_00-lv_home: 0 16777216 linear 
VIRT-A20-2: 0 209715200 multipath 2 0 1 0 2 1 A 0 1 0 8:192 A 0 E 0 1 0 8:96 A 0 
vg_00-lv_usr: 0 8388608 linear 
vg_00-lv_var: 0 8388608 linear 
VIRT-A20-1: 0 209715200 multipath 2 0 1 0 2 1 A 0 1 0 8:176 A 0 E 0 1 0 8:80 A 0 
VIRT-A20-0: 0 209715200 multipath 2 0 1 0 2 1 A 0 1 0 8:160 A 0 E 0 1 0 8:64 A 0 
vg_00-lv_switch: 0 62914560 linear 
vg_00-lv_swap: 0 4194304 linear 
vg_00-lv_root: 0 4194304 linear 
vg_00-lv_opt: 0 16777216 linear 

[root@ ~]# dmsetup table
vg_00-lv_tmp: 0 4194304 linear 8:2 16779264
vg_00-lv_home: 0 16777216 linear 8:2 2048
VIRT-A20-2: 0 209715200 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:192 1 round-robin 0 1 1 8:96 1 
vg_00-lv_usr: 0 8388608 linear 8:2 29362176
vg_00-lv_var: 0 8388608 linear 8:2 20973568
VIRT-A20-1: 0 209715200 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:176 1 round-robin 0 1 1 8:80 1 
VIRT-A20-0: 0 209715200 multipath 1 queue_if_no_path 1 rdac 2 1 round-robin 0 1 1 8:160 1 round-robin 0 1 1 8:64 1 
vg_00-lv_switch: 0 62914560 linear 8:2 37750784
vg_00-lv_swap: 0 4194304 linear 8:2 100665344
vg_00-lv_root: 0 4194304 linear 8:2 104859648
vg_00-lv_opt: 0 16777216 linear 8:2 109053952


[root@ ~]# more /etc/multipath.conf
blacklist {
        #devnode "cciss!c[0-9]d[0-9]*"
        #devnode "sda[0-9]*"
        wwid 3600508b1001cbe9603a597997098bbc3
}

defaults {
       polling_interval        15
       user_friendly_names yes
       verbosity                2
}

multipaths {
  # B18 6540
  multipath {
    alias VIRT-B18-2
    wwid 3600a0b800029b4540000771e4f320807
  }
  multipath {
    alias VIRT-B18-1
    wwid 3600a0b800029b4540000771c4f3207e4
  }
  multipath {
    alias VIRT-B18-0
    wwid 3600a0b800029b4540000771a4f3207c4
  }

  # A20 6540
  multipath {
    alias VIRT-A20-1
    wwid 3600a0b80003314be00005c5b4f31e646
  }
  multipath {
    alias VIRT-A20-2
    wwid 3600a0b80003314be00005c5d4f31e66a
  }
  multipath {
    alias VIRT-A20-0
    wwid 3600a0b80003314be00005c594f31e627
  }
}
Comment 1 Franky Van Liedekerke 2012-03-16 08:46:08 EDT
Created attachment 570596 [details]
/var/log/messages

Lots of boot errors:

Mar 16 13:00:53 MYHOSTNAME kernel: Buffer I/O error on device sdj, logical block 0
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: [sdi] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: [sdi] Sense Key : Illegal Request [current]
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: [sdi] <<vendor>> ASC=0x94 ASCQ=0x1ASC=0x94 ASCQ=0x1
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: [sdi] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00

And then (this might be an indication of why things go wrong):
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: multipath: version 1.3.0 loaded
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:0:0: rdac: LUN 0 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:0:1: rdac: LUN 1 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:0:2: rdac: LUN 2 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:1:0: rdac: LUN 0 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:1:1: rdac: LUN 1 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 1:0:1:2: rdac: LUN 2 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:0: rdac: LUN 0 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:1: rdac: LUN 1 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:0:2: rdac: LUN 2 (RDAC) (unowned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:1:0: rdac: LUN 0 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:1:1: rdac: LUN 1 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: sd 2:0:1:2: rdac: LUN 2 (RDAC) (owned)
Mar 16 13:00:53 MYHOSTNAME kernel: rdac: device handler registered
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:2: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: table: 253:5: multipath: error getting device
Mar 16 13:00:53 MYHOSTNAME kernel: device-mapper: ioctl: error adding target to table
Mar 16 13:00:53 MYHOSTNAME kernel: md: raid1 personality registered for level 1
Mar 16 13:00:53 MYHOSTNAME kernel: bio: create slab <bio-1> at 1
Mar 16 13:00:53 MYHOSTNAME kernel: md/raid1:md126: active with 1 out of 2 mirrors 
Mar 16 13:00:53 MYHOSTNAME kernel: md126: detected capacity change from 0 to 107373060096
Mar 16 13:00:53 MYHOSTNAME kernel: md126:
Mar 16 13:00:53 MYHOSTNAME kernel: md/raid1:md127: active with 1 out of 2 mirrors
Mar 16 13:00:53 MYHOSTNAME kernel: md127: detected capacity change from 0 to 107373060096 
Mar 16 13:00:53 MYHOSTNAME kernel: md127:
Mar 16 13:00:53 MYHOSTNAME kernel: md/raid1:md125: active with 1 out of 2 mirrors 
Mar 16 13:00:53 MYHOSTNAME kernel: md125: detected capacity change from 0 to 107373060096
Mar 16 13:00:53 MYHOSTNAME kernel: md125: unknown partition table
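
The "error getting device" lines above are logged when the kernel cannot open one of the path devices named in a multipath table. To see which device-mapper devices are affected and how often, the log can be summarized with a short sketch like the one below. The function name `dm_table_errors` is hypothetical, and the parsing assumes the syslog line format shown in this comment.

```shell
#!/bin/bash
# Count "error getting device" messages per device-mapper major:minor.
# Reads /var/log/messages by default; another file can be passed as $1.
dm_table_errors() {
    awk '
        /device-mapper: table: .*error getting device/ {
            for (i = 1; i < NF; i++) {
                if ($i == "table:") {
                    dev = $(i + 1)
                    sub(/:$/, "", dev)    # "253:2:" -> "253:2"
                    count[dev]++
                }
            }
        }
        END { for (dev in count) print dev, count[dev] }
    ' "${1:-/var/log/messages}" | sort
}
```

Each output line has the form "major:minor count". The minor number can then be compared against the dm-N names reported by `multipath -ll` once the maps exist.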
Comment 3 Franky Van Liedekerke 2012-03-16 09:05:18 EDT
Btw, after the commands
[root@ ~]# mdadm --stop /dev/md125 /dev/md126 /dev/md127
mdadm: stopped /dev/md125
mdadm: stopped /dev/md126
mdadm: stopped /dev/md127
[root@ ~]# multipath
<snip>

The raid devices are there (and using dm-devices) after a scan:
[root@ ~]# mdadm --assemble --scan
mdadm: /dev/md/0 has been started with 2 drives.
mdadm: /dev/md/1 has been started with 2 drives.
mdadm: /dev/md/2 has been started with 2 drives.
[root@ ~]# cat /proc/mdstat 
Personalities : [raid1] 
md2 : active raid1 dm-13[0] dm-4[1]
      104856504 blocks super 1.2 [2/2] [UU]
      
md1 : active raid1 dm-12[0] dm-3[1]
      104856504 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 dm-11[0] dm-2[1]
      104856504 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
Comment 4 Franky Van Liedekerke 2012-03-16 09:56:52 EDT
Update: it seems that an old module installed by somebody else caused this. Running "dracut -o mpp" removed that module, and multipath is now OK. Now I just need to figure out why mdraid is not starting: I have to run "mdadm --assemble --scan" after boot for the raid devices to show up. Could it be that the following /etc/mdadm.conf is not OK:

DEVICE /dev/mapper/*
ARRAY /dev/md/0 metadata=1.2 UUID=c83042c0:99b95f57:927e18ad:0a8008c8 name=MYHOSTNAME:0
ARRAY /dev/md/1 metadata=1.2 UUID=14789ac3:b7a1d7df:6055c282:cc3c8ee3 name=MYHOSTNAME:1
ARRAY /dev/md/2 metadata=1.2 UUID=41ac393c:03b17717:bb89d8b4:eeef4176 name=MYHOSTNAME:2
Comment 5 Franky Van Liedekerke 2012-03-16 10:20:28 EDT
Btw, during my tests I removed "rd_NO_MD" and "rd_NO_DM" from the kernel command line at boot. Could this be the cause of my raid not being there after reboot?
Comment 6 Heinz Mauelshagen 2012-03-16 10:24:07 EDT
mdraid rather than dmraid.
Summary fixed.
Comment 7 Heinz Mauelshagen 2012-03-16 10:28:03 EDT
(In reply to comment #5)
> Btw, during my tests, I removed "rd_no_MD" and "rd_NO_DM" from the kernel
> command line at boot, could this be the cause of my raid not being there after
> reboot?

Those disable MD RAID and DM RAID, respectively.
Since you want mdraid, could you try with the latter (rd_NO_DM) on the command line and report whether that fixes the issue?
Comment 8 Franky Van Liedekerke 2012-03-16 11:03:48 EDT
Sorry, I just noticed I had the old notation put in place: nodmraid was in the kernel line. From dmesg output:

[root@ ~]# dmesg |grep -i "command line"
Command line: ro root=/dev/mapper/vg_00-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_00/lv_root quiet SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=vg_00/lv_swap  KEYBOARDTYPE=pc KEYTABLE=be-latin1 nodmraid elevator=deadline 
Kernel command line: ro root=/dev/mapper/vg_00-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_00/lv_root quiet SYSFONT=latarcyrheb-sun16 crashkernel=130M@0M rd_LVM_LV=vg_00/lv_swap  KEYBOARDTYPE=pc KEYTABLE=be-latin1 nodmraid elevator=deadline 

[root@ ~]# dmesg |grep dracut
dracut: dracut-004-256.el6
dracut: rd_NO_LUKS: removing cryptoluks activation
dracut: Starting plymouth daemon
dracut: rd_NO_DM: removing DM RAID activation
dracut: rd_NO_MDIMSM: no MD RAID for imsm/isw raids
dracut: Scanning devices sda2  for LVM logical volumes vg_00/lv_root vg_00/lv_swap 
dracut: inactive '/dev/vg_00/lv_home' [8.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_tmp' [2.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_var' [4.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_usr' [4.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_swap' [2.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_root' [2.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_opt' [8.00 GiB] inherit
dracut: inactive '/dev/vg_00/lv_switch' [30.00 GiB] inherit
dracut: Mounted root filesystem /dev/mapper/vg_00-lv_root
dracut: 
dracut: Switching root
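
To avoid reading the long command line by eye, the RAID-related dracut options can be filtered out with a short sketch like the following. The function name `dracut_opts` is hypothetical, and the filter only covers the option spellings that appear in this report (the `rd_` prefix and the older `nodmraid`).

```shell
#!/bin/bash
# Print the dracut-style options found on a kernel command line, one per
# line. Reads /proc/cmdline by default; a saved copy can be passed as $1.
dracut_opts() {
    # Split on spaces (squeezing repeats), then keep rd_* options and nodmraid.
    tr -s ' ' '\n' < "${1:-/proc/cmdline}" | grep -E '^(rd_|nodmraid$)'
}
```

On the command line above this would list rd_NO_LUKS, the two rd_LVM_LV entries and nodmraid, which shows at a glance that no rd_NO_MD is present.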

Since rd_NO_DM and nodmraid are the same for dracut, that won't change a thing. But I believe I read somewhere that rd_NO_MD only postpones the initialization of MD RAID until after the init phase has finished? Anyway, it seems the following lines in rc.sysinit don't have any effect now:

# Start any MD RAID arrays that haven't been started yet
[ -r /proc/mdstat -a -r /dev/md/md-device-map ] && /sbin/mdadm -IRs

Running "/sbin/mdadm -IRs" manually didn't change anything either, but "/sbin/mdadm -ARs" does work.
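
For context on the two invocations: as the mdadm man page describes it, -IRs (--incremental --run --scan) only starts arrays that are already listed in the map file as partially assembled, while -ARs (--assemble --run --scan) scans the configured devices and assembles arrays from scratch. In addition, the rc.sysinit line quoted above runs mdadm only when both files in its guard are readable. The sketch below mirrors that guard so it can be checked by hand; the function name `md_boot_action` is hypothetical and the default paths are copied from the quoted line.

```shell
#!/bin/bash
# Print the command the quoted rc.sysinit line would run, or report that
# the guard fails. Optional arguments override the two paths being tested.
md_boot_action() {
    local mdstat="${1:-/proc/mdstat}"
    local map="${2:-/dev/md/md-device-map}"
    if [ -r "$mdstat" ] && [ -r "$map" ]; then
        echo "/sbin/mdadm -IRs"
    else
        echo "skipped: $mdstat or $map is not readable"
    fi
}
```

If this prints "skipped", rc.sysinit never calls mdadm at all. If it prints the command but the arrays still do not appear, that would be consistent with no incremental assembly having been recorded for the multipath devices, which is only a guess here.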
Comment 9 RHEL Product and Program Management 2012-05-03 01:41:03 EDT
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 10 Ben Marzinski 2015-09-30 12:41:40 EDT
According to Comment 4, the multipath issue was not actually a bug.
