Bug 158670 - mdadm fails to create array in multipath environment
Summary: mdadm fails to create array in multipath environment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: mdadm
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Doug Ledford
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 168429
 
Reported: 2005-05-24 18:44 UTC by David Milburn
Modified: 2007-11-30 22:07 UTC (History)
2 users

Fixed In Version: RHBA-2006-0122
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-07 18:54:21 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2006:0122 0 qe-ready SHIPPED_LIVE mdadm bug fix update 2006-03-06 05:00:00 UTC

Description David Milburn 2005-05-24 18:44:16 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
Customer cannot use the mdadm command in a multipath environment:

# mdadm -C /dev/md0 --level=multipath --raid-devices=2 /dev/sdxe1 /dev/sddg1
mdadm: ADD_NEW_DISK for /dev/sdxe1 failed: Value too large for defined data type

Customer can reproduce every time with a disk configuration consisting of
three targets.

Version-Release number of selected component (if applicable):
mdadm-1.6.0-2

How reproducible:
Always

Steps to Reproduce:
1. # mdadm -C /dev/md0 --level=multipath --raid-devices=2 /dev/sdxe1 /dev/sddg1

Actual Results:  # mdadm -C /dev/md0 --level=multipath --raid-devices=2 /dev/sdxe1 /dev/sddg1
mdadm: ADD_NEW_DISK for /dev/sdxe1 failed: Value too large for defined data type
# cat /proc/mdstat
Personalities : [multipath]
md0 : inactive
unused devices : <none>


Expected Results:  # mdadm -C /dev/md0 --level=multipath --raid-devices=2 /dev/sdxe1 /dev/sddg1
mdadm: array /dev/md0 started.
# cat /proc/mdstat
Personalities : [multipath]
md0 : active multipath sdxe1[1] sddg1[0]
3874688 blocks [2/2] [UU]
unused devices: <none>


Additional info:

Hardware info:
The disk configuration consists of three targets, each on a single path.
Target: 230000004c7f0761
      LUN0  1.9TB (LUN0+LUN1 exceeds 2TB)
      LUN1  130GB
      LUN2  4GB
      ...
      LUN253  4GB
Target: 210000004c7f0761
      LUN0  4GB
      ...
      LUN251  4GB
Target: 230000004c517f49
       LUN2  286GB

NEC examined the issue in detail and reported the following:

The cause of this problem is that Create() in mdadm calls macros which do not support the extended major/minor numbers
in kernel 2.6. It should be fixed by changing Create() in mdadm as follows.
  disk.major = MAJOR(stb.st_rdev)
  disk.minor = MINOR(stb.st_rdev)
should be changed to
  disk.major = major(stb.st_rdev)
  disk.minor = minor(stb.st_rdev)

Comment 2 Doug Ledford 2005-07-21 19:46:59 UTC
There is a test RPM at http://people.redhat.com/dledford/mdadm-1.6.0-3.i386.rpm

If you could please test this to see if it resolves your problem I would
appreciate it (I don't have access to any machines with enough devices attached
to them to trigger this problem).

Comment 12 Albert Graham 2005-11-06 21:25:47 UTC
I can confirm that this also fixes the same problem when using mdadm -C with AoE
RAID devices. Should this fix have been in RHEL 4 U2? Will it be in U3?

[root@server3 latest]# rpm -qa|grep mdadm
mdadm-1.6.0-2
[root@server3 latest]# rpm -Uvh mdadm-1.6.0-3.i386.rpm
Preparing...                ########################################### [100%]
   1:mdadm                  ########################################### [100%]
[root@server3 latest]# mdadm --stop /dev/md0
[root@server3 latest]# mdadm -C /dev/md0 -l 1 -n 2 /dev/etherd/e1.1 /dev/etherd/e1.2
mdadm: array /dev/md0 started.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ :)
[root@server3 latest]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 etherd/e1.2[1] etherd/e1.1[0]
      4194240 blocks [2/2] [UU]
      [>....................]  resync =  0.7% (32128/4194240) finish=66.7min speed=1036K/sec
unused devices: <none>

Comment 14 Doug Ledford 2005-11-29 16:20:47 UTC
Albert, can you please post the output of ls -l /dev/etherd/e1.[12] into the
bugzilla so I can verify that it is using large kdev types.  If you can do that
for me, then I'll go ahead and push this fix for U3.

Comment 15 Albert Graham 2005-11-29 18:39:43 UTC
Hi Doug,

The system seems to support high minor numbers OK. I have since changed my config so that
one mirror is a local disk (sdb) and the other is an AoE disk (e3.5).

It's been up since the previous post, no problems.

Personalities : [raid1]
md0 : active raid1 etherd/e3.5p1[1] sdb1[0]
      732379136 blocks [2/2] [UU]

unused devices: <none>

# ls -l /dev/etherd/
total 0
brw-------  1 root disk 152, 848 Nov 13 22:54 e3.5
brw-------  1 root disk 152, 849 Nov 13 22:54 e3.5p1

Albert.

Comment 16 Doug Ledford 2005-11-29 19:55:49 UTC
Thanks Albert, that's exactly the confirmation I needed.  This verifies that it
creates arrays with high minor numbers OK. Could you also verify that operations
like mdadm -E, mdadm -A, and mdadm -S all operate properly on a device with high
minor numbers?

Comment 17 Doug Ledford 2005-12-02 09:15:10 UTC
This change has been committed to CVS.

Comment 18 Albert Graham 2005-12-02 10:13:58 UTC
OK, my /etc/mdadm.conf contains:
DEVICE partitions
ARRAY /dev/md0 devices=/dev/sdb1,/dev/etherd/e3.5p1


requested output as follows:

[root@server2 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 etherd/e3.5p1[1] sdb1[0]
      732379136 blocks [2/2] [UU]

unused devices: <none>

[root@server2 ~]# mdadm -S /dev/md0

[root@server2 ~]# cat /proc/mdstat
Personalities : [raid1]
unused devices: <none>

[root@server2 ~]# mdadm -A /dev/md0
mdadm: /dev/md0 has been started with 2 drives.

[root@server2 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0] etherd/e3.5p1[1]
      732379136 blocks [2/2] [UU]
unused devices: <none>

[root@server2 ~]# mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b8e6f99a:b1bdf09d:eb4c6088:11eff8f9
  Creation Time : Mon Nov 14 05:04:46 2005
     Raid Level : raid1
    Device Size : 732379136 (698.45 GiB 749.96 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Fri Nov 18 04:19:13 2005
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : c3a959e5 - correct
         Events : 0.3254


      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1
   0     0       8       17        0      active sync   /dev/sdb1
   1     1     152      849        1      active sync   /dev/etherd/e3.5p1


[root@server2 ~]# mdadm -E /dev/etherd/e3.5p1
/dev/etherd/e3.5p1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b8e6f99a:b1bdf09d:eb4c6088:11eff8f9
  Creation Time : Mon Nov 14 05:04:46 2005
     Raid Level : raid1
    Device Size : 732379136 (698.45 GiB 749.96 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Fri Nov 18 04:19:13 2005
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : c3a95db7 - correct
         Events : 0.3254


      Number   Major   Minor   RaidDevice State
this     1     152      849        1      active sync   /dev/etherd/e3.5p1
   0     0       8       17        0      active sync   /dev/sdb1
   1     1     152      849        1      active sync   /dev/etherd/e3.5p1


Hope this helps.

Albert.

Comment 22 Red Hat Bugzilla 2006-03-07 18:54:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0122.html


