Bug 683605 - raid creation doesn't do the right thing with --spares during kickstart
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: anaconda
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: David Lehman
QA Contact: Release Test Team
Depends On: 690469
Blocks:
Reported: 2011-03-09 15:30 EST by Jeff Vier
Modified: 2011-05-19 08:38 EDT (History)
CC: 2 users

See Also:
Fixed In Version: anaconda-13.21.107-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 683956
Environment:
Last Closed: 2011-05-19 08:38:48 EDT




External Trackers:
Red Hat Product Errata RHBA-2011:0530 (normal, SHIPPED_LIVE): anaconda bug fix and enhancement update, last updated 2011-05-18 13:44:52 EDT

Description Jeff Vier 2011-03-09 15:30:58 EST
Description of problem:
This config (snippet from my actual kickstart config):
 part raid.0 --ondisk=sda --asprimary --fstype ext4 --size=10000 --grow
 part raid.1 --ondisk=sdb --asprimary --fstype ext4 --size=10000 --grow
 part raid.2 --ondisk=sdc --asprimary --fstype ext4 --size=10000 --grow
 raid /     --level=1 --device=md0 --spares=1  --fstype=ext4 raid.0 raid.1 raid.2

Should result in a two-device mirror + one spare.

Instead, it's a three-device mirror with no spare.


Also:
 raid /data --level=10 --device=md2 --spares=1 --fstype=ext4 raid.20 raid.21 raid.22 raid.23 raid.24 raid.25 raid.26 raid.27 raid.28

Results in a nine-device RAID 10 array.  I'm not even sure how that's possible :)
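In both examples the expected split follows directly from the member count: active devices = member partitions minus --spares. A trivial shell check of that arithmetic (a sketch; the numbers are taken from the two snippets above, and expected_active is a hypothetical helper, not part of kickstart or mdadm):

```shell
# Expected active-device count: total member partitions minus requested spares.
expected_active() {
    echo $(( $1 - $2 ))   # $1 = member partitions, $2 = --spares
}

echo "md0: $(expected_active 3 1) active + 1 spare"   # should be a 2-device mirror
echo "md2: $(expected_active 9 1) active + 1 spare"   # should be an 8-device RAID10
```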


Version-Release number of selected component (if applicable):
uname -a:
Linux sync13.db.scl2.svc.mozilla.com 2.6.32-71.el6.x86_64 #1 SMP Wed Sep 1 01:33:01 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

mdadm is v1.1, if that matters.

How reproducible:
Every time.

Steps to Reproduce:
1. Configure kickstart as above
2. kickstart
3. mdadm --detail /dev/mdN

Note "Active Devices" vs "Spare Devices"

Further confirmed with `iostat -dx 1 | grep sd[abc]` during a heavy disk write: all three devices are in lockstep in their write load.
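The `mdadm --detail` check above can be scripted; a minimal sketch that pulls the two counters out of the report (here fed a canned copy of the buggy counters from this bug, since running mdadm itself needs a live array; count_field is a hypothetical helper):

```shell
# Extract a numeric field such as "Active Devices" or "Spare Devices"
# from `mdadm --detail` output supplied on stdin.
count_field() {   # $1 = field name
    awk -F': *' -v f="$1" '$1 ~ f { gsub(/ /, "", $2); print $2 }'
}

detail='
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
'

active=$(echo "$detail" | count_field "Active Devices")
spares=$(echo "$detail" | count_field "Spare Devices")
echo "active=$active spares=$spares"   # buggy result: active=3 spares=0
```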
  
Actual results:
No spares -- all devices are "active".

Expected results:
Number of spares specified used as spares.

Additional info:
Examples above are ext4, but I have reproduced with ext2, as well.
Comment 1 Jeff Vier 2011-03-09 15:41:24 EST
Also, in the meantime, can you recommend a workaround to "convert" an active device to a spare?
Comment 3 David Lehman 2011-03-09 16:57:56 EST
Look at the mdadm man page, in the section titled "MANAGE MODE" (you can search for that text). It gives an example of a single command to do exactly what you want done.
Comment 4 Jeff Vier 2011-03-09 17:15:57 EST
(In reply to comment #3)
> Look at the mdadm man page, in the section titled "MANAGE MODE" (you can search
> for that text). It gives an example of a single command to do exactly what you
> want done.

Yeah, I tried that -- it does not re-add the device as a spare; it just comes back in as an active member.

# mdadm /dev/md127 -f /dev/sdc2 -r /dev/sdc2 -a /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md127
mdadm: hot removed /dev/sdc2 from /dev/md127
mdadm: re-added /dev/sdc2

# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.0
  Creation Time : Wed Feb  9 00:56:40 2011
     Raid Level : raid1
     Array Size : 102388 (100.01 MiB 104.85 MB)
  Used Dev Size : 102388 (100.01 MiB 104.85 MB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Wed Mar  9 14:14:39 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

           Name : localhost.localdomain:1
           UUID : e3182825:0d34dd8a:a03bddb0:9006954f
         Events : 80

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
Comment 5 David Lehman 2011-03-09 18:31:30 EST
If you read the description of the --add command you'll see how that behavior is determined. It looks like you'd need to wipe the device of all metadata in order for it to be re-added as a spare instead of as an active member. The wipefs utility can do that between the remove and add operations. It would look like this:

  mdadm /dev/md127 -f /dev/sdc2
  mdadm /dev/md127 -r /dev/sdc2
  wipefs -a /dev/sdc2
  mdadm /dev/md127 -a /dev/sdc2

WARNING: the wipefs command will invalidate/destroy any data on /dev/sdc2
Comment 6 Jeff Vier 2011-03-09 18:44:24 EST
Still no dice (I had actually done an attempt at the same kind of thing with dd, pushing /dev/zero and then /dev/urandom onto /dev/sdc2).

Here's what I did just now (per your instructions):

 # mdadm --manage /dev/md127 --fail /dev/sdc2
mdadm: set /dev/sdc2 faulty in /dev/md127
 # mdadm --manage /dev/md127 --remove faulty
mdadm: hot removed 8:34 from /dev/md127
 # wipefs -a /dev/sdc2
2 bytes [53 ef] erased at offset 0x438 (ext2)
4 bytes [fc 4e 2b a9] erased at offset 0x63fe000 (linux_raid_member)
 # mdadm --manage /dev/md127 --add /dev/sdc2
mdadm: added /dev/sdc2
 # mdadm --detail /dev/md127
/dev/md127:
        Version : 1.0
  Creation Time : Wed Feb  9 00:56:40 2011
     Raid Level : raid1
     Array Size : 102388 (100.01 MiB 104.85 MB)
  Used Dev Size : 102388 (100.01 MiB 104.85 MB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Wed Mar  9 15:43:13 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

           Name : localhost.localdomain:1
           UUID : e3182825:0d34dd8a:a03bddb0:9006954f
         Events : 220

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       3       8       34        2      active sync   /dev/sdc2
Comment 7 Jeff Vier 2011-03-10 14:36:05 EST
Should *that* behavior (mdadm really, really wanting to drag the device back in as an active member, despite what the man page describes) be a separate bug?
Comment 8 David Lehman 2011-03-10 14:41:24 EST
Yes -- it probably should.
Comment 9 Jeff Vier 2011-03-10 15:23:31 EST
(In reply to comment #8)
> Yes -- it probably should.

done in bug 683976
Comment 10 RHEL Product and Program Management 2011-03-16 12:30:01 EDT
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.
Comment 13 Jan Stodola 2011-04-04 02:58:49 EDT
Tested on build RHEL6.1-20110330.2 with anaconda-13.21.108-1.el6, using a kickstart with the following part:

part raid.0 --ondisk=dasdc --asprimary --fstype ext4 --size=2000 --grow
part raid.1 --ondisk=dasdd --asprimary --fstype ext4 --size=2000 --grow
part raid.2 --ondisk=dasde --asprimary --fstype ext4 --size=2000 --grow
raid /      --level=1 --device=md0 --spares=1  --fstype=ext4 raid.0 raid.1 raid.2

After the installation, there was one spare device:

[root@rtt6 ~]# mdadm --detail /dev/md0 
/dev/md0:
        Version : 1.1
  Creation Time : Mon Apr  4 02:52:48 2011
     Raid Level : raid1
     Array Size : 2402952 (2.29 GiB 2.46 GB)
  Used Dev Size : 2402952 (2.29 GiB 2.46 GB)
   Raid Devices : 2
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Apr  4 02:58:53 2011
          State : active
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

           Name : 
           UUID : 9afb610e:50af809f:6e17fcbe:bbde0244
         Events : 32

    Number   Major   Minor   RaidDevice State
       0      94        1        0      active sync   /dev/dasda1
       1      94        5        1      active sync   /dev/dasdb1

       2      94        9        -      spare   /dev/dasdc1

Moving to VERIFIED.
Comment 14 errata-xmlrpc 2011-05-19 08:38:48 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0530.html
