Bug 1156614

Summary: mdraid set name different between anaconda and installed system - causes failure of installed system to boot
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: anaconda
Assignee: Brian Lane <bcl>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 21
CC: agk, anaconda-maint-list, dledford, g.kaviyarasu, Jes.Sorensen, jonathan, kparal, robatino, vanmeeuwen+fedora
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version: lorax-21.28-1.fc21
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1158110 (view as bug list)
Environment:
Last Closed: 2014-11-16 14:42:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Adam Williamson 2014-10-24 19:07:52 UTC
Testing Fedora 21 Beta RC1 Server x86_64 DVD install to an Intel firmware RAID set (RAID-0), with https://dlehman.fedorapeople.org/updates/updates-1156534.2.img to avoid bug #1156534.

The installation completes successfully, but the installed system fails to boot. It fails because it can't mount /boot. /etc/fstab has this:

/dev/md/Volume0_0p1     /boot     (normal options)

but what actually exists in /dev/md is:

/dev/md/imsm0 -> ../md127
/dev/md/Volume0 -> ../md126
/dev/md/Volume0p1 -> ../md126p1
/dev/md/Volume0p2 -> ../md126p2

Note: Volume0, not Volume0_0.

/etc/mdadm.conf has this:

# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/Volume0_0 UUID=4786ac39-ee6f-1bea-4821-6193be13503b
ARRAY /dev/md/imsm0 UUID=213b64a5-0337-c96b-f87c-2c75dfd16a99

If I reboot and check in the anaconda environment, then within anaconda the entries in /dev/md are indeed Volume0_0, Volume0_0p1 and Volume0_0p2. So they have different names in the anaconda environment vs. the booted system, but I'm not sure why.
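
A minimal sketch of the mismatch described above, using only the names quoted in this report. The function name `missing_devices` is made up for illustration; it simply compares the device that /etc/fstab asks for against the nodes that exist on the installed system.

```python
# Entries taken from the report: what /etc/fstab asks for versus what
# the installed system actually created under /dev/md.
fstab_specs = ["/dev/md/Volume0_0p1"]
existing_nodes = {
    "/dev/md/imsm0",
    "/dev/md/Volume0",
    "/dev/md/Volume0p1",
    "/dev/md/Volume0p2",
}


def missing_devices(specs, nodes):
    """Return the fstab device specs that have no matching node."""
    return [spec for spec in specs if spec not in nodes]


# The /boot entry points at a node that only exists inside the installer.
print(missing_devices(fstab_specs, existing_nodes))
```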

Comment 1 Adam Williamson 2014-10-24 19:09:05 UTC
Proposing as a Beta blocker per fw RAID criterion: "The installer must be able to detect and install to hardware or firmware RAID storage devices."

https://fedoraproject.org/wiki/Fedora_21_Beta_Release_Criteria#Hardware_and_firmware_RAID

Comment 2 Adam Williamson 2014-10-24 19:21:18 UTC
For the record, the earlier bug that dlehman's update addresses is https://bugzilla.redhat.com/show_bug.cgi?id=1156354 (6354, not 6534).

Comment 3 David Lehman 2014-10-24 19:25:48 UTC
I think the problem boils down to the following: when the array is auto-assembled during installer boot, it gets the '_0' suffix since there is no hostname set (and probably no homehost in the fwraid superblock, either). When we reboot after installation, the array is again auto-assembled, but this time it is treated as local (no '_0' suffix).

Doug (or Jes), can you explain this a bit further so I can have enough information to find a way to make the names match across reboot?
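
The hypothesis above can be illustrated with a toy model. This is not mdadm's actual logic, and `assembled_name` is a hypothetical helper; it only reproduces the observed outcome that a set treated as local keeps its name while one that is not gets a '_N' suffix.

```python
def assembled_name(array_name, treated_as_local, suffix_index=0):
    """Toy model of the naming described in comment 3.

    NOT mdadm's real code: it only mirrors the observed result that a
    set treated as local keeps its name, otherwise it gets '_N' appended.
    """
    if treated_as_local:
        return array_name
    return f"{array_name}_{suffix_index}"


# Installer boot: no hostname set, so the set is not treated as local.
installer_name = assembled_name("Volume0", treated_as_local=False)
# Installed system: the set is treated as local.
installed_name = assembled_name("Volume0", treated_as_local=True)

print(installer_name, installed_name)
```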

Comment 4 Adam Williamson 2014-10-24 20:56:39 UTC
dlehman provided a further updates image:

https://dlehman.fedorapeople.org/updates/updates-1156534.4.img

which includes his proposed fix for #1156354 and a fix for this bug. It doesn't change the fact that the set is 'Volume0_0' in anaconda but 'Volume0' on the installed system, but it identifies the /boot partition by UUID rather than by /dev/md partition node name when writing /etc/fstab. That approach does solve the problem at least in my test scenario: for the first time on F21, with this updates.img on top of Beta RC1, I can install to an fwraid set and boot the installed system.

I'm not gonna pretend I can evaluate whether this approach might cause other problems somehow, but I can say at least that it fixes my test case.
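
A rough sketch of the difference between the two ways of writing the /boot entry. The helper names are hypothetical, the filesystem type `ext4` is an assumption, and the UUID is a placeholder, since the report does not give the filesystem UUID (the UUIDs in mdadm.conf identify the arrays, not the filesystem on the partition).

```python
def uuid_spec(fs_uuid):
    """Build the first fstab field for a filesystem identified by UUID."""
    return f"UUID={fs_uuid}"


def fstab_line(spec, mountpoint, fstype, options="defaults", dump=0, passno=0):
    """Format the six whitespace-separated fields of one fstab entry."""
    return f"{spec}\t{mountpoint}\t{fstype}\t{options}\t{dump} {passno}"


# Placeholder value: the real filesystem UUID is not in the report.
PLACEHOLDER_UUID = "00000000-0000-0000-0000-000000000000"

# Entry keyed on the node name, which differs between installer and
# installed system.
by_node = fstab_line("/dev/md/Volume0_0p1", "/boot", "ext4")
# Entry keyed on the filesystem UUID, which does not depend on the
# name the md set is given at assembly time.
by_uuid = fstab_line(uuid_spec(PLACEHOLDER_UUID), "/boot", "ext4")

print(by_node)
print(by_uuid)
```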

Comment 5 Kamil Páral 2014-10-27 12:09:33 UTC
I can confirm this bug also on my fwraid machine (same symptoms), and the linked updates.img fixes the problem for me as well.

Comment 6 Doug Ledford 2014-10-27 14:53:35 UTC
The best way to make sure this isn't an issue during install is to simply turn off the HOMEHOST checking during install.  A simple mdadm.conf file in the install image that has these two lines would do the trick:

AUTO -all
HOMEHOST <any>

Comment 7 Adam Williamson 2014-10-27 19:04:25 UTC
Discussed at 2014-10-27 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2014-10-27/f21-blocker-review.2014-10-27-16.01.log.txt . Accepted as a blocker per criterion cited in c#1.

Comment 8 Brian Lane 2014-10-28 15:18:06 UTC
The plan here is to:

1. Add a mdadm.conf to /usr/share/anaconda/
2. Have lorax move this to /etc/ in the rootfs
3. Live spins will move this into place in %post

I'll handle this for anaconda and lorax using this bug and clone it for spin-kickstarts.

Comment 9 Adam Williamson 2014-10-28 15:30:01 UTC
Note that anaconda is already writing a /etc/mdadm.conf to the installed system, though I don't know the details of how, when, or why. I included its contents in my initial report:

# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/Volume0_0 UUID=4786ac39-ee6f-1bea-4821-6193be13503b
ARRAY /dev/md/imsm0 UUID=213b64a5-0337-c96b-f87c-2c75dfd16a99

Comment 10 Adam Williamson 2014-10-28 15:30:39 UTC
Oh, I see: this is about having one in the installer environment. Gotcha.

Comment 11 Doug Ledford 2014-10-28 16:58:34 UTC
(In reply to bcl from comment #8)
> The plan here is to:
> 
> 1. Add a mdadm.conf to /usr/share/anaconda/

Sure.

> 2. Have lorax move this to /etc/ in the rootfs

OK.  As I recall, lorax is the name of our engine that builds the install image from the repos, so this translates to "The install DVD will have this file in the /etc/ directory as burned to disc."  All good there.

> 3. Live spins will move this into place in %post

This I'm not so sure about.  Do you mean that when the live spin boots up, some %post section will move the file to the live spin's /etc, or do you mean if you do an install from a live spin, then the file will get moved over in the %post of the install to /mnt/sysimage/etc and therefore be on the final, installed root filesystem?  If the latter, that would be a mistake.  The sample mdadm.conf I gave you would prevent the Volume0_0 stuff, but it also prevents auto assembly, which you do *not* want on the final installed system.  Anaconda takes care of array assembly during the detection and creation phase of disk operations so it's safe (and even desirable) in the installer.

> I'll handle this for anaconda and lorax using this bug and clone it for
> spin-kickstarts.

Comment 12 Brian Lane 2014-10-28 17:07:52 UTC
(In reply to Doug Ledford from comment #11)
> (In reply to bcl from comment #8)
> > The plan here is to:
> > 
> > 1. Add a mdadm.conf to /usr/share/anaconda/
> 
> Sure.
> 
> > 2. Have lorax move this to /etc/ in the rootfs
> 
> OK.  As I recall, lorax is the name of our engine that builds the install
> image from the repos, so this translates to "The install DVD will have this
> file in the /etc/ directory as burned to disc."  All good there.
> 

correct.

> > 3. Live spins will move this into place in %post
> 
> This I'm not so sure about.  Do you mean that when the live spin boots up,
> some %post section will move the file to the live spin's /etc, or do you

Correct, it needs to happen at live boot time so that it doesn't show up in the filesystem that gets copied to the target system. This can be done in the same script that creates the live user and sets up the autologin.
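
Neither the lorax templates nor the live spin scripts appear in this report, so the following is only a Python illustration of the file move in steps 2 and 3, run against a scratch directory. The function `place_mdadm_conf` and the placeholder file content are assumptions; only the two paths come from the plan in comment 8.

```python
import shutil
import tempfile
from pathlib import Path


def place_mdadm_conf(root):
    """Move the shipped file into etc/ under the given root directory."""
    root = Path(root)
    source = root / "usr/share/anaconda/mdadm.conf"
    target = root / "etc/mdadm.conf"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target


# Demonstrate against a throwaway directory rather than a real system.
with tempfile.TemporaryDirectory() as scratch:
    shipped = Path(scratch) / "usr/share/anaconda/mdadm.conf"
    shipped.parent.mkdir(parents=True)
    shipped.write_text("# installer-only mdadm.conf (placeholder content)\n")

    placed = place_mdadm_conf(scratch)
    placed_text = placed.read_text()
    source_still_present = shipped.exists()

print(placed.name, source_still_present)
```

Because the move happens in the running installer or live environment, the file never becomes part of the filesystem that is copied to the target system, which is the point made in the reply above.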

Comment 13 Doug Ledford 2014-10-28 17:21:59 UTC
OK, sounds like a good plan to me then.  Thanks for the clarification.

Comment 14 Fedora Update System 2014-10-28 18:17:49 UTC
anaconda-21.48.13-1.fc21, python-blivet-0.61.8-1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/python-blivet-0.61.8-1.fc21,anaconda-21.48.13-1.fc21

Comment 15 Adam Williamson 2014-10-28 18:56:45 UTC
Doug, dlehman and I are both a bit suspicious of the mdadm.conf snippet you provided:

AUTO -all
HOMEHOST <any>

From 'man mdadm.conf', it doesn't look like it does the right thing at all. It seems to disable automatic assembly of arrays entirely, without actually doing anything to disable HOMEHOST checking (the <any> value isn't documented for HOMEHOST at all).

We both would've expected this:

HOMEHOST <ignore>

Are we misunderstanding, is the manpage wrong, or is the snippet you suggested wrong? Thanks!
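
To make the comparison concrete, here is a small sketch that pulls the directives out of the two snippets under discussion. `parse_directives` is a hypothetical helper and a deliberate simplification: it does not handle continuation lines or any other detail of how mdadm itself reads the file.

```python
def parse_directives(conf_text):
    """Map each keyword to the list of argument lists found for it.

    Simplified reader: drops '#' comments and blank lines, and does not
    handle continuation lines the way mdadm itself would.
    """
    directives = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, *args = line.split()
        directives.setdefault(keyword.upper(), []).append(args)
    return directives


suggested = "AUTO -all\nHOMEHOST <any>\n"   # snippet from comment 6
expected = "HOMEHOST <ignore>\n"            # what comment 15 expected

for label, text in (("comment 6", suggested), ("comment 15", expected)):
    found = parse_directives(text)
    print(label, "HOMEHOST:", found.get("HOMEHOST"), "AUTO:", found.get("AUTO"))
```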

Comment 16 Doug Ledford 2014-10-28 19:06:36 UTC
My snippet is wrong.  I went from memory.  Please use ignore instead of any.

Comment 17 Adam Williamson 2014-10-28 19:08:40 UTC
And the AUTO -all is not required, right? We think it's wrong because we don't think we actually want to disable auto-assembly of arrays.

Comment 18 Adam Williamson 2014-10-28 20:38:32 UTC
For clarity, for Beta purposes: a sufficient change to avoid this issue being encountered on FW RAID installs was included in blivet 0.61.8: "Wait for udev to settle before collecting UUID for new filesystems." The change to include a /etc/mdadm.conf in the installer/live environments is still apparently desired, but is not necessary for Beta, is not considered part of the 'beta blocking' aspect of this bug, and is not included in Beta RC1. It may be best to make it a separate bug, for ease of understanding.

Comment 19 Adam Williamson 2014-10-28 20:38:59 UTC
"is not included in Beta RC1"....I meant "is not included in Beta RC2", though of course it isn't in RC1 either.

Comment 20 Kamil Páral 2014-10-29 14:14:46 UTC
This works well with Beta RC2. Setting to verified. Please create a new ticket for the "better fix" approach, or reuse this one once it is closed automatically by bodhi.

Comment 21 Fedora Update System 2014-10-31 02:42:46 UTC
anaconda-21.48.13-1.fc21, python-blivet-0.61.8-1.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 22 Kamil Páral 2014-11-03 14:49:08 UTC
I'm reopening this for the purposes of tracking the "proper fix". If you prefer a separate ticket, please fork it and close this one.

One further question, though. How problematic/inconvenient is the current temporary fix? Is there any reason to propose the proper fix to block Final release? Thanks.

Comment 23 Adam Williamson 2014-11-03 23:46:42 UTC
Blocker status is no longer required.

Comment 24 Brian Lane 2014-11-04 01:23:27 UTC
lorax-21.27-1 will have 'HOMEHOST <ignore>'

Comment 25 Fedora Update System 2014-11-06 01:41:20 UTC
lorax-21.27-1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/lorax-21.27-1.fc21

Comment 26 Fedora Update System 2014-11-07 05:31:46 UTC
Package lorax-21.28-1.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing lorax-21.28-1.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-14450/lorax-21.28-1.fc21
then log in and leave karma (feedback).

Comment 27 Fedora Update System 2014-11-16 14:42:11 UTC
lorax-21.28-1.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.