Bug 995875 - conversion from 17 boot fails with raid1
Status: CLOSED EOL
Product: Fedora
Classification: Fedora
Component: fedup
Version: 19
Hardware: i686 Linux
Priority: unspecified
Severity: urgent
Assigned To: Will Woods
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-08-11 14:06 EDT by Vince Herried
Modified: 2015-02-17 11:43 EST (History)
6 users

Doc Type: Bug Fix
Last Closed: 2015-02-17 11:43:10 EST
Type: Bug


Attachments
output of /boot/grub2/grub.cfg (9.39 KB, text/plain)
2013-08-12 15:52 EDT, Vince Herried
Description Vince Herried 2013-08-11 14:06:57 EDT
Description of problem: Did a fedup upgrade from F17; the first boot after the upgrade fails.


Version-Release number of selected component (if applicable):
mdadm-3.2.6-19.fc17.i686
kernel: 3.8.4-102.fc17.i686.PAE

How reproducible:
every time.

Steps to Reproduce:
1. Install RAID1 on F17.
2. fedup upgrade to F18.
3. Attempt to boot.

Actual results:
boot fails
console messages....
[ ok ] ....
      Starting Initialize storage subsystems ( RAID, LVM .....
[ TIME ] Timed out waiting for device dev=md125p3.device...
[DEPEND ] Dependency failed for /mnt/Win ....  ( mine )
[DEPEND ] Dependency failed for Local File Systems.
Welcome to emergency mode.  Use .....

Expected results:
A good first boot.

Additional info:
Same boot problem as on F17 with kernels after 3.8.4-102, which I reported earlier; that bug was closed unfixed because F17 went out of service.

I attempted to change the grub config for the fedup boot to use
the old kernel with the upgrade params. The system boots, then immediately reboots.
Comment 1 Jes Sorensen 2013-08-12 03:38:36 EDT
Vince,

You need to provide more details on your configuration, nobody can debug things
based on the little information you have provided. Hence:
1) Is the raid your / partition or just data partitions?
2) What type of raid are we talking? IMSM BIOS raid or regular md raid?
3) /proc/mdstat output
4) How did you run fedup? network, local media, or?

Jes
Comment 2 Vince Herried 2013-08-12 14:46:56 EDT
Sorry about that. This appears to be the same bug as bug 959798.

1. All my partitions are RAID.
2. IMSM BIOS RAID.
3. cat /proc/mdstat (from the old 3.8.4-102 kernel):
# cat /proc/mdstat 
Personalities : [raid1] 
md125 : active raid1 sda[1] sdb[0]
      312568832 blocks super external:/md127/0 [2/2] [UU]
      
md127 : inactive sda[1](S) sdb[0](S)
      4520 blocks super external:imsm
       
unused devices: <none>
# cat /etc/mdadm.conf
MAILADDR vince@planetvince.com
DEVICE /dev/sda* /dev/sdb*
ARRAY /dev/md127 metadata=imsm UUID=79eff8c4:6c26b3ad:a1c43742:f35ebb5c
ARRAY /dev/md/Volume0 container=/dev/md127 member=0 UUID=6657e630:f5a20d3d:ac6bdbc5:d3e2d1f0
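For illustration, the active/inactive states shown above can be pulled out of /proc/mdstat text with a small parser. This is only a sketch that handles the layout in this report (the parser and its names are mine, not from mdadm); note that an inactive entry is normal for an IMSM container such as md127 here — it is the member array, md125, that must come up active:

```python
import re

def parse_mdstat(text):
    """Parse simple /proc/mdstat text into {device: {"state", "level"}}.

    Sketch only: covers just the layout shown in this bug report; a real
    parser would need to handle more of the mdstat format.
    """
    arrays = {}
    for line in text.splitlines():
        m = re.match(r"(md\d+) : (active|inactive)(?: (raid\S+))?", line)
        if m:
            dev, state, level = m.groups()
            arrays[dev] = {"state": state, "level": level}
    return arrays

sample = """\
Personalities : [raid1]
md125 : active raid1 sda[1] sdb[0]
      312568832 blocks super external:/md127/0 [2/2] [UU]

md127 : inactive sda[1](S) sdb[0](S)
      4520 blocks super external:imsm

unused devices: <none>
"""

arrays = parse_mdstat(sample)
print(arrays["md125"]["state"])  # active
print(arrays["md127"]["state"])  # inactive (normal for an IMSM container)
```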


4. The attempt to upgrade was fedup --network 18.

-------------


Is there anything else I can do to help diagnose this issue? On another issue someone suggested removing quiet from the boot params?
Comment 3 Doug Ledford 2013-08-12 14:57:16 EDT
Can you paste the contents of the /etc/grub.conf file?  I wonder what the kernel command line fedup is building says about the devices/partitions...
Comment 4 Jes Sorensen 2013-08-12 15:07:22 EDT
I am pretty sure it's not the same bug, albeit they are related. I haven't
played with fedup but my guess is that it doesn't assemble the IMSM arrays
correctly when booting into the update kernel :(

When you hit emergency mode, my guess is that mdmon isn't launched correctly.

Does 'ps xu | grep dmon' provide any output?

The /etc/grub.conf file as Doug is suggesting would be very useful too.

Thanks,
Jes
Comment 5 Vince Herried 2013-08-12 15:52:25 EDT
Created attachment 785892 [details]
output of /boot/grub2/grub.cfg

In the attached I've edited it to remove "quiet" from the kernel params, and also
added an attempt to boot with the 'upgrade' param into my only working kernel,
but that just comes up, then reboots immediately.


I can't get a command line when this thing fails, so I can't enter the ps command.
Once I get to emergency mode, I'm stuck.



Wondering if yanking one of the drives will help. Guessing not, but I'm getting desperate.
Comment 6 Jes Sorensen 2013-08-13 02:45:52 EDT
Vince,

It looks like fedup isn't RAID aware and simply doesn't do what is needed
in order to assemble/launch the RAID arrays correctly during its boot process.
I am not sure we are going to be able to find a way to fix this in the middle
of a busted upgrade :(

Are you able to boot normally if you use one of the older kernels? 

If you can, I would try booting the system into console mode, then yum
installing the fedora-release* rpms manually for Fedora 18, then running
yum update as described here:

https://fedoraproject.org/wiki/Upgrading_Fedora_using_yum?rd=YumUpgradeFaq

It's not a pretty solution, but if it gets you back into running mode, that
is at least something.

Regards,
Jes
Comment 7 Jes Sorensen 2013-08-16 04:04:04 EDT
I reproduced this problem by doing a fresh install of Fedora 17, with all
updates applied onto a RAID5 IMSM BIOS RAID array. 

Installed fedup and ran 'fedup --network 18' and rebooted - the system would
hang at boot and there was nothing I could do to get to a prompt.

It is almost certain this is caused by fedup not assembling the RAID arrays
correctly in the initramfs (and launching mdmon). I am not sure how fedup
differs from a normal dracut initramfs, but this needs to be fixed urgently
as we will see more people stuck with unbootable systems due to this flaw :(

This is going to fail for any RAID 1/4/5/6/10 BIOS RAID array.

Reassigning to fedup.

Jes
Comment 8 Will Woods 2013-08-16 14:26:55 EDT
The main difference between your regular initramfs and the fedup initramfs is that the fedup initramfs is not built on your system - which means it doesn't contain any of your custom config files or kernel modules.

However, if your system requires mdadm.conf to assemble the arrays correctly, fedup-0.7.3 should notice this and append /etc/mdadm.conf to its initramfs before the reboot.

See fedup.sysprep.prep_boot():

  https://github.com/wgwoods/fedup/blob/0.7.3/fedup/sysprep.py#L143

and fedup.boot.need_mdadmconf():

  https://github.com/wgwoods/fedup/blob/0.7.3/fedup/boot.py#L68

If there are *other* configuration files necessary to get your system to bring up its RAID array, please let me know so I can make fedup check for them and add them to initramfs.

Otherwise, check to see if mdadm.conf is actually inside the fedup initramfs - e.g. by booting the fedup image with "rd.break=cmdline" and looking in /etc.

So: for your system(s),
1) Is mdadm.conf necessary?
2) Is it present when you boot the fedup image?
3) Are there other config files that are needed to bring up the root device?
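As a rough way to automate check (2), one could diff a `cpio -t`- or `lsinitrd`-style file listing of the fedup initramfs against the RAID-related files discussed in this bug. The required file names below are taken from this report; the sample listing and the helper name are hypothetical:

```python
def missing_md_files(listing, required=("etc/mdadm.conf", "usr/sbin/mdmon")):
    """Given a `cpio -t`-style listing of an initramfs, return which of
    the required RAID-related files are absent from it."""
    present = {line.strip() for line in listing.splitlines()}
    return [f for f in required if f not in present]

# Hypothetical listing of a fedup initramfs that is missing mdadm.conf:
listing = """\
usr/lib/systemd/system/mdmon@.service
usr/sbin/mdmon
etc/cmdline.d/90mdraid.conf
"""
print(missing_md_files(listing))  # ['etc/mdadm.conf']
```

On a real system the listing would come from something like `xz -dc /boot/initramfs-fedup.img | cpio -t` (the compression and path may differ per release).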
Comment 9 Vince Herried 2013-08-16 18:36:06 EDT
Unfortunately I'm not a guru on RAID... Answers are:
1. As far as I know mdadm.conf is necessary. I suppose I could hide it on my current machine and see. If I rename /etc/mdadm.conf and try to boot (F18), will that answer your question?
2. It is present in /etc when I boot the fedup image.
3. No idea.

Unfortunately, now that I have done the upgrade via yum, I deleted the fedup image and also the old F17 kernel, so I can't inspect the initrd file.

But! I have another machine and discovered it will support the same kind of firmware RAID. So I grabbed a couple of empty disks, set them up as RAID1, and attempted to install F19. No go; it doesn't recognize any drives.

F17 recognized the drives. I hope to do the install of a dual-boot system
in the next few days, apparently using F17 and then migrating to F18 or F19 (if the mei problem gets fixed very soon).

Would it make any sense to open a problem record on F19 failing to recognize firmware RAID?
Comment 10 Vince Herried 2013-08-18 22:48:27 EDT
I don't know what I did wrong the first time, but F19 recognized the drives OK.
I installed the RAID array, then Windows, then F19. At the moment I'm waiting for the mei issue to get resolved.
Comment 11 Jes Sorensen 2013-08-19 09:49:47 EDT
Will,

Does the fedup initramfs contain mdmon and the systemd file required to launch
it?

IMSM BIOS raid arrays require mdmon, and it looks like mdmon isn't being
launched correctly in this case.

Jes
Comment 12 Will Woods 2013-08-19 16:20:31 EDT
Yes:

[wwoods@metroid fedup]$ xz -dc f19-upgrade.img | cpio -t | grep mdmon
usr/lib/systemd/system/mdmon@.service
usr/lib/dracut/hooks/pre-shutdown/30-mdmon-pre-shutdown.sh
usr/lib/dracut/hooks/pre-udev/30-mdmon-pre-udev.sh
usr/sbin/mdmon

The fedup initramfs is just a regular dracut initramfs, built on a F19 system, using:

  dracut --no-hostonly --add system-upgrade

The problem might be that dracut's 90mdraid module takes some liberties with boot arguments. Instead of putting the rd.md.uuid arguments in the bootloader config, it builds them into the initramfs *when the initramfs is built*. Since we're not building the initramfs on the target system, fedup won't get the necessary boot args.

Future versions of dracut have a '--print-cmdline' flag that should generate the needed boot arguments for us. For now, though, we'll need to get fedup to generate and insert the arguments like 90mdraid would.
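A minimal sketch of what "generate and insert the arguments like 90mdraid would" could look like, assuming the rd.md.uuid values are derived from the ARRAY lines in /etc/mdadm.conf (an assumption for illustration — dracut itself derives them from the assembled arrays, and the function name here is mine, not fedup's):

```python
import re

def mdraid_cmdline(mdadm_conf):
    """Build rd.md.uuid= boot arguments from ARRAY lines in an
    mdadm.conf, roughly the arguments dracut's 90mdraid module bakes
    into a host-built initramfs. Sketch only."""
    args = []
    for line in mdadm_conf.splitlines():
        if line.startswith("ARRAY"):
            m = re.search(r"UUID=([0-9a-f:]+)", line)
            if m:
                args.append("rd.md.uuid=" + m.group(1))
    return " ".join(args)

# The ARRAY lines from comment 2 of this bug:
conf = """\
ARRAY /dev/md127 metadata=imsm UUID=79eff8c4:6c26b3ad:a1c43742:f35ebb5c
ARRAY /dev/md/Volume0 container=/dev/md127 member=0 UUID=6657e630:f5a20d3d:ac6bdbc5:d3e2d1f0
"""
print(mdraid_cmdline(conf))
```

The resulting string would be appended to the fedup boot entry's kernel command line (or dropped into etc/cmdline.d/ inside the upgrade initramfs).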

In the meantime: If I'm right about that being the cause of the problem, then you should be able to work around it by grabbing etc/cmdline.d/90mdraid.conf from your existing initramfs and adding that to the fedup initramfs.

Something like this might work:

  mkdir tmp-upgrade; cd tmp-upgrade
  sudo gzip -dc /boot/initramfs-$(uname -r).img | \
       cpio -iumd etc/cmdline.d etc/cmdline.d/*
  find etc | cpio -c -o | sudo bash -c 'cat >> /boot/initramfs-fedup.img'
  cd ..; rm -rf tmp-upgrade

Can anyone confirm that:
a) etc/cmdline.d/90mdraid.conf is present in your existing initramfs.img, and
b) adding that file to the fedup initramfs (or putting those arguments on the fedup boot line) makes the initramfs set up the raid device(s) correctly?
Comment 13 Jes Sorensen 2013-08-22 04:51:59 EDT
Will,

When is/was the fedup initramfs created? We had some issues with selinux
biting our rear ends when Fedora was released, I wonder if this is the
reason for it - see BZ#983141

Is there a way to get fedup to spit out /proc/mdstat and 'ps aux | grep dmon'
output once it is hanging?

That would tell us which raid arrays have been assembled, and whether mdmon
is really running (note the grep for 'dmon' not 'mdmon').

Jes
Comment 14 Will Woods 2013-08-22 13:54:50 EDT
fedup initramfs is created when the install images are built, at release time - just like the installer initramfs.

If the root device fails to assemble, dracut should give you a shell prompt after a minute or so - you can use that shell to get the info requested.

It's not clear to me if the problem here is that the upgrade fails to start, or that the system fails to start *after* the upgrade finishes. These would be very different failures.

Can anyone tell me when the boot failure occurs?
Comment 15 Fedora End Of Life 2013-12-21 09:27:54 EST
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 16 Jes Sorensen 2014-01-10 04:54:06 EST
We should not lose track of this due to Fedora 18 expiring.
Comment 17 Fedora End Of Life 2015-01-09 14:24:35 EST
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 18 Fedora End Of Life 2015-02-17 11:43:10 EST
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
