Bug 1112092 - after yum update, system no longer boots
Summary: after yum update, system no longer boots
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: mdadm
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jes Sorensen
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-06-23 06:46 UTC by Turgut Kalfaoglu
Modified: 2015-09-28 11:22 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-29 21:16:48 UTC
Type: Bug
Embargoed:


Attachments
boot error (184.73 KB, image/jpeg), 2014-06-23 06:47 UTC, Turgut Kalfaoglu

Description Turgut Kalfaoglu 2014-06-23 06:46:49 UTC
This AMD system has been working fine with Fedora 20 for months.

It consists of two identical drives working as RAID 1.

Each drive has only one partition: the root.

After yesterday's yum update, the system no longer boots. It waits for a long time at startup, then dies with a dracut error,
"dracut initqueue device uuid not found" (showing the UUID of the raid array), and drops me to the dracut rescue prompt. At that prompt, mdadm finds NO raid devices, and /proc/mdstat is likewise missing.

However, when I boot from a live F20 USB, it finds the raid system fine and working -- I see both devices in /proc/mdstat, and I can mount /dev/md127 just fine. I even ran fsck -f /dev/md127 and it came back clean.

Likewise, if from the grub2 menu I pick the third kernel listed (3.14.5-200), that boots the system right up! So only the newest two grub entries (kernels 3.14.7 and 3.14.8) fail to boot the system.
(Yes, I checked what was different about the third entry versus the first two; only the kernel and initrd version numbers differ.)

Note: After some more fiddling with mkinitrd, I managed to lose the third kernel as well. Only the "rescue" entry listed now boots the machine.
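(For the record, the kind of rebuild I was attempting was roughly the following; the exact kernel version string and the explicit mdraid module are reconstructed from memory, not pasted:)

# rebuild the initramfs for the failing kernel, forcing md raid support in
dracut --force --add mdraid /boot/initramfs-3.14.8-200.fc20.x86_64.img 3.14.8-200.fc20.x86_64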

Comment 1 Turgut Kalfaoglu 2014-06-23 06:47:42 UTC
Created attachment 911292 [details]
boot error

Comment 2 Turgut Kalfaoglu 2014-06-23 06:54:23 UTC
Comment on attachment 911292 [details]
boot error

Note: The "warning: could not boot" line appears about 5 minutes after the first few lines are displayed.

Comment 3 Jes Sorensen 2014-06-23 07:32:05 UTC
Hi,

There should be nothing new mdadm-related in recent yum updates, so I wonder
if this is related to the kernel.

Once you hit the dracut shell, could you please run the following:
cat /proc/mdstat
and also supply your /etc/mdadm.conf and /etc/fstab?

Could you also provide a copy of /proc/mdstat for when the system is booted
successfully using the older kernel?

We would also need 'rpm -q mdadm dracut' output.

Thanks,
Jes

Comment 4 Turgut Kalfaoglu 2014-06-23 08:32:19 UTC
Right now I can only supply some of them: while trying everything since yesterday, I inadvertently lost access to my data when I zeroed the superblocks on my raid drives. However, I can say that /proc/mdstat was empty -- there was no mdstat file at all. /etc/mdadm.conf contained a few lines that were put there when anaconda created the raid array. /etc/fstab contained the UUID= of the raid array, something very simple like (the filesystem type here is from memory, the UUID is elided):
UUID=........    /    ext4    defaults 1 1

Comment 5 Turgut Kalfaoglu 2014-06-23 09:54:01 UTC
there is a little more info here:  http://forums.fedoraforum.org/showthread.php?p=1702763&posted=1#post1702763

Comment 6 Jes Sorensen 2014-06-23 10:40:44 UTC
Ouf, sorry to hear that! :(

I hope you didn't lose any valuable data.

The only thing in your yum update log that would be relevant for RAID is
the kernel package. There is no mention of mdadm, systemd, dracut, etc. in
that list.

Your message mentioned you were rebuilding the RAID; is that still going on,
or did you lose the data?

Thanks,
Jes

Comment 7 Turgut Kalfaoglu 2014-06-23 13:45:59 UTC
I had done mdadm --zero-superblock on both drives, but re-creating the array with the same parameters allowed me to get my data back.
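(Reconstructed from memory; the partition names and the metadata version are my best guess, since anaconda originally created the array:)

# this is what lost the array in the first place
mdadm --zero-superblock /dev/sda1 /dev/sdb1
# re-creating with identical parameters brings the old filesystem back intact
mdadm --create /dev/md127 --level=1 --raid-devices=2 --metadata=1.2 /dev/sda1 /dev/sdb1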

Comment 8 Turgut Kalfaoglu 2014-06-23 20:05:47 UTC
I also just managed to get the system booting again.
I had to tweak mdadm, dracut, and grub2-mkconfig.
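(Roughly this sequence -- the initramfs path and version string are from memory rather than pasted:)

mdadm --detail --scan > /etc/mdadm.conf    # regenerate the array definition
dracut --force /boot/initramfs-3.14.8-200.fc20.x86_64.img 3.14.8-200.fc20.x86_64    # rebuild the initramfs
grub2-mkconfig -o /boot/grub2/grub.cfg     # regenerate the grub entries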

Comment 9 Jes Sorensen 2014-06-27 15:22:33 UTC
Any chance this could be related to this problem:

https://bugzilla.redhat.com/show_bug.cgi?id=1111442

?

Comment 10 Turgut Kalfaoglu 2014-06-30 09:30:09 UTC
I read it, and it sounds similar indeed,
but I'm not expert enough to say whether they are identical.
It felt like the "mdadm" module was not loaded by the last couple of kernel updates.
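(If that guess is right, something along these lines should show it; lsinitrd ships with dracut, and the image path here is a placeholder:)

lsmod | grep raid1    # is the raid1 personality loaded?
lsinitrd /boot/initramfs-<failing-version>.img | grep -i -e mdadm -e mdraid    # is md support in the initramfs?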

Comment 11 Jes Sorensen 2014-06-30 09:38:35 UTC
I understand. The reason I suggest it could be the kernel is that there have
been no mdadm updates in a long time, and your yum update log didn't show
any mdadm updates.

It would be interesting to know if the problem goes away once the new kernel
propagates out.

Cheers,
Jes

Comment 12 Jes Sorensen 2014-07-18 07:54:26 UTC
Turgut,

Did you try this out again with a recent kernel?

Thanks,
Jes

Comment 13 Jes Sorensen 2014-12-08 16:49:16 UTC
Ping!

Is this still an issue?

Jes

Comment 14 Trevor Cordes 2015-01-14 00:19:11 UTC
Might be related: bug 1097664

Comment 15 Matti Laitala 2015-02-05 14:16:39 UTC
I have had similar problems with Fedora 20.
Booting goes fine with all the 3.16-series kernels, such as 3.16.7, but none of the 3.17-series kernels work :(

The raid is a mirror configuration on two identical 2TB disks.

All updates have been installed.

Booting with 3.17 fails with a timeout: systemd times out trying to mount the raid during boot.

cat /proc/mdstat does not show any raid configuration, and

mdadm --assemble --scan

fails with "unexpected failure opening".

With the 3.16-series kernels the raid setup works just fine.

Comment 16 Matti Laitala 2015-02-05 16:51:17 UTC
Here is
cat /proc/mdstat
with a 3.16-series kernel, when all is working as it should.

Personalities : [raid1] 
md2013 : active raid1 sde1[0] sdd1[1]
      1953382208 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>

Comment 17 Matti Laitala 2015-02-08 21:00:56 UTC
Here is also the output of mdadm --detail /dev/md2013. I have just masked the hostname in the output :)

/dev/md2013:
        Version : 1.2
  Creation Time : Sun Apr 14 03:55:41 2013
     Raid Level : raid1
     Array Size : 1953382208 (1862.89 GiB 2000.26 GB)
  Used Dev Size : 1953382208 (1862.89 GiB 2000.26 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sun Feb  8 22:55:15 2015
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : MASKEDAWAY.SOME:2013  (local to host MASKEDAWAY.SOME)
           UUID : 10db33a0:78e4d98e:cb278297:b54a8347
         Events : 3143

    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sde1
       1       8       49        1      active sync   /dev/sdd1

Comment 18 Matti Laitala 2015-02-09 18:41:19 UTC
cat /etc/mdadm.conf

ARRAY /dev/md/2013  metadata=1.2 UUID=10db33a0:78e4d98e:cb278297:b54a8347 name=MASKEDAWAY.SOME:2013
MAILADDR root

The /etc/fstab line for this array is the following:

/dev/md2013	/var/pub				ext4  defaults 1 2

Comment 19 Jes Sorensen 2015-02-09 18:49:15 UTC
(In reply to Matti Laitala from comment #18)
> cat /etc/mdadm.conf
> 
> ARRAY /dev/md/2013  metadata=1.2 UUID=10db33a0:78e4d98e:cb278297:b54a8347
> name=MASKEDAWAY.SOME:2013
> MAILADDR root
> 
> /etc/fstab line for this array is following
> 
> /dev/md2013	/var/pub				ext4  defaults 1 2

Matti,

You tell mdadm to create /dev/md/2013, but at the same time you try to mount
/dev/md2013 - that makes no sense.

Jes

Comment 20 Matti Laitala 2015-02-10 10:36:02 UTC
Jes,

Good point, thanks... If I remember correctly there is a link between those two device files, but I'll fix this anyway and retest with the new kernel.

From my understanding, though, this does not solve the problem that the array is not usable for mounting -- by which I mean that
cat /proc/mdstat
does not find the array.

Comment 21 Matti Laitala 2015-02-10 18:49:36 UTC
Jes,

The fstab change had no effect (there is a symbolic link between those two device nodes, as sketched below).
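(Illustrative of what udev/mdadm normally set up -- not a verbatim paste from my system:)

ls -l /dev/md/2013
lrwxrwxrwx. 1 root root 9 ... /dev/md/2013 -> ../md2013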
The system works with all the 3.16-series kernels, such as
Linux kernel 3.16.7-200.fc20.x86_64,
but with none of the 3.17 or 3.18 kernels. I just tested kernel-3.18.5-101.fc20.x86_64.

cat /proc/mdstat with the 3.18 kernel is the same as with the 3.17 kernels:
Personalities : 
unused devices: <none>

This causes the boot to fail. Do you have any ideas, or am I forced to overwrite the superblocks (as others have done) to fix the issue?

Comment 22 Matti Laitala 2015-03-11 08:52:43 UTC
Fixing the problem required updating the array's name and device node.
Steps for the fix:
mdadm --stop /dev/md2013
mdadm --assemble /dev/md1 --name=MASKEDAWAY.SOME:1 --update=name /dev/sde1 /dev/sdd1
mdadm --detail --scan > /etc/mdadm.conf
After that I edited /etc/fstab to use /dev/md1.
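For completeness, the regenerated /etc/mdadm.conf should now contain a line along these lines (same UUID as before, updated name -- reconstructed rather than pasted):

ARRAY /dev/md1 metadata=1.2 name=MASKEDAWAY.SOME:1 UUID=10db33a0:78e4d98e:cb278297:b54a8347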

Now machine works with 3.18.7-200.fc21.x86_64 kernel.

Comment 23 Fedora End Of Life 2015-05-29 12:11:47 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 20 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Fedora End Of Life 2015-06-29 21:16:48 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

