Bug 772926

Summary: dracut unable to boot from a degraded raid1 array
Product: Red Hat Enterprise Linux 6
Reporter: Tru Huynh <pasteur>
Component: dracut
Assignee: Harald Hoyer <harald>
Status: CLOSED DUPLICATE
QA Contact: Release Test Team <release-test-team-automation>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.2
CC: alex.hha, borgan, degg17, forrestt, harald, jcpunk, joe.chlanda, john, mishu, norisdata, pknirsch, poming168, ste.sachse, toracat
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: dracut-004-283.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-05-24 15:21:04 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
kickstart from dvd for a raid1 installation
none
dmesg log in dracut rdshell
none
dmesg log in dracut rdshell (with dracut-004-259.el6)
none
console from dracut-004-284.el6_3.1
none
console from dracut-004-256.el6_2.1
none

Description Tru Huynh 2012-01-10 11:03:37 UTC
Description of problem:
dracut unable to boot from a degraded raid1 array (possible regression from 6.1)

Version-Release number of selected component (if applicable):
dracut-004-256.el6.noarch

How reproducible:
always

Steps to Reproduce:
1. install a fresh minimal 6.2 machine with raid1
2. poweroff
3. remove one of the raid1 array members
4. boot
  
Actual results:
kernel panic

Expected results:
boot

Additional info:

Comment 1 Tru Huynh 2012-01-10 11:05:27 UTC
workaround: add an additional "rdshell" to the boot command line and manually activate the detected array
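
The manual activation can be sketched as a small shell helper. This is only an illustration, assuming a POSIX shell and an mdadm binary are available in the debug shell; the function names list_inactive_md and run_inactive_md are made up for this example and are not part of dracut. It reads /proc/mdstat, picks out arrays reported as "inactive", and starts each one with "mdadm --run", the same command used by hand in the comments below.

```shell
# List md arrays that /proc/mdstat reports as "inactive".
# Reads mdstat-formatted text on stdin, prints one array name per line.
# (Hypothetical helper, not part of dracut.)
list_inactive_md() {
    while read -r name sep state rest; do
        # Array lines look like: "md0 : inactive sda1[1](S)"
        if [ "$sep" = ":" ] && [ "$state" = "inactive" ]; then
            echo "$name"
        fi
    done
}

# Start every inactive array, even if some members are missing.
# (Hypothetical helper; it only wraps "mdadm --run".)
run_inactive_md() {
    list_inactive_md < /proc/mdstat | while read -r md; do
        mdadm --run "/dev/$md"
    done
}
```

Calling run_inactive_md from the rdshell and then typing "exit" would correspond to the manual steps; whether the boot then continues depends on the dracut version, as the later comments show.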

Comment 3 Tru Huynh 2012-01-10 11:48:18 UTC
regression from 6.1 (dracut-004-53.el6.noarch)

Comment 4 Tru Huynh 2012-01-12 15:06:50 UTC
Created attachment 552416 [details]
kickstart from dvd for a raid1 installation

reproducer for the minimal raid1 installation

Comment 5 Alex 2012-01-29 13:06:48 UTC
Any update?

Comment 6 Harald Hoyer 2012-01-30 13:38:43 UTC
please test:

http://people.redhat.com/harald/downloads/dracut/dracut-004-259.el6/

Comment 7 Tru Huynh 2012-01-30 16:10:21 UTC
no changes: with rdshell added on a raid1 with one member down

sd 1:0:0:0: [sda] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)
sd 1:0:0:0: [sda] Write Protect is off
sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 1:0:0:0: [sda] Attached SCSI disk
md: bind<sda1>
dracut Warning: No root device "block:/dev/disk/by-uuid/6b12b35e-d068-4cee-931b-08bfae12e34c" found





Dropping to debug shell.

sh: can't access tty; job control turned off

dracut:/# dmesg| grep drac
dracut: dracut-004-259.el6
dracut: rd_NO_LUKS: removing cryptoluks activation
dracut: rd_NO_LVM: removing LVM activation
dracut: Starting plymouth daemon
dracut: rd_NO_DM: removing DM RAID activation
dracut Warning: No root device "block:/dev/disk/by-uuid/6b12b35e-d068-4cee-931b-08bfae12e34c" found

dracut:/# cat /proc/mdstat
Personalities : 
md0 : inactive sda1[1](S)
      1048564 blocks super 1.0
       
unused devices: <none>
dracut:/# mdadm --run /dev/md0
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1
md/raid1:md0: active with 1 out of 2 mirrors
created bitmap (1 pages) for device md0
md0: bitmap initialized from disk: read 1/1 pages, set 0 of 16 bits
md0: detected capacity change from 0 to 1073729536
mdadm: started /dev/md0
 md0: unknown partition table
dracut:/# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sda1[1]
      1048564 blocks super 1.0 [2/1] [_U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
dracut:/# 

I tried:
1) upgrading to dracut-004-259 + rebuilding initramfs
2) installing dracut-004-259 during the initial install
-> same issue/outcome

Comment 8 Ming 2012-02-08 02:49:49 UTC
I confirmed that this problem also occurs in RHEL 6.1

After dropping to the debug shell, I checked the directory "/dev/disk": there are "by-id" and "by-path" directories, but no "by-uuid".

Comment 10 RHEL Program Management 2012-02-10 11:19:38 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 11 joe.chlanda 2012-02-14 00:56:59 UTC
any ETA on when the fix will be released? Thanks!

Comment 12 Harald Hoyer 2012-02-15 16:17:39 UTC

*** This bug has been marked as a duplicate of bug 761584 ***

Comment 13 Ming 2012-02-16 07:28:58 UTC
Could #761584 please be opened, so people actually can track it?

You are not authorized to access bug #761584. 

Thanks.

Comment 14 Harald Hoyer 2012-02-16 09:33:15 UTC
doh.. cannot open that one... reopening this one

Comment 15 Ming 2012-02-20 05:08:19 UTC
Thanks Harald Hoyer, so how can we (and other users) know the solution?

Comment 17 Tru Huynh 2012-02-20 15:30:20 UTC
sd 1:0:0:0: [sda] 4194304 512-byte logical blocks: (2.14 GB/2.00 GiB)
sd 1:0:0:0: [sda] Write Protect is off
sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1
sd 1:0:0:0: [sda] Attached SCSI disk
md: bind<sda1>
dracut: Assembling MD RAID arrays
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1
md/raid1:md0: active with 1 out of 2 mirrors
created bitmap (1 pages) for device md0
md0: bitmap initialized from disk: read 1/1 pages, set 0 of 16 bits
md0: detected capacity change from 0 to 1073729536
dracut: mdadm: started array /dev/md0
dracut: Autoassembling MD Raid
 md0: unknown partition table
dracut: mdadm: failed to run array /dev/md0: Device or resource busy
EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: 
dracut: Mounted root filesystem /dev/md0
dracut: Loading SELinux policy
type=1404 audit(1329755332.126:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
type=1403 audit(1329755332.582:3): policy loaded auid=4294967295 ses=4294967295
dracut: 
dracut: Switching root
readahead: starting


-> works for me :)

Comment 18 Ming 2012-02-21 05:58:57 UTC
it does not work for me, with rdshell added on a raid1 with one member down, you can see there are "by-id" & "by-path", but still no "by-uuid".



sd 2:0:0:0: [sda] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk
md: bind<sda1>
md: bind<sda2>
dracut Warning: No root device "block:/dev/disk/by-uuid/24f509d8-d1f5-42de-b69f-588da0c9c452" found




Dropping to debug shell.

sh: can't access tty; job control turned off
dracut:/# ls /dev/disk/
by-id  by-path
dracut:/#

Comment 19 Ming 2012-02-21 06:48:47 UTC
Tested http://people.redhat.com/harald/downloads/dracut/dracut-004-259.el6/ ; it also does not work for me.

Comment 20 Harald Hoyer 2012-02-21 08:32:47 UTC
(In reply to comment #18)
> it does not work for me, with rdshell added on a raid1 with one member down,
> you can see there are "by-id" & "by-path", but still no "by-uuid".
> 
> 
> 
> sd 2:0:0:0: [sda] 8388608 512-byte logical blocks: (4.29 GB/4.00 GiB)
> sd 2:0:0:0: [sda] Write Protect is off
> sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support
> DPO or FUA
>  sda: sda1 sda2
> sd 2:0:0:0: [sda] Attached SCSI disk
> md: bind<sda1>
> md: bind<sda2>
> dracut Warning: No root device
> "block:/dev/disk/by-uuid/24f509d8-d1f5-42de-b69f-588da0c9c452" found
> 
> 
> 
> 
> Dropping to debug shell.
> 
> sh: can't access tty; job control turned off
> dracut:/# ls /dev/disk/
> by-id  by-path
> dracut:/#

what is in /proc/mdstat ?

# cat /proc/mdstat
# ls -l /dev/disk/*/

Comment 21 Ming 2012-02-21 10:09:36 UTC
dracut:/# cat /proc/mdstat
Personalities :
md1 : inactive sda2[0](S)
      3662808 blocks super 1.0

md0 : inactive sda1[0](S)
      530105 blocks super 1.1

unused devices: <none>
dracut:/# 
dracut:/# 
dracut:/# ls -l /dev/disk/*/
/dev/disk/by-id/:
total 0
lrwxrwxrwx 1 0 root  9 Feb 21 09:57 ata-VBOX_HARDDISK_VBccd15e1d-477e6885 -> ../../sda
lrwxrwxrwx 1 0 root 10 Feb 21 09:57 ata-VBOX_HARDDISK_VBccd15e1d-477e6885-part1 -> ../../sda1
lrwxrwxrwx 1 0 root 10 Feb 21 09:57 ata-VBOX_HARDDISK_VBccd15e1d-477e6885-part2 -> ../../sda2
lrwxrwxrwx 1 0 root  9 Feb 21 09:57 scsi-SATA_VBOX_HARDDISK_VBccd15e1d-477e6885 -> ../../sda
lrwxrwxrwx 1 0 root 10 Feb 21 09:57 scsi-SATA_VBOX_HARDDISK_VBccd15e1d-477e6885-part1 -> ../../sda1
lrwxrwxrwx 1 0 root 10 Feb 21 09:57 scsi-SATA_VBOX_HARDDISK_VBccd15e1d-477e6885-part2 -> ../../sda2

/dev/disk/by-path/:
lrwxrwxrwx 1 0 root  9 Feb 21 09:58 pci-0000:00:01.1-scsi-1:0:0:0 -> ../../sr0
lrwxrwxrwx 1 0 root 10 Feb 21 09:57 pci-0000:00:0d.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 0 root 10 Feb 21 09:57 pci-0000:00:0d.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx 1 0 root 10 Feb 21 09:57 pci-0000:00:0d.0-scsi-0:0:0:0-part2 -> ../../sda2
dracut:/#

Comment 22 Harald Hoyer 2012-02-21 10:12:55 UTC
(In reply to comment #21)
> dracut:/# cat /proc/mdstat
> Personalities :
> md1 : inactive sda2[0](S)
>       3662808 blocks super 1.0
> 
> md0 : inactive sda1[0](S)
>       530105 blocks super 1.1
> 

I don't see a degraded array here.

Comment 23 Harald Hoyer 2012-02-21 10:16:37 UTC
(In reply to comment #22)
> (In reply to comment #21)
> > dracut:/# cat /proc/mdstat
> > Personalities :
> > md1 : inactive sda2[0](S)
> >       3662808 blocks super 1.0
> > 
> > md0 : inactive sda1[0](S)
> >       530105 blocks super 1.1
> > 
> 
> I don't see a degraded array here.

sorry, disregard that comment. Of course, it's not displayed, because it's not yet started.

What is your kernel command line?

Can you provide more information?

http://people.redhat.com/harald/dracut-rhel6.html#troubleshooting

Comment 24 Harald Hoyer 2012-02-21 10:20:28 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > (In reply to comment #21)
> > > dracut:/# cat /proc/mdstat
> > > Personalities :
> > > md1 : inactive sda2[0](S)
> > >       3662808 blocks super 1.0
> > > 
> > > md0 : inactive sda1[0](S)
> > >       530105 blocks super 1.1
> > > 
> > 
> > I don't see a degraded array here.
> 
> sorry, disregard that comment. Of course, it's not displayed, because it's not
> yet started.
> 
> What is your kernel command line?
> 
> Can you provide more information?
> 
> http://people.redhat.com/harald/dracut-rhel6.html#troubleshooting

What is the output of:

# echo /dracut-*
# cat /proc/cmdline
# dmesg

can you do:

# mdadm --run /dev/md0
# mdadm --run /dev/md1
# cat /proc/mdstat

Comment 25 Ming 2012-02-21 16:24:48 UTC
Created attachment 564742 [details]
dmesg log in dracut rdshell

Comment 26 Ming 2012-02-21 16:26:47 UTC
dracut:/# echo /dracut-*
/dracut-004-256.el6

dracut:/# cat /proc/cmdline
ro root=UUID=24f509d8-d1f5-42de-b69f-588da0c9c452 rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=a137053e:5da3cb3e:4c851b77:11723049 rd_MD_UUID=29f68e2e:74c974f8:691039f6:34077b1b SYSFONT=latarcyrheb-sun16 rhgb  rd_NO_LVM rd_NO_DM rdshell

dracut:/# mdadm --run /dev/md0
md/raid1:md0: active with 1 out of 2 mirrors
md0: detected capacity change from 0 to 542827520
mdadm: started /dev/md0
dracut:/#  md0: unknown partition table

dracut:/#  mdadm --run /dev/md1
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1
md/raid1:md1: active with 1 out of 2 mirrors
created bitmap (1 pages) for device md1
md1: bitmap initialized from disk: read 1/1 pages, set 0 of 56 bits
md1: detected capacity change from 0 to 3750715392
mdadm: started /dev/md1
dracut:/#  md1: unknown partition table


For dmesg, please check attachment 564742 [details]


NOTE: I discovered that after I run "mdadm --run /dev/md1" in the dracut shell, "by-uuid" appears under /dev/disk.
I then ran "mount /dev/md1 /tmp" and hard rebooted, after which the server could start with one member down.
I tested that both dracut-004-256.el6.noarch and dracut-004-256.el6_2.1.noarch give this result.
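
The by-uuid observation can be checked with a short sketch like the one below. It is an illustration only, assuming a POSIX shell; root_uuid_from_cmdline and check_root_symlink are hypothetical names, not dracut functions. The first function extracts the UUID from a root=UUID=... kernel argument, and the second reports whether the matching symlink exists under /dev/disk/by-uuid. A likely explanation, stated as an assumption, is that the filesystem UUID can only be probed once the md device is running, so the symlink is absent while the array is inactive.

```shell
# Print the UUID from a root=UUID=... kernel argument, if present.
# Usage: root_uuid_from_cmdline "$(cat /proc/cmdline)"
# (Hypothetical helper, not part of dracut.)
root_uuid_from_cmdline() {
    # Intentionally unquoted: split the command line into words.
    for arg in $1; do
        case "$arg" in
            root=UUID=*) echo "${arg#root=UUID=}"; return 0 ;;
        esac
    done
    return 1
}

# Report whether the by-uuid symlink for the root device exists.
check_root_symlink() {
    uuid=$(root_uuid_from_cmdline "$(cat /proc/cmdline)") || {
        echo "no root=UUID= argument on the kernel command line"
        return 1
    }
    if [ -e "/dev/disk/by-uuid/$uuid" ]; then
        echo "root device present: /dev/disk/by-uuid/$uuid"
    else
        echo "root device missing: /dev/disk/by-uuid/$uuid"
        return 1
    fi
}
```

Running check_root_symlink before and after "mdadm --run /dev/md1" would be expected to show the symlink going from missing to present on the setup described above.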

Comment 27 Harald Hoyer 2012-02-21 17:03:07 UTC
(In reply to comment #26)
> dracut:/# echo /dracut-*
> /dracut-004-256.el6

So.. this is with the old version... you are supposed to install:

http://people.redhat.com/harald/downloads/dracut/dracut-004-259.el6/

and don't forget to recreate your initramfs with:

# dracut -f

Comment 28 Ming 2012-02-22 04:32:19 UTC
Created attachment 564828 [details]
dmesg log in dracut rdshell (with dracut-004-259.el6)

Comment 29 Ming 2012-02-22 04:33:39 UTC
Sorry for missing "dracut -f"

Now I get the following results:
With this version http://people.redhat.com/harald/downloads/dracut/dracut-004-259.el6/ , it does not work
With this version http://people.redhat.com/harald/downloads/dracut/dracut-004-256.el6_2.1/ , it works for me



Below are the checks with dracut-004-259.el6 for your reference:


dracut:/#  echo /dracut-*
/dracut-004-259.el6

dracut:/#  cat /proc/cmdline
ro root=UUID=24f509d8-d1f5-42de-b69f-588da0c9c452 rd_NO_LUKS  KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_MD_UUID=a137053e:5da3cb3e:4c851b77:11723049 rd_MD_UUID=29f68e2e:74c974f8:691039f6:34077b1b SYSFONT=latarcyrheb-sun16 rhgb  rd_NO_LVM rd_NO_DM rdshell

dracut:/#  cat /proc/mdstat
Personalities :
md1 : inactive sda2[0](S)
      3662808 blocks super 1.0

md0 : inactive sda1[0](S)
      530105 blocks super 1.1

unused devices: <none>


dracut:/# dmesg (plz check attachment 564828 [details])


dracut:/# mdadm --run /dev/md0
md: raid1 personality registered for level 1
bio: create slab <bio-1> at 1
md/raid1:md0: active with 1 out of 2 mirrors
md0: detected capacity change from 0 to 542827520
mdadm: started /dev/md0
dracut:/# md0: unknown partition table


dracut:/# mdadm --run /dev/md1
md/raid1:md1: active with 1 out of 2 mirrors
created bitmap (1 pages) for device md1
md1: bitmap initialized from disk: read 1/1 pages, set 0 of 56 bits
md1: detected capacity change from 0 to 3750715392
mdadm: started /dev/md1
dracut:/#  md1: unknown partition table



# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda2[0]
      3662808 blocks super 1.0 [2/1] [U_]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid1 sda1[0]
      530105 blocks super 1.1 [2/1] [U_]

unused devices: <none>
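
The "[2/1] [U_]" status shown above is how /proc/mdstat marks a running array with a missing member. As a rough sketch, assuming a POSIX shell, the degraded arrays can be listed as follows; list_degraded_md is a hypothetical name chosen for this example, and the parsing relies only on the line layout visible in this report.

```shell
# Print the names of running md arrays that have a missing member.
# Reads mdstat-formatted text on stdin.
# (Hypothetical helper, not part of dracut or mdadm.)
list_degraded_md() {
    current=""
    while read -r line; do
        case "$line" in
            # Array header, e.g. "md1 : active raid1 sda2[0]"
            md*" : active "*) current=${line%% *} ;;
            # Status line ending in e.g. "[2/1] [U_]"; "_" marks a missing member
            *"["*_*"]") [ -n "$current" ] && echo "$current"; current="" ;;
        esac
    done
}
```

For example, "list_degraded_md < /proc/mdstat" would print md1 and md0 for the output above, since both arrays are running on one of two mirrors.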

Comment 30 Ming 2012-02-27 04:26:31 UTC
May I ask whether I should use dracut-004-256.el6_2.1 temporarily? dracut-004-259.el6 is the newer version, so will a later version contain this fix?

Comment 31 Robert 2012-02-29 22:50:11 UTC
In /proc/cmdline (or on the kernel command line in grub.conf), rd_NO_DM should not be used.

Sorry for my bad English.

Comment 34 Harald Hoyer 2012-05-24 15:21:04 UTC

*** This bug has been marked as a duplicate of bug 761584 ***

Comment 36 Forrest Hastings Tiffany 2012-07-13 15:29:13 UTC
Can bug 761584 be opened so we can see the status of it?

Comment 37 Harald Hoyer 2012-07-16 09:10:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0839.html

Comment 38 Stephan Sachse 2013-01-30 16:16:41 UTC
dracut-004-284.el6_3.1.noarch does not work, downgrade to dracut-004-256.el6_2.1.noarch works fine

Comment 39 Stephan Sachse 2013-01-30 16:18:03 UTC
Created attachment 690487 [details]
console from dracut-004-284.el6_3.1

Comment 40 Stephan Sachse 2013-01-30 16:18:51 UTC
Created attachment 690488 [details]
console from dracut-004-256.el6_2.1

Comment 41 Stephan Sachse 2013-01-30 16:58:26 UTC
ok, this is crazy!

256 does not work on first boot with sdb disconnected. drops into rdshell. run "mdadm --run /dev/md0", then "exit", get an error about no root device, again "exit" and the system boots fine. login as root and "reboot" the system and all works fine without rdshell.

i will try this tomorrow with 284

Comment 42 Stephan Sachse 2013-01-31 16:03:37 UTC
it is the same with 284. it only works if the raid is assembled and run once by hand, wherever you do that: in the rdshell or with a live cd. for me it looks like mdraid_start.sh is never run.

Comment 43 Forrest Hastings Tiffany 2013-01-31 16:30:29 UTC
Stephan, you're likely wasting your time.  This bug is closed, meaning Red Hat will not work on or even likely see your or my comments.  The duplicate that this one is supposed to be linked to is hidden from view for us "common folk", so we can't even post useful comments to it (I'm guessing that it too is closed, but all I can see is the message "You are not authorized to access bug #761584.").  I asked for it to be opened to viewing back in July, but my request was either ignored or rejected.  The response from Red Hat was that there is an errata put out for the bug that addresses it.  I'd suggest that you look there, and if it doesn't address your problem, open a new bug.

Comment 44 Stephan Sachse 2013-01-31 16:54:33 UTC
ok, here is it: bug #906464