Bug 1016750

Summary: Server would no longer boot, dracut can not find volume groups
Product: Fedora
Component: lvm2
Version: 18
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: urgent
Priority: unspecified
Reporter: Michael Mussulis <michael>
Assignee: Peter Rajnoha <prajnoha>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: agk, bmarzins, bmr, dracut-maint, dwysocha, harald, heinzm, husung, jonathan, jtt77777, lvm-team, michael, msnitzer, prajnoha, prockai, zkabelac
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-02-05 23:14:43 UTC

Attachments: System report, lvm.conf

Description Michael Mussulis 2013-10-08 15:44:33 UTC
Created attachment 809371 [details]
System report

Description of problem: After a power cut, the server would not boot anymore. After "Reached target System Initialization" it would not display anything for a while, then it throws a message about not being able to boot, drops into the dracut shell, and advises you to send the sosreport.txt. We initially assumed the disks had become corrupt, but after many tries we eventually discovered that the latest kernel was at fault. Several days ago I installed kernel-3.10.13-101.fc18.x86_64 but never rebooted. Today the power loss forced us to reboot the server and this problem showed up. Booting with kernel-3.10.12-100.fc18.x86_64 works just fine.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Update with yum to latest kernel
2. Make sure /dev/root is an LVM logical volume on hardware RAID
3.

Actual results:
[  123.784336] localhost dracut-initqueue[163]: Scanning devices  for LVM logical volumes fedora_xvdev/swap fedora_xvdev/root
[  123.787181] localhost dracut-initqueue[163]: No volume groups found
[  123.789270] localhost dracut-initqueue[163]: PARTIAL MODE. Incomplete logical volumes will be processed.
[  123.790497] localhost dracut-initqueue[163]: Volume group "fedora_xvdev" not found
[  123.790669] localhost dracut-initqueue[163]: Skipping volume group fedora_xvdev
[  183.950324] localhost dracut-initqueue[163]: Warning: Could not boot.
[  183.965223] localhost dracut-initqueue[163]: Warning: /dev/fedora_xvdev/root does not exist
[  183.965476] localhost dracut-initqueue[163]: Warning: /dev/fedora_xvdev/swap does not exist
[  183.965695] localhost dracut-initqueue[163]: Warning: /dev/mapper/fedora_xvdev-root does not exist

Expected results:
Boot as normal.

Additional info:
It would seem the latest kernel has a problem with /dev/root + LVM + RAID. We are using an HP SmartArray P400 with four 146 GB SAS drives in RAID 1. See the attached sosreport.txt.

Comment 1 Dirk Husung 2013-10-10 08:51:08 UTC
The boot problem described by Michael Mussulis still exists for me with kernel-3.10.14-100.fc18.i686, too.
Kernel 3.10.12-100.fc18.i686 was the last one that booted without any problems.

The boot problem occurs only on my systems with (software) RAID.

sosreport.txt:

:
[    1.838322] localhost kernel: scsi2 : ioc0: LSI53C1030 B2, FwRev=01032571h, Ports=1, MaxQ=222, IRQ=18
[    3.184348] localhost kernel: scsi 2:0:0:0: Direct-Access     COMPAQ   BD07285A25       HPB4 PQ: 0 ANSI: 3
[    3.184367] localhost kernel: scsi target2:0:0: Beginning Domain Validation
[    3.198503] localhost kernel: scsi target2:0:0: Ending Domain Validation
[    3.198568] localhost kernel: scsi target2:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI WRFLOW PCOMP (6.25 ns, offset 63)
[    3.202057] localhost kernel: scsi 2:0:1:0: Direct-Access     COMPAQ   BD07286224       HPB6 PQ: 0 ANSI: 3
[    3.202070] localhost kernel: scsi target2:0:1: Beginning Domain Validation
[    3.224502] localhost kernel: scsi target2:0:1: Ending Domain Validation
[    3.224565] localhost kernel: scsi target2:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RTI PCOMP (6.25 ns, offset 127)
[    4.476446] localhost kernel: scsi 2:0:8:0: Processor         SDR      GEM318           0    PQ: 0 ANSI: 2
[    4.476461] localhost kernel: scsi target2:0:8: Beginning Domain Validation
[    4.477711] localhost kernel: scsi target2:0:8: Ending Domain Validation
[    4.477775] localhost kernel: scsi target2:0:8: asynchronous
[    6.230915] localhost kernel: sd 2:0:0:0: Attached scsi generic sg1 type 0
[    6.231311] localhost kernel: sd 2:0:0:0: [sda] 142264000 512-byte logical blocks: (72.8 GB/67.8 GiB)
[    6.231659] localhost kernel: sd 2:0:1:0: Attached scsi generic sg2 type 0
[    6.232164] localhost kernel: sd 2:0:1:0: [sdb] 142264000 512-byte logical blocks: (72.8 GB/67.8 GiB)
[    6.232603] localhost kernel: sd 2:0:0:0: [sda] Write Protect is off
[    6.232612] localhost kernel: sd 2:0:0:0: [sda] Mode Sense: d3 00 10 08
[    6.232636] localhost kernel: scsi 2:0:8:0: Attached scsi generic sg3 type 3
[    6.234148] localhost kernel: sd 2:0:1:0: [sdb] Write Protect is off
[    6.234155] localhost kernel: sd 2:0:1:0: [sdb] Mode Sense: cf 00 10 08
[    6.234206] localhost kernel: sd 2:0:0:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
[    6.235430] localhost kernel: sd 2:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[    6.244123] localhost kernel:  sdb: sdb1 sdb2 sdb3
[    6.247466] localhost kernel:  sda: sda1 sda2 sda3
[    6.248936] localhost kernel: sd 2:0:1:0: [sdb] Attached SCSI disk
[    6.251679] localhost kernel: sd 2:0:0:0: [sda] Attached SCSI disk
[    6.481820] localhost kernel: md: bind<sda1>
[    6.559244] localhost kernel: md: bind<sdb3>
[    6.573242] localhost kernel: md: bind<sdb2>
[    6.577967] localhost kernel: md: bind<sda3>
[    6.586118] localhost kernel: md: raid1 personality registered for level 1
[    6.586823] localhost kernel: md/raid1:md126: active with 2 out of 2 mirrors
[    6.586862] localhost kernel: md126: detected capacity change from 0 to 67134619648
[    6.589022] localhost kernel: RAID1 conf printout:
[    6.589031] localhost kernel:  --- wd:2 rd:2
[    6.589036] localhost kernel:  disk 0, wo:0, o:1, dev:sdb3
[    6.589041] localhost kernel:  disk 1, wo:0, o:1, dev:sda3
[    6.589470] localhost kernel:  md126: unknown partition table
[    6.599233] localhost kernel: md: bind<sdb1>
[    6.602384] localhost kernel: md/raid1:md127: active with 2 out of 2 mirrors
[    6.602423] localhost kernel: md127: detected capacity change from 0 to 592117760
[    6.602611] localhost kernel: RAID1 conf printout:
[    6.602617] localhost kernel:  --- wd:2 rd:2
[    6.602623] localhost kernel:  disk 0, wo:0, o:1, dev:sdb1
[    6.602627] localhost kernel:  disk 1, wo:0, o:1, dev:sda1
[    6.608480] localhost kernel: md: bind<sda2>
[    6.611333] localhost kernel: md/raid1:md125: active with 2 out of 2 mirrors
[    6.611374] localhost kernel: md125: detected capacity change from 0 to 2212495360
[    6.611554] localhost kernel: RAID1 conf printout:
[    6.611560] localhost kernel:  --- wd:2 rd:2
[    6.611565] localhost kernel:  disk 0, wo:0, o:1, dev:sdb2
[    6.611570] localhost kernel:  disk 1, wo:0, o:1, dev:sda2
[    6.617519] localhost kernel:  md127: unknown partition table
[    6.620090] localhost kernel:  md125: unknown partition table
[    7.118725] localhost systemd[1]: Started Show Plymouth Boot Screen.
[    7.119897] localhost systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
[    7.121080] localhost systemd[1]: Starting Paths.
[    7.121881] localhost systemd[1]: Reached target Paths.
[    7.122578] localhost systemd[1]: Starting Forward Password Requests to Plymouth Directory Watch.
[    7.123255] localhost systemd[1]: Started Forward Password Requests to Plymouth Directory Watch.
[    7.123952] localhost systemd[1]: Starting Basic System.
[    7.124701] localhost systemd[1]: Reached target Basic System.
[  192.250278] localhost dracut-initqueue[152]: Warning: Could not boot.
[  192.256402] localhost dracut-initqueue[152]: Warning: /dev/md1 does not exist
[  192.263734] localhost systemd[1]: Starting Setup Virtual Console...
[  192.269814] localhost systemd[1]: Started Setup Virtual Console.
[  192.271125] localhost systemd[1]: Starting Dracut Emergency Shell...
- end of file -

Any ideas?

Comment 2 Dirk Husung 2013-10-10 16:39:46 UTC
In my case, /etc/mdadm.conf was missing from the newly created initial ramdisk.
With a fixed initial ramdisk (only mdadm.conf copied in), kernel 3.10.14-100.fc18.i686 now boots fine on my systems.
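For anyone hitting the same symptom, a sketch of how one might check whether mdadm.conf made it into the initramfs and rebuild it if not (paths assume a standard Fedora layout; run as root):

```shell
# Check whether the running kernel's initramfs contains mdadm.conf
# (lsinitrd ships with dracut).
lsinitrd /boot/initramfs-$(uname -r).img | grep -F etc/mdadm.conf

# If it is missing, rebuild the initramfs, asking dracut to include
# the local /etc/mdadm.conf explicitly.
dracut --force --mdadmconf /boot/initramfs-$(uname -r).img $(uname -r)
```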

Comment 3 Harald Hoyer 2013-10-14 08:49:48 UTC
(In reply to Michael Mussulis from comment #0)
> [...]
> Additional info:
> It would seem the latest kernel has a problem with /dev/root + LVM + RAID.
> We are using an HP SmartArray P400 with 4 SAS 146Gb drives in RAID 1. See
> attached sosreport.txt.

In RAID 1? On your kernel command line, you have turned _off_ RAID with
"rd.md=0 rd.dm=0":

BOOT_IMAGE=/vmlinuz-3.10.13-101.fc18.x86_64 root=/dev/mapper/fedora_xvdev-root ro rd.lvm.lv=fedora_xvdev/swap rd.md=0 rd.dm=0 rd.lvm.lv=fedora_xvdev/root rd.luks=0 vconsole.keymap=us rhgb quiet biosdevname=0 LANG=en_US.UTF-8
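If those options were added by mistake, a sketch of how they could be dropped persistently with grubby (the kernel image path below is taken from the command line above; adjust it to the installed kernel):

```shell
# Show the current boot entries and their arguments.
grubby --info=ALL

# Remove the options that disable MD/DM assembly in the initramfs.
grubby --update-kernel=/boot/vmlinuz-3.10.13-101.fc18.x86_64 \
       --remove-args="rd.md=0 rd.dm=0"
```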

Comment 4 Harald Hoyer 2013-10-14 08:52:17 UTC
blkid only sees these partitions:

 blkid
/dev/cciss/c0d0: PTTYPE="dos" 
/dev/cciss/c0d0p1: UUID="b7722444-956b-4b7f-96c1-5664e756913b" TYPE="ext4" 
/dev/cciss/c0d0p2: UUID="ly36me-EIYq-AYPx-lee0-Tetd-6MtF-xeWRCm" TYPE="LVM2_member" 
/dev/cciss/c0d1: UUID="47a37751-7231-470f-b609-84573b53c4aa" TYPE="ext4" 


and only /dev/cciss/c0d0p2 is an LVM member.

+ lvm pvdisplay
  --- Physical volume ---
  PV Name               /dev/cciss/c0d0p2
  VG Name               fedora_xvdev

+ lvm lvdisplay
  --- Logical volume ---
  LV Path                /dev/fedora_xvdev/root
  LV Name                root
  VG Name                fedora_xvdev
  LV UUID                qE45lq-QF5C-VLfs-lwfu-AIq5-sdZo-TB7DMZ
  LV Write Access        read/write
  LV Creation host, time localhost, 2013-03-07 12:23:51 +0000
  LV Status              NOT available
  LV Size                50.00 GiB
  Current LE             12800
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto


LV Status: NOT available...
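From the dracut emergency shell, the volume group can usually be activated by hand to confirm the devices themselves are intact; a sketch, with the VG name taken from the pvdisplay output above:

```shell
# Rescan for physical volumes, then activate all LVs in the VG.
# (In the dracut shell, LVM subcommands are invoked via the 'lvm' binary.)
lvm pvscan
lvm vgchange -ay fedora_xvdev

# The device nodes should now exist.
ls /dev/mapper/
```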

Comment 5 Harald Hoyer 2013-10-14 08:54:24 UTC
Does it work if you replace:

"rd.lvm.lv=fedora_xvdev/swap rd.lvm.lv=fedora_xvdev/root "

with

"rd.lvm.vg=fedora_xvdev"

?
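A sketch of how that swap could be made persistent with grubby (the kernel image path is an assumption; adjust it to the installed kernel):

```shell
grubby --update-kernel=/boot/vmlinuz-3.10.13-101.fc18.x86_64 \
       --remove-args="rd.lvm.lv=fedora_xvdev/swap rd.lvm.lv=fedora_xvdev/root" \
       --args="rd.lvm.vg=fedora_xvdev"
```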

Comment 6 Michael Mussulis 2013-10-14 09:12:48 UTC
Hi Harald,

I will schedule some tests a little later on when the server will be less busy.

Thanks,
Michael.

Comment 7 Peter Rajnoha 2013-12-05 13:55:36 UTC
Michael, are you still hitting this problem?

Comment 8 Michael Mussulis 2013-12-05 13:59:55 UTC
Hi,

Sorry, we've been so busy we've really not had time to test with the above suggestions. I've purchased a proper server, an ML370 G5 tower, and have migrated everything across, so I will be able to do the tests without affecting our day-to-day operations.

I hope to have a few minutes tomorrow to look at this and report back.

Cheers,
Michael.

Comment 9 John Taylor 2013-12-19 00:25:05 UTC
Hi All,

I've also run into this problem on an older hp dl360 g5 with the embedded p400i controller.

I tried the suggestion above of changing rd.lvm.lv to the volume-group-only rd.lvm.vg, with no success. Comparing the output of lsinitrd with the previous initramfs, I noticed that /etc/lvm/lvm.conf wasn't included in the 3.11.10 image, so I tried regenerating the initramfs with

dracut --force --lvmconf

and it now finds the logical volumes on boot.

-John
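The comparison John describes can be reproduced roughly like this (the version strings below are illustrative, not exact):

```shell
# Compare the file lists of a known-good and a failing initramfs.
lsinitrd /boot/initramfs-3.10.12-100.fc18.x86_64.img | sort > old.txt
lsinitrd /boot/initramfs-3.11.10-100.fc18.x86_64.img | sort > new.txt
diff old.txt new.txt | grep lvm.conf

# Rebuild the failing initramfs, forcing inclusion of /etc/lvm/lvm.conf.
dracut --force --lvmconf \
       /boot/initramfs-3.11.10-100.fc18.x86_64.img 3.11.10-100.fc18.x86_64
```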

Comment 10 Peter Rajnoha 2013-12-19 10:32:31 UTC
(In reply to John Taylor from comment #9)
> [...]

Can you attach your lvm.conf here? I'd like to see what's the difference from defaults. Thanks.

Comment 11 John Taylor 2013-12-19 13:21:49 UTC
Created attachment 838981 [details]
lvm.conf

Comment 12 Fedora End Of Life 2013-12-21 14:39:02 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Fedora End Of Life 2014-02-05 23:14:43 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.