Bug 142737

Summary: lvm2-related boot failure
Product: [Fedora] Fedora
Reporter: Dan Stromberg <strombrg>
Component: lvm2
Assignee: Alasdair Kergon <agk>
Status: CLOSED WORKSFORME
QA Contact:
Severity: high
Docs Contact:
Priority: medium
Version: 3
CC: katzj
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-01-05 16:09:41 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Dan Stromberg 2004-12-13 15:48:15 UTC
Description of problem:
System won't boot

Version-Release number of selected component (if applicable):


How reproducible:
Probably difficult, but easy on my machine.  :)

Steps to Reproduce:
1. Shut down FC3 without sync'ing disks
2. Try to boot
3. It doesn't.
  
Actual results:
System won't boot

Expected results:
System should boot.

Additional info:
I have an FC3 system that was happy, but is now unhappy.  This may be
related to someone, who shall remain nameless, having shut off the power
on it without doing an orderly shutdown.  Then again, maybe it was
because of a "yum -y update", because I put off rebooting for a while
after that.

Anyway, now when it tries to boot, I see:

Red Hat nash version 4.1.18 starting
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroup00" using metadata type lvm2
  2 logical volume(s) in volume group "VolGroup00" now active


...and that's it.  I've left it there for over an hour, and it never
gets past that.

I booted off of an FC3 rescue cd, and found that I could mount the /boot
partition, but I cannot mount the / partition.  I ran various lvm
commands that identified two lvm volumes on the system.
fsck'ing /dev/hda2 (which is /) is getting me nowhere though - it just
says "invalid argument".

I tried firing up device mapper and udev in order to get
a /dev/VolGroup00 directory, but it just wouldn't do it - at least, not
with the things I tried.  I could mkdir the directory, but then "lvm
vgmknodes" would remove it.

What do I need to do to get past this?  There's stuff in the filesystem
I want quite a bit.  :-S

I tried all 3 FC3 kernels I have on the system, but none would come up,
getting stuck at that same point.

When I boot up into the rescue CD and let it try to find my Fedora
install, it gets really confused.  More specifically, it says:

Searching for Fedora Core installations...

        0%              install exited abnormally -- received signal 15
        kernel panic - not syncing: Out of memory and no killable processes


If I remove "quiet" and add "single" to my boot options, I get:

EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: dm-0: orphan cleanup on readonly fs

...and there it hangs.


Also, I ran memtest86 on the box for a while (a little over an hour),
and found no errors.

Comment 1 Dan Stromberg 2004-12-17 04:16:30 UTC
What finally fixed it was the following, done from FC3's rescue disk:

1) Do start up the network interfaces
2) Don't try to automatically mount the filesystems - not even readonly
3) lvm vgchange --ignorelockingfailure -P -a y
4) fdisk -l, and guess which partition is which based on size: the
small one was /boot, and the large one was /
5) mkdir /mnt/boot
6) mount /dev/hda1 /mnt/boot
7) Look up the device node for the root filesystem in /mnt/boot/grub/grub.conf
8) A first tentative step, to see if things are working:
fsck -n /dev/VolGroup00/LogVol00
9) Dive in: fsck -f -y /dev/VolGroup00/LogVol00
10) Wait a while...  Be patient.  Don't interrupt it
11) Reboot
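
The command-line steps above can be collected into a small script. This is
only a sketch, assuming the device names from this report (/dev/hda1 for
/boot, /dev/VolGroup00/LogVol00 for /). The run and recover_root helpers,
the DRY_RUN, ROOT_LV and BOOT_PART variables, and the grep used to look up
the root device are illustrative additions, not something that ships on the
rescue disk. It prints the commands by default and only executes them when
DRY_RUN=0 is set.

```sh
#!/bin/sh
# Sketch of the rescue-disk recovery steps. Dry-run by default:
# commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
ROOT_LV="${ROOT_LV:-/dev/VolGroup00/LogVol00}"   # assumed root logical volume
BOOT_PART="${BOOT_PART:-/dev/hda1}"              # assumed /boot partition

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_root() {
    # Step 3: activate the volume group so the LV device nodes appear.
    run lvm vgchange --ignorelockingfailure -P -a y
    # Steps 5-7: mount /boot and look up the root device in grub.conf.
    run mkdir -p /mnt/boot
    run mount "$BOOT_PART" /mnt/boot
    run grep root= /mnt/boot/grub/grub.conf
    # Step 8: read-only check first, nothing is modified.
    run fsck -n "$ROOT_LV"
    # Step 9: forced check that answers yes to every repair prompt.
    run fsck -f -y "$ROOT_LV"
}

recover_root
```

Running it as-is only lists the commands, which makes it easy to confirm the
device names before repeating the run with DRY_RUN=0 from the rescue shell.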


Comment 2 Alasdair Kergon 2004-12-21 17:04:27 UTC
So you now think fsck was hanging?

Comment 3 Dan Stromberg 2004-12-21 17:54:47 UTC
I can't do much more than guess, since there's no strace on the FC3
recovery cd image.

However, the fact that LVM2 came right up using the steps above, and
the problem was corrected by an fsck, does seem to suggest an ext3
problem.

It's worth noting that LVM2 obfuscated the filesystem in such a way
that the usual ext2 recovery tools were confused.



Comment 4 Alasdair Kergon 2004-12-21 20:52:59 UTC
> fsck'ing /dev/hda2 (which is /) is getting me nowhere though

As you discovered, you needed to run fsck on the logical volume not
the raw device.  This probably needs documenting somewhere - but I'm
not sure where.
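
As a rough illustration of that rule, the sketch below maps a device's
reported type to the thing fsck should be pointed at. The function name
fsck_target_hint is hypothetical. The type string LVM2_member is what blkid
reports for LVM physical volumes on current systems; whether the FC3 rescue
image reports the same string is an assumption.

```sh
#!/bin/sh
# Sketch: decide what to fsck from a device's reported type.
# $1 = device, $2 = type string (e.g. from: blkid -o value -s TYPE "$1")
fsck_target_hint() {
    dev="$1"
    fstype="$2"
    case "$fstype" in
        LVM2_member)
            # The partition is only a container; the filesystem lives in an LV.
            echo "$dev is an LVM physical volume: activate its volume group and fsck the logical volume" ;;
        ext2|ext3|ext4)
            echo "$dev holds $fstype directly: fsck $dev" ;;
        *)
            echo "$dev: type '$fstype' not handled by this sketch" ;;
    esac
}

# Example with the partition from this report; the type is passed in by hand.
fsck_target_hint /dev/hda2 LVM2_member
```

Passing the type in as an argument keeps the sketch free of side effects; on
a live system it could be filled in from blkid before calling the function.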

Comment 6 Alasdair Kergon 2005-01-05 16:06:03 UTC
I've tried various things with recent CD images and I can't reproduce
this problem: the automatic recovery/rescue mode works fine for me, so
I'm going to assume the cause of this has since been fixed.