Bug 481178

Summary: LVM Incorrect metadata area header checksum after update from UP kernel to SMP
Product: Red Hat Enterprise Linux 4
Reporter: Jan Tluka <jtluka>
Component: anaconda
Assignee: Anaconda Maintenance Team <anaconda-maint-list>
Status: CLOSED ERRATA
QA Contact: Alexander Todorov <atodorov>
Severity: medium
Priority: high
Version: 4.8
CC: agk, atodorov, ddumas, dwysocha, edamato, heinzm, jbrassow, jgranado, mbroz, mgahagan, prockai, syeghiay
Target Milestone: beta
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-05-18 20:16:03 UTC

Description Jan Tluka 2009-01-22 16:23:29 UTC
Description of problem:
When installing the RHEL4 nightly build RHEL4-U8-re20090115.nightly on the machine ibm-mongoose.rhts.bos.redhat.com, I got the following failure, which resulted in a kernel panic:

--snip--
Scanning logical volumes

  Reading all physical volumes.  This may take a while...

  Incorrect metadata area header checksum

  Volume group "VolGroup00" inconsistent

  Incorrect metadata area header checksum

  WARNING: Inconsistent metadata found for VG VolGroup00 - updating to use version 3

  Incorrect metadata area header checksum

  Automatic metadata correction failed

ERROR: /bin/lvm exited abnormally! (pid 697)

Activating logical volumes

  Couldn't find device with uuid '7NR7g9-40oo-gIUr-XMRV-nLnr-379e-c2K3Zx'.

  Couldn't find device with uuid 'XrrTeW-UNMQ-BjRw-l4x6-OxXe-rYu0-8IPGH0'.

  Couldn't find device with uuid '7NR7g9-40oo-gIUr-XMRV-nLnr-379e-c2K3Zx'.

  Couldn't find device with uuid 'XrrTeW-UNMQ-BjRw-l4x6-OxXe-rYu0-8IPGH0'.

  LV LogVol00: segment 1 has inconsistent PV area 0

  Couldn't read all logical volumes for volume group VolGroup00.

  Couldn't find device with uuid '7NR7g9-40oo-gIUr-XMRV-nLnr-379e-c2K3Zx'.

  Couldn't find device with Kernel panic - not syncing: Attempted to kill init!

uuid 'XrrTeW-UNM Q-BjRw-l4x6-OxXe
--snip--
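
As a triage aid, an excerpt like the one above can be summarized mechanically. The following is a minimal Python sketch and not part of the original report: the function name `summarize_lvm_log` and the sample string are hypothetical, and the patterns only match the message text quoted above. It counts the checksum errors and tallies the PV UUIDs that LVM reported as missing, skipping UUIDs truncated by the panic output.

```python
import re
from collections import Counter

# Complete LVM PV UUIDs as printed in the excerpt: groups of 6-4-4-4-4-4-6
# alphanumeric characters. Truncated UUIDs do not match.
UUID_RE = re.compile(
    r"Couldn't find device with uuid '"
    r"([A-Za-z0-9]{6}(?:-[A-Za-z0-9]{4}){5}-[A-Za-z0-9]{6})'"
)
CHECKSUM_MSG = "Incorrect metadata area header checksum"


def summarize_lvm_log(text):
    """Return (number of checksum errors, Counter of missing PV UUIDs)."""
    checksum_errors = sum(1 for line in text.splitlines() if CHECKSUM_MSG in line)
    missing = Counter(UUID_RE.findall(text))
    return checksum_errors, missing


if __name__ == "__main__":
    # A few lines copied from the excerpt above, used only as sample input.
    sample = """
  Incorrect metadata area header checksum
  Volume group "VolGroup00" inconsistent
  Couldn't find device with uuid '7NR7g9-40oo-gIUr-XMRV-nLnr-379e-c2K3Zx'.
  Couldn't find device with uuid 'XrrTeW-UNMQ-BjRw-l4x6-OxXe-rYu0-8IPGH0'.
"""
    errors, missing = summarize_lvm_log(sample)
    print("checksum errors:", errors)
    for uuid, count in missing.items():
        print(uuid, "reported missing", count, "time(s)")
```

Seeing the same two UUIDs repeated suggests that the volume group metadata refers to two physical volumes that were not found at boot.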

(See the full log in RHTS.) RHTS job: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=42848

Recipe http://rhts.redhat.com/cgi-bin/rhts/recipes.cgi?id=147680
This recipe installs the uniprocessor (UP) kernel on a multiprocessor system, updates it to the SMP kernel, and then reboots the system.

Version-Release number of selected component (if applicable):
2.6.9-78.28.ELsmp

How reproducible:
Not sure if this is 100% reproducible. I will check tomorrow.

Steps to Reproduce:
1. Install RHEL4-U8-re20090115.nightly with the UP kernel on ibm-mongoose.rhts.bos.redhat.com.
2. Update to the SMP kernel.
3. Reboot.
  
Actual results:
System hangs.

Expected results:
System boots OK.

Additional info:

Comment 1 Milan Broz 2009-01-22 17:05:27 UTC
I suspect there was simply old metadata with the same VG name that appeared during boot (or anaconda didn't wipe all metadata properly).

See the anaconda log; it also contains some errors:
* WARNING: Installing on a USB device.  This may or may not produce a working system.
...
* parted exception: Error: File system has an invalid signature for a FAT file systems.
* parted exception: Error: File system has an invalid signature for a FAT file systems.
* parted exception: Error: Can't have the end before the start!
...

Isn't it possible that anaconda just didn't initialize some device during install, and that this device reappeared during system boot with wrong LVM metadata?

Reassigning to anaconda. If you still think this is a bug in lvm2, please provide a full lvm2 debug log (run the commands with -vvvv) and, if possible, "lvmdump -m" diagnostic data from the machine.
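
The data requested above could be gathered along these lines. This is a minimal Python sketch under stated assumptions, not a procedure from the original thread: the choice of `vgscan` and `pvs` as the commands to run with -vvvv is an assumption (the comment does not name specific commands), the function name `collect_lvm_diagnostics` is hypothetical, and the code assumes a Python 3 interpreter and root privileges, neither of which a stock RHEL 4 host necessarily provides.

```python
import shutil
import subprocess

# Commands to run at maximum verbosity. Which commands to run is an
# assumption; the comment only asks for "-vvvv" output in general.
DEBUG_COMMANDS = [
    ["vgscan", "-vvvv"],
    ["pvs", "-vvvv"],
]
# lvmdump -m additionally gathers the LVM metadata from the PVs.
DUMP_COMMAND = ["lvmdump", "-m"]


def collect_lvm_diagnostics(runner=subprocess.run):
    """Run each command if it is installed.

    Returns a dict mapping the command line to its combined stdout/stderr
    text, or to None when the tool is not present on this machine.
    """
    results = {}
    for cmd in DEBUG_COMMANDS + [DUMP_COMMAND]:
        key = " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            results[key] = None  # tool not installed, nothing to collect
            continue
        # LVM writes its debug output to stderr, so merge it into stdout.
        proc = runner(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        results[key] = proc.stdout
    return results


if __name__ == "__main__":
    for command, output in collect_lvm_diagnostics().items():
        print("==", command, "==")
        print(output if output is not None else "(not installed)")
```

The `runner` parameter exists only so the function can be exercised without touching real devices; by default it calls `subprocess.run`.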

Comment 3 Joel Andres Granados 2009-01-23 10:22:20 UTC
It is entirely possible that anaconda did not initialize the disk. But judging from the parted messages, it might be that the partition table was corrupted and anaconda simply ignored the disk.

Did something change in the kernel partitioning code?

Comment 4 Joel Andres Granados 2009-01-27 08:35:25 UTC
anaconda and parted were built on Jan 15. This means that the test was done with the RHEL 4.7 anaconda version. Please test with the current nightly and confirm that the behavior persists.

Comment 5 Joel Andres Granados 2009-01-27 15:14:32 UTC
The new anaconda has been modified to run vgreduce before vgremove. Please test with the current anaconda version, anaconda-10.1.1.94-1.

I also think this bug may be a duplicate of 481698, but please test and post your findings.

Comment 6 Jan Tluka 2009-01-27 17:02:47 UTC
The following job was queued in RHTS:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=43499

This will install the RHEL4-U8-re20090126.2 tree. I'm aware that it will fail because of a udev bug, but it should be sufficient for our purposes. Does this tree include the anaconda version you mentioned in comment 5?

Comment 7 Joel Andres Granados 2009-01-27 17:11:28 UTC
Jan:

RHEL4-U8-re20090126.2 does not have the latest anaconda. Please wait for today's compose (Jan 27). It has a new LVM fix that could make a difference. For this reason, I will ignore comment #6 and wait for a test that includes anaconda-10.1.1.94-1.

Comment 8 Jan Tluka 2009-01-29 11:25:19 UTC
RHTS job running RHEL4-U8-re20090128.1 installation:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=43883

Comment 9 Joel Andres Granados 2009-01-30 17:30:31 UTC
(In reply to comment #8)
> RHTS job running RHEL4-U8-re20090128.1 installation:
> http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=43883

This link does not show me any relevant info. Do you have a link to the test logs?

Comment 10 Jan Tluka 2009-02-02 10:00:42 UTC
Test logs available here:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6301495

Comment 11 Joel Andres Granados 2009-02-03 11:23:31 UTC
We have built a new anaconda; this version erases stale LVM metadata before doing anything else. Can you please retest with the new nightly? FYI, the anaconda version you need is anaconda-10.1.1.95-1.

Comment 12 Jan Tluka 2009-02-03 18:11:29 UTC
Using the nightly from 3 Feb, I got these test logs:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6331359

Comment 13 Joel Andres Granados 2009-02-03 18:32:35 UTC
Unless I am misreading the logs, this seems OK now. I see that the job passed. If you see any more misbehavior, please reopen this bug.

Comment 23 errata-xmlrpc 2009-05-18 20:16:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0978.html