Bug 783841
| Summary: | [RHEL6.2] System fails to install; hangs while formatting disks | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jeff Burke <jburke> |
| Component: | anaconda | Assignee: | David Lehman <dlehman> |
| Status: | CLOSED ERRATA | QA Contact: | Release Test Team <release-test-team-automation> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.2 | CC: | agk, arozansk, atodorov, bpeck, coughlan, davids, dlehman, dwysocha, emcnabb, gozen, heinzm, jarod, jbrassow, jhutar, jpazdziora, jstancek, jstodola, matt, mbroz, mganisin, mgrigull, msnitzer, pbunyan, pcassaro, prajnoha, prockai, soft-linux-drv, thornber, yoguma, zkabelac |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | anaconda-13.21.158-1 | Doc Type: | Bug Fix |
| Doc Text: | Cause: The installer fails to remove old metadata from complex storage devices such as LVM and software RAID. The issue is triggered by either the kickstart "clearpart" command or one of the installer's automatic partitioning options that clear old data from the system's disks. Consequence: The bug manifests as a deadlock in the LVM tools, which causes the installation process to hang. Fix: Two measures were taken to address the issue. First, the LVM commands in the udev rules packaged with the installer were changed to use a less restrictive locking method. Second, when reinitializing a disk, the installer now explicitly removes partitions from the disk instead of simply creating a new partition table on top of the old contents. Result: The deadlock/freeze no longer occurs. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-20 12:51:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 545868 | ||
Description
Jeff Burke
2012-01-22 17:59:00 UTC
Would like the LVM team to look into this one (based on comment #1), so reassigning for now.

(..you'll probably need to use "sshd asknetwork" on the kernel cmd line to have the network set up first and have the sshd ready in anaconda to get in there easily)

(In reply to comment #10)
> (..you'll probably need to use "sshd asknetwork" on kernel cmd line to have the
> network setup first and have the sshd ready in anaconda to get in there easily)

Were you successful in reproducing the issue?

(In reply to comment #11)
> Were you successful in reproducing the issue?

Unfortunately not...

OK, I'm quite curious what it is. We're already trying to reproduce this (and hopefully with the debug as well); Marian Csontos helped me with the Beaker/kickstart for anaconda. Running it just now...

The difference between F16 and RHEL6 is that F16 uses GPT. GPT requires that not only the beginning but also the end of the device be properly wiped for the clearpart command. If this does not happen, it is quite possible that the kernel sees the old partition table and the installation crashes. Adding dlehman just to verify that RHEL6 anaconda properly wipes GPT (I don't think so).

I was seeing something similar during development for F16. To try with the patch that fixed the deadlock/hang I was seeing, add the following to your boot command line:

updates=http://dlehman.fedorapeople.org/updates/updates-783841.0.img

That updates.img should work with any RHEL6 from 6.2 GA on.

Re-running the test that previously failed, with the updates image, allowed RHEL 6.2 to be installed after an F16 install: https://beaker.engineering.redhat.com/jobs/190017

David says that the updates image contains changed anaconda udev rules (/lib/udev/rules.d/70-anaconda.rules) that replace "--ignorelockingfailure" with "--config 'global {locking_type=4}'" for the LVM commands called within those rules.
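To make David's description concrete, the rule change would look roughly like this; this is a sketch, since the full rule in 70-anaconda.rules carries more options than shown here:

# before:
ENV{ID_FS_TYPE}=="LVM2_member", IMPORT{program}="$env{ANACBIN}/lvm pvs --ignorelockingfailure ..."
# after:
ENV{ID_FS_TYPE}=="LVM2_member", IMPORT{program}="$env{ANACBIN}/lvm pvs --config 'global {locking_type=4}' ..."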
I looked at the machine while the installation was hung and noticed that there's a duplicate VG there (installing RHEL 6.2 over F16):
-bash-4.1# vgs
WARNING: Duplicate VG name vg_hp8100e01: Existing VSddhS-FPZa-ZsGK-Ec50-M2aD-xec9-YnIMv5 (created here) takes precedence over wA8FWF-cKBb-oN2D-ZAEO-LKYn-5vdU-Fs3sIM
WARNING: Duplicate VG name vg_hp8100e01: Existing VSddhS-FPZa-ZsGK-Ec50-M2aD-xec9-YnIMv5 (created here) takes precedence over wA8FWF-cKBb-oN2D-ZAEO-LKYn-5vdU-Fs3sIM
WARNING: Duplicate VG name vg_hp8100e01: Existing wA8FWF-cKBb-oN2D-ZAEO-LKYn-5vdU-Fs3sIM (created here) takes precedence over VSddhS-FPZa-ZsGK-Ec50-M2aD-xec9-YnIMv5
WARNING: Duplicate VG name vg_hp8100e01: Existing VSddhS-FPZa-ZsGK-Ec50-M2aD-xec9-YnIMv5 (created here) takes precedence over wA8FWF-cKBb-oN2D-ZAEO-LKYn-5vdU-Fs3sIM
VG #PV #LV #SN Attr VSize VFree
vg_hp8100e01 1 1 0 wz--n- 232.39g 182.39g
vg_hp8100e01 1 3 0 wz--n- 232.38g 0
The VG with 3 LVs is the VG from F16 (containing the root LV, swap LV and home LV) and the VG with 1 LV is the new one created by RHEL installation (that's just in progress). And these two are mixed together.
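For context, the two same-named VGs can still be told apart by UUID using standard LVM reporting options (a sketch, not taken from the original report):

# list VGs with their UUIDs and LV counts, so the old F16 VG and the
# new RHEL 6.2 VG can be distinguished despite the identical name
vgs -o vg_name,vg_uuid,lv_count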
So it looks like the LVM command called from the udev rule tries to repair the VG it finds from the previous installation, and this seems to be the problem here. Using locking_type=4 avoids it ("read-only" locking - not changing any metadata).
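For reference, locking type 4 is LVM's read-only locking. Set globally in lvm.conf it would look like the snippet below, which is the same setting that the --config override quoted earlier applies on the command line:

global {
    # locking type 4: read-only locking; operations that would change
    # on-disk metadata are refused
    locking_type = 4
}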
I think anaconda already tries to erase some parts of the disk before installation to avoid such problems, but it seems that's not complete. It should probably use wipefs to do that reliably. I'm not quite sure what exactly anaconda uses today - that's a question for the anaconda team. David?
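As an example of the wipefs approach, assuming /dev/sda2 is the stale PV partition (the device name here is just a placeholder):

wipefs /dev/sda2       # list detected signatures and their offsets
wipefs -a /dev/sda2    # erase all detected signatures (LVM, RAID, filesystem)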
Anyway, I'll try to dig deeper to see what exactly happens there that makes it hang completely.
(I think the udev worker just fails to process any further rules if any command in the run queue fails, which also means it fails to send the notification about completing the udev rule processing, and so we end up waiting forever in that lvcreate on the semaphore...)
Another interesting part is this one (the VG uuids differ now, as I reinstalled it again):

-bash-4.1# pvs
WARNING: Duplicate VG name vg_hp8100e01: Existing 9JIPH2-jxQ8-S6ia-MipJ-QCS6-Tj9V-ZZINBn (created here) takes precedence over 1xKiNC-RbN3-zxTg-ZBxU-zsu8-SC9k-aZXW1m
WARNING: Duplicate VG name vg_hp8100e01: Existing 9JIPH2-jxQ8-S6ia-MipJ-QCS6-Tj9V-ZZINBn (created here) takes precedence over 1xKiNC-RbN3-zxTg-ZBxU-zsu8-SC9k-aZXW1m
PV         VG           Fmt  Attr PSize   PFree
/dev/dm-0  vg_hp8100e01 lvm2 a--  232.38g      0
/dev/sda2  vg_hp8100e01 lvm2 a--  232.39g 182.39g

The dm-0 is the newly created lv_root in RHEL 6.2, which happens to fit the PV created in F16 perfectly and thus unveils the old metadata through a new device (dm-0, the lv_root). So the lv_root acts like a PV that just appeared. Very nice bug!

F16 uses this partition layout by default:

Number  Start     End         Size        File system  Name  Flags
 1      2048s     4095s       2048s                          bios_grub
 2      4096s     1028095s    1024000s    ext4         ext4  boot
 3      1028096s  488396799s  487368704s                     lvm

RHEL 6.2 uses this one:

Number  Start     End         Size        Type     File system  Flags
 1      2048s     1026047s    1024000s    primary  ext4
 2      1026048s  488396799s  487370752s  primary               lvm

The "lvm" partition starts at 1028096s in F16 and at 1026048s in RHEL 6.2. That's a difference of 2048 sectors (2048 × 512 B = 1 MB), which is exactly the default PE start offset in LVM:

PV         1st PE
/dev/dm-0  1.00m
/dev/sda2  1.00m

A perfect fit :) The new lv_root's data area begins at 1026048s + 2048s = 1028096s, exactly where the old F16 PV label used to sit.

Though we wipe the first KB of a newly created LV (the '--zero y', which is used by default), this does not help, because we need to create the mapping/LV first (which generates a CHANGE udev event) and only after that do we wipe it. So there is a window in which udev sees the LV unwiped, and any lvm command run from within udev rules then sees the old PV at that offset, which is the source of the confusion. So it's even a race.

This has been a known issue for some time. The correct activation for an LV that should be zeroed is to activate it as a private device, zero it, deactivate it, and then activate it as a regular accessible device. The problem here is that it would be noticeably slower. The ideal solution is to zero the PV area directly, before activation of the LV, which is the most efficient approach. I guess this will be resolved with the 'ddlv' idea.

There is also an anaconda bug here. We normally run 'wipefs -a' explicitly to clear PVs before destroying the device they are on, but in the case of 'clearpart --all --initlabel' we do not even look at the partitions. This is an ill-advised shortcut and is the basic cause of this bug IMO. I propose we reassign this bug to anaconda and target it for 6.3.

Yes, those metadata remnants really need to be wiped properly. The reason it hangs lies in this line in 70-anaconda.rules:
# probe metadata of LVM2 physical volumes
ENV{ID_FS_TYPE}=="LVM2_member", IMPORT{program}="$env{ANACBIN}/lvm pvs ...
Now, just as an example, consider that we have /dev/sda and an LV on top of it.
When the LV is created, a CHANGE udev event is generated. That event fires a "blkid" call in the udev rules to scan for any metadata on the device. But since a new LV is normally clear, there is no metadata on it and hence ENV{ID_FS_TYPE} remains blank.
And so the "pvs" command is not run on the newly created LV.
But if we have such an unhappy offset, where the start of the newly created LV fits exactly the start of a PV label that was there some time before, ENV{ID_FS_TYPE} is (correctly) set to "LVM2_member" and so it fires the "pvs" command.
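In other words, the probe behind those rules effectively amounts to the following check (illustrative; the actual rules use udev's probing mechanism rather than this literal command):

blkid -o value -s TYPE /dev/mapper/vg-lvol0
# prints "LVM2_member" when the stale PV label lines up with the LV start;
# prints nothing when the LV start is clean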
So we end up with this sequence:
1. calling lvcreate, taking an lvm lock
2. lvcreate creating the vg-lvol0 mapping (to wipe the first KB of it), generating the CHANGE event
3. *before* wiping the first KBs of the new LV within lvcreate, those udev rules are processed
4. these rules see the old PV label on the newly created LV, firing the pvs rule
5. pvs tries to take the lvm lock, but it has to wait for lvcreate to release it first!
6. lvcreate continues until it hits the "sync_local_dev_names" call, which waits on a semaphore for the relevant udev rules to be processed (the notification through the semaphore is sent by one of the last udev rules to be processed)
7. lvcreate waits for the udev rules, pvs waits for lvcreate, and so the udev worker can't process any further rules - deadlock!
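Schematically, the two waiting parties look like this (an illustration of the sequence above, not literal output; the lvcreate arguments are made up):

lvcreate -n lv_root -L 50G vg_hp8100e01
    # holds the VG lock; blocks in sync_local_dev_names waiting on the
    # udev notification semaphore
$env{ANACBIN}/lvm pvs ...    # run from 70-anaconda.rules for the CHANGE event
    # blocks trying to take the same VG lock; neither side can proceed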
We could avoid this situation by directly wiping the start of the new LV without needing to activate it first (the idea Zdenek mentioned in comment #26). I'm not quite sure whether we can easily change the lvm locking logic here to avoid this...
Wiping those parts of the disk before making new LVs is the most straightforward way to avoid this (as is using locking_type=4); a sketch follows.
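A minimal sketch of that direct wipe, hard-coding the offsets from the layout above (hypothetical; real code would compute the data-area offset from the VG metadata rather than hard-coding it):

# zero the first 1 MiB of the new LV's data area directly on the PV,
# before the LV mapping is created and the CHANGE event fires
dd if=/dev/zero of=/dev/sda2 bs=512 seek=2048 count=2048 conv=fsync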
Reassigning to anaconda team...
(We should also consider the feasibility of the idea in comment #26 on lvm side as well!).
atodorov, Can we get a qa_ack+ on this?

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

*** Bug 745421 has been marked as a duplicate of this bug. ***

Here is an updated updates image, against anaconda-13.21.157-1:

http://dlehman.fedorapeople.org/updates/updates-783841.1.img

This one should explicitly destroy any devices and the metadata they contain, including lvm-on-md, luks, &c.

David, I want to make sure that I have this correct. This latest update is for RHEL 6.3, since it is built on top of anaconda-13.21.157-1. Will the next nightly tree we get from rel-eng have a version of anaconda with this fix applied? If so, we should not need an updates image for 6.3. Will we need a second updates image for RHEL 6.2, one based on anaconda-13.21.149-1? Regards, Jeff

Alright, I realize now that this is a problem that is impossible to solve without un-solving a different problem. The patch I added to the updates image in comment 44 breaks unattended kickstarts by prompting for LUKS passphrases. So it is impossible to have a completely unattended install and also guarantee all metadata has been removed from all disks, unless you want to literally zero the entire disks. We have several pieces from which we can build a solution:

1. lvm locking change in udev rules: prevents the problem but is considered more of a workaround than a real solution to the problem of stale metadata.

2. properly clearing disklabels, which allows us to also clear metadata from the partitions: this does not clear metadata from within the partitions, for example LV/VG metadata within a PV partition.

3. try to explicitly destroy all devices: the problem with this is that it breaks unattended installs on systems that contain encrypted block devices.

4. try to explicitly destroy all devices, except for those hidden by encryption: this gets quite close to a total solution but can still hit these lvm deadlocks if encryption was used in the previous install.

Personally, I am starting to think that the best solution is to keep the patch mentioned in comment 29 (item 2 above) and also add the lvm locking change (item 1 above) to cover any additional cases in which the lvm tools might get deadlocked.

(In reply to comment #46)
> 1. lvm locking change in udev rules
>
> prevents the problem but is considered more of a workaround
> than a real solution to the problem of stale metadata

Just FYI, we're tracking the problem of race-prone wiping with new lvm2 bug #796200, destined for 6.4 to provide a better solution.
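For reference, the unattended path discussed throughout this bug is driven by a kickstart fragment along these lines (illustrative; a real kickstart would carry many more directives):

clearpart --all --initlabel   # clear all disks and reinitialize disk labels
autopart                      # let the installer choose the partition layout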
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause: The installer fails to remove old metadata from complex storage devices such as LVM and software RAID. The issue is triggered by either the kickstart "clearpart" command or one of the installer's automatic partitioning options that clear old data from the system's disks.
Consequence: The bug manifests as a deadlock in the LVM tools, which causes the installation process to hang.
Fix: Two measures were taken to address the issue. First, the LVM commands in the udev rules packaged with the installer were changed to use a less restrictive locking method. Second, when reinitializing a disk, the installer now explicitly removes partitions from the disk instead of simply creating a new partition table on top of the old contents.
Result: The deadlock/freeze no longer occurs.
*** Bug 801709 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0782.html

opening access...