Bug 1506680

Summary: Anaconda Fails to Wipe/Remove Existing Disks with Some Types of LVM Metadata
Product: Red Hat Enterprise Linux 7
Reporter: Will Foster <wfoster>
Component: python-blivet
Assignee: Blivet Maintenance Team <blivet-maint-list>
Status: CLOSED WONTFIX
QA Contact: Release Test Team <release-test-team-automation>
Severity: unspecified
Priority: unspecified
Version: 7.4
CC: bengland, bmarson, jcall, jharriga, jkonecny, wfoster, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Type: Bug
Last Closed: 2021-01-15 07:44:15 UTC
Attachments:
  Anaconda Log
  More anaconda logs
  Anaconda debug logs

Description Will Foster 2017-10-26 14:23:24 UTC
Created attachment 1343782 [details]
Anaconda Log

Description of problem:

When kickstarting a RHEL install onto disks that carry certain types of existing LVM metadata, Anaconda fails.

This shows up on hosts from previous deployments of GlusterFS or Ceph, which lay down these kinds of LVM metadata on disk.

Running the following by hand fixes the problem, and a subsequent kickstart then completes, but it requires manual intervention:

pvremove /dev/* --force -y
for disk in $(ls /dev/{sd*,nv*}); do wipefs -a -f $disk; done

We currently use the following in our kickstarts to wipe the disks beforehand, but it doesn't seem to help:

--snip--
zerombr
clearpart --all --initlabel
--snip--
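
For illustration, one possible way to fold the manual wipe above into a kickstart %pre section so it runs before clearpart. This is only a sketch; the device globs are illustrative and would need adjusting for the actual hardware:

--snip--
%pre
# Force-remove any LVM PV signatures the installer's lvm can still parse
pvremove --force --force --yes /dev/sd* /dev/nvme* 2>/dev/null
# Then clear remaining filesystem/LVM/RAID signatures from each whole disk
for disk in /dev/sd[a-z] /dev/nvme*n1; do
  [ -b "$disk" ] && wipefs --all --force "$disk"
done
%end
--snip--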


Steps to Reproduce:
1. Deploy Ceph or some combination of Ceph + GlusterFS
2. Try to kickstart the existing machine/disks 

Actual results:

Anaconda fails (see attached logs)


Expected results:

Kickstart as usual.


Additional info:

Comment 2 Will Foster 2017-10-26 14:24:02 UTC
Created attachment 1343783 [details]
More anaconda logs

Comment 3 Will Foster 2017-10-26 14:25:39 UTC
Additional note: attempts with just pvremove complain:

--snip--
[anaconda root@c07-h25-6048r ~]# pvremove /dev/* --force --force -y
  WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
  Can't open /dev/loop0 exclusive
--snip--

Comment 5 Jiri Konecny 2017-10-27 08:03:20 UTC
This is a storage-specific bug. Changing the component to our storage library.

Comment 6 Vratislav Podzimek 2017-10-27 08:05:21 UTC
Please attach the /tmp/program.log file from the installation. That should tell us what happened.

Comment 7 Will Foster 2017-10-27 12:50:20 UTC
Created attachment 1344302 [details]
Anaconda debug logs

Attaching the anaconda debug logs. I'm not sure whether I have program.log, but these are quite exhaustive, so hopefully they provide some more useful info.

I'm still digging into exactly what this particular host was doing, prior to attempting the kickstart, that would have laid down the LVM metadata.

Unfortunately we don't have this machine available for debugging.

Comment 9 John Harrigan 2017-10-27 16:55:35 UTC
Those dm-cache LVs were created by an Ansible playbook.

The actual playbook can be found in the 'Playbooks/SL_2nvmes.yml' file in this GitHub repo:
https://github.com/jharriga/CephVolume

Comment 10 David Lehman 2017-11-15 20:46:53 UTC
23:25:04,254 INFO program: Running... lvm lvremove vg-ceph-sdu/lv-ceph-sdu --config  devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } 
23:25:04,306 INFO program:   WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
23:25:04,306 INFO program:   Cannot change VG vg-ceph-sdu with unknown segments in it!
23:25:04,307 INFO program:   Cannot process volume group vg-ceph-sdu
23:25:04,307 DEBUG program: Return code: 5


So this is some type of LVM metadata that isn't understood by the version of lvm in the installer's runtime environment. My initial thought is that it's on the user to remove it, since there's no decent way to remove it from the installer environment.

Comment 11 John Harrigan 2017-11-15 21:21:00 UTC
These are dm-cache lvm's. They can be removed using 'lvremove'. I specify the 
LVPATH as a full pathname. The pv's are FASTDEV (/dev/nvme...) and SLOWDEV (/dev/sdX)
The script can be seen https://github.com/jharriga/CephVolume/blob/master/destroyLVM.sh

Here is the sequence used to destroy them within a bash for loop:

  vg="${cachedVG}$cntr"
  lv="${cachedLV}$cntr"

  # Remove the cached LV
  lvpath="/dev/${vg}/${lv}"
  lvremove --force ${lvpath} || \
    error_exit "$LINENO: Unable to lvremove ${lvpath}"
  updatelog "lvremove of ${lvpath} complete"

  # Remove the VG
  vgremove --force ${vg} || \
    error_exit "$LINENO: Unable to vgremove ${vg}"
  updatelog "vgremove of ${vg} complete"

  # Remove the PVs
  pvremove --force --yes ${fastdev} || \
    error_exit "$LINENO: Unable to pvremove ${fastdev}"
  updatelog "pvremove of ${fastdev} complete"
  pvremove --force --yes ${slowdev} || \
    error_exit "$LINENO: Unable to pvremove ${slowdev}"
updatelog "pvremove of ${slowdev} complete"

Comment 12 John Call 2019-06-07 15:05:07 UTC
I'm still hitting this issue today with RHV-H ISOs.  My workaround has been a cleanup script in kickstart %pre.  Is there any update on an Anaconda/Blivet resolution for this issue?  I specifically write ~10MB of zeros at the front of the disk because wipefs is not enough in all cases.

%pre
KEEP=$(lsblk --noheadings --nodeps -o NAME,LABEL | awk '/RHVH-4.3 RHVH.x86_64/ {print $1}')
echo "DANGER -- Wiping all disks, except loopbacks, roms, and installaion media..."
for i in $(lsblk --noheadings --nodeps --exclude 7,11 -o NAME); do
  [[ $i != "$KEEP" ]] && echo && echo "wiping $i"
  [[ $i != "$KEEP" ]] && lsblk --nodeps /dev/$i -o +MODEL
  [[ $i != "$KEEP" ]] && wipefs -af /dev/$i
  [[ $i != "$KEEP" ]] && dd if=/dev/zero of=/dev/$i bs=1M count=10
done
%end

Comment 13 David Lehman 2019-06-07 19:44:35 UTC
As I said before, this LVM layout is impossible to manage using the lvm tools available in the installer environment, as can be seen below:

23:24:25,217 INFO program: Running... lvm lvchange -a y vg-ceph-sdah/lv-ceph-sdah --config  devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } 
23:24:25,252 INFO program:   WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
23:24:25,253 INFO program:   Internal error: _emit_target cannot handle segment type cache-pool+METADATA_FORMAT
23:24:25,253 DEBUG program: Return code: 5


That leaves only one option: write a bunch of zeros and hope they hit all the relevant metadata. Maybe the LVM team can advise us on how to handle this.

Comment 15 RHEL Program Management 2021-01-15 07:44:15 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 16 Zdenek Kabelac 2021-01-28 10:15:34 UTC
Regarding comment 13:

"Internal error: _emit_target ...." means the 'lvm2' binary executed there was NOT compiled with support for the newer cache metadata format 2.
So there is possibly a stale binary in that environment; it should be fixed by placing a newer lvm2 binary there.

The other way is to avoid using the newer cache metadata format 2 in the first place, if the system should be kept backward compatible with older systems and kernels without this support.
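
For illustration, a minimal sketch of creating a cached LV pinned to the older cache metadata format 1; the device and VG names are made up, and it assumes an lvm2 build that supports the --cachemetadataformat option:

  # Hypothetical devices: /dev/sdu (slow origin), /dev/nvme0n1p1 (fast cache)
  vgcreate vg-ceph-sdu /dev/sdu /dev/nvme0n1p1
  lvcreate -n lv-ceph-sdu -l 90%PVS vg-ceph-sdu /dev/sdu
  # Create the cache pool with the backward-compatible metadata format 1
  lvcreate --type cache-pool --cachemetadataformat 1 -n cpool -l 90%PVS \
           vg-ceph-sdu /dev/nvme0n1p1
  # Attach the pool to the origin LV
  lvconvert -y --type cache --cachepool vg-ceph-sdu/cpool vg-ceph-sdu/lv-ceph-sdu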