Bug 1506680 - Anaconda Fails to Wipe/Remove Existing Disks with Some Types of LVM Metadata
Summary: Anaconda Fails to Wipe/Remove Existing Disks with Some Types of LVM Metadata
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: python-blivet
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Blivet Maintenance Team
QA Contact: Release Test Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-26 14:23 UTC by Will Foster
Modified: 2021-09-03 14:10 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-15 07:44:15 UTC
Target Upstream Version:
Embargoed:


Attachments
Anaconda Log (9.52 KB, text/plain) - 2017-10-26 14:23 UTC, Will Foster
More anaconda logs (1.98 KB, text/plain) - 2017-10-26 14:24 UTC, Will Foster
Anaconda debug logs (4.21 MB, text/plain) - 2017-10-27 12:50 UTC, Will Foster

Description Will Foster 2017-10-26 14:23:24 UTC
Created attachment 1343782 [details]
Anaconda Log

Description of problem:

While trying to kickstart and install RHEL on disks with certain types of existing LVM metadata, Anaconda fails.

This shows up on hosts previously deployed with GlusterFS or Ceph, which leave certain types of LVM metadata on the disks.

Manually running the following commands fixes this (a subsequent kickstart then completes), but it requires manual intervention:

# Force-remove LVM PV signatures, then wipe any remaining
# filesystem/RAID/LVM signatures from each sd*/nv* device
pvremove /dev/* --force -y
for disk in /dev/sd* /dev/nv*; do wipefs -a -f "$disk"; done

We currently use the following in our kickstarts to wipe the disks beforehand, but this doesn't seem to help:

--snip--
zerombr
clearpart --all --initlabel
--snip--
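
A minimal sketch of how the manual wipe above could instead be run from a kickstart %pre section, before partitioning starts (the log path and device globs are illustrative, not taken from this report):

%pre --log=/tmp/pre-wipe.log
# Illustrative only: force-remove LVM PV signatures, then wipe any
# remaining filesystem/RAID/LVM signatures from each sd*/nv* device
pvremove /dev/* --force --yes
for disk in /dev/sd* /dev/nv*; do
  wipefs -a -f "$disk"
done
%end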


Steps to Reproduce:
1. Deploy Ceph or some combination of Ceph + GlusterFS
2. Try to kickstart the existing machine/disks 

Actual results:

Anaconda fails (see attached logs)


Expected results:

Kickstart as usual.


Additional info:

Comment 2 Will Foster 2017-10-26 14:24:02 UTC
Created attachment 1343783 [details]
More anaconda logs

Comment 3 Will Foster 2017-10-26 14:25:39 UTC
Additional note: attempting just pvremove complains:

--snip--
[anaconda root@c07-h25-6048r ~]# pvremove /dev/* --force --force -y
  WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
  Can't open /dev/loop0 exclusive
--snip--

Comment 5 Jiri Konecny 2017-10-27 08:03:20 UTC
This is a storage-specific bug. Changing the component to our storage library.

Comment 6 Vratislav Podzimek 2017-10-27 08:05:21 UTC
Please attach the /tmp/program.log file from the installation. That should tell us what happened.

Comment 7 Will Foster 2017-10-27 12:50:20 UTC
Created attachment 1344302 [details]
Anaconda debug logs

Attaching anaconda debug logs. I am not sure if I have program.log, but this is quite exhaustive, so hopefully it will provide some more useful info.

I'm still digging up exactly what this particular host was doing prior to attempting kickstart that would have laid down the LVM metadata.

Unfortunately we don't have this machine available for debugging.

Comment 9 John Harrigan 2017-10-27 16:55:35 UTC
Those dm-cache LVs were created by an Ansible playbook.

The actual playbook can be found in the 'Playbooks/SL_2nvmes.yml' file
in this GitHub repo:
https://github.com/jharriga/CephVolume

Comment 10 David Lehman 2017-11-15 20:46:53 UTC
23:25:04,254 INFO program: Running... lvm lvremove vg-ceph-sdu/lv-ceph-sdu --config  devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } 
23:25:04,306 INFO program:   WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
23:25:04,306 INFO program:   Cannot change VG vg-ceph-sdu with unknown segments in it!
23:25:04,307 INFO program:   Cannot process volume group vg-ceph-sdu
23:25:04,307 DEBUG program: Return code: 5


So this is some type of LVM metadata that's not understood by the version of lvm in the installer's runtime environment. My initial thought is that it's on the user to remove it, since there's no decent way to do so from the installer environment.
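
A minimal sketch, assuming one only wants to spot the problem from the installer shell before retrying the kickstart: lvs can report each LV's segment type, so an unrecognised cache-pool segment would be visible before anaconda trips over it (the reporting fields below are standard lvs options):

# Illustrative only: list the segment type of every LV so cache-pool
# (metadata format 2) segments can be identified up front
lvm lvs -a -o vg_name,lv_name,segtype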

Comment 11 John Harrigan 2017-11-15 21:21:00 UTC
These are dm-cache LVs. They can be removed using 'lvremove'. I specify the
LVPATH as a full pathname. The PVs are FASTDEV (/dev/nvme...) and SLOWDEV (/dev/sdX).
The script can be seen at https://github.com/jharriga/CephVolume/blob/master/destroyLVM.sh

Here is the sequence used to destroy them within a bash for loop:

  vg="${cachedVG}$cntr"
  lv="${cachedLV}$cntr"

  # Remove the cached LV
  lvpath="/dev/${vg}/${lv}"
  lvremove --force ${lvpath} || \
    error_exit "$LINENO: Unable to lvremove ${lvpath}"
  updatelog "lvremove of ${lvpath} complete"

  # Remove the VG
  vgremove --force ${vg} || \
    error_exit "$LINENO: Unable to vgremove ${vg}"
  updatelog "vgremove of ${vg} complete"

  # Remove the PVs
  pvremove --force --yes ${fastdev} || \
    error_exit "$LINENO: Unable to pvremove ${fastdev}"
  updatelog "pvremove of ${fastdev} complete"
  pvremove --force --yes ${slowdev} || \
    error_exit "$LINENO: Unable to pvremove ${slowdev}"
updatelog "pvremove of ${slowdev} complete"

Comment 12 John Call 2019-06-07 15:05:07 UTC
I'm still hitting this issue today with RHV-H ISOs.  My workaround has been a cleanup script in kickstart %pre.  Is there any update on an Anaconda/Blivet resolution for this issue?  I specifically write ~10MB of zeros at the front of the disk because wipefs is not enough in all cases.

%pre
KEEP=$(lsblk --noheadings --nodeps -o NAME,LABEL | awk '/RHVH-4.3 RHVH.x86_64/ {print $1}')
echo "DANGER -- Wiping all disks, except loopbacks, roms, and installaion media..."
for i in $(lsblk --noheadings --nodeps --exclude 7,11 -o NAME); do
  [[ $i != "$KEEP" ]] && echo && echo "wiping $i"
  [[ $i != "$KEEP" ]] && lsblk --nodeps /dev/$i -o +MODEL
  [[ $i != "$KEEP" ]] && wipefs -af /dev/$i
  [[ $i != "$KEEP" ]] && dd if=/dev/zero of=/dev/$i bs=1M count=10
done
%end

Comment 13 David Lehman 2019-06-07 19:44:35 UTC
As I said before, this lvm layout is impossible to manage using the lvm tools available to the installer environment, as can be seen below:

23:24:25,217 INFO program: Running... lvm lvchange -a y vg-ceph-sdah/lv-ceph-sdah --config  devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] } 
23:24:25,252 INFO program:   WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
23:24:25,253 INFO program:   Internal error: _emit_target cannot handle segment type cache-pool+METADATA_FORMAT
23:24:25,253 DEBUG program: Return code: 5


The only option that leaves is to write a bunch of zeros and hope that it hits all the relevant metadata. Maybe the LVM team can advise us on how to handle this.
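
A minimal sketch of that zero-writing fallback, assuming one wants to hit metadata at both ends of a device, since LVM can optionally keep a second metadata copy near the end of a PV ($disk is a hypothetical device path):

# Illustrative only: zero the first and last 10 MiB of the disk
dd if=/dev/zero of="$disk" bs=1M count=10
size_mib=$(( $(blockdev --getsz "$disk") / 2048 ))   # --getsz reports 512-byte sectors
dd if=/dev/zero of="$disk" bs=1M count=10 seek=$(( size_mib - 10 ))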

Comment 15 RHEL Program Management 2021-01-15 07:44:15 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 16 Zdenek Kabelac 2021-01-28 10:15:34 UTC
Regarding comment 13:

"Internal error: _emit_target ...." means that the 'lvm2' binary executed there was NOT compiled with support for the newer cache metadata format 2.
So there is possibly a stale binary in that environment; this should be fixed by placing a newer lvm2 binary there.

The other way is to avoid using the newer cache format 2, if the system should be kept backward compatible with older systems and kernels without this support.
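
A minimal sketch of that second option, assuming the cache pool is created explicitly with lvcreate: both lvcreate and lvconvert accept --cachemetadataformat, so forcing format 1 keeps the volume manageable by older lvm2 builds such as the one in the installer image (the VG, LV, and device names below are hypothetical):

# Illustrative only: create the cache pool with the older metadata
# format for compatibility with pre-format-2 lvm2 and kernels
lvcreate --type cache-pool --cachemetadataformat 1 \
         -L 10G -n cachepool0 myvg /dev/nvme0n1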

