Bug 1506680
| Summary: | Anaconda Fails to Wipe/Remove Existing Disks with Some Types of LVM Metadata | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Will Foster <wfoster> |
| Component: | python-blivet | Assignee: | Blivet Maintenance Team <blivet-maint-list> |
| Status: | CLOSED WONTFIX | QA Contact: | Release Test Team <release-test-team-automation> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | CC: | bengland, bmarson, jcall, jharriga, jkonecny, wfoster, zkabelac |
| Version: | 7.4 | Target Milestone: | rc |
| Target Release: | --- | Hardware: | x86_64 |
| OS: | Linux | Whiteboard: | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-01-15 07:44:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | Attachments: | |
Created attachment 1343783 [details]
More anaconda logs
Additional note: attempts to just run pvremove also complain:

--snip--
[anaconda root@c07-h25-6048r ~]# pvremove /dev/* --force --force -y
WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
Can't open /dev/loop0 exclusive
--snip--

This is a storage-specific bug; changing the component to our storage library.

Please attach the /tmp/program.log file from the installation. That should tell us what happened.

Created attachment 1344302 [details]
Anaconda debug logs
Attaching anaconda debug logs. I am not sure whether I have program.log, but this set is quite exhaustive, so hopefully it will provide some more useful info.
I'm still digging up exactly what this particular host was doing prior to the kickstart attempt that would have laid down the LVM metadata.
Unfortunately we don't have this machine available for debugging.
Those dm-cache LVs were created by an Ansible playbook. The actual playbook can be found in the 'Playbooks/SL_2nvmes.yml' file in this GitHub repo: https://github.com/jharriga/CephVolume

23:25:04,254 INFO program: Running... lvm lvremove vg-ceph-sdu/lv-ceph-sdu --config devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }
23:25:04,306 INFO program: WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
23:25:04,306 INFO program: Cannot change VG vg-ceph-sdu with unknown segments in it!
23:25:04,307 INFO program: Cannot process volume group vg-ceph-sdu
23:25:04,307 DEBUG program: Return code: 5
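For reference, a minimal sketch (not from the original report) of how a dm-cache layout like the one in these logs might be created. Device names, VG/LV names, and sizes are illustrative, and the --cachemetadataformat 2 option (available in newer lvm2 releases) is what produces the cache-pool+METADATA_FORMAT segment type seen above:

# Illustrative devices: an NVMe cache device and a slower data device.
FASTDEV=/dev/nvme0n1
SLOWDEV=/dev/sdu

pvcreate "$FASTDEV" "$SLOWDEV"
vgcreate vg-ceph-sdu "$SLOWDEV" "$FASTDEV"

# Data LV on the slow PV, cache pool on the fast PV
lvcreate -n lv-ceph-sdu -l 100%PVS vg-ceph-sdu "$SLOWDEV"
lvcreate --type cache-pool -n cpool -l 90%PVS vg-ceph-sdu "$FASTDEV"

# Requesting cache metadata format 2 writes the segment type that the
# installer's older lvm2 later fails to parse.
lvconvert --type cache --cachepool vg-ceph-sdu/cpool \
          --cachemetadataformat 2 vg-ceph-sdu/lv-ceph-sdu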
So this is some type of lvm metadata that's not understood by the version of lvm in the installer's runtime environment. My initial thought is that it's on the user to remove since there's no decent way to remove it from the installer environment.
These are dm-cache LVs. They can be removed using 'lvremove'; I specify the LVPATH as a full pathname. The PVs are FASTDEV (/dev/nvme...) and SLOWDEV (/dev/sdX). The script can be seen at https://github.com/jharriga/CephVolume/blob/master/destroyLVM.sh

Here is the sequence used to destroy them within a bash for loop:

vg="${cachedVG}$cntr"
lv="${cachedLV}$cntr"
# Remove the cached LV
lvpath="/dev/${vg}/${lv}"
lvremove --force ${lvpath} || \
    error_exit "$LINENO: Unable to lvremove ${lvpath}"
updatelog "lvremove of ${lvpath} complete"
# Remove the VG
vgremove --force ${vg} || \
    error_exit "$LINENO: Unable to vgremove ${vg}"
updatelog "vgremove of ${vg} complete"
# Remove the PVs
pvremove --force --yes ${fastdev} || \
    error_exit "$LINENO: Unable to pvremove ${fastdev}"
updatelog "pvremove of ${fastdev} complete"
pvremove --force --yes ${slowdev} || \
    error_exit "$LINENO: Unable to pvremove ${slowdev}"
updatelog "pvremove of ${slowdev} complete"

I'm still hitting this issue today with RHV-H ISOs. My workaround has been a cleanup script in kickstart %pre. Is there any update on an Anaconda/Blivet resolution for this issue? I specifically write ~10MB of zeros at the front of the disk because wipefs is not enough in all cases.
%pre
KEEP=$(lsblk --noheadings --nodeps -o NAME,LABEL | awk '/RHVH-4.3 RHVH.x86_64/ {print $1}')
echo "DANGER -- Wiping all disks, except loopbacks, roms, and installaion media..."
for i in $(lsblk --noheadings --nodeps --exclude 7,11 -o NAME); do
[[ $i != "$KEEP" ]] && echo && echo "wiping $i"
[[ $i != "$KEEP" ]] && lsblk --nodeps /dev/$i -o +MODEL
[[ $i != "$KEEP" ]] && wipefs -af /dev/$i
[[ $i != "$KEEP" ]] && dd if=/dev/zero of=/dev/$i bs=1M count=10
done
%end
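A possible refinement of the %pre workaround above, sketched under the assumption that the lvm and dmsetup tools are available in the installer environment: tear down the LVM stack first, so the zeroing only has to cover whatever lvm2 itself cannot handle. This is untested here and may still fail on segment types the installer's lvm2 does not understand, which is the core problem in this bug.

%pre
# Best-effort LVM teardown before the generic wipe (sketch, not a verified fix)
vgchange -an                            # deactivate all volume groups
for vg in $(vgs --noheadings -o vg_name); do
    vgremove --force --yes "$vg"        # may fail on unknown segment types
done
for pv in $(pvs --noheadings -o pv_name); do
    pvremove --force --force --yes "$pv"
done
dmsetup remove_all                      # drop any leftover device-mapper tables
%end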
As I said before, this LVM layout is impossible to manage using the lvm tools available in the installer environment, as can be seen below:
23:24:25,217 INFO program: Running... lvm lvchange -a y vg-ceph-sdah/lv-ceph-sdah --config devices { preferred_names=["^/dev/mapper/", "^/dev/md/", "^/dev/sd"] }
23:24:25,252 INFO program: WARNING: Unrecognised segment type cache-pool+METADATA_FORMAT
23:24:25,253 INFO program: Internal error: _emit_target cannot handle segment type cache-pool+METADATA_FORMAT
23:24:25,253 DEBUG program: Return code: 5
The only option that leaves is to write a bunch of zeros and hope that it hits all the relevant metadata. Maybe the LVM team can advise us on how to handle this.
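If zeroing really is the only recourse, one refinement (an assumption, not something verified in this report) is to wipe both ends of each disk, since backup GPT headers and some metadata live near the end rather than the front:

dev=/dev/sdX                                  # hypothetical target disk
sectors=$(blockdev --getsz "$dev")            # size in 512-byte sectors
# Zero the first 10 MiB...
dd if=/dev/zero of="$dev" bs=1M count=10
# ...and the last 10 MiB (20480 x 512-byte sectors)
dd if=/dev/zero of="$dev" bs=512 seek=$((sectors - 20480)) count=20480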
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

As for comment 13, "Internal error: _emit_target ..." means the 'lvm2' command executed there was NOT compiled with the newer cache format 2 support. So there is possibly a stale binary present in that environment; this should be fixed by placing a newer lvm2 binary there. The other way is to avoid using the newer cache format 2, if the system should be kept backward compatible with older systems and kernels without this support.
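To illustrate the "avoid format 2" suggestion, the cache could be attached with the older metadata format explicitly. The VG/LV names are taken from the logs above; the exact creation command is an assumption about how the playbook builds its cache LVs:

# Pin the cache metadata format to 1 so the older lvm2 in the installer
# can still activate and remove the LV.
lvconvert --type cache --cachepool vg-ceph-sdu/cpool \
          --cachemetadataformat 1 vg-ceph-sdu/lv-ceph-sdu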
Created attachment 1343782 [details]
Anaconda Log

Description of problem:
While trying to kickstart and install RHEL on disks with certain types of existing LVM metadata, Anaconda will fail. This shows up on previous deployments that might use GlusterFS or Ceph, which construct certain types of LVM metadata on disk.

Manually running the following will fix this, but it requires manual intervention; the subsequent kickstart will then complete.

pvremove /dev/* --force -y
for disk in $(ls /dev/{sd*,nv*}); do wipefs -a -f $disk; done

We currently use the following in our kickstarts to wipe disks beforehand, but it doesn't seem to help:

--snip--
zerombr
clearpart --all --initlabel
--snip--

Steps to Reproduce:
1. Deploy Ceph or some combination of Ceph + GlusterFS
2. Try to kickstart the existing machine/disks

Actual results:
Anaconda fails (see attached logs)

Expected results:
Kickstart as usual.

Additional info: