Bug 2155253
| Summary: | Can not recover a host since disk layout recreation script fails | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Roman Safronov <rsafrono> |
| Component: | rear | Assignee: | Pavel Cahyna <pcahyna> |
| Status: | ASSIGNED | QA Contact: | CS System Management SST QE <rhel-cs-system-management-subsystem-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 9.1 | CC: | dchinner, dranck, ekuris, esandeen, pcahyna |
| Target Milestone: | rc | Flags: | ekuris: needinfo? (esandeen) |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 2210773 (view as bug list) | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1823324 | Bug Blocks: | 2210773 |
Hello, can you please provide the ReaR layout (the files under /var/lib/rear/layout, especially disklayout.conf)? Can you please also attach the .cfg file found under /var/lib/rear/layout/lvm/ ?

Is the problem reproducible on other systems? If so, can you please attach the full debug log from "rear savelayout" or "rear mkrescue" on a system where the problem occurs? (The debug log is produced by the -D flag, so the complete command is "rear -D savelayout", and the log is then found under /var/log/rear.)

If you still have the system where the backup was produced, can you please provide the output of the following command?

lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size,seg_size

According to my preliminary investigation, it seems that the problem is here:

Volume group "vg" has insufficient free space (16219 extents): 16226 required.

and the following errors are not relevant. The problem may have something to do with lv_thinpool being almost as big as the whole VG, but there should still be some space available.

Before running "rear recover", please also try to manually change the size of lv_thinpool to 68027416576b, i.e. change the line

lvmvol /dev/vg lv_thinpool 68056776704b thin,pool chunksize:65536b

in /var/lib/rear/layout/disklayout.conf to

lvmvol /dev/vg lv_thinpool 68027416576b thin,pool chunksize:65536b

I don't know whether it is related, but I see one more suspicious message at the very beginning: the configuration of one of the logical volumes is not supported by vgcfgrestore, see below:

...
Comparing disks
Device vda has expected (same) size 68719476736 bytes (will be used for 'recover')
Disk configuration looks identical
Proceed with 'recover' (yes) otherwise manual disk layout configuration is enforced
(default 'yes' timeout 30 seconds)
yes
User confirmed to proceed with 'recover'
Layout 'thin,sparse' of LV 'lv_audit' in VG '/dev/vg' not supported by vgcfgrestore   <--- THIS LINE
Start system layout restoration.
Disk '/dev/vda': creating 'gpt' partition table
Disk '/dev/vda': creating partition number 1 with name ''ESP''
Disk '/dev/vda': creating partition number 2 with name ''BSP''
Disk '/dev/vda': creating partition number 3 with name ''boot''
Disk '/dev/vda': creating partition number 4 with name ''root''
Disk '/dev/vda': creating partition number 5 with name ''vda5''
Disk '/dev/vda': creating partition number 6 with name ''growvols''
Creating LVM PV /dev/vda4
Creating LVM PV /dev/vda6
Creating LVM VG 'vg'; Warning: some properties may not be preserved...
Creating LVM volume 'vg/lv_thinpool'; Warning: some properties may not be preserved...
The disk layout recreation script failed. See below some of the requested outputs.

[cloud-admin@controller-0 ~]$ sudo lvm lvs --separator=: --noheadings --units b --nosuffix -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size,seg_size
:lv_audit:vg:1199570944:thin,sparse:lv_thinpool:0:0:0:1199570944
:lv_home:vg:1249902592:thin,sparse:lv_thinpool:0:0:0:1249902592
:lv_log:vg:3250585600:thin,sparse:lv_thinpool:0:0:0:3250585600
:lv_root:vg:11299454976:thin,sparse:lv_thinpool:0:0:0:11299454976
:lv_srv:vg:10049552384:thin,sparse:lv_thinpool:0:0:0:10049552384
:lv_thinpool:vg:68056776704:thin,pool::65536:1:0:68056776704
:lv_tmp:vg:1249902592:thin,sparse:lv_thinpool:0:0:0:1249902592
:lv_var:vg:39774584832:thin,sparse:lv_thinpool:0:0:0:39774584832

Changing the size of lv_thinpool to 68027416576b did not help; the error is similar, only the numbers differ:

+++ lvm lvcreate -y --chunksize 65536b --type thin-pool -L 68027416576b --thinpool lv_thinpool vg
Thin pool volume with chunk size 64.00 KiB can address at most <15.88 TiB of data.
Insufficient free space: 16235 extents needed, but only 16219 available

I think I know what's wrong. The command

lvm lvcreate -y --chunksize 65536b --type thin-pool -L 68056776704b --thinpool lv_thinpool vg

is trying to create a thin pool of exactly the same size as on the original system. The size of the metadata volume is chosen automatically, but on the original system it was only 8MB, and the automatically chosen new size is larger. As there was very little free space in the VG of the old system, the new size chosen for the metadata volume does not leave enough space in the VG for the data volume.

Can you please repeat the recovery process, and when it asks you to confirm or edit the /var/lib/rear/layout/diskrestore.sh script, choose to edit it and change the line

lvm lvcreate -y --chunksize 65536b --type thin-pool -L 68056776704b --thinpool lv_thinpool vg

to

lvm lvcreate -y --chunksize 65536b --poolmetadatasize 8M --type thin-pool -L 68056776704b --thinpool lv_thinpool vg

? (This has to be done before the script fails for the first time, so you will need to use the manual recovery procedure where ReaR asks you for confirmation; currently you seem to be launching it in an unattended way. And by "repeat the recovery process" I mean from the start, i.e. from booting the rescue medium. A sed one-liner for this edit is sketched below.)

I tried what is specified in comment 19; the command passed, but the diskrestore.sh script still fails, see the attached rear-controller-0.log_20221222.

This seems to be an unrelated problem. I tried the problematic command

mkfs.xfs -f -m uuid=23ce7347-fce3-48b4-9854-60a6db155b16 -i size=512 -d agcount=400 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/rhel_kvm--08--guest09-lv_srv

and it dumps core for me as well, so it is easily reproducible. According to the assertion, it does not like "-d agcount=400". Indeed, when I change "-d agcount=400" to "-d agcount=40", the command passes.

Now the question is how agcount=400 got there. Can you please provide the content of /var/lib/rear/layout/xfs/vg-lv_srv.xfs ? I suppose it will also have "agcount=400". Assuming this is the case, the question is how it got there. The file seems to be merely the output of "xfs_info /srv". If that's the case, how could /srv have been created with agcount=400 if mkfs.xfs rejects this value?
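For the diskrestore.sh change suggested above, the same edit can also be applied as a one-liner from a second console of the rescue system while ReaR waits at the confirm/edit prompt. This is only a sketch based on the lvcreate line quoted above; the 8M value corresponds to the size of the original hidden lv_thinpool_tmeta volume:

sed -i 's/--chunksize 65536b --type thin-pool/--chunksize 65536b --poolmetadatasize 8M --type thin-pool/' /var/lib/rear/layout/diskrestore.sh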
Has the VM in question been upgraded from an earlier version of RHEL? I am thinking that maybe the filesystem was created with an older version of mkfs.xfs that allowed this, and the assertion was added to the code later.

This could also explain the thin pool problem, because a similar question arises: how could the thin pool have been created with such a small metadata volume, when the default is a larger one? Maybe it was created when the default was different, and a newer LVM given the same parameters now creates a different layout? Or did you provide the (small) metadata volume size manually when creating the volume for the first time? (A quick lvs check for this is sketched a couple of comments below.)

Contents of /var/lib/rear/layout/xfs/vg-lv_srv.xfs:
meta-data=/dev/mapper/vg-lv_srv isize=512 agcount=400, agsize=6144 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=1 inobtcount=1
data = bsize=4096 blocks=2453504, imaxpct=25
= sunit=16 swidth=16 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=1872, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Regarding the source of agcount=400, I am not sure; I am just using an environment deployed by CI. IIUC, the OpenStack nodes are provisioned using the overcloud-hardened-uefi-full.raw image, which has a pre-defined disk layout.
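Coming back to the earlier question about the thin pool metadata volume: its actual size (and chunk size) on the original system can be read directly with lvs, for example (a sketch; -a also lists the hidden tmeta/tdata sub-volumes):

lvm lvs -a --units m -o lv_name,lv_size,lv_metadata_size,chunk_size vg

The lsblk output in the description below already shows /dev/mapper/vg-lv_thinpool_tmeta at 8M, which is consistent with the --poolmetadatasize 8M workaround suggested above.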
The manual page states that agcount and agsize are mutually exclusive. I tried to use your value of agsize and let the command deduce agcount:
mkfs.xfs -f -m uuid=23ce7347-fce3-48b4-9854-60a6db155b16 -i size=512 -d agsize=6144b -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/rhel_kvm--08--guest09-lv_srv
This passes. The resulting filesystem has these parameters:
meta-data=/dev/mapper/rhel_kvm--08--guest09-lv_srv isize=512 agcount=399, agsize=6144 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=1 inobtcount=1
data = bsize=4096 blocks=2451456, imaxpct=25
= sunit=16 swidth=16 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Note that it is using agcount=399. I noticed that the size of the log section is different. I tried to match it by adding "-l size=1872b":
mkfs.xfs -f -m uuid=23ce7347-fce3-48b4-9854-60a6db155b16 -i size=512 -d agsize=6144b -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -l size=1872b -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/rhel_kvm--08--guest09-lv_srv
The result has:
meta-data=/dev/mapper/rhel_kvm--08--guest09-lv_srv isize=512 agcount=399, agsize=6144 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=1 inobtcount=1
data = bsize=4096 blocks=2451456, imaxpct=25
= sunit=16 swidth=16 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=1872, version=2
= sectsz=512 sunit=16 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
So the result is still a bit different, and agcount is 399. My conclusion is that it is not feasible to match all parameters of the original filesystem 100% in the recreated filesystem. I am not sure why; maybe your image was created using a different version of mkfs.xfs. And for some reason, forcing it to match agcount triggers an assertion, while matching agsize works better.
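A quick arithmetic check against the two xfs_info outputs above (plain shell arithmetic, nothing ReaR-specific):

echo $(( 2453504 / 6144 )) $(( 2453504 % 6144 ))   # original data section: 399 full AGs plus a 2048-block partial AG, hence agcount=400
echo $(( 2451456 / 6144 )) $(( 2451456 % 6144 ))   # recreated data section: exactly 399 AGs, hence agcount=399

So the original filesystem had a partial last allocation group (2453504 is not a multiple of 6144), while the recreated one divides evenly, which is why mkfs.xfs reports agcount=399 when only agsize is given.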
This is in some sense analogous to the LVM problem that we discussed first. The combination of parameters deduced from the original layout does not work 100% when creating the new layout.
Are your images going to be used by customers, or are they produced only for internal use?
BTW, I found an existing bug for handling a very small thin pool metadata volume size: https://bugzilla.redhat.com/show_bug.cgi?id=2149586
Description of problem:

Cannot recover a host from a backup ISO. On an attempt to recover, the message "disk layout recreation script failed" is shown.

From /var/log/rear/rear-controller-0.log:

2022-12-20 13:19:59.132236169 Creating LVM volume 'vg/lv_thinpool'; Warning: some properties may not be preserved...
+++ Print 'Creating LVM volume '\''vg/lv_thinpool'\''; Warning: some properties may not be preserved...'
+++ lvm lvcreate -y --chunksize 65536b --type thin-pool -L 68056776704b --thinpool lv_thinpool vg
Thin pool volume with chunk size 64.00 KiB can address at most <15.88 TiB of data.
Volume group "vg" has insufficient free space (16219 extents): 16226 required.
....
+++ LogPrint 'Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.'
+++ Log 'Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.'
+++ echo '2022-12-20 13:38:29.466548728 Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.'
2022-12-20 13:38:29.466548728 Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.
+++ Print 'Creating filesystem of type xfs with mount point / on /dev/mapper/vg-lv_root.'
+++ wipefs --all --force /dev/mapper/vg-lv_root
+++ mkfs.xfs -f -m uuid=1cf3d69c-7dfe-40ab-b6a7-e6110912489e -i size=512 -d agcount=28 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/vg-lv_root
mkfs.xfs: xfs_mkfs.c:2703: validate_datadev: Assertion `cfg->dblocks' failed.
/var/lib/rear/layout/diskrestore.sh: line 323:  4142 Aborted (core dumped) mkfs.xfs -f -m uuid=1cf3d69c-7dfe-40ab-b6a7-e6110912489e -i size=512 -d agcount=28 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/vg-lv_root 1>&2
+++ mkfs.xfs -f -i size=512 -d agcount=28 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/vg-lv_root
mkfs.xfs: xfs_mkfs.c:2703: validate_datadev: Assertion `cfg->dblocks' failed.
/var/lib/rear/layout/diskrestore.sh: line 323:  4144 Aborted (core dumped) mkfs.xfs -f -i size=512 -d agcount=28 -s size=512 -i attr=2 -i projid32bit=1 -m crc=1 -m finobt=1 -b size=4096 -i maxpct=25 -d sunit=128 -d swidth=128 -l version=2 -l sunit=128 -l lazy-count=1 -n size=4096 -n version=2 -r extsize=4096 /dev/mapper/vg-lv_root 1>&2

Version-Release number of selected component (if applicable):

Relax-and-Recover 2.6 / 2020-06-17
Red Hat Enterprise Linux release 9.1 (Plow)

The host is a KVM virtual machine with UEFI; the <os> section of its domain XML:

<os>
  <type arch='x86_64' machine='pc-q35-rhel7.6.0'>hvm</type>
  <loader readonly='yes' secure='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
  <nvram>/var/lib/libvirt/qemu/nvram/controller-0_VARS.fd</nvram>
  <boot dev='hd'/>
</os>

How reproducible:

100%

Steps to Reproduce:

1. Back up a host
2. Try to recover the host from the backup

Actual results:

Recovery fails, complaining that the disk layout recreation script failed.

Expected results:

Recovery completes successfully.

Additional info:

local.conf:

export TMPDIR="${TMPDIR-/var/tmp}"
ISO_DEFAULT="automatic"
OUTPUT=ISO
BACKUP=NETFS
BACKUP_PROG_COMPRESS_OPTIONS=( --gzip)
BACKUP_PROG_COMPRESS_SUFFIX=".gz"
OUTPUT_URL=nfs://192.168.24.1/ctl_plane_backups
ISO_PREFIX=$HOSTNAME-202212201022
BACKUP_URL=nfs://192.168.24.1/ctl_plane_backups
BACKUP_PROG_CRYPT_ENABLED=False
BACKUP_PROG_OPTIONS+=( --anchored --xattrs-include='*.*' --xattrs )
BACKUP_PROG_EXCLUDE=( '/data/*' '/tmp/*' '/ctl_plane_backups/*' )
EXCLUDE_RECREATE+=( "/dev/cinder-volumes" )
USING_UEFI_BOOTLOADER=1
LOGFILE="$LOG_DIR/rear-$HOSTNAME-202212201022.log"

[cloud-admin@controller-0 ~]$ lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT
NAME KNAME PKNAME TRAN TYPE FSTYPE LABEL SIZE MOUNTPOINT
/dev/loop0 /dev/loop0 loop LVM2_member 20.1G
/dev/vda /dev/vda disk 64G
|-/dev/vda1 /dev/vda1 /dev/vda part vfat MKFS_ESP 16M /boot/efi
|-/dev/vda2 /dev/vda2 /dev/vda part 8M
|-/dev/vda3 /dev/vda3 /dev/vda part ext4 mkfs_boot 500M /boot
|-/dev/vda4 /dev/vda4 /dev/vda part LVM2_member 5G
| |-/dev/mapper/vg-lv_thinpool_tmeta /dev/dm-0 /dev/vda4 lvm 8M
| | `-/dev/mapper/vg-lv_thinpool-tpool /dev/dm-2 /dev/dm-0 lvm 63.4G
| |   |-/dev/mapper/vg-lv_thinpool /dev/dm-3 /dev/dm-2 lvm 63.4G
| |   |-/dev/mapper/vg-lv_root /dev/dm-4 /dev/dm-2 lvm xfs img-rootfs 10.5G /
| |   |-/dev/mapper/vg-lv_tmp /dev/dm-5 /dev/dm-2 lvm xfs fs_tmp 1.2G /tmp
| |   |-/dev/mapper/vg-lv_var /dev/dm-6 /dev/dm-2 lvm xfs fs_var 37G /var
| |   |-/dev/mapper/vg-lv_log /dev/dm-7 /dev/dm-2 lvm xfs fs_log 3G /var/log
| |   |-/dev/mapper/vg-lv_audit /dev/dm-8 /dev/dm-2 lvm xfs fs_audit 1.1G /var/log/audit
| |   |-/dev/mapper/vg-lv_home /dev/dm-9 /dev/dm-2 lvm xfs fs_home 1.2G /home
| |   `-/dev/mapper/vg-lv_srv /dev/dm-10 /dev/dm-2 lvm xfs fs_srv 9.4G /srv
| `-/dev/mapper/vg-lv_thinpool_tdata /dev/dm-1 /dev/vda4 lvm 63.4G
|   `-/dev/mapper/vg-lv_thinpool-tpool /dev/dm-2 /dev/dm-1 lvm 63.4G
|     |-/dev/mapper/vg-lv_thinpool /dev/dm-3 /dev/dm-2 lvm 63.4G
|     |-/dev/mapper/vg-lv_root /dev/dm-4 /dev/dm-2 lvm xfs img-rootfs 10.5G /
|     |-/dev/mapper/vg-lv_tmp /dev/dm-5 /dev/dm-2 lvm xfs fs_tmp 1.2G /tmp
|     |-/dev/mapper/vg-lv_var /dev/dm-6 /dev/dm-2 lvm xfs fs_var 37G /var
|     |-/dev/mapper/vg-lv_log /dev/dm-7 /dev/dm-2 lvm xfs fs_log 3G /var/log
|     |-/dev/mapper/vg-lv_audit /dev/dm-8 /dev/dm-2 lvm xfs fs_audit 1.1G /var/log/audit
|     |-/dev/mapper/vg-lv_home /dev/dm-9 /dev/dm-2 lvm xfs fs_home 1.2G /home
|     `-/dev/mapper/vg-lv_srv /dev/dm-10 /dev/dm-2 lvm xfs fs_srv 9.4G /srv
|-/dev/vda5 /dev/vda5 /dev/vda part iso9660 config-2 65M
`-/dev/vda6 /dev/vda6 /dev/vda part LVM2_member 58.5G
  `-/dev/mapper/vg-lv_thinpool_tdata /dev/dm-1 /dev/vda6 lvm 63.4G
    `-/dev/mapper/vg-lv_thinpool-tpool /dev/dm-2 /dev/dm-1 lvm 63.4G
      |-/dev/mapper/vg-lv_thinpool /dev/dm-3 /dev/dm-2 lvm 63.4G
      |-/dev/mapper/vg-lv_root /dev/dm-4 /dev/dm-2 lvm xfs img-rootfs 10.5G /
      |-/dev/mapper/vg-lv_tmp /dev/dm-5 /dev/dm-2 lvm xfs fs_tmp 1.2G /tmp
      |-/dev/mapper/vg-lv_var /dev/dm-6 /dev/dm-2 lvm xfs fs_var 37G /var
      |-/dev/mapper/vg-lv_log /dev/dm-7 /dev/dm-2 lvm xfs fs_log 3G /var/log
      |-/dev/mapper/vg-lv_audit /dev/dm-8 /dev/dm-2 lvm xfs fs_audit 1.1G /var/log/audit
      |-/dev/mapper/vg-lv_home /dev/dm-9 /dev/dm-2 lvm xfs fs_home 1.2G /home
      `-/dev/mapper/vg-lv_srv /dev/dm-10 /dev/dm-2 lvm xfs fs_srv 9.4G /srv

The issue was found during OpenStack control plane node backup and recovery; link to the procedure:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.0/html/backing_up_and_restoring_the_undercloud_and_control_plane_nodes/assembly_backing-up-the-control-plane-nodes_br-undercloud-ctlplane#proc_creating-a-backup-of-the-control-plane-nodes_backup-ctlplane

mkdir /tmp/backup-recover-temp/
cp ./overcloud-deploy/overcloud/config-download/overcloud/tripleo-ansible-inventory.yaml /tmp/backup-recover-temp/tripleo-inventory.yaml
source /home/stack/stackrc
openstack overcloud backup --inventory /tmp/backup-recover-temp/tripleo-inventory.yaml --setup-nfs --extra-vars '{"tripleo_backup_and_restore_server": 192.168.24.1,"nfs_server_group_name": Undercloud}'
openstack overcloud backup --inventory /tmp/backup-recover-temp/tripleo-inventory.yaml --setup-rear --extra-vars '{"tripleo_backup_and_restore_server": 192.168.24.1}'
openstack overcloud backup --inventory /tmp/backup-recover-temp/tripleo-inventory.yaml
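For reference, underneath the tripleo wrapper above the reproduction boils down to the standard ReaR workflow (a hedged sketch; the overcloud tooling drives these calls using the local.conf shown above):

# on the host to be protected (uses /etc/rear/local.conf)
rear -d -D mkbackup
# boot the generated rescue ISO on the host to be restored, then from the rescue shell:
rear -d -D recover

The "unattended way" mentioned in the comments presumably refers to an automatic-recovery boot entry of the rescue ISO; applying the diskrestore.sh edit requires the interactive "rear recover" path above.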