Bug 1450667
Summary: ReaR recovery fails when the OS contains a Thin Pool/Volume
Product: Red Hat Enterprise Linux 7
Component: rear
Version: 7.3
Hardware: Unspecified
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Jesús Serrano Sánchez-Toscano <jserrano>
Assignee: Pavel Cahyna <pcahyna>
QA Contact: David Jež <djez>
CC: agk, aglotov, djez, gratien.dhaese, jserrano, ofamera, ovasik, pcahyna, rmetrich, tcerna, yiwu
Target Milestone: rc
Target Release: ---
Fixed In Version: rear-2.4-1.el7
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2018-10-30 11:43:19 UTC
Description
Jesús Serrano Sánchez-Toscano
2017-05-14 13:40:22 UTC
Created attachment 1278655 [details]
TEST3-ERROR2_rear-fastvm-r7-3-60.log
Log of the failed ReaR recovery process, caused by the missing binary /usr/sbin/thin_check
Comment on attachment 1278654 [details]
TEST3-ERROR1_rear-fastvm-r7-3-60.log
Log of the failed ReaR recovery, caused by the missing --force option in the vgcfgrestore command
Additional information from our customer, who attempted to fix the problem by providing the missing binaries and symlinks to the system. I have verified that the recovery process then finishes successfully.

## Changes done in the generated ReaR ISO:

Binaries copied from the source machine:
- /usr/sbin/pdata_tools
- /lib64/libaio.so.1.0.1
- /lib64/libstdc++.so.6.0.19

Symlinks created in the running ReaR ISO image:
/usr/sbin/thin_check -> pdata_tools
/usr/sbin/thin_delta -> pdata_tools
/usr/sbin/thin_dump -> pdata_tools
/usr/sbin/thin_ls -> pdata_tools
/usr/sbin/thin_metadata_size -> pdata_tools
/usr/sbin/thin_repair -> pdata_tools
/usr/sbin/thin_restore -> pdata_tools
/usr/sbin/thin_rmap -> pdata_tools
/usr/sbin/thin_trim -> pdata_tools
/lib64/libaio.so.1 -> /lib64/libaio.so.1.0.1
/lib64/libstdc++.so.6 -> libstdc++.so.6.0.19

Added '--force' to the 'vgcfgrestore' command in the diskrestore.sh file.

I have set up another reproducer with a newer version of ReaR. Although the results are different this time (I got a system that was able to boot), the LVM Thin Pool/Volumes were not restored at all.

Details of the new reproducer in my lab using the latest version of ReaR:

Version installed: rear-2.00-2.el7.x86_64
Hypervisor: ofamera-devel.usersys.redhat.com
Original machine: fvm-rhel-7-3-34 <-- Executed 'rear mkbackup'
Recovery machine: fvm-rhel-7-3-38 <-- Executed 'rear recover'
NFS backup server: fvm-rhel-7-3-44
User: root
Pass: testtest

*******************
*** BACKUP TEST ***
*******************

RESULT: Backup taken successfully. ISO + backup.tar.gz were sent to the NFS server.

-> Original system (fvm-rhel-7-3-34):

[root@fvm-rhel-7-3-34 ~]# rear -d -v mkbackup
Relax-and-Recover 2.00 / Git
Using log file: /var/log/rear/rear-fvm-rhel-7-3-34.log
Using backup archive 'backup.tar.gz'
Creating disk layout
Creating root filesystem layout
Copying logfile /var/log/rear/rear-fvm-rhel-7-3-34.log into initramfs as '/tmp/rear-fvm-rhel-7-3-34-partial-2017-08-21T10:14:10+0200.log'
Copying files and directories
Copying binaries and libraries
Copying kernel modules
Creating initramfs
Making ISO image
Wrote ISO image: /var/lib/rear/output/rear-fvm-rhel-7-3-34.iso (134M)
Copying resulting files to nfs location
Saving /var/log/rear/rear-fvm-rhel-7-3-34.log as rear-fvm-rhel-7-3-34.log to nfs location
Creating tar archive '/tmp/rear.WaTt8XedE2CpdJ3/outputfs/fvm-rhel-7-3-34/backup.tar.gz'
Archived 850 MiB [avg 3349 KiB/sec] OK
Archived 850 MiB in 261 seconds [avg 3336 KiB/sec]
You should also rm -Rf /tmp/rear.WaTt8XedE2CpdJ3

[root@fvm-rhel-7-3-34 ~]# lvs
  LV          VG      Attr       LSize   Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root_lv     r7vg    -wi-ao----   4.88g
  swap_lv     r7vg    -wi-ao---- 256.00m
  lv_thin     vg_thin Vwi-a-tz--   1.00g lv_thinpool        0.00
  lv_thinpool vg_thin twi-aotz--  92.00m                    0.00   0.98

-> NFS backup server (fvm-rhel-7-3-44):

[root@fvm-rhel-7-3-44 ~]# ls -l /media/backups/fvm-rhel-7-3-34/
total 1010332
-rw-------. 1 nfsnobody nfsnobody   2252128 Aug 21 10:20 backup.log
-rw-------. 1 nfsnobody nfsnobody 891757428 Aug 21 10:20 backup.tar.gz
-rw-------. 1 nfsnobody nfsnobody       202 Aug 21 10:16 README
-rw-------. 1 nfsnobody nfsnobody 140369920 Aug 21 10:15 rear-fvm-rhel-7-3-34.iso
-rw-------. 1 nfsnobody nfsnobody    183744 Aug 21 10:16 rear-fvm-rhel-7-3-34.log
-rw-------. 1 nfsnobody nfsnobody         0 Aug 21 10:20 selinux.autorelabel
-rw-------. 1 nfsnobody nfsnobody       273 Aug 21 10:16 VERSION
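As a side note, before running 'rear mkbackup' one can confirm which LVs on the source system are thin-provisioned with a plain lvs report. This is only a sketch and not part of ReaR itself; lv_layout and pool_lv are standard lvs report fields, the same ones the eventual fix collects:

  # List every LV with its layout and hosting pool; thin pools show
  # "thin,pool" in the lv_layout column, and thin volumes carry the
  # pool name in the pool_lv column.
  lvs --noheadings -o lv_name,vg_name,lv_layout,pool_lv

  # Limit the output to thin-related LVs only.
  lvs --noheadings -o lv_name,vg_name,lv_layout,pool_lv | grep thin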
*********************
*** RECOVERY TEST ***
*********************

RESULT: It recovered the system (it was able to boot afterwards), but only the OS itself (root VG/LVs), not the LVM Thin Pool/Volumes on the second disk.

-> Original system (fvm-rhel-7-3-34):

[jserrano@ofamera-devel ~]$ fast-vm ssh 34
[inf] checking the 192.168.33.34 for active SSH connection (ctrl+c to interrupt)
[inf] SSH ready
Warning: Permanently added '192.168.33.34' (ECDSA) to the list of known hosts.
System is booting up. See pam_nologin(8)
Last login: Mon Aug 21 12:35:16 2017 from gateway

[root@fvm-rhel-7-3-34 ~]# lvs
  LV          VG      Attr       LSize   Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root_lv     r7vg    -wi-ao----   4.88g
  swap_lv     r7vg    -wi-ao---- 256.00m
  lv_thin     vg_thin Vwi-a-tz--   1.00g lv_thinpool        0.00
  lv_thinpool vg_thin twi-aotz--  92.00m                    0.00   0.98

[root@fvm-rhel-7-3-34 ~]# lsblk
NAME                          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                             8:0    0     6G  0 disk
├─sda1                          8:1    0   500M  0 part /boot
└─sda2                          8:2    0   5.5G  0 part
  ├─r7vg-root_lv              253:0    0   4.9G  0 lvm  /
  └─r7vg-swap_lv              253:1    0   256M  0 lvm  [SWAP]
sdb                             8:16   0 102.4M  0 disk
├─vg_thin-lv_thinpool_tmeta   253:2    0     4M  0 lvm
│ └─vg_thin-lv_thinpool-tpool 253:4    0    92M  0 lvm
│   ├─vg_thin-lv_thinpool     253:5    0    92M  0 lvm
│   └─vg_thin-lv_thin         253:6    0     1G  0 lvm
└─vg_thin-lv_thinpool_tdata   253:3    0    92M  0 lvm
  └─vg_thin-lv_thinpool-tpool 253:4    0    92M  0 lvm
    ├─vg_thin-lv_thinpool     253:5    0    92M  0 lvm
    └─vg_thin-lv_thin         253:6    0     1G  0 lvm
sr0                            11:0    1  1024M  0 rom

-> Recovery machine (fvm-rhel-7-3-38), after 'rear recover':

[jserrano@ofamera-devel ~]$ fast-vm ssh 38
[inf] checking the 192.168.33.38 for active SSH connection (ctrl+c to interrupt)
[inf] SSH ready
Warning: Permanently added '192.168.33.38' (ECDSA) to the list of known hosts.
Last login: Mon Aug 21 12:26:46 2017

[root@fvm-rhel-7-3-34 ~]# lvs
  LV      VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root_lv r7vg -wi-ao----   4.88g
  swap_lv r7vg -wi-ao---- 256.00m

[root@fvm-rhel-7-3-34 ~]# lsblk
NAME             MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                8:0    0     6G  0 disk
├─sda1             8:1    0   500M  0 part /boot
└─sda2             8:2    0   5.5G  0 part
  ├─r7vg-root_lv 253:0    0   4.9G  0 lvm  /
  └─r7vg-swap_lv 253:1    0   256M  0 lvm  [SWAP]
sdb                8:16   0 102.4M  0 disk

Please let me know if I have missed anything in my test.

*** Bug 1500632 has been marked as a duplicate of this bug. ***

When an LVM Thin Pool is part of the rootvg, no recovery can ever succeed. Tested with rear-2.00-2.el7.x86_64.

Initial error:

  lvm vgcfgrestore -f /var/lib/rear/layout/lvm/rhel.cfg vgroot
  Consider using option --force to restore Volume Group rhel with thin volumes.
  Restore failed.

Trying to use the "--force" option by modifying /var/lib/rear/layout/diskrestore.sh doesn't help. It then fails checking the Thin volume, because /usr/sbin/thin_check is not part of the ReaR image by default.

Finally, after adding /usr/sbin/thin_check to the ReaR image (using the REQUIRED_PROGS variable), it still fails when trying to make the VG available:

  lvm vgchange --available y rhel
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  Monitoring rhel/pool00 failed.
  device-mapper: reload ioctl on (252:5) failed: No data available
  2 logical volume(s) in volume group "rhel" now active

I then stopped investigating at this step.

Steps to Reproduce:
1. Install a VM, selecting "LVM Thin Provisioning" instead of "LVM" for the root LV
2. Create the ReaR rescue image
3. Try restoring the disk layout
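As an alternative to reinstalling a VM, the non-root thin layout from the lab reproducer above (vg_thin/lv_thinpool/lv_thin on the second disk) can be recreated with plain LVM commands. This is only a sketch; /dev/sdb and the 92 MiB / 1 GiB sizes simply mirror the reporter's setup:

  # Turn the spare second disk into a PV and a dedicated VG.
  pvcreate /dev/sdb
  vgcreate vg_thin /dev/sdb

  # Create a 92 MiB thin pool, then a 1 GiB thin volume carved out of it
  # (the thin volume is deliberately over-provisioned).
  lvcreate -L 92M -T vg_thin/lv_thinpool
  lvcreate -V 1G -T vg_thin/lv_thinpool -n lv_thin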
@Renaud Métrich: Please note that the recovery also fails when the LVM Thin Pool is *not* part of the rootvg. Refer to the "RECOVERY TEST" section of my latest test in https://bugzilla.redhat.com/show_bug.cgi?id=1450667#c7

[root@fvm-rhel-7-3-34 ~]# lsblk
NAME                          MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                             8:0    0     6G  0 disk
├─sda1                          8:1    0   500M  0 part /boot
└─sda2                          8:2    0   5.5G  0 part
  ├─r7vg-root_lv              253:0    0   4.9G  0 lvm  /
  └─r7vg-swap_lv              253:1    0   256M  0 lvm  [SWAP]
sdb                             8:16   0 102.4M  0 disk
├─vg_thin-lv_thinpool_tmeta   253:2    0     4M  0 lvm
│ └─vg_thin-lv_thinpool-tpool 253:4    0    92M  0 lvm
│   ├─vg_thin-lv_thinpool     253:5    0    92M  0 lvm
│   └─vg_thin-lv_thin         253:6    0     1G  0 lvm
└─vg_thin-lv_thinpool_tdata   253:3    0    92M  0 lvm
  └─vg_thin-lv_thinpool-tpool 253:4    0    92M  0 lvm
    ├─vg_thin-lv_thinpool     253:5    0    92M  0 lvm
    └─vg_thin-lv_thin         253:6    0     1G  0 lvm
sr0                            11:0    1  1024M  0 rom

The best approach would be to write a prep script which identifies that thin LVM is in use and copies the required binaries to the rescue image. It is important that we have the required binaries within the rescue image to start with. Once that has been done, we can identify whether more steps are required.

The following needs to be added to /etc/rear/local.conf:

  REQUIRED_PROGS=( "${REQUIRED_PROGS[@]}" thin_dump thin_restore thin_check thin_repair )
  LIBS=( "${LIBS[@]}" /usr/lib64/*lvm2* )

Also, "vgcfgrestore" has to be replaced by "vgcfgrestore --force" in /usr/share/rear/layout/prepare/GNU/Linux/110_include_lvm_code.sh:

  lvm vgcfgrestore --force -f "$VAR_DIR/layout/lvm/${vgrp#/dev/}.cfg" ${vgrp#/dev/} >&2

But this is not sufficient. The issue now is with device-mapper. Starting dmeventd in debug mode (/usr/sbin/dmeventd -f -l -ddd), we can see the following when trying to activate the VG:

VG activation:

  # lvm vgchange --available y rhel
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
  device-mapper: reload ioctl on (252:5) failed: No data available
  2 logical volume(s) in volume group "rhel" now active

dmeventd trace:

  [ 0:10] b08b5700:lvm Locking memory
  [ 0:10] b08b5700:lvm lvm plugin initilized.
  [ 0:10] b08b5700:dm dmeventd/thin_command not found in config: defaulting to lvm lvextend --use-policies
  [ 0:10] b08b5700:thin Monitoring thin pool rhel-pool00-tpool.
  [ 0:10] b08b5700:dm dm waitevent LVM-hrMgDvLvLxUqxfsCrVZUGQKRKRrKy2tgNJoCljwt0WOPHbEjh06B1FQyxaafHrMu-tpool [ opencount flush ] [16384] (*1)
  [ 0:20] b08b5700:dm device-mapper: waitevent ioctl on LVM-hrMgDvLvLxUqxfsCrVZUGQKRKRrKy2tgNJoCljwt0WOPHbEjh06B1FQyxaafHrMu-tpool failed: Interrupted system call
  [ 0:20] b08b5700:dm dm status LVM-hrMgDvLvLxUqxfsCrVZUGQKRKRrKy2tgNJoCljwt0WOPHbEjh06B1FQyxaafHrMu-tpool [ opencount noflush ] [16384] (*1)
  ...

A few comments from the lvm2 side:

There is nothing to back up on a thin pool, and there is no way to restore a thin pool. A thin pool consists of a set of data chunks (in the _tdata LV) and their mapping (in the _tmeta LV). Trying to back these up on a running, live thin pool makes no sense at all. You could only back up individual active thin LVs.

----

There is no support on the lvm2 side for 'relocation' of a thin pool to another machine, and for this task it would actually be needed. At the moment, the easiest way to copy a thin pool to another machine is to have the thin pool inactive, activate the individual _tdata and _tmeta LVs (this is only possible on git HEAD of lvm2), and copy these volumes. You would then have to extract the thin-pool lvm2 metadata to restore all the settings for the thin pool and thin LVs in a different VG.

---

Please do NOT use 'vgcfgrestore --force' in ANY automated tool. Whenever the --force option is used, it CAN and MAY destroy data, and several kittens may die as well... The --force option is there for those who know EXACTLY what they are doing and can accept the risk of losing data.

---

For live online thin-pool migration/copying we would need to implement support for 'remote replication' using a tool like 'thin_delta'.

Thanks for the insights. In a nutshell, we must then implement the following:

1. During backup
- Collect the usual PV, VG and LV data (size, etc.)
- [NEW] Add "thin" attributes for LVs and Pools as well ("lvmvol" lines), e.g.

  Current layout:
    lvmvol /dev/rhel pool00 3068 25133056
    lvmvol /dev/rhel root 2556 20938752
    lvmvol /dev/rhel swap 512 4194304
  Missing knowledge:
    - "pool00" is a thin pool
    - "root" is hosted on "pool00"
    - "swap" is hosted on "pool00"

--> To be done in /usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh

2. During restore
- [NEW] Check whether vgcfgrestore can be used or not (depending on thin pool existence? or just let it fail)
- [NEW] If there is a thin pool, do not use vgcfgrestore but the "legacy" tools (vgcreate / lvcreate)

--> To be done in /usr/share/rear/layout/prepare/GNU/Linux/110_include_lvm_code.sh

GitHub Pull Request: https://github.com/rear/rear/pull/1806

The proposed code does the following (an illustrative lvs sketch follows at the end of this report):

1. During backup
- Collect additional LV properties:
    origin: originating LV (for cache and snapshot volumes)
    lv_layout: type of LV (mirror, raid, thin, etc.)
    pool_lv: thin pool hosting the LV
    chunk_size: size of the chunk, for various volume types
    stripes: number of stripes, for mirrored and striped volumes
    stripe_size: size of a stripe, for RAID volumes
- Skip caches and snapshots

2. During restore
- If in Migration mode (e.g. different disks but the same size), go through the vgcreate/lvcreate code (Legacy Method), printing warnings because the initial layout may not be preserved (we do not save all the attributes needed for re-creating LVM volumes)
- Otherwise, try "vgcfgrestore"
- If it fails:
  - Try "vgcfgrestore --force"
  - If that fails too, use vgcreate/lvcreate (Legacy Method)
  - Otherwise, remove the Thin pools (which are broken due to the --force flag) and re-create them using the Legacy Method, but do not re-create the other LVs, which have been successfully restored

https://github.com/rear/rear/pull/1806 has been merged into rear:master (ReaR 2.4?). @pavel, we should definitely rebase to 2.4 for RHEL 7.6 if we can.

Merged upstream in 2.4/b8630a6417255393524e8df4c20f3ba24f00b85d

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3293

*** Bug 1672218 has been marked as a duplicate of this bug. ***
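To illustrate the backup-time half of the fix referenced above: all of the extra LV properties listed in the pull-request description can be pulled from a single lvs report. This is only a sketch of the idea, not necessarily the exact invocation used in the merged code:

  # One line per LV, colon-separated, sizes in bytes; the field names are
  # standard lvs report columns.
  lvm lvs --noheadings --units b --nosuffix --separator ':' \
      -o origin,lv_name,vg_name,lv_size,lv_layout,pool_lv,chunk_size,stripes,stripe_size

  # Reading the output: an LV whose lv_layout contains "thin,pool" is a thin
  # pool; an LV with a non-empty pool_lv is a thin volume hosted on that pool;
  # LVs with a non-empty origin (snapshots, caches) are the ones the proposed
  # code skips.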
Whenever option --force is used - it CAN and MAY destroy data and several kittens may dies as well... Option --force is there for those who know EXACTLY what they are doing and can accept the risk of data loosing. --- For live online thin-pool migration/copying we would need to implement support for 'remote replication' with the use of tool like 'thin_delta' Thanks for the insights. In a nutshell, we must then implement as shown below: 1. During backup - Collect PV, VG and LV usual data (size, etc) - [NEW] Add "thin" attributes for LVs and Pools as well ("lvmvol" lines), e.g. Current layout: lvmvol /dev/rhel pool00 3068 25133056 lvmvol /dev/rhel root 2556 20938752 lvmvol /dev/rhel swap 512 4194304 Missing knowledge: - "pool00" is a thin pool - "root" is hosted on "pool00" - "swap" is hosted on "pool00" --> To be done in /usr/share/rear/layout/save/GNU/Linux/220_lvm_layout.sh 2. During restore - [NEW] Check whether vgcfgrestore can be used or not (depending on thin pool existence? or just let it fail) - [NEW] If there is a thin pool, do not use vgcfgrestore but "legacy" tools (vgcreate / lvcreate) --> To be done in /usr/share/rear/layout/prepare/GNU/Linux/110_include_lvm_code.sh GitHub Pull Request: https://github.com/rear/rear/pull/1806 The proposed code does the following: 1. During backup - Collect additional LV properties origin: originating LV (for cache and snapshots) lv_layout: type of LV (mirror, raid, thin, etc) pool_lv: thin pool hosting LV chunk_size: size of the chunk, for various volumes stripes: number of stripes, for Mirror volumes and Stripes volumes stripe_size: size of a stripe, for Raid volumes - Skip caches and snapshots 2. During restore - If in Migration mode (e.g. different disks but same size), go through the vgcreate/lvcreate code (Legacy Method), printing Warnings because the initial layout may not be preserved (because we do not save all attributes needed for re-creating LVM volumes) - Otherwise, try "vgcfgrestore" - If it fails - Try "vgcfgrestore --force" - If it fails, use vgcreate/lvcreate (Legacy Method) - Otherwise, remove Thin pools (which are broken due to --force flag) - Create Thin pools using Legacy Method (but do not create other LVs which have been succesfully restored) https://github.com/rear/rear/pull/1806 merged into rear:master (ReaR 2.4???). @pavel, we should definitely rebase to 2.4 for RHEL7.6 if we can. merged upstream in 2.4/b8630a6417255393524e8df4c20f3ba24f00b85d Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3293 *** Bug 1672218 has been marked as a duplicate of this bug. *** |