Bug 1911535

Summary: OSP 13 undercloud recovery wasn't possible because XFS filesystems were created with reflink=1
Product: Red Hat Enterprise Linux 8 Reporter: Vagner Farias <vfarias>
Component: rear Assignee: Pavel Cahyna <pcahyna>
Status: CLOSED NOTABUG QA Contact: CS System Management SST QE <rhel-cs-system-management-subsystem-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.2 CC: elicohen, jbadiapa, joflynn, kthakre, ovasik, pcahyna, pveiga
Target Milestone: rc Flags: pm-rhel: mirror+
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-03-02 13:41:10 UTC Type: Bug
Bug Blocks: 1916851, 1921668    

Description Vagner Farias 2020-12-29 22:52:03 UTC
Description of problem:
The OSP 13 to OSP 16.1 upgrade process recommends using ReaR to back up the undercloud and control plane. The backup image for the undercloud was generated as instructed [1], but during recovery the filesystems couldn't be mounted. Errors shown in dmesg output were similar to:

[  153.748608] XFS (sdb1): Superblock has unknown read-only compatible features (0x4) enabled.
[  153.748663] XFS (sdb1): Attempted to mount read-only compatible filesystem read-write.
[  153.748669] XFS (sdb1): Filesystem can only be safely mounted read only.

It was eventually identified that XFS filesystems were created with reflink=1, which won't work on RHEL 7 (OSP 13).
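
A minimal sketch of how this condition could be checked from a shell, assuming an xfsprogs version whose `xfs_info` output reports a `reflink=` field (older versions do not print it). The function names are hypothetical helpers, not part of ReaR or xfsprogs. The value 0x4 in the kernel message corresponds to the reflink read-only compatible feature bit.

```shell
#!/usr/bin/env bash

# Reads `xfs_info` output on stdin and succeeds (exit 0) if reflink is enabled.
xfs_has_reflink() {
    grep -q 'reflink=1'
}

# Reads kernel log lines on stdin and prints each device whose superblock
# was rejected because of unknown read-only compatible features.
xfs_rocompat_devices() {
    sed -n 's/.*XFS (\([^)]*\)): Superblock has unknown read-only compatible features.*/\1/p' | sort -u
}
```

Possible usage: `xfs_info /dev/sdb1 | xfs_has_reflink && echo "reflink enabled"` and `dmesg | xfs_rocompat_devices`.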

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/undercloud_and_control_plane_back_up_and_restore/execute-the-back-up-procedure-osp-ctlplane-br#back-up-the-undercloud-osp-ctlplane-br


Version-Release number of selected component (if applicable):
rear-2.4

How reproducible:
Happened at least once in this environment. Unable to tell if it always happens.

Steps to Reproduce:
1. Install rear on OSP 13 undercloud based on RHEL 7. 
  https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/undercloud_and_control_plane_back_up_and_restore/install-and-configure-rear-osp-ctlplane-br#install-the-required-packages-osp-ctlplane-br

2. Create configuration file
  https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/undercloud_and_control_plane_back_up_and_restore/install-and-configure-rear-osp-ctlplane-br#create-the-configuration-files-osp-ctlplane-br

3. Perform the backup
  rear -d -v mkbackup

4. Boot from the generated ISO and recover
  rear recover

Actual results:
XFS filesystems are created with reflink=1

Expected results:
XFS filesystems should be created with reflink=0 so that they work on RHEL 7 (OSP 13).


Additional info:
We couldn't find the proper way of instructing rear to set "-m reflink=0", so we hard coded the change in every line that had "mkfs.xfs" in /usr/share/rear/layout/prepare/GNU/Linux/130_include_filesystem_code.sh.
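
A rough sketch of that hard-coded change expressed as a filter, for illustration only. It assumes the lines in question contain `mkfs.xfs` followed by whitespace; the real lines in the ReaR script may look different, and `-m reflink=0` is only accepted by mkfs.xfs versions that know about reflink. The function name is hypothetical.

```shell
#!/usr/bin/env bash

# Reads script text on stdin and adds "-m reflink=0" after every
# "mkfs.xfs" that is followed by whitespace.
force_reflink_off() {
    sed 's/mkfs\.xfs\([[:space:]]\)/mkfs.xfs -m reflink=0\1/g'
}
```

For example, `force_reflink_off < 130_include_filesystem_code.sh > patched.sh` would write a patched copy without touching the original file.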

Last but not least, I was not sure which product/component to use when creating this bz.

Comment 3 Pavel Cahyna 2021-01-04 08:30:43 UTC
Hello, is RHEL 8 involved in any way? From your description it looks like RHEL 7 is involved. Please specify the complete name-version-release of the package, as printed by the rpm utility (you gave only rear-2.4, which does not contain the release part that I could use to identify the exact package build). I don't see how you could set -m reflink on RHEL 7, or how ReaR could have created a filesystem with reflink on RHEL 7, because mkfs.xfs there does not support it. From further conversation, it looks like you booted a RHEL 7 kernel on a RHEL 8 system; is that the case? What does "During controller leapp" mean? Does leapp refer to the RHEL 7 to RHEL 8 upgrade process?

Comment 4 Priscila 2021-01-04 14:48:05 UTC
Leapp [1] is part of the framework for upgrading from OSP 13 to OSP 16.1, so the entire environment is upgraded from RHEL 7 to RHEL 8.

"The long-life Red Hat OpenStack Platform upgrade also requires an upgrade from Red Hat Enterprise Linux 7 to Red Hat Enterprise Linux 8. Red Hat Enterprise Linux 7 includes a tool named leapp, which performs the upgrade to Red Hat Enterprise Linux 8. Both the undercloud and overcloud use a separate process for performing the operating system upgrade."



[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#leapp-upgrade-usage-in-red-hat-openstack-platform

Comment 5 Vagner Farias 2021-01-05 15:41:45 UTC
@Juan identified the root cause, which I could confirm.

Timeline was like the following:
- day 1
  . rear mkbackup generated rescue ISO image and system backup
  . undercloud upgrade from 13 to 16.1 (ie RHEL 7.9 to RHEL 8.2)
  . several problems to upgrade overcloud
- day 2
  . attempts to fix the overcloud upgrade problems were unsuccessful.
  . download undercloud rescue ISO image from NFS server to consultant laptop
  . boot from undercloud rescue ISO image and recover from backup

However, there's a cron job that runs every day at 1:30am and generates a new rescue image if there were changes in the disk layout.

~~~
# cat /etc/cron.d/rear 
30 1 * * * root /usr/sbin/rear checklayout || /usr/sbin/rear mkrescue
~~~

This means that between day 1 and day 2 the rescue ISO image was regenerated while RHEL 8.2 was running, thus generating a RHEL 8.2 image.

~~~
Dec 24 01:30:01 os2001 CROND[176657]: (root) CMD (/usr/sbin/rear checklayout || /usr/sbin/rear mkrescue)
~~~
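
The control flow of that cron line can be sketched with stand-in functions. This is only an illustration of the `||` short-circuit, assuming (as the cron entry implies) that `rear checklayout` exits non-zero when the layout has changed. The stubs below are hypothetical and do not call ReaR.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for `rear checklayout`:
# exit 0 when the layout is unchanged, non-zero otherwise.
checklayout() {
    [ "$1" = "unchanged" ]
}

# Hypothetical stand-in for `rear mkrescue`.
mkrescue() {
    echo "rescue image regenerated"
}

# Same shape as the cron entry: mkrescue runs only if checklayout fails.
nightly_job() {
    checklayout "$1" || mkrescue
}
```

With an unchanged layout nothing happens. After the upgrade the layout check fails, so the rescue image is rebuilt from whatever OS is running at that time.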

The log file for "rear mkrescue" has already been overwritten and we can only observe "checklayout" being executed in /var/log/rear/rear-director.log, but the symptom indicates that mkrescue was indeed executed, and the current ReaR configuration overwrites the existing rescue image.

~~~
# grep ^ISO_PREFIX /etc/rear/local.conf 
ISO_PREFIX=director
~~~

It seems there's nothing wrong with ReaR. Instead, the OpenStack backup documentation[1] could be improved to suggest a configuration that prevents the image from being overwritten. I did one test adding variables to ISO_PREFIX and it worked, but I'm not sure this is the best way of doing it. See my configuration file and the output below:

~~~
# cat /etc/rear/local.conf
export DATETIME=$(date +%Y%m%d-%H%M)
export RHELRELEASE=$(lsb_release -rs)
OUTPUT=ISO
OUTPUT_URL=nfs://172.16.110.1/home/export/ctl_plane_backups
ISO_PREFIX=director-${RHELRELEASE}-${DATETIME}
BACKUP=NETFS
BACKUP_PROG_COMPRESS_OPTIONS=( --gzip )
BACKUP_PROG_COMPRESS_SUFFIX=".gz"
BACKUP_PROG_EXCLUDE=( '/tmp/*' '/data/*' )
BACKUP_URL=nfs://172.16.110.1/home/export/ctl_plane_backups
BACKUP_PROG_EXCLUDE=("${BACKUP_PROG_EXCLUDE[@]}" '/media' '/var/tmp' '/var/crash')
BACKUP_PROG_OPTIONS+=( --anchored --xattrs-include='*.*' --xattrs )

-- 
[root@tesla director]# pwd
/home/export/ctl_plane_backups/director
[root@tesla director]# ls -l
total 226440
-rw-------. 1 root root 231116800 Jan  5 12:34 director-7.7-20210105-1030.iso
-rw-------. 1 root root       202 Jan  5 12:34 README
-rw-------. 1 root root    746270 Jan  5 12:34 rear-director.log
-rw-------. 1 root root       273 Jan  5 12:34 VERSION
~~~

I can't tell if this will always work, so more comprehensive testing should be done.
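
As one variation to test, the naming logic could be factored into small helpers. This is a sketch under assumptions: `lsb_release` may not be installed on every system, so the release is read from the `VERSION_ID` field of `/etc/os-release` instead, and both function names are hypothetical.

```shell
#!/usr/bin/env bash

# Prints VERSION_ID from an os-release style file (default: /etc/os-release).
# The file is sourced in a subshell so its variables do not leak out.
os_release_version() {
    local file=${1:-/etc/os-release}
    ( . "$file" && printf '%s\n' "$VERSION_ID" )
}

# Builds a prefix of the form NAME-RELEASE-DATETIME.
iso_prefix() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}
```

In local.conf this could be used as `ISO_PREFIX=$(iso_prefix director "$(os_release_version)" "$(date +%Y%m%d-%H%M)")`, which keeps images built on different RHEL releases from overwriting each other.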


[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/undercloud_and_control_plane_back_up_and_restore/install-and-configure-rear-osp-ctlplane-br#create-the-configuration-files-osp-ctlplane-br

Comment 6 Juan Badia Payno 2021-01-14 16:02:07 UTC
Adding a couple of thoughts/tests:
- The configuration in Vagner's comment#5 only names the ISO according to the date; the filesystem backup data will still be overwritten. We can add "BACKUP_URL=iso:///backup/" to local.conf so that the filesystem data is included in the ISO image. This may make the ISO image too big.
From comment#5
export DATETIME=$(date +%Y%m%d-%H%M)
export RHELRELEASE=$(lsb_release -rs)
OUTPUT=ISO
BACKUP=NETFS
ISO_PREFIX=director-${RHELRELEASE}-${DATETIME}
BACKUP_URL=iso:///backup/  # this is the line added.

- Another option, BACKUP_PROG_ARCHIVE, will create backup-DATETIME file:
BACKUP_PROG_ARCHIVE=backup-${DATETIME} 

- Going a little further, the lines below create the directory /DIR_BACKUP/${HOSTNAME}-${DATETIME}/ for every backup. There is a drawback: the backup.tar.gz (system backup) has to be specified manually at restore time, because the restore would otherwise use the restore date instead of the backup date. This is done by setting [export DATETIME="20201231-2250"] in /etc/rear/local.conf before restoring the filesystem.
NETFS_PREFIX=${HOSTNAME}-${DATETIME}
OUTPUT_PREFIX=${HOSTNAME}-${DATETIME}
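
One way to soften the restore drawback might be to set DATETIME only when it is not already defined, so a value supplied at restore time wins over the current date. This is an untested sketch: the function name is hypothetical, and whether a variable exported in the rescue environment reaches local.conf is an assumption.

```shell
#!/usr/bin/env bash

# Keeps a DATETIME that is already set in the environment;
# otherwise falls back to the current date and time.
set_backup_datetime() {
    : "${DATETIME:=$(date +%Y%m%d-%H%M)}"
    export DATETIME
}
```

At restore time, something like `DATETIME=20201231-2250 rear recover` would then select the backup taken on that date without editing local.conf, provided the assumption above holds.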

Comment 7 Vagner Farias 2021-01-14 17:45:14 UTC
I had the impression that mkrescue wouldn't generate the backup again, but only the rescue image. At least this was my experience. 

Regardless, I do like the idea of having versioned backups. On another engagement I was renaming the files myself to ensure I had more than one version of the backup.

Comment 8 Pavel Cahyna 2021-01-14 17:58:07 UTC
Concerning versioned backups, please see bz1896239, ReaR has some possibilities for that, but they are not very well documented.

Comment 9 Pavel Cahyna 2021-02-24 10:20:55 UTC
Hello, is there any problem with ReaR that is not covered by other bugs, like bz1896239 for versioned backups? If not I would like to close the bug.

Comment 10 Vagner Farias 2021-02-24 13:34:09 UTC
From my perspective, this bug can be closed. Documentation bugzillas (bz1916851 and bz1921668) were created to address the problem described in comment#0.