Bug 2228928

Summary: tripleo_ansible clobbers settings that ReaR saves into etc/rear/rescue.conf
Product: Red Hat OpenStack
Component: tripleo-ansible
Version: 17.1 (Wallaby)
Status: ASSIGNED
Severity: high
Priority: high
Type: Bug
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Reporter: Pavel Cahyna <pcahyna>
Assignee: Fernando Díaz <fdiazbra>
QA Contact: Joe H. Rahme <jhakimra>
CC: drosenfe, omcgonag

Description Pavel Cahyna 2023-08-03 16:21:04 UTC
Description of problem:

While debugging the issue described in https://bugzilla.redhat.com/show_bug.cgi?id=2222899#c19, I found that the UEFI bootloader settings (USING_UEFI_BOOTLOADER= and UEFI_BOOTLOADER=) are not read from /etc/rear/rescue.conf during ReaR recovery. It turned out that the file contains this instead:

# This configuration file is generated automatically
# by the backup_and_restore role part of TripleO
# Ansible. Do not edit this file, all changes
# will be lost. Refer to the following URL for
# more information and implementation details:
# https://opendev.org/openstack/tripleo-ansible

BACKUP_PROG_OPTIONS+=( --anchored --xattrs-include='*.*' --xattrs )

This file is present on the system that was backed up, which is not how it is meant to be used. ReaR creates this file in the rescue image with the settings it has detected, but if the file already exists on the system where the image is produced, that copy overrides the one in the image, and the settings that ReaR autodetected are lost. The file should therefore not exist on the original system (AFAICT it is not even documented; the manual page documents only /etc/rear/local.conf and /etc/rear/site.conf). Moreover, the line in /etc/rear/rescue.conf is redundant, because the very same setting is present in /etc/rear/local.conf:

BACKUP_PROG_OPTIONS+=( --anchored --xattrs-include='*.*' --xattrs )

so the tar arguments during file restore are then duplicated, as can be seen from the recovery debug log:
dd if=/var/tmp/rear.hfo8hrTDR3hREJr/outputfs/osp17-1r1-controller-2/backup.tar.gz | tar --block-number --totals --verbose --anchored --anchored --xattrs-include=*.* --xattrs --anchored --xattrs-include=*.* --xattrs --exclude-from=/var/tmp/rear.hfo8hrTDR3hREJr/tmp/restore-exclude-list.txt --gzip -C /mnt/local/ -x -f -
(fortunately, tar accepts this).
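
The duplication follows from how Bash array appends behave when two configuration files containing the same `+=` line are sourced one after the other. The following is only a minimal sketch: it uses temporary stand-in files rather than the real /etc/rear/local.conf and /etc/rear/rescue.conf, and it assumes the default value is BACKUP_PROG_OPTIONS=( --anchored ), which is an assumption chosen to match the three --anchored occurrences in the log above.

```shell
#!/usr/bin/env bash
# Stand-ins for /etc/rear/local.conf and /etc/rear/rescue.conf
# (temporary files, not the real paths), both with the same append line.
tmpdir=$(mktemp -d)
for name in local.conf rescue.conf; do
    cat > "$tmpdir/$name" <<'EOF'
BACKUP_PROG_OPTIONS+=( --anchored --xattrs-include='*.*' --xattrs )
EOF
done

# Assumed default value (an assumption, not taken from this report).
BACKUP_PROG_OPTIONS=( --anchored )

# Sourcing both files appends the same options twice.
source "$tmpdir/local.conf"
source "$tmpdir/rescue.conf"

# Print the resulting option list as it would be passed to tar.
echo "${BACKUP_PROG_OPTIONS[@]}"
rm -rf "$tmpdir"
```

Under these assumptions the array ends up with seven elements, in the same order as the options in the tar command line above.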

Version-Release number of selected component (if applicable):

The problem has existed since 0.3.0 according to Git: https://opendev.org/openstack/tripleo-ansible/src/tag/0.3.0/tripleo_ansible/roles/backup-and-restore/templates/rescue.conf.j2

How reproducible:

Not sure, I don't have the environment myself.

Steps to Reproduce:
1. Back up and recover a controller on a UEFI machine

Actual results:

At the end of recovery, ReaR prints

WARNING:
For this system
RedHatEnterpriseServer/9 on Linux-i386 (based on Fedora/9/i386)
there is no code to install a boot loader on the recovered system
or the code that we have failed to install the boot loader correctly.
Please contribute appropriate code to the Relax-and-Recover project,
see http://relax-and-recover.org/development/
Take a look at the scripts in /usr/share/rear/finalize - for example
for PC architectures like x86 and x86_64 see the script
/usr/share/rear/finalize/Linux-i386/660_install_grub2.sh
and for POWER architectures like ppc64le see the script
/usr/share/rear/finalize/Linux-ppc64le/660_install_grub2.sh
---------------------------------------------------
|  IF YOU DO NOT INSTALL A BOOT LOADER MANUALLY,  |
|  THEN YOUR SYSTEM WILL NOT BE ABLE TO BOOT.     |
---------------------------------------------------
You can use 'chroot /mnt/local bash --login'
to change into the recovered system and
manually install a boot loader therein.


Expected results:

Recovery completes without warnings.

Additional info:

The problematic code is here:

https://opendev.org/openstack/tripleo-ansible/src/commit/e281ae7624774d71f22fbb993af967ed1ec08780/tripleo_ansible/roles/backup_and_restore/tasks/setup_rear.yml#L118

A customer has hit this in bug 2222899.
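
To check whether a given system (or a booted rescue image) is affected, something along the following lines might help. This is a rough sketch, not a tested diagnostic: the function name rescue_conf_is_clobbered is hypothetical, and the check relies only on the header text and variable names quoted in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (returns 0) if the given rescue.conf carries the
# TripleO-generated header and has no USING_UEFI_BOOTLOADER= setting.
rescue_conf_is_clobbered() {
    local conf=$1
    # No file at all means nothing was clobbered.
    [ -f "$conf" ] || return 1
    # Header text as quoted in the description above.
    grep -q 'backup_and_restore role' "$conf" || return 1
    # Affected only if the UEFI setting that ReaR would have saved is missing.
    ! grep -q '^USING_UEFI_BOOTLOADER=' "$conf"
}

# Usage example: report on the standard location.
if rescue_conf_is_clobbered /etc/rear/rescue.conf; then
    echo "/etc/rear/rescue.conf was generated by tripleo-ansible and lacks UEFI settings" >&2
fi
```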