1843585 – In case of NETFS_KEEP_OLD_BACKUP_COPY=1 and a failure happens during the backup, the old backup is removed also


Bug 1843585 - In case of NETFS_KEEP_OLD_BACKUP_COPY=1 and a failure happens during the backup, the old backup is removed also


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	rear
Sub Component:
Version:	7.9
Hardware:	Unspecified
OS:	Linux
Priority:	urgent
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Pavel Cahyna
QA Contact:	David Jež
Docs Contact:
URL:
Whiteboard:
Depends On:	1958247
Blocks:

Reported:	2020-06-03 15:28 UTC by Welterlen Benoit
Modified:	2022-01-11 17:42 UTC (History)
CC List:	7 users (show)
Fixed In Version:	rear-2.4-14.el7_9
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1958247 (view as bug list)
Environment:
Last Closed:	2022-01-11 17:36:05 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Priority	Status	Summary	Last Updated
Github	rear rear issues 2611	None	closed	ReaR configured with netfs backup can delete other/old folders on NFS at failure	2021-06-18 16:59:00 UTC
Github	rear rear issues 465	None	closed	rear configured with netfs backup deletes other folders on NFS at failure	2021-05-07 13:53:24 UTC
Github	rear rear pull 2625	None	closed	Fix backup removal in exit task and cleanup handling of URL mountpoints	2021-06-18 16:59:04 UTC
Red Hat Product Errata	RHBA-2022:0069	None	None	None	2022-01-11 17:36:13 UTC

Description Welterlen Benoit 2020-06-03 15:28:21 UTC

Description of problem:
If we set NETFS_KEEP_OLD_BACKUP_COPY=1 and an error/interruption occurs after output/default/100_mount_output_path.sh, the exit task will rm -Rf all the content of outputfs if the umount does not happen, including the old backup.



Version-Release number of selected component (if applicable):
rear-2.4-10.el7_7.x86_64, but also the RHEL 8 and community versions

How reproducible:
Always

Steps to Reproduce:
1. Configure NETFS_KEEP_OLD_BACKUP_COPY=1
2. An error/interruption occurs after output/default/100_mount_output_path.sh
3. The umount of outputfs fails
=> all the content of outputfs will be removed, including the old backup that should be kept
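For reference, a configuration of the kind described in step 1 might look like the fragment below. This is only a sketch: the NFS server name and export path are hypothetical, and OUTPUT=ISO and BACKUP=NETFS are inferred from the file names in the log (rear-rhel76.iso, backup.tar.gz) rather than stated in the report.

```shell
# /etc/rear/local.conf (fragment)
# The server name and export path in BACKUP_URL are hypothetical.
OUTPUT=ISO
BACKUP=NETFS
BACKUP_URL=nfs://nfs.example.com/export/rear
# Keep the previous backup as <hostname>.old when a new one is created.
NETFS_KEEP_OLD_BACKUP_COPY=1
```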

Actual results:
...
2020-06-02 23:15:43.320857512 Running exit tasks
+++ Print 'Running exit tasks'
+++ local exit_task=
+++ for exit_task in '"${EXIT_TASKS[@]}"'
+++ Debug 'Exit task '\''umount -f -v '\''/tmp/rear.YRMrdIdLjweC2ZG/outputfs'\'' >&2'\'''
+++ test 1
+++ Log 'Exit task '\''umount -f -v '\''/tmp/rear.YRMrdIdLjweC2ZG/outputfs'\'' >&2'\'''
+++ echo '2020-06-02 23:15:43.324236704 Exit task '\''umount -f -v '\''/tmp/rear.YRMrdIdLjweC2ZG/outputfs'\'' >&2'\'''
2020-06-02 23:15:43.324236704 Exit task 'umount -f -v '/tmp/rear.YRMrdIdLjweC2ZG/outputfs' >&2'
+++ eval 'umount -f -v '\''/tmp/rear.YRMrdIdLjweC2ZG/outputfs'\'' >&2'
++++ umount -f -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs
umount.nfs4: /tmp/rear.YRMrdIdLjweC2ZG/outputfs: device is busy
/tmp/rear.YRMrdIdLjweC2ZG/outputfs: nfs4 mount point detected
/tmp/rear.YRMrdIdLjweC2ZG/outputfs: umount failed
+++ for exit_task in '"${EXIT_TASKS[@]}"'
+++ Debug 'Exit task '\''rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'\'''
+++ test 1
+++ Log 'Exit task '\''rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'\'''
+++ echo '2020-06-02 23:15:43.333794357 Exit task '\''rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'\'''
2020-06-02 23:15:43.333794357 Exit task 'rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'
+++ eval 'rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'
++++ rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/VERSION'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/backup.log'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/rear-rhel76.iso'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/selinux.autorelabel'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/backup.tar.gz'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/README'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/rear-rhel76.log'
removed directory: '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/VERSION'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/backup.log'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/rear-rhel76.iso'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/selinux.autorelabel'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/backup.tar.gz'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/README'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/rear-rhel76.log'
removed directory: '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old'
rm: cannot remove '/tmp/rear.YRMrdIdLjweC2ZG/outputfs': Device or resource busy
...

Expected results:
Only the new partial content should be removed, not the old backup.
The "rm -Rf" exit task should be added later or concern only a subdirectory where the new stuff is copied.
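One possible way to make the cleanup safe, sketched here under assumptions and not taken from ReaR itself, is to remove the temporary mountpoint with rmdir instead of rm -Rf. rmdir refuses to remove a directory that is non-empty or still a busy mountpoint, so a failed umount can no longer lead to data loss. The function name cleanup_mountpoint is hypothetical, and the example uses plain directories to stand in for the NFS share.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not ReaR code: remove a temporary mountpoint directory
# without ever deleting what is inside it.
cleanup_mountpoint() {
    local dir="$1"
    # Nothing to do if the directory is already gone.
    [ -d "$dir" ] || return 0
    # rmdir only removes empty, unmounted directories. If umount failed and
    # the share is still mounted here, rmdir fails and the contents survive.
    if rmdir -- "$dir" 2>/dev/null; then
        echo "removed empty mountpoint $dir"
    else
        echo "kept $dir: not empty or still mounted" >&2
        return 1
    fi
}

# Simulation with plain directories (no real NFS mount involved).
workdir=$(mktemp -d)
mkdir -p "$workdir/busy/rhel76.old" "$workdir/idle"
touch "$workdir/busy/rhel76.old/backup.tar.gz"

cleanup_mountpoint "$workdir/idle"          # empty: removed
cleanup_mountpoint "$workdir/busy" || true  # has content: left alone
```

The trade-off is that a stale, still-mounted directory may be left behind under /tmp, which is harmless compared to losing the backups.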

Additional info:
Also checked with the github version

Comment 4 Chris Williams 2020-11-11 21:49:35 UTC

Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2 and will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7,Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook  

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.  

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

Comment 5 Chris Williams 2020-11-11 23:16:52 UTC

Apologies for the inadvertent closure.

Comment 8 Pavel Cahyna 2021-05-05 19:01:54 UTC

Thank you for the detailed report. After some effort, I was able to reproduce the problem. The window for it is quite narrow: the output directory gets mounted and unmounted several times in the process (prep, output, backup), only the output stage is affected, and that stage is usually quite short (unless the rescue ISO is unusually big). The backup and prep stages are not affected, because there are multiple copies of the cleanup code and only one of them is problematic. But when the window is hit, it does lead to a serious loss of backup data. The problem was introduced in ReaR 1.18 with this commit: https://github.com/rear/rear/commit/4e9c2a1b05f87762fb06355cf959b24eacc21f62, see the discussion in https://github.com/rear/rear/pull/782.

I will clone the bug to RHEL 8 as well.

Comment 9 Pavel Cahyna 2021-05-07 13:53:24 UTC

Note that the bug is a bit more serious than the description that reads "If we set NETFS_KEEP_OLD_BACKUP_COPY=1 ... rm -rf all the content of outputfs if the umount does not happen, including the old backup" suggests. The outputfs can contain more valuable data than the old backup: as noted in upstream issue 465, data from other machines sharing the same NFS directory will be removed as well (and ReaR encourages this usage, as it places its backups under directories named after the host that created them). This will be a problem even if NETFS_KEEP_OLD_BACKUP_COPY is not set.

Comment 10 Pavel Cahyna 2021-06-07 19:08:52 UTC

Pull request fixing this problem posted for upstream review, awaiting reply.

As noted in https://github.com/rear/rear/issues/2611#issuecomment-854916232, the issue is actually more serious than the description "If we set NETFS_KEEP_OLD_BACKUP_COPY=1 and an error/interruption after output/default/100_mount_output_path.sh occurs, the exit task will rm -Rf all the content of outputfs if the umount does not happen, including the old backup" suggests. It would seem that two conditions must be met for the error to happen: error/interruption during the output stage and umount not happening. Actually, if the umount command fails (for example because the mount point is in use), it satisfies both conditions: umount will not happen and it will be caught as an error leading to program termination and the bug. So the bug is actually more likely than it seems. (Also, as mentioned in the previous comment, it concerns backups from other machines if they share the same NFS directory, not just old backups from the same machine.)
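The second half of that chain, a failed umount followed by an unconditional rm -Rf, can be modelled with the simplified sketch below. It mirrors the EXIT_TASKS loop visible in the trace from the description but is not ReaR's actual code: fake_umount is a hypothetical stand-in for a umount that reports "device is busy", and a plain directory named otherhost stands in for another machine's backups on the shared NFS export.

```shell
#!/usr/bin/env bash
# Simulation only: fake_umount stands in for a umount that fails because
# the mountpoint is busy. No real mount is involved.
fake_umount() { echo "umount: $1: device is busy" >&2; return 1; }

workdir=$(mktemp -d)
mkdir -p "$workdir/outputfs/otherhost"
touch "$workdir/outputfs/otherhost/backup.tar.gz"

# Cleanup tasks registered in order: first unmount, then remove the directory.
EXIT_TASKS=(
    "fake_umount '$workdir/outputfs'"
    "rm -Rf '$workdir/outputfs'"
)

# Each task is evaluated regardless of whether the previous one succeeded,
# so the rm -Rf runs even though the share is still "mounted".
for exit_task in "${EXIT_TASKS[@]}"; do
    eval "$exit_task" || echo "exit task failed: $exit_task" >&2
done

if [ -e "$workdir/outputfs/otherhost/backup.tar.gz" ]; then
    result="kept"
else
    result="lost"
fi
echo "backup from otherhost: $result"
```

In this model the second task has no way of knowing that the first one failed, which is why the contents of the still-mounted share are deleted.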

Comment 11 Pavel Cahyna 2021-06-18 16:59:26 UTC

Upstream review finished, PR merged. Will work on backporting the changes and preparing a Zstream update.

Comment 22 errata-xmlrpc 2022-01-11 17:36:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rear bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0069
