Bug 1843585
| Summary: | In case of NETFS_KEEP_OLD_BACKUP_COPY=1 and a failure happens during the backup, the old backup is removed also | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Welterlen Benoit <bwelterl> | |
| Component: | rear | Assignee: | Pavel Cahyna <pcahyna> | |
| Status: | CLOSED ERRATA | QA Contact: | David Jež <djez> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 7.9 | CC: | amkulkar, djez, fkrska, jreznik, ovasik, pcahyna, qe-baseos-apps | |
| Target Milestone: | rc | Keywords: | Reopened, Triaged, ZStream | |
| Target Release: | --- | Flags: | pm-rhel: mirror+ | |
| Hardware: | Unspecified | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | rear-2.4-14.el7_9 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| Clones: | 1958247 | Environment: | ||
| Last Closed: | 2022-01-11 17:36:05 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1958247 | |||
| Bug Blocks: | ||||
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7. From initial triage, it does not appear that the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, so they will now be closed.

From the RHEL life cycle page: https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes: https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook

Feature requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7

Apologies for the inadvertent closure.

Thank you for the detailed report. After some effort, I was able to reproduce the problem. The window for the problem is quite narrow: the output directory is mounted and unmounted several times during the process (in the prep, output, and backup stages), only the output stage suffers from the problem, and that stage is usually quite short (unless the rescue ISO is unusually big). The backup and prep stages are not affected because there are multiple copies of the cleanup code and only one of them is problematic. But when the window is hit, it does lead to a serious loss of backup data.
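To make the mechanism concrete, here is a minimal sandbox simulation, assuming only what the log in this report shows: the exit tasks run one after another, and a failed umount does not stop the following `rm -Rf`. The directory layout, the `otherhost` directory and the `fake_umount` function are hypothetical stand-ins; nothing is actually mounted and nothing outside a fresh temporary directory is touched.

```sh
#!/bin/sh
# Sandbox simulation of the failing exit-task sequence from the log in this report.
# Hypothetical stand-ins: a temporary directory plays the NFS-mounted outputfs,
# and fake_umount plays an "umount -f" that fails with "device is busy".

sandbox=$(mktemp -d)
outputfs="$sandbox/outputfs"

# This host's current backup, its kept old copy, and another host's backup
# sharing the same export.
mkdir -p "$outputfs/rhel76" "$outputfs/rhel76.old" "$outputfs/otherhost"
touch "$outputfs/rhel76/backup.tar.gz" \
      "$outputfs/rhel76.old/backup.tar.gz" \
      "$outputfs/otherhost/backup.tar.gz"

fake_umount() {
    echo "umount: $1: device is busy" >&2
    return 1
}

# The exit tasks in the order the log shows them running. The positional
# parameters stand in for ReaR's EXIT_TASKS array so the sketch stays POSIX sh.
set -- "fake_umount '$outputfs'" "rm -Rf '$outputfs'"

for exit_task in "$@"; do
    # Each task's exit status is ignored, so the failed umount does not stop
    # the rm -Rf from recursing into what would still be the mounted directory.
    eval "$exit_task" || true
done

if [ -e "$outputfs" ]; then
    echo "outputfs survived"
else
    echo "outputfs removed, including rhel76.old and otherhost"
fi
```

In the real run the mount point itself survives (`rm` reports "Device or resource busy"), but, as the log shows, everything below it is removed.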
The problem was introduced in ReaR 1.18 with this commit: https://github.com/rear/rear/commit/4e9c2a1b05f87762fb06355cf959b24eacc21f62; see the discussion in https://github.com/rear/rear/pull/782.

I will clone the bug to RHEL 8 as well.

Note that the bug is a bit more serious than the description that reads "If we set NETFS_KEEP_OLD_BACKUP_COPY=1 ... rm -rf all the content of outputfs if the umount does not happen, including the old backup" suggests. The outputfs can contain more valuable data than the old backup: as noted in upstream issue 465, data from other machines sharing the same NFS directory will be removed as well (and ReaR encourages this usage, as it places its backups under directories named after the host that created them). This will be a problem even if NETFS_KEEP_OLD_BACKUP_COPY is not set.

A pull request fixing this problem has been posted for upstream review; awaiting reply.

As noted in https://github.com/rear/rear/issues/2611#issuecomment-854916232, the issue is actually more serious than the description "If we set NETFS_KEEP_OLD_BACKUP_COPY=1 and an error/interruption after output/default/100_mount_output_path.sh occurs, the exit task will rm -Rf all the content of outputfs if the umount does not happen, including the old backup" suggests. It would seem that two conditions must be met for the error to happen: an error/interruption during the output stage, and the umount not happening. Actually, if the umount command fails (for example because the mount point is in use), it satisfies both conditions: the umount does not happen, and the failure is caught as an error that terminates the program and triggers the bug. So the bug is actually more likely than it seems. (Also, as mentioned in the previous comment, it concerns backups from other machines if they share the same NFS directory, not just old backups from the same machine.)

Upstream review finished, PR merged. Will work on backporting the changes and preparing a Zstream update.
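Since a failed umount both leaves the directory mounted and triggers the exit tasks, one defensive option is to make the cleanup itself check the mount state. The following is a minimal hypothetical sketch, not the change that was merged upstream: the function name `safe_remove_mount_dir` is invented for illustration, it assumes `mountpoint(1)` from util-linux is available, and it uses `rmdir` rather than `rm -Rf`, so it cannot delete remote data, including backups of other hosts sharing the same NFS directory.

```sh
#!/bin/sh
# Hypothetical guarded cleanup, not the upstream fix: refuse to delete anything
# while the directory is still a mount point, and use rmdir (which only removes
# an empty directory) instead of rm -Rf. Requires mountpoint(1) from util-linux.

safe_remove_mount_dir() {
    dir=$1
    if mountpoint -q "$dir"; then
        echo "still mounted, not removing: $dir" >&2
        return 1
    fi
    # Even if the directory is not a mount point, rmdir refuses to remove it
    # while it still contains files, so no backup data can be deleted here.
    rmdir "$dir"
}

# Usage: a leftover directory that still holds a backup is kept intact.
demo=$(mktemp -d)
mkdir -p "$demo/outputfs/rhel76.old"
touch "$demo/outputfs/rhel76.old/backup.tar.gz"
safe_remove_mount_dir "$demo/outputfs" || echo "cleanup refused, backup kept"
```

Using `rmdir` is a deliberately conservative choice: in the worst case an empty or still-mounted directory is left behind under the temporary work directory, which is far cheaper than losing backups.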
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (rear bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0069
Description of problem:
If we set NETFS_KEEP_OLD_BACKUP_COPY=1 and an error/interruption occurs after output/default/100_mount_output_path.sh, the exit task will remove (rm -Rf) all the content of outputfs, including the old backup, if the umount does not happen.

Version-Release number of selected component (if applicable):
rear-2.4-10.el7_7.x86_64, but the problem is also present in RHEL 8 and in the community version.

How reproducible:
Always

Steps to Reproduce:
1. Configure NETFS_KEEP_OLD_BACKUP_COPY=1.
2. Have an error/interruption occur after output/default/100_mount_output_path.sh.
3. If the umount of outputfs fails, all the content of outputfs is removed, including the old backup that should be kept.

Actual results:

```
...
2020-06-02 23:15:43.320857512 Running exit tasks
+++ Print 'Running exit tasks'
+++ local exit_task=
+++ for exit_task in '"${EXIT_TASKS[@]}"'
+++ Debug 'Exit task '\''umount -f -v '\''/tmp/rear.YRMrdIdLjweC2ZG/outputfs'\'' >&2'\'''
+++ test 1
+++ Log 'Exit task '\''umount -f -v '\''/tmp/rear.YRMrdIdLjweC2ZG/outputfs'\'' >&2'\'''
+++ echo '2020-06-02 23:15:43.324236704 Exit task '\''umount -f -v '\''/tmp/rear.YRMrdIdLjweC2ZG/outputfs'\'' >&2'\'''
2020-06-02 23:15:43.324236704 Exit task 'umount -f -v '/tmp/rear.YRMrdIdLjweC2ZG/outputfs' >&2'
+++ eval 'umount -f -v '\''/tmp/rear.YRMrdIdLjweC2ZG/outputfs'\'' >&2'
++++ umount -f -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs
umount.nfs4: /tmp/rear.YRMrdIdLjweC2ZG/outputfs: device is busy
/tmp/rear.YRMrdIdLjweC2ZG/outputfs: nfs4 mount point detected
/tmp/rear.YRMrdIdLjweC2ZG/outputfs: umount failed
+++ for exit_task in '"${EXIT_TASKS[@]}"'
+++ Debug 'Exit task '\''rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'\'''
+++ test 1
+++ Log 'Exit task '\''rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'\'''
+++ echo '2020-06-02 23:15:43.333794357 Exit task '\''rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'\'''
2020-06-02 23:15:43.333794357 Exit task 'rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'
+++ eval 'rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs >&2'
++++ rm -Rf -v /tmp/rear.YRMrdIdLjweC2ZG/outputfs
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/VERSION'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/backup.log'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/rear-rhel76.iso'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/selinux.autorelabel'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/backup.tar.gz'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/README'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76/rear-rhel76.log'
removed directory: '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/VERSION'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/backup.log'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/rear-rhel76.iso'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/selinux.autorelabel'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/backup.tar.gz'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/README'
removed '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old/rear-rhel76.log'
removed directory: '/tmp/rear.YRMrdIdLjweC2ZG/outputfs/rhel76.old'
rm: cannot remove '/tmp/rear.YRMrdIdLjweC2ZG/outputfs': Device or resource busy
...
```

Expected results:
Only the new, partial content should be removed, not the old backup. The "rm -Rf" exit task should be added later, or it should concern only a subdirectory where the new content is copied.

Additional info:
Also checked with the GitHub version.
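The second suggestion in the expected results, limiting the removal to a subdirectory that holds only the current run's new content, can be sketched as follows. This is a hypothetical illustration under stated assumptions (a temporary directory instead of a real mount, a staging directory named `<host>.new`, invented function names, and a simulated failure), not how ReaR actually organizes its output.

```sh
#!/bin/sh
# Sketch of the "concern only a subdirectory" idea from the expected results.
# Hypothetical throughout: the staging directory name "$host.new", the function
# names, and the simulated failure are illustrations, not ReaR code.
# A temporary directory stands in for the mounted outputfs.

outputfs=$(mktemp -d)
host=rhel76

# Existing data on the share: current backup, kept old copy, another host.
mkdir -p "$outputfs/$host" "$outputfs/$host.old" "$outputfs/otherhost"
touch "$outputfs/$host/backup.tar.gz" \
      "$outputfs/$host.old/backup.tar.gz" \
      "$outputfs/otherhost/backup.tar.gz"

# All new output of this run goes into its own staging subdirectory.
staging="$outputfs/$host.new"

write_new_output() {
    mkdir -p "$staging"
    echo "partial data" > "$staging/backup.tar.gz"
    return 1    # simulate an error/interruption in the middle of the run
}

# The cleanup is scoped to the staging subdirectory, never to outputfs itself.
cleanup_on_failure() {
    rm -Rf -- "$staging"
}

if ! write_new_output; then
    cleanup_on_failure
fi
# (On success the staging directory would be promoted; that step is omitted.)

ls -A "$outputfs"
```

Because the cleanup path never names outputfs itself, even a cleanup that runs while the share is still mounted would only remove the partial `rhel76.new` directory.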