Bug 1673908
| Summary: | foreman-maintain backup online fails on backup-config-files under Satellite load | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | ||||
| Component: | Satellite Maintain | Assignee: | Amit Upadhye <aupadhye> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Vladimír Sedmík <vsedmik> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 6.4 | CC: | ahumbe, akapse, apatel, arahaman, aupadhye, baitken, bkearney, cdonnell, dgross, ehelms, gscarbor, hyu, inecas, janarula, kagarwal, kgaikwad, ktordeur, lukasz.olszak, mbacovsk, mmccune, momran, mshimura, ofalk, patalber, pcreech, peter.vreman, riehecky, rvdwees, tonay, vgunasek, vhernand, vsedmik, wclark, wpinheir | ||||
| Target Milestone: | 6.7.0 | Keywords: | PrioBumpField, PrioBumpGSS, Reopened, Triaged | ||||
| Target Release: | Unused | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | rubygem-foreman_maintain-0.5.3 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| Clones: | 1756046 | Environment: | |||||
| Last Closed: | 2020-04-28 14:05:07 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1122832 | ||||||
| Attachments: |
Description
Pavel Moravec
2019-02-08 12:32:59 UTC
Just for the record: using the --whitelist="backup-config-files" option skips collecting config_files.tar.gz entirely, so it is not a workaround here.

I looked at possible ways to implement this. We can either let the config-files backup pass when tar exits with code <= 1, or let tar skip the directories that change during the backup, such as /var/lib/qpidd/.qpidd/qls and /var/lib/candlepin/hornetq/journal. For the online backup (only) and the config files, it seems safe not to fail on tar exit code 1, as we already use '--ignore-failed-read'. Excluding the changing directories may be error-prone, because it can be tricky not to miss any. We can detect that some files changed during the backup and print a warning, capturing the file list in the log. I need to check whether the latest foreman-maintain log is part of the archive.

Possible exit codes of GNU tar are summarized in the following table:

| Code | Meaning |
|---|---|
| 0 | "Successful termination". |
| 1 | "Some files differ". If tar was invoked with the `--compare` (`--diff`, `-d`) command line option, this means that some files in the archive differ from their disk counterparts (see section Comparing Archive Members with the File System). If tar was given the `--create`, `--append` or `--update` option, this exit code means that some files were changed while being archived, so the resulting archive does not contain an exact copy of the file set. |
| 2 | "Fatal error". This means that some fatal, unrecoverable error occurred. |

Created redmine issue https://projects.theforeman.org/issues/26017 from this bug.

As far as I know, /var/lib/qpidd/ is important for a full Satellite restore, so it doesn't make sense to skip it, or to print just a warning when the backup later turns out to be unusable.

(In reply to Martin Bacovsky from comment #4)
> For the online backup (only) and the config files, it seems safe not to fail on tar exit code 1, as we already use '--ignore-failed-read'.

Nice idea. A patch like this:

```diff
--- a/definitions/features/tar.rb
+++ b/definitions/features/tar.rb
@@ -56,7 +56,7 @@ class Features::Tar < ForemanMaintain::F
       tar_command << options.fetch(:files, '*')
     end
 
-    execute!(tar_command.join(' '))
+    execute!(tar_command.join(' '), :valid_exit_statuses => [0, 1])
   end
   # rubocop:enable Metrics/AbcSize, Metrics/MethodLength
```

Should that work? Testing it...

(In reply to Pavel Moravec from comment #8)
> Should that work? Testing it...

OK, the option belongs in a different place:

```diff
--- a/definitions/procedures/backup/config_files.rb
+++ b/definitions/procedures/backup/config_files.rb
@@ -20,7 +20,7 @@ module Procedures::Backup
         configs = config_files.join(' ')
         execute!("tar --selinux --create --gzip --file=#{tarball} " \
                  "--listed-incremental=#{increments} --ignore-failed-read " \
-                 "#{configs}")
+                 "#{configs}", :valid_exit_statuses => [0, 1])
       end
     end
```

This works!

So the bug indeed also happens with offline backups: stopping services actually happens after collecting the "config" files, which also include some /var files (for qpidd, but also for candlepin). The solution is to stop services first, and then collect the files.
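The reordering proposed here (stop services, then collect the files) can be sketched in Ruby. This is a minimal illustration only: the step names `:stop_services`, `:backup_config_files` and `:start_services` and the injected `runner` are hypothetical, not foreman-maintain's actual procedures, and restarting services in an `ensure` block is an assumption of this sketch rather than something stated in the thread.

```ruby
# Hypothetical step names for illustration; foreman-maintain's real
# offline-backup procedures are not reproduced here.
# runner: any object responding to #call(step), e.g. a lambda.
def offline_backup(runner)
  # Stop services first, so qpidd and candlepin stop writing under /var/lib
  # before tar reads those directories.
  runner.call(:stop_services)
  begin
    runner.call(:backup_config_files)
  ensure
    # Sketch's own choice: restart services even if archiving raised.
    runner.call(:start_services)
  end
end
```

Passing a recording lambda as `runner`, for example `offline_backup(->(step) { puts step })`, prints the three step names in order, which makes the ordering easy to check in a test.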
Stopping services before collecting the files is already fixed upstream via https://github.com/theforeman/foreman_maintain/pull/248.

Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26017 has been resolved.

I hit the same problem, and when applying the recommended whitelisting I noticed that the config files were skipped. I created a dedicated BZ for this skipping: https://bugzilla.redhat.com/show_bug.cgi?id=1733239

See also the related https://bugzilla.redhat.com/show_bug.cgi?id=1738498.

The previously attached PR resolves the issue with offline backups. To resolve this issue with online backups, we need https://github.com/theforeman/foreman_maintain/pull/253.

Created attachment 1618745 (Hotfix RPM)

Attached hotfix RPM.
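For online backups, services keep running, so one approach discussed earlier in this thread is to tolerate tar exit code 1 ("some files were changed while being archived") for the config-files archive while still failing on code 2. Below is a minimal Ruby sketch of that policy, assuming a hypothetical `run_checked` wrapper built on `Open3.capture2e`. Only the `:valid_exit_statuses` idea comes from the patches in this thread; this is not foreman-maintain's actual `execute!` implementation, nor necessarily what PR 253 does.

```ruby
require 'open3'

# GNU tar exit codes, per the tar manual excerpt quoted in this bug.
TAR_EXIT_MEANINGS = {
  0 => 'successful termination',
  1 => 'some files were changed while being archived',
  2 => 'fatal, unrecoverable error'
}.freeze

# Hypothetical error class for this sketch.
class CommandFailed < StandardError; end

# Online backups tolerate code 1 (services keep writing while tar reads);
# offline backups, with services stopped, accept only a clean exit.
def allowed_tar_statuses(online:)
  online ? [0, 1] : [0]
end

# Hypothetical stand-in for an execute!-style helper (not foreman-maintain's
# code). Runs the command through Open3, returns combined stdout/stderr,
# warns on an accepted non-zero status, and raises on any other status.
def run_checked(command, valid_exit_statuses: [0])
  output, status = Open3.capture2e(command)
  code = status.exitstatus # nil if the process was killed by a signal
  meaning = TAR_EXIT_MEANINGS.fetch(code, 'unknown status')
  unless valid_exit_statuses.include?(code)
    raise CommandFailed, "#{command} exited with #{code.inspect} (#{meaning}): #{output}"
  end
  warn "warning: #{command} exited with #{code} (#{meaning})" unless code.zero?
  output
end
```

For example, `run_checked(cmd, valid_exit_statuses: allowed_tar_statuses(online: true))` returns tar's output on codes 0 and 1 (printing a warning for 1) and raises `CommandFailed` on code 2. The trade-off raised in the thread still applies: an archive produced with code 1 may hold inconsistent copies of files such as the qpidd journal.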
Hotfix RPM is created, see the attachment above. Installation instructions: `# rpm -Uvh rubygem-foreman_maintain-0.3.5-3.HOTFIXRHBZ1673908.el7sat.noarch.rpm`. This hotfix resolves BZ1738498 as well.

Hello Team,

My customer would like to know if this can be resolved soon:

> Hello Team, we are still waiting for a fix for this ticket. Our backups are failing. If there is no way to tell tar to ignore the /var/lib/qpidd/ directory when files are removed during backups, why don't you ask for a dedicated filesystem for this directory, so that an LVM snapshot is taken of this FS when services are stopped, as for the other Satellite filesystems? Please escalate if needed.

I wanted to pass this on to you from the customer. Thank you. --Patrick

Just to make sure what needs to be changed: both directories, /var/lib/qpidd and /var/lib/candlepin/activemq-artemis, should be skipped during backup (while the /var/lib/candlepin/c* files should still be backed up), since the latter directory is affected the same way and is used in Satellite only for the same purpose as katello_event_queue.

Upstream bug assigned to apatel.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1454

This bug is not mentioned in the RHSA advisory. Can anyone confirm this is actually fixed in 6.7?

See the comment from QE above: https://bugzilla.redhat.com/show_bug.cgi?id=1673908#c65. This is shipped in rubygem-foreman_maintain-0.5.3-1.el7sat.noarch, which is available to all Satellite customers.
*** Bug 1738498 has been marked as a duplicate of this bug. ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.
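The change requested above (skip /var/lib/qpidd and /var/lib/candlepin/activemq-artemis, keep the other /var/lib/candlepin/c* files) could be expressed with GNU tar's `--exclude=PATTERN` option. Below is a minimal sketch that builds the tar argument vector, reusing the flags from the config_files.rb patch in this thread. The helper name `config_tar_argv`, the constant `EXCLUDED_DIRS` and the example paths are hypothetical, and this is not necessarily how rubygem-foreman_maintain-0.5.3 implements the fix.

```ruby
# Directories this bug asks to skip: they hold live broker/journal data
# that changes while tar reads it.
EXCLUDED_DIRS = %w[
  /var/lib/qpidd
  /var/lib/candlepin/activemq-artemis
].freeze

# Hypothetical helper: build a GNU tar argv for the config-files archive.
# The --exclude options are placed before the paths, the order GNU tar
# expects for options that affect how the following file names are handled.
def config_tar_argv(tarball, increments, paths, excluded: EXCLUDED_DIRS)
  ['tar', '--selinux', '--create', '--gzip',
   "--file=#{tarball}",
   "--listed-incremental=#{increments}",
   '--ignore-failed-read',
   *excluded.map { |dir| "--exclude=#{dir}" },
   *paths]
end
```

Building an array rather than a joined string lets `Open3.capture2e(*argv)` run tar without a shell; after `require 'shellwords'`, `Shellwords.join(argv)` renders the same command for logging.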