Bug 1673908 - foreman-maintain backup online fails on backup-config-files under Satellite load
Summary: foreman-maintain backup online fails on backup-config-files under Satellite load
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Satellite Maintain
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: 6.7.0
Assignee: Amit Upadhye
QA Contact: Vladimír Sedmík
URL:
Whiteboard:
Duplicates: 1738498
Depends On:
Blocks: 1122832
Reported: 2019-02-08 12:32 UTC by Pavel Moravec
Modified: 2024-12-20 18:48 UTC (History)

Fixed In Version: rubygem-foreman_maintain-0.5.3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1756046 (view as bug list)
Environment:
Last Closed: 2020-04-28 14:05:07 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Hotfix RPM (119.74 KB, application/x-rpm)
2019-09-24 22:05 UTC, wclark


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 26017 0 Normal Closed foreman-maintain backup online fails on backup-config-files under Satellite load 2021-01-29 18:49:21 UTC
Github theforeman foreman_maintain pull 248 0 'None' closed Fixes #26610 - offline backup config file fix 2021-01-29 18:49:21 UTC
Github theforeman foreman_maintain pull 253/files 0 None None None 2020-10-05 14:03:14 UTC
Red Hat Knowledge Base (Solution) 3912031 0 Supportability None On a busy Red Hat Satellite, online backup failes with errors due to changing files 2019-05-22 19:57:20 UTC

Description Pavel Moravec 2019-02-08 12:32:59 UTC
Description of problem:
When Satellite is under load, foreman-maintain backup online fails while collecting config files. This happens because the "config" set also contains files and directories that change over time, such as:

/var/lib/candlepin/hornetq/journal/ (on 6.3 and older)
/var/lib/candlepin/activemq-artemis/journal/ (on 6.4 and newer)
/var/lib/qpidd/.qpidd/qls/jrnl2/

The first two change because candlepin sends events to qpidd and temporarily stores them in its internal HornetQ/ActiveMQ broker before sending.

The latter changes its content whenever one of the actions below causes a new journal file to be used or an old one to be returned to the empty file pool:
- katello_event_queue gets a message from candlepin, or LOCE fetches a message from it
- many *resource* queues get updated when pulp tasks progress
- pulp.agent.* queues get updated when a katello-agent task is created or applied

None of this possibly corrupted data is essential for building an in-house reproducer, but it can be useful for troubleshooting.

Ideally, foreman-maintain should get past such an issue with just a warning (by default) instead of failing.


Version-Release number of selected component (if applicable):
rubygem-foreman_maintain-0.2.11-1


How reproducible:
100% within some time


Steps to Reproduce:
1. Generate a heavier load of candlepin events (re-register systems frequently, attach/detach subscriptions, etc.)
2. Or generate more pulp tasks (frequent repo syncs that end as no-ops, publishing a new CV version without a change, ...)
3. foreman-maintain backup online -y /tmp/satellite-backup
(run it several times)


Actual results:
backup fails with errors like:

tar: /var/lib/candlepin/hornetq/journal/hornetq-data-497608.hq: file changed as we read it
tar: /var/lib/qpidd/.qpidd/qls/jrnl2/katello_event_queue/c2bc8b9e-8155-4b69-87f2-8f6a61df06b3.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/jrnl2/katello_event_queue/c2c98b9c-1027-4bb0-8487-18e05f615218.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/jrnl2/pulp.agent.04c39f77-df2c-4e08-9d61-d0825dbd14d8: Warning: Cannot open: No such file or directory
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/9e1b017b-ddad-41aa-a070-cacc908c4c7e.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/bdc9192a-02bf-4234-92bb-eaafb707c2b8.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/c0db25dd-2217-498d-b001-b9364a191e6a.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/c3791d4a-c0cf-4a10-8020-afc476d34d98.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/c3c08240-b57e-4603-a1bf-dff69179a8a2.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/c43ef88f-a800-456f-b754-5d6650d49e94.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/dd16a6c9-5692-4800-bddf-9230ba4900d7.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/f9c23c90-f6e1-4392-b17d-81eeff45cc24.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/ba6a2877-df47-4dee-bae1-e021912c2c75.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/bc237c99-61d0-485b-a225-93a7817fd50d.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/c0695e17-f134-49e3-bc1a-1394580d2d4e.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/c2bc8b9e-8155-4b69-87f2-8f6a61df06b3.jrnl: File removed before we read it
tar: /var/lib/qpidd/.qpidd/qls/p001/efp/2048k/in_use/c2c98b9c-1027-4bb0-8487-18e05f615218.jrnl: File removed before we read it

or similar


Expected results:
The default behaviour should be to report such info as a WARNING but continue the backup, and let the user decide whether those warnings matter.
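
One way to report these messages as warnings is to separate them from any other tar errors. The following is a minimal sketch, not foreman-maintain code: `split_tar_messages` and `VOLATILE_PATTERNS` are hypothetical names, and the patterns are taken only from the tar output shown under "Actual results".

```ruby
# Hypothetical helper (not part of foreman_maintain): split GNU tar stderr
# into "file changed/removed while reading" notices and everything else.
# The patterns below are assumptions based on the messages quoted in this bug.
VOLATILE_PATTERNS = [
  /: file changed as we read it\z/,
  /: File removed before we read it\z/,
  /: Warning: Cannot open: No such file or directory\z/
].freeze

def split_tar_messages(stderr_text)
  lines = stderr_text.each_line.map(&:chomp).reject(&:empty?)
  # partition returns [matching lines, non-matching lines]
  volatile, other = lines.partition do |line|
    VOLATILE_PATTERNS.any? { |pattern| line =~ pattern }
  end
  { :volatile => volatile, :other => other }
end
```

A caller could log the `:volatile` entries as warnings and treat a non-empty `:other` list as a real failure.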


Additional info:
Implementing bz 1673797 is a kind of workaround here, in some situations.

Comment 3 Pavel Moravec 2019-02-08 12:42:46 UTC
Just for the record, using the --whitelist="backup-config-files" option fully skips collecting config_files.tar.gz, so it is not a workaround here.

Comment 4 Martin Bacovsky 2019-02-08 16:09:08 UTC
I was looking at possible ways to implement this. We can either let the config backup pass with exit code <= 1, or let tar skip the changing directories such as /var/lib/qpidd/.qpidd/qls and /var/lib/candlepin/hornetq/journal.
For online backup (only) and config files, it seems safe not to fail on tar exit code 1, as we already use '--ignore-failed-read'. Excluding the changing directories may be error-prone, as it is easy to miss one.
We can detect that some file changed during the backup and print a warning. The file list could be captured in the log. I need to check whether the last foreman-maintain log is part of the archive.

---
Possible exit codes of GNU tar are summarized in the following table:

0
`Successful termination'.

1
`Some files differ'. If tar was invoked with `--compare' (`--diff', `-d') command line option, this means that some files in the archive differ from their disk counterparts (see section Comparing Archive Members with the File System). If tar was given `--create', `--append' or `--update' option, this exit code means that some files were changed while being archived and so the resulting archive does not contain the exact copy of the file set.

2
`Fatal error'. This means that some fatal, unrecoverable error occurred.
---
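
The exit code table above could be turned into a backup outcome roughly as follows. This is only a sketch under the assumption stated in this comment, that exit code 1 is tolerated for online backups only; `tar_outcome` is a hypothetical name, not an existing foreman-maintain method.

```ruby
# Hypothetical mapping of GNU tar exit statuses to a backup outcome.
# Assumption from this comment: status 1 ("some files differ") is
# tolerated only for online backups; anything else non-zero is fatal.
def tar_outcome(exit_status, online: true)
  case exit_status
  when 0 then :ok
  when 1 then online ? :warning : :failure
  else :failure
  end
end
```

For example, `tar_outcome(1)` yields `:warning`, while `tar_outcome(1, online: false)` and `tar_outcome(2)` both yield `:failure`.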

Comment 5 Martin Bacovsky 2019-02-08 16:10:31 UTC
Created redmine issue https://projects.theforeman.org/issues/26017 from this bug

Comment 6 lukasz.olszak 2019-02-11 12:29:12 UTC
As far as I know, /var/lib/qpidd/ is important for a full Satellite restore, so it doesn't make sense to skip it or to produce just a warning if the resulting backup is not usable.

Comment 8 Pavel Moravec 2019-03-29 13:40:18 UTC
(In reply to Martin Bacovsky from comment #4)
> I was looking at possible ways on how to implement this. We can either let
> pass backup of configs with exit code <= 1. Or let tar skip these changing
> directories like /var/lib/qpidd/.qpidd/qls and
> /var/lib/candlepin/hornetq/journal.  
> For online backup (only) and config files it seems to be safe to not to fail
> on tar exit_code 1 as we already use '--ignore-failed-read'. Excluding the
> changing directories may be error prone as it may be tricky not to miss any. 
> We can detect and print warning that some file changed during the backup.
> The file list could be captured in log. I need to check if the last
> foreman-maintain log is a part of the archive. 
> 
> ---
> Possible exit codes of GNU tar are summarized in the following table:
> 
> 0
> `Successful termination'.
> 
> 1
> `Some files differ'. If tar was invoked with `--compare' (`--diff', `-d')
> command line option, this means that some files in the archive differ from
> their disk counterparts (see section Comparing Archive Members with the File
> System). If tar was given `--create', `--append' or `--update' option, this
> exit code means that some files were changed while being archived and so the
> resulting archive does not contain the exact copy of the file set.

Nice idea: so a patch like:

--- a/definitions/features/tar.rb
+++ b/definitions/features/tar.rb
@@ -56,7 +56,7 @@ class Features::Tar < ForemanMaintain::F
       tar_command << options.fetch(:files, '*')
     end
 
-    execute!(tar_command.join(' '))
+    execute!(tar_command.join(' '), :valid_exit_statuses => [0, 1])
   end
   # rubocop:enable Metrics/AbcSize, Metrics/MethodLength
 
Shall work..?

Testing it..

Comment 9 Pavel Moravec 2019-03-29 18:36:29 UTC
(In reply to Pavel Moravec from comment #8)
> Nice idea: so a patch like:
> 
> --- a/definitions/features/tar.rb
> +++ b/definitions/features/tar.rb
> @@ -56,7 +56,7 @@ class Features::Tar < ForemanMaintain::F
>        tar_command << options.fetch(:files, '*')
>      end
>  
> -    execute!(tar_command.join(' '))
> +    execute!(tar_command.join(' '), :valid_exit_statuses => [0, 1])
>    end
>    # rubocop:enable Metrics/AbcSize, Metrics/MethodLength
>  
> Shall work..?
> 
> Testing it..

OK, different place for that option:

--- a/definitions/procedures/backup/config_files.rb
+++ b/definitions/procedures/backup/config_files.rb
@@ -20,7 +20,7 @@ module Procedures::Backup
         configs = config_files.join(' ')
         execute!("tar --selinux --create --gzip --file=#{tarball} " \
           "--listed-incremental=#{increments} --ignore-failed-read " \
-          "#{configs}")
+          "#{configs}", :valid_exit_statuses => [0, 1])
       end
     end



This works!

Comment 13 Pavel Moravec 2019-05-17 09:52:32 UTC
So the bug indeed also happens for offline backup, because services are stopped only after the "config" files are collected, and under that term we also collect some /var files (for qpidd as well as for candlepin). The solution is to stop services first and then collect the files.

That is already fixed upstream via https://github.com/theforeman/foreman_maintain/pull/248 .

Comment 14 Bryan Kearney 2019-05-18 00:02:51 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26017 has been resolved.

Comment 17 Peter Vreman 2019-07-25 13:55:17 UTC
I have hit the same problem, and when applying the recommended whitelisting I noticed the skipping. I created a dedicated BZ for this skipping: https://bugzilla.redhat.com/show_bug.cgi?id=1733239

Comment 18 Pavel Moravec 2019-08-07 10:34:25 UTC
See also relevant https://bugzilla.redhat.com/show_bug.cgi?id=1738498 .

Comment 22 wclark 2019-09-24 17:48:05 UTC
The previously attached PR resolves an issue with offline backups.

To resolve this issue with online backups, we need: https://github.com/theforeman/foreman_maintain/pull/253

Comment 23 wclark 2019-09-24 22:05:50 UTC
Created attachment 1618745 [details]
Hotfix RPM

Attached hotfix RPM

Comment 24 wclark 2019-09-24 22:07:32 UTC
Hotfix RPM is created, see above attachment.

Installation instructions:

# rpm -Uvh rubygem-foreman_maintain-0.3.5-3.HOTFIXRHBZ1673908.el7sat.noarch.rpm

This hotfix resolves BZ1738498 as well.

Comment 34 Bryan Kearney 2019-10-21 14:03:27 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26017 has been resolved.

Comment 36 Bryan Kearney 2019-10-21 16:02:57 UTC
Moving this bug to POST for triage into Satellite 6 since the upstream issue https://projects.theforeman.org/issues/26017 has been resolved.

Comment 54 patalber 2020-02-12 21:40:30 UTC
Hello Team,

My customer would like to know if this can be resolved soon:

"Hello Team,

We still wait for a fix for this ticket.
Our backups are failing.
If there is no way to tell tar to ignore the /var/lib/qpidd/ directory when files are removed during backups, why you do not :
 - ask for a dedicated filesystem on this directory
 - so that a LVM snapshot is done on this FS as for the other SAT FS's 
when services are stopped

Please escalate if needed."

I wanted to present this to you from the customer. Thank you.

--Patrick

Comment 58 Pavel Moravec 2020-03-03 11:38:45 UTC
Just to ensure what needs to be changed: Both directories:

/var/lib/qpidd
/var/lib/candlepin/activemq-artemis

should be skipped during backup (while /var/lib/candlepin/c* files should still be backed up), since the latter directory is affected in the same way and is used in Satellite for the same purpose as katello_event_queue.
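
To illustrate the idea of skipping the two directories, here is a rough sketch that builds a tar argument list using GNU tar's --exclude option. It is not the actual fix: `tar_create_command` and `VOLATILE_DIRS` are hypothetical names, the option set is reduced compared with the real procedure, and how the exclude patterns match on a given tar version should be verified before relying on it.

```ruby
# Directories named in this comment as changing under load (assumed list).
VOLATILE_DIRS = ['/var/lib/qpidd', '/var/lib/candlepin/activemq-artemis'].freeze

# Hypothetical helper (not part of foreman_maintain): build an argument
# list for GNU tar that archives `paths` but passes every volatile
# directory via --exclude. Returns an array suitable for system(*cmd).
def tar_create_command(tarball, paths, excludes = VOLATILE_DIRS)
  cmd = ['tar', '--selinux', '--create', '--gzip', "--file=#{tarball}"]
  excludes.each { |dir| cmd << "--exclude=#{dir}" }
  cmd.concat(paths)
  cmd
end
```

Because only the activemq-artemis subdirectory is excluded, other content under /var/lib/candlepin would still be archived when /var/lib/candlepin is among the given paths.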

Comment 63 Bryan Kearney 2020-03-11 06:03:28 UTC
Upstream bug assigned to apatel

Comment 66 Bryan Kearney 2020-04-14 13:39:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1454

Comment 67 Blair Aitken 2020-04-28 13:51:01 UTC
This bug is not mentioned in the RHSA release. Can anyone confirm this is actually fixed in 6.7?

Comment 68 Mike McCune 2020-04-28 14:05:07 UTC
See the comment from QE above:

https://bugzilla.redhat.com/show_bug.cgi?id=1673908#c65

this is shipped in rubygem-foreman_maintain-0.5.3-1.el7sat.noarch which is available for all Satellite customers.

Comment 69 Amit Upadhye 2021-04-14 11:12:19 UTC
*** Bug 1738498 has been marked as a duplicate of this bug. ***

Comment 70 Red Hat Bugzilla 2024-04-14 04:25:09 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
