
Bug 1738498

Summary: Collecting /var/lib/qpidd in Procedures::Backup::ConfigFiles can cause an incoherent backup to be created
Product: Red Hat Satellite
Component: Satellite Maintain
Version: 6.5.0
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Target Milestone: Unspecified
Target Release: Unused
Keywords: Triaged
Reporter: Pavel Moravec <pmoravec>
Assignee: Amit Upadhye <aupadhye>
QA Contact: Lucie Vrtelova <lvrtelov>
Docs Contact:
CC: apatel, aupadhye, jpathan, kgaikwad, ofalk, wclark
Type: Bug
Last Closed: 2021-04-14 11:12:19 UTC
Doc Type: If docs needed, set a value
Attachments:
  Hotfix RPM (flags: none)

Description Pavel Moravec 2019-08-07 10:33:08 UTC
Description of problem:
This is only a theoretical scenario so far, but I can come up with a concrete reproducer (especially for QE).

Assume a foreman-maintain backup (either online or offline) runs while qpidd is changing the content of its durable queues, triggered by activity such as:
- a Content Host is registered or unregistered, so a pulp.agent.* queue is created or deleted
- a pulp task is created or changes status (so the resource_manager or reserved_resource_worker-* queues change content)
- a candlepin event is received from candlepin or consumed by the LOCE task
- a few other activities affecting the pulp.task or celery queues

There is a concurrency bug, as follows:
- foreman-maintain executes Procedures::Backup::ConfigFiles at a very early stage, so /var/lib/qpidd (listed as part of the pulp config_files) is archived at that point
- next, some activity described in the previous paragraph happens and changes the content of /var/lib/qpidd
- only now is Satellite put into maintenance mode and the services stopped

IMHO /var/lib/qpidd should be collected at the same stage as /var/lib/pulp (but in every case, even with --skip-pulp-content, at that stage of the backup process), since /var/lib/qpidd is not static configuration but changing data that should be collected while services are stopped.
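
The ordering problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: a temporary directory stands in for /var/lib/qpidd, the file name queue.jrnl and the functions snapshot and simulate are hypothetical, and nothing here calls foreman-maintain or qpidd.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file's relative path to the SHA-256 of its content."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def simulate(snapshot_after_stop: bool) -> bool:
    """Return True when the backed-up state matches the state at service stop."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)               # stand-in for /var/lib/qpidd
        journal = store / "queue.jrnl"  # hypothetical file name
        journal.write_text("msg-1\n")

        # Early stage: the store is archived together with the config files.
        early = snapshot(store)

        # Broker activity between the early stage and the service stop.
        journal.write_text("msg-1\nmsg-2\n")

        # Services are stopped here, so the store no longer changes.
        final_state = snapshot(store)

        backed_up = snapshot(store) if snapshot_after_stop else early
        return backed_up == final_state


if __name__ == "__main__":
    print("early snapshot coherent:", simulate(snapshot_after_stop=False))
    print("late snapshot coherent:", simulate(snapshot_after_stop=True))
```

In this model the early snapshot misses the second message, while a snapshot taken after the stop matches the final state, which is the behaviour the proposed reordering aims for.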

(This BZ applies even after https://bugzilla.redhat.com/show_bug.cgi?id=1673908 is fixed. Sadly, I realized this scenario only now; it is possible that the code fix for bz1673908 will become redundant after this fix :( )


Version-Release number of selected component (if applicable):
Sat6.5


How reproducible:
Unknown; expected to occur with some probability


Steps to Reproduce:
1. Register many content hosts and start goferd on them concurrently, or create many pulp tasks concurrently
2. Meanwhile, run foreman-maintain backup (online or offline, it doesn't matter)
3. Once the backup stops services, stop the activity from step 1
4. Once the backup completes, compare the content of the backed-up /var/lib/qpidd with the real /var/lib/qpidd
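
The comparison in step 4 could be done along these lines. This is a minimal sketch, assuming the backed-up copy of /var/lib/qpidd has already been extracted to a directory; the function names tree_digest and diff_trees are hypothetical helpers and not part of foreman-maintain.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path relative to root to the SHA-256 of its content."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def diff_trees(backup_root, live_root):
    """Return (only_in_backup, only_in_live, changed) as sorted path lists."""
    backup = tree_digest(backup_root)
    live = tree_digest(live_root)
    only_in_backup = sorted(set(backup) - set(live))
    only_in_live = sorted(set(live) - set(backup))
    changed = sorted(
        path for path in set(backup) & set(live) if backup[path] != live[path]
    )
    return only_in_backup, only_in_live, changed
```

Calling diff_trees with the extracted backup directory and /var/lib/qpidd (with services still stopped, so the live copy is not changing) returns three empty lists when the backup is coherent. Comparing content hashes rather than timestamps avoids false positives from metadata that the archive does not preserve.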


Actual results:
Step 4 shows a difference (while a comparison of e.g. postgres data shows no diff). That could mean an incoherent backup has been created.


Expected results:
Step 4 shows no diff.


Additional info:
The incoherent backup might not matter, but it can. E.g. a pulp task can be lost, a candlepin event can be lost, or a pulp.agent.* queue can be orphaned or, conversely, not created.

In all such cases there is a workaround (trigger a new pulp task, run katello:reimport, align pulp.agent.* queues with the DB (there is a KCS for that)), so the current behaviour is not fatal. It just prevents identifying and working around those issues when recovering from an incoherent backup.

Comment 5 wclark 2019-09-24 22:10:11 UTC
Created attachment 1618746: Hotfix RPM

A hotfix RPM has been created; see the attachment above.

Installation instructions:

# rpm -Uvh rubygem-foreman_maintain-0.3.5-3.HOTFIXRHBZ1673908.el7sat.noarch.rpm

This hotfix resolves BZ1673908 as well.

Comment 8 Amit Upadhye 2021-04-14 11:12:19 UTC

*** This bug has been marked as a duplicate of bug 1673908 ***