Bug 1738498 - Collecting /var/lib/qpidd in Procedures::Backup::ConfigFiles can cause an incoherent backup to be created
Keywords:
Status: CLOSED DUPLICATE of bug 1673908
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Satellite Maintain
Version: 6.5.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Amit Upadhye
QA Contact: Lucie Vrtelova
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-07 10:33 UTC by Pavel Moravec
Modified: 2023-09-07 20:22 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-14 11:12:19 UTC
Target Upstream Version:
Embargoed:


Attachments
Hotfix RPM (119.74 KB, application/x-rpm), 2019-09-24 22:10 UTC, wclark


Links
Github theforeman foreman_maintain pull 248: closed, "Fixes #26610 - offline backup config file fix", last updated 2020-11-10 05:59:02 UTC

Description Pavel Moravec 2019-08-07 10:33:08 UTC
Description of problem:
This is just a theoretical use case / scenario, but I can come up with a concrete reproducer (esp. for QE).

Assume a foreman-maintain backup (either online or offline) happens while qpidd is changing the content of one of its durable queues, triggered by activity such as:
- a new Content Host is (un)registered / a new pulp.agent.* queue is being created or deleted
- a pulp task is created or changes status (so the resource_manager or reserved_resource_worker-* queues change content)
- a candlepin event is received from candlepin or consumed by the LOCE task
- a few other activities affecting the pulp.task or celery queues

There is a concurrency bug as follows:
- foreman-maintain executes Procedures::Backup::ConfigFiles at a very early stage, causing /var/lib/qpidd (treated as part of the pulp config_files) to be archived
- then some activity described in the previous paragraph happens, changing the content of /var/lib/qpidd
- only now is Satellite put into maintenance mode and are the services stopped

IMHO /var/lib/qpidd should be collected at the same stage as /var/lib/pulp (BUT in either case, even with --skip-pulp-content, it should be collected "just" at that stage of the backup process), because /var/lib/qpidd is not static configuration but varying data that is only worth collecting while the services are stopped.

(This BZ is applicable even after https://bugzilla.redhat.com/show_bug.cgi?id=1673908 is fixed; sadly, I realized this scenario only now. It is possible the code fix for bz1673908 will become redundant after this fix. :( )
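The race described above can be demonstrated with a self-contained sketch. This is not foreman-maintain code; temporary directories stand in for /var/lib/qpidd and for the config_files archive, and the file names are purely illustrative:

```shell
# Simulation of the backup race (all paths are illustrative).
work=$(mktemp -d)
qpidd="$work/qpidd"                     # stands in for /var/lib/qpidd
mkdir -p "$qpidd"
echo "queue A" > "$qpidd/pulp.agent.host1"

# 1. Early backup stage: ConfigFiles archives the directory, services still up.
tar -C "$work" -cf "$work/config_files.tar" qpidd

# 2. qpidd activity after the archive was taken: a new durable queue appears.
echo "queue B" > "$qpidd/pulp.agent.host2"

# 3. Services stop; restore the archive and compare with the live directory.
mkdir "$work/restore"
tar -C "$work/restore" -xf "$work/config_files.tar"
if diff -r "$work/restore/qpidd" "$qpidd" >/dev/null 2>&1; then
  result="coherent"
else
  result="incoherent"                   # the archive misses pulp.agent.host2
fi
echo "backup is $result"
rm -rf "$work"
```

Because the write in step 2 lands between archiving and service shutdown, the restored copy never matches the live directory, which is exactly the incoherence this BZ is about.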


Version-Release number of selected component (if applicable):
Sat6.5


How reproducible:
Intermittent, with some probability (timing-dependent race).


Steps to Reproduce:
1. Register many content hosts and start goferd on them concurrently, or create many pulp tasks concurrently
2. Meanwhile, call foreman-maintain backup (online or offline, it doesn't matter)
3. Once the backup stops services, stop the activity from 1.
4. Once backup completes, compare content of backed-up /var/lib/qpidd with real /var/lib/qpidd
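Step 4's comparison can be sketched as a small helper. The function name and all paths below are hypothetical; on a real Satellite you would point it at the backup's config tarball and the live /var/lib/qpidd, while the demo here only exercises it on throwaway temp data:

```shell
# compare_qpidd_backup ARCHIVE LIVE_DIR: extract ARCHIVE into a temp dir and
# diff it recursively against LIVE_DIR; prints "no diff" or "diff found".
compare_qpidd_backup() {
  tmp=$(mktemp -d)
  tar -C "$tmp" -xf "$1"
  if diff -r "$tmp/$(basename "$2")" "$2" >/dev/null 2>&1; then
    out="no diff"
  else
    out="diff found"
  fi
  rm -rf "$tmp"
  echo "$out"
}

# Self-contained demo: archive a directory and compare it while unchanged.
demo=$(mktemp -d)
mkdir "$demo/qpidd"
echo "queue A" > "$demo/qpidd/pulp.agent.host1"
tar -C "$demo" -cf "$demo/qpidd.tar" qpidd
cmp_out=$(compare_qpidd_backup "$demo/qpidd.tar" "$demo/qpidd")
echo "$cmp_out"
rm -rf "$demo"
```

On an affected system, running this against the backup taken during concurrent qpidd activity is expected to report a difference.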


Actual results:
Step 4 shows a difference (while a comparison of e.g. postgres data shows no diff). That could mean an incoherent backup has been created.


Expected results:
Step 4 shows no diff.


Additional info:
The incoherent backup might not matter, but it also can: e.g. a pulp task can be lost, a candlepin event can be lost, or a pulp.agent.* queue can be orphaned or, conversely, never created.

In all such cases there is a workaround (trigger a new pulp task, run katello:reimport, align the pulp.agent.* queues with the DB (there is a KCS article for that)), so the current behaviour is not fatal. It just makes it harder to identify and work around those issues when recovering from an incoherent backup.

Comment 5 wclark 2019-09-24 22:10:11 UTC
Created attachment 1618746 [details]
Hotfix RPM

Hotfix RPM is created, see above attachment.

Installation instructions:

# rpm -Uvh rubygem-foreman_maintain-0.3.5-3.HOTFIXRHBZ1673908.el7sat.noarch.rpm

This hotfix resolves BZ1673908 as well.

Comment 8 Amit Upadhye 2021-04-14 11:12:19 UTC

*** This bug has been marked as a duplicate of bug 1673908 ***

