1326038 – [RFE] Run a health check of the capsule before configuration and syncing begins

Summary: [RFE] Run a health check of the capsule before configuration and syncing begins

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	Content Management
Sub Component:
Version:	6.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	Unspecified
Assignee:	John Mitsch
QA Contact:	jcallaha
Docs Contact:
URL:	http://projects.theforeman.org/issues...
Whiteboard:
Depends On:
Blocks:	1327292

Reported:	2016-04-11 16:00 UTC by Bryan Kearney
Modified:	2019-09-25 21:18 UTC
CC List:	5 users
Fixed In Version:	rubygem-katello-3.0.0.78-1
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-09-15 13:58:32 UTC
Target Upstream Version:
Embargoed:

Attachments
dynflow (91.16 KB, image/png) 2016-09-09 23:39 UTC, jcallaha	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Foreman Issue Tracker	13194	0	None	None	None	2016-04-22 16:30:18 UTC

Description Bryan Kearney 2016-04-11 16:00:14 UTC

Comment 1 Bryan Kearney 2016-04-11 16:00:16 UTC

Created from redmine issue http://projects.theforeman.org/issues/13194

Comment 2 Bryan Kearney 2016-04-11 16:00:18 UTC

Upstream bug assigned to jomitsch

Comment 3 Bryan Kearney 2016-04-11 18:11:04 UTC

Moving to POST since upstream bug http://projects.theforeman.org/issues/13194 has been closed
-------------
John Mitsch
Applied in changeset commit:katello|e09678cda14c8a0721f540a49d5af10b46acb45b.

Comment 5 John Mitsch 2016-05-18 19:01:48 UTC

Not having this incorporated in 6.2 is blocking a capsule sync. Can we merge this into the 6.2 code?

Comment 7 John Mitsch 2016-05-18 20:26:04 UTC

To test this, make sure capsules are syncing properly. Then, when Pulp is shut off on a capsule (systemctl stop pulp_resource_manager pulp_celerybeat pulp_workers), the sync task should error with "There was an issue with the backend service pulp: .... "

Comment 9 Brad Buckingham 2016-05-18 22:07:32 UTC

This bug is required to support bug 1327292.

Comment 11 jcallaha 2016-09-09 23:34:38 UTC

Failed QA in Satellite 6.2.2 Snap 1.1.
After stopping the Pulp processes on the capsules (see below), the sync still kicks off successfully. It reached about 81% and hung there for several hours. Investigating the task shows that it was waiting for the capsule's Pulp to start. See the attached screenshot.


[root@cloud-qe-4 ~]# systemctl stop pulp_resource_manager pulp_celerybeat pulp_workers
[root@cloud-qe-4 ~]# systemctl status pulp_resource_manager
● pulp_resource_manager.service - Pulp Resource Manager
   Loaded: loaded (/usr/lib/systemd/system/pulp_resource_manager.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 3min 39s ago
 Main PID: 8155 (code=exited, status=0/SUCCESS)

Sep 09 17:24:57 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[a36debbf-0a62-46ef-9...b22518ca7]
Sep 09 17:24:57 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[a36debbf-0a62-46ef-9069-b99b22518ca...956s: None
Sep 09 17:27:16 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[2615c332-708c-4cb9-b...4a65e534e]
Sep 09 17:27:16 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[2615c332-708c-4cb9-bccb-ec64a65e534...992s: None
Sep 09 17:28:28 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[5ab60c28-241c-42a6-b...45450a1c1]
Sep 09 17:28:28 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[5ab60c28-241c-42a6-bc5c-fa745450a1c...958s: None
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp Resource Manager...
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8155]: worker: Warm shutdown (MainProcess)
Sep 09 17:41:04 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8155]: resource_manager.lab.eng.bos.redhat.com ready.
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp Resource Manager.
Hint: Some lines were ellipsized, use -l to show in full.
[root@cloud-qe-4 ~]# systemctl status pulp_celerybeat
● pulp_celerybeat.service - Pulp's Celerybeat
   Loaded: loaded (/usr/lib/systemd/system/pulp_celerybeat.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 3min 58s ago
 Main PID: 8090 (code=exited, status=0/SUCCESS)

Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp's Celerybeat...
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: celery beat v3.1.11 (Cipater) is starting.
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: __    -    ... __   -        _
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: Configuration ->
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . broker -> qpid://cloud-qe-4.idmqe.lab.eng.bos.redhat.com:5671//
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . loader -> celery.loaders.app.AppLoader
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . scheduler -> pulp.server.async.scheduler.Scheduler
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . logfile -> [stderr]@%INFO
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . maxinterval -> now (0s)
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp's Celerybeat.
[root@cloud-qe-4 ~]# systemctl status pulp_workers
● pulp_workers.service - Pulp Celery Workers
   Loaded: loaded (/usr/lib/systemd/system/pulp_workers.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 4min 10s ago
  Process: 10720 ExecStop=/usr/bin/python -m pulp.server.async.manage_workers stop (code=exited, status=0/SUCCESS)
 Main PID: 8189 (code=exited, status=0/SUCCESS)

Sep 09 15:13:26 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Starting Pulp Celery Workers...
Sep 09 15:13:26 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Started Pulp Celery Workers.
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp Celery Workers...
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp Celery Workers.

Comment 12 jcallaha 2016-09-09 23:39:36 UTC

Created attachment 1199669
dynflow

Comment 13 John Mitsch 2016-09-13 15:26:40 UTC

After discussing this failure with our Pulp team: the issue is really with our existing Pulp services check, which still returns an OK status even if some Pulp services are down. We are in the process of updating it.

The changes in this BZ are still useful, as they detect whether a capsule is online.

I suggest we update the testing for this BZ to:

Make sure capsules are syncing properly; then, when Apache is shut off on a capsule (systemctl stop httpd), the task should error with "There was an issue with the backend service pulp: .... "

and update the Pulp services check in a separate BZ.
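To illustrate the gap described above, here is a minimal Python sketch of an aggregate status check that reports OK only when every required piece is up, so stopping pulp_celerybeat, pulp_resource_manager, and pulp_workers would be flagged. It is not the Katello or Pulp implementation: the payload shape (database_connection, messaging_connection, and known_workers entries keyed by _id) and the worker-name prefixes are assumptions for illustration, and pulp_status_problems / pulp_status_ok are hypothetical names.

```python
from typing import Any, Dict, List

# Assumed worker-name prefixes, one per service stopped in the test:
# pulp_celerybeat -> scheduler, pulp_resource_manager -> resource_manager,
# pulp_workers -> reserved_resource_worker-N. Illustrative only.
REQUIRED_WORKER_PREFIXES = ("scheduler", "resource_manager", "reserved_resource_worker")


def pulp_status_problems(status: Dict[str, Any]) -> List[str]:
    """Return everything that looks down in an (assumed) Pulp status payload.

    An empty list means healthy. The old behaviour criticised in this BZ is
    equivalent to only looking at part of the payload; here a single missing
    component is enough to fail.
    """
    problems: List[str] = []
    # Both backend connections must report connected=True.
    for conn in ("database_connection", "messaging_connection"):
        if not status.get(conn, {}).get("connected", False):
            problems.append(conn)
    # Every required worker type must have at least one live entry.
    worker_names = [w.get("_id", "") for w in status.get("known_workers", [])]
    for prefix in REQUIRED_WORKER_PREFIXES:
        if not any(name.startswith(prefix) for name in worker_names):
            problems.append(prefix)
    return problems


def pulp_status_ok(status: Dict[str, Any]) -> bool:
    """Hypothetical aggregate check: OK only if nothing is reported missing."""
    return not pulp_status_problems(status)
```

Listing the individual problems, rather than returning a bare boolean, lets the task error name which backend piece is missing.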

Comment 14 Mike McCune 2016-09-13 17:40:25 UTC

Moving back to ON_QA with the restricted scenario above.

Comment 15 John Mitsch 2016-09-13 17:47:57 UTC

Opened a new bug for the Pulp services check:

https://bugzilla.redhat.com/show_bug.cgi?id=1375691

Comment 16 jcallaha 2016-09-13 20:29:18 UTC

Verified in Satellite 6.2.2 Snap 1.1

Moving to VERIFIED, with the Pulp services check split into a new bug. The health check correctly determines when a capsule cannot be communicated with.

After turning off httpd, I get the following error message: "Connection refused - connect(2) for "wolverine.idmqe.lab.eng.bos.redhat.com" port 443"
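The behaviour verified here amounts to confirming that the capsule answers on HTTPS before any sync work is queued. Below is a minimal Python sketch, assuming a plain TCP connect to port 443 is an adequate reachability probe. Katello itself is written in Ruby, and check_capsule_reachable, sync_capsule, and CapsuleUnreachableError are hypothetical names, not Katello's API.

```python
import socket
from typing import Callable


class CapsuleUnreachableError(Exception):
    """Hypothetical error raised when the pre-sync health check fails."""


def check_capsule_reachable(host: str, port: int = 443, timeout: float = 5.0) -> None:
    """Open and immediately close a TCP connection to the capsule.

    With httpd stopped, the connect fails (typically "Connection refused"),
    and that OSError is wrapped in CapsuleUnreachableError.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # Reachable: nothing else to do for this simple probe.
    except OSError as exc:
        raise CapsuleUnreachableError(f"Could not connect to {host} port {port}: {exc}") from exc


def sync_capsule(host: str, do_sync: Callable[[str], str], port: int = 443) -> str:
    """Run the (hypothetical) sync only after the health check passes."""
    # Fail fast, before any configuration or sync step starts.
    check_capsule_reachable(host, port)
    return do_sync(host)
```

Checking up front means an unreachable capsule produces an immediate, readable error instead of a task that hangs partway through, as in the failed QA run.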

Comment 17 Bryan Kearney 2016-09-15 13:58:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1885
