Bug 1326038 - [RFE] Run a health check of the capsule before configuration and syncing begins
Summary: [RFE] Run a health check of the capsule before configuration and syncing begins
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Content Management
Version: 6.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: Unspecified
Assignee: John Mitsch
QA Contact: jcallaha
URL: http://projects.theforeman.org/issues/13194
Whiteboard:
Depends On:
Blocks: 1327292
 
Reported: 2016-04-11 16:00 UTC by Bryan Kearney
Modified: 2019-09-25 21:18 UTC
CC List: 5 users

Fixed In Version: rubygem-katello-3.0.0.78-1
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-15 13:58:32 UTC
Target Upstream Version:
Embargoed:


Attachments
dynflow (91.16 KB, image/png)
2016-09-09 23:39 UTC, jcallaha


Links
System ID                     Private  Priority  Status  Summary  Last Updated
Foreman Issue Tracker 13194   0        None      None    None     2016-04-22 16:30:18 UTC

Description Bryan Kearney 2016-04-11 16:00:14 UTC

Comment 1 Bryan Kearney 2016-04-11 16:00:16 UTC
Created from redmine issue http://projects.theforeman.org/issues/13194

Comment 2 Bryan Kearney 2016-04-11 16:00:18 UTC
Upstream bug assigned to jomitsch

Comment 3 Bryan Kearney 2016-04-11 18:11:04 UTC
Moving to POST since upstream bug http://projects.theforeman.org/issues/13194 has been closed
-------------
John Mitsch
Applied in changeset commit:katello|e09678cda14c8a0721f540a49d5af10b46acb45b.

Comment 5 John Mitsch 2016-05-18 19:01:48 UTC
Not having this incorporated in 6.2 is blocking capsule sync. Can we merge this into the 6.2 code?

Comment 7 John Mitsch 2016-05-18 20:26:04 UTC
To test this, make sure capsules are syncing properly, then verify that when pulp is shut off on a capsule (systemctl stop pulp_resource_manager pulp_celerybeat pulp_workers), the task errors with "There was an issue with the backend service pulp: .... "
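
A minimal sketch of that test flow, assuming the hammer-cli-katello "capsule content synchronize" subcommand; the capsule ID is a placeholder:

# On the capsule: stop the pulp services
systemctl stop pulp_resource_manager pulp_celerybeat pulp_workers

# On the Satellite server: kick off a sync of that capsule; the task should
# fail with "There was an issue with the backend service pulp: ..."
hammer capsule content synchronize --id 2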

Comment 9 Brad Buckingham 2016-05-18 22:07:32 UTC
This bug is required to support bug 1327292.

Comment 11 jcallaha 2016-09-09 23:34:38 UTC
Failed QA in Satellite 6.2.2 Snap 1.1.
After stopping the pulp processes on the capsule (see below), the sync still kicks off successfully. It made it to about 81% and hung there for several hours. When investigating the process, you can see that the task was waiting for the capsule's pulp to start. See the attached screenshot.


[root@cloud-qe-4 ~]# systemctl stop pulp_resource_manager pulp_celerybeat pulp_workers
[root@cloud-qe-4 ~]# systemctl status pulp_resource_manager
● pulp_resource_manager.service - Pulp Resource Manager
   Loaded: loaded (/usr/lib/systemd/system/pulp_resource_manager.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 3min 39s ago
 Main PID: 8155 (code=exited, status=0/SUCCESS)

Sep 09 17:24:57 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[a36debbf-0a62-46ef-9...b22518ca7]
Sep 09 17:24:57 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[a36debbf-0a62-46ef-9069-b99b22518ca...956s: None
Sep 09 17:27:16 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[2615c332-708c-4cb9-b...4a65e534e]
Sep 09 17:27:16 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[2615c332-708c-4cb9-bccb-ec64a65e534...992s: None
Sep 09 17:28:28 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[5ab60c28-241c-42a6-b...45450a1c1]
Sep 09 17:28:28 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[5ab60c28-241c-42a6-bc5c-fa745450a1c...958s: None
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp Resource Manager...
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8155]: worker: Warm shutdown (MainProcess)
Sep 09 17:41:04 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8155]: resource_manager.lab.eng.bos.redhat.com ready.
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp Resource Manager.
Hint: Some lines were ellipsized, use -l to show in full.
[root@cloud-qe-4 ~]# systemctl status pulp_celerybeat
● pulp_celerybeat.service - Pulp's Celerybeat
   Loaded: loaded (/usr/lib/systemd/system/pulp_celerybeat.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 3min 58s ago
 Main PID: 8090 (code=exited, status=0/SUCCESS)

Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp's Celerybeat...
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: celery beat v3.1.11 (Cipater) is starting.
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: __    -    ... __   -        _
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: Configuration ->
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . broker -> qpid://cloud-qe-4.idmqe.lab.eng.bos.redhat.com:5671//
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . loader -> celery.loaders.app.AppLoader
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . scheduler -> pulp.server.async.scheduler.Scheduler
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . logfile -> [stderr]@%INFO
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . maxinterval -> now (0s)
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp's Celerybeat.
[root@cloud-qe-4 ~]# systemctl status pulp_workers
● pulp_workers.service - Pulp Celery Workers
   Loaded: loaded (/usr/lib/systemd/system/pulp_workers.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 4min 10s ago
  Process: 10720 ExecStop=/usr/bin/python -m pulp.server.async.manage_workers stop (code=exited, status=0/SUCCESS)
 Main PID: 8189 (code=exited, status=0/SUCCESS)

Sep 09 15:13:26 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Starting Pulp Celery Workers...
Sep 09 15:13:26 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Started Pulp Celery Workers.
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp Celery Workers...
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp Celery Workers.

Comment 12 jcallaha 2016-09-09 23:39:36 UTC
Created attachment 1199669 [details]
dynflow

Comment 13 John Mitsch 2016-09-13 15:26:40 UTC
After discussing this failure with our pulp team, we found that the issue is really with our existing pulp services check: it will still return an OK status even if some pulp services are down. We are in the process of updating this.
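
For context, a rough sketch of what that check sees; capsule.example.com is a placeholder, and the exact response fields are an assumption based on the Pulp 2 status API:

# Query the Pulp 2 status API on the capsule
curl -sk https://capsule.example.com/pulp/api/v2/status/
# The JSON response includes fields such as "known_workers",
# "messaging_connection", and "database_connection". Because httpd can keep
# answering while worker processes are down, the response can still look OK,
# which is the gap described above.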

The changes in this BZ are still useful, as they will tell whether a capsule is online.

I suggest we update the testing for this BZ as follows (a sketch follows at the end of this comment):

Make sure capsules are syncing properly, then verify that when Apache is shut off on the capsule (systemctl stop httpd), the task errors with "There was an issue with the backend service pulp: .... "

The pulp services check itself can be updated in a separate BZ.
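
A sketch of the revised scenario, with the same placeholder capsule ID as above:

# On the capsule: stop Apache so the capsule is unreachable over HTTPS
systemctl stop httpd

# On the Satellite server: kick off the capsule sync; the health check should
# fail the task up front instead of letting it hang mid-sync
hammer capsule content synchronize --id 2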

Comment 14 Mike McCune 2016-09-13 17:40:25 UTC
Moving back to ON_QA with the restricted scenario above.

Comment 15 John Mitsch 2016-09-13 17:47:57 UTC
Opened up a new bug for the pulp services check

https://bugzilla.redhat.com/show_bug.cgi?id=1375691

Comment 16 jcallaha 2016-09-13 20:29:18 UTC
Verified in Satellite 6.2.2 Snap 1.1

Moving to VERIFIED, given the split to a new bug. The health check correctly determines when a capsule cannot be communicated with.

After turning off httpd, I get the following error message: "Connection refused - connect(2) for "wolverine.idmqe.lab.eng.bos.redhat.com" port 443".
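
That matches a direct check against the capsule; the status URL and the exact curl output shown are illustrative:

# With httpd stopped on the capsule, any HTTPS connection to it is refused
curl -sk https://wolverine.idmqe.lab.eng.bos.redhat.com/pulp/api/v2/status/
# curl: (7) Failed to connect to wolverine.idmqe.lab.eng.bos.redhat.com port 443: Connection refused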

Comment 17 Bryan Kearney 2016-09-15 13:58:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1885

