Created from redmine issue http://projects.theforeman.org/issues/13194
Upstream bug assigned to jomitsch
Moving to POST since upstream bug http://projects.theforeman.org/issues/13194 has been closed
-------------
John Mitsch
Applied in changeset commit:katello|e09678cda14c8a0721f540a49d5af10b46acb45b.
Not having this incorporated in 6.2 is blocking a capsule sync. Can we merge this into the 6.2 code?
To test this, make sure capsules are syncing properly. Then shut off Pulp on a capsule (systemctl stop pulp_resource_manager pulp_celerybeat pulp_workers) and confirm that the task errors with "There was an issue with the backend service pulp: .... "
This bug is required to support bug 1327292.
Failed QA in Satellite 6.2.2 Snap 1.1. After the pulp processes on the capsules were stopped (see below), the sync still kicked off successfully. It reached about 81% and hung there for several hours. Investigating the process shows that the task was waiting for the capsule's pulp to start. See attached screenshot.

[root@cloud-qe-4 ~]# systemctl stop pulp_resource_manager pulp_celerybeat pulp_workers
[root@cloud-qe-4 ~]# systemctl status pulp_resource_manager
● pulp_resource_manager.service - Pulp Resource Manager
   Loaded: loaded (/usr/lib/systemd/system/pulp_resource_manager.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 3min 39s ago
 Main PID: 8155 (code=exited, status=0/SUCCESS)

Sep 09 17:24:57 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[a36debbf-0a62-46ef-9...b22518ca7]
Sep 09 17:24:57 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[a36debbf-0a62-46ef-9069-b99b22518ca...956s: None
Sep 09 17:27:16 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[2615c332-708c-4cb9-b...4a65e534e]
Sep 09 17:27:16 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[2615c332-708c-4cb9-bccb-ec64a65e534...992s: None
Sep 09 17:28:28 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[5ab60c28-241c-42a6-b...45450a1c1]
Sep 09 17:28:28 cloud-qe-4.idmqe.lab.eng.bos.redhat.com pulp[8155]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[5ab60c28-241c-42a6-bc5c-fa745450a1c...958s: None
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp Resource Manager...
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8155]: worker: Warm shutdown (MainProcess)
Sep 09 17:41:04 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8155]: resource_manager.lab.eng.bos.redhat.com ready.
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp Resource Manager.
Hint: Some lines were ellipsized, use -l to show in full.
[root@cloud-qe-4 ~]# systemctl status pulp_celerybeat
● pulp_celerybeat.service - Pulp's Celerybeat
   Loaded: loaded (/usr/lib/systemd/system/pulp_celerybeat.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 3min 58s ago
 Main PID: 8090 (code=exited, status=0/SUCCESS)

Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp's Celerybeat...
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: celery beat v3.1.11 (Cipater) is starting.
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: __ - ... __ - _
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: Configuration ->
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . broker -> qpid://cloud-qe-4.idmqe.lab.eng.bos.redhat.com:5671//
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . loader -> celery.loaders.app.AppLoader
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . scheduler -> pulp.server.async.scheduler.Scheduler
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . logfile -> [stderr]@%INFO
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com celery[8090]: . maxinterval -> now (0s)
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp's Celerybeat.
[root@cloud-qe-4 ~]# systemctl status pulp_workers
● pulp_workers.service - Pulp Celery Workers
   Loaded: loaded (/usr/lib/systemd/system/pulp_workers.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-09 17:41:05 EDT; 4min 10s ago
  Process: 10720 ExecStop=/usr/bin/python -m pulp.server.async.manage_workers stop (code=exited, status=0/SUCCESS)
 Main PID: 8189 (code=exited, status=0/SUCCESS)

Sep 09 15:13:26 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Starting Pulp Celery Workers...
Sep 09 15:13:26 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Started Pulp Celery Workers.
Sep 09 17:41:03 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopping Pulp Celery Workers...
Sep 09 17:41:05 cloud-qe-4.idmqe.lab.eng.bos.redhat.com systemd[1]: Stopped Pulp Celery Workers.
Created attachment 1199669: dynflow
After discussing this failure with our Pulp team, we found that the issue is really with our existing Pulp services check: it still returns an OK status even if some Pulp services are down. We are in the process of updating it. The changes in this BZ are still useful, because they tell us whether a capsule is online. I suggest we update the testing for this BZ to: make sure capsules are syncing properly; then, with Apache shut off on a capsule (systemctl stop httpd), confirm that the task errors with "There was an issue with the backend service pulp: .... ". The Pulp services check should be updated in a separate BZ.
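To make the gap in the services check concrete, here is a simplified Python model of a lenient versus a strict check. It is a hypothetical illustration, not Katello's or Pulp's actual code: the function names and the per-service up/down map are assumptions. The lenient variant only mirrors the behavior reported in this bug (only httpd matters, so stopped Pulp services go unnoticed); the service names come from the systemctl commands used above.

```python
from typing import Dict

# Hypothetical model: service name -> True if running, False if stopped.
# Names are taken from the systemctl commands in this bug.
REQUIRED_SERVICES = ("httpd", "pulp_resource_manager", "pulp_celerybeat", "pulp_workers")


def lenient_check(state: Dict[str, bool]) -> str:
    """Models the reported problem: only the web front end (httpd) is
    considered, so stopped Pulp services still produce an OK status."""
    return "ok" if state.get("httpd", False) else "FAIL"


def strict_check(state: Dict[str, bool]) -> str:
    """Returns "ok" only if every required service is running; otherwise
    names each service that is down, in REQUIRED_SERVICES order."""
    down = [name for name in REQUIRED_SERVICES if not state.get(name, False)]
    return "ok" if not down else "FAIL: " + ", ".join(down) + " down"


if __name__ == "__main__":
    # The failed-QA scenario: Pulp services stopped, httpd still running.
    qa_state = {
        "httpd": True,
        "pulp_resource_manager": False,
        "pulp_celerybeat": False,
        "pulp_workers": False,
    }
    print(lenient_check(qa_state))  # -> ok
    print(strict_check(qa_state))   # -> FAIL: pulp_resource_manager, pulp_celerybeat, pulp_workers down
```

This is also why the restricted test scenario (stopping httpd) fails as expected under either variant, while stopping only the Pulp services is caught by the strict variant alone.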
Moving back to ON_QA with the restricted scenario above.
Opened up a new bug for the pulp services check https://bugzilla.redhat.com/show_bug.cgi?id=1375691
Verified in Satellite 6.2.2 Snap 1.1. Moving to VERIFIED with the split to a new bug. The health check correctly determines when a capsule cannot be reached. After turning off httpd, I get the following error message: "Connection refused - connect(2) for "wolverine.idmqe.lab.eng.bos.redhat.com" port 443"
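The verified behavior boils down to "can the Satellite open a connection to the capsule at all?". As a rough illustration only, here is a Python sketch of such a reachability probe, assuming a bare TCP connect to port 443 is an adequate stand-in. `capsule_reachable` is a hypothetical helper; the actual check in Katello is not reproduced here.

```python
import socket
from typing import Tuple


def capsule_reachable(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Hypothetical reachability probe: try to open a TCP connection to host:port.

    Returns (True, "ok") if the connection is accepted, otherwise (False, reason).
    With httpd stopped on the capsule nothing listens on 443, so the connect is
    refused and the reason starts with "ConnectionRefusedError".
    """
    try:
        # create_connection resolves the host and tries each address in turn;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "ok"
    except OSError as exc:  # refused, timed out, DNS failure, unreachable network, ...
        return False, f"{type(exc).__name__}: {exc} ({host} port {port})"
```

For example, `capsule_reachable("capsule.example.com")` (a placeholder host) would return a `(False, reason)` pair whose reason starts with `ConnectionRefusedError` once httpd is stopped on that capsule, mirroring the "Connection refused" message quoted above. Note that a successful TCP connect only shows that something is listening; as this bug demonstrates, it says nothing about whether the Pulp services behind httpd are running.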
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1885