Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Description of problem:
On a customer data (internally reproduced), candlepin stops responding on requests. The cause is hung ExpiredPoolsJob after logging:
Updating stacked entitlements for 1661 consumers...
usually followed by:
- a postgres connection leak ("remaining connection slots are reserved for non-replication superuser connections")
- a postgres deadlock
followed 5-20 minutes after the Job kick off.
Version-Release number of selected component (if applicable):
Sat6.5 / candlepin-2.5.17-1.el7sat.noarch
How reproducible:
100% on the customer data
Steps to Reproduce:
1. Have the customer candlepin's DB (reproducer to be provided)
2. Add to candlepin.conf line like:
pinsetter.org.candlepin.pinsetter.tasks.ExpiredPoolsJob.schedule=0 40 * * * ? *
where the "40" is the minute when ExpiredPoolsJob should be kicked off (so one does not need to wait for full hour to default kick off)
3. Follow https://access.redhat.com/solutions/2955931 just to enable curl requests to check candlepin responsiveness (maybe "hammer ping" is sufficient)
4. Monitor /var/log/candlepin/candlepin.log for "ExpiredPoolsJob"
5. Sometimes check candlepin responds to requests like:
curl -u admin:admin -ks https://127.0.0.1:8443/candlepin/consumers/e26929e7-5249-47ce-b5c2-035b05c2456f
6. (havent seen on reproducer yet) monitor postgres logs for a deadlock and/or no free connections log entry
7. monitor CPU usage of tomcat process
Actual results:
4. ExpiredPoolsJob is started but never terminates, like:
2020-01-20 16:40:00,065 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.ExpiredPoolsJob
2020-01-20 16:40:00,203 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Attempting to delete 10 pools...
2020-01-20 16:40:00,203 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Fetching related pools and entitlements...
2020-01-20 16:40:08,960 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Locked 11 pools for deletion...
2020-01-20 16:42:13,429 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Revoking 11271 entitlements...
2020-01-20 16:42:54,000 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Entitlements successfully revoked
2020-01-20 16:42:54,000 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Deleting 11 pools...
2020-01-20 16:42:56,258 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Pools successfully deleted
2020-01-20 16:42:56,258 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Updating entitlement counts on remaining, affected pools...
2020-01-20 16:43:04,344 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Entitlement counts successfully updated for 0 pools and 11242 consumers
2020-01-20 16:43:04,601 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Updating stacked entitlements for 1661 consumers...
(and no "Job completed:")
Sometimes (like on the reproducer just recently), the job can fail with:
2020-01-20 17:16:17,304 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=2177239
2020-01-20 17:16:17,305 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] ERROR org.quartz.core.JobRunShell - Job cron group.ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8 threw an unhandled Exception:
java.lang.NullPointerException: null
at org.candlepin.policy.SystemPurposeComplianceRules.getStatus(SystemPurposeComplianceRules.java:143)
at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2389)
at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:62)
at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2143)
at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPoolsImpl(CandlepinPoolManager.java:466)
at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:70)
at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPools(CandlepinPoolManager.java:435)
at org.candlepin.pinsetter.tasks.ExpiredPoolsJob.toExecute(ExpiredPoolsJob.java:49)
at org.candlepin.pinsetter.tasks.KingpinJob.execute(KingpinJob.java:115)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
2020-01-20 17:16:17,306 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] ERROR org.quartz.core.ErrorLogger - Job (cron group.ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8 threw an exception.
org.quartz.SchedulerException: Job threw an unhandled exception.
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: java.lang.NullPointerException: null
at org.candlepin.policy.SystemPurposeComplianceRules.getStatus(SystemPurposeComplianceRules.java:143)
at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2389)
at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:62)
at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2143)
at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPoolsImpl(CandlepinPoolManager.java:466)
at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:70)
at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPools(CandlepinPoolManager.java:435)
at org.candlepin.pinsetter.tasks.ExpiredPoolsJob.toExecute(ExpiredPoolsJob.java:49)
at org.candlepin.pinsetter.tasks.KingpinJob.execute(KingpinJob.java:115)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
... 1 common frames omitted
2020-01-20 17:16:17,308 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] ERROR org.candlepin.pinsetter.core.PinsetterJobListener - Job [ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8] failed.
org.quartz.JobExecutionException: org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.lang.NullPointerException]
at org.quartz.core.JobRunShell.run(JobRunShell.java:218)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: org.quartz.SchedulerException: Job threw an unhandled exception.
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
... 1 common frames omitted
Caused by: java.lang.NullPointerException: null
at org.candlepin.policy.SystemPurposeComplianceRules.getStatus(SystemPurposeComplianceRules.java:143)
at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2389)
at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:62)
at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2143)
at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPoolsImpl(CandlepinPoolManager.java:466)
at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:70)
at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPools(CandlepinPoolManager.java:435)
at org.candlepin.pinsetter.tasks.ExpiredPoolsJob.toExecute(ExpiredPoolsJob.java:49)
at org.candlepin.pinsetter.tasks.KingpinJob.execute(KingpinJob.java:115)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
... 1 common frames omitted
5. the curl command hangs after a time (20-ish minutes after Job started, I expect - havent seen on my reproducer, just at customer)
6. either no below connection error or deadlock is logged (this probably requires some specific load / requests fired, havent seen on my reproducer, just sometimes in customer logs):
2020-01-16 23:06:00 CST FATAL: remaining connection slots are reserved for non-replication superuser connections
2020-01-17 21:02:48 CST ERROR: deadlock detected
2020-01-17 21:02:48 CST DETAIL: Process 177429 waits for ShareLock on transaction 1842161269; blocked by process 217155.
Process 217155 waits for ShareLock on transaction 1842136013; blocked by process 177429.
Process 177429: update cp_consumer set created=$1, updated=$2, annotations=$3, autoheal=$4, complianceStatusHash=$5, cont_acc_cert_id=$6, content_access_mode=$7, entitlement_count=$8, entitlementStatus=$9, environment_id=$10, consumer_idcert_id=$11, keyPair_id=$12, lastCheckin=$13, name=$14, owner_id=$15, releaseVer=$16, sp_role=$17, serviceLevel=$18, sp_status=$19, sp_status_hash=$20, type_id=$21, sp_usage=$22, username=$23, uuid=$24 where id=$25
Process 217155: delete from cp_entitlement where id=$1
7. CPU slowly grows over time
Expected results:
4. Job completes in a reasonable time
5. No requests hung
6. No postgres connection leak or deadlock
7. no CPU growth
Additional info:
After discussion within Engineering we are going to make a change to Satellite so the ExpiredPoolsJob only runs 1x every 24 hours. That will be included in this BZ.
Verified in 6.7 snap 14
[root@dhcp-2-119 ~]# grep Expired /etc/candlepin/candlepin.conf
# Quartz schedule notation for how often to run the ExpiredPoolsJob
pinsetter.org.candlepin.pinsetter.tasks.ExpiredPoolsJob.schedule=0 0 0 * * ?
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:1454
Description of problem: On a customer data (internally reproduced), candlepin stops responding on requests. The cause is hung ExpiredPoolsJob after logging: Updating stacked entitlements for 1661 consumers... usually followed by: - a postgres connection leak ("remaining connection slots are reserved for non-replication superuser connections") - a postgres deadlock followed 5-20 minutes after the Job kick off. Version-Release number of selected component (if applicable): Sat6.5 / candlepin-2.5.17-1.el7sat.noarch How reproducible: 100% on the customer data Steps to Reproduce: 1. Have the customer candlepin's DB (reproducer to be provided) 2. Add to candlepin.conf line like: pinsetter.org.candlepin.pinsetter.tasks.ExpiredPoolsJob.schedule=0 40 * * * ? * where the "40" is the minute when ExpiredPoolsJob should be kicked off (so one does not need to wait for full hour to default kick off) 3. Follow https://access.redhat.com/solutions/2955931 just to enable curl requests to check candlepin responsiveness (maybe "hammer ping" is sufficient) 4. Monitor /var/log/candlepin/candlepin.log for "ExpiredPoolsJob" 5. Sometimes check candlepin responds to requests like: curl -u admin:admin -ks https://127.0.0.1:8443/candlepin/consumers/e26929e7-5249-47ce-b5c2-035b05c2456f 6. (havent seen on reproducer yet) monitor postgres logs for a deadlock and/or no free connections log entry 7. monitor CPU usage of tomcat process Actual results: 4. ExpiredPoolsJob is started but never terminates, like: 2020-01-20 16:40:00,065 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.ExpiredPoolsJob 2020-01-20 16:40:00,203 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Attempting to delete 10 pools... 2020-01-20 16:40:00,203 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Fetching related pools and entitlements... 2020-01-20 16:40:08,960 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Locked 11 pools for deletion... 2020-01-20 16:42:13,429 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Revoking 11271 entitlements... 2020-01-20 16:42:54,000 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Entitlements successfully revoked 2020-01-20 16:42:54,000 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Deleting 11 pools... 2020-01-20 16:42:56,258 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Pools successfully deleted 2020-01-20 16:42:56,258 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Updating entitlement counts on remaining, affected pools... 2020-01-20 16:43:04,344 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Entitlement counts successfully updated for 0 pools and 11242 consumers 2020-01-20 16:43:04,601 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.controller.CandlepinPoolManager - Updating stacked entitlements for 1661 consumers... (and no "Job completed:") Sometimes (like on the reproducer just recently), the job can fail with: 2020-01-20 17:16:17,304 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] INFO org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=2177239 2020-01-20 17:16:17,305 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] ERROR org.quartz.core.JobRunShell - Job cron group.ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8 threw an unhandled Exception: java.lang.NullPointerException: null at org.candlepin.policy.SystemPurposeComplianceRules.getStatus(SystemPurposeComplianceRules.java:143) at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2389) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:62) at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2143) at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPoolsImpl(CandlepinPoolManager.java:466) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:70) at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPools(CandlepinPoolManager.java:435) at org.candlepin.pinsetter.tasks.ExpiredPoolsJob.toExecute(ExpiredPoolsJob.java:49) at org.candlepin.pinsetter.tasks.KingpinJob.execute(KingpinJob.java:115) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) 2020-01-20 17:16:17,306 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] ERROR org.quartz.core.ErrorLogger - Job (cron group.ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8 threw an exception. org.quartz.SchedulerException: Job threw an unhandled exception. at org.quartz.core.JobRunShell.run(JobRunShell.java:213) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) Caused by: java.lang.NullPointerException: null at org.candlepin.policy.SystemPurposeComplianceRules.getStatus(SystemPurposeComplianceRules.java:143) at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2389) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:62) at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2143) at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPoolsImpl(CandlepinPoolManager.java:466) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:70) at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPools(CandlepinPoolManager.java:435) at org.candlepin.pinsetter.tasks.ExpiredPoolsJob.toExecute(ExpiredPoolsJob.java:49) at org.candlepin.pinsetter.tasks.KingpinJob.execute(KingpinJob.java:115) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ... 1 common frames omitted 2020-01-20 17:16:17,308 [thread=QuartzScheduler_Worker-7] [job=ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8, org=, csid=] ERROR org.candlepin.pinsetter.core.PinsetterJobListener - Job [ExpiredPoolsJob-0406c674-ac1c-4d09-b27b-4ccbc67da3e8] failed. org.quartz.JobExecutionException: org.quartz.SchedulerException: Job threw an unhandled exception. [See nested exception: java.lang.NullPointerException] at org.quartz.core.JobRunShell.run(JobRunShell.java:218) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) Caused by: org.quartz.SchedulerException: Job threw an unhandled exception. at org.quartz.core.JobRunShell.run(JobRunShell.java:213) ... 1 common frames omitted Caused by: java.lang.NullPointerException: null at org.candlepin.policy.SystemPurposeComplianceRules.getStatus(SystemPurposeComplianceRules.java:143) at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2389) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:62) at org.candlepin.controller.CandlepinPoolManager.deletePools(CandlepinPoolManager.java:2143) at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPoolsImpl(CandlepinPoolManager.java:466) at com.google.inject.persist.jpa.JpaLocalTxnInterceptor.invoke(JpaLocalTxnInterceptor.java:70) at org.candlepin.controller.CandlepinPoolManager.cleanupExpiredPools(CandlepinPoolManager.java:435) at org.candlepin.pinsetter.tasks.ExpiredPoolsJob.toExecute(ExpiredPoolsJob.java:49) at org.candlepin.pinsetter.tasks.KingpinJob.execute(KingpinJob.java:115) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) ... 1 common frames omitted 5. the curl command hangs after a time (20-ish minutes after Job started, I expect - havent seen on my reproducer, just at customer) 6. either no below connection error or deadlock is logged (this probably requires some specific load / requests fired, havent seen on my reproducer, just sometimes in customer logs): 2020-01-16 23:06:00 CST FATAL: remaining connection slots are reserved for non-replication superuser connections 2020-01-17 21:02:48 CST ERROR: deadlock detected 2020-01-17 21:02:48 CST DETAIL: Process 177429 waits for ShareLock on transaction 1842161269; blocked by process 217155. Process 217155 waits for ShareLock on transaction 1842136013; blocked by process 177429. Process 177429: update cp_consumer set created=$1, updated=$2, annotations=$3, autoheal=$4, complianceStatusHash=$5, cont_acc_cert_id=$6, content_access_mode=$7, entitlement_count=$8, entitlementStatus=$9, environment_id=$10, consumer_idcert_id=$11, keyPair_id=$12, lastCheckin=$13, name=$14, owner_id=$15, releaseVer=$16, sp_role=$17, serviceLevel=$18, sp_status=$19, sp_status_hash=$20, type_id=$21, sp_usage=$22, username=$23, uuid=$24 where id=$25 Process 217155: delete from cp_entitlement where id=$1 7. CPU slowly grows over time Expected results: 4. Job completes in a reasonable time 5. No requests hung 6. No postgres connection leak or deadlock 7. no CPU growth Additional info: