Description of problem:
While patching systems using Katello Agent, errata management
tasks/actions hung for more than 14 hours (we then cancelled them).
This was caused by qpidd/qdrouterd on the Satellite server not having
capacity for all the clients we have (we need to increase max open files
and fs.aio-max-nr), but this bug is about the indefinite hang of the task.
Version-Release number of selected component (if applicable):
Red Hat Satellite (build: 6.6.0 Beta)
Version 6.6 © 2019 Red Hat Inc.
How reproducible: Always
Steps to Reproduce:
1. Log in to the Satellite operational portal
2. Navigate to Hosts -> Content Hosts -> Select multiple hosts -> Click on Select Action -> Manage Errata -> Select all the errata available -> Click on Install Selected -> via katello agent -> Done
3. To check the status of the task, go to Monitor -> Tasks -> here you will see one task named "Bulk action" which is shown as pending.
Actual results: The task is stuck; it started 14 hours ago (we tried with a group of 10 hosts: about half failed and the rest hung).
Expected results: The bulk action should not get stuck. It should finish as either a success or a failure.
Additional info: Katello Agent is installed on all the hosts.
Applying the following tunings resolved the situation/problem:
clearing the needinfo as no request was made.
We definitely have a '500' connected agents limit in 6.6 with default settings. I configured ~700 agent containers to connect to a Satellite 6.6 server and it maxes out at 500:
# qpid-stat -q --ssl-certificate=/etc/pki/pulp/qpid/client.crt -b amqps://localhost:5671 |grep pulp.agent | wc -l
# docker ps | wc -l
new connection attempts result in:
Sep 28 11:37:13 ci-vm-10-0-150-175.hosted.upshift.rdu2.redhat.com goferd: [ERROR][worker-0] gofer.messaging.adapter.connect:33 - connect: proton+amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647, failed: Connection amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647 disconnected: Condition('amqp:resource-limit-exceeded', 'local-idle-timeout expired')
the fix is to add these 2 configurations to /etc/foreman-installer/custom-hiera.yaml
run 'satellite-installer' and restart. Once applied, clients are able to connect:
Sep 28 11:58:18 ci-vm-10-0-150-175.hosted.upshift.rdu2.redhat.com goferd: [INFO][pulp.agent.70dc6424-48d7-43bf-92a0-f465df9eea89] gofer.messaging.adapter.connect:30 - connected: proton+amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647
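To confirm from a client that the errors have stopped, the resource-limit condition can be counted in the goferd log before and after the change. This is only a minimal sketch: the function name is hypothetical, and it assumes the goferd messages land in a plain-text log file such as /var/log/messages on the client.

```shell
#!/bin/sh
# Count log lines that report the AMQP resource-limit condition.
# $1: path to a log file (assumed location, e.g. /var/log/messages on the client).
count_limit_errors() {
    # grep -c prints the number of matching lines (0 when there are none);
    # "|| true" keeps a zero-match result from being treated as a failure.
    grep -c 'amqp:resource-limit-exceeded' "$1" || true
}

# Example usage on a client:
#   count_limit_errors /var/log/messages
```

A count that keeps growing after the tuning is applied would suggest the client is still being refused.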
This is covered in the 6.5 and 6.6 Tuning Guide:
as well as the Tuning Profiles documented:
Going to close this out as NOTABUG as it is documented in our tuning guides.
Note: fs.aio-max-nr is not a required tuning for 500 gofer/katello-agent clients; only the open_file_limit is.
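To check which open-file limit a running broker actually has, the soft "Max open files" value can be read from the process's limits file under /proc. This is a minimal sketch, not part of the documented procedure: the function name is hypothetical, and the usage line assumes a single qpidd process on the Satellite server.

```shell
#!/bin/sh
# Print the soft "Max open files" limit from a /proc/<pid>/limits style file.
# $1: path to the limits file.
soft_open_file_limit() {
    # Line format: "Max open files  <soft>  <hard>  files"; field 4 is the soft limit.
    awk '/^Max open files/ { print $4 }' "$1"
}

# Example usage (assumes exactly one running qpidd process):
#   soft_open_file_limit "/proc/$(pidof qpidd)/limits"
```

Comparing the value before and after running satellite-installer shows whether the new limit has taken effect for the running process.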