
Bug 1732729

Summary: [BUG] Errata Management action/tasks hang
Product: Red Hat Satellite
Reporter: Imaan <ikaur>
Component: Errata Management
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED NOTABUG
QA Contact: Perry Gagne <pgagne>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.6.0
CC: bbuckingham, bkearney, mmccune
Target Milestone: Unspecified
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-28 16:12:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Imaan 2019-07-24 08:46:42 UTC
Description of problem:

While patching systems using Katello Agent, errata management
tasks/actions hung for more than 14 hours (we then cancelled them).
This was caused by qpidd/qdrouterd on the Satellite server not having
capacity for all the clients we have (we need to increase max open files
and fs.aio-max-nr), but this bug is about the indefinite hang of the task.

Version-Release number of selected component (if applicable):

Red Hat Satellite (build: 6.6.0 Beta)
Version 6.6 © 2019 Red Hat Inc.

How reproducible: Always

Steps to Reproduce:
 
1. Log in to the Satellite operational portal

2. Navigate to Hosts -> Content Hosts -> Select multiple hosts -> Click on Select Action -> Manage Errata -> Select all the errata available -> Click on Install Selected -> via Katello Agent -> Done

3. To check the status of the task, go to Monitor -> Tasks. One task named "Bulk action" is shown there as pending.

Actual results: The task is stuck; it started 14 hours ago (we tried with a group of 10 hosts: about half failed and the rest hung).

Expected results: The bulk action should not be stuck. It should be a success or a failure.

Additional info: Katello Agent is installed on all the hosts.

Comment 7 Bryan Kearney 2019-09-05 15:33:46 UTC
clearing the needinfo as no request was made.

Comment 8 Mike McCune 2019-09-28 16:12:07 UTC
We definitely have a '500' connected agents limit in 6.6 with default settings. I configured ~700 agent containers to connect to a Satellite 6.6 server and it maxes out at 500:


# qpid-stat -q --ssl-certificate=/etc/pki/pulp/qpid/client.crt -b amqps://localhost:5671 |grep pulp.agent | wc -l
503

# docker ps | wc -l
700

new connection attempts result in:

Sep 28 11:37:13 ci-vm-10-0-150-175.hosted.upshift.rdu2.redhat.com goferd[13653]: [ERROR][worker-0] gofer.messaging.adapter.connect:33 - connect: proton+amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647, failed: Connection amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647 disconnected: Condition('amqp:resource-limit-exceeded', 'local-idle-timeout expired')
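
On a client, one way to see whether goferd is running into this limit is to count the matching errors in its log. This is a minimal sketch, assuming goferd logs to the systemd journal under a unit named goferd; the function name count_limit_errors is hypothetical and not part of any Satellite tooling.

```shell
#!/bin/bash
# Hypothetical helper: count log lines on stdin that report the
# AMQP resource-limit condition seen in the goferd error above.
count_limit_errors() {
    # grep -c exits non-zero when there are no matches but still
    # prints 0, so the exit status is masked here.
    grep -c 'amqp:resource-limit-exceeded' || true
}

# Usage on a client (assumes a systemd unit named goferd):
#   journalctl -u goferd --since today | count_limit_errors
```

A non-zero count while new clients fail to connect points at the broker-side file limit rather than at the client.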

the fix is to add these 2 configurations to /etc/foreman-installer/custom-hiera.yaml

qpid::open_file_limit: 65536
qpid::router::open_file_limit: 150100

run 'satellite-installer' and restart. Once applied, clients are able to connect:

Sep 28 11:58:18 ci-vm-10-0-150-175.hosted.upshift.rdu2.redhat.com goferd[13653]: [INFO][pulp.agent.70dc6424-48d7-43bf-92a0-f465df9eea89] gofer.messaging.adapter.connect:30 - connected: proton+amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647
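
To confirm that the raised limits are in effect for the running daemons, the limits can be read from /proc on the Satellite server. This is a minimal sketch, assuming qpidd and qdrouterd each run as a process that pidof can find and that it is run as root; the helper name max_open_files is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the soft "Max open files" limit from
# text in /proc/<pid>/limits format read on stdin.
max_open_files() {
    # Columns are: "Max" "open" "files" <soft> <hard> <units>
    awk '/^Max open files/ {print $4}'
}

# Report the limit for each messaging daemon that is running.
for svc in qpidd qdrouterd; do
    pid=$(pidof -s "$svc" 2>/dev/null) || continue
    printf '%s: %s\n' "$svc" "$(max_open_files < "/proc/$pid/limits")"
done
```

If the values still show the old defaults after running satellite-installer, the services have most likely not been restarted yet.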

This is covered in the 6.5 and 6.6 Tuning Guide:

https://access.redhat.com/solutions/4224211

as well as the Tuning Profiles documented:

https://github.com/RedHatSatellite/satellite-support/tree/master/tuning-profiles

Going to close this out as NOTABUG as it is documented in our tuning guides

Comment 9 Mike McCune 2019-09-28 16:12:49 UTC
note, fs.aio-max-nr is not required tuning for 500 gofer/katello-agent clients, just the open_file_limit.
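
For anyone who wants to check how close a server is to fs.aio-max-nr before deciding whether to tune it, the kernel exposes both counters under /proc/sys/fs. This is a minimal sketch; the helper name aio_usage_percent is hypothetical, and no threshold is implied by this bug.

```shell
#!/bin/bash
# Hypothetical helper: given the current and maximum AIO request
# counts, print the usage as a whole-number percentage.
aio_usage_percent() {
    local used=$1 max=$2
    echo $(( used * 100 / max ))
}

# On Linux the current and maximum values are readable from /proc.
if [ -r /proc/sys/fs/aio-nr ] && [ -r /proc/sys/fs/aio-max-nr ]; then
    used=$(cat /proc/sys/fs/aio-nr)
    max=$(cat /proc/sys/fs/aio-max-nr)
    echo "aio usage: $(aio_usage_percent "$used" "$max")% ($used of $max)"
fi
```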