Bug 1732729 - [BUG] Errata Management action/tasks hang
Summary: [BUG] Errata Management action/tasks hang
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Errata Management
Version: 6.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: Unspecified
Assignee: satellite6-bugs
QA Contact: Perry Gagne
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-24 08:46 UTC by Imaan
Modified: 2019-09-28 16:12 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-28 16:12:07 UTC
Target Upstream Version:


Description Imaan 2019-07-24 08:46:42 UTC
Description of problem:

While patching systems using Katello Agent, errata management
tasks/actions hung for more than 14 hours, at which point we cancelled them.
The cause was qpidd/qdrouterd on the Satellite server not having
capacity for all the clients we have (we need to increase max open files
and fs.aio-max-nr), but this bug is about the task hanging indefinitely.

Version-Release number of selected component (if applicable):

Red Hat Satellite (build: 6.6.0 Beta)
Version 6.6 © 2019 Red Hat Inc.

How reproducible: Always

Steps to Reproduce:
 
1. Log in to the Satellite operational portal.

2. Navigate to Hosts -> Content Hosts -> Select multiple hosts -> Click on Select Action -> Manage Errata -> Select all the errata available -> Click on Install Selected -> via katello agent -> Done

3. To check the status of the task, go to Monitor -> Tasks. There you will see one task named "Bulk action" that is shown as pending.

Actual results: The task is stuck; it started 14 hours ago (we tried with a group of 10 hosts: about half failed and the rest hung).

Expected results: The bulk action should not be stuck. It should be a success or a failure.

Additional info: Katello Agent is installed on all the hosts.

Comment 7 Bryan Kearney 2019-09-05 15:33:46 UTC
Clearing the needinfo as no request was made.

Comment 8 Mike McCune 2019-09-28 16:12:07 UTC
We definitely have a limit of about 500 connected agents in 6.6 with default settings. I configured ~700 agent containers to connect to a Satellite 6.6 server and it maxes out at around 500:


# qpid-stat -q --ssl-certificate=/etc/pki/pulp/qpid/client.crt -b amqps://localhost:5671 |grep pulp.agent | wc -l
503

# docker ps | wc -l
700

New connection attempts result in:

Sep 28 11:37:13 ci-vm-10-0-150-175.hosted.upshift.rdu2.redhat.com goferd[13653]: [ERROR][worker-0] gofer.messaging.adapter.connect:33 - connect: proton+amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647, failed: Connection amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647 disconnected: Condition('amqp:resource-limit-exceeded', 'local-idle-timeout expired')

The fix is to add these two settings to /etc/foreman-installer/custom-hiera.yaml:

qpid::open_file_limit: 65536
qpid::router::open_file_limit: 150100
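
A minimal sketch of adding the two keys above without duplicating them on repeated runs. The helper name set_hiera_key is hypothetical and not part of Satellite; it only appends a "key: value" line when no line in the file already starts with that key, and it does not change an existing value.

```shell
# set_hiera_key FILE KEY VALUE
# Hypothetical helper (not shipped with Satellite): append "KEY: VALUE"
# to FILE unless a line starting with "KEY:" is already present.
set_hiera_key() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}:" "$file" 2>/dev/null; then
        # Key already configured; leave the existing value untouched.
        echo "already set in ${file}: ${key}" >&2
    else
        printf '%s: %s\n' "$key" "$value" >> "$file"
    fi
}
```

On the Satellite server this could be called as root, for example `set_hiera_key /etc/foreman-installer/custom-hiera.yaml qpid::open_file_limit 65536`, and again for `qpid::router::open_file_limit 150100`.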

Run 'satellite-installer' and restart. Once applied, clients are able to connect:

Sep 28 11:58:18 ci-vm-10-0-150-175.hosted.upshift.rdu2.redhat.com goferd[13653]: [INFO][pulp.agent.70dc6424-48d7-43bf-92a0-f465df9eea89] gofer.messaging.adapter.connect:30 - connected: proton+amqps://sat-r220-09.lab.eng.rdu2.redhat.com:5647
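
One way to confirm that the running brokers picked up the new limits is to read the soft "Max open files" value from /proc/<pid>/limits. This is a sketch under assumptions: soft_nofile_limit is a hypothetical helper, and it assumes the open_file_limit settings end up as the process's open-file limit on a Linux host.

```shell
# soft_nofile_limit LIMITS_FILE
# Hypothetical helper: print the soft "Max open files" value from a file
# in /proc/<pid>/limits format (4th whitespace-separated field of that row).
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}
```

For example, `soft_nofile_limit "/proc/$(pidof qpidd)/limits"` and the same for qdrouterd, assuming each runs as a single process so that pidof returns one PID.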

This is covered in the 6.5 and 6.6 Tuning Guide:

https://access.redhat.com/solutions/4224211

as well as in the documented tuning profiles:

https://github.com/RedHatSatellite/satellite-support/tree/master/tuning-profiles

Closing this as NOTABUG since it is documented in our tuning guides.

Comment 9 Mike McCune 2019-09-28 16:12:49 UTC
Note: fs.aio-max-nr is not required tuning for 500 gofer/katello-agent clients, just the open_file_limit.
