Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1997225 - Noticed "event_queue_error: type=delete_host_agent_queue, object_id=XX" error logging during concurrent host build/rebuild/re-registration/deletion in Satellite 6.10
Summary: Noticed "event_queue_error: type=delete_host_agent_queue, object_id=XX" error...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Hosts - Content
Version: 6.10.0
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: 6.10.0
Assignee: Lucy Fu
QA Contact: Stephen Wadeley
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-08-24 16:19 UTC by Sayan Das
Modified: 2021-11-16 14:13 UTC (History)
4 users (show)

Fixed In Version: tfm-rubygem-katello-4.1.1.22-1,tfm-rubygem-katello-4.1.1.28-1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-16 14:13:28 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Foreman Issue Tracker 33348 0 None None None 2021-08-25 15:47:55 UTC
Red Hat Product Errata RHSA-2021:4702 0 None None None 2021-11-16 14:13:35 UTC

Description Sayan Das 2021-08-24 16:19:48 UTC
Description of problem:

While doing certain host build/rebuild/re-registration/deletion-related tasks, I have noticed the following errors in production.log at certain times.

2021-08-24T21:15:40 [E|kat|25e729cf] event_queue_error: type=delete_host_agent_queue, object_id=21
2021-08-24T21:15:40 [E|kat|25e729cf] Connection refused - connect(2) for "localhost" port 5671
2021-08-24T21:15:40 [E|kat|25e729cf] /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `initialize'
 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `new'
 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `connect'
 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-4.1.1.11/app/lib/katello/qpid/connection.rb:18:in `on_container_start'


Version-Release number of selected component (if applicable):

Satellite 6.10 [ snap 13 ]


How reproducible:

Not always


Steps to Reproduce:
1. Install a satellite 6 and set it up for system deployment.
2. Deploy some RHEL 7 hosts concurrently from Satellite server and wait for their build to get completed.
3. Perform some re-registrations, host deletion etc
4. Execute --> grep 'Connection refused - connect(2) for "localhost"' /var/log/foreman/production.log -B3 -A3 


Actual results:

# grep 'Connection refused - connect(2) for "localhost"' /var/log/foreman/production.log -B3 -A3
2021-08-24T18:45:23 [I|app|312ab092] ForemanWebhooks::EventSubscriber: host_updated.event.foreman event received
2021-08-24T18:45:23 [I|app|312ab092] Completed 201 Created in 1144ms (ActiveRecord: 298.3ms | Allocations: 225772)
2021-08-24T18:50:19 [E|kat|25e729cf] event_queue_error: type=delete_host_agent_queue, object_id=20
2021-08-24T18:50:19 [E|kat|25e729cf] Connection refused - connect(2) for "localhost" port 5671
2021-08-24T18:50:19 [E|kat|25e729cf] /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `initialize'
 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `new'
 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `connect'
--
2021-08-24T20:02:54 [I|app|f39c138b] Processing by NotificationRecipientsController#index as JSON
2021-08-24T20:02:54 [I|app|f39c138b] Completed 200 OK in 46ms (Views: 0.2ms | ActiveRecord: 2.9ms | Allocations: 17943)
2021-08-24T20:05:13 [E|kat|25e729cf] event_queue_error: type=delete_host_agent_queue, object_id=20
2021-08-24T20:05:13 [E|kat|25e729cf] Connection refused - connect(2) for "localhost" port 5671
2021-08-24T20:05:13 [E|kat|25e729cf] /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `initialize'
 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `new'
 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `connect'



Expected results:

These should not occur at all.


Additional info:

* I don't have any qpidd service running and based on the Engineering team, this scenario is occurring when satellite attempts to delete qpid queue when katello agent is disabled

Comment 4 Pavel Moravec 2021-08-25 07:21:33 UTC
I was able to reproduce it *once* by (re-)registering a host via the curl Register Host feature. Just one Host was in charge, dont recall particular parameters used (few tests executed, hit just once).

Comment 5 Pavel Moravec 2021-08-25 08:01:21 UTC
IMHO deterministic reproducer: register a host (with AK, could be via WebUI Register Host), and unregister it (or forcefully re-register it).

Then wait 10 minutes (this is tricky, see https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L268 for reasoning).

Simply katello just has some leftovers from qpid katello event queues that need to be removed:

1) https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L130
- remove this command / line as there is no katello agent queue

2) remove whole https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L264-L270 as being dead code

3) remove whole https://github.com/Katello/katello/blob/master/app/models/katello/events/delete_host_agent_queue.rb as being dead code

4) remove https://github.com/Katello/katello/blob/master/lib/katello/engine.rb#L236 as being orphaned / not-existing action

(not sure if whole https://github.com/Katello/katello/blob/master/app/services/katello/agent/dispatcher.rb and some tests referring to delete_client_queue should not be removed as well)


I *think* the bug is not severe as the unregister call flow just triggers an event and continues successfully further. And processing the particular event fails (with the backtrace), but the  Katello::EventQueue itself isnt affected - it still can process further events.

Comment 6 Lucy Fu 2021-08-25 15:47:54 UTC
Created redmine issue https://projects.theforeman.org/issues/33348 from this bug

Comment 7 Jonathon Turel 2021-08-25 20:08:32 UTC
Hey Pavel! None of the code you referred to is dead yet. In new Satellite 6.10 installations the katello agent infrastructure is only _disabled_ by default. For users upgrading, it will still be enabled and we still support it. If users want, they can run the installer with a flag to enable the agent bits. In a few Satellite releases, once we have a proper replacement for katello agent, then we'll remove all of that old stuff you pointed out!

This bug is still valid => when the agent infrastructure is disabled we don't need to worry about deleting the agent queue.

Comment 8 Pavel Moravec 2021-08-25 20:50:26 UTC
(In reply to Jonathon Turel from comment #7)
> Hey Pavel! None of the code you referred to is dead yet. In new Satellite
> 6.10 installations the katello agent infrastructure is only _disabled_ by
> default. For users upgrading, it will still be enabled and we still support
> it. If users want, they can run the installer with a flag to enable the
> agent bits. In a few Satellite releases, once we have a proper replacement
> for katello agent, then we'll remove all of that old stuff you pointed out!
> 
> This bug is still valid => when the agent infrastructure is disabled we
> don't need to worry about deleting the agent queue.

Ah I see, thanks for the explanation. So the registration_manager.rb#L130 should be called conditionally when the feature is enabled, only, I guess.

Comment 9 Sayan Das 2021-08-26 11:18:13 UTC
So now I can see that 10mins delay and map my action with it.

* Yesterday I had created a system from my sat 6.10

* Today, at "2021-08-26T16:30:11" i deleted the same system and it finished at 2021-08-26T16:30:18

* The traceback reported in BZ, popped up approx after 10 mins i.e. at "2021-08-26T16:40:13"

Comment 15 errata-xmlrpc 2021-11-16 14:13:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702


Note You need to log in before you can comment on or make changes to this bug.