Hide Forgot
Description of problem: While doing certain host build/rebuild/re-registration/deletion-related tasks, I have noticed the following errors in production.log at certain times. 2021-08-24T21:15:40 [E|kat|25e729cf] event_queue_error: type=delete_host_agent_queue, object_id=21 2021-08-24T21:15:40 [E|kat|25e729cf] Connection refused - connect(2) for "localhost" port 5671 2021-08-24T21:15:40 [E|kat|25e729cf] /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `initialize' 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `new' 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `connect' 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/katello-4.1.1.11/app/lib/katello/qpid/connection.rb:18:in `on_container_start' Version-Release number of selected component (if applicable): Satellite 6.10 [ snap 13 ] How reproducible: Not always Steps to Reproduce: 1. Install a satellite 6 and set it up for system deployment. 2. Deploy some RHEL 7 hosts concurrently from Satellite server and wait for their build to get completed. 3. Perform some re-registrations, host deletion etc 4. Execute --> grep 'Connection refused - connect(2) for "localhost"' /var/log/foreman/production.log -B3 -A3 Actual results: # grep 'Connection refused - connect(2) for "localhost"' /var/log/foreman/production.log -B3 -A3 2021-08-24T18:45:23 [I|app|312ab092] ForemanWebhooks::EventSubscriber: host_updated.event.foreman event received 2021-08-24T18:45:23 [I|app|312ab092] Completed 201 Created in 1144ms (ActiveRecord: 298.3ms | Allocations: 225772) 2021-08-24T18:50:19 [E|kat|25e729cf] event_queue_error: type=delete_host_agent_queue, object_id=20 2021-08-24T18:50:19 [E|kat|25e729cf] Connection refused - connect(2) for "localhost" port 5671 2021-08-24T18:50:19 [E|kat|25e729cf] /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `initialize' 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `new' 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `connect' -- 2021-08-24T20:02:54 [I|app|f39c138b] Processing by NotificationRecipientsController#index as JSON 2021-08-24T20:02:54 [I|app|f39c138b] Completed 200 OK in 46ms (Views: 0.2ms | ActiveRecord: 2.9ms | Allocations: 17943) 2021-08-24T20:05:13 [E|kat|25e729cf] event_queue_error: type=delete_host_agent_queue, object_id=20 2021-08-24T20:05:13 [E|kat|25e729cf] Connection refused - connect(2) for "localhost" port 5671 2021-08-24T20:05:13 [E|kat|25e729cf] /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `initialize' 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `new' 25e729cf | /opt/theforeman/tfm/root/usr/share/gems/gems/qpid_proton-0.33.0/lib/core/container.rb:130:in `connect' Expected results: These should not occur at all. Additional info: * I don't have any qpidd service running and based on the Engineering team, this scenario is occurring when satellite attempts to delete qpid queue when katello agent is disabled
I was able to reproduce it *once* by (re-)registering a host via the curl Register Host feature. Just one Host was in charge, dont recall particular parameters used (few tests executed, hit just once).
IMHO deterministic reproducer: register a host (with AK, could be via WebUI Register Host), and unregister it (or forcefully re-register it). Then wait 10 minutes (this is tricky, see https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L268 for reasoning). Simply katello just has some leftovers from qpid katello event queues that need to be removed: 1) https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L130 - remove this command / line as there is no katello agent queue 2) remove whole https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L264-L270 as being dead code 3) remove whole https://github.com/Katello/katello/blob/master/app/models/katello/events/delete_host_agent_queue.rb as being dead code 4) remove https://github.com/Katello/katello/blob/master/lib/katello/engine.rb#L236 as being orphaned / not-existing action (not sure if whole https://github.com/Katello/katello/blob/master/app/services/katello/agent/dispatcher.rb and some tests referring to delete_client_queue should not be removed as well) I *think* the bug is not severe as the unregister call flow just triggers an event and continues successfully further. And processing the particular event fails (with the backtrace), but the Katello::EventQueue itself isnt affected - it still can process further events.
Created redmine issue https://projects.theforeman.org/issues/33348 from this bug
Hey Pavel! None of the code you referred to is dead yet. In new Satellite 6.10 installations the katello agent infrastructure is only _disabled_ by default. For users upgrading, it will still be enabled and we still support it. If users want, they can run the installer with a flag to enable the agent bits. In a few Satellite releases, once we have a proper replacement for katello agent, then we'll remove all of that old stuff you pointed out! This bug is still valid => when the agent infrastructure is disabled we don't need to worry about deleting the agent queue.
(In reply to Jonathon Turel from comment #7) > Hey Pavel! None of the code you referred to is dead yet. In new Satellite > 6.10 installations the katello agent infrastructure is only _disabled_ by > default. For users upgrading, it will still be enabled and we still support > it. If users want, they can run the installer with a flag to enable the > agent bits. In a few Satellite releases, once we have a proper replacement > for katello agent, then we'll remove all of that old stuff you pointed out! > > This bug is still valid => when the agent infrastructure is disabled we > don't need to worry about deleting the agent queue. Ah I see, thanks for the explanation. So the registration_manager.rb#L130 should be called conditionally when the feature is enabled, only, I guess.
So now I can see that 10mins delay and map my action with it. * Yesterday I had created a system from my sat 6.10 * Today, at "2021-08-26T16:30:11" i deleted the same system and it finished at 2021-08-26T16:30:18 * The traceback reported in BZ, popped up approx after 10 mins i.e. at "2021-08-26T16:40:13"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:4702