Bug 1997225
Summary: | Noticed "event_queue_error: type=delete_host_agent_queue, object_id=XX" error logging during concurrent host build/rebuild/re-registration/deletion in Satellite 6.10 | ||
---|---|---|---|
Product: | Red Hat Satellite | Reporter: | Sayan Das <saydas> |
Component: | Hosts - Content | Assignee: | Lucy Fu <lufu> |
Status: | CLOSED ERRATA | QA Contact: | Stephen Wadeley <swadeley> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.10.0 | CC: | inecas, jturel, pmoravec, zhunting |
Target Milestone: | 6.10.0 | Keywords: | Triaged |
Target Release: | Unused | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | tfm-rubygem-katello-4.1.1.22-1,tfm-rubygem-katello-4.1.1.28-1 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-11-16 14:13:28 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Sayan Das
2021-08-24 16:19:48 UTC
I was able to reproduce it *once* by (re-)registering a host via the curl-based Register Host feature. Only one host was involved, and I don't recall the exact parameters used (I ran a few tests and hit the error just once).

IMHO, a deterministic reproducer is: register a host (with an activation key, e.g. via WebUI Register Host), then unregister it (or forcefully re-register it). Then wait 10 minutes (this is tricky; see https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L268 for the reasoning).

Katello simply has some leftovers from the qpid katello event queues that need to be removed:
1) https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L130 - remove this command/line, as there is no katello agent queue
2) remove the whole of https://github.com/Katello/katello/blob/master/app/services/katello/registration_manager.rb#L264-L270 as dead code
3) remove the whole of https://github.com/Katello/katello/blob/master/app/models/katello/events/delete_host_agent_queue.rb as dead code
4) remove https://github.com/Katello/katello/blob/master/lib/katello/engine.rb#L236 as an orphaned/non-existent action

(I'm not sure whether the whole of https://github.com/Katello/katello/blob/master/app/services/katello/agent/dispatcher.rb and some tests referring to delete_client_queue should be removed as well.)

I *think* the bug is not severe, as the unregister call flow just triggers an event and continues successfully. Processing that particular event fails (with the backtrace), but the Katello::EventQueue itself isn't affected: it can still process further events.

Created Redmine issue https://projects.theforeman.org/issues/33348 from this bug.

Hey Pavel! None of the code you referred to is dead yet. In new Satellite 6.10 installations the katello agent infrastructure is only _disabled_ by default. For users upgrading, it will still be enabled and we still support it. If users want, they can run the installer with a flag to enable the agent bits. In a few Satellite releases, once we have a proper replacement for katello agent, then we'll remove all of that old stuff you pointed out!

This bug is still valid => when the agent infrastructure is disabled, we don't need to worry about deleting the agent queue.

(In reply to Jonathon Turel from comment #7)
> Hey Pavel! None of the code you referred to is dead yet. In new Satellite
> 6.10 installations the katello agent infrastructure is only _disabled_ by
> default. For users upgrading, it will still be enabled and we still support
> it. If users want, they can run the installer with a flag to enable the
> agent bits. In a few Satellite releases, once we have a proper replacement
> for katello agent, then we'll remove all of that old stuff you pointed out!
>
> This bug is still valid => when the agent infrastructure is disabled we
> don't need to worry about deleting the agent queue.

Ah, I see, thanks for the explanation. So the command at registration_manager.rb#L130 should be called conditionally, only when the feature is enabled, I guess.

So now I can see that 10-minute delay and map my action to it:
* Yesterday I created a system on my Satellite 6.10.
* Today, at 2021-08-26T16:30:11, I deleted the same system; the deletion finished at 2021-08-26T16:30:18.
* The traceback reported in this BZ popped up approximately 10 minutes later, at 2021-08-26T16:40:13.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702
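The behaviour discussed in this thread (a cleanup event processed roughly 10 minutes after the host is deleted, failing when the katello-agent infrastructure is disabled, without stopping the event queue) can be modelled with a small Python sketch. It is a hypothetical illustration only: `EventQueue`, `make_delete_agent_queue`, `unregister_host`, `CLEANUP_DELAY` and the fixed 10-minute delay are assumptions made for this example, not Katello's Ruby API, and the fix is shown as a simple `agent_enabled` guard in the spirit of the "call it conditionally" comment, not as the actual patch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Assumed deferral before the cleanup event is processed, based on the
# ~10-minute gap reported in the comments (not read from Katello's code).
CLEANUP_DELAY = timedelta(minutes=10)


@dataclass
class Event:
    event_type: str
    object_id: int
    process_after: datetime


@dataclass
class EventQueue:
    handlers: Dict[str, Callable[[int], None]]
    pending: List[Event] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def push(self, event_type: str, object_id: int, now: datetime,
             delay: timedelta = timedelta(0)) -> None:
        self.pending.append(Event(event_type, object_id, now + delay))

    def run_due(self, now: datetime) -> int:
        """Process events whose delay has elapsed; return how many succeeded.

        A failing handler is recorded and skipped, so it does not stop the
        queue from processing the remaining events.
        """
        due = [e for e in self.pending if e.process_after <= now]
        self.pending = [e for e in self.pending if e.process_after > now]
        succeeded = 0
        for event in due:
            try:
                self.handlers[event.event_type](event.object_id)
                succeeded += 1
            except Exception as exc:
                self.errors.append(
                    f"event_queue_error: type={event.event_type}, "
                    f"object_id={event.object_id} ({exc})")
        return succeeded


def make_delete_agent_queue(agent_enabled: bool) -> Callable[[int], None]:
    """Hypothetical handler that fails when there is no agent queue."""
    def delete_host_agent_queue(host_id: int) -> None:
        if not agent_enabled:
            raise RuntimeError(f"no agent queue for host {host_id}")
    return delete_host_agent_queue


def unregister_host(queue: EventQueue, host_id: int, now: datetime,
                    agent_enabled: bool) -> None:
    """Guarded flow: queue agent-queue cleanup only if the agent is enabled."""
    if agent_enabled:
        queue.push("delete_host_agent_queue", host_id, now, CLEANUP_DELAY)


# Timestamps taken from the comments above.
deleted_at = datetime.fromisoformat("2021-08-26T16:30:11")
traceback_at = datetime.fromisoformat("2021-08-26T16:40:13")
print(traceback_at - deleted_at)  # 0:10:02

handlers = {"delete_host_agent_queue": make_delete_agent_queue(False),
            "other_event": lambda host_id: None}

# Unguarded: cleanup queued although the agent infrastructure is disabled.
old = EventQueue(handlers)
old.push("delete_host_agent_queue", 42, deleted_at, CLEANUP_DELAY)
old.push("other_event", 43, deleted_at, CLEANUP_DELAY)
early = old.run_due(deleted_at + timedelta(minutes=5))  # nothing due yet
late = old.run_due(traceback_at)  # cleanup fails, other_event still runs
print(early, late, old.errors)

# Guarded: with the agent disabled, no cleanup event is queued at all.
new = EventQueue(handlers)
unregister_host(new, 42, deleted_at, agent_enabled=False)
print(len(new.pending))
```

In this model the guard sits where the cleanup event is created, so no doomed event is ever queued when the agent is disabled; the per-event `try`/`except` in `run_due` mirrors the observation that a failing event did not stop the queue from processing further events.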