Bug 2249736

Summary: "Actions::Katello::Applicability::Hosts::BulkGenerate" tasks are processed in the default queue instead of hosts_queue causing congestion.
Product: Red Hat Satellite
Reporter: Hao Chang Yu <hyu>
Component: Hosts - Content
Assignee: Hao Chang Yu <hyu>
Status: CLOSED ERRATA
QA Contact: visawant
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.13.6
CC: aruzicka, iballou, rlavi, visawant, zhunting
Target Milestone: 6.15.0
Keywords: Patch, Performance, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-04-23 17:15:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Hao Chang Yu 2023-11-15 04:38:01 UTC
Description of problem:
"Actions::Katello::Applicability::Hosts::BulkGenerate" tasks can cause congestion in the Dynflow "default" queue, especially with this performance bug 2203077. The tasks are supposed to process by the "hosts_queue" worker but they go to the default queue instead.

This slows down other tasks that need to be processed by the default queue worker. For example, a Content View publish task hangs while waiting for its task status to be polled from Pulp.
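
For context, here is a minimal sketch (not the actual Katello source) of how a Dynflow action pins its steps to a named queue; BulkGenerate is assumed to do something similar with the hosts queue. If Dynflow does not know that the queue exists, the steps fall back to the default queue.
~~~
# Sketch only -- assumed shape of the action, not the real implementation.
class BulkGenerate < Dynflow::Action
  # Ask Dynflow to route this action's steps to the hosts queue.
  def queue
    ::Katello::HOST_TASKS_QUEUE
  end

  def run
    # regenerate errata/package applicability for the requested host ids ...
  end
end
~~~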

How reproducible:
Easy

Steps to Reproduce:
1. Go to Web UI -> Hosts -> Content Hosts -> click the host -> Content -> Errata -> click "recalculate"
2. Then on the Satellite command line, run the following command

watch -n 0.1 systemctl status dynflow-sidekiq

Actual results:
You should notice that 1 thread is busy, as shown below:

# systemctl status dynflow-sidekiq
● dynflow-sidekiq - Foreman jobs daemon - worker-1 on sidekiq
   Loaded: loaded (/usr/lib/systemd/system/dynflow-sidekiq@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2023-11-15 00:40:26 UTC; 3h 14min ago
     Docs: https://theforeman.org
 Main PID: xxxxx (sidekiq)
   Status: "Everything ready for world: 57b50c94-dbfe-4271-8033-76bc74138212"
    Tasks: 15 (limit: 126231)
   Memory: 1.5G
   CGroup: /system.slice/system-dynflow\x2dsidekiq.slice/dynflow-sidekiq
           └─1104255 sidekiq 6.3.1  [1 of 5 busy] <==================

Expected results:
Expect to see 1 thread busy in the "worker-hosts-queue-1" worker instead.
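
One way to confirm which queue the steps were actually routed to is to inspect the task's execution plan from "foreman-rake console". This is a sketch and assumes that ForemanTasks::Task#execution_plan and the Dynflow step "queue" attribute are available on this version:
~~~
# Run inside `foreman-rake console`
task = ForemanTasks::Task
         .where(:label => 'Actions::Katello::Applicability::Hosts::BulkGenerate')
         .order(:started_at).last
# Without the fix the run steps stay on :default; with it they should report :hosts_queue
puts task.execution_plan.steps.values.map(&:queue).uniq.inspect
~~~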


Additional info:
Setting['host_tasks_workers_pool_size'] is read during initialization, but the Katello settings are not yet loaded at that point. Therefore, the value is always 0 and the condition is never met.

# In lib/katello/engine.rb
~~~
    initializer "katello.register_actions", :before => :finisher_hook do |_app|
      ForemanTasks.dynflow.require!
      if (Setting.table_exists? rescue(false)) && Setting['host_tasks_workers_pool_size'].to_i > 0  <===============================
        ForemanTasks.dynflow.config.queues.add(HOST_TASKS_QUEUE, :pool_size => Setting['host_tasks_workers_pool_size'])
      end
~~~

We can also see the below warning in the production.log
~~~
2023-11-14T07:30:27 [W|app|] Setting host_tasks_workers_pool_size has no definition, please define it before using
~~~


# In "/usr/share/foreman/app/registries/foreman/setting_manager.rb" we can see the plugin settings will only be fully loaded in "config.to_prepare" state.
~~~
Rails.application.config.to_prepare do
  Foreman::SettingManager.validations.setup!
end
~~~
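
In other words, when the "katello.register_actions" initializer runs, the Katello setting definitions have not been registered yet, so the check behaves roughly like this (an illustration, not output captured from a system):
~~~
# During the "katello.register_actions" initializer, before config.to_prepare:
Setting['host_tasks_workers_pool_size']       # => nil (and the "has no definition" warning is logged)
Setting['host_tasks_workers_pool_size'].to_i  # => 0, so the "> 0" guard never passes
                                              #    and hosts_queue is never registered
~~~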

Comment 2 Adam Ruzicka 2023-11-15 09:05:47 UTC
Since we moved to sidekiq, the queues are defined externally in workers' configurations. We can manipulate ForemanTasks.dynflow.config.queues all we want, but it should have no effect whatsoever.

Comment 3 Hao Chang Yu 2023-11-15 09:33:43 UTC
(In reply to Adam Ruzicka from comment #2)
> Since we moved to sidekiq, the queues are defined externally in workers'
> configurations. We can manipulate ForemanTasks.dynflow.config.queues all we
> want, but it should have no effect whatsoever.

If you mean passing “-C” to the sidekiq command, it is already configured like that in Satellite, but it is not working.

Based on my test, it does work after “ForemanTasks.dynflow.config.queues.add” is called, so I am not sure why it has an effect here.

Comment 4 Adam Ruzicka 2023-11-15 09:42:44 UTC
I see it now. Apparently dynflow needs to know that the queue exists, which the patch you suggested seems to do. The pool_size part is the bit that gets ignored since the move to sidekiq.

Comment 5 Hao Chang Yu 2023-11-15 10:32:32 UTC
(In reply to Adam Ruzicka from comment #4)
> I see it now. Apparently dynflow needs to know that the queue exists, which
> the patch you suggested seems to do. The pool_size part is the bit that
> gets ignored since the move to sidekiq.

Great! That will make the fix much easier. We will just need to remove the setting table check and pass any value for the pool_size.

Comment 6 Hao Chang Yu 2023-11-15 10:52:08 UTC
(In reply to Hao Chang Yu from comment #5)
> (In reply to Adam Ruzicka from comment #4)
> > I see it now. Apparently dynflow needs to know that the queue exists, which
> > the patch you suggested seems to do. The pool_size part is the bit that
> > gets ignored since the move to sidekiq.
> 
> Great! That will make the fix much easier. We will just need to remove the
> setting table check and pass any value for the pool_size.

Great! Passing it without a pool size worked.

~~~
ForemanTasks.dynflow.config.queues.add(HOST_TASKS_QUEUE)
~~~
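
Putting that together with the initializer quoted in the description, the fixed version would look roughly like this (a sketch; the surrounding context and closing "end"s are assumed):
~~~
# lib/katello/engine.rb -- sketch of the initializer after the change
initializer "katello.register_actions", :before => :finisher_hook do |_app|
  ForemanTasks.dynflow.require!
  # Register the queue unconditionally. No pool_size is passed because, since
  # the move to sidekiq, worker concurrency comes from the external worker
  # configuration and the value here would be ignored anyway.
  ForemanTasks.dynflow.config.queues.add(HOST_TASKS_QUEUE)
end
~~~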

Comment 7 Bryan Kearney 2023-11-22 08:02:42 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/36921 has been resolved.

Comment 12 errata-xmlrpc 2024-04-23 17:15:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.15.0 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:2010