Bug 2154184

Summary: Disabling "Capsule batch tasks" makes all Ansible role jobs fail - forever
Product: Red Hat Satellite
Component: Tasks Plugin
Version: 6.11.4
Hardware: x86_64
OS: Linux
Severity: high
Priority: high
Status: CLOSED ERRATA
Reporter: Pavel Moravec <pmoravec>
Assignee: Adam Ruzicka <aruzicka>
QA Contact: Peter Ondrejka <pondrejk>
CC: aruzicka, pcreech, torkil
Keywords: Triaged
Target Milestone: 6.13.0
Target Release: Unused
Fixed In Version: puppet-foreman_proxy-24.0.0
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2023-05-03 13:23:43 UTC

Description Pavel Moravec 2022-12-16 07:48:45 UTC
Description of problem:
Once we disable "Capsule batch tasks", new Ansible role jobs start to fail. Re-enabling the setting does *not* help. One needs to restart dynflow-* services to restore it.


Version-Release number of selected component (if applicable):
Sat 6.11.4


How reproducible:
100%


Steps to Reproduce:
1. Have a job:
Job Category: Ansible Playbook
Job Template: Ansible Roles - Ansible Default
Search Query: (doesn't matter; try it e.g. against 1 host)

2. Run the job to verify it works.
3. Disable the batching: WebUI -> Administer -> Settings -> Foreman Tasks -> "Allow Capsule batch tasks" = No (a CLI alternative is sketched after this list).
4. Rerun the job.
5. Re-enable the Capsule batching.
6. Rerun the job.
7. Restart dynflow services:

satellite-maintain service restart --only=dynflow-sidekiq@orchestrator,dynflow-sidekiq@worker-1,dynflow-sidekiq@worker-2,dynflow-sidekiq@worker-hosts-queue-1

8. Rerun the job.
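
For steps 3 and 5, a CLI alternative sketch - assuming the setting is exposed as foreman_tasks_proxy_batch_trigger, which may differ on your version:

# step 3: disable Capsule batch triggering (assumed setting name)
hammer settings set --name foreman_tasks_proxy_batch_trigger --value false

# step 5: re-enable it
hammer settings set --name foreman_tasks_proxy_batch_trigger --value true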


Actual results:
Step 4 fails.
Step 6 *also* fails.


Expected results:
Both steps 4 and 6 should succeed.


Additional info:
In case Capsule batching is required for this type of job, and step 4 is expected to fail, then:
1) we should have it documented (or ideally, Satellite should warn about / prevent triggering such a job?)
2) re-enabling the batching should restore it (i.e. step 6 must work well)

Comment 1 Adam Ruzicka 2023-01-02 15:56:59 UTC
This seems to be fixed on 6.13. I don't have any other Satellite at hand right now, but I'll let you know once I get my hands on one.

> Once we disable "Capsule batch tasks"

I'm not disputing in any way that this should work. If there's a tunable, it should work no matter what it is tuned to. But just out of curiosity, what was the intent behind disabling it?

> One needs to restart dynflow-* services to restore it.

That's certainly unexpected.

Comment 2 Adam Ruzicka 2023-01-03 08:59:23 UTC
There are two issues at play here:

1) On 6.11.4, when batch triggering is off, the smart proxy starts
ansible-playbook (instead of ansible-runner), which fails with the following:

[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names 
to new standard, use callbacks_enabled instead. This feature will be removed 
from ansible-core in version 2.15. Deprecation warnings can be disabled by 
setting deprecation_warnings=False in ansible.cfg.
ERROR! Invalid callback for stdout specified: yaml
Exit status: 1

Ansible-runner seems to be able to tolerate this stdout_callback setting. On
6.13 we stopped setting it altogether.

Whether this is an issue or not seems to depend on the version of ansible:
core 2.12.2 seems to be fine, core 2.13.3 breaks.
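
To check which ansible-core version is in play, run on the affected machine:

ansible --version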

A temporary workaround would be:

sed -i 's/^stdout_callback/# stdout_callback/' ~foreman-proxy/.ansible.cfg

but note that the installer will stomp over this when it runs. Considering we already
stopped setting it in 6.13, we should be able to stop setting it in all currently
supported releases.
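
For illustration - assuming ~foreman-proxy/.ansible.cfg carries the offending line in its [defaults] section (surrounding content may differ) - the sed above turns

[defaults]
stdout_callback = yaml

into

[defaults]
# stdout_callback = yaml

so ansible-playbook falls back to its default stdout callback instead of failing on yaml.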

2) Satellite is not able to detect when a setting changes from a non-default to the
default value; if such a change happens, it continues using the non-default
one until services are restarted. This affects all releases from 6.14 onwards.

Comment 3 Torkil Svensgaard 2023-01-03 09:05:35 UTC
(In reply to Adam Ruzicka from comment #1)
> > Once we disable "Capsule batch tasks"
> 
> I'm not disputing in any way that this should work. If there's a tunable, it
> should work no matter what it is tuned to. But just out of curiosity, what
> was the intent behind disabling it?

These two:

Bug 2156532 - Misleading job invocation details when running ansible roles in bulk
https://bugzilla.redhat.com/show_bug.cgi?id=2156532

Bug 2156522 - Misleading job status in the new host UI when running jobs in bulk
https://bugzilla.redhat.com/show_bug.cgi?id=2156522

Best regards,

Torkil

Comment 4 Adam Ruzicka 2023-01-03 10:30:08 UTC
Created BZ #2157869 for part 2 as described in comment #2. Let's keep this BZ to track only part 1 as described in comment #2.

> Misleading job invocation details when running ansible roles in bulk

Yes, the batch triggering setting could help with that; alternatively, you could leave batch triggering on and set the batch size to 1 (see the sketch below).
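
For illustration, assuming the batch size is exposed as the foreman_tasks_proxy_batch_size setting (the name may differ on your version):

hammer settings set --name foreman_tasks_proxy_batch_size --value 1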

> Misleading job status in the new host UI when running jobs in bulk

The batch triggering setting will not have any effect on this.

Comment 6 Adam Ruzicka 2023-01-16 12:38:02 UTC
Part 1 as described in comment #2 should be fixed in foreman_proxy puppet modules 24.0.0. As far as I know it should already be present in a snap.

Comment 7 Peter Ondrejka 2023-01-23 13:39:48 UTC
Verified on Satellite 6.14 using the steps from the problem description; Ansible roles now execute correctly with the Capsule batch setting set to "No".

Comment 10 errata-xmlrpc 2023-05-03 13:23:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.13 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2097