Bug 2124051

Summary: Ansible-type REX jobs are still delegated by satellite 6.12 to be executed via an external Capsule 6.12 even if the ansible feature is not enabled on the same
Product: Red Hat Satellite Reporter: Sayan Das <saydas>
Component: Remote Execution Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA QA Contact: Peter Ondrejka <pondrejk>
Severity: high Docs Contact:
Priority: medium    
Version: 6.12.0 CC: aruzicka, pcreech
Target Milestone: 6.12.0 Keywords: Regression, Triaged
Target Release: Unused   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: rubygem-foreman_remote_execution-7.2.2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-11-16 13:35:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sayan Das 2022-09-04 09:40:38 UTC
Description of problem:

A newly installed Capsule 6.12 has only the scripts feature enabled, not Ansible.

So running an Ansible-based job on a host via that Capsule fails as expected, but the error messages are not understandable from a user's point of view.

It should probably say something like, "Cannot execute the job as the required feature is missing" or something similar.


Version-Release number of selected component (if applicable):

Satellite Capsule 6.12 (probably Capsule 6.11 as well)


How reproducible:

Always


Steps to Reproduce:

1. Install a Satellite and an external capsule 6.12

2. Note that by default the external capsule does not have Ansible feature enabled.

3. Sync some data in satellite and capsule.

4. Register a system with capsule via global registration method

5. Turn on the "Prefer registered through Capsule for remote execution" setting in Administer --> Settings --> Content tab of Satellite.

6. Run an Ansible-based job on the target server (ensuring that the Capsule is selected as the REX smart proxy) and let the job fail.



Actual results:

If I look at the job result, I see these lines printed:

############
Initialization error: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: RestClient::NotFound - 404 Not Found
Initialization error: RestClient::NotFound - 404 Not Found
Error loading data from Capsule: NoMethodError - undefined method `code' for "404 Not Found":String
Did you mean?  encode
Error loading data from Capsule: NoMethodError - undefined method `code' for "404 Not Found":String
Did you mean?  encode
############


These messages are neither meaningful nor do they indicate that the failure is related to the missing Ansible feature.
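The last lines of the output show `code` being called on the plain string "404 Not Found". As a rough illustration only (this is not the Foreman or Capsule code, and it is written in Python rather than Ruby), the sketch below assumes an error handler may receive either a response-like object with a numeric `code` attribute or a bare status-line string. The names `status_code` and `FakeResponse` are hypothetical.

```python
import re
from typing import Optional


def status_code(error: object) -> Optional[int]:
    """Best-effort extraction of an HTTP status code from an error value."""
    # Response-like objects: use a numeric `code` attribute if one exists.
    code = getattr(error, "code", None)
    if isinstance(code, int):
        return code
    # Plain strings such as "404 Not Found": parse the leading three digits.
    match = re.match(r"\s*(\d{3})\b", str(error))
    return int(match.group(1)) if match else None


class FakeResponse:
    """Hypothetical response-like object exposing a numeric code."""
    code = 404


print(status_code(FakeResponse()))
print(status_code("404 Not Found"))
print(status_code("connection refused"))
```

Handling both shapes in one place would avoid a secondary error masking the original 404, whatever the real cause of the string turns out to be.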


Expected results:

Rather than showing a series of *odd* error messages, show something like:

"The job cannot be executed as the required feature is missing on the target capsule capsule.example.com"


Additional info:

NA

Comment 1 Sayan Das 2022-09-04 09:42:24 UTC
I see a 6.11 bug was filed, i.e. https://bugzilla.redhat.com/show_bug.cgi?id=2111701, but the request there is different and the root cause is unidentified.

My bug has a clear root cause and is more of an RFE to improve the error handling.

Comment 2 Adam Ruzicka 2022-09-12 09:06:29 UTC
> "The job cannot be executed as the required feature is missing on the target capsule capsule.example.com"

We cannot really say that, because we do not know that.

From where I'm standing it looks like we have (at least) 3 different BZs about the same issue - this one, https://bugzilla.redhat.com/show_bug.cgi?id=2111701 and https://bugzilla.redhat.com/show_bug.cgi?id=2106700 .

What all three have in common is that a subset of a job gets routed to a capsule, which for some reason cannot process it and immediately gives back 404. Previously, we failed to propagate this error properly, so Satellite thought the parts of the job were still running on the capsules, and when you looked at the live output, you got a 404 again, but it did not kill the job.

Now that https://bugzilla.redhat.com/show_bug.cgi?id=2106700 is fixed, the parts of the job for which capsule gives 404 fail straight away. If the ask here is just to improve the messaging, then I'd say the fix done in BZ #2106700 is as good as it can be.
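To illustrate the behaviour described above, here is a minimal sketch of treating a 404 at trigger time as an immediate failure instead of leaving the sub-task marked as running. It is a simplified model in Python, not the actual Satellite code; `TaskState`, `ProxyNotFound`, `start_on_proxy` and `rejecting_trigger` are hypothetical names.

```python
from enum import Enum
from typing import Callable, Tuple


class TaskState(Enum):
    RUNNING = "running"
    FAILED = "failed"


class ProxyNotFound(Exception):
    """Hypothetical error standing in for an HTTP 404 from the capsule."""


def start_on_proxy(trigger: Callable[[], None]) -> Tuple[TaskState, str]:
    """Trigger the remote part of a job; a 404 fails it straight away."""
    try:
        trigger()
    except ProxyNotFound as exc:
        # Propagate the rejection instead of assuming the task is running.
        return TaskState.FAILED, f"Initialization error: {exc}"
    return TaskState.RUNNING, ""


def rejecting_trigger() -> None:
    """Simulates a capsule that cannot process the request."""
    raise ProxyNotFound("404 Not Found")


state, message = start_on_proxy(rejecting_trigger)
print(state.value, "-", message)
```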

However, if the "Prefer registered through Capsule for remote execution" setting causes the job to be routed to that capsule *always*, even if it doesn't have the necessary feature, then that is a thing we should address. Is there a setup available where I could take a look at this?

Comment 3 Sayan Das 2022-09-12 09:24:35 UTC
Sure. Let me share the details in the next private comment.

Comment 5 Peter Ondrejka 2022-10-04 19:40:18 UTC
Verified on Satellite 6.12 snap 13.

Using the scenario described in the problem description, the Ansible REX job is now executed through the internal capsule that has the Ansible feature on, even though the host is registered to the other capsule and the "Prefer registered through Capsule..." setting is on.
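The verified behaviour can be modelled roughly as follows. This is only a sketch under stated assumptions, written in Python and not taken from foreman_remote_execution: the `Proxy` class, the `select_proxy` function, the feature labels "Script" and "Ansible", and the host name satellite.example.com are all hypothetical. The idea is that the registered-through capsule is preferred only when it advertises the feature the job needs.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass(frozen=True)
class Proxy:
    """Hypothetical stand-in for a capsule / smart proxy record."""
    name: str
    features: frozenset = field(default_factory=frozenset)


def select_proxy(
    required_feature: str,
    registered_through: Optional[Proxy],
    fallbacks: Sequence[Proxy],
    prefer_registered_through: bool = True,
) -> Optional[Proxy]:
    """Return the first candidate that has required_feature, or None."""
    candidates = []
    # The preference only changes the ordering, never the feature check.
    if prefer_registered_through and registered_through is not None:
        candidates.append(registered_through)
    candidates.extend(fallbacks)
    for proxy in candidates:
        if required_feature in proxy.features:
            return proxy
    return None


external = Proxy("capsule.example.com", frozenset({"Script"}))
internal = Proxy("satellite.example.com", frozenset({"Script", "Ansible"}))

chosen = select_proxy("Ansible", registered_through=external, fallbacks=[internal])
print(chosen.name if chosen else "no capable proxy")
```

In this model a script job would still go to the external capsule, while the Ansible job falls through to the internal one.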

Comment 9 errata-xmlrpc 2022-11-16 13:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.12 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8506