Bug 1867399

Summary: Receptor-satellite isn't able to deal with jobs where all the hosts are unknown to satellite
Product: Red Hat Satellite
Component: RH Cloud - Cloud Connector
Version: 6.8.0
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Jitendra Yejare <jyejare>
Assignee: satellite6-bugs <satellite6-bugs>
QA Contact: Lukáš Hellebrandt <lhellebr>
CC: aruzicka, pcreech
Target Milestone: 6.8.0
Target Release: Unused
Keywords: Regression, Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: tfm-rubygem-foreman_remote_execution-3.3.6
Last Closed: 2020-10-27 13:05:24 UTC
Type: Bug

Description Jitendra Yejare 2020-08-09 16:18:22 UTC
Description of problem:
When none of the hosts in a job are known to Satellite, the receptor logs the error `node work Error encountered while handling the response, replying with an error message ('hosts')`.

ERROR 2020-08-07 11:17:40,596 node work Error encountered while handling the response, replying with an error message ('hosts')
ERROR 2020-08-07 11:17:40,597 node work ['  File "/usr/lib/python3.6/site-packages/receptor/work.py", line 100, in handle\n    work_exec.result()\n', '  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result\n    return self.__get_result()\n', '  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result\n    raise self._exception\n', '  File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run\n    result = self.fn(*self.args, **self.kwargs)\n', '  File "/usr/lib/python3.6/site-packages/receptor_satellite/worker.py", line 303, in execute\n    run(Run.from_raw(queue, payload, satellite_api, logger).start())\n', '  File "/usr/lib/python3.6/site-packages/receptor_satellite/worker.py", line 294, in run\n    return loop.run_until_complete(coroutine)\n', '  File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in run_until_complete\n    return future.result()\n', '  File "/usr/lib/python3.6/site-packages/receptor_satellite/worker.py", line 226, in start\n    self.update_hosts(response["body"]["targeting"]["hosts"])\n'

Version-Release number of selected component (if applicable):
Satellite 6.8 snap 11

How reproducible:
Always

Steps to Reproduce:
1
2.
3.

Actual results:
Receptor-satellite isn't able to deal with jobs where all the hosts are unknown to satellite

Expected results:
Receptor-satellite shouldn't throw an error message for the hosts that are not known to the satellite.

Additional info:
Note: I used fake receptor and data.json to repro this issue.

Comment 1 Jitendra Yejare 2020-08-09 16:21:54 UTC
To reproduce, I used an https URL for the satellite and set ca_file in the fake config.conf:

```
url = https://satellite.com:443
ca_file=/etc/pki/tls/certs/ca-bundle.crt
```

Comment 2 Brad Buckingham 2020-08-10 14:05:50 UTC
Is this a regression from 6.7?

Does this error occur only when all hosts are 'unknown' vs a subset of the hosts?

Comment 3 Adam Ruzicka 2020-08-10 14:29:15 UTC
> Is this a regression from 6.7?

Most likely yes. This probably changed due to recent changes in REX. The API we use used to return an empty array when there were no hosts; now it omits the entire entry from the response, but receptor-satellite expects the key to be there. I expect this to be a one-line fix.

> Does this error occur only when all hosts are 'unknown' vs a subset of the hosts?

Only when all of the hosts are unknown. The only scenario where I can see this happening in the real world is if someone did the whole fifi setup dance, found something to remediate in cloud, deleted the host from satellite and then tried to remediate the thing on the deleted host.
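
For illustration only, the failure mode described above (a `hosts` key that is sometimes omitted from the response) can be sketched as follows. This is a minimal sketch, not the actual fix: `targeted_hosts` and both sample payloads are hypothetical, and only the `response["body"]["targeting"]["hosts"]` access path is taken from the traceback in the description.

```python
def targeted_hosts(response):
    """Return the targeted hosts, treating a missing "hosts" key as no hosts.

    Hypothetical helper for illustration; it is not part of receptor-satellite.
    """
    targeting = response["body"]["targeting"]
    # Indexing targeting["hosts"] raises KeyError('hosts') when the key is
    # absent, which matches the ('hosts') seen in the log. dict.get with a
    # default tolerates both response shapes.
    return targeting.get("hosts", [])


# Assumed response shapes: the older API returned an empty array,
# the newer one leaves the key out entirely.
old_style = {"body": {"targeting": {"hosts": []}}}
new_style = {"body": {"targeting": {}}}

print(targeted_hosts(old_style))
print(targeted_hosts(new_style))
```

With a default in place, a job whose hosts are all unknown yields an empty host list instead of an exception.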

Comment 4 Brad Buckingham 2020-08-10 14:38:53 UTC
Thanks Adam!

Comment 5 Adam Ruzicka 2020-08-11 09:12:54 UTC
Created redmine issue http://projects.theforeman.org/issues/30628 from this bug

Comment 6 Lukáš Hellebrandt 2020-09-15 15:14:05 UTC
Verified with Sat 6.8 snap 15.

Used the real-world scenario described in comment 3. A job invocation was created and it's shown as "Succeeded" on 0 systems - which is exactly what happened.

Comment 9 errata-xmlrpc 2020-10-27 13:05:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.8 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4366
