Bug 1867399 - Receptor-satellite isn't able to deal with jobs where all the hosts are unknown to satellite
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: RH Cloud - Cloud Connector
Version: 6.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: 6.8.0
Assignee: satellite6-bugs
QA Contact: Lukáš Hellebrandt
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-09 16:18 UTC by Jitendra Yejare
Modified: 2020-10-27 13:08 UTC
CC List: 2 users

Fixed In Version: tfm-rubygem-foreman_remote_execution-3.3.6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 13:05:24 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Foreman Issue Tracker | 30628 | 0 | High | Closed | Receptor-satellite isn't able to deal with jobs where all the hosts are unknown to satellite | 2020-12-02 18:01:28 UTC
Red Hat Product Errata | RHSA-2020:4366 | 0 | None | None | None | 2020-10-27 13:08:21 UTC

Description Jitendra Yejare 2020-08-09 16:18:22 UTC
Description of problem:
When none of the hosts in a job are known to Satellite, receptor logs the error `node work Error encountered while handling the response, replying with an error message ('hosts')`.

ERROR 2020-08-07 11:17:40,596 node work Error encountered while handling the response, replying with an error message ('hosts')
ERROR 2020-08-07 11:17:40,597 node work
  File "/usr/lib/python3.6/site-packages/receptor/work.py", line 100, in handle
    work_exec.result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.6/site-packages/receptor_satellite/worker.py", line 303, in execute
    run(Run.from_raw(queue, payload, satellite_api, logger).start())
  File "/usr/lib/python3.6/site-packages/receptor_satellite/worker.py", line 294, in run
    return loop.run_until_complete(coroutine)
  File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in run_until_complete
    return future.result()
  File "/usr/lib/python3.6/site-packages/receptor_satellite/worker.py", line 226, in start
    self.update_hosts(response["body"]["targeting"]["hosts"])

Version-Release number of selected component (if applicable):
Satellite 6.8 snap 11

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
Receptor-satellite fails with the `('hosts')` error above when none of the job's hosts are known to Satellite.

Expected results:
Receptor-satellite should handle a job whose hosts are all unknown to Satellite without raising an error.

Additional info:
Note: I used a fake receptor and data.json to reproduce this issue.

Comment 1 Jitendra Yejare 2020-08-09 16:21:54 UTC
To reproduce, I pointed the fake receptor's config.conf at Satellite over HTTPS and set ca_file:

```
url = https://satellite.com:443
ca_file=/etc/pki/tls/certs/ca-bundle.crt
```

Comment 2 Brad Buckingham 2020-08-10 14:05:50 UTC
Is this a regression from 6.7?

Does this error occur only when all hosts are 'unknown', or also when only a subset of the hosts are?

Comment 3 Adam Ruzicka 2020-08-10 14:29:15 UTC
> Is this a regression from 6.7?

Most likely, yes. The behavior probably changed because of recent changes in REX. The API we use used to return an empty array when there were no hosts; now it omits the entire entry from the response, but receptor-satellite expects the key to be there. I expect this to be a one-liner fix.
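The failing frame in the traceback indexes `response["body"]["targeting"]["hosts"]` directly, so once the API omits `hosts`, Python raises `KeyError('hosts')`. Its string form, `'hosts'`, is consistent with the `('hosts')` in the logged error message. The sketch below is a minimal illustration under assumptions: `OLD_STYLE` and `NEW_STYLE` are hypothetical payloads reduced to that key path, and `hosts_strict` / `hosts_tolerant` are illustrative helpers, not receptor-satellite functions. The Fixed In Version field points at foreman_remote_execution, so the shipped fix appears to be on the API side; the tolerant lookup only shows a possible client-side alternative.

```python
from typing import Any, Dict, List

# Hypothetical payloads reduced to the key path from the traceback:
# response["body"]["targeting"]["hosts"]
OLD_STYLE: Dict[str, Any] = {"body": {"targeting": {"hosts": []}}}  # empty list when nothing matches
NEW_STYLE: Dict[str, Any] = {"body": {"targeting": {}}}             # "hosts" key omitted entirely


def hosts_strict(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Mirrors the failing line: raises KeyError('hosts') when the key is absent."""
    return response["body"]["targeting"]["hosts"]


def hosts_tolerant(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Illustrative alternative: treats a missing "hosts" key like an empty list."""
    return response["body"]["targeting"].get("hosts", [])


if __name__ == "__main__":
    print(hosts_strict(OLD_STYLE))    # []
    try:
        hosts_strict(NEW_STYLE)
    except KeyError as exc:
        print(f"KeyError: {exc}")     # KeyError: 'hosts'
    print(hosts_tolerant(NEW_STYLE))  # []
```

With the tolerant lookup, an all-unknown job simply yields zero hosts to run on instead of aborting the whole work unit.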

> Does this error occur only when all hosts are 'unknown' vs a subset of the hosts?

Only when all of the hosts are unknown. The only real-world scenario I can see is if someone went through the whole fifi setup, found something to remediate in the cloud, deleted the host from Satellite, and then tried to run the remediation on the deleted host.
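The reason only the all-unknown case fails can be modeled with a small hypothetical sketch of the post-change API behavior described in this comment: known hosts are listed under `hosts`, and the key is dropped only when no requested host matches. `build_targeting`, `has_hosts_key`, and the host names are made up for illustration and do not reflect the real REX API code.

```python
from typing import Any, Dict, List, Set

# Hypothetical set of hosts that Satellite still knows about.
KNOWN: Set[str] = {"web01.example.com"}


def build_targeting(requested: List[str], known: Set[str]) -> Dict[str, Any]:
    """Hypothetical model of the changed API: "hosts" is emitted only
    when at least one requested host is known to Satellite."""
    matched = [{"name": name} for name in requested if name in known]
    targeting: Dict[str, Any] = {}
    if matched:
        targeting["hosts"] = matched
    return {"body": {"targeting": targeting}}


def has_hosts_key(requested: List[str], known: Set[str]) -> bool:
    """True when the modeled response still carries the "hosts" key."""
    return "hosts" in build_targeting(requested, known)["body"]["targeting"]


if __name__ == "__main__":
    # A subset of unknown hosts: the key survives, so direct indexing works.
    print(has_hosts_key(["web01.example.com", "gone.example.com"], KNOWN))  # True
    # Every host unknown (e.g. deleted from Satellite): the key disappears.
    print(has_hosts_key(["gone.example.com"], KNOWN))  # False
```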

Comment 4 Brad Buckingham 2020-08-10 14:38:53 UTC
Thanks Adam!

Comment 5 Adam Ruzicka 2020-08-11 09:12:54 UTC
Created Redmine issue http://projects.theforeman.org/issues/30628 from this bug

Comment 6 Lukáš Hellebrandt 2020-09-15 15:14:05 UTC
Verified with Sat 6.8 snap 15.

Used the real-world scenario described in comment 3. A job invocation was created and is shown as "Succeeded" on 0 systems, which is exactly what happened.

Comment 9 errata-xmlrpc 2020-10-27 13:05:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Satellite 6.8 release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4366

Comment 10 errata-xmlrpc 2020-10-27 13:08:20 UTC

