Bug 1930641

Summary: when I execute remediation from cloud.redhat.com on 1 host, it executes on Sat but on c.r.c it says "running" even after 10 minutes
Product: Red Hat Satellite
Reporter: Jan Hutař <jhutar>
Component: RH Cloud - Cloud Connector
Assignee: Adam Ruzicka <aruzicka>
Status: CLOSED ERRATA
QA Contact: Lukáš Hellebrandt <lhellebr>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.9.0
CC: aruzicka, pcreech, zhunting
Target Milestone: 6.9.0
Keywords: Regression, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: python3-receptor-satellite-1.3.2 tfm-rubygem-foreman_remote_execution-4.2.3-1,foreman-2.3.1.19-1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-04-21 13:11:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1884237

Comment 2 Lukáš Hellebrandt 2021-02-22 15:27:52 UTC
I've just reproduced on Sat 6.9 snap 13 with receptor-0.6.3-1.el7ar.noarch.

Comment 3 Lukáš Hellebrandt 2021-02-22 16:24:14 UTC
==> /var/log/messages <==
Feb 22 11:18:25 <shortname> smart-proxy: <IP> - - [22/Feb/2021:11:18:25 EST] "GET /dynflow/tasks/79bc87b5-d071-4ae6-b6a2-cbb880801601/status HTTP/1.1" 200 12074
Feb 22 11:18:25 <shortname> smart-proxy: - -> /dynflow/tasks/79bc87b5-d071-4ae6-b6a2-cbb880801601/status
Feb 22 11:18:25 <shortname> receptor: ERROR 2021-02-22 11:18:25,427 ed608fd7-5bee-4700-a765-3df864502d81 work Error encountered while handling the response, replying with an error message ('id')
Feb 22 11:18:25 <shortname> receptor: ERROR 2021-02-22 11:18:25,428 ed608fd7-5bee-4700-a765-3df864502d81 work ['  File "/usr/lib/python3.6/site-packages/receptor/work.py", line 100, in handle\n    work_exec.result()\n', '  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result\n    return self.__get_result()\n', '  File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result\n    raise self._exception\n', '  File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 56, in run\n    result = self.fn(*self.args, **self.kwargs)\n', '  File "/usr/lib/python3.6/site-packages/receptor_satellite/worker.py", line 61, in execute\n    run(Run.from_raw(queue, payload, satellite_api, logger).run())\n', '  File "/usr/lib/python3.6/site-packages/receptor_satellite/worker.py", line 52, in run\n    return loop.run_until_complete(coroutine)\n', '  File "/usr/lib64/python3.6/asyncio/base_events.py", line 484, in run_until_complete\n    return future.result()\n', '  File "/usr/lib/python3.6/site-packages/receptor_satellite/run.py", line 82, in run\n    if await self.polling_loop():\n', '  File "/usr/lib/python3.6/site-packages/receptor_satellite/run.py", line 105, in polling_loop\n    host = self.running[host_output["id"]]\n']
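The ('id') in the first error line, together with the last traceback frame (host = self.running[host_output["id"]]), is consistent with a Python KeyError raised because an output record has no 'id' key. The log alone does not confirm this. Below is a minimal sketch of a lookup that tolerates such a record. The function name lookup_host and the sample data are hypothetical; this is not the actual receptor_satellite code, nor the fix shipped in 1.3.2.

```python
# Hypothetical sketch: names mirror the traceback above, but this is NOT
# the real receptor_satellite implementation.

def lookup_host(running, host_output):
    """Return the tracked host for one output record.

    Returns None when the record has no 'id' key or when the id is not
    present in the running map, instead of raising KeyError.
    """
    host_id = host_output.get("id")  # .get avoids KeyError('id')
    if host_id is None:
        return None
    return running.get(host_id)


# Made-up sample data for illustration only.
running = {42: "host-a.example.com"}

# A record shaped the way the polling loop appears to expect.
print(lookup_host(running, {"id": 42, "output": []}))

# A record without 'id': indexing it directly with ["id"] would raise
# KeyError('id').
print(lookup_host(running, {"output": []}))
```

Returning None keeps the polling loop alive when one record is malformed; whether skipping such a record is the right behaviour depends on what the Satellite API is expected to return.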

Comment 5 Patrick Creech 2021-03-10 16:36:17 UTC
Adam, I'm not seeing any artifacts for python-receptor-satellite 1.3.2 upstream. It looks like the 1.3.2 tag isn't in the repo. Can we get a release for it?

Comment 6 Adam Ruzicka 2021-03-10 17:02:41 UTC
Forgot to push; it is out now.

Comment 7 Lukáš Hellebrandt 2021-03-15 15:26:07 UTC
FailedQA using Sat 6.9 snap 17.

When running a remediation, regardless of the job invocation result, CRC shows the status Failed after about a minute, and the following appears on the receptor machine:

```
# journalctl -fu receptor@*
[...]
Mar 15 11:11:58 <FQDN> receptor[307]: ERROR 2021-03-15 11:11:58,866 <uuid1> run Playbook run <uuid2> encountered error 'Field 'id' not recognized for searching!', aborting.
# grep 'scoped_search :on => :id' ~foreman/app/models/concerns/hostext/search.rb; echo $?
1
```

Comment 8 Adam Ruzicka 2021-03-15 15:27:50 UTC
This seems to be missing a cherrypick of https://projects.theforeman.org/issues/31931

Comment 10 Lukáš Hellebrandt 2021-03-19 14:06:40 UTC
Verified with Sat 6.9 snap 18.

Tried:
1) passed remediation
2) failed remediation due to broken connection between receptor and satellite
3) failed remediation due to issues on the remediated host
4) passed remediation after that remediation failed previously (i.e. issue got fixed in the meantime)

In all cases, the status correctly transitions first to Pending (even in case 4) and then to either Passed or Failed.
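The expected status sequence from the cases above can be written down as a small check. This is only an illustrative sketch: the ALLOWED table and the function name is_valid_sequence are hypothetical, the assumption that a rerun always starts again from Pending is taken from case 4, and none of this is code from Satellite or cloud.redhat.com.

```python
# Hypothetical model of the status sequence described in the verification
# above; the status names come from the comment, the rest is assumed.
ALLOWED = {
    None: {"Pending"},                # a run starts as Pending
    "Pending": {"Passed", "Failed"},  # then reaches a final state
    "Passed": {"Pending"},            # assumed: a rerun goes back to Pending
    "Failed": {"Pending"},            # case 4: rerun after a failure
}


def is_valid_sequence(statuses):
    """Return True if every step in statuses is an allowed transition."""
    previous = None
    for status in statuses:
        if status not in ALLOWED.get(previous, set()):
            return False
        previous = status
    return True


# Case 1: passed remediation.
print(is_valid_sequence(["Pending", "Passed"]))
# Case 4: failed first, then passed after the issue was fixed.
print(is_valid_sequence(["Pending", "Failed", "Pending", "Passed"]))
# Skipping Pending is not allowed in this model.
print(is_valid_sequence(["Failed"]))
```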

Comment 13 errata-xmlrpc 2021-04-21 13:11:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.9 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1313