Bug 1965023
| Summary: | Expired manifest leads to 403 during finalizing cloud connector setup | ||
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Paul Dudley <pdudley> |
| Component: | RH Cloud - Cloud Connector | Assignee: | satellite6-bugs <satellite6-bugs> |
| Status: | CLOSED WONTFIX | QA Contact: | Lukáš Hellebrandt <lhellebr> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.9.0 | CC: | aruzicka, ehelms, pcreech, sshtein |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | foreman_rh_cloud_5.0.32 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-10-28 18:04:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
As a workaround, you should be able to set the `skip_satellite_org_id_list` parameter on the Satellite host to a list of organization IDs that have expired manifests, so that those orgs are skipped. Thanks for the idea, Adam; I'll include that as a possible workaround for the issue in a KCS article.
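For illustration, here is a minimal sketch of how such a skip list would behave. The parameter name comes from the workaround above, but the data shapes and filtering logic here are assumptions for illustration, not the plugin's actual implementation:

```python
# Hypothetical sketch: filter out organizations whose IDs appear in
# skip_satellite_org_id_list before attempting cloud connector setup.
# The helper name and value format are assumptions, not the real plugin code.

def orgs_to_configure(all_org_ids, skip_satellite_org_id_list):
    """Return the org IDs that should still go through setup."""
    skip = {int(i) for i in skip_satellite_org_id_list}
    return [org_id for org_id in all_org_ids if org_id not in skip]

# Example: Org 2 has an expired manifest, so we skip it.
print(orgs_to_configure([1, 2], ["2"]))  # [1]
```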
I noticed as well that even after the Org 2 manifest has been corrected, a subsequent run does not appear to update receptor. Running the cloud connector playbook again corrects the certs in /etc/receptor/rh_accountnumber/, but the service still shows failures:
~~~
● receptor - Receptor Node for rh_accountnumber
Loaded: loaded (/etc/systemd/system/receptor@.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-04-05 06:11:07 EDT; 1 months 21 days ago
Main PID: 1287 (receptor)
CGroup: /system.slice/system-receptor.slice/receptor
└─1287 /usr/bin/python3 /usr/bin/receptor -c /etc/receptor/rh_accountnumber/receptor.conf -d /var/data/receptor/rh_accountnumber node
May 27 11:19:52 hostname.example.com receptor[1287]: aiohttp.client_exceptions.WSServerHandshakeError: 403, message='Invalid response status', url=URL('wss://cert.cloud.redhat.com/wss/receptor-controller/gateway')
May 27 11:19:57 hostname.example.com receptor[1287]: ERROR 2021-05-27 11:19:57,122 aa4c5445-c8a0-4170-874b-45ed889d7a40 ws ws.connect
May 27 11:19:57 hostname.example.com receptor[1287]: Traceback (most recent call last):
May 27 11:19:57 hostname.example.com receptor[1287]: File "/usr/lib/python3.6/site-packages/receptor/connection/ws.py", line 53, in connect
May 27 11:19:57 hostname.example.com receptor[1287]: proxy=proxy, proxy_auth=proxy_auth
May 27 11:19:57 hostname.example.com receptor[1287]: File "/usr/lib64/python3.6/site-packages/aiohttp/client.py", line 1012, in __aenter__
May 27 11:19:57 hostname.example.com receptor[1287]: self._resp = await self._coro
May 27 11:19:57 hostname.example.com receptor[1287]: File "/usr/lib64/python3.6/site-packages/aiohttp/client.py", line 738, in _ws_connect
May 27 11:19:57 hostname.example.com receptor[1287]: headers=resp.headers)
May 27 11:19:57 hostname.example.com receptor[1287]: aiohttp.client_exceptions.WSServerHandshakeError: 403, message='Invalid response status', url=URL('wss://cert.cloud.redhat.com/wss/receptor-controller/gateway')
~~~
First I tried removing all content in /var/data/receptor and then in /etc/receptor, but the service still failed after both. Disabling the service with `systemctl disable --now receptor` and then re-running the playbook allows the service to run without error:
~~~
● receptor - Receptor Node for rh_accountnumber
Loaded: loaded (/etc/systemd/system/receptor@.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-05-27 11:23:20 EDT; 26s ago
Main PID: 12049 (receptor)
CGroup: /system.slice/system-receptor.slice/receptor
└─12049 /usr/bin/python3 /usr/bin/receptor -c /etc/receptor/rh_accountnumber/receptor.conf -d /var/data/receptor/rh_accountnumber node
May 27 11:23:20 hostname.example.com systemd[1]: Started Receptor Node for rh_accountnumber.
~~~
Please let me know if these behaviors require a different BZ to look into, or if they are covered under the workings of this one.
Thanks!
Please file a new one. Currently the installer just dumps the certs where they should be and ensures the service is running. From systemd's point of view, the service is running even if the certs are invalid, so subsequent runs correct the certs, but the service never gets restarted.

Created bz 1986467 regarding receptor restart.

I think this shouldn't be ON_QA. Where is it fixed? RHC is going to replace Receptor, but that hasn't happened yet in snap 15.0. Is the issue fixed in RHC? If yes, RHC is not in the snap yet => no reason for ON_QA. Is the issue fixed in Receptor? If yes, it doesn't matter for 6.11, since Receptor is going to be dropped => no reason for ON_QA.

Failing this on snap 15.0 since there's not even anything to verify yet.

I believe the simplified flow through the RH cloud plugin now has a pre-flight check which should be able to discover this issue in advance, but I'll defer to Shim for details.

Upon review of our valid but aging backlog, the Satellite Team has concluded that this Bugzilla does not meet the criteria for a resolution in the near term, and we are planning to close it in a month. This message may be a repeat of a previous update, and the bug is again being considered for closure. If you have any concerns about this, please contact your Red Hat Account team. Thank you.

Thank you for your interest in Red Hat Satellite. We have evaluated this request, and while we recognize that it is a valid request, we do not expect this to be implemented in the product in the foreseeable future. This is due to other priorities for the product and not a reflection on the request itself. We are therefore closing this out as WONTFIX. If you have any concerns about this, feel free to contact your Red Hat Account Team. Thank you.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
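Since the certs get rewritten on subsequent runs but the service is never bounced, one way an installer could decide when a restart is needed is to compare file checksums before and after writing the cert. A minimal sketch, assuming hypothetical file paths and a restart hook that the real installer does not necessarily expose:

```python
import hashlib
import os
import tempfile
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file's contents, or None if it does not exist yet."""
    p = Path(path)
    return hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None

def write_cert(path, new_contents):
    """Write the cert and report whether its contents actually changed.

    A real installer would use the True result to trigger something like
    `systemctl restart receptor` (the exact unit name is an assumption here).
    """
    before = file_digest(path)
    Path(path).write_bytes(new_contents)
    return before != file_digest(path)

# Demo against a throwaway path:
cert = os.path.join(tempfile.mkdtemp(), "client.pem")
print(write_cert(cert, b"expired-cert"))    # True: first write
print(write_cert(cert, b"expired-cert"))    # False: unchanged, no restart
print(write_cert(cert, b"corrected-cert"))  # True: cert changed -> restart
```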
Cloud connector job fails with the following error:
~~~
TASK [project-receptor.satellite_receptor_installer : Identify Satellite source type] ***
fatal: [hostname.example.com]: FAILED! => {"changed": false, "connection": "close", "content": "<HTML><HEAD><TITLE>Error</TITLE></HEAD><BODY>\nAn error occurred while processing your request.<p>\nReference #52.af5dda17.1621963418.6b0fa56a\n</BODY></HTML>\n", "content_length": "176", "content_type": "text/html", "date": "Tue, 25 May 2021 17:23:38 GMT", "elapsed": 0, "expires": "Tue, 25 May 2021 17:23:38 GMT", "mime_version": "1.0", "msg": "Status code was 403 and not [200]: HTTP Error 403: Forbidden", "redirected": false, "server": "AkamaiGHost", "status": 403, "url": "https://cert.cloud.redhat.com/api/sources/v2.0/source_types?name=satellite"}
~~~

Version-Release number of selected component (if applicable):
Satellite 6.9

Steps to Reproduce:
1. Create two organizations
2. Upload a non-expired manifest to Org 1
3. Upload an expired manifest to Org 2
4. Run the cloud connector job from Org 1

Actual results:
Cloud connector fails completely

Expected results:
Since Org 1 has a valid manifest and Org 2 does not, it would be ideal if Org 1 were separated completely from Org 2 so that they succeed or fail independently of one another.

Additional info:
This error is due to the nature of the receptor setup. A cert and key are created for each organization's upload based on the account rather than the Satellite org, so it is possible to have a single /etc/receptor location that two orgs upload from. If the org lower in the processing chain (Org 2, for example) has an expired manifest, that cert and key will be used and cloud connector will fail as seen above.
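A pre-flight check like the one mentioned earlier could, in principle, catch this before setup runs by comparing each org's manifest expiry against the current time. A hedged sketch, where the org-to-expiry mapping is an assumed shape (the real plugin would read this from Candlepin/Katello, and its actual check may differ):

```python
from datetime import datetime, timezone

def expired_manifest_orgs(manifests, now=None):
    """Return the sorted org IDs whose manifest expiry date has passed.

    `manifests` maps org ID -> expiry datetime; this shape is an
    assumption for illustration only.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(org for org, valid_until in manifests.items() if valid_until < now)

manifests = {
    1: datetime(2030, 1, 1, tzinfo=timezone.utc),  # Org 1: valid manifest
    2: datetime(2020, 1, 1, tzinfo=timezone.utc),  # Org 2: expired manifest
}
# Flag Org 2 up front instead of failing the whole cloud connector run.
print(expired_manifest_orgs(manifests, now=datetime(2021, 5, 25, tzinfo=timezone.utc)))  # [2]
```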