Bug 1717469
| Field | Value |
|---|---|
| Summary | Introspection failed: Could not establish a connection to the Zaqar websocket |
| Product | Red Hat OpenStack |
| Component | python-tripleoclient |
| Version | 14.0 (Rocky) |
| Status | CLOSED CURRENTRELEASE |
| Severity | high |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Filip Hubík <fhubik> |
| Assignee | RHOS Maint <rhos-maint> |
| QA Contact | Sasha Smolyak <ssmolyak> |
| CC | apetrich, bdobreli, beth.white, ccamacho, dvd, hbrock, jslagle, jstransk, mburns, mschuppe, nchandek |
| Keywords | Reopened, Triaged |
| Clone Of | |
| | 1719265 |
| Last Closed | 2019-07-17 10:58:20 UTC |
| Type | Bug |
| Bug Depends On | 1719354 |
| Bug Blocks | 1719265 |
Description
Filip Hubík
2019-06-05 14:30:33 UTC
That seems odd. Could you verify that OS_CACERT is set in the stackrc? Something like:

export OS_CACERT="/etc/pki/ca-trust/source/anchors/cm-local-ca.pem"

Do other actions work, or is it just the "overcloud node import" that fails? Also, could you try restarting the mistral and zaqar processes to cover the low-hanging fruit?

Thanks to jschlueter's tip, I was able to work around this specific issue by downgrading to python-websocket-client-0.32.0-116.el7, though nodes are still not able to reach the manageable state so far. Upgrading back to python-websocket-client-0.56.0-1.git3c25814.el7 reproduces the error again. As for OS_CACERT, it is not set; see the attached stackrc file. Restarting any mistral or zaqar container doesn't have any effect. As for other commands, I'm not sure what is meant here, but the ironic service seems to be responding to "openstack baremetal node xyz" commands in both cases.

Created attachment 1577836
stackrc_osp14_zaqarfail

Created attachment 1577837
instackenv.json_osp14_zaqarfail
Thank you for the files and comment. Just to make sense of this: RHEL 7.6 shipped an updated python-websocket-client that is not compatible with our deployment as it stands. https://bugzilla.redhat.com/show_bug.cgi?id=1702715#c12 has a workaround for the issue until the problem is fixed, so I'm closing this bug as a duplicate of the main bug.

*** This bug has been marked as a duplicate of bug 1702715 ***

Note, when I try to:

stack@uc $ export OS_CACERT="/etc/pki/ca-trust/source/anchors/cm-local-ca.pem"
stack@uc $ openstack overcloud node import --instance-boot-option=local /home/stack/instackenv.json
Failed to discover available identity versions when contacting https://192.168.24.2:13000/. Attempting to parse version from URL.
Could not find versioned identity endpoints when attempting to authenticate. Please check that your auth_url is correct. SSL exception connecting to https://192.168.24.2:13000/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618)

What about export WEBSOCKET_CLIENT_CA_BUNDLE="/etc/pki/ca-trust/source/anchors/cm-local-ca.pem"? Could you try that, please?

Setting a more appropriate duplicate bug.

*** This bug has been marked as a duplicate of bug 1714205 ***

So the workaround works, but you have to use the rpm cert.

(undercloud) [stack@undercloud-0 ~]$ export WEBSOCKET_CLIENT_CA_BUNDLE=/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
(undercloud) [stack@undercloud-0 ~]$ openstack overcloud node import --instance-boot-option=local /home/stack/instackenv.json
Waiting for messages on queue 'tripleo' with no timeout.
Yes, the "export WEBSOCKET_CLIENT_CA_BUNDLE" trick (appended to the ~/stackrc file) works around the issue and the deployment passes (IR w/a merged: https://review.gerrithub.io/c/redhat-openstack/infrared/+/457097). However, it looks like we are still missing related changes:

1) in case this is python-websocket's fault, this BZ should be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1702715
2) in case this is tripleo's fault, I assume we should have https://review.opendev.org/#/c/633024 included (Rocky), but so far I don't see these changes on the undercloud node directly; maybe I am looking in the wrong place?

@Adriano afaik this cannot be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1714205, since that one is targeted against OSP13; for OSP14 fixes and package tracking we need a separate BZ (this one). Considering there is no development in 1) (https://bugzilla.redhat.com/show_bug.cgi?id=1702715), which is closed_errata, and we have significant development around 2), I assume this should be reopened to track the required fixes specific to OSP14 only.

Hi folks, the final issue with this BZ is tracked in [1]. The culprit was/is the cert bundle checking: the new version dropped the default certs path. The workaround is to use WEBSOCKET_CLIENT_CA_BUNDLE=/etc/pki/tls/certs/ca-bundle.crt with the correct path. An async delivery of this fix for several OSP versions is tracked there as well [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1719354
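
For anyone applying the workaround by hand, the two steps mentioned in the comments (choosing a CA bundle path that actually exists, then appending the export to the stackrc file) could be sketched roughly as follows. This is only an illustration under assumptions: pick_ca_bundle and persist_ca_bundle are hypothetical helper names, not part of python-tripleoclient or python-websocket-client, and the candidate paths are simply the ones quoted in this bug.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and exist just for illustration.

# Print the first readable file among the arguments; return 1 if none is readable.
pick_ca_bundle() {
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Append the export line to an rc file, unless such an export is already present.
persist_ca_bundle() {
    rc_file=$1
    bundle=$2
    if ! grep -q '^export WEBSOCKET_CLIENT_CA_BUNDLE=' "$rc_file" 2>/dev/null; then
        printf 'export WEBSOCKET_CLIENT_CA_BUNDLE="%s"\n' "$bundle" >> "$rc_file"
    fi
}

# Example usage (paths taken from the comments in this bug):
#   bundle=$(pick_ca_bundle /etc/pki/tls/certs/ca-bundle.crt \
#                           /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem) \
#     && persist_ca_bundle "$HOME/stackrc" "$bundle"
```

The check before appending keeps the rc file from accumulating duplicate export lines when the snippet is run more than once. Note that it does not update an existing export that points at a wrong path; that case would still need a manual edit.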