Bug 1906448 - Deploy using virtualmedia with provisioning network disabled fails - 'Failed to connect to the agent' in ironic-conductor log
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Dmitry Tantsur
QA Contact: Lubov
URL:
Whiteboard:
Depends On:
Blocks: 1975531
Reported: 2020-12-10 14:32 UTC by Lubov
Modified: 2021-11-02 05:31 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Prevent the bare metal provisioning from failing if there is a small (up to 1 hour) clock skew between the control plane and a host being deployed.
Clone Of:
Clones: 1975531
Environment:
Last Closed: 2021-02-24 15:41:51 UTC
Target Upstream Version:
Embargoed:


Attachments
ironic-conductor.log from bootstrap (2.63 MB, text/plain)
2020-12-10 14:32 UTC, Lubov
openshift_install.log (210.05 KB, text/plain)
2020-12-10 14:33 UTC, Lubov
ironic-api log (2.30 MB, application/gzip)
2020-12-10 14:58 UTC, Lubov
ironic-inspector.log (277.46 KB, text/plain)
2020-12-10 14:59 UTC, Lubov


Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 766498 0 None MERGED Generate TLS certificates with validity time in the past 2021-02-18 12:33:28 UTC
Red Hat Knowledge Base (Solution) 6137612 0 None None None 2021-06-24 08:39:36 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:42:14 UTC

Description Lubov 2020-12-10 14:32:07 UTC
Created attachment 1738220 [details]
ironic-conductor.log from bootstrap

Version:
./openshift-baremetal-install version
./openshift-baremetal-install 4.7.0-0.nightly-2020-12-09-112139
built from commit 35d7aa255a6a849aab00d60b8c406a06d25c495c
release image registry.svc.ci.openshift.org/ocp/release@sha256:235c68dd2e120be1eb65ddeb747e0a2cd241de5405b55797576e0393e618e00e

Platform:
IPI Baremetal

What happened?
Deploying with redfish-virtualmedia and provisionNetwork disabled failed on a virtualized emulation of an IPI bare metal OCP cluster.

On bootstrap, the ironic-conductor log (attached) reported many errors like:
ERROR ironic.drivers.modules.agent_client [-] Failed to connect to the agent running on node d7c322f0-0354-4008-92b4-f49fb2201001 for invoking command clean.get_clean_steps. Error: HTTPSConnectionPool(host='192.168.123.126', port=9999): Max retries exceeded with url: /v1/commands/?wait=true&agent_token=gU8ziSSacl_G14jmnW3zOxcRQ_gmt0M9Ue-3gWTiWfo (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)): requests.exceptions.SSLError: HTTPSConnectionPool(host='192.168.123.126', port=9999): Max retries exceeded with url: /v1/commands/?wait=true&agent_token=gU8ziSSacl_G14jmnW3zOxcRQ_gmt0M9Ue-3gWTiWfo (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))


What did you expect to happen?
Deploy succeeds

How to reproduce it (as minimally and precisely as possible)?
Run an OCP deploy in a disconnected environment using redfish-virtualmedia with provisionNetwork disabled

Anything else we need to know?


Comment 1 Lubov 2020-12-10 14:33:05 UTC
Created attachment 1738222 [details]
openshift_install.log

Comment 2 Lubov 2020-12-10 14:58:35 UTC
Created attachment 1738225 [details]
ironic-api log

Comment 3 Lubov 2020-12-10 14:59:43 UTC
Created attachment 1738226 [details]
ironic-inspector.log

Comment 4 Dmitry Tantsur 2020-12-10 15:18:36 UTC
You may have a clock synchronization issue. Your certificate has Not Before: Dec 10 12:11:08 2020 GMT, but according to the conductor logs the request happens at 2020-12-10 12:11:07.314. I wonder if we should allow some discrepancy until we get proper NTP support.
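
The timing in this comment can be checked with a little arithmetic. The sketch below is only an illustration that uses the two timestamps quoted above; it assumes the conductor log time is in UTC (the log line does not state a timezone), and `seconds_before_validity` is a hypothetical helper name, not part of ironic.

```python
from datetime import datetime, timezone


def seconds_before_validity(not_before: datetime, request_time: datetime) -> float:
    """Return how many seconds the request precedes the certificate's Not Before.

    A positive result means the verifier saw a certificate that, by its own
    clock, was not yet valid.
    """
    return (not_before - request_time).total_seconds()


# Values quoted in the comment above; treating the log time as UTC is an assumption.
not_before = datetime(2020, 12, 10, 12, 11, 8, tzinfo=timezone.utc)
request_time = datetime(2020, 12, 10, 12, 11, 7, 314000, tzinfo=timezone.utc)

gap = seconds_before_validity(not_before, request_time)
print(f"request precedes Not Before by {gap:.3f} s")
```

Under that assumption the request arrives less than a second before the certificate becomes valid, which is enough for verification to fail with CERTIFICATE_VERIFY_FAILED.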

Comment 6 Lubov 2020-12-14 14:03:33 UTC
After adjusting NTP on both the hypervisor and the provisioning host, the deploy on masters passed. Lowering the BZ priority to allow Dmitry's fix to land.

Comment 7 Dmitry Tantsur 2021-02-03 13:38:29 UTC
Should be available in 4.7 already. Note that the implementation only allows a clock skew of 1 hour (I'm pretty sure you'll have other issues if you have a larger clock skew).
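
The linked gerrit change is titled "Generate TLS certificates with validity time in the past". A minimal sketch of that idea follows; it is not the actual agent code. The function names, the `ALLOWED_SKEW` constant name, and the 90-day lifetime are assumptions made for illustration, and only the 1-hour tolerance comes from this comment.

```python
from datetime import datetime, timedelta, timezone

# Tolerated skew taken from the comment above; the constant name is hypothetical.
ALLOWED_SKEW = timedelta(hours=1)


def validity_window(now, lifetime=timedelta(days=90), skew=ALLOWED_SKEW):
    """Return (not_before, not_after), with the start backdated by `skew`.

    The 90-day lifetime is an arbitrary illustrative value.
    """
    return now - skew, now + lifetime


def accepted_at(window, verifier_time):
    """Check whether a verifier with the given clock sees the window as valid."""
    not_before, not_after = window
    return not_before <= verifier_time <= not_after


# Clock on the host generating the certificate (value from this bug).
agent_now = datetime(2020, 12, 10, 12, 11, 8, tzinfo=timezone.utc)
window = validity_window(agent_now)

# A verifier whose clock is 0.686 s behind now falls inside the window.
slightly_behind = agent_now - timedelta(milliseconds=686)
# A verifier two hours behind is still outside it.
far_behind = agent_now - timedelta(hours=2)

print(accepted_at(window, slightly_behind), accepted_at(window, far_behind))
```

Backdating only the start of the window keeps the change small: verifiers running slightly behind are tolerated, while a skew larger than the chosen margin still fails.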

Comment 9 Lubov 2021-02-04 12:03:00 UTC
Verified on 4.7.0-0.nightly-2021-02-04-054537

Comment 12 errata-xmlrpc 2021-02-24 15:41:51 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

