Bug 1902653
Summary: | [BM][IPI] Master deployment failed: No valid host was found. Reason: No conductor service registered which supports driver redfish for conductor group | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Yurii Prokulevych <yprokule> | |
Component: | Bare Metal Hardware Provisioning | Assignee: | Derek Higgins <derekh> | |
Bare Metal Hardware Provisioning sub component: | ironic | QA Contact: | Lubov <lshilin> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | medium | CC: | beth.white, derekh, jhou, kquinn, lshilin, mcornea, rkant, shardy, weinliu, ydalal | |
Version: | 4.7 | Keywords: | Triaged | |
Target Milestone: | --- | Flags: | derekh: needinfo- |
Target Release: | 4.7.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Previously on some systems, the installer would communicate with ironic before it was ready and fail. This is now prevented. |
Story Points: | --- | |
Clone Of: | ||||
: | 1917481 (view as bug list) | Environment: | ||
Last Closed: | 2021-02-24 15:36:33 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1917481 |
Description
Yurii Prokulevych
2020-11-30 09:48:19 UTC
Looking at the logs, I suspect we may need to make the driver check in the terraform provider (and potentially BMO) more robust: https://github.com/openshift-metal3/terraform-provider-ironic/blob/master/ironic/provider.go#L368

2020-11-30 07:39:23.468 38 DEBUG ironic.common.hash_ring [req-5004b5b9-ce6b-4016-92d7-fbc52ff23766 bootstrap-user - - - -] Finished rebuilding hash rings, available drivers are :fake-hardware, :idrac, :ipmi ring /usr/lib/python3.6/site-packages/ironic/common/hash_ring.py:61
2020-11-30 07:39:23.468 38 DEBUG ironic.api.expose [req-5004b5b9-ce6b-4016-92d7-fbc52ff23766 bootstrap-user - - - -] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver redfish for conductor group "". format_exception /usr/lib/python3.6/site-packages/ironic/api/expose.py:184

Then later we see irmc show up, but not yet redfish:

2020-11-30 07:39:23.485 39 DEBUG ironic.common.hash_ring [req-39c8210c-df22-4ce4-8087-72626a72ec50 bootstrap-user - - - -] Finished rebuilding hash rings, available drivers are :fake-hardware, :idrac, :ipmi, :irmc ring /usr/lib/python3.6/site-packages/ironic/common/hash_ring.py:61
2020-11-30 07:39:23.504 36 DEBUG ironic.common.hash_ring [req-1611e54a-6576-4b4e-8806-80df0da09e57 bootstrap-user - - - -] Finished rebuilding hash rings, available drivers are :fake-hardware, :idrac, :ipmi, :irmc ring /usr/lib/python3.6/site-packages/ironic/common/hash_ring.py:61
2020-11-30 07:39:23.512 36 DEBUG ironic.common.hash_ring [req-1611e54a-6576-4b4e-8806-80df0da09e57 bootstrap-user - - - -] Finished rebuilding hash rings, available drivers are :fake-hardware, :idrac, :ipmi, :irmc ring /usr/lib/python3.6/site-packages/ironic/common/hash_ring.py:61

In the conductor logs we see:

$ grep Loaded ironic-conductor.logs
2020-11-30 07:39:22.211 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following hardware types: ['fake-hardware', 'idrac', 'ipmi', 'irmc', 'redfish']
2020-11-30 07:39:22.213 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following network interfaces: ['flat', 'noop']
2020-11-30 07:39:22.213 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following storage interfaces: ['cinder', 'noop']
2020-11-30 07:39:22.280 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following vendor interfaces: ['fake', 'idrac', 'ipmitool', 'no-vendor']
2020-11-30 07:39:22.287 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following management interfaces: ['fake', 'idrac', 'idrac-redfish', 'ipmitool', 'irmc', 'redfish']
2020-11-30 07:39:22.292 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following raid interfaces: ['agent', 'fake', 'idrac-wsman', 'irmc', 'no-raid']
2020-11-30 07:39:22.296 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following bios interfaces: ['idrac-redfish', 'idrac-wsman', 'irmc', 'no-bios', 'redfish']
2020-11-30 07:39:22.296 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following rescue interfaces: ['no-rescue']
2020-11-30 07:39:22.298 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following deploy interfaces: ['direct', 'fake']
2020-11-30 07:39:22.299 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following power interfaces: ['fake', 'idrac', 'idrac-redfish', 'ipmitool', 'irmc', 'redfish']
2020-11-30 07:39:22.304 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following boot interfaces: ['fake', 'idrac-redfish-virtual-media', 'ipxe', 'pxe', 'redfish-virtual-media']
2020-11-30 07:39:22.304 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following console interfaces: ['no-console']
2020-11-30 07:39:22.306 1 INFO ironic.common.driver_factory [req-375c4856-d9eb-41d8-a340-fff0091e084c - - - - -] Loaded the following inspect interfaces: ['fake', 'idrac', 'inspector', 'irmc', 'redfish']

So I think this is racy behavior: we need to pass the full expected list into the terraform provider, or find some way of having ironic tell us what the expected/configured interfaces are.

OK, this was discussed; also note the previous discussion at https://github.com/openshift/installer/issues/2880#issuecomment-572547395

There seem to be two issues:

1. The Ironic code loading the drivers isn't atomic, so we can get different results on each API call before we get the final list of available drivers.
2. The terraform-provider-ironic (and BMO) code just expects any drivers to be present, so we get a potential false positive when we get the partially populated driver list from the API.

It sounds like there may be fixes possible on the Ironic side, which would resolve (1), in which case we may no longer need to fix (2), so moving back to the Ironic component for discussion around that.

Please consider that the same failure occurs on 4.4 and 4.5 as well, so the fix will be required on all versions.

Verified on 4.7.0-0.nightly-2021-01-07-034013. Ran the deployment a few times; the issue did not reproduce. Will re-open if it happens again.

(In reply to yigal dalal from comment #5)
> Please consider that same failure occurs on 4.4, 4.5 as well so the fix will
> be required on all versions

I've cloned this to 4.6 here: https://bugzilla.redhat.com/show_bug.cgi?id=1917481

Does it happen frequently in 4.5/4.4? IIRC it used to just be a problem on systems where the bootstrap VM is hosted in a VM.
@derekh Yes, it occurred on upgrade from 4.4 with this build: registry.ci.openshift.org/ocp/release:4.4.32

(In reply to yigal dalal from comment #9)
> @derekh
> Yes it occurred on upgrade from 4.4 with this build:
> registry.ci.openshift.org/ocp/release:4.4.32

OK, it's yet to merge into 4.6; if it does, I can backport it further.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
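
For readers following the second issue raised in the comments (a readiness check that accepts any non-empty driver list), the following is a minimal Python sketch of the stricter alternative that was suggested: wait until every expected driver is reported. It is not the fix that shipped, and it is not code from terraform-provider-ironic or BMO. The function name `wait_for_drivers` and the injected `list_drivers` callable are hypothetical stand-ins for a call to the Ironic drivers API. The first two snapshots mirror the hash ring log lines in this report; the third assumes the API eventually reports the full set of hardware types loaded by the conductor.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_drivers(
    list_drivers: Callable[[], Iterable[str]],
    expected: Iterable[str],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Poll list_drivers() until every expected driver is reported.

    list_drivers is a hypothetical stand-in for querying the Ironic API;
    it is injected so the sketch runs without a server.
    """
    wanted = set(expected)
    deadline = time.monotonic() + timeout
    while True:
        available = set(list_drivers())
        missing = wanted - available
        if not missing:
            # Every expected driver is present, not merely "some" driver.
            return available
        if time.monotonic() >= deadline:
            raise TimeoutError(f"drivers still missing: {sorted(missing)}")
        sleep(interval)


# Partially populated lists as seen in the hash ring logs, followed by
# an assumed final list that includes redfish.
snapshots = iter([
    ["fake-hardware", "idrac", "ipmi"],
    ["fake-hardware", "idrac", "ipmi", "irmc"],
    ["fake-hardware", "idrac", "ipmi", "irmc", "redfish"],
])
ready = wait_for_drivers(
    lambda: next(snapshots), ["redfish"], sleep=lambda _: None
)
```

A check of the form "is the driver list non-empty" would have passed on the first snapshot, which is the false positive described in the comments; checking for the expected names only passes on the third.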