Bug 1890256
| Summary: | Replacing a master node on a baremetal IPI deployment gets stuck when deleting the machine of the unhealthy member | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Marius Cornea <mcornea> |
| Component: | Bare Metal Hardware Provisioning | Assignee: | Zane Bitter <zbitter> |
| Bare Metal Hardware Provisioning sub component: | baremetal-operator | QA Contact: | Lubov <lshilin> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | high | CC: | beth.white, dhellmann, kiran, lshilin, stbenjam, yhe, zbitter |
| Version: | 4.6 | Keywords: | Triaged |
| Target Milestone: | --- | Flags: | lshilin: needinfo- |
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-24 15:27:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Marius Cornea
2020-10-21 18:37:56 UTC
Even after annotating the machine with:

oc -n openshift-machine-api annotate machine ocp-edge-cluster-0-gbw9p-master-0 machine.openshift.io/exclude-node-draining=

deleting the machine still fails with:

2020/10/21 18:57:03 Deleting machine ocp-edge-cluster-0-gbw9p-master-0 .
2020/10/21 18:57:03 deleting machine ocp-edge-cluster-0-gbw9p-master-0 using host openshift-master-0-0
2020/10/21 18:57:03 waiting for host openshift-master-0-0 to be deprovisioned
E1021 18:57:03.945743 1 controller.go:231] ocp-edge-cluster-0-gbw9p-master-0: failed to delete machine: requeue in: 30s
I1021 18:57:03.945759 1 controller.go:406] Actuator returned requeue-after error: requeue in: 30s

In the baremetal-operator log we can see:
{"level":"info","ts":1603306563.9395769,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"openshift-machine-api","Request.Name":"openshift-master-0-0"}
{"level":"info","ts":1603306563.9396687,"logger":"baremetalhost_ironic","msg":"verifying ironic provisioner dependencies","host":"openshift-master-0-0"}
{"level":"info","ts":1603306564.0244021,"logger":"baremetalhost_ironic","msg":"found existing node by ID","host":"openshift-master-0-0"}
{"level":"error","ts":1603306564.0245044,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"metal3-baremetalhost-controller","request":"openshift-machine-api/openshift-master-0-0","error":"action \"externally provisioned\" failed: Invalid state for adopt: enroll","errorVerbose":"Invalid state for adopt: enroll\naction \"externally provisioned\" failed\ngithub.com/metal3-io/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:307\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1374","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
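For triage, the structured baremetal-operator records above can be reduced to just the reconcile errors. The following is a minimal Python sketch, assuming each record sits on a single line as valid JSON. The helper name `reconcile_errors` is made up for illustration, and the two sample records are abbreviated copies of the log lines above (the `errorVerbose` and `stacktrace` fields are left out).

```python
import json


def reconcile_errors(lines):
    """Yield (request, error) pairs for every error-level JSON log record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Wrapped or truncated records are skipped rather than guessed at.
            continue
        if entry.get("level") == "error":
            yield entry.get("request"), entry.get("error")


# Abbreviated copies of two records from the log excerpt (assumption: one record per line).
sample = [
    r'{"level":"info","ts":1603306564.0244021,"logger":"baremetalhost_ironic","msg":"found existing node by ID","host":"openshift-master-0-0"}',
    r'{"level":"error","ts":1603306564.0245044,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"metal3-baremetalhost-controller","request":"openshift-machine-api/openshift-master-0-0","error":"action \"externally provisioned\" failed: Invalid state for adopt: enroll"}',
]

for request, error in reconcile_errors(sample):
    print(f"{request}: {error}")
```

Run against the excerpt, this leaves only the adopt failure for openshift-master-0-0, which is the record that matters for this bug.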
After manually deleting the openshift-master-0-0 BMH the machine was finally deleted:

oc -n openshift-machine-api delete bmh openshift-master-0-0

Removing TestBlocker since I was able to move forward after annotating the machine with exclude-node-draining and removing the BMH.

The logs show all of the Hosts being registered into an empty Ironic right at the beginning. I'm inferring that the reason for this is that the metal3 pod was running on master-0 before it was powered down, and was rescheduled on another node afterwards.

The root cause of the problem is that Ironic is not able to connect to the BMC of the master-0 machine right from the beginning:

020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager [req-35cd7821-0765-466e-b767-5ce56b8bc603 ironic-user - - - -] Failed to get power state for node 1c121c00-74dc-4b72-b8c5-56a14bd06bae. Error: HTTP GET https://192.168.123.1:8000/redfish/v1/Systems/7691867f-577d-4134-be10-87054201595f returned code 500. Base.1.0.GeneralError: Domain not found: no domain with matching uuid '7691867f-577d-4134-be10-87054201595f' (master-0-0): sushy.exceptions.ServerSideError: HTTP GET https://192.168.123.1:8000/redfish/v1/Systems/7691867f-577d-4134-be10-87054201595f returned code 500.
Base.1.0.GeneralError: Domain not found: no domain with matching uuid '7691867f-577d-4134-be10-87054201595f' (master-0-0)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager Traceback (most recent call last):
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/ironic/conductor/manager.py", line 1240, in _do_node_verify
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager power_state = task.driver.power.get_power_state(task)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/power.py", line 112, in get_power_state
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager system = redfish_utils.get_system(task.node)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/utils.py", line 289, in get_system
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager return _get_system()
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/retrying.py", line 68, in wrapped_f
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager return Retrying(*dargs, **dkw).call(f, *args, **kw)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/retrying.py", line 223, in call
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager return attempt.get(self._wrap_exception)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/retrying.py", line 261, in get
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager six.reraise(self.value[0], self.value[1], self.value[2])
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager raise value
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/retrying.py", line 217, in call
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/redfish/utils.py", line 266, in _get_system
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager return conn.get_system(system_id)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/sushy/main.py", line 251, in get_system
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager registries=self.lazy_registries)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/sushy/resources/system/system.py", line 151, in __init__
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager connector, identity, redfish_version, registries)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/sushy/resources/base.py", line 437, in __init__
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager self.refresh()
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/sushy/resources/base.py", line 472, in refresh
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager self._json = self._reader.get_json()
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/sushy/resources/base.py", line 339, in get_json
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager data = self._conn.get(path=self._path)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/sushy/connector.py", line 178, in get
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager **extra_session_req_kwargs)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/sushy/connector.py", line 118, in _op
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager exceptions.raise_for_response(method, url, response)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager File "/usr/lib/python3.6/site-packages/sushy/exceptions.py", line 161, in raise_for_response
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager raise ServerSideError(method, url, response)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager sushy.exceptions.ServerSideError: HTTP GET https://192.168.123.1:8000/redfish/v1/Systems/7691867f-577d-4134-be10-87054201595f returned code 500.
Base.1.0.GeneralError: Domain not found: no domain with matching uuid '7691867f-577d-4134-be10-87054201595f' (master-0-0)
2020-10-21T18:13:58.483705606Z 2020-10-21 18:13:58.471 1 ERROR ironic.conductor.manager
2020-10-21T18:13:58.484195339Z 2020-10-21 18:13:58.483 1 DEBUG ironic.common.states [req-35cd7821-0765-466e-b767-5ce56b8bc603 ironic-user - - - -] Exiting old state 'verifying' in response to event 'fail' on_exit /usr/lib/python3.6/site-packages/ironic/common/states.py:295
2020-10-21T18:13:58.484498112Z 2020-10-21 18:13:58.484 1 DEBUG ironic.common.states [req-35cd7821-0765-466e-b767-5ce56b8bc603 ironic-user - - - -] Entering new state 'enroll' in response to event 'fail' on_enter /usr/lib/python3.6/site-packages/ironic/common/states.py:301

How did you power off the Host? I can't help but imagine that this is related.

In any event, the baremetal-operator does not handle this well - it decides that registration is complete (since the Node exists) even though it has failed, and proceeds to trying to Adopt the node - a prerequisite for which is that registration has succeeded. It never gets past this and continues trying and failing to adopt forever.

Once we get past Node draining, the Machine will not allow itself to be deleted until it has powered the Host down. In this case Ironic cannot read the power state, and in fact the BMO is not able to even run the code to manage it anyway, so it will never complete.

(In reply to Zane Bitter from comment #6)
> How did you power off the Host? I can't help but imagine that this is
> related.

I used virsh destroy and virsh undefine on the hypervisor which explains the 'Domain not found: no domain with matching uuid ' message as there isn't any VM with that UUID existing anymore.

(In reply to Marius Cornea from comment #7)
> I used virsh destroy and virsh undefine on the hypervisor which explains the
> 'Domain not found: no domain with matching uuid ' message as there isn't any
> VM with that UUID existing anymore.
Yep, that explains everything then :) I guess you used that method so that Ironic couldn't turn it back on again. But it's going to systematically cause this problem, since we'll never be able to read the power state.

Current behaviour of the Machine is that it will not allow itself to be deleted before it has verified that the Host power is off. So if a physical server goes away then the only way to get rid of the Machine is to delete the Host corresponding to that physical server. I'd argue that this part is correct behaviour.

The fixable part is fixed upstream and incorporated downstream by https://github.com/openshift/baremetal-operator/pull/113

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633 |
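The adopt loop described in comment 6 can be illustrated with a small model. This is only a sketch under stated assumptions, written in Python for brevity: it is not the baremetal-operator code and not the content of the linked pull request. The names `ProvisionState`, `verify`, `can_adopt` and `next_action` are hypothetical, and the `check_state` flag stands in for "look at the node's provisioning state before treating registration as done". The state names come from the Ironic log above; that adoption is only accepted from `manageable` is stated here as an assumption about Ironic's state machine.

```python
from enum import Enum


class ProvisionState(Enum):
    ENROLL = "enroll"
    VERIFYING = "verifying"
    MANAGEABLE = "manageable"


def verify(bmc_reachable):
    """Model of the verify step: on failure the node falls back to 'enroll'."""
    return ProvisionState.MANAGEABLE if bmc_reachable else ProvisionState.ENROLL


def can_adopt(state):
    """Assumption: adoption is only accepted for a node in 'manageable'."""
    return state is ProvisionState.MANAGEABLE


def next_action(node_exists, state, check_state):
    """Pick the operator's next step for an externally provisioned host.

    check_state=False mirrors the behaviour described in comment 6:
    the mere existence of the node counts as a completed registration.
    """
    if not node_exists:
        return "register"
    if check_state and state is ProvisionState.ENROLL:
        return "retry-registration"
    return "adopt"


# The VM was undefined, so the BMC query fails and the node stays in 'enroll'.
state = verify(bmc_reachable=False)

# Existence-only check: the operator tries to adopt, which cannot succeed.
action = next_action(node_exists=True, state=state, check_state=False)
print(action, "allowed" if can_adopt(state) else "rejected: invalid state for adopt")

# With a state check, the failed registration is retried instead.
print(next_action(node_exists=True, state=state, check_state=True))
```

In this model the existence-only check picks `adopt` on every reconcile while the node never leaves `enroll`, which matches the repeated "Invalid state for adopt: enroll" error in the operator log.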