Bug 1740956

Summary: Ironic nodes registered to the bmo Ironic are lost after rebooting the master node where the bmo pod is running
Product: Kubernetes-native Infrastructure
Reporter: Marius Cornea <mcornea>
Component: Deployment
Assignee: Angus Thomas <athomas>
Status: CLOSED WONTFIX
QA Contact: Arik Chernetsky <achernet>
Severity: urgent
Docs Contact:
Priority: urgent
Version: unspecified
CC: augol, dhellmann, kni-bugs, mlammon, ncredi, yprokule
Target Milestone: ---
Keywords: Reopened
Target Release: 1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-16 11:49:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Marius Cornea 2019-08-14 01:15:43 UTC
Description of problem:

Ironic nodes registered to the bmo Ironic are lost after rebooting the master node where the bmo pod is running.

Steps to Reproduce:

[cloud-user@rhhi-node-worker-0 dev-scripts]$ export OS_URL=http://172.22.0.3:6385
[cloud-user@rhhi-node-worker-0 dev-scripts]$ export OS_TOKEN=fake-token
[cloud-user@rhhi-node-worker-0 dev-scripts]$ openstack baremetal node list
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name               | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| 56904f3a-88da-4ea9-b237-155216d12d9d | openshift-master-0 | None          | power on    | adopt failed       | False       |
| 21466e2a-a0c5-4a0e-ae4b-788e02c4fa0c | openshift-master-1 | None          | power on    | adopt failed       | False       |
| 1b48b778-d341-4e21-ad9e-a49ae9619ed1 | openshift-master-2 | None          | power on    | adopt failed       | False       |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+


[cloud-user@rhhi-node-worker-0 dev-scripts]$ oc -n openshift-machine-api get pods/metal3-baremetal-operator-7469cd5f89-5h2fs -o yaml | grep master-
  nodeName: rhhi-node-master-1


Power off the master node where the baremetal-operator pod is running:
openstack baremetal node power off openshift-master-1 # run against the provisioning host's Ironic


Wait for the node to go offline:
[cloud-user@rhhi-node-worker-0 ~]$ oc get nodes
NAME                 STATUS     ROLES           AGE   VERSION
rhhi-node-master-0   Ready      master,worker   82m   v1.14.0+739670a83
rhhi-node-master-1   NotReady   master,worker   82m   v1.14.0+739670a83
rhhi-node-master-2   Ready      master,worker   82m   v1.14.0+739670a83


Power the node back on:
openstack baremetal node power on openshift-master-1 # run against the provisioning host's Ironic

Wait for the node to go online:
[cloud-user@rhhi-node-worker-0 ~]$ oc get nodes
NAME                 STATUS   ROLES           AGE   VERSION
rhhi-node-master-0   Ready    master,worker   84m   v1.14.0+739670a83
rhhi-node-master-1   Ready    master,worker   84m   v1.14.0+739670a83
rhhi-node-master-2   Ready    master,worker   84m   v1.14.0+739670a83

Check baremetal nodes list:

[cloud-user@rhhi-node-worker-0 ~]$ export OS_URL=http://172.22.0.3:6385
[cloud-user@rhhi-node-worker-0 ~]$ export OS_TOKEN=fake-token
[cloud-user@rhhi-node-worker-0 ~]$ openstack baremetal node list
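
The steps above can be sketched as a few shell helpers for repeating the check. This is a minimal sketch, not part of the original report: the function names (bmo_node, wait_for_node_status, ironic_node_count) are hypothetical, the retry count and delay defaults are arbitrary assumptions, and it assumes `oc` and `openstack` are on the PATH with OS_URL and OS_TOKEN exported as shown earlier.

```shell
# Hypothetical helpers for the reproduction steps; names and defaults are assumptions.

# Print the node that hosts the given baremetal-operator pod.
# Arg 1: pod name in the openshift-machine-api namespace.
bmo_node() {
  oc -n openshift-machine-api get "pods/$1" -o jsonpath='{.spec.nodeName}'
}

# Poll `oc get nodes` until the node's STATUS column equals the wanted value.
# Args: node name, wanted status (Ready or NotReady), max attempts, delay in seconds.
wait_for_node_status() {
  local node="$1" want="$2" tries="${3:-60}" delay="${4:-10}" i=0 status
  while [ "$i" -lt "$tries" ]; do
    status=$(oc get nodes --no-headers | awk -v n="$node" '$1 == n {print $2}')
    [ "$status" = "$want" ] && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Count the nodes registered in the Ironic instance that OS_URL points at.
# grep -c exits non-zero when the count is 0, so its status is ignored.
ironic_node_count() {
  openstack baremetal node list -f value -c UUID | grep -c . || true
}

# Example flow (commented out; the power off/on commands are run separately
# against the provisioning host's Ironic, as in the steps above):
#   before=$(ironic_node_count)
#   wait_for_node_status rhhi-node-master-1 NotReady
#   wait_for_node_status rhhi-node-master-1 Ready
#   after=$(ironic_node_count)
#   [ "$before" = "$after" ] || echo "Ironic nodes lost: $before -> $after"
```

Comparing the node count before and after the reboot avoids depending on the exact table output of `openstack baremetal node list`.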

Version-Release number of selected component (if applicable):

How reproducible:
100%

Actual results:
Ironic nodes which were registered during initial deployment are lost after rebooting the master node where the bmo pod is running.

Expected results:
The Ironic node details persist across reboots.

Additional info:

Comment 1 Doug Hellmann 2019-08-23 22:54:59 UTC
https://github.com/metal3-io/baremetal-operator/pull/278 should resolve this

Comment 2 Nelly Credi 2019-08-25 13:01:37 UTC
If you feel that this bug is fixed, please set the 'Fixed In Version' field and let QE verify it.
Since I see you have a PR, I'm setting it to POST.