Version: openshift-install 4.10.0-0.nightly-2021-12-23-153012
Platform: openstack IPI

What happened?
Installation fails. Worker nodes are provisioned but don't become ready.

$ oc --namespace=openshift-machine-api get deployments
NAME                          READY   UP-TO-DATE   AVAILABLE   AGE
cluster-autoscaler-operator   1/1     1            1           52m
cluster-baremetal-operator    1/1     1            1           52m
machine-api-controllers       1/1     1            1           44m
machine-api-operator          1/1     1            1           52m

$ oc get nodes
NAME                    STATUS   ROLES    AGE   VERSION
ostest-ptlx5-master-0   Ready    master   51m   v1.22.1+6859754
ostest-ptlx5-master-1   Ready    master   51m   v1.22.1+6859754
ostest-ptlx5-master-2   Ready    master   51m   v1.22.1+6859754

From machine api controller log:
I1229 18:28:52.359841 1 controller.go:175] ostest-ptlx5-worker-0-cvd7j: reconciling Machine
I1229 18:28:53.855465 1 controller.go:298] ostest-ptlx5-worker-0-cvd7j: reconciling machine triggers idempotent update
E1229 18:28:55.558392 1 controller.go:300] ostest-ptlx5-worker-0-cvd7j: error updating machine: Operation cannot be fulfilled on machines.machine.openshift.io "ostest-ptlx5-worker-0-cvd7j": the object has been modified; please apply your changes to the latest version and try again, retrying in 30s seconds
Deployment is working in CI:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.10-e2e-openstack-ovn/1478671317682098176

But we have events that repeat pathologically:

event happened 181 times, something is wrong: ns/openshift-ovn-kubernetes pod/ovnkube-master-5phk2 node/7v8ww7v1-a4db4-cbqbx-master-2 - reason/BackOff Back-off restarting failed container

This looks different from what you have, but it needs more investigation.
Can we get the install-config.yaml and the OpenStack logs for this deployment? The former should allow us to reproduce this issue, and the latter should provide a little more information on what was configured.
We investigated this. The worker node had two default routes with an identical metric:

default via 10.196.0.1 dev br-ex proto dhcp metric 100
default via 172.17.5.1 dev ens4 proto dhcp metric 100

After we deleted the second of these, things worked. This means this is a duplicate of bug 2035326, and the fix is [1]. Closing as a duplicate.

[1] https://github.com/openshift/machine-config-operator/pull/2898

*** This bug has been marked as a duplicate of bug 2035326 ***
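For anyone who wants to check a node for the same condition, here is a minimal sketch that scans routing-table text for default routes sharing a metric. It is not part of the fix in [1]. The function name `find_conflicting_default_routes` is hypothetical, and the sketch assumes the input looks like the two lines quoted above, with `via`, `dev` and `metric` each followed by a value.

```python
from collections import defaultdict


def _value_after(fields, key):
    """Return the token that follows `key` in a split route line, or None."""
    if key in fields:
        i = fields.index(key)
        if i + 1 < len(fields):
            return fields[i + 1]
    return None


def find_conflicting_default_routes(route_text):
    """Map each metric to its default routes, keeping only metrics used more than once.

    Hypothetical helper: it only understands lines shaped like
    'default via <gateway> dev <device> ... metric <n>'.
    """
    by_metric = defaultdict(list)
    for line in route_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue  # skip blank lines and non-default routes
        metric = _value_after(fields, "metric")
        # A route printed without an explicit metric is treated as metric 0 here.
        metric = int(metric) if metric is not None else 0
        by_metric[metric].append(
            (_value_after(fields, "dev"), _value_after(fields, "via"))
        )
    return {m: routes for m, routes in by_metric.items() if len(routes) > 1}


# The two routes reported on the worker node in this bug.
SAMPLE = """\
default via 10.196.0.1 dev br-ex proto dhcp metric 100
default via 172.17.5.1 dev ens4 proto dhcp metric 100
"""

if __name__ == "__main__":
    for metric, routes in find_conflicting_default_routes(SAMPLE).items():
        print(f"metric {metric} is shared by default routes: {routes}")
```

On a live node the input text could come from `ip route show default`. An empty result means no two default routes share a metric.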