1843597 – [IPI][OSP] Worker deleted on openstack is not recreated

Summary: [IPI][OSP] Worker deleted on openstack is not recreated

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.6.0
Assignee:	egarcia
QA Contact:	David Sanz
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1848755

Reported:	2020-06-03 15:51 UTC by David Sanz
Modified:	2020-10-27 16:05 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:04:47 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-api-provider-openstack pull 101	0	None	closed	Bug 1843597: Revendor MAO and client-go	2020-07-06 16:55:34 UTC
Red Hat Product Errata	RHBA-2020:4196	0	None	None	None	2020-10-27 16:05:18 UTC

Description David Sanz 2020-06-03 15:51:02 UTC

Description of problem:

With a MachineHealthCheck created for the worker machineset, if we delete a worker instance directly in OpenStack (openstack server delete <UUID>), the machine goes into the Failed phase and is not recreated.

$ oc get nodes,machines,machineset,MachineHealthCheck -A
NAME                               STATUS     ROLES    AGE   VERSION
node/mrnd-tst-6dlcb-master-0       Ready      master   43m   v1.18.3+a637491
node/mrnd-tst-6dlcb-master-1       Ready      master   43m   v1.18.3+a637491
node/mrnd-tst-6dlcb-master-2       Ready      master   43m   v1.18.3+a637491
node/mrnd-tst-6dlcb-worker-9wnc2   Ready      worker   30m   v1.18.3+a637491
node/mrnd-tst-6dlcb-worker-bgqgq   NotReady   worker   27m   v1.18.3+a637491

NAMESPACE               NAME                                                       PHASE     TYPE           REGION      ZONE   AGE
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-master-0       Running   ci.m1.xlarge   regionOne   nova   43m
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-master-1       Running   ci.m1.xlarge   regionOne   nova   43m
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-master-2       Running   ci.m1.xlarge   regionOne   nova   43m
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-worker-9wnc2   Running   ci.m1.xlarge   regionOne   nova   36m
openshift-machine-api   machine.machine.openshift.io/mrnd-tst-6dlcb-worker-bgqgq   Failed    ci.m1.xlarge   regionOne   nova   36m

NAMESPACE               NAME                                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   machineset.machine.openshift.io/mrnd-tst-6dlcb-worker   2         2         1       1           43m

NAMESPACE               NAME                                                             MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
openshift-machine-api   machinehealthcheck.machine.openshift.io/openstack-health-check   40%            2                  1

Related log from machine-api-controllers:

I0603 15:26:45.455332       1 controller.go:165] mrnd-tst-6dlcb-worker-bgqgq: reconciling Machine
I0603 15:26:45.497169       1 utils.go:99] Cloud provider CA cert not provided, using system trust bundle
I0603 15:26:50.751532       1 controller.go:420] mrnd-tst-6dlcb-worker-bgqgq: going into phase "Failed"
I0603 15:26:50.771680       1 controller.go:165] mrnd-tst-6dlcb-worker-bgqgq: reconciling Machine
W0603 15:26:50.771710       1 controller.go:262] mrnd-tst-6dlcb-worker-bgqgq: machine has gone "Failed" phase. It won't reconcile

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-06-03-105031

How reproducible:


Steps to Reproduce:
1. Install a cluster with IPI on OSP
2. Create a MachineHealthCheck for the worker machineset
3. Destroy a worker instance in OpenStack
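
For step 2, a MachineHealthCheck along the following lines would be consistent with the output shown in the description. This is only a sketch, not the manifest actually used in the report: the name, namespace, and maxUnhealthy value are taken from the `oc get` output above, while the selector labels, the unhealthy conditions, and the 300s timeouts are assumptions based on the usual machine.openshift.io/v1beta1 example.

```yaml
apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: openstack-health-check        # name as shown in the oc get output
  namespace: openshift-machine-api
spec:
  selector:
    matchLabels:
      # Assumed labels; adjust to the labels actually set on the worker machines.
      machine.openshift.io/cluster-api-machine-role: worker
      machine.openshift.io/cluster-api-machine-type: worker
      machine.openshift.io/cluster-api-machineset: mrnd-tst-6dlcb-worker
  unhealthyConditions:
  # Assumed conditions and timeouts; the report does not state them.
  - type: Ready
    status: "False"
    timeout: 300s
  - type: Ready
    status: Unknown
    timeout: 300s
  maxUnhealthy: 40%                   # MAXUNHEALTHY as shown in the oc get output
```

It could be applied with `oc apply -f <file>` before deleting the instance in step 3.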

Actual results:
Worker is not recreated

Expected results:
Worker is recreated by the machineset

Additional info:

Comment 2 Mike Fedosin 2020-06-22 08:36:34 UTC

The fix has been merged, so I am moving this bug to ON_QA: https://github.com/openshift/cluster-api-provider-openstack/pull/101

Comment 3 David Sanz 2020-06-22 11:32:38 UTC

Verified on 4.6.0-0.nightly-2020-06-20-011219

Comment 6 errata-xmlrpc 2020-10-27 16:04:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
