Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1791057

Summary: Scaling machineset down should check if the number of replicas matches the number of bmh
Product: OpenShift Container Platform
Reporter: Alexander Chuzhoy <sasha>
Component: Bare Metal Hardware Provisioning
Sub component: baremetal-operator
Assignee: sdasu
QA Contact: Amit Ugol <augol>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: medium
CC: agarcial, augol, beth.white, dhellmann, jerzhang, rbartal, scuppett, sgordon, stbenjam
Version: 4.4
Keywords: Triaged
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-12 16:58:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1771572

Description Alexander Chuzhoy 2020-01-14 18:42:25 UTC
Scaling machineset down should check if the number of replicas matches the number of bmh

Version:
4.4.0-0.nightly-2020-01-09-013524

Steps to reproduce:
Have 3 workers in the bmh list.

Remove a worker with: 
oc delete bmh openshift-worker-1  -n openshift-machine-api


# Check the list of bmh (note that we now have only 2 workers):
[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioned              ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true  



Scale the number of replicas to 2 (expecting no action, since we already have 2 workers):
oc scale machineset -n openshift-machine-api ocp-edge-cluster-worker-0 --replicas=2


Actual observation: a worker gets deprovisioned and provisioned again.


(.ironic) [kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       deprovisioning           ocp-edge-cluster-worker-0-5bdrv   ipmi://192.168.123.1:6233   unknown            false    
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true     

[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioning             ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true  

[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioned              ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true
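
The summary asks for exactly such a guard. A minimal sketch of the comparison (the pre-check itself is hypothetical, not existing `oc` behavior; the machineset name is taken from this report, and the offline portion reuses the worker names from the listings above):

```shell
# Against a live cluster, the two counts could be fetched like this:
#
#   replicas=$(oc get machineset ocp-edge-cluster-worker-0 \
#       -n openshift-machine-api -o jsonpath='{.spec.replicas}')
#   workers=$(oc get bmh -n openshift-machine-api --no-headers | grep -c worker)
#
# Offline illustration of the comparison, using the listing from this report:
replicas=2
workers=$(printf 'openshift-worker-0\nopenshift-worker-9\n' | grep -c 'worker')

if [ "$replicas" -eq "$workers" ]; then
    echo "replica count already matches bmh count; scaling should be a no-op"
else
    echo "mismatch: scaling will deprovision or provision hosts"
fi
```

With the state shown above (2 workers, replicas scaled to 2), such a check would conclude that no deprovision/provision cycle is needed.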

Comment 1 Yu Qi Zhang 2020-01-15 01:16:00 UTC
I think this is not MCO but rather machine-api; passing to Alberto to take a look.

Comment 2 Alberto 2020-01-15 09:06:43 UTC
How many replicas were there originally? I reckon this happens because, when you scale down, there is no guarantee that the machine whose bmh was deleted is the one being removed. Assigning to Sandhya for baremetal-specific insight.

Comment 3 Doug Hellmann 2020-01-15 18:44:22 UTC
The baremetal code uses an annotation to manage which machine is removed when the set is scaled down. https://github.com/metal3-io/metal3-docs/blob/master/design/remove-host.md
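
For reference, a hedged sketch of that flow (the annotation name `machine.openshift.io/cluster-api-delete-machine` and the machine name are assumptions based on the linked design and OpenShift's cluster-api conventions, not verified against this cluster):

```shell
# Hypothetical scale-down flow per the linked design: mark the machine whose
# BareMetalHost was removed so the machineset controller prefers deleting it,
# then scale down. Names here are illustrative.
oc annotate machine ocp-edge-cluster-worker-0-5bdrv \
    machine.openshift.io/cluster-api-delete-machine=true \
    -n openshift-machine-api

oc scale machineset ocp-edge-cluster-worker-0 --replicas=2 \
    -n openshift-machine-api
```

Without the annotation, the controller picks a victim on its own, which matches the behavior observed in this report.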

Comment 4 Beth White 2020-05-12 16:58:38 UTC

*** This bug has been marked as a duplicate of bug 1812588 ***