Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1791057

Summary: Scaling machineset down should check if the number of replicas matches the number of bmh
Product: OpenShift Container Platform
Reporter: Alexander Chuzhoy <sasha>
Component: Bare Metal Hardware Provisioning
Sub component: baremetal-operator
Assignee: sdasu
QA Contact: Amit Ugol <augol>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: medium
CC: agarcial, augol, beth.white, dhellmann, jerzhang, rbartal, scuppett, sgordon, stbenjam
Version: 4.4
Keywords: Triaged
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-12 16:58:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1771572

Description Alexander Chuzhoy 2020-01-14 18:42:25 UTC
Scaling machineset down should check if the number of replicas matches the number of bmh

Version:
4.4.0-0.nightly-2020-01-09-013524

Steps to reproduce:
Have 3 workers in the bmh list.

Remove a worker with: 
oc delete bmh openshift-worker-1  -n openshift-machine-api


# Check the list of bmh (note that we now have only 2 workers):
[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioned              ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true  



Scale the number of replicas to 2 (expecting no action, since we already have 2 workers):
oc scale machineset -n openshift-machine-api ocp-edge-cluster-worker-0 --replicas=2


Actual observation: a worker gets deprovisioned and provisioned again.


(.ironic) [kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       deprovisioning           ocp-edge-cluster-worker-0-5bdrv   ipmi://192.168.123.1:6233   unknown            false    
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true     

[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioning             ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true  

[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioned              ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true
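
The summary asks for exactly such a guard. A minimal sketch of the comparison (the pre-check itself is hypothetical, not existing `oc` behavior; the machineset name is taken from this report, and the offline portion reuses the worker names from the listings above):

```shell
# Against a live cluster, the two counts could be fetched like this:
#
#   replicas=$(oc get machineset ocp-edge-cluster-worker-0 \
#       -n openshift-machine-api -o jsonpath='{.spec.replicas}')
#   workers=$(oc get bmh -n openshift-machine-api --no-headers | grep -c worker)
#
# Offline illustration of the comparison, using the listing from this report:
replicas=2
workers=$(printf 'openshift-worker-0\nopenshift-worker-9\n' | grep -c 'worker')

if [ "$replicas" -eq "$workers" ]; then
    echo "replica count already matches bmh count; scaling should be a no-op"
else
    echo "mismatch: scaling will deprovision or provision hosts"
fi
```

With the state shown above (2 workers, replicas scaled to 2), such a check would conclude that no deprovision/provision cycle is needed.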

Comment 1 Yu Qi Zhang 2020-01-15 01:16:00 UTC
I think this is not MCO but rather machine-api; passing to Alberto to take a look.

Comment 2 Alberto 2020-01-15 09:06:43 UTC
How many replicas were there originally? I reckon this happens because, when you scale down, there is no guarantee that the machine whose bmh was deleted is the one being removed. Assigning to Sandhya for baremetal-specific insight.

Comment 3 Doug Hellmann 2020-01-15 18:44:22 UTC
The baremetal code uses an annotation to manage which machine is removed when the set is scaled down. https://github.com/metal3-io/metal3-docs/blob/master/design/remove-host.md
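
For reference, a hedged sketch of that flow (the annotation name `machine.openshift.io/cluster-api-delete-machine` and the machine name are assumptions based on the linked design and OpenShift's cluster-api conventions, not verified against this cluster):

```shell
# Hypothetical scale-down flow per the linked design: mark the machine whose
# BareMetalHost was removed so the machineset controller prefers deleting it,
# then scale down. Names here are illustrative.
oc annotate machine ocp-edge-cluster-worker-0-5bdrv \
    machine.openshift.io/cluster-api-delete-machine=true \
    -n openshift-machine-api

oc scale machineset ocp-edge-cluster-worker-0 --replicas=2 \
    -n openshift-machine-api
```

Without the annotation, the controller picks a victim on its own, which matches the behavior observed in this report.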

Comment 4 Beth White 2020-05-12 16:58:38 UTC

*** This bug has been marked as a duplicate of bug 1812588 ***