Bug 1791057 - Scaling machineset down should check if the number of replicas matches the number of bmh
Summary: Scaling machineset down should check if the number of replicas matches the number of bmh
Keywords:
Status: CLOSED DUPLICATE of bug 1812588
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: sdasu
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks: 1771572
 
Reported: 2020-01-14 18:42 UTC by Alexander Chuzhoy
Modified: 2020-05-12 16:58 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-12 16:58:38 UTC
Target Upstream Version:
Embargoed:



Description Alexander Chuzhoy 2020-01-14 18:42:25 UTC
Scaling machineset down should check if the number of replicas matches the number of bmh

Version:
4.4.0-0.nightly-2020-01-09-013524

Steps to reproduce:
Have 3 workers in the bmh list.

Remove a worker with: 
oc delete bmh openshift-worker-1  -n openshift-machine-api
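
Note that deleting the bmh alone does not change the machineset's replica count, which can be confirmed with:
oc get machineset -n openshift-machine-api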


#Check the list of bmh (note that we now have only 2 workers):
[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioned             ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true  



Scale the number of replicas to 2 (expecting no action, since we already have 2 workers):
oc scale machineset -n openshift-machine-api ocp-edge-cluster-worker-0 --replicas=2
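
For reference, the replica count and the number of worker bmh can be compared with something like (object names taken from this cluster):
oc get machineset ocp-edge-cluster-worker-0 -n openshift-machine-api -o jsonpath='{.spec.replicas}'
oc get bmh -n openshift-machine-api --no-headers | grep -c worker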


Actual observation: a worker gets deprovisioned and provisioned again.


(.ironic) [kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       deprovisioning           ocp-edge-cluster-worker-0-5bdrv   ipmi://192.168.123.1:6233   unknown            false    
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true     






[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioning             ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true  




[kni@worker-2 ~]$ oc get bmh -n openshift-machine-api
NAME                 STATUS   PROVISIONING STATUS      CONSUMER                          BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ocp-edge-cluster-master-0         ipmi://192.168.123.1:6230                      true     
openshift-master-1   OK       externally provisioned   ocp-edge-cluster-master-1         ipmi://192.168.123.1:6231                      true     
openshift-master-2   OK       externally provisioned   ocp-edge-cluster-master-2         ipmi://192.168.123.1:6232                      true     
openshift-worker-0   OK       provisioned              ocp-edge-cluster-worker-0-d2fvm   ipmi://192.168.123.1:6233   unknown            true     
openshift-worker-9   OK       provisioned              ocp-edge-cluster-worker-0-ptklp   ipmi://192.168.123.1:6239   unknown            true

Comment 1 Yu Qi Zhang 2020-01-15 01:16:00 UTC
I think this is not MCO but rather machine-api; passing to Alberto to take a look.

Comment 2 Alberto 2020-01-15 09:06:43 UTC
How many replicas were there originally? I reckon this is because when you scale down there is no guarantee that the machine whose bmh was deleted is the one being removed. Assigning to Sandhya for baremetal-specific insight.
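
For reference, the original replica count and the machines the set still owns can be checked with:
oc get machineset -n openshift-machine-api
oc get machines -n openshift-machine-api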

Comment 3 Doug Hellmann 2020-01-15 18:44:22 UTC
The baremetal code uses an annotation to manage which machine is removed when the set is scaled down. https://github.com/metal3-io/metal3-docs/blob/master/design/remove-host.md
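
A minimal sketch of that flow, assuming the cluster-api style delete-machine annotation on the Machine object (the exact annotation key and the baremetal-specific handling are described in the linked design doc):
oc annotate machine <machine-to-remove> -n openshift-machine-api machine.openshift.io/cluster-api-delete-machine=true
oc scale machineset -n openshift-machine-api ocp-edge-cluster-worker-0 --replicas=2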

Comment 4 Beth White 2020-05-12 16:58:38 UTC

*** This bug has been marked as a duplicate of bug 1812588 ***

