Bug 1826764 - Machine annotated with "machine.openshift.io/cluster-api-delete-machine: yes" is reused
Summary: Machine annotated with "machine.openshift.io/cluster-api-delete-machine: yes" is reused
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.4.z
Assignee: Zane Bitter
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On: 1816514
Blocks:
 
Reported: 2020-04-22 13:57 UTC by Yurii Prokulevych
Modified: 2023-09-14 05:55 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-14 13:43:30 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-api-provider-baremetal pull 95 0 None closed [release-4.4] Bug 1826764: Update openshift/cluster-api 2021-02-13 23:09:23 UTC

Description Yurii Prokulevych 2020-04-22 13:57:33 UTC
Description of problem:
-----------------------
Machine annotated for deletion was reused to provision another BMH

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
4.4.0-0.nightly-2020-04-18-095545



Steps to Reproduce:
-------------------
1. Deploy baremetal IPI - 3 masters + 2 workers
2. Add new BareMetalHost objects and BMC secrets for another 2 workers, for example:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  creationTimestamp: null
  name: openshift-worker-0-2
  namespace: openshift-machine-api
spec:
  bmc:
    address: redfish://192.168.123.1:8000/redfish/v1/Systems/822e8bc7-6172-4a1d-8b75-b87e64d2775b
    credentialsName: openshift-worker-0-2-bmc-secret
    disableCertificateVerification: true
  bootMACAddress: "52:54:00:54:d8:ff"
  online: false
  userData:
    name: worker-user-data
    namespace: openshift-machine-api

---
apiVersion: v1
data:
  password: cGFzc3dvcmQ=
  username: YWRtaW4=
kind: Secret
metadata:
  creationTimestamp: null
  name: openshift-worker-0-2-bmc-secret
  namespace: openshift-machine-api
type: Opaque
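Assuming the manifests above are saved to a file (the filename below is only an example), they can be created with:

    oc apply -f new-workers.yaml -n openshift-machine-api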

3. Make sure the new hosts were successfully introspected
oc get bmh -n openshift-machine-api
NAME                   STATUS   PROVISIONING STATUS      CONSUMER    BMC  HARDWARE PROFILE   ONLINE   ERROR
...
openshift-worker-0-2   OK       ready                                                        redfish://192.168.123.1:8000/redfish/v1/Systems/822e8bc7-6172-4a1d-8b75-b87e64d2775b   unknown            false
openshift-worker-0-3   OK       ready                                                        redfish://192.168.123.1:8000/redfish/v1/Systems/a24fbc47-4982-4e2f-825b-159f5fd896ea   unknown            false

4. Scale up existing machineSet
    oc scale --replicas=3 machineset/ocp-edge-cluster-0-worker-0 -n openshift-machine-api
    machineset.machine.openshift.io/ocp-edge-cluster-0-worker-0 scaled

5. Wait for the new node to be ready

oc get nodes
NAME         STATUS   ROLES    AGE    VERSION
master-0-0   Ready    master   2d2h   v1.17.1
master-0-1   Ready    master   2d2h   v1.17.1
master-0-2   Ready    master   2d2h   v1.17.1
worker-0-0   Ready    worker   2d1h   v1.17.1
worker-0-1   Ready    worker   2d1h   v1.17.1
worker-0-3   Ready    worker   10m    v1.17.1
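
As an alternative to polling `oc get nodes`, a command along these lines (node name taken from this example) waits for readiness:

    oc wait --for=condition=Ready node/worker-0-3 --timeout=30m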

6. Add the annotation `machine.openshift.io/cluster-api-delete-machine=yes` to the machine for the new worker (worker-0-3 in this example):

    oc annotate machine ocp-edge-cluster-0-worker-0-hkw8q machine.openshift.io/cluster-api-delete-machine=yes -n openshift-machine-api
machine.machine.openshift.io/ocp-edge-cluster-0-worker-0-hkw8q annotated

7. Delete the BareMetalHost object for the corresponding machine

    oc delete bmh openshift-worker-0-3 -n openshift-machine-api
    baremetalhost.metal3.io "openshift-worker-0-3" deleted

8. Because there is a spare BMH and the MachineSet still requires 3 worker nodes,
   the available BMH (openshift-worker-0-2) is immediately provisioned
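
   This can be observed by watching the hosts (a suggested check, not part of the original report):

    oc get bmh -n openshift-machine-api -w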

9. Check that no new Machine was created; instead, the one with the annotation is reused
   (after some time its details, internal IP and hostname, are updated with the new host's data):
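
   The output below can be reproduced with a command along these lines (Machine name taken from this example):

    oc describe machine ocp-edge-cluster-0-worker-0-hkw8q -n openshift-machine-api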

Name:         ocp-edge-cluster-0-worker-0-hkw8q
Namespace:    openshift-machine-api
Labels:       machine.openshift.io/cluster-api-cluster=ocp-edge-cluster-0
              machine.openshift.io/cluster-api-machine-role=worker
              machine.openshift.io/cluster-api-machine-type=worker
              machine.openshift.io/cluster-api-machineset=ocp-edge-cluster-0-worker-0
Annotations:  machine.openshift.io/cluster-api-delete-machine: yes
              metal3.io/BareMetalHost: openshift-machine-api/openshift-worker-0-2
API Version:  machine.openshift.io/v1beta1
Kind:         Machine
Metadata:
  Creation Timestamp:  2020-04-22T12:28:13Z
  Finalizers:
    machine.machine.openshift.io
  Generate Name:  ocp-edge-cluster-0-worker-0-
  Generation:     1
  Owner References:
    API Version:           machine.openshift.io/v1beta1
    Block Owner Deletion:  true
    Controller:            true
    Kind:                  MachineSet
    Name:                  ocp-edge-cluster-0-worker-0
    UID:                   6d6c65cd-f3e0-4ab7-b01c-a65e63d45a98
  Resource Version:        972278
  Self Link:               /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machines/ocp-edge-cluster-0-worker-0-hkw8q
  UID:                     66d5f3d6-a3c3-4702-9c12-43e4eaaadacd
Spec:
  Metadata:
    Creation Timestamp:  <nil>
  Provider Spec:
    Value:
      Host Selector:
      Image:
        Checksum:  http://[fd00:1101::3]:6180/images/rhcos-44.81.202003110027-0-openstack.x86_64.qcow2/rhcos-44.81.202003110027-0-compressed.x86_64.qcow2.md5sum
        URL:       http://[fd00:1101::3]:6180/images/rhcos-44.81.202003110027-0-openstack.x86_64.qcow2/rhcos-44.81.202003110027-0-compressed.x86_64.qcow2
      Metadata:
        Creation Timestamp:  <nil>
      User Data:
        Name:  worker-user-data
Status:
  Addresses:
    Address:     192.168.123.130
    Type:        InternalIP
    Address:     fd00:1101::4f
    Type:        InternalIP
    Address:     worker-0-2
    Type:        Hostname
    Address:     worker-0-2
    Type:        InternalDNS
  Last Updated:  2020-04-22T12:49:00Z
  Node Ref:
    Kind:  Node
    Name:  worker-0-3
    UID:   fc5a1b49-93be-41ce-891a-c29b7ab8e44c
Events:    <none>

10. Node worker-0-3 moves into NotReady state and still has the IP address of the previous BMH
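
    A suggested way to verify this (not part of the original report) is:

    oc get node worker-0-3 -o wide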

Actual results:
---------------
machine with annotation `machine.openshift.io/cluster-api-delete-machine=yes` is reused


Expected results:
-----------------
machine with annotation `machine.openshift.io/cluster-api-delete-machine=yes` isn't reused

Additional info:
----------------
Virtual deployment, provisioning net IPv6, baremetal net IPv4, disconnected env

Comment 1 Stephen Cuppett 2020-04-22 14:45:05 UTC
Setting target release to current development version (4.5) for investigation. Where fixes (if any) are required/requested for prior versions, cloned BZs will be created when appropriate.

Comment 2 sdasu 2020-05-19 16:57:42 UTC
Seems like this is a generic MAO issue and not specific to baremetal. Could someone from MAO please take a look?

Comment 3 Alberto 2020-05-19 17:13:28 UTC
This describes a scenario that results from bypassing machine deletion. Annotating with "machine.openshift.io/cluster-api-delete-machine: yes" just marks a machine as the highest-priority candidate to be deleted during a scale-down operation. Moving back to the baremetal team:
Please see my comment here: https://bugzilla.redhat.com/show_bug.cgi?id=1812588#c19

Comment 4 Alberto 2020-05-19 17:14:04 UTC
Can we please make the comments public so everyone coming here has as much context as possible?

Comment 5 Michael Gugino 2020-05-26 14:54:30 UTC
There are two issues here.

1) machine.openshift.io/cluster-api-delete-machine doesn't delete the machine or cause the machine to be deleted. It just informs the MachineSet that, the next time there is a scale-down operation, this machine should be prioritized over all others when selecting deletion candidates. It has no other effect and should not be used in the manner described here.

2)  Re-using an existing machine object with a different backing VM is absolutely unsupported.  If you delete an instance in the cloud (in this case, via the bare metal API), that is considered unrecoverable from the machine-api POV.  That machine-object should be deleted, not reused.  If the bare metal provider is reusing existing machines, it needs to be changed to not do this.

As far as the bare metal provider assigning a new machine object to an existing metal instance goes, that is fine, provided that instance has not previously joined the cluster as a node, e.g. if it's just a warm VM awaiting configuration or something. 1 machine == 1 VM/Host == 1 node. These mappings are immutable.
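
For illustration only (these commands are not from the original comment; the machine and MachineSet names are reused from this bug report), the supported use of the annotation is to mark a machine and then scale the MachineSet down, so the annotated machine is picked as the deletion candidate:

    oc annotate machine ocp-edge-cluster-0-worker-0-hkw8q machine.openshift.io/cluster-api-delete-machine=yes -n openshift-machine-api
    oc scale --replicas=2 machineset/ocp-edge-cluster-0-worker-0 -n openshift-machine-api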

Comment 8 Zane Bitter 2020-08-11 00:46:55 UTC
The cluster-api-delete-machine annotation appears irrelevant to the issue in question here. The Machine should be deleted, not because of the annotation (that would only have an effect if we scaled down), but because the Host it is using is deleted. That appears to have been implemented correctly even at the time of this bug report:

https://github.com/openshift/cluster-api-provider-baremetal/blob/9f1ec40287c4a08c1a9777a9bccdf94473883efd/pkg/cloud/baremetal/actuators/machine/actuator.go#L224-L259

However, the Exists() method of the actuator returns false if the Host named in the annotation cannot be found (or the annotation does not exist). When Exists() returns false, the machine controller calls the Create() method of the actuator instead of Update(), and in the Create() phase it is assumed that if the Host does not exist we should select one.

So we have only a brief window to detect that the Host is going away: after it enters the Deleting state, but before it is actually deleted. The Deleting state is generally very short, since all it has to do is delete the node from Ironic and remove the finalizer. The Machine has its own finalizer on the Host, but it removes it and requeues before deleting the Machine object itself, so there's ample opportunity for races. What likely happened is that we removed our finalizer and the Host disappeared from the API before we requeued and called Exists() again, so the Machine controller bounced back to calling Create().

My PR https://github.com/openshift/cluster-api-provider-baremetal/pull/57 would have solved this, but only by making it more likely that we'd win the race (since we would delete the Machine as soon as the Host enters the Deprovisioning state, rather than wait for the last moment).

A true fix would be to make Exists() return true if the Machine has ever been created. One way we could potentially do that is just by looking for the existence of the annotation. Another might be to look for a link between the Machine and a Node - in theory this is something the Machine controller could and should check before asking the actuator.
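
For reference (a suggested inspection command, not part of the original comment), the Machine-to-Host link discussed above is the metal3.io/BareMetalHost annotation visible in the describe output earlier, and it can be read directly with:

    oc get machine ocp-edge-cluster-0-worker-0-hkw8q -n openshift-machine-api -o jsonpath='{.metadata.annotations.metal3\.io/BareMetalHost}'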

Comment 9 Michael Gugino 2020-08-11 13:39:47 UTC
(In reply to Zane Bitter from comment #8)

I think we should set up a meeting; there are some problems with the design, if I'm understanding this correctly.

Comment 10 Zane Bitter 2020-08-11 16:25:47 UTC
Michael and I had a call and we figured out the problem. The issue is happening basically as I described above: Exists() returns false, then Create() is called instead of Update(), and the Machine grabs a new Host and starts provisioning that.

However, the tricky part is that returning false from Exists() is actually correct. The solution I suggested, having the Machine controller check the state of the Machine before relying on the actuator's Exists() method to determine whether to call Create() or Update(), was in fact already implemented in the OpenShift fork of the Cluster API 10 months ago: https://github.com/openshift/cluster-api/pull/75

So the real problem is that, at the time this bug was opened, the release-4.4 branch of CAPBM had an ancient version of the Cluster API vendored (this was already fixed in the release-4.5 branch by https://github.com/openshift/cluster-api-provider-baremetal/pull/63), even though the Machine controllers in OpenShift had already been migrated to the MAO repository 9 months ago (https://github.com/openshift/machine-api-operator/pull/441); this was fixed later in the release-4.5 branch of CAPBM by https://github.com/openshift/cluster-api-provider-baremetal/pull/69.

So this issue is fixed in 4.5. The only question is whether to fix it in 4.4. If we choose to do so, the obvious way would be to sync the vendoring with the actual Machine controller released in 4.4.

Comment 12 Zane Bitter 2020-08-11 17:13:20 UTC
Michael also mentioned that we should *not* be deleting the Machine object from the actuator, even in the case that the underlying Host is deleted. I opened bug 1868104 to track that issue.

Comment 16 Red Hat Bugzilla 2023-09-14 05:55:45 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

