Description of problem:
When an AWS Machine is registered with an NLB via MAPI, the target registration type is either IP based or instance ID based. The type is set when the target group is created, and for the control plane it is IP based due to limitations of the installer.
When we terminate a node, we first check whether the instance exists, then terminate the instance, and only THEN remove it from the load balancer if the registration is IP based (instance ID based targets are removed automatically).
If the IP based deregistration fails, we may leak the target registration, because the instance can go away before we manage to deregister it successfully.
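The ordering problem can be illustrated with a minimal sketch. This is not the Machine API code: `FakeCloud`, `delete_old` and `delete_new` are hypothetical names, and the in-memory class only stands in for EC2 and one IP based target group. Each function models a single reconcile pass and returns True when the Machine would be allowed to go away.

```python
class DeregisterError(Exception):
    """Raised when the simulated deregistration call fails."""


class FakeCloud:
    """Hypothetical in-memory stand-in for EC2 plus one IP based target group."""

    def __init__(self, instances, can_deregister=True):
        self.instances = dict(instances)             # instance ID -> private IP
        self.targets = set(self.instances.values())  # IPs registered as targets
        self.can_deregister = can_deregister

    def instance_exists(self, instance_id):
        return instance_id in self.instances

    def terminate(self, instance_id):
        self.instances.pop(instance_id, None)

    def deregister(self, ip):
        if not self.can_deregister:
            raise DeregisterError("not authorized to deregister " + ip)
        self.targets.discard(ip)


def delete_old(cloud, instance_id, ip):
    """Order described in this report: terminate first, deregister afterwards."""
    if not cloud.instance_exists(instance_id):
        return True  # instance is gone, so the target is never revisited
    cloud.terminate(instance_id)
    try:
        cloud.deregister(ip)
    except DeregisterError:
        return False  # retried later, but the instance no longer exists by then
    return True


def delete_new(cloud, instance_id, ip):
    """Order matching the expected result: deregister first, terminate on success."""
    try:
        cloud.deregister(ip)
    except DeregisterError:
        return False  # deletion is blocked and the instance is left untouched
    cloud.terminate(instance_id)
    return True
```

With `can_deregister=False`, the first call to `delete_old` fails, the second one returns True because the instance has already been terminated, and the IP stays in `targets`. `delete_new` keeps returning False until deregistration succeeds.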
Version-Release number of selected component (if applicable):
How reproducible:
100% (theoretical, haven't actually tried)
Steps to Reproduce:
1. Remove the DeregisterTargets permission from the Machine API AWS Credentials Request (CVO needs to be turned off for this)
2. Create a Machine which uses an IP based target group NLB
3. Delete the Machine. Deregistration should fail, but this doesn't block the Machine from being removed from the cluster
4. Once the Machine is gone, its IP will still be listed as a target for the NLB
Actual results:
IP address is left behind even after the Machine has gone away
Expected results:
Machine should not go away until the NLB target is successfully deregistered.
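One way to check step 4 is to compare the registered targets with the IPs of the Machines that still exist. The sketch below is only an illustration: `leaked_targets` is a hypothetical helper, and it assumes its input has the shape of an ELBv2 DescribeTargetHealth response (for example the JSON printed by `aws elbv2 describe-target-health`). The port and health states in the sample are made up.

```python
def leaked_targets(target_health_response, live_ips):
    """Return registered target IDs that do not belong to any live Machine."""
    registered = [
        description["Target"]["Id"]
        for description in target_health_response.get("TargetHealthDescriptions", [])
    ]
    live = set(live_ips)
    return [ip for ip in registered if ip not in live]


# Illustrative response: two IP targets, one of which has no Machine behind it.
sample_response = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "10.0.139.34", "Port": 6443},
         "TargetHealth": {"State": "unhealthy"}},
        {"Target": {"Id": "10.0.178.95", "Port": 6443},
         "TargetHealth": {"State": "healthy"}},
    ]
}

print(leaked_targets(sample_response, ["10.0.178.95", "10.0.222.91"]))
```

Any IP this prints is a target left behind after its Machine was removed.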
Hi Joel, I tried this on 4.10 by deleting a master that was behind an NLB. The target was deregistered successfully, so I was not able to reproduce it.
Can you suggest another way to reproduce it?
Cluster version is 4.10.0-0.nightly-2022-03-19-230512
Cluster installed with masters registered behind an NLB.
Deleted a master.
Master deleted successfully and also deregistered from the NLB.
Additional info:
[miyadav@miyadav ~]$ oc get machine -o wide
NAME                                         PHASE     TYPE         REGION      ZONE         AGE    NODE                                         PROVIDERID                              STATE
miyadav-2203-kfq7h-master-0                  Running   m6i.xlarge   us-east-2   us-east-2a   173m   ip-10-0-139-34.us-east-2.compute.internal    aws:///us-east-2a/i-0d546d8294fc09c32   running
miyadav-2203-kfq7h-master-1                  Running   m6i.xlarge   us-east-2   us-east-2b   173m   ip-10-0-178-95.us-east-2.compute.internal    aws:///us-east-2b/i-01737aec6d9a658be   running
miyadav-2203-kfq7h-master-2                  Running   m6i.xlarge   us-east-2   us-east-2c   173m   ip-10-0-222-91.us-east-2.compute.internal    aws:///us-east-2c/i-0a3e4af7b4cd8dde0   running
miyadav-2203-kfq7h-worker-us-east-2a-n4qvc   Running   m6i.large    us-east-2   us-east-2a   170m   ip-10-0-153-203.us-east-2.compute.internal   aws:///us-east-2a/i-0dea7c5ca1d3a07e0   running
miyadav-2203-kfq7h-worker-us-east-2b-6tmzt   Running   m6i.large    us-east-2   us-east-2b   170m   ip-10-0-191-98.us-east-2.compute.internal    aws:///us-east-2b/i-06f9985f81bea5732   running
miyadav-2203-kfq7h-worker-us-east-2c-s9hj9   Running   m6i.large    us-east-2   us-east-2c   170m   ip-10-0-212-182.us-east-2.compute.internal   aws:///us-east-2c/i-06c5ca9a10cf79554   running
[miyadav@miyadav ~]$ oc delete machine miyadav-2203-kfq7h-master-0
machine.machine.openshift.io "miyadav-2203-kfq7h-master-0" deleted
Deleted the 10.0.139.34 instance.
It is registered, so a deregister is needed. I will create a case for this (attaching a snap which makes it clearer). Thanks Joel.
Validated on cluster version 4.11.0-0.nightly-2022-03-20-160505
1. Create a cluster with masters behind an NLB.
Cluster created successfully.
2. Take a note of the registered targets.
3. Delete one of the registered targets.
Expected and actual - Machine deleted successfully after some time (2-3 mins).
4. Validate that the target is deregistered.
Expected and actual - Target deregistered successfully after draining.
Snaps attached to the test case in Polarion.
Moved to VERIFIED
The only way I could think of to force this would be to remove the `DeregisterTargets` permission from the Machine API credentials request (if you do this, you need to disable CVO first). Once the credentials have synced, you should see that in the old case the Machine still goes away. In the new case, the Machine will not go away until the target is removed (either via the AWS console or by fixing the permissions issue).
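As a rough sketch of the permission change, the helper below drops one action from a credentials request that has already been loaded as a Python dict (for example from `oc get ... -o json`). `without_action` is a hypothetical name, and both the `spec.providerSpec.statementEntries[].action` layout and the `elasticloadbalancing:DeregisterTargets` action string are assumptions that should be checked against the actual object before use.

```python
import copy


def without_action(credentials_request,
                   action="elasticloadbalancing:DeregisterTargets"):
    """Return a copy of the request with `action` removed from every statement."""
    patched = copy.deepcopy(credentials_request)
    # Assumed layout: spec.providerSpec.statementEntries is a list of
    # entries, each with an "action" list of IAM action names.
    entries = patched["spec"]["providerSpec"].get("statementEntries", [])
    for entry in entries:
        entry["action"] = [a for a in entry.get("action", []) if a != action]
    return patched


# Illustrative input only; a real credentials request lists many more actions.
example_request = {
    "spec": {
        "providerSpec": {
            "statementEntries": [
                {
                    "effect": "Allow",
                    "action": [
                        "elasticloadbalancing:RegisterTargets",
                        "elasticloadbalancing:DeregisterTargets",
                    ],
                    "resource": "*",
                }
            ]
        }
    }
}

print(without_action(example_request)["spec"]["providerSpec"]["statementEntries"][0]["action"])
```

The function returns a modified copy and leaves the input unchanged, so the original permissions can be restored afterwards.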
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.