Description of problem:
Deleted a master machine via the machine API on AWS. The machine is not removed from the appropriate load balancers. New machines are added to the load balancers, though.

Version-Release number of selected component (if applicable):

How reproducible:
TBD.

Steps to Reproduce:
1. Delete an existing master that was created by the installer.

Actual results:
In the EC2 console, the master is still listed as a backend for both the internal and external LBs.

Expected results:
It should not be present after deletion.

Additional info:
Master machine config:
  loadBalancers:
  - name: mgugino-deva6-zcchn-int
    type: network
  - name: mgugino-deva6-zcchn-ext
    type: network
We will need to investigate or reassign this during the next sprint. It's not clear to me at present which component is responsible for the load balancer attachment of VMs. Is this a problem with the Machine API and the way we are creating machines? As far as I was aware, we do not touch load balancer attachments.
Adding the UpcomingSprint tag; the team is still investigating this issue.
In AWS, if you register an instance with a target group using its instance ID, as is done in the Machine API provider, then you do not need to remove it: it is automatically deregistered when the instance is terminated. Hence the Machine API provider has no code to remove load balancer attachments; it assumes all instances are attached by instance ID. However, the installer registers the instances by IP address [1], so manual removal is needed instead.

There are limitations to using instance IDs, such as not being able to use certain instance types [2]:

> You cannot register instances by instance ID if they use one of the following instance types: C1, CC1, CC2, CG1, CG2, CR1, G1, G2, HI1, HS1, M1, M2, M3, or T1.

It would be good to understand whether registering by IP was a conscious decision on the installer team's side, or whether they might consider using instance IDs going forward. If there's a technical reason the installer can't use instance IDs, then the Machine API should implement deregister logic for the load balancer attachments. Given that clusters have already been installed using IP-based attachments, we may want to do this anyway to catch instances registered in this way.

[1]: https://github.com/openshift/installer/blob/c0f508287415fd3bf489b214b0132f75e3c03c9f/data/data/aws/master/main.tf#L171
[2]: https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-register-targets.html
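To make the distinction above concrete, here is a minimal Go sketch (the function name is hypothetical, not the provider's API): targets registered by instance ID are deregistered automatically by AWS on termination, while IP-based targets require an explicit DeregisterTargets call.

```go
package main

import "fmt"

// needsManualDeregistration reports whether a target registered with the
// given target group target type must be explicitly removed when its
// instance is terminated. AWS auto-deregisters "instance" targets on
// termination; "ip" targets linger until deregistered.
func needsManualDeregistration(targetType string) bool {
	return targetType == "ip"
}

func main() {
	fmt.Println(needsManualDeregistration("instance")) // false
	fmt.Println(needsManualDeregistration("ip"))       // true
}
```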
It should be simple enough to add the removal logic to the deletion flow of a machine. For existing clusters, my preference is to send out some kind of advisory to clean up anything that might be lingering from before we added this feature. The reason being, we don't know what users might have added behind this LB since creation time, and we can't go removing just anything that doesn't match a Machine object. This is especially true for a UPI cluster.
Agreed, we don't want to affect any manual changes. We should only remove a load balancer attachment if it matches either the instance ID or the IP of the Machine we are deleting.
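The matching rule above could be sketched roughly as follows in Go (a sketch only: the function name and shapes are assumptions, not the provider's actual code; the sample IDs come from the logs in this bug, and "10.0.200.1" is a hypothetical manually-added target):

```go
package main

import "fmt"

// targetsToRemove filters the registered target IDs down to only those
// that match the deleted Machine's instance ID or private IP, so any
// targets added manually behind the load balancer are left untouched.
func targetsToRemove(registered []string, instanceID, privateIP string) []string {
	var out []string
	for _, id := range registered {
		if id == instanceID || id == privateIP {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	registered := []string{"i-06828db8c6de8ac53", "10.0.145.39", "10.0.200.1"}
	// Only the deleted Machine's own identifiers are selected.
	fmt.Println(targetsToRemove(registered, "i-06828db8c6de8ac53", "10.0.145.39"))
	// prints [i-06828db8c6de8ac53 10.0.145.39]
}
```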
I think we have an agreed plan of action here; setting the target release to 4.8.
@miyadav This is not what I observed in my testing. The target group entry should start deprovisioning and eventually go away. I had to refresh the console for that change to appear, but it worked for me.
Hi Danil, I don't see that; I refreshed many times. Will share details on chat.
I'm seeing we are missing a permission for that kind of operation. This didn't come up in my testing; targets were removed from the target groups as they should be. Here is a log snippet from the QE-provided cluster:

E0326 12:16:56.376058 1 loadbalancers.go:117] Failed to unregister instance "i-06828db8c6de8ac53" from target group "arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-aws-26-sq5k5-aext/e7b92a1a0249694a": AccessDenied: User: arn:aws:iam::301721915996:user/miyadav-aws-26-sq5k5-openshift-machine-api-aws-c5bws is not authorized to perform: elasticloadbalancing:DeregisterTargets on resource: arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-aws-26-sq5k5-aext/e7b92a1a0249694a status code: 403, request id: 931605e3-1b43-457f-858b-8f896cfc33fd

E0326 12:16:56.376099 1 reconciler.go:342] miyadav-aws-26-sq5k5-master-2: Failed to register network load balancers: [arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-aws-26-sq5k5-aint/fe879ca54c0b3cc0: AccessDenied: User: arn:aws:iam::301721915996:user/miyadav-aws-26-sq5k5-openshift-machine-api-aws-c5bws is not authorized to perform: elasticloadbalancing:DeregisterTargets on resource: arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-aws-26-sq5k5-aint/fe879ca54c0b3cc0
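For reference, the denied action in the log above suggests the machine-api credentials need an IAM statement along these lines (an illustrative sketch only, not the shipped policy; the Resource is shown as a wildcard for brevity, whereas a real fix would scope it appropriately):

```json
{
  "Effect": "Allow",
  "Action": [
    "elasticloadbalancing:DeregisterTargets"
  ],
  "Resource": "*"
}
```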
Validated on: 4.8.0-0.nightly-2021-04-18-101412

Steps:
1. Keep a note of the IP of the master machine to be deleted (master-0 in this case):

[miyadav@miyadav ~]$ oc get machines -o wide
NAME                                       PHASE     TYPE        REGION      ZONE         AGE   NODE                                         PROVIDERID                              STATE
miyadav-19-88c9g-master-0                  Running   m5.xlarge   us-east-2   us-east-2a   58m   ip-10-0-145-39.us-east-2.compute.internal    aws:///us-east-2a/i-0f7c4d1e58a97e8c8   running
miyadav-19-88c9g-master-1                  Running   m5.xlarge   us-east-2   us-east-2b   58m   ip-10-0-178-16.us-east-2.compute.internal    aws:///us-east-2b/i-0050d064a1128d912   running
miyadav-19-88c9g-master-2                  Running   m5.xlarge   us-east-2   us-east-2c   58m   ip-10-0-209-36.us-east-2.compute.internal    aws:///us-east-2c/i-024b93076a9f78046   running
miyadav-19-88c9g-worker-us-east-2a-w4sz4   Running   m5.large    us-east-2   us-east-2a   52m   ip-10-0-130-210.us-east-2.compute.internal   aws:///us-east-2a/i-024bde0fdf3b3f4b8   running
miyadav-19-88c9g-worker-us-east-2b-mfnn7   Running   m5.large    us-east-2   us-east-2b   52m   ip-10-0-185-195.us-east-2.compute.internal   aws:///us-east-2b/i-099432161351bb086   running
miyadav-19-88c9g-worker-us-east-2c-zwp8v   Running   m5.large    us-east-2   us-east-2c   52m   ip-10-0-215-246.us-east-2.compute.internal   aws:///us-east-2c/i-04c30eeb00dae7231   running

Check the target groups from the AWS console; all three masters will be present.

2. Delete the machine:
oc delete machine miyadav-19-88c9g-master-0

3.
Navigate to the AWS console to check whether the IP has been removed from the target groups of both the internal and external LBs.

Actual & Expected results:
The deleted IP is removed and the other two masters are marked healthy (each target group now has two targets instead of three).

[miyadav@miyadav ~]$ aws elbv2 describe-target-groups --target-group arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-19-88c9g-sint/f1f467f77f4eb381
TARGETGROUPS  True  10  /healthz  22623  HTTPS  10  2  22623  TCP  arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-19-88c9g-sint/f1f467f77f4eb381  miyadav-19-88c9g-sint  ip  2  vpc-02382a021db4e9c2b
LOADBALANCERARNS  arn:aws:elasticloadbalancing:us-east-2:301721915996:loadbalancer/net/miyadav-19-88c9g-int/13f10ea6185c1aae
MATCHER  200-399

[miyadav@miyadav ~]$ aws elbv2 describe-target-groups --target-group arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-19-88c9g-aint/bde8eb6f9a86d180
TARGETGROUPS  True  10  /readyz  6443  HTTPS  10  2  6443  TCP  arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-19-88c9g-aint/bde8eb6f9a86d180  miyadav-19-88c9g-aint  ip  2  vpc-02382a021db4e9c2b
LOADBALANCERARNS  arn:aws:elasticloadbalancing:us-east-2:301721915996:loadbalancer/net/miyadav-19-88c9g-int/13f10ea6185c1aae
MATCHER  200-399

[miyadav@miyadav ~]$ aws elbv2 describe-target-groups --target-group arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-19-88c9g-aext/8b0575780d9cac07
TARGETGROUPS  True  10  /readyz  6443  HTTPS  10  2  6443  TCP  arn:aws:elasticloadbalancing:us-east-2:301721915996:targetgroup/miyadav-19-88c9g-aext/8b0575780d9cac07  miyadav-19-88c9g-aext  ip  2  vpc-02382a021db4e9c2b
LOADBALANCERARNS  arn:aws:elasticloadbalancing:us-east-2:301721915996:loadbalancer/net/miyadav-19-88c9g-ext/c7d5f8d030f846e4
MATCHER  200-399

Additional Info:
Moved to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438