Bug 2139477 - Tags on AWS security group for gateway node break cloud-controller LoadBalancer
Summary: Tags on AWS security group for gateway node break cloud-controller LoadBalancer
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Submariner
Version: rhacm-2.6
Hardware: All
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: rhacm-2.7
Assignee: Stephen Kitt
QA Contact: Noam Manos
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-11-02 16:19 UTC by Jason Kincl
Modified: 2023-01-31 21:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-31 21:49:34 UTC
Target Upstream Version:
Embargoed:
bot-tracker-sync: rhacm-2.7+
nyechiel: rhacm-2.7.z+



Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|--------|----|---------|----------|--------|---------|--------------|
| Github | stolostron backlog issues 26686 | 0 | None | None | None | 2022-11-02 21:02:25 UTC |
| Github | submariner-io cloud-prepare pull 446 | 0 | None | open | Don't tag Submariner SG as cluster-owned | 2022-11-02 16:50:47 UTC |

Description Jason Kincl 2022-11-02 16:19:58 UTC
**What happened**:

I am unable to create a Kubernetes Service of Type=LoadBalancer on a cluster with Submariner deployed.

```
$ oc create svc loadbalancer test-lb --tcp=80:8080
service/test-lb created
$ oc describe svc test-lb
...
Events:
  Type     Reason                  Age                 From                Message
  ----     ------                  ----                ----                -------
  Normal   EnsuringLoadBalancer    27s (x5 over 106s)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  27s (x5 over 104s)  service-controller  Error syncing load balancer: failed to ensure load balancer: Multiple tagged security groups found for instance i-0dccf1549b1a4c8b0; ensure only the k8s security group is tagged; the tagged groups were sg-0feecd047890a9c58(west-dc-fvrgk-submariner-gw-sg) sg-0cdc8e0e8f98e6a94(terraform-20220923172109149900000002)
```

If I remove the kubernetes.io/cluster/<name> tag from the submariner-gw-sg security group then everything works as expected.
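
For reference, the manual workaround can be scripted. The following is a minimal Python sketch, not taken from Submariner or the cloud provider: `remove_cluster_tag` and `cluster_tag_key` are hypothetical helpers, and `ec2_client` is assumed to behave like a boto3 EC2 client (for example one obtained from `boto3.client("ec2")`), whose `delete_tags` call removes a tag by key when no value is given.

```python
def cluster_tag_key(cluster_name):
    """Build the cluster tag key named in this report."""
    return f"kubernetes.io/cluster/{cluster_name}"


def remove_cluster_tag(ec2_client, group_id, cluster_name):
    """Remove the cluster tag from one security group (hypothetical helper).

    ec2_client is assumed to behave like a boto3 EC2 client. Passing only
    the Key asks EC2 to delete the tag whatever its value is.
    """
    key = cluster_tag_key(cluster_name)
    ec2_client.delete_tags(Resources=[group_id], Tags=[{"Key": key}])
    return key
```

Against a real account this might be called as `remove_cluster_tag(boto3.client("ec2"), "sg-0feecd047890a9c58", "<name>")`, with `<name>` replaced by the cluster name used in the tag. Note that the tag may be re-applied if the gateway is prepared again with an unfixed Submariner version.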

**What you expected to happen**:

The service should sync and an AWS load balancer should be created.

**How to reproduce it (as minimally and precisely as possible)**:

Create a LoadBalancer service on an AWS cluster that has Submariner installed.

**Anything else we need to know?**:

This appears to be a limitation of the kube cloud-controller for AWS (https://github.com/kubernetes/kubernetes/issues/73906)
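
To illustrate the limitation, here is a simplified Python model of the check behind the error message. It is a sketch under assumptions, not the cloud provider's actual code: `SecurityGroup` and `find_cluster_security_group` are hypothetical names, and a group is treated as "tagged" simply when it carries the `kubernetes.io/cluster/<name>` key.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityGroup:
    group_id: str
    name: str
    tags: dict = field(default_factory=dict)


def find_cluster_security_group(instance_id, groups, cluster_name):
    """Return the one group tagged for the cluster, or None if there is none.

    Simplified model: more than one tagged group on an instance is
    treated as ambiguous and rejected, mirroring the reported error.
    """
    tag_key = f"kubernetes.io/cluster/{cluster_name}"
    tagged = [g for g in groups if tag_key in g.tags]
    if len(tagged) > 1:
        listing = " ".join(f"{g.group_id}({g.name})" for g in tagged)
        raise ValueError(
            f"Multiple tagged security groups found for instance {instance_id}; "
            "ensure only the k8s security group is tagged; "
            f"the tagged groups were {listing}"
        )
    return tagged[0] if tagged else None
```

In this model, an instance attached to both the cluster's own security group and a tagged submariner-gw-sg group raises the error, while the same call succeeds once the tag is removed from the gateway group.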

I did some more digging into the submariner codebase and I found that back in 2020 the tag was removed from the security group for this very reason: https://github.com/submariner-io/submariner/commit/54e25267a87eb42e5610d8b47070ad98d56e1fde

However, when the code was refactored and moved to submariner-io/cloud-prepare, the tag was reintroduced: https://github.com/submariner-io/cloud-prepare/blob/devel/pkg/aws/securitygroups.go#L160

**Environment**:
- Submariner version (use `subctl version`): v0.13.1
- Kubernetes version (use `kubectl version`): v1.24.0+b62823b
- Cloud provider or hardware configuration: AWS
- OS: OpenShift 4.11.4

Comment 1 Stephen Kitt 2022-11-02 16:52:09 UTC
Thanks for the detailed investigation!

Comment 2 Maxim Babushkin 2022-12-07 09:54:48 UTC
The fix has been verified.
Load balancer resource has been created successfully.

$ oc create svc loadbalancer test-lb --tcp=80:8080
service/test-lb created

$ oc describe svc test-lb
Name:                     test-lb
Namespace:                default
Labels:                   app=test-lb
Annotations:              <none>
Selector:                 app=test-lb
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.150.3
IPs:                      172.30.150.3
LoadBalancer Ingress:     a37cd338a5f9d4cb092eb8f4c2948543-759125983.us-west-1.elb.amazonaws.com
Port:                     80-8080  80/TCP
TargetPort:               8080/TCP
NodePort:                 80-8080  30620/TCP
Endpoints:                <none>
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  30s   service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   27s   service-controller  Ensured load balancer

