Bug 2098054

Summary: The control plane should tag AWS security groups at creation
Product: OpenShift Container Platform Reporter: Wu Siu Wa <siwu>
Component: Cloud ComputeAssignee: Mike Fedosin <mfedosin>
Cloud Compute sub component: Cloud Controller Manager QA Contact: Huali Liu <huliu>
Status: CLOSED ERRATA Docs Contact: Jeana Routh <jrouth>
Severity: medium    
Priority: medium CC: haowang
Version: 4.10Keywords: ServiceDeliveryImpact
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
* With this release, AWS security groups are tagged immediately instead of after creation. This means that fewer requests are sent to AWS and the required user privileges are lowered. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2098054[*BZ#2098054*], link:https://issues.redhat.com/browse/OCPBUGS-3094[*OCPBUGS-3094*])
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-17 19:50:02 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Wu Siu Wa 2022-06-17 07:10:23 UTC
Description of problem:


Version-Release number of selected component (if applicable):
ROSA 4.10.16

How reproducible:
100%

Steps to Reproduce:
1. Create a ROSA cluster follow https://www.rosaworkshop.io/rosa/2-deploy/
2. It will create a default ingress controller[1]
3. From AWS cloudtrail, we can see a CreateSecurityGroup event, then a CreateTags event.

Actual results:
- In the CreateSecurityGroup event, there's no TagSpecification[2] in the requestParameters.
- Then there's a CreateTags event that add the tags to the security group.

Expected results:
- The CreateSecurityGroup create the security group with TagSpecification.
- No CreateTags event after that.


Additional info:

FYI, the ROSA installation will pass the below parameters to install-config:
~~~
platform:
    aws:
        experimentalPropagateUserTags: true
        region: us-east-1
        userTags:
            red-hat-clustertype: rosa
            red-hat-managed: "true"
~~~

User story:
We want to have tag-based authorization in AWS.[3]
If the control plane creates resources THEN add tags, we can not restrict the aws:RequestTag in the Create* call, and we need to open up the ec2:CreateTags actions.
The expected behavior is the control plane creates resources WITH tags, so aws:RequestTag is practical.


[1]
kind: IngressController
metadata:
  finalizers:
  - ingresscontroller.operator.openshift.io/finalizer-ingresscontroller
  name: default
  namespace: openshift-ingress-operator
spec:
  defaultCertificate:
    name: siwu-test-primary-cert-bundle-secret
  domain: apps.siwu-test.xqmr.s1.devshift.org
  httpEmptyRequestsPolicy: Respond
  nodePlacement:
    nodeSelector:
      matchLabels:
        node-role.kubernetes.io/infra: ""
    tolerations:
    - effect: NoSchedule
      key: node-role.kubernetes.io/infra
      operator: Exists
  replicas: 2


[2] https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateSecurityGroup.html
[3] https://aws.amazon.com/premiumsupport/knowledge-center/iam-policy-tags-restrict/

Comment 1 Wu Siu Wa 2022-06-17 07:16:21 UTC
Example cloudtrail:

~~~
    "eventSource": "ec2.amazonaws.com",
    "eventName": "CreateSecurityGroup",
    "awsRegion": "us-east-1",
    "userAgent": "kubernetes/v1.23.5+3afdacb aws-sdk-go/1.38.49 (go1.17.5; linux; amd64)",
    "requestParameters": {
        "groupName": "k8s-elb-a23ae183fe51c440194ceccdd4de0609",
        "groupDescription": "Security group for Kubernetes ELB a23ae183fe51c440194ceccdd4de0609 (openshift-ingress/router-default)",
        "vpcId": "vpc-xxxxxxxxxx"
    },
    "responseElements": {
        "requestId": "xxxxx",
        "_return": true,
        "groupId": "sg-011bfd5afe792f248"
    },
~~~

~~~
    "eventSource": "ec2.amazonaws.com",
    "eventName": "CreateTags",
    "awsRegion": "us-east-1",
    "userAgent": "kubernetes/v1.23.5+3afdacb aws-sdk-go/1.38.49 (go1.17.5; linux; amd64)",
    "requestParameters": {
        "resourcesSet": {
            "items": [
                {
                    "resourceId": "sg-011bfd5afe792f248"
                }
            ]
        },
        "tagSet": {
            "items": [
                {
                    "key": "red-hat-clustertype",
                    "value": "rosa"
                },
                {
                    "key": "red-hat-managed",
                    "value": "true"
                },
                {
                    "key": "kubernetes.io/cluster/siwu-test-4j7sm",
                    "value": "owned"
                }
            ]
        }
    },
    "responseElements": {
        "requestId": "xxxxx",
        "_return": true
    },
~~~

Searching the k8s code, the logic seems from:
https://github.com/kubernetes/legacy-cloud-providers/blob/3003d39fee75897d5fbf5511a20707c68094c99c/aws/aws.go#L3720
https://github.com/kubernetes/legacy-cloud-providers/blob/3003d39fee75897d5fbf5511a20707c68094c99c/aws/aws.go#L3325

Comment 2 Joel Speed 2022-06-20 09:54:03 UTC
The tagging of the security group comes from an annotation [1] set on the service object. Whichever operator is creating that service needs to set that annotation to match the value of the platform tags. As a first step, do we know if that's being done? IE are the correct tags being applied?

If they are, then great, we can look into getting that create request updated, though, please bear in mind that the legacy cloud provider is in maintenance mode and we may face some back-pressure upstream from a change such as this.

Have you tried to create a TechPreviewNoUpgrade cluster and try this feature out there? It appears that in the external cloud provider (enabled by tech preview), that the tags are on the create request [2]. If that's the case, and this is working in tech preview, we may want to look into promoting the external cloud provider to GA to avoid the potential back pressure of a change to the legacy cloud provider.

[1]: https://github.com/kubernetes/legacy-cloud-providers/blob/3003d39fee75897d5fbf5511a20707c68094c99c/aws/aws.go#L184-L188
[2]: https://github.com/kubernetes/cloud-provider-aws/blob/bb211bf9f085a86bc376ea715242f946a60d961e/pkg/providers/v1/aws_loadbalancer.go#L1002-L1006

Comment 3 Joel Speed 2022-07-04 15:38:18 UTC
Mike is going to look into implementing this within the CCM/CCM operator

Comment 4 Mike Fedosin 2022-08-11 12:51:39 UTC
This bug was fixed in the CCM with https://github.com/kubernetes/cloud-provider-aws/pull/293.

As Joel said, it's unlikely that they will allow us to modify the legacy code, so I created a carry PR for openshift/kubernetes https://github.com/openshift/kubernetes/pull/1340 that basically copies CCM's 293

Comment 11 errata-xmlrpc 2023-01-17 19:50:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399