Bug 2049872

Summary: cluster storage operator AWS credentialsrequest lacks KMS privileges
Product: OpenShift Container Platform
Reporter: Dale Bewley <dbewley>
Component: Storage
Assignee: Jonathan Dobson <jdobson>
Storage sub component: Storage
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Docs Contact: Lisa Pettyjohn <lpettyjo>
Severity: high
Priority: high
CC: adeshpan, aos-bugs, awestbro, jdobson, jsafrane, pkhaire, yunjiang
Version: 4.9
Target Milestone: ---
Target Release: 4.11.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Release Note
Doc Text:
The default credentials request for AWS has been updated to allow mounting volumes encrypted with customer-managed KMS keys. Administrators who created credentials requests manually, with the Cloud Credential Operator (CCO) in manual mode, must apply these changes themselves if they intend to mount volumes encrypted with customer-managed keys on AWS. Other administrators should not be affected by this change.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 10:46:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2072191

Description Dale Bewley 2022-02-02 20:10:48 UTC
Description of problem:

 The Cluster Storage Operator's credentials request for AWS does not include any KMS statements. As a result, provisioning of PVs encrypted with a KMS key fails, because the driver is not permitted to use the key.

Version-Release number of selected component (if applicable):

 Tested on 4.9.12.

How reproducible:

 Always

Steps to Reproduce:
1. Install into a restricted AWS environment that does not grant KMS privileges by default.
2. Create IAM roles with ccoctl from the credentials requests.
3. Create a PVC in the gp2-csi storage class (the gp2 storage class shows the same problem with a different error):

        cat <<EOF | oc create -n $PROJ -f -
        apiVersion: v1
        kind: PersistentVolumeClaim
        metadata:
          name: data-csi
        spec:
          storageClassName: "gp2-csi"
          resources:
            requests:
              storage: 512Mi
          accessModes:
            - ReadWriteOnce
        EOF
        oc set volume -n $PROJ deployment/demo --add -m /opt/app-root/src/data \
                --name=data -t persistentVolumeClaim --claim-name=data-csi
        oc describe pvc data-csi -n $PROJ

Actual results:

 A PVC in the gp2-csi class results in:

    Events:
      Type     Reason                Age                From                                                                     Message
      ----     ------                ----               ----                                                                     -------
      Normal   WaitForFirstConsumer  27s (x3 over 57s)  persistentvolume-controller                                              waiting for first consumer to be created before binding
      Warning  ProvisioningFailed    15s                ebs.csi.aws.com_ip-100-127-136-183_16213ddf-eedf-480a-84ed-116b1df1caa6  failed to provision volume with StorageClass "gp2-csi": rpc error: code = Internal desc = Could not create volume "pvc-eeb59df1-f744-46e9-b0c4-288f3c8d1bc1": failed to get an available volume in EC2: InvalidVolume.NotFound: The volume 'vol-0c94abf5d9bfe6680' does not exist.
               status code: 400, request id: 3c39958c-353a-48c2-bc75-37bd04673718
      Normal   ExternalProvisioning  12s (x2 over 19s)  persistentvolume-controller                                              waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
      Normal   Provisioning          8s (x4 over 19s)   ebs.csi.aws.com_ip-100-127-136-183_16213ddf-eedf-480a-84ed-116b1df1caa6  External provisioner is provisioning volume for claim "dale/data-1150"
      Warning  ProvisioningFailed    7s (x3 over 14s)   ebs.csi.aws.com_ip-100-127-136-183_16213ddf-eedf-480a-84ed-116b1df1caa6  failed to provision volume with StorageClass "gp2-csi": rpc error: code = AlreadyExists desc = Could not create volume "pvc-eeb59df1-f744-46e9-b0c4-288f3c8d1bc1": Parameters on this idempotent request are inconsistent with parameters used in previous request(s)


 A PVC in the gp2 class (using the in-tree driver) fails with an error that hints at the cause of the failure above:

        Events:
          Type     Reason                Age                From                         Message
          ----     ------                ----               ----                         -------
          Normal   WaitForFirstConsumer  13s (x3 over 41s)  persistentvolume-controller  waiting for first consumer to be created before binding
          Warning  ProvisioningFailed    6s                 persistentvolume-controller  Failed to provision volume with StorageClass "gp2": failed to create encrypted volume: the volume disappeared after creation, most likely due to inaccessible KMS encryption key
          Normal   WaitForPodScheduled   6s                 persistentvolume-controller  waiting for pod demo-5c75b5598f-gvpvv to be scheduled


Expected results:

 The PV is created and the PVC is bound.

 After adding the following policy statement to the IAM role used by the CSI operator pod:

        {
            "Sid": "AddKMS0",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:Encrypt",
                "kms:GenerateDataKey",
                "kms:GenerateDataKeyWithoutPlainText",
                "kms:DescribeKey"
            ],
            "Resource": "*"
        }

 The PVC then binds properly:

    Events:
      Type    Reason                 Age                From                                                                     Message
      ----    ------                 ----               ----                                                                     -------
      Normal  WaitForFirstConsumer   31s (x4 over 65s)  persistentvolume-controller                                              waiting for first consumer to be created before binding
      Normal  Provisioning           17s                ebs.csi.aws.com_ip-100-127-136-183_16213ddf-eedf-480a-84ed-116b1df1caa6  External provisioner is provisioning volume for claim "dale/data-1147"
      Normal  ExternalProvisioning   16s (x3 over 17s)  persistentvolume-controller                                              waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
      Normal  ProvisioningSucceeded  14s                ebs.csi.aws.com_ip-100-127-136-183_16213ddf-eedf-480a-84ed-116b1df1caa6  Successfully provisioned volume pvc-81ea5b8e-bbe9-4d10-9fc4-452a58e66d79
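
 Administrators who manage credentials requests manually may want to check whether an existing IAM policy already grants the KMS actions in the statement above before re-testing. The following is a minimal Python sketch and is not part of any OpenShift tooling. The function name `missing_kms_actions` is hypothetical. The check deliberately ignores Deny statements, `NotAction`, resource scoping, and conditions, so it only approximates real IAM evaluation.

```python
import fnmatch
import json

# KMS actions from the "AddKMS0" statement above.
# IAM action names are case-insensitive, so comparisons below are lowercased.
REQUIRED_KMS_ACTIONS = [
    "kms:Decrypt",
    "kms:Encrypt",
    "kms:GenerateDataKey",
    "kms:GenerateDataKeyWithoutPlainText",
    "kms:DescribeKey",
]


def missing_kms_actions(policy_json: str) -> list[str]:
    """Return the required KMS actions not granted by any Allow statement.

    Simplified sketch: Deny, NotAction, Resource scoping and Condition
    blocks are ignored, so a result of [] does not prove access works.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    granted = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        granted.extend(a.lower() for a in actions)

    # Wildcards such as "kms:*" are matched with shell-style globbing.
    return [
        req for req in REQUIRED_KMS_ACTIONS
        if not any(fnmatch.fnmatchcase(req.lower(), pat) for pat in granted)
    ]


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["ec2:CreateVolume", "kms:Decrypt"], "Resource": "*"},
        ],
    }
    print(missing_kms_actions(json.dumps(policy)))
    # → ['kms:Encrypt', 'kms:GenerateDataKey', 'kms:GenerateDataKeyWithoutPlainText', 'kms:DescribeKey']
```

 A policy containing the full "AddKMS0" statement returns an empty list. A role created from the original credentials request, which has no KMS actions, would return all five.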


Additional info:

 The credentials request lacks any KMS actions.
 https://github.com/openshift/cluster-storage-operator/blob/master/manifests/03_credentials_request_aws.yaml#L20

 Contrast this with the machine API operator's credentials request, which includes KMS actions for boot disk encryption.
 https://github.com/openshift/machine-api-operator/blob/master/install/0000_30_machine-api-operator_00_credentials-request.yaml#L45-L58
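
 When credentials requests are applied manually (for example with ccoctl, as in the steps above), the statement from the expected results has to be written as an entry in the credentials request's `statementEntries` list. The sketch below maps the IAM JSON fields to that shape. It assumes the `AWSProviderSpec` entry fields are `effect`, `action`, and `resource`. The helper `to_statement_entry` is hypothetical. This is not the upstream fix, which may grant a different set of actions.

```python
import json


def to_statement_entry(iam_statement: dict) -> dict:
    """Map one IAM policy statement to a CredentialsRequest statementEntries item.

    Assumes the AWSProviderSpec entry fields are effect/action/resource.
    "Sid" has no counterpart and is dropped; "Condition" is not handled.
    """
    actions = iam_statement["Action"]
    if isinstance(actions, str):
        actions = [actions]  # IAM allows a bare string for a single action
    resource = iam_statement["Resource"]
    if not isinstance(resource, str):
        # A statement entry carries one resource string, so lists are rejected here.
        raise ValueError('expected a single Resource string, e.g. "*"')
    return {
        "effect": iam_statement["Effect"],
        "action": list(actions),
        "resource": resource,
    }


if __name__ == "__main__":
    add_kms = {
        "Sid": "AddKMS0",
        "Effect": "Allow",
        "Action": [
            "kms:Decrypt",
            "kms:Encrypt",
            "kms:GenerateDataKey",
            "kms:GenerateDataKeyWithoutPlainText",
            "kms:DescribeKey",
        ],
        "Resource": "*",
    }
    # Print the entry; its fields can be copied as one list item under
    # spec.providerSpec.statementEntries before re-running ccoctl.
    print(json.dumps(to_statement_entry(add_kms), indent=2))
```

 Keeping the conversion mechanical makes it easy to compare the manually maintained credentials request against the IAM statement that was shown to fix provisioning.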

Comment 4 Wei Duan 2022-03-07 11:07:27 UTC
Reproduced in 4.10.2 without the fix:


$ oc get pvc
NAME    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc   Pending                                      gp2-csi-enc    56m

Events:
  Type     Reason               Age   From                                                                  Message
  ----     ------               ----  ----                                                                  -------
  Normal   WaitForPodScheduled  52m   persistentvolume-controller                                           waiting for pod mypod to be scheduled
  Warning  ProvisioningFailed   52m   ebs.csi.aws.com_ip-10-0-189-237_f9750b96-cdf4-4298-b399-f67e49be6119  failed to provision volume with StorageClass "gp2-csi-enc": rpc error: code = Internal desc = Could not create volume "pvc-047ae64e-8d1c-41d1-8140-7fc91ce1541c": failed to get an available volume in EC2: InvalidVolume.NotFound: The volume 'vol-05326553803a304c6' does not exist.
           status code: 400, request id: a642efc0-7074-4acb-b0f4-869ec913cd00
  Warning  ProvisioningFailed    26m (x14 over 52m)         ebs.csi.aws.com_ip-10-0-189-237_f9750b96-cdf4-4298-b399-f67e49be6119  failed to provision volume with StorageClass "gp2-csi-enc": rpc error: code = AlreadyExists desc = Could not create volume "pvc-047ae64e-8d1c-41d1-8140-7fc91ce1541c": Parameters on this idempotent request are inconsistent with parameters used in previous request(s)
  Normal   ExternalProvisioning  <invalid> (x228 over 52m)  persistentvolume-controller                                           waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
  Normal   Provisioning          <invalid> (x23 over 52m)   ebs.csi.aws.com_ip-10-0-189-237_f9750b96-cdf4-4298-b399-f67e49be6119  External provisioner is provisioning volume for claim "wduan/mypvc"

$ oc get sc
NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp2 (default)   kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   3h59m
gp2-csi         ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   3h58m
gp2-csi-enc     ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   58m
gp3-csi         ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   3h58m

Comment 5 Wei Duan 2022-03-07 11:41:07 UTC
Verified as passing with the fix:

$ oc get pvc
NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mypvc   Bound    pvc-097f3046-495d-4b01-90a1-f21bd001bccf   2Gi        RWO            gp2-csi-enc    8m8s

$ oc get pod
NAME    READY   STATUS    RESTARTS   AGE
mypod   1/1     Running   0          7m39s

Marked as Verified.

Comment 6 Jan Safranek 2022-03-24 09:24:30 UTC
*** Bug 2066813 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2022-08-10 10:46:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069