Bug 2074706

Summary: Custom EC2 endpoint is not considered by AWS EBS CSI driver
Product: OpenShift Container Platform
Component: Storage
Storage sub component: Storage
Reporter: Aditya Deshpande <adeshpan>
Assignee: Hemant Kumar <hekumar>
QA Contact: Penghao Wang <pewang>
CC: aos-bugs, hekumar, jsafrane, pewang
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Version: 4.9
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2022-08-10 11:06:30 UTC
Bug Blocks: 2077894

Description Aditya Deshpande 2022-04-12 20:58:10 UTC
Description of problem:
When installing OCP on AWS, custom EC2 serviceEndpoints can be configured.
Those endpoints were configured here because the cluster has no internet access and cannot reach the public EC2 endpoint. Nevertheless, a PVC provisioned through a StorageClass that uses the AWS EBS CSI driver stays in the Pending state.
~~~
# oc describe pvc test-csi9 -n python-test
Name:          test-csi9
Namespace:     python-test
StorageClass:  gp2-csi-test
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Block
Used By:       <none>
Events:
  Type     Reason              Age                  From                                                                  Message
  ----     ------              ----                 ----                                                                  -------
  Warning  ProvisioningFailed  110m (x2 over 128m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestCanceled: request context canceled
caused by: context deadline exceeded

  Normal   Provisioning        105m (x14 over 128m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  External provisioner is provisioning volume for claim "python-test/test-csi9"

  Warning  ProvisioningFailed  104m (x12 over 128m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = DeadlineExceeded desc = context deadline exceeded

  Warning  ProvisioningFailed  56m (x3 over 100m)    ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestError: send request failed
caused by: Post "https://ec2.us-east-1.amazonaws.com/": x509: certificate signed by unknown authority

  Warning  ProvisioningFailed  26m (x5 over 101m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestCanceled: request context canceled
caused by: context deadline exceeded

  Warning  ProvisioningFailed    11m (x23 over 101m)     ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7 failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   Provisioning          3m43s (x33 over 101m)   ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  External provisioner is provisioning volume for claim "python-test/test-csi9"
  Normal   ExternalProvisioning  3m16s (x514 over 128m)  persistentvolume-controller                                           waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
~~~

According to the logs of the csi-driver container in pod aws-ebs-csi-driver-controller-xxx-xxx (namespace openshift-cluster-csi-drivers), the driver is still contacting the public EC2 endpoint:
~~~
2022-03-29T19:12:01.416782663Z E0329 19:12:01.416734       1 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestError: send request failed
2022-03-29T19:12:01.416782663Z caused by: Post "https://ec2.us-east-1.amazonaws.com/": x509: certificate signed by unknown authority
~~~
 
The kube-cloud-config ConfigMap in the openshift-cluster-csi-drivers namespace contains the correct custom EC2 endpoint, as configured at installation.
(Attaching must-gather)
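
For context, custom endpoints of the kind described above are declared under platform.aws.serviceEndpoints in install-config.yaml. The fragment below is a minimal sketch, not taken from the affected cluster: the region matches the one seen in the error messages, and the URL is a hypothetical placeholder.
~~~
# install-config.yaml fragment (illustrative only)
platform:
  aws:
    region: us-east-1
    serviceEndpoints:
      - name: ec2
        # hypothetical private endpoint reachable from the disconnected cluster
        url: https://ec2.internal.example.com
~~~
With an entry like this in place, the expectation is that every in-cluster EC2 client, including the EBS CSI driver, sends its requests to that URL rather than to https://ec2.us-east-1.amazonaws.com/.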

Version-Release number of selected component (if applicable):
OCP 4.9.24

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
PV provisioning does not work when a custom EC2 endpoint is configured.

Expected results:
The PVC should be bound to a newly created PV that is provisioned through the custom EC2 endpoint.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:
As mentioned above

StorageClass Dump (if StorageClass used by PV/PVC):
# omg get sc gp2-csi-test -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: '2022-03-17T18:00:12Z'
  name: gp2-csi-test
  resourceVersion: '4012135'
  uid: baa1ba52-390b-44e8-b2e1-50dfa7d0dcbe
parameters:
  encrypted: 'true'
  type: gp2
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
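
A claim against this StorageClass that matches the describe output above might look like the following. This is a sketch for reproduction purposes only: the name, namespace, StorageClass, and volume mode come from the output, while the access mode and requested size are assumptions, since the report does not show them.
~~~
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-csi9
  namespace: python-test
spec:
  storageClassName: gp2-csi-test
  volumeMode: Block
  accessModes:
    - ReadWriteOnce      # assumption: not shown in the report
  resources:
    requests:
      storage: 1Gi       # assumption: not shown in the report
~~~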


Additional info:

Comment 5 Jan Safranek 2022-04-19 14:10:12 UTC
We need to update both the AWS EBS and AWS EFS CSI driver operators to pass the endpoint to the driver, and ensure the drivers have the necessary support for custom endpoints (EBS should be fine; support for custom endpoints in EFS is unknown).
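
One way an operator could pass the endpoint to the EBS driver is through the controller Deployment's container environment. The fragment below is only a sketch of that idea, not the change that was actually merged: it assumes the driver reads an endpoint override from the AWS_EC2_ENDPOINT environment variable, and the URL is a hypothetical placeholder.
~~~
# Controller Deployment pod spec fragment (illustrative only)
containers:
  - name: csi-driver
    env:
      # assumption: endpoint override variable honored by the EBS CSI driver
      - name: AWS_EC2_ENDPOINT
        # hypothetical value, copied by the operator from the cluster's configured ec2 serviceEndpoint
        value: https://ec2.internal.example.com
~~~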

Comment 6 Hemant Kumar 2022-04-20 19:50:38 UTC
afaict - the EFS driver does not yet support custom endpoints, so that support has to be implemented in the driver and backported. But then again - we need to figure out whether the EFS driver should provide a different mechanism for overriding EFS endpoints, because the EFS service in AWS is distinct from the EC2 service (and so are their endpoints).

Comment 18 errata-xmlrpc 2022-08-10 11:06:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069