Bug 2074706 - Custom EC2 endpoint is not considered by AWS EBS CSI driver
Summary: Custom EC2 endpoint is not considered by AWS EBS CSI driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Hemant Kumar
QA Contact: Penghao Wang
URL:
Whiteboard:
Depends On:
Blocks: 2077894
Reported: 2022-04-12 20:58 UTC by Aditya Deshpande
Modified: 2022-08-10 11:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2077894
Environment:
Last Closed: 2022-08-10 11:06:30 UTC
Target Upstream Version:
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift aws-ebs-csi-driver-operator pull 153 | 0 | None | open | Bug 2074706: Set custom endpoint environment variable if available | 2022-04-20 19:51:57 UTC
Red Hat Product Errata RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:06:50 UTC

Description Aditya Deshpande 2022-04-12 20:58:10 UTC
Description of problem:
When installing OCP on AWS, custom EC2 serviceEndpoints can be configured.
These endpoints were configured because the cluster has no internet access and cannot reach the public EC2 endpoint. Even so, a PVC provisioned through a StorageClass that uses the AWS EBS CSI driver stays in the Pending state.
~~~
# oc describe pvc test-csi9 -n python-test
Name:          test-csi9
Namespace:     python-test
StorageClass:  gp2-csi-test
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Block
Used By:       <none>
Events:
  Type     Reason              Age                  From                                                                  Message
  ----     ------              ----                 ----                                                                  -------
  Warning  ProvisioningFailed  110m (x2 over 128m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestCanceled: request context canceled
caused by: context deadline exceeded

  Normal   Provisioning        105m (x14 over 128m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  External provisioner is provisioning volume for claim "python-test/test-csi9"

  Warning  ProvisioningFailed  104m (x12 over 128m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = DeadlineExceeded desc = context deadline exceeded

  Warning  ProvisioningFailed  56m (x3 over 100m)    ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestError: send request failed
caused by: Post "https://ec2.us-east-1.amazonaws.com/": x509: certificate signed by unknown authority

  Warning  ProvisioningFailed  26m (x5 over 101m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestCanceled: request context canceled
caused by: context deadline exceeded

  Warning  ProvisioningFailed    11m (x23 over 101m)     ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7 failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   Provisioning          3m43s (x33 over 101m)   ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  External provisioner is provisioning volume for claim "python-test/test-csi9"
  Normal   ExternalProvisioning  3m16s (x514 over 128m)  persistentvolume-controller                                           waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
~~~

The logs of the csi-driver container in pod aws-ebs-csi-driver-controller-xxx-xxx (namespace openshift-cluster-csi-drivers) show the driver still calling the public EC2 endpoint:
~~~
2022-03-29T19:12:01.416782663Z E0329 19:12:01.416734       1 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestError: send request failed
2022-03-29T19:12:01.416782663Z caused by: Post "https://ec2.us-east-1.amazonaws.com/": x509: certificate signed by unknown authority
~~~
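
The URL in the log above is the public regional endpoint, which suggests that no override reached the driver. A minimal Python sketch of that fallback follows. It assumes the override is delivered through an environment variable named AWS_EC2_ENDPOINT (the name is an assumption here) and that the default follows the `ec2.<region>.amazonaws.com` pattern seen in the log. `effective_ec2_endpoint` is a hypothetical helper for illustration, not driver code.

```python
def effective_ec2_endpoint(region: str, environ: dict) -> str:
    """Pick the EC2 endpoint a client would call.

    Assumption: an override arrives through the AWS_EC2_ENDPOINT variable;
    without it, fall back to the public regional endpoint pattern.
    """
    override = environ.get("AWS_EC2_ENDPOINT", "").strip()
    if override:
        return override
    return f"https://ec2.{region}.amazonaws.com"


# Without the variable, the public endpoint from the log is used.
print(effective_ec2_endpoint("us-east-1", {}))
# With the variable set (placeholder URL), the override wins.
print(effective_ec2_endpoint(
    "us-east-1", {"AWS_EC2_ENDPOINT": "https://ec2.example.internal"}))
```

In a disconnected cluster the first case is the failing one: the request goes to an address the cluster cannot reach or whose certificate it does not trust.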
 
The kube-cloud-config ConfigMap in the openshift-cluster-csi-drivers namespace shows the correct custom EC2 endpoint, as configured at installation.
(Attaching must-gather)
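
To check which EC2 URL the ConfigMap actually carries, something like the following Python sketch could be used. It assumes the ConfigMap payload is an INI-style cloud.conf with `[ServiceOverride "N"]` sections holding `Service` and `URL` keys. That layout, the sample text, the placeholder URL, and the helper name `find_service_override` are all assumptions for illustration, so compare them against the real ConfigMap before relying on this.

```python
import configparser

# Hypothetical sample of the cloud.conf payload; section and key names are
# assumptions, and the URL is a placeholder.
SAMPLE_CLOUD_CONF = """\
[Global]
Zone = us-east-1a

[ServiceOverride "0"]
Service = ec2
Region = us-east-1
URL = https://ec2.example.internal
SigningRegion = us-east-1
"""


def find_service_override(cloud_conf: str, service: str):
    """Return the override URL for `service`, or None if none is configured."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(cloud_conf)
    for section in parser.sections():
        if not section.startswith("ServiceOverride"):
            continue
        if parser.get(section, "Service", fallback="") == service:
            return parser.get(section, "URL", fallback=None)
    return None


print(find_service_override(SAMPLE_CLOUD_CONF, "ec2"))
```

If this returns the custom URL while the driver log still shows the public endpoint, the value is present in the cluster configuration but is not being handed to the driver.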

Version-Release number of selected component (if applicable):
OCP 4.9.24

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
PV provisioning does not work when a custom EC2 endpoint is configured.

Expected results:
The PVC should be bound to a newly created PV that is provisioned through the custom EC2 endpoint.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:
As mentioned above

StorageClass Dump (if StorageClass used by PV/PVC):
# omg get sc gp2-csi-test -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: '2022-03-17T18:00:12Z'
  name: gp2-csi-test
  resourceVersion: '4012135'
  uid: baa1ba52-390b-44e8-b2e1-50dfa7d0dcbe
parameters:
  encrypted: 'true'
  type: gp2
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate


Additional info:

Comment 5 Jan Safranek 2022-04-19 14:10:12 UTC
We need to update both the AWS EBS and AWS EFS CSI driver operators to pass the endpoint to the driver, and ensure the drivers have the necessary support for it (EBS should be fine; support for custom endpoints in EFS is unknown).
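
A rough Python sketch of what "pass the endpoint to the driver" could look like on the operator side: add an environment variable to the driver container only when a custom endpoint is available. The variable name AWS_EC2_ENDPOINT, the dict-shaped container spec, and the helper `with_custom_ec2_endpoint` are assumptions for illustration; this is not the code from the linked pull request.

```python
import copy

EC2_ENDPOINT_ENV = "AWS_EC2_ENDPOINT"  # assumed variable name


def with_custom_ec2_endpoint(container: dict, endpoint_url):
    """Return a copy of a container spec with the endpoint env var set.

    The spec is left unchanged when no custom endpoint is configured.
    """
    patched = copy.deepcopy(container)
    if not endpoint_url:
        return patched
    # Drop any stale entry first so the variable appears exactly once.
    env = [e for e in patched.get("env", []) if e.get("name") != EC2_ENDPOINT_ENV]
    env.append({"name": EC2_ENDPOINT_ENV, "value": endpoint_url})
    patched["env"] = env
    return patched


driver = {"name": "csi-driver", "env": [{"name": "EXISTING", "value": "1"}]}
print(with_custom_ec2_endpoint(driver, "https://ec2.example.internal"))
```

Returning a patched copy keeps the input untouched, which makes the "no endpoint configured" case a simple no-op.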

Comment 6 Hemant Kumar 2022-04-20 19:50:38 UTC
AFAICT the EFS driver does not yet support custom endpoints, so that support has to be implemented in the driver and backported. But then again, we need to figure out whether the EFS driver should provide a different mechanism for overriding EFS endpoints, because the EFS service in AWS is distinct from the EC2 service (and so are their endpoints).
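
The point about distinct services can be illustrated with a small Python sketch: endpoint overrides are looked up per service, so an entry for ec2 says nothing about EFS. The list-of-name/url shape, the service name "elasticfilesystem", the placeholder URL, and the helper `endpoint_for` are hypothetical and used only for illustration.

```python
def endpoint_for(service_endpoints: list, service: str):
    """Return the override URL configured for `service`, or None."""
    for entry in service_endpoints:
        if entry.get("name") == service:
            return entry.get("url")
    return None


# Only EC2 is overridden in this hypothetical configuration.
configured = [{"name": "ec2", "url": "https://ec2.example.internal"}]

print(endpoint_for(configured, "ec2"))
print(endpoint_for(configured, "elasticfilesystem"))
```

Under these assumptions the EFS lookup finds nothing, which is why the EFS driver would need its own override rather than reusing the EC2 one.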

Comment 18 errata-xmlrpc 2022-08-10 11:06:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

