Description of problem:

When installing OCP on AWS, custom EC2 serviceEndpoints can be configured. After configuring those endpoints (the cluster has no internet access and cannot reach the public EC2 endpoint), a PVC provisioned by a StorageClass of the AWS EBS CSI driver goes into Pending state.

~~~
# oc describe pvc test-csi9 -n python-test
Name:          test-csi9
Namespace:     python-test
StorageClass:  gp2-csi-test
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Block
Used By:       <none>
Events:
  Type     Reason                Age                    From                                        Message
  ----     ------                ----                   ----                                        -------
  Warning  ProvisioningFailed    110m (x2 over 128m)    ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestCanceled: request context canceled caused by: context deadline exceeded
  Normal   Provisioning          105m (x14 over 128m)   ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  External provisioner is provisioning volume for claim "python-test/test-csi9"
  Warning  ProvisioningFailed    104m (x12 over 128m)   ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  ProvisioningFailed    56m (x3 over 100m)     ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestError: send request failed caused by: Post "https://ec2.us-east-1.amazonaws.com/": x509: certificate signed by unknown authority
  Warning  ProvisioningFailed    26m (x5 over 101m)     ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestCanceled: request context canceled caused by: context deadline exceeded
  Warning  ProvisioningFailed    11m (x23 over 101m)    ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  failed to provision volume with StorageClass "gp2-csi-test": rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Normal   Provisioning          3m43s (x33 over 101m)  ebs.csi.aws.com_ip-100-90-xxx-384850fb46d7  External provisioner is provisioning volume for claim "python-test/test-csi9"
  Normal   ExternalProvisioning  3m16s (x514 over 128m) persistentvolume-controller                 waiting for a volume to be created, either by external provisioner "ebs.csi.aws.com" or manually created by system administrator
~~~

As per the logs of container csi-driver from pod aws-ebs-csi-driver-controller-xxx-xxx in namespace openshift-cluster-csi-drivers, the driver is still sending requests to the public EC2 endpoint:

~~~
2022-03-29T19:12:01.416782663Z E0329 19:12:01.416734 1 driver.go:119] GRPC error: rpc error: code = Internal desc = Could not create volume "pvc-e8d59f26-ea20-44ee-8090-f9ad5da6a0d8": could not create volume in EC2: RequestError: send request failed
2022-03-29T19:12:01.416782663Z caused by: Post "https://ec2.us-east-1.amazonaws.com/": x509: certificate signed by unknown authority
~~~

The kube-cloud-config ConfigMap YAML from the openshift-cluster-csi-drivers namespace shows the correct custom EC2 endpoint, as configured at installation. (Attaching must-gather)

Version-Release number of selected component (if applicable):
OCP 4.9.24

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
PV provisioning does not work with a custom EC2 endpoint.

Expected results:
The PVC should be bound to a newly created PV, provisioned through the custom EC2 endpoint.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:
As mentioned above

StorageClass Dump (if StorageClass used by PV/PVC):
# omg get sc gp2-csi-test -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: '2022-03-17T18:00:12Z'
  name: gp2-csi-test
  resourceVersion: '4012135'
  uid: baa1ba52-390b-44e8-b2e1-50dfa7d0dcbe
parameters:
  encrypted: 'true'
  type: gp2
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Additional info:
We need to update both the AWS EBS and AWS EFS CSI driver operators to pass the custom endpoint to their drivers, and ensure the drivers have the necessary support for it (EBS should be fine; support for custom endpoints in EFS is unknown).
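For illustration only, here is a minimal Python sketch of the first half of that work: reading the EC2 override out of the cloud config so it can be handed to the driver. It assumes the `cloud.conf` stored in the `kube-cloud-config` ConfigMap is gcfg/INI-style with `[ServiceOverride "N"]` sections, as in the legacy AWS cloud provider configuration. The sample contents, the placeholder URL and the function name `ec2_endpoint_from_cloud_conf` are hypothetical and not taken from the must-gather. How the value is then passed to the driver is not shown.

```python
import configparser
from typing import Optional

# Hypothetical cloud.conf contents. The URL is a placeholder, not the
# endpoint from this bug report.
SAMPLE_CLOUD_CONF = """
[Global]
Zone = us-east-1a

[ServiceOverride "1"]
Service = ec2
Region = us-east-1
URL = https://ec2.custom.example.internal
SigningRegion = us-east-1
"""


def ec2_endpoint_from_cloud_conf(text: str, service: str = "ec2") -> Optional[str]:
    """Return the override URL for `service`, or None if no override exists."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    for section in parser.sections():
        # Section names look like: ServiceOverride "1"
        if not section.lower().startswith("serviceoverride"):
            continue
        opts = parser[section]
        # configparser lowercases option names; values may be quoted in gcfg.
        if opts.get("service", "").strip('"').lower() == service:
            url = opts.get("url", "").strip('"')
            return url or None
    return None


if __name__ == "__main__":
    print(ec2_endpoint_from_cloud_conf(SAMPLE_CLOUD_CONF))
```

If the function returns `None`, no override is configured for that service and the driver would fall back to the public endpoint, which is the behaviour seen in the logs above.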
As far as I can tell, the EFS driver does not yet support custom endpoints, so that support has to be implemented in the driver and backported. We also need to figure out whether the EFS driver should provide a different mechanism for overriding EFS endpoints, because the EFS service in AWS is distinct from the EC2 service (and so are their endpoints).
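To make the EC2/EFS distinction concrete, here is a small Python sketch of per-service endpoint resolution. It assumes overrides are given as `name`/`url` pairs, in the shape of the installer's `serviceEndpoints` entries, and that a service without an override falls back to the standard regional pattern `https://<service>.<region>.amazonaws.com`. The helper names and the custom URL are hypothetical.

```python
from typing import Dict, List


def default_endpoint(service: str, region: str) -> str:
    """Standard public regional endpoint (assumed pattern for the aws partition)."""
    return f"https://{service}.{region}.amazonaws.com"


def resolve_endpoint(service: str, region: str,
                     overrides: List[Dict[str, str]]) -> str:
    """Return the override URL for `service` if one is configured,
    otherwise the public regional endpoint."""
    for entry in overrides:
        if entry["name"] == service:
            return entry["url"]
    return default_endpoint(service, region)


# Only EC2 is overridden here; the URL is a placeholder.
OVERRIDES = [{"name": "ec2", "url": "https://ec2.custom.example.internal"}]

if __name__ == "__main__":
    for svc in ("ec2", "elasticfilesystem"):
        print(svc, "->", resolve_endpoint(svc, "us-east-1", OVERRIDES))
```

With only an `ec2` override, `elasticfilesystem` still resolves to its public endpoint, so an EFS driver in a disconnected cluster would need its own override entry and its own way of consuming it.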
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069