A few examples:
Block: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24633/pull-ci-openshift-origin-master-e2e-aws-csi/158
FS: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/24622/pull-ci-openshift-origin-master-e2e-aws-csi/165

External Storage [Driver: ebs.csi.aws.com] [Testpattern: Dynamic PV (block volmode)(allowExpansion)] volume-expand Verify if offline PVC expansion works (5m57s)

fail [k8s.io/kubernetes/test/e2e/storage/testsuites/volume_expand.go:213]: while recreating pod for resizing
Unexpected error:
    <*errors.errorString | 0xc002801c80>: {
        s: "pod \"security-context-70334232-54ff-4328-b510-acc0ac7a3a95\" is not Running: timed out waiting for the condition",
    }
    pod "security-context-70334232-54ff-4328-b510-acc0ac7a3a95" is not Running: timed out waiting for the condition
occurred
The PR is not merged yet. Updating the status to ASSIGNED.
Inspected a new failure here: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/24680/pull-ci-openshift-origin-master-e2e-aws-csi/2908

1. This is the first time the controller tried to attach the resized volume to a new pod (15:05:37):

    May 26 15:15:45.314: INFO: At 2020-05-26 15:05:37 +0000 UTC - event for security-context-eb8d1bb4-af5b-4fbb-bcc5-35bfcd225d3e: {attachdetach-controller } FailedAttachVolume: AttachVolume.Attach failed for volume "pvc-ffe583b7-5d6d-4937-84ff-5cb5e02df464" : rpc error: code = Internal desc = Could not attach volume "vol-069d2b1853f909252" to node "i-045220a861c8e0b43": could not attach volume "vol-069d2b1853f909252" to node "i-045220a861c8e0b43": OperationNotPermitted: Cannot attach volume vol-069d2b1853f909252 when it is in modification state: MODIFYING

2. As we can see, the attach failed. The controller retries many times, however, until it succeeds more than 8 minutes later (15:13:51):

    May 26 15:15:45.314: INFO: At 2020-05-26 15:13:51 +0000 UTC - event for security-context-eb8d1bb4-af5b-4fbb-bcc5-35bfcd225d3e: {attachdetach-controller } SuccessfulAttachVolume: AttachVolume.Attach succeeded for volume "pvc-ffe583b7-5d6d-4937-84ff-5cb5e02df464"

3. At this point, 8 min 14 s of the 10-minute deadline have already been spent.

4. The 10-minute deadline is reached, so the pod is deleted and the test fails:

    May 26 15:15:37.004: INFO: Deleting pod "security-context-eb8d1bb4-af5b-4fbb-bcc5-35bfcd225d3e" in namespace "e2e-volume-expand-4993"
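To double-check the arithmetic in the timeline above, here is a small sketch that recomputes the intervals from the three event timestamps in the log. It assumes the 10-minute deadline started counting at roughly the first attach attempt; that is an assumption, although it agrees with the pod being deleted at 15:15:37. The variable names are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the events quoted above (all UTC).
first_attach_failure = datetime.strptime("2020-05-26 15:05:37", FMT)
attach_succeeded = datetime.strptime("2020-05-26 15:13:51", FMT)
pod_deleted = datetime.strptime("2020-05-26 15:15:37", FMT)

# The 10-minute deadline mentioned in the analysis.
deadline = timedelta(minutes=10)

# Time spent retrying the attach while the volume was in the MODIFYING state.
spent_attaching = attach_succeeded - first_attach_failure

# Assumption: the deadline clock started at about the first attach attempt,
# so this is what remained for the pod to reach Running after the attach.
left_for_startup = deadline - spent_attaching

print("spent attaching: ", spent_attaching)
print("left for startup:", left_for_startup)
print("first attempt to pod deletion:", pod_deleted - first_attach_failure)
```

Under that assumption, less than two minutes were left for the pod to reach Running once the volume was finally attached.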
The cause of this flake: reattaching a resized volume in AWS can take many minutes, exceeding the deadlines we tried. We'll have to fix this upstream. One possible solution is to add some granularity to the deadlines, making them configurable per plugin/driver.
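As a rough illustration of what "configurable per plugin/driver" could look like, here is a minimal sketch of a default deadline with per-driver overrides. This is not the upstream e2e framework code: the table, the function name, and the 15-minute value for ebs.csi.aws.com are all hypothetical, chosen only to show the shape of the idea.

```python
from datetime import timedelta

# Default deadline for a pod to reach Running (the 10 minutes from the analysis).
DEFAULT_POD_START_TIMEOUT = timedelta(minutes=10)

# Hypothetical per-driver overrides. The 15-minute value is illustrative only,
# not a measured or agreed-upon number.
DRIVER_POD_START_TIMEOUTS = {
    "ebs.csi.aws.com": timedelta(minutes=15),
}


def pod_start_timeout(driver_name: str) -> timedelta:
    """Return the driver-specific deadline, or the default if none is set."""
    return DRIVER_POD_START_TIMEOUTS.get(driver_name, DEFAULT_POD_START_TIMEOUT)


print(pod_start_timeout("ebs.csi.aws.com"))
print(pod_start_timeout("some.other.driver"))
```

With a lookup like this, a slow backend can get a longer deadline without relaxing the timeout for every other driver.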
Will work on the upstream issue: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/498
@hekumar is working on a fix.
Passed on 4.6.0-0.nightly-2020-08-27-005538/4.6.0-0.nightly-2020-08-26-202109/4.6.0-0.nightly-2020-08-26-093617
I hit this recently (around 2 hours ago): https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_aws-ebs-csi-driver-operator/83/pull-ci-openshift-aws-ebs-csi-driver-operator-master-e2e-operator/1300345097472184320 @chao, could you take a look?
failed again.
Yes, it failed again. @fbertina, thanks for finding this.
Still hitting a similar error here: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/25442/pull-ci-openshift-origin-master-e2e-aws-csi/1301692073983873024
Checked several jobs here: https://prow.ci.openshift.org/?job=pull-ci-openshift-origin-master-e2e-aws-csi
The test passed; passing this BZ.