The e2e-azure-csi job runs our CSI certification tests, and the Azure Disk CSI driver consistently fails them.

Example: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_azure-disk-csi-driver-operator/12/pull-ci-openshift-azure-disk-csi-driver-operator-master-e2e-azure-csi/1380559175712509952

Full history (in the operator repo): https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-azure-disk-csi-driver-operator-master-e2e-azure-csi

Failed tests:
[Testpattern: Dynamic PV (default fs)] volumes should allow exec of files on the volume
[Testpattern: Dynamic PV (default fs)(allowExpansion)] volume-expand should resize volume when PVC is edited while pod is using it
[Testpattern: Dynamic PV (default fs)(allowExpansion)] volume-expand Verify if offline PVC expansion works
[Testpattern: Pre-provisioned Snapshot (retain policy)] snapshottable[Feature:VolumeSnapshotDataSource] volume snapshot controller should check snapshot fields, check restore correctly works after modifying source data, check deletion
[Testpattern: Dynamic PV (default fs)] provisioning should provision storage with mount options
[Testpattern: Dynamic PV (default fs)] provisioning should provision storage with snapshot data source [Feature:VolumeSnapshotDataSource]
[Testpattern: Dynamic PV (default fs)] fsgroupchangepolicy (OnRootMismatch)[LinuxOnly], pod created with an initial fsgroup, volume contents ownership changed in first pod, new pod with different fsgroup applied to the volume contents
[Testpattern: Dynamic PV (default fs)] fsgroupchangepolicy (Always)[LinuxOnly], pod created with an initial fsgroup, new pod fsgroup applied to volume contents
[Testpattern: Dynamic PV (block volmode)] multiVolume [Slow] should access to two volumes with different volume mode and retain data across pod recreation on the same node [LinuxOnly]
[Testpattern: Dynamic PV (default fs)] subPath should fail if non-existent subpath is outside the volume [Slow][LinuxOnly]
[Testpattern: Dynamic PV (block volmode)(allowExpansion)] volume-expand Verify if offline PVC expansion works
[Testpattern: Dynamic PV (block volmode)] multiVolume [Slow] should access to two volumes with the same volume mode and retain data across pod recreation on the same node [LinuxOnly]
[Testpattern: Dynamic PV (immediate binding)] topology should provision a volume and schedule a pod with AllowedTopologies
[Testpattern: Dynamic PV (default fs)] subPath should support restarting containers using file as subpath [Slow][LinuxOnly]
[Testpattern: Dynamic PV (default fs)] volumes should store data
[Testpattern: Dynamic PV (block volmode)(allowExpansion)] volume-expand should resize volume when PVC is edited while pod is using it
[Testpattern: Dynamic PV (xfs)][Slow] volumes should store data
[Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with snapshot data source [Feature:VolumeSnapshotDataSource]
[Testpattern: Dynamic PV (block volmode)] volumeMode should not mount / map unused volumes in a pod [LinuxOnly]
[Testpattern: Dynamic Snapshot (retain policy)] snapshottable[Feature:VolumeSnapshotDataSource] volume snapshot controller should check snapshot fields, check restore correctly works after modifying source data, check deletion
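For anyone reproducing this outside CI, the suite can be run by hand roughly like this (a sketch only; the manifest path is made up and the exact invocation should be taken from the CI step, but TEST_CSI_DRIVER_FILES plus the openshift/csi suite is the usual openshift-tests entry point):

    # Sketch: run the CSI certification suite against a driver manifest.
    # The manifest path is hypothetical; point it at the Azure Disk CSI driver manifest the job uses.
    $ export TEST_CSI_DRIVER_FILES=/tmp/azure-disk-csi-manifest.yaml
    $ openshift-tests run openshift/csi --junit-dir=/tmp/csi-results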
*** Bug 1948535 has been marked as a duplicate of this bug. ***
What I noticed today is that when our CI job enables featureSet: TechPreviewNoUpgrade, it starts the tests relatively quickly afterwards. But the FeatureSet also enables CSI migration, so MCO starts draining / restarting machines while the CSI tests are running. I don't think it's the root cause of *all* test failures, but it at least increases the flakiness of the CI job. I am trying to wait until the CSI migration is applied everywhere before starting the tests in https://github.com/openshift/release/pull/15360. I.e. when testing manually, wait ~10 minutes after setting the FeatureSet (or watch `oc get node -w` until everything is restarted). See the sketch below.
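Roughly, the manual sequence looks like this (a sketch, not the exact step the release PR adds; the patch command and timeouts are my assumptions, the CI job may set the FeatureSet differently):

    # Enable the feature set (this also turns on CSI migration and triggers an MCO rollout).
    $ oc patch featuregate cluster --type merge -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
    # Wait until MCO has rolled the new config out to all nodes before starting the tests...
    $ oc wait machineconfigpool --all --for=condition=Updated=True --timeout=30m
    # ...or simply watch the nodes until every one has been restarted and is Ready again.
    $ oc get node -w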
Yes, we need to wait about 10 minutes for the feature gates to be enabled in the other components. I tried waiting for all the components to be ready and ran the CSI verification tool, but some cases still failed (see the wait sketch below). I think we can use this bug to track the fix in the release repo and use separate bugs to track the other issues. So I'll reopen bug 1948535 (marked as a duplicate of this bug) and try to verify this bug first. Hi Fabio, if you think bug 1948535 is still a duplicate, feel free to close it.
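For the "wait for all the components to be ready" part, something along these lines can be used before starting the verification run (the conditions and timeouts are assumptions, adjust as needed):

    # Wait until every cluster operator has settled after the feature gate rollout.
    $ oc wait clusteroperators --all --for=condition=Available=True --timeout=20m
    $ oc wait clusteroperators --all --for=condition=Progressing=False --timeout=20m
    $ oc wait clusteroperators --all --for=condition=Degraded=False --timeout=20m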
Since PR https://github.com/openshift/release/pull/15360 has not been merged yet, I'll update the status to POST first.
Currently, there are two categories of tests that are still failing with the Azure Disk CSI driver: snapshot tests and volume expansion tests. Regarding the snapshot tests, openshift/origin needs a k8s.io/* bump so that it contains commit [1]; this should be done in PR [2]. Regarding the expansion tests, the rebase done in PR [3] should have fixed some of the failing tests, but some investigation is still needed to identify whether more fixes to the driver are required. [1] https://github.com/openshift/kubernetes/commit/ad4f896bdef4619f63b9df878a6e78213db4eef0 [2] https://github.com/openshift/origin/pull/26126 [3] https://github.com/openshift/azure-disk-csi-driver/pull/6
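While the expansion fixes are being investigated, the resize path can be spot-checked manually with something like the following (the PVC name is hypothetical; the storage class must have allowVolumeExpansion: true, and for the offline case the pod using the PVC should be deleted first):

    # Grow a test PVC and watch until status.capacity reflects the new size.
    $ oc patch pvc test-pvc --type merge -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'
    $ oc get pvc test-pvc -w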
This is blocking https://github.com/openshift/origin/pull/26131. Sample failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/26131/pull-ci-openshift-origin-master-e2e-aws-csi/1392813571842248704 If we are not close to a fix, can we get these tests temporarily disabled to unblock teams?
Ben, this ticket isn't related to that failure (that job is the AWS CSI job, while this bug tracks the Azure CSI driver). The correct ticket tracking that issue is bug 1913974. There is some ongoing work to fix it; I'll check if we can disable the test in the meantime.
For reference, this is the upstream PR that tries to fix the snapshot issue: https://github.com/kubernetes/kubernetes/pull/102021.
Verified: pass.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759