Bug 1948603
Summary: | Azure CSI driver does not pass e2e-azure-csi tests | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jan Safranek <jsafrane> |
Component: | Storage | Assignee: | Fabio Bertinatto <fbertina> |
Storage sub component: | Kubernetes External Components | QA Contact: | Wei Duan <wduan> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | unspecified | CC: | aos-bugs, bparees, fbertina, piqin |
Version: | 4.8 | ||
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-10-18 17:29:50 UTC | Type: | Bug |
Description
Jan Safranek
2021-04-12 15:03:13 UTC
*** Bug 1948535 has been marked as a duplicate of this bug. ***

What I noticed today is that when our CI job enables featureSet: TechPreviewNoUpgrade, it starts the tests relatively quickly afterwards. But the FeatureSet also enables CSI migration, and the MCO starts draining / restarting machines while the CSI tests are running. I don't think it's the root cause of *all* test failures, but at least it increases the flakiness of the CI job. I am trying to wait until the CSI migration is applied everywhere before starting the tests in https://github.com/openshift/release/pull/15360. I.e. when testing manually, wait ~10 minutes after setting the FeatureSet (or watch `oc get node -w` until everything is restarted).

Yes, we need to wait about 10 minutes for the feature gates to be enabled in the other components. I waited until all the components were ready and ran the CSI verification tool, but some cases still failed. I think we can use this bug to track the fix in the release repo, and use different bugs to track the other issues. So, I'll reopen bug 1948535 (marked as a duplicate of this bug) and try to verify this bug first.

Hi Fabio, if you think bug 1948535 is still a duplicate, feel free to close it.

*** Bug 1948535 has been marked as a duplicate of this bug. ***

Since PR https://github.com/openshift/release/pull/15360 is not merged yet, I'll set the status to POST for now.

Currently, there are 2 categories of tests that are still failing with the Azure Disk CSI driver: snapshot tests and volume expansion tests. Regarding the snapshot tests, openshift/origin needs a k8s.io/* bump so that it contains commit [1]. This should be done in PR [2]. Regarding the expansion tests, the rebase done in PR [3] should have fixed some of the failing tests, but some investigation is still needed to identify whether more fixes for the driver are required.

[1] https://github.com/openshift/kubernetes/commit/ad4f896bdef4619f63b9df878a6e78213db4eef0
[2] https://github.com/openshift/origin/pull/26126
[3] https://github.com/openshift/azure-disk-csi-driver/pull/6

This is blocking https://github.com/openshift/origin/pull/26131. Sample failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/26131/pull-ci-openshift-origin-master-e2e-aws-csi/1392813571842248704. If we are not close to a fix, can we get these tests temporarily disabled to unblock teams?

Ben, this ticket isn't related to that failure (this one is about the Azure CSI driver). The correct ticket tracking that issue is bug 1913974. There is some ongoing work to fix it; I'll check if we can disable the test in the meantime.

For reference, this is the upstream PR that tries to fix the snapshot issue: https://github.com/kubernetes/kubernetes/pull/102021.

Verified pass.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759
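
The manual wait mentioned in the description (watching `oc get node -w` until everything is restarted) could be scripted roughly as follows. This is a minimal sketch, not what the CI job or the release PR does: the helper names `nodes_settled` and `wait_for_nodes`, the timeout values, and the "several consecutive healthy polls" heuristic are all assumptions made for illustration. It only checks that each node is Ready and not cordoned, which approximates but does not prove that the MCO rollout has finished.

```python
import json
import subprocess
import time


def nodes_settled(node_list: dict) -> bool:
    """Return True if every node in a NodeList is Ready and schedulable.

    `node_list` is the parsed output of `oc get nodes -o json`.
    An empty list is treated as "not settled" to avoid a false positive.
    """
    items = node_list.get("items", [])
    if not items:
        return False
    for node in items:
        # A node being drained by the MCO is cordoned (spec.unschedulable).
        if node.get("spec", {}).get("unschedulable", False):
            return False
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            return False
    return True


def wait_for_nodes(timeout_s: int = 1800, interval_s: int = 30,
                   stable_polls: int = 5) -> bool:
    """Poll until the nodes look settled for `stable_polls` polls in a row.

    The defaults are hypothetical; requiring a streak reduces the chance of
    catching the short gap between two nodes being restarted.
    """
    deadline = time.monotonic() + timeout_s
    streak = 0
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["oc", "get", "nodes", "-o", "json"],
            capture_output=True, text=True, check=False,
        )
        if result.returncode == 0 and nodes_settled(json.loads(result.stdout)):
            streak += 1
            if streak >= stable_polls:
                return True
        else:
            streak = 0  # any unhealthy or failed poll resets the streak
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    print("nodes settled" if wait_for_nodes() else "timed out waiting for nodes")
```

Run it after setting the FeatureSet and start the CSI tests only once it reports that the nodes have settled; a fixed ~10 minute sleep, as suggested in the description, remains the simpler alternative.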