Bug 1990190

Summary: e2e testing failed with basic manifest: reason/ExternalProvisioning waiting for a volume to be created
Product: OpenShift Container Platform
Component: Storage
Storage sub component: OpenStack CSI Drivers
Reporter: Emilien Macchi <emacchi>
Assignee: Emilien Macchi <emacchi>
QA Contact: rlobillo
CC: aos-bugs, pprinett, rlobillo
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 4.9
Target Release: 4.10.0
Hardware: All
OS: All
Doc Type: No Doc Update
Type: Bug
Last Closed: 2022-03-12 04:37:00 UTC

Description Emilien Macchi 2021-08-05 00:32:16 UTC
When running the e2e CSI tests with the Manila manifest, this event happens too frequently:

event happened 23 times, something is wrong: ns/e2e-volumemode-6704 persistentvolumeclaim/pvc-k9l59 - reason/ExternalProvisioning waiting for a volume to be created, either by external provisioner "manila.csi.openstack.org" or manually created by system administrator

Full logs:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_csi-driver-manila-operator/113/pull-ci-openshift-csi-driver-manila-operator-master-e2e-openstack-csi/1422978191517028352/artifacts/e2e-openstack-csi/openshift-e2e-test/artifacts/e2e.log

Aug 04 18:41:03.000 I ns/e2e-volumemode-6704 persistentvolumeclaim/pvc-k9l59 reason/ExternalProvisioning waiting for a volume to be created, either by external provisioner "manila.csi.openstack.org" or manually created by system administrator
Aug 04 18:41:03.000 I ns/e2e-multivolume-8534 persistentvolumeclaim/manila.csi.openstack.org2bzts reason/ExternalProvisioning waiting for a volume to be created, either by external provisioner "manila.csi.openstack.org" or manually created by system administrator
Aug 04 18:41:03.000 I ns/e2e-multivolume-8534 persistentvolumeclaim/manila.csi.openstack.org2bzts reason/Provisioning External provisioner is provisioning volume for claim "e2e-multivolume-8534/manila.csi.openstack.org2bzts"
Aug 04 18:41:03.000 I ns/e2e-volumemode-6704 persistentvolumeclaim/pvc-k9l59 reason/Provisioning External provisioner is provisioning volume for claim "e2e-volumemode-6704/pvc-k9l59"
Aug 04 18:41:03.000 W ns/e2e-volumemode-6704 persistentvolumeclaim/pvc-k9l59 reason/ProvisioningFailed failed to provision volume with StorageClass "e2e-volumemode-6704-e2e-sc8xhxl": rpc error: code = InvalidArgument desc = block access type not allowed
Aug 04 18:41:03.000 W ns/e2e-provisioning-2579 pod/pod-subpath-test-dynamicpv-tnff node/jdmc36rw-1ba69-fpd97-worker-0-ndtwt reason/Unhealthy Liveness probe failed: cat: can't open '/probe-volume/probe-file': No such file or directory\n (3 times)
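
To check how often an event repeats in a downloaded e2e.log, something like the following could be used. This is only a rough sketch and not part of the test suite: the function names (count_events, frequent) and the default threshold of 20 are assumptions made for illustration, and the regular expressions are inferred from the log lines quoted above, so they may not cover every line format the monitor emits.

```python
import re
from collections import Counter

# Matches monitor event lines of the shape seen in this report:
#   <timestamp> <I|W|E> ns/<namespace> <kind>/<name> [node/<node>] reason/<Reason> <message>
EVENT_RE = re.compile(
    r"\b[IWE] ns/(?P<ns>\S+) (?P<obj>\S+/\S+)(?: node/\S+)? "
    r"reason/(?P<reason>\S+) (?P<msg>.*)$"
)
# Optional trailing repeat counter, e.g. "(3 times)".
TIMES_RE = re.compile(r"\((\d+) times\)\s*$")


def count_events(lines):
    """Return the highest reported repeat count per (namespace, object, reason).

    A line without a "(N times)" suffix counts as a single occurrence.
    """
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line.rstrip("\n"))
        if match is None:
            continue  # not an event line in the assumed format
        key = (match["ns"], match["obj"], match["reason"])
        times = TIMES_RE.search(match["msg"])
        repeats = int(times.group(1)) if times else 1
        counts[key] = max(counts[key], repeats)
    return counts


def frequent(counts, threshold=20):
    """List events reported more than `threshold` times (threshold is an assumption)."""
    return sorted(key for key, n in counts.items() if n > threshold)


if __name__ == "__main__":
    import sys

    # Usage: python count_events.py e2e.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ns, obj, reason in frequent(count_events(fh)):
            print(f"ns/{ns} {obj} reason/{reason}")
```

Run against the full log linked above, this should list the ExternalProvisioning event for pvc-k9l59 if its repeat counter appears in the log in the "(N times)" form.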

Comment 2 egarcia 2021-10-18 13:53:08 UTC
Without this bugfix, the test suite will always fail. This is an important fix for the reliability of this test suite.

Comment 5 rlobillo 2021-11-04 15:39:35 UTC
[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/23227/rehearse-23227-periodic-ci-openshift-release-master-nightly-4.10-e2e-openstack-csi-manila/1455875387245465600/artifacts/e2e-openstack-csi-manila/openshift-e2e-test/artifacts/e2e.log

In [1], the test now passes in one second:

started: (0/177/218) "External Storage [Driver: manila.csi.openstack.org] [Testpattern: Dynamic PV (block volmode)] volumeMode should fail in binding dynamic provisioned PV to PVC [Slow][LinuxOnly]"
passed: (1s) 2021-11-03T13:28:53 "External Storage [Driver: manila.csi.openstack.org] [Testpattern: Dynamic PV (block volmode)] volumeMode should fail in binding dynamic provisioned PV to PVC [Slow][LinuxOnly]"

The event is no longer repeated, and the synthetic test now passes:

Flaky invariants:

[sig-arch] Monitor cluster while tests execute

Writing JUnit report to /logs/artifacts/junit/junit_e2e_20211103-133038.xml

29 pass, 189 skip (2m54s)

Moreover, the tests also pass on a running setup with 4.10.0-0.nightly-2021-10-28-211203 on top of OSP16.1 (RHOS-16.1-RHEL-8-20210818.n.0):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-28-211203   True        False         62m     Cluster version is 4.10.0-0.nightly-2021-10-28-211203


$ TEST_CSI_DRIVER_FILES=./manifest.yaml KUBECONFIG=./.kube/config ./openshift-tests run openshift/csi --output-file manilacsi_results/manilacsi_ocp-test.log --junit-dir=manilacsi_results/
[...]
Flaky invariants:

[sig-arch] Monitor cluster while tests execute

Writing JUnit report to manilacsi_results/junit_e2e_20211104-150950.xml

29 pass, 189 skip (21m46s)

$ grep "volumeMode should fail in binding dynamic provisioned PV to PVC" manilacsi_ocp-test.log 
started: (0/125/218) "External Storage [Driver: manila.csi.openstack.org] [Testpattern: Dynamic PV (block volmode)] volumeMode should fail in binding dynamic provisioned PV to PVC [Slow][LinuxOnly]"
passed: (2.7s) 2021-11-04T15:00:51 "External Storage [Driver: manila.csi.openstack.org] [Testpattern: Dynamic PV (block volmode)] volumeMode should fail in binding dynamic provisioned PV to PVC [Slow][LinuxOnly]"


where:

$ cat manifest.yaml
# Test manifest for https://github.com/kubernetes/kubernetes/tree/master/test/e2e/storage/external
ShortName: manila
StorageClass:
  # The Storage Class is generated by the Manila CSI Operator, based on available
  # share types that have been created in Manila. For CI purpose, we assume
  # that a "default" share type exists and will be used for the StorageClass.
  # If you run this manifest against an OpenStack cloud where there is no "default" share type,
  # you'll need to change the value.
  FromExistingClassName: csi-manila-default
SnapshotClass:
  FromName: true
DriverInfo:
  Name: manila.csi.openstack.org
  SupportedSizeRange:
    Min: 1Gi
    Max: 16Ti

Comment 7 ShiftStack Bugwatcher 2021-11-25 16:12:04 UTC
Removing the Triaged keyword because:

* the QE automation assessment (flag qe_test_coverage) is missing

Comment 12 errata-xmlrpc 2022-03-12 04:37:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056