Bug 2084463

Summary: 5 control plane replica tests fail on ephemeral volumes
Product: OpenShift Container Platform
Reporter: Thomas Jungblut <tjungblu>
Component: Storage
Assignee: Fabio Bertinatto <fbertina>
Storage sub component: Storage
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: unspecified
CC: jsafrane, lcosic
Version: 4.11
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-08-10 11:11:30 UTC
Type: Bug

Description Thomas Jungblut 2022-05-12 08:13:14 UTC
I was looking into the 5 control plane replica tests and saw some failures related to in-tree ephemeral volumes:

Failing tests:
[sig-storage] In-tree Volumes [Driver: gcepd] [Testpattern: Generic Ephemeral-volume (default fs) (immediate-binding)] ephemeral should create read-only inline ephemeral volume [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: gcepd] [Testpattern: Generic Ephemeral-volume (default fs) (immediate-binding)] ephemeral should create read/write inline ephemeral volume [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: gcepd] [Testpattern: Generic Ephemeral-volume (default fs) (immediate-binding)] ephemeral should support two pods which have the same volume definition [Suite:openshift/conformance/parallel] [Suite:k8s]


Example build:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-etc[…]er-e2e-gcp-five-control-plane-replicas/1522236579484012544

You can grab any PR build in CEO (cluster-etcd-operator, https://github.com/openshift/cluster-etcd-operator/pulls) and check; this has been failing permanently since Jan 21st.

Alternatively, use CI search:
https://search.ci.openshift.org/?search=ephemeral+should+create+read-only+inline+ephemeral+volume&maxAge=168h&context=1&type=junit&name=.*five-control-plane-replicas&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 1 Thomas Jungblut 2022-05-12 08:15:03 UTC
*** Bug 1999964 has been marked as a duplicate of this bug. ***

Comment 2 Fabio Bertinatto 2022-05-19 20:12:31 UTC
These tests are failing because pods cannot start due to NodeAffinity requirements in the volumes.

This is triggered by two events common to this job:

1. There are master nodes in all available zones, but that's not the case for worker nodes.
2. The StorageClass field volumeBindingMode is set to Immediate.

This can cause volumes to be provisioned in a zone where there are no worker nodes available. As a result, the pod referring to that volume will be unable to start.
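
To make the failure mode concrete, here is a minimal Python sketch of the situation described above. It is not taken from the job or from Kubernetes code: the zone names, node names, and the helpers `nodes_for_volume` and `stranded_zones` are all hypothetical, and it assumes that with Immediate binding the provisioner may pick any zone that contains a node, master or worker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    zone: str
    runs_workloads: bool  # False for master nodes, True for workers


# Hypothetical layout: five masters spread over three zones, workers in only two.
NODES = [
    Node("master-0", "zone-a", False),
    Node("master-1", "zone-b", False),
    Node("master-2", "zone-c", False),
    Node("master-3", "zone-a", False),
    Node("master-4", "zone-b", False),
    Node("worker-0", "zone-a", True),
    Node("worker-1", "zone-b", True),
]


def nodes_for_volume(nodes, volume_zone):
    """Names of nodes that match the volume's zone and can run the test pod."""
    return [n.name for n in nodes if n.runs_workloads and n.zone == volume_zone]


def stranded_zones(nodes):
    """Zones that contain nodes but no worker; a volume there strands its pod."""
    zones = sorted({n.zone for n in nodes})
    return [z for z in zones if not nodes_for_volume(nodes, z)]


if __name__ == "__main__":
    for zone in sorted({n.zone for n in NODES}):
        print(zone, nodes_for_volume(NODES, zone) or "no schedulable node")
    print("stranded zones:", stranded_zones(NODES))
```

In this model, a volume that lands in the zone with only a master leaves the pod with no node that satisfies the volume's NodeAffinity. For comparison, Kubernetes also offers volumeBindingMode WaitForFirstConsumer, which delays provisioning until the pod has been scheduled so the volume follows the chosen node's zone.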

Comment 6 errata-xmlrpc 2022-08-10 11:11:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069