Bug 2105334 - vmware-vsphere-csi-driver-controller can't use host port error on e2e-vsphere-serial
Summary: vmware-vsphere-csi-driver-controller can't use host port error on e2e-vsphere...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Hemant Kumar
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On: 2099864
Blocks:
Reported: 2022-07-08 15:00 UTC by Hemant Kumar
Modified: 2022-08-10 11:21 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2099864
Environment:
Last Closed: 2022-08-10 11:20:51 UTC
Target Upstream Version:
Embargoed:



Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift vmware-vsphere-csi-driver-operator pull 98 | 0 | None | open | Bug 2105334: Reorder static resources to create RBAC first | 2022-07-08 22:35:56 UTC
Red Hat Product Errata RHSA-2022:5069 | 0 | None | None | None | 2022-08-10 11:21:15 UTC

Description Hemant Kumar 2022-07-08 15:00:43 UTC
+++ This bug was initially created as a clone of Bug #2099864 +++

e2e-vsphere-serial is failing the test: [sig-auth][Feature:SCC][Early] should not have pod creation failures during install


Error message:

error creating: pods "vmware-vsphere-csi-driver-controller-6865b8f64d-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, spec.containers[0].securityContext.containers[1].hostPort: Invalid value: 9201: Host ports are not allowed to be used,

--- Additional comment from Stephen Benjamin on 2022-06-21 20:44:06 UTC ---

Looks like it affects the regular vSphere jobs too, not just serial.

Example failure: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-serial/1539184288304467968



Sippy history:

https://sippy.dptools.openshift.org/sippy-ng/tests/4.11/analysis?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522equals%2522%252C%2522value%2522%253A%2522%255Bsig-auth%255D%255BFeature%253ASCC%255D%255BEarly%255D%2520should%2520not%2520have%2520pod%2520creation%2520failures%2520during%2520install%2520%255BSuite%253Aopenshift%252Fconformance%252Fparallel%255D%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Afalse%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522vsphere-ipi%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522amd64%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522upgrade%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522sdn%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&test=%5Bsig-auth%5D%5BFeature%3ASCC%5D%5BEarly%5D%20should%20not%20have%20pod%20creation%20failures%20during%20install%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D

--- Additional comment from Hemant Kumar on 2022-06-21 22:11:53 UTC ---

This seems like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1913069. I am not sure the operator is at fault here.

--- Additional comment from Corey Daley on 2022-06-25 00:58:01 UTC ---

This looks to be an issue with the vsphere driver, not the shared resource driver.

--- Additional comment from Jan Safranek on 2022-06-28 13:54:20 UTC ---

Reading the operator logs from [1], the operator creates the Deployment that uses host ports at:

> I0627 12:49:35.812513       1 event.go:285] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"alibaba-disk-csi-driver-operator", UID:"a61db61a-d1db-4a1d-98ed-bc55709a7593", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeploymentCreated' Created Deployment.apps/alibaba-disk-csi-driver-controller -n openshift-cluster-csi-drivers because it was missing

The ClusterRole and ClusterRoleBinding that grant access to the privileged SCC are created about 10 seconds later:

> I0627 12:49:46.108721       1 event.go:285] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"alibaba-disk-csi-driver-operator", UID:"a61db61a-d1db-4a1d-98ed-bc55709a7593", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ClusterRoleCreated' Created ClusterRole.rbac.authorization.k8s.io/alibaba-disk-privileged-role because it was missing
> I0627 12:49:46.123020       1 event.go:285] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"alibaba-disk-csi-driver-operator", UID:"a61db61a-d1db-4a1d-98ed-bc55709a7593", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ClusterRoleBindingCreated' Created ClusterRoleBinding.rbac.authorization.k8s.io/alibaba-disk-controller-privileged-binding because it was missing

So any Pod the Deployment tries to create during those 10 seconds is rejected by SCC admission.

1: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-alibaba/1541398229000654848
This is Alibaba, not vSphere, but all clouds show the same symptoms: https://search.ci.openshift.org/?search=Host+ports+are+not+allowed+to+be+used&maxAge=24h&context=1&type=junit
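The race above comes down to creation order: the Deployment exists before the RBAC objects that let its Pods use the privileged SCC. The linked PR is titled "Reorder static resources to create RBAC first". The following is a minimal sketch of that idea, not the operator's actual code: it assumes the operator applies a flat list of manifests one at a time in list order, and the `RBAC_KINDS` set and `order_static_resources` helper are hypothetical names used only for illustration.

```python
from typing import Dict, List

# Hypothetical: kinds that grant permissions and should exist before any
# workload Pod is admitted. ServiceAccount is included because the bindings
# name it as a subject.
RBAC_KINDS = {"ServiceAccount", "Role", "RoleBinding", "ClusterRole", "ClusterRoleBinding"}


def order_static_resources(manifests: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return manifests with RBAC objects first.

    sorted() is stable, so manifests inside each group (RBAC vs. everything
    else) keep their original relative order.
    """
    return sorted(manifests, key=lambda m: 0 if m["kind"] in RBAC_KINDS else 1)


if __name__ == "__main__":
    # Same objects as in the operator log lines quoted above, in the
    # problematic order (Deployment first).
    manifests = [
        {"kind": "Deployment", "name": "alibaba-disk-csi-driver-controller"},
        {"kind": "ClusterRole", "name": "alibaba-disk-privileged-role"},
        {"kind": "ClusterRoleBinding", "name": "alibaba-disk-controller-privileged-binding"},
    ]
    # Prints the ClusterRole, then the ClusterRoleBinding, then the Deployment.
    for m in order_static_resources(manifests):
        print(m["kind"], m["name"])
```

Ordering only closes the window if each object is applied before the next one is attempted; if the workload is created by a separate controller running concurrently, ordering within one list would not be enough on its own.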

--- Additional comment from Jan Safranek on 2022-06-28 14:11:21 UTC ---

The Alibaba job in the previous comment is not very representative; it flakes very rarely there. It is very reproducible on vSphere.

--- Additional comment from OpenShift BugZilla Robot on 2022-06-29 10:52:04 UTC ---

Bug status changed to NEW as previous linked PR https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/93 has been closed

--- Additional comment from OpenShift Automated Release Tooling on 2022-07-01 15:38:11 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.
This bug is expected to ship in the next 4.12 release.

--- Additional comment from Wei Duan on 2022-07-04 10:13:32 UTC ---

1. Ran openshift-tests locally for this SCC case but could not reproduce the failure on 4.11.0-0.nightly-2022-06-30-005428; it may not happen every time.
2. Because none of the vSphere tests in https://testgrid.k8s.io/redhat-openshift-ocp-release-4.12-informing are working well, I checked the pre-merge tests instead; none failed.
3. Checked on a 4.12 cluster that the RBAC resources are created before the Deployment and DaemonSet used by the CSI driver.

Moving to VERIFIED.
Will keep an eye on this once 4.12 CI is working well.
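Step 3 of the verification (RBAC created before the workloads) can also be checked mechanically from the operator's event log. A minimal sketch, assuming klog-style lines in the same shape as the ones quoted in Jan's comment (`I0627 12:49:35.812513 ... reason: 'DeploymentCreated'`). The `first_event_times` and `rbac_created_first` helpers are hypothetical, and the `DaemonSetCreated` reason is an assumption extrapolated from the `<Kind>Created` pattern of the quoted lines.

```python
import re
from datetime import datetime
from typing import Dict, List

# Matches lines such as:
#   I0627 12:49:35.812513  1 event.go:285] Event(...): type: 'Normal' reason: 'DeploymentCreated' ...
LINE_RE = re.compile(r"^I(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}).*reason: '(\w+)'")

WORKLOAD_REASONS = ("DeploymentCreated", "DaemonSetCreated")  # DaemonSetCreated is assumed
RBAC_REASONS = ("ClusterRoleCreated", "ClusterRoleBindingCreated")


def first_event_times(lines: List[str]) -> Dict[str, datetime]:
    """Map each event reason to the timestamp of its first occurrence."""
    seen: Dict[str, datetime] = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        mmdd, hms, reason = m.groups()
        # klog omits the year; a fixed placeholder year (2000, a leap year)
        # is enough for ordering events within a single log.
        ts = datetime.strptime(f"2000{mmdd} {hms}", "%Y%m%d %H:%M:%S.%f")
        seen.setdefault(reason, ts)
    return seen


def rbac_created_first(lines: List[str]) -> bool:
    """True if every RBAC create event precedes every workload create event.

    Returns False when either group is missing, since the order cannot be
    confirmed from the log in that case.
    """
    times = first_event_times(lines)
    rbac = [times[r] for r in RBAC_REASONS if r in times]
    workloads = [times[r] for r in WORKLOAD_REASONS if r in times]
    if not rbac or not workloads:
        return False
    return max(rbac) < min(workloads)
```

Fed the three log lines quoted in Jan's comment, `rbac_created_first` returns False, because the Deployment event (12:49:35) precedes both RBAC events (12:49:46).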

--- Additional comment from Wei Duan on 2022-07-04 13:16:28 UTC ---

Hi Jan, do you think we need a backport?

Comment 4 errata-xmlrpc 2022-08-10 11:20:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

