Bug 2099864
Summary: | vmware-vsphere-csi-driver-controller can't use host port error on e2e-vsphere-serial | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Stephen Benjamin <stbenjam> | |
Component: | Storage | Assignee: | Jan Safranek <jsafrane> | |
Storage sub component: | Kubernetes External Components | QA Contact: | Wei Duan <wduan> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | medium | |||
Priority: | unspecified | CC: | cdaley, hekumar, jsafrane, wking | |
Version: | 4.11 | |||
Target Milestone: | --- | |||
Target Release: | 4.12.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2105334 (view as bug list) | Environment: | ||
Last Closed: | 2023-01-17 19:50:08 UTC | Type: | Bug | |
Bug Blocks: | 2105334 |
Description
Stephen Benjamin
2022-06-21 20:41:51 UTC
Looks like it's regular vsphere jobs too, not just serial.

Example failure: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-vsphere-serial/1539184288304467968

Sippy history: https://sippy.dptools.openshift.org/sippy-ng/tests/4.11/analysis?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522equals%2522%252C%2522value%2522%253A%2522%255Bsig-auth%255D%255BFeature%253ASCC%255D%255BEarly%255D%2520should%2520not%2520have%2520pod%2520creation%2520failures%2520during%2520install%2520%255BSuite%253Aopenshift%252Fconformance%252Fparallel%255D%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Afalse%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522vsphere-ipi%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522amd64%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522upgrade%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522sdn%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&test=%5Bsig-auth%5D%5BFeature%3ASCC%5D%5BEarly%5D%20should%20not%20have%20pod%20creation%20failures%20during%20install%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D

This seems like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1913069 . I am not sure if it is the operator that is at fault here.
Reading the operator logs from [1], it creates a Deployment that uses node ports at:

> I0627 12:49:35.812513 1 event.go:285] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"alibaba-disk-csi-driver-operator", UID:"a61db61a-d1db-4a1d-98ed-bc55709a7593", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeploymentCreated' Created Deployment.apps/alibaba-disk-csi-driver-controller -n openshift-cluster-csi-drivers because it was missing

And the ClusterRole + Binding to the privileged SCC are created ~10 seconds later:

> I0627 12:49:46.108721 1 event.go:285] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"alibaba-disk-csi-driver-operator", UID:"a61db61a-d1db-4a1d-98ed-bc55709a7593", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ClusterRoleCreated' Created ClusterRole.rbac.authorization.k8s.io/alibaba-disk-privileged-role because it was missing
> I0627 12:49:46.123020 1 event.go:285] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"alibaba-disk-csi-driver-operator", UID:"a61db61a-d1db-4a1d-98ed-bc55709a7593", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'ClusterRoleBindingCreated' Created ClusterRoleBinding.rbac.authorization.k8s.io/alibaba-disk-controller-privileged-binding because it was missing

So any Pod created by the Deployment will fail for these 10 seconds.

1: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-alibaba/1541398229000654848

It's Alibaba, not vSphere, but all clouds show the same symptoms: https://search.ci.openshift.org/?search=Host+ports+are+not+allowed+to+be+used&maxAge=24h&context=1&type=junit

Alibaba in the previous comment is not very representative; it flakes very rarely there. It's very reproducible on vSphere.

1. Ran openshift-tests locally for this SCC case, but the failure was not reproduced in 4.11.0-0.nightly-2022-06-30-005428; maybe it doesn't happen every time.
2. As none of the vSphere tests in https://testgrid.k8s.io/redhat-openshift-ocp-release-4.12-informing work well, checked the pre-merge tests: no failures.
3. Checked on a 4.12 cluster that the RBAC resources are created before the Deployment and DaemonSet used by the CSI driver.

So moving to the "VERIFIED" status. Will keep an eye on this once 4.12 CI works well.

Hemant is already backporting it in https://bugzilla.redhat.com/show_bug.cgi?id=2105334

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
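The ordering problem from the operator-log comment above (Deployment created first, privileged ClusterRoleBinding roughly 10 seconds later) can be measured directly from the klog event lines. The following is only a minimal sketch, not part of the operator or of any CI tooling: the function names `event_times` and `unprivileged_window` are hypothetical, the year is assumed to be 2022 because klog headers omit it, and the sample lines are abbreviated copies of the quoted log with the ObjectReference replaced by `...`.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, then HH:MM:SS.microseconds.
# The event reason is taken from the "reason: '<Name>'" part of the message.
KLOG_EVENT = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) .*reason: '(\w+)'"
)


def event_times(lines, year=2022):
    """Map each event reason to the first timestamp at which it was logged.

    The year is an assumption supplied by the caller; klog does not record it.
    """
    times = {}
    for line in lines:
        match = KLOG_EVENT.match(line)
        if match is None:
            continue  # not an event line in the expected format
        mmdd, clock, reason = match.groups()
        stamp = datetime.strptime(f"{year}{mmdd} {clock}", "%Y%m%d %H:%M:%S.%f")
        times.setdefault(reason, stamp)
    return times


def unprivileged_window(lines):
    """Seconds between Deployment creation and the ClusterRoleBinding creation.

    A positive value means the Deployment existed before its Pods were
    allowed to use the privileged SCC.
    """
    times = event_times(lines)
    delta = times["ClusterRoleBindingCreated"] - times["DeploymentCreated"]
    return delta.total_seconds()


# Abbreviated sample; "Event(...)" stands in for the full ObjectReference.
SAMPLE = [
    "I0627 12:49:35.812513 1 event.go:285] Event(...): type: 'Normal' "
    "reason: 'DeploymentCreated' Created Deployment.apps/alibaba-disk-csi-driver-controller",
    "I0627 12:49:46.108721 1 event.go:285] Event(...): type: 'Normal' "
    "reason: 'ClusterRoleCreated' Created ClusterRole.rbac.authorization.k8s.io/alibaba-disk-privileged-role",
    "I0627 12:49:46.123020 1 event.go:285] Event(...): type: 'Normal' "
    "reason: 'ClusterRoleBindingCreated' Created ClusterRoleBinding.rbac.authorization.k8s.io/"
    "alibaba-disk-controller-privileged-binding",
]

if __name__ == "__main__":
    print(f"unprivileged window: {unprivileged_window(SAMPLE):.3f}s")
```

On the sample lines this reports a window of about 10.3 seconds, matching the "~10 seconds" noted in the comment. A zero or negative window would correspond to the state checked in verification step 3, where the RBAC resources exist before the workloads.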