Bug 1977179
Summary: | PVC keeps in pending when using hostpath-provisioner | |||
---|---|---|---|---|
Product: | Container Native Virtualization (CNV) | Reporter: | Yan Du <yadu> | |
Component: | Storage | Assignee: | Alexander Wels <awels> | |
Status: | CLOSED ERRATA | QA Contact: | Yan Du <yadu> | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 4.8.0 | CC: | akalenyu, awels, brybacki, bschmaus, cnv-qe-bugs, danken, dholler, fdeutsch, pelauter, ycui | |
Target Milestone: | --- | Keywords: | Regression | |
Target Release: | 4.8.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | hostpath-provisioner-rhel8-operator v4.8.0-17 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1977756 | Environment: | ||
Last Closed: | 2021-07-27 14:32:39 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1977383 | |||
Bug Blocks: | 1977756 |
Description
Yan Du
2021-06-29 06:38:38 UTC
hpp build: hostpath-provisioner-operator-container-v4.8.0-16

Created fix in attached PR link. Basically, the SCC needed to be modified to include projected volumes, something that is enabled by default for all pods in 4.8.

Backport to 4.8 branch.

Fixed in: hostpath-provisioner-operator v4.8.0-17
hco bundle: v4.8.0-444

@akalenyu can you point to the OpenShift RC.1 change that triggered this bug (bz, jira, pr)?

Is there any workaround for this bug that an admin could perform?

(In reply to Fabian Deutsch from comment #10)
> Is there any workaround for this bug that an admin could perform?

I don't think we can work around this in production, as any W/A will involve scaling down our operator so the SCC doesn't get reconciled:
- Scale down HPP operator
- Manually add `- projected` to the HPP SCC's .volumes[] (named hostpath-provisioner)
- Edit daemonset to trigger attempt to start pods (named hostpath-provisioner as well)

(Or if we want to verify this without redeploying CNV, we could scale down HCO and replace the HPP operator image in the cluster).

(In reply to Dan Kenigsberg from comment #9)
> @akalenyu can you point to the OpenShift RC.1 change that
> triggered this bug (bz, jira, pr)?

The reason we hit this is that the BoundServiceAccountTokenVolume feature gate is now enabled by default in k8s 1.21:
https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#overview
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume
(Thanks awels for these findings).

As for the reason we only see it in the RC of OCP, the changelog of FC.9 (https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4-stable/release/4.8.0-fc.9) points to Bug 1946479, which would explain why the feature gate was disabled on the OCP side before.

Alex, would it be possible for a customer to disable this openshift feature gate? If so it would be a valid workaround that should be documented here.

IIUIC then these are three distinct steps:
- Scale down HPP operator --> oc scale
- Manually add `- projected` to the HPP SCC's .volumes[] (named hostpath-provisioner) --> oc patch?
- Edit daemonset to trigger attempt to start pods (named hostpath-provisioner as well) --> oc delete -l… ?

Once we have a fix then it's about scaling it up again: oc scale …
Is this correct?

(In reply to Dan Kenigsberg from comment #12)
> Alex, would it be possible for a customer to disable this openshift feature
> gate? If so it would be a valid workaround that should be documented here.

Not sure if I understand that right, but I think if we want to disable the BoundServiceAccountTokenVolume feature gate, we need to stop the kubelet on all nodes and restart it without the BoundServiceAccountTokenVolume parameter. Maybe scaling down HCO and replacing the HPP operator image is better than this.

Verified the workaround and it works well:
1. oc scale deployment/hostpath-provisioner-operator --replicas=0
2. oc edit scc hostpath-provisioner with '- projected':
   volumes:
   - hostPath
   - secret
   - projected
3. edit/patch the daemonset to trigger a reconcile on it
4. oc scale deployment/hostpath-provisioner-operator --replicas=1

After that, create the dv and vm with hostpath-provisioner sc:

$ oc get pvc
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
dv1    Bound    pvc-af0d07a3-8b6b-436b-91b9-83af8bd00213   245Gi      RWO            hostpath-provisioner   25s

$ oc get vmi
NAME   AGE   PHASE     IP             NODENAME
vm1    19s   Running   10.131.1.223   infra-debug3a-5g7ld-worker-0-2hd8n

Test passed with hostpath-provisioner-operator-container-v4.8.0-17; still waiting for the new OCP rc1 build to verify the bug.

Issue has been fixed with the versions below:
Client Version: 4.8.0-202106281541.p0.git.1077b05.assembly.stream-1077b05
Server Version: 4.8.0-rc.1
Kubernetes Version: v1.21.0-rc.0+766a5fe
hostpath-provisioner-operator-container-v4.8.0-17

Retest on Server Version: 4.8.0-rc.3, hpp works fine.

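For reference, the workaround steps verified earlier in this thread could be sketched as a small Python helper that only builds the `oc` command lines and runs nothing itself. This is a rough illustration, not a supported procedure: the namespace `openshift-cnv` is an assumption (the thread does not name one), the function names are hypothetical, `oc rollout restart` is just one possible way to "trigger" the daemonset, and the current SCC volume list should really be read from the cluster (for example with `oc get scc hostpath-provisioner -o json`) rather than hard-coded.

```python
from __future__ import annotations

import json
import shlex

# Names taken from the thread: the SCC and the DaemonSet are both called
# "hostpath-provisioner"; the operator Deployment is the one scaled in the
# verified steps.
HPP = "hostpath-provisioner"
OPERATOR = "deployment/hostpath-provisioner-operator"


def volumes_with_projected(current: list[str]) -> list[str]:
    """Return the SCC volume list with 'projected' present exactly once."""
    return list(current) if "projected" in current else [*current, "projected"]


def workaround_commands(current_volumes: list[str], namespace: str) -> list[list[str]]:
    """Build the oc invocations for the workaround. Nothing is executed here."""
    # A JSON merge patch replaces the whole .volumes list, so the patch carries
    # the existing entries plus "projected".
    patch = json.dumps({"volumes": volumes_with_projected(current_volumes)})
    return [
        # 1. Stop the operator so it does not reconcile the SCC back.
        ["oc", "scale", OPERATOR, "--replicas=0", "-n", namespace],
        # 2. Allow projected volumes in the (cluster-scoped) SCC.
        ["oc", "patch", "scc", HPP, "--type=merge", "-p", patch],
        # 3. Make the DaemonSet retry pod creation (one possible trigger).
        ["oc", "rollout", "restart", f"daemonset/{HPP}", "-n", namespace],
        # 4. Scale the operator back up, as in the verified steps.
        ["oc", "scale", OPERATOR, "--replicas=1", "-n", namespace],
    ]


if __name__ == "__main__":
    # "openshift-cnv" is an assumed namespace; the thread does not name one.
    # The volume list mirrors the one shown in the verified steps, minus "projected".
    for argv in workaround_commands(["hostPath", "secret"], "openshift-cnv"):
        print(shlex.join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; an admin would review the output and run it by hand. As noted in the thread, an operator build without the fix will reconcile the SCC once it is scaled back up, so step 4 fits best once the fixed operator image is in place.
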
Move bug to verified

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920