Description of problem:
PVC stays in Pending when using hostpath-provisioner

Version-Release number of selected component (if applicable):
Client Version: 4.8.0-202106281541.p0.git.1077b05.assembly.stream-1077b05
Server Version: 4.8.0-rc.1
Kubernetes Version: v1.21.0-rc.0+766a5fe

$ oc get csv -A
NAMESPACE                              NAME                                           DISPLAY                       VERSION                 REPLACES                                  PHASE
openshift-cnv                          kubevirt-hyperconverged-operator.v4.8.0        OpenShift Virtualization     4.8.0                   kubevirt-hyperconverged-operator.v2.6.5   Succeeded
openshift-local-storage                local-storage-operator.4.7.0-202102110027.p0   Local Storage                4.7.0-202102110027.p0                                             Succeeded
openshift-operator-lifecycle-manager   packageserver                                  Package Server               0.17.0                                                            Succeeded
openshift-storage                      ocs-operator.v4.8.0-431.ci                     OpenShift Container Storage  4.8.0-431.ci                                                      Succeeded

How reproducible:
Always

Steps to Reproduce:
1. Create a DataVolume with the hostpath-provisioner storage class:

---
apiVersion: cdi.kubevirt.io/v1alpha1
kind: DataVolume
metadata:
  name: dv1
spec:
  source:
    http:
      url: http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 2Gi
    storageClassName: hostpath-provisioner
    volumeMode: Filesystem
  contentType: kubevirt

2. Create a VM to consume the DataVolume:

---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm1
spec:
  template:
    spec:
      domain:
        resources:
          requests:
            memory: 512M
        devices:
          rng: {}
          disks:
            - disk:
                bus: virtio
              name: dv1
      volumes:
        - name: dv1
          dataVolume:
            name: dv1
  running: true
Actual results:

[cnv-qe-jenkins@infra-debug3b-cgz2l-executor cnv-tests]$ oc get dv
NAME   PHASE                  PROGRESS   RESTARTS   AGE
dv1    WaitForFirstConsumer   N/A                   21m

[cnv-qe-jenkins@infra-debug3b-cgz2l-executor cnv-tests]$ oc get pvc
NAME   STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS           AGE
dv1    Pending                                      hostpath-provisioner   22m

[cnv-qe-jenkins@infra-debug3b-cgz2l-executor cnv-tests]$ oc get pod
NAME                      READY   STATUS    RESTARTS   AGE
virt-launcher-vm1-crrvd   0/1     Pending   0          21m

[cnv-qe-jenkins@infra-debug3b-cgz2l-executor cnv-tests]$ oc get vm
NAME   AGE   VOLUME
vm1    23m

[cnv-qe-jenkins@infra-debug3b-cgz2l-executor cnv-tests]$ oc get vmi
NAME   AGE   PHASE     IP   NODENAME
vm1    23m   Pending

Expected results:
The PVC can be bound and the VM can be running.

Additional info:
Detailed log attached
hpp build: hostpath-provisioner-operator-container-v4.8.0-16
Created a fix in the attached PR link. Basically we needed to modify the SCC to include projected volumes, something that is enabled by default for all pods in 4.8.
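For reference, the relevant part of the fixed SCC would look roughly like the fragment below. This is a sketch: only the volumes list is shown, and the pre-existing entries (hostPath, secret) are assumed from the default HPP SCC rather than taken from the PR itself.

```yaml
# hostpath-provisioner SCC, volumes section after the fix.
# "projected" is needed because bound service account tokens
# are mounted via projected volumes in k8s 1.21+.
volumes:
- hostPath
- secret
- projected
```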
Backported to the 4.8 branch.
Fixed in: hostpath-provisioner-operator v4.8.0-17, HCO bundle v4.8.0-444
@akalenyu can you point to the OpenShift RC.1 change that triggered this bug (bz, jira, pr)?
Is there any workaround for this bug that an admin could perform?
(In reply to Fabian Deutsch from comment #10)
> Is there any workaround for this bug that an admin could perform?

I don't think we can work around this in production, as any W/A will involve scaling down our operator so the SCC doesn't get reconciled:
- Scale down the HPP operator
- Manually add `- projected` to the HPP SCC's .volumes[] (the SCC is named hostpath-provisioner)
- Edit the daemonset to trigger an attempt to start the pods (the daemonset is named hostpath-provisioner as well)

(Or, if we want to verify this without redeploying CNV, we could scale down HCO and replace the HPP operator image in the cluster.)

(In reply to Dan Kenigsberg from comment #9)
> @akalenyu can you point to the OpenShift RC.1 change that
> triggered this bug (bz, jira, pr)?

The reason we hit this is that the BoundServiceAccountTokenVolume feature gate is now enabled by default in k8s 1.21:
https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#overview
https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume
(Thanks awels for these findings.)

As for why we only see this in an RC of OCP, the changelog of FC.9 (https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4-stable/release/4.8.0-fc.9) points to Bug 1946479, which would explain why the feature gate was disabled on the OCP side before.
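The workaround steps above could be sketched as oc commands like the following. This is a sketch only: the `openshift-cnv` namespace and the `k8s-app=hostpath-provisioner` pod label are assumptions to verify in your cluster, and the actual oc invocations (shown as comments) require cluster-admin access, so here we only build and print the patch payload for inspection.

```shell
# JSON patch that appends "projected" to the SCC's volumes list.
# SCC name "hostpath-provisioner" comes from the comment above;
# inspect it first with: oc get scc hostpath-provisioner -o yaml
SCC_PATCH='[{"op": "add", "path": "/volumes/-", "value": "projected"}]'
echo "$SCC_PATCH"

# The workaround itself would then be (requires a cluster; not run here):
#   oc scale deployment/hostpath-provisioner-operator -n openshift-cnv --replicas=0
#   oc patch scc hostpath-provisioner --type=json -p "$SCC_PATCH"
#   oc delete pod -n openshift-cnv -l k8s-app=hostpath-provisioner
# And once a fixed operator build is available:
#   oc scale deployment/hostpath-provisioner-operator -n openshift-cnv --replicas=1
```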
Alex, would it be possible for a customer to disable this openshift feature gate? If so it would be a valid workaround that should be documented here.
IIUIC then these are three distinct steps:
- Scale down HPP operator --> oc scale
- Manually add `- projected` to the HPP SCC's .volumes[] (named hostpath-provisioner) --> oc patch?
- Edit daemonset to trigger attempt to start pods (named hostpath-provisioner as well) --> oc delete -l… ?

Once we have a fix, then it's about scaling it up again: oc scale …

Is this correct?
(In reply to Dan Kenigsberg from comment #12)
> Alex, would it be possible for a customer to disable this openshift feature
> gate? If so it would be a valid workaround that should be documented here.

Not sure if I understand that right. I think if we want to disable the BoundServiceAccountTokenVolume feature gate, we would need to stop the kubelet on all nodes and restart it without the BoundServiceAccountTokenVolume parameter. Scaling down HCO and replacing the HPP operator image is probably better than that.
Verified the workaround and it works well:
1. oc scale deployment/hostpath-provisioner-operator --replicas=0
2. oc edit scc hostpath-provisioner to add '- projected':
   volumes:
   - hostPath
   - secret
   - projected
3. Edit/patch the daemonset to trigger a reconcile on it
4. oc scale deployment/hostpath-provisioner-operator --replicas=1

After that, create the dv and vm with the hostpath-provisioner sc:

$ oc get pvc
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
dv1    Bound    pvc-af0d07a3-8b6b-436b-91b9-83af8bd00213   245Gi      RWO            hostpath-provisioner   25s

$ oc get vmi
NAME   AGE   PHASE     IP             NODENAME
vm1    19s   Running   10.131.1.223   infra-debug3a-5g7ld-worker-0-2hd8n
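Step 3 above (triggering a reconcile on the daemonset) can be done without a hand edit by patching a throwaway annotation into the daemonset's pod template, which makes the daemonset recreate its pods. A sketch, with assumptions: the annotation key `workaround/restartedAt` is made up, and the `openshift-cnv` namespace must match where the hostpath-provisioner daemonset actually lives in your cluster. Only the patch payload is built and printed here; the oc call is shown as a comment.

```shell
# Strategic-merge patch that bumps a pod-template annotation so the
# daemonset rolls its pods (same trick `oc rollout restart` uses).
TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)
DS_PATCH="{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"workaround/restartedAt\":\"$TS\"}}}}}"
echo "$DS_PATCH"

# Apply with (requires a cluster; not run here):
#   oc patch daemonset hostpath-provisioner -n openshift-cnv -p "$DS_PATCH"
```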
Test passed with hostpath-provisioner-operator-container-v4.8.0-17. Still waiting for a new OCP rc1 build to verify the bug.
Issue has been fixed with the versions below:
Client Version: 4.8.0-202106281541.p0.git.1077b05.assembly.stream-1077b05
Server Version: 4.8.0-rc.1
Kubernetes Version: v1.21.0-rc.0+766a5fe
hostpath-provisioner-operator-container-v4.8.0-17
Retested on Server Version 4.8.0-rc.3; hpp works fine. Moving the bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2920