Description of problem:
A pod mounting a persistent volume backed by a hostPath character device fails because character devices are not recognized as character devices in 3.9, while 3.10.83 works fine.

Version-Release number of selected component (if applicable):
oc v3.9.57
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://preserve-stage-39-master-etcd-nfs-1:8443
openshift v3.9.57
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Always

Steps to Reproduce:
1. Create a hostPath PV with a character device.
2. Log in as an end user and create a project, then add the privileged SCC to the user.
3. Create the PVC and the privileged pod.
4. Check the pod.

Actual results:
Events:
  Type     Reason                 Age               From                              Message
  ----     ------                 ----              ----                              -------
  Normal   Scheduled              41s               default-scheduler                 Successfully assigned mypod to preserve-stage-39-nrr-1
  Normal   SuccessfulMountVolume  41s               kubelet, preserve-stage-39-nrr-1  MountVolume.SetUp succeeded for volume "default-token-mqxwl"
  Warning  FailedMount            9s (x7 over 41s)  kubelet, preserve-stage-39-nrr-1  MountVolume.SetUp failed for volume "pv-qexoq" : hostPath type check failed: /dev/zero is not a character device

Expected results:
Pod is up and running.

PV Dump:
# oc get pv pv-qexoq -o yaml --export
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: null
  name: pv-qexoq
  selfLink: /api/v1/persistentvolumes/pv-qexoq
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc0001
    namespace: v6tpj
    resourceVersion: "25668"
    uid: 7f5c2443-fc51-11e8-9517-fa163e083653
  hostPath:
    path: /dev/zero
    type: CharDevice
  persistentVolumeReclaimPolicy: Delete
status: {}

PVC Dump:
# oc get pvc -n v6tpj -o yaml --export
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: 2018-12-10T07:59:19Z
    name: pvc0001
    namespace: v6tpj
    resourceVersion: "25673"
    selfLink: /api/v1/namespaces/v6tpj/persistentvolumeclaims/pvc0001
    uid: 7f5c2443-fc51-11e8-9517-fa163e083653
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
    storageClassName: ""
    volumeName: pv-qexoq
  status:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 1Gi
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Additional info:
# ll /dev/zero
crw-rw-rw-. 1 root root 1, 5 Dec 10 00:55 /dev/zero

# cat pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: aosqe/hello-openshift
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: "/mnt/ocp"
      name: my-volume
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: pvc0001

Logs from the node where the pod is scheduled:

Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.528208 20866 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-npr5s" (UniqueName: "kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s") pod "mypod" (UID: "56ec1acd-fc3c-11e8-9607-fa163ef30df0")
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.528235 20866 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.628461 20866 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-npr5s" (UniqueName: "kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s") pod "mypod" (UID: "56ec1acd-fc3c-11e8-9607-fa163ef30df0")
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.628489 20866 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.628519 20866 reconciler.go:262] operationExecutor.MountVolume started for volume "pv-npr5s" (UniqueName: "kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s") pod "mypod" (UID: "56ec1acd-fc3c-11e8-9607-fa163ef30df0")
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: E1210 00:27:53.628617 20866 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s\" (\"56ec1acd-fc3c-11e8-9607-fa163ef30df0\")" failed. No retries permitted until 2018-12-10 00:27:54.628593782 -0500 EST m=+10546.451238567 (durationBeforeRetry 1s). Error: "MountVolume.SetUp failed for volume \"pv-npr5s\" (UniqueName: \"kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s\") pod \"mypod\" (UID: \"56ec1acd-fc3c-11e8-9607-fa163ef30df0\") : hostPath type check failed: /dev/zero is not a character device"
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.629127 20866 server.go:290] Event(v1.ObjectReference{Kind:"Pod", Namespace:"pm8om", Name:"mypod", UID:"56ec1acd-fc3c-11e8-9607-fa163ef30df0", APIVersion:"v1", ResourceVersion:"25349", FieldPath:""}): type: 'Warning' reason: 'FailedMount' MountVolume.SetUp failed for volume "pv-npr5s" : hostPath type check failed: /dev/zero is not a character device
Indeed, the code is wrong for character devices; Kubernetes detects them as block devices:
https://github.com/openshift/ose/blob/745e58e4cfa2e376ea638068b9c9d6b6c4aeaf45/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L441-L442

It's trivial to fix, but does it make sense in 3.9? No customer is complaining, and the bug is already fixed in 3.10 and 3.11.
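For illustration, here is a minimal Go sketch of the ordering problem the linked lines describe: in Go, a character device has both os.ModeDevice and os.ModeCharDevice set, so testing the generic device bit first classifies /dev/zero as a block device. This is a simplified sketch, not the vendored Kubernetes code; the function names fileTypeBuggy/fileTypeFixed are hypothetical.

package main

import (
    "fmt"
    "os"
)

// Buggy ordering: os.ModeDevice is set for both block and character devices,
// so this branch catches /dev/zero before the character-device check runs.
func fileTypeBuggy(mode os.FileMode) string {
    switch {
    case mode&os.ModeDevice != 0:
        return "BlockDevice"
    case mode&os.ModeCharDevice != 0:
        return "CharDevice" // never reached for character devices
    }
    return "File"
}

// Fixed ordering: test the more specific character-device bit first.
func fileTypeFixed(mode os.FileMode) string {
    switch {
    case mode&os.ModeCharDevice != 0:
        return "CharDevice"
    case mode&os.ModeDevice != 0:
        return "BlockDevice"
    }
    return "File"
}

func main() {
    info, err := os.Stat("/dev/zero")
    if err != nil {
        panic(err)
    }
    fmt.Println("buggy ordering:", fileTypeBuggy(info.Mode())) // BlockDevice
    fmt.Println("fixed ordering:", fileTypeFixed(info.Mode())) // CharDevice
}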
Kubernetes 1.9 does not have this issue:
https://github.com/kubernetes/kubernetes/blob/release-1.9/pkg/util/mount/mount_linux.go#L424
https://github.com/kubernetes/kubernetes/blob/release-1.9/pkg/util/mount/mount.go#L331

OCP 3.9 does have the issue:
https://github.com/openshift/ose/blob/enterprise-3.9/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L424

So QE would like the issue to be fixed.
By that logic you could require a backport of any fix in Kubernetes 1.9.x that has not yet been fixed in 3.9.z, and there are plenty of those ;-)

For this one, the 3.9 PR is: https://github.com/openshift/ose/pull/1483
PR https://github.com/openshift/ose/pull/1483 merged (merge date 2019-01-11), but no build contains the code change yet. (The latest build is atomic-openshift-3.9.64-1.git.0.13cd345.el7, which was built on 2019-01-05 09:42:35.) QE will check again when a new build is available.
Moving back to MODIFIED since there is still no build containing the fix.
First, QE verified that the fix is in the build.

Changelog:
* Mon Jan 21 2019 AOS Automation Release Team <***@redhat.com> 3.9.65-1
- UPSTREAM: 60510: fix bug where character devices are not recognized (jsafrane)
- UPSTREAM: 62304: Remove isNotDir error check (jsafrane)

Then, QE set up two clusters with the same version, one containerized and one rpm.

# oc version
oc v3.9.65
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-lxia-39-container-master-etcd-1:8443
openshift v3.9.65
kubernetes v1.9.1+a0ce1bc657

# oc version
oc v3.9.65
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-lxia-39-rpm-master-etcd-1:8443
openshift v3.9.65
kubernetes v1.9.1+a0ce1bc657

QE also verified that the path /dev/zero exists on the nodes of both clusters.

[root@qe-lxia-39-container-master-etcd-1 ~]# ls -l /dev/zero
crw-rw-rw-. 1 root root 1, 5 Jan 23 02:37 /dev/zero

[root@qe-lxia-39-rpm-master-etcd-1 ~]# ls -l /dev/zero
crw-rw-rw-. 1 root root 1, 5 Jan 22 23:24 /dev/zero

Then, with the same PV/PVC/pod content, the pod on the rpm cluster is up and running, but the pod on the containerized cluster fails with the error: hostPath type check failed: /dev/zero is not a character device

# oc describe pod mypod
Name:         mypod
Namespace:    default
Node:         qe-lxia-39-container-master-etcd-1/172.16.122.23
Start Time:   Wed, 23 Jan 2019 05:10:50 +0000
Labels:       <none>
Annotations:  openshift.io/scc=privileged
Status:       Pending
IP:
Containers:
  mycontainer:
    Container ID:
    Image:          aosqe/hello-openshift
    Image ID:
    Port:           <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/ocp from my-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gvcjv (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  my-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc0001
    ReadOnly:   false
  default-token-gvcjv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gvcjv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age               From                                         Message
  ----     ------                 ----              ----                                         -------
  Normal   Scheduled              5m                default-scheduler                            Successfully assigned mypod to qe-lxia-39-container-master-etcd-1
  Normal   SuccessfulMountVolume  5m                kubelet, qe-lxia-39-container-master-etcd-1  MountVolume.SetUp succeeded for volume "default-token-gvcjv"
  Warning  FailedMount            1m (x10 over 5m)  kubelet, qe-lxia-39-container-master-etcd-1  MountVolume.SetUp failed for volume "pv-qexoq" : hostPath type check failed: /dev/zero is not a character device
  Warning  FailedMount            1m (x2 over 3m)   kubelet, qe-lxia-39-container-master-etcd-1  Unable to mount volumes for pod "mypod_default(406a38d5-1ecd-11e9-91e3-fa163e7184ee)": timeout expired waiting for volumes to attach/mount for pod "default"/"mypod". list of unattached/unmounted volumes=[my-volume]

Logs from the node:

Jan 23 05:19:04 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:04.880174 55543 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Jan 23 05:19:04 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:04.980361 55543 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-qexoq" (UniqueName: "kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq") pod "mypod" (UID: "406a38d5-1ecd-11e9-91e3-fa163e7184ee")
Jan 23 05:19:04 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:04.980414 55543 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.080632 55543 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-qexoq" (UniqueName: "kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq") pod "mypod" (UID: "406a38d5-1ecd-11e9-91e3-fa163e7184ee")
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.080678 55543 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.177422 55543 kubelet.go:2123] Container runtime status: Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=true reason: message:
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.180849 55543 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-qexoq" (UniqueName: "kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq") pod "mypod" (UID: "406a38d5-1ecd-11e9-91e3-fa163e7184ee")
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.180903 55543 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.180946 55543 reconciler.go:262] operationExecutor.MountVolume started for volume "pv-qexoq" (UniqueName: "kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq") pod "mypod" (UID: "406a38d5-1ecd-11e9-91e3-fa163e7184ee")
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.181001 55543 nsenter.go:107] Running nsenter command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/stat -L --printf "%F" /dev/zero]
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: E0123 05:19:05.183314 55543 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq\" (\"406a38d5-1ecd-11e9-91e3-fa163e7184ee\")" failed. No retries permitted until 2019-01-23 05:21:07.183283219 +0000 UTC m=+5819.451739875 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"pv-qexoq\" (UniqueName: \"kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq\") pod \"mypod\" (UID: \"406a38d5-1ecd-11e9-91e3-fa163e7184ee\") : hostPath type check failed: /dev/zero is not a character device"
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.183690 55543 server.go:290] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"mypod", UID:"406a38d5-1ecd-11e9-91e3-fa163e7184ee", APIVersion:"v1", ResourceVersion:"23724", FieldPath:""}): type: 'Warning' reason: 'FailedMount' MountVolume.SetUp failed for volume "pv-qexoq" : hostPath type check failed: /dev/zero is not a character device
Hi Jan, can this issue be dropped from the 3.9.z errata?
# oc version
oc v3.9.72
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-lxia-39-master-etcd-1:8443
openshift v3.9.72
kubernetes v1.9.1+a0ce1bc657
=========================================================
# ls -l /dev/zero
crw-rw-rw-. 1 root root 1, 5 Mar 13 07:33 /dev/zero
=========================================================
# oc get pods
NAME      READY     STATUS    RESTARTS   AGE
dynamic   1/1       Running   0          2m
=========================================================
# oc rsh dynamic
/ # ls -lh /mnt/ocp_pv
crw-rw-rw-    1 root     root        1,   5 Mar 13 07:33 /mnt/ocp_pv
=========================================================
# cat installation_matrix
atomic-openshift version: v3.9.72
Operation System: Red Hat Enterprise Linux Atomic Host release 7.5
Cluster Install Method: docker container
Docker Version: docker-1.13.1-58.git87f2fab.el7.x86_64
Docker Storage Driver: overlay2
OpenvSwitch Version: openvswitch-2.9.0-97.el7fdp.x86_64
etcd Version: etcd-3.2.22-1.el7.x86_64
Network Plugin: redhat/openshift-ovs-subnet
Auth Method: allowall
Registry Deployment Method: deploymentconfig
Secure Registry: True
Registry Backend Storage: cinder
Load Balancer: None
Docker System Container: False
CRI-O Enable: False
Firewall Service: iptables
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0619