Description of problem:
A pod mounting a persistent volume backed by a hostPath character device fails because character devices are not recognized as character devices in 3.9, while 3.10.83 works fine.

Version-Release number of selected component (if applicable):
oc v3.9.57
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://preserve-stage-39-master-etcd-nfs-1:8443
openshift v3.9.57
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Always

Steps to Reproduce:
1. Create a hostPath PV with a character device.
2. Log in as an end user and create a project, then add the privileged SCC to the user.
3. Create the PVC and the privileged pod.
4. Check the pod.

Actual results:
Events:
  Type     Reason                 Age               From                              Message
  ----     ------                 ----              ----                              -------
  Normal   Scheduled              41s               default-scheduler                 Successfully assigned mypod to preserve-stage-39-nrr-1
  Normal   SuccessfulMountVolume  41s               kubelet, preserve-stage-39-nrr-1  MountVolume.SetUp succeeded for volume "default-token-mqxwl"
  Warning  FailedMount            9s (x7 over 41s)  kubelet, preserve-stage-39-nrr-1  MountVolume.SetUp failed for volume "pv-qexoq" : hostPath type check failed: /dev/zero is not a character device

Expected results:
Pod is up and running.

PV Dump:
# oc get pv pv-qexoq -o yaml --export
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: null
  name: pv-qexoq
  selfLink: /api/v1/persistentvolumes/pv-qexoq
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc0001
    namespace: v6tpj
    resourceVersion: "25668"
    uid: 7f5c2443-fc51-11e8-9517-fa163e083653
  hostPath:
    path: /dev/zero
    type: CharDevice
  persistentVolumeReclaimPolicy: Delete
status: {}

PVC Dump:
# oc get pvc -n v6tpj -o yaml --export
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
    creationTimestamp: 2018-12-10T07:59:19Z
    name: pvc0001
    namespace: v6tpj
    resourceVersion: "25673"
    selfLink: /api/v1/namespaces/v6tpj/persistentvolumeclaims/pvc0001
    uid: 7f5c2443-fc51-11e8-9517-fa163e083653
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
    storageClassName: ""
    volumeName: pv-qexoq
  status:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 1Gi
    phase: Bound
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Additional info:
# ll /dev/zero
crw-rw-rw-. 1 root root 1, 5 Dec 10 00:55 /dev/zero

# cat pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: aosqe/hello-openshift
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: "/mnt/ocp"
      name: my-volume
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: pvc0001

Logs from the node where the pod is scheduled:

Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.528208 20866 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-npr5s" (UniqueName: "kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s") pod "mypod" (UID: "56ec1acd-fc3c-11e8-9607-fa163ef30df0")
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.528235 20866 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.628461 20866 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-npr5s" (UniqueName: "kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s") pod "mypod" (UID: "56ec1acd-fc3c-11e8-9607-fa163ef30df0")
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.628489 20866 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.628519 20866 reconciler.go:262] operationExecutor.MountVolume started for volume "pv-npr5s" (UniqueName: "kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s") pod "mypod" (UID: "56ec1acd-fc3c-11e8-9607-fa163ef30df0")
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: E1210 00:27:53.628617 20866 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s\" (\"56ec1acd-fc3c-11e8-9607-fa163ef30df0\")" failed. No retries permitted until 2018-12-10 00:27:54.628593782 -0500 EST m=+10546.451238567 (durationBeforeRetry 1s). Error: "MountVolume.SetUp failed for volume \"pv-npr5s\" (UniqueName: \"kubernetes.io/host-path/56ec1acd-fc3c-11e8-9607-fa163ef30df0-pv-npr5s\") pod \"mypod\" (UID: \"56ec1acd-fc3c-11e8-9607-fa163ef30df0\") : hostPath type check failed: /dev/zero is not a character device"
Dec 10 00:27:53 preserve-qe-lxia-39-nrr-1 atomic-openshift-node[20866]: I1210 00:27:53.629127 20866 server.go:290] Event(v1.ObjectReference{Kind:"Pod", Namespace:"pm8om", Name:"mypod", UID:"56ec1acd-fc3c-11e8-9607-fa163ef30df0", APIVersion:"v1", ResourceVersion:"25349", FieldPath:""}): type: 'Warning' reason: 'FailedMount' MountVolume.SetUp failed for volume "pv-npr5s" : hostPath type check failed: /dev/zero is not a character device
Indeed, the code is wrong for character devices; Kubernetes detects them as block devices:
https://github.com/openshift/ose/blob/745e58e4cfa2e376ea638068b9c9d6b6c4aeaf45/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L441-L442

It's trivial to fix, but does it make sense in 3.9? No customer is complaining, and the bug is already fixed in 3.10 and 3.11.
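For illustration, here is a minimal Go sketch of the ordering problem the linked lines describe: in Go, a character device has both os.ModeDevice and os.ModeCharDevice set, so testing the generic device bit first classifies /dev/zero as a block device. This is a simplified sketch, not the vendored Kubernetes code; the function names fileTypeBuggy/fileTypeFixed are hypothetical.

package main

import (
    "fmt"
    "os"
)

// Buggy ordering: os.ModeDevice is set for both block and character devices,
// so this branch catches /dev/zero before the character-device check runs.
func fileTypeBuggy(mode os.FileMode) string {
    switch {
    case mode&os.ModeDevice != 0:
        return "BlockDevice"
    case mode&os.ModeCharDevice != 0:
        return "CharDevice" // never reached for character devices
    }
    return "File"
}

// Fixed ordering: test the more specific character-device bit first.
func fileTypeFixed(mode os.FileMode) string {
    switch {
    case mode&os.ModeCharDevice != 0:
        return "CharDevice"
    case mode&os.ModeDevice != 0:
        return "BlockDevice"
    }
    return "File"
}

func main() {
    info, err := os.Stat("/dev/zero")
    if err != nil {
        panic(err)
    }
    fmt.Println("buggy ordering:", fileTypeBuggy(info.Mode())) // BlockDevice
    fmt.Println("fixed ordering:", fileTypeFixed(info.Mode())) // CharDevice
}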
Kubernetes 1.9 does not have this issue:
https://github.com/kubernetes/kubernetes/blob/release-1.9/pkg/util/mount/mount_linux.go#L424
https://github.com/kubernetes/kubernetes/blob/release-1.9/pkg/util/mount/mount.go#L331

OCP 3.9 does have the issue:
https://github.com/openshift/ose/blob/enterprise-3.9/vendor/k8s.io/kubernetes/pkg/util/mount/mount_linux.go#L424

So QE would like the issue to be fixed.
By that logic you could require a backport of any fix in Kubernetes 1.9.x that has not yet been fixed in 3.9.z, and there are plenty of those ;-)

For this one, the 3.9 PR is: https://github.com/openshift/ose/pull/1483
PR https://github.com/openshift/ose/pull/1483 merged (merge date 2019-01-11), but no build contains the code change yet. (The latest build is atomic-openshift-3.9.64-1.git.0.13cd345.el7, which was built on 2019-01-05 09:42:35.) QE will check again when a new build is available.
Moving back to MODIFIED since there is still no build containing the fix.
First, QE verified that the fix is in the build.

Changelog:
* Mon Jan 21 2019 AOS Automation Release Team <***@redhat.com> 3.9.65-1
- UPSTREAM: 60510: fix bug where character devices are not recognized (jsafrane)
- UPSTREAM: 62304: Remove isNotDir error check (jsafrane)

Then, QE set up two clusters with the same version, one containerized and one rpm.

# oc version
oc v3.9.65
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-lxia-39-container-master-etcd-1:8443
openshift v3.9.65
kubernetes v1.9.1+a0ce1bc657

# oc version
oc v3.9.65
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-lxia-39-rpm-master-etcd-1:8443
openshift v3.9.65
kubernetes v1.9.1+a0ce1bc657

QE also verified that the path /dev/zero exists on the nodes of both clusters.

[root@qe-lxia-39-container-master-etcd-1 ~]# ls -l /dev/zero
crw-rw-rw-. 1 root root 1, 5 Jan 23 02:37 /dev/zero

[root@qe-lxia-39-rpm-master-etcd-1 ~]# ls -l /dev/zero
crw-rw-rw-. 1 root root 1, 5 Jan 22 23:24 /dev/zero

Then, with the same PV/PVC/pod content, the pod on the rpm cluster is up and running, but the pod on the containerized cluster fails with the error: hostPath type check failed: /dev/zero is not a character device

# oc describe pod mypod
Name:         mypod
Namespace:    default
Node:         qe-lxia-39-container-master-etcd-1/172.16.122.23
Start Time:   Wed, 23 Jan 2019 05:10:50 +0000
Labels:       <none>
Annotations:  openshift.io/scc=privileged
Status:       Pending
IP:
Containers:
  mycontainer:
    Container ID:
    Image:          aosqe/hello-openshift
    Image ID:
    Port:           <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/ocp from my-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gvcjv (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  my-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc0001
    ReadOnly:   false
  default-token-gvcjv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gvcjv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason                 Age               From                                         Message
  ----     ------                 ----              ----                                         -------
  Normal   Scheduled              5m                default-scheduler                            Successfully assigned mypod to qe-lxia-39-container-master-etcd-1
  Normal   SuccessfulMountVolume  5m                kubelet, qe-lxia-39-container-master-etcd-1  MountVolume.SetUp succeeded for volume "default-token-gvcjv"
  Warning  FailedMount            1m (x10 over 5m)  kubelet, qe-lxia-39-container-master-etcd-1  MountVolume.SetUp failed for volume "pv-qexoq" : hostPath type check failed: /dev/zero is not a character device
  Warning  FailedMount            1m (x2 over 3m)   kubelet, qe-lxia-39-container-master-etcd-1  Unable to mount volumes for pod "mypod_default(406a38d5-1ecd-11e9-91e3-fa163e7184ee)": timeout expired waiting for volumes to attach/mount for pod "default"/"mypod". list of unattached/unmounted volumes=[my-volume]

Logs from the node:

Jan 23 05:19:04 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:04.880174 55543 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Jan 23 05:19:04 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:04.980361 55543 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-qexoq" (UniqueName: "kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq") pod "mypod" (UID: "406a38d5-1ecd-11e9-91e3-fa163e7184ee")
Jan 23 05:19:04 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:04.980414 55543 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.080632 55543 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-qexoq" (UniqueName: "kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq") pod "mypod" (UID: "406a38d5-1ecd-11e9-91e3-fa163e7184ee")
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.080678 55543 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.177422 55543 kubelet.go:2123] Container runtime status: Runtime Conditions: RuntimeReady=true reason: message:, NetworkReady=true reason: message:
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.180849 55543 operation_executor.go:895] Starting operationExecutor.MountVolume for volume "pv-qexoq" (UniqueName: "kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq") pod "mypod" (UID: "406a38d5-1ecd-11e9-91e3-fa163e7184ee")
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.180903 55543 volume_host.go:218] using default mounter/exec for kubernetes.io/host-path
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.180946 55543 reconciler.go:262] operationExecutor.MountVolume started for volume "pv-qexoq" (UniqueName: "kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq") pod "mypod" (UID: "406a38d5-1ecd-11e9-91e3-fa163e7184ee")
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.181001 55543 nsenter.go:107] Running nsenter command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/stat -L --printf "%F" /dev/zero]
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: E0123 05:19:05.183314 55543 nestedpendingoperations.go:267] Operation for "\"kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq\" (\"406a38d5-1ecd-11e9-91e3-fa163e7184ee\")" failed. No retries permitted until 2019-01-23 05:21:07.183283219 +0000 UTC m=+5819.451739875 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"pv-qexoq\" (UniqueName: \"kubernetes.io/host-path/406a38d5-1ecd-11e9-91e3-fa163e7184ee-pv-qexoq\") pod \"mypod\" (UID: \"406a38d5-1ecd-11e9-91e3-fa163e7184ee\") : hostPath type check failed: /dev/zero is not a character device"
Jan 23 05:19:05 qe-lxia-39-container-master-etcd-1 atomic-openshift-node[55517]: I0123 05:19:05.183690 55543 server.go:290] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"mypod", UID:"406a38d5-1ecd-11e9-91e3-fa163e7184ee", APIVersion:"v1", ResourceVersion:"23724", FieldPath:""}): type: 'Warning' reason: 'FailedMount' MountVolume.SetUp failed for volume "pv-qexoq" : hostPath type check failed: /dev/zero is not a character device
Hi Jan, can this issue be dropped from the 3.9.z errata?
# oc version
oc v3.9.72
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-lxia-39-master-etcd-1:8443
openshift v3.9.72
kubernetes v1.9.1+a0ce1bc657
=========================================================
# ls -l /dev/zero
crw-rw-rw-. 1 root root 1, 5 Mar 13 07:33 /dev/zero
=========================================================
# oc get pods
NAME      READY     STATUS    RESTARTS   AGE
dynamic   1/1       Running   0          2m
=========================================================
# oc rsh dynamic
/ # ls -lh /mnt/ocp_pv
crw-rw-rw-    1 root     root        1,   5 Mar 13 07:33 /mnt/ocp_pv
=========================================================
# cat installation_matrix
atomic-openshift version: v3.9.72
Operation System: Red Hat Enterprise Linux Atomic Host release 7.5
Cluster Install Method: docker container
Docker Version: docker-1.13.1-58.git87f2fab.el7.x86_64
Docker Storage Driver: overlay2
OpenvSwitch Version: openvswitch-2.9.0-97.el7fdp.x86_64
etcd Version: etcd-3.2.22-1.el7.x86_64
Network Plugin: redhat/openshift-ovs-subnet
Auth Method: allowall
Registry Deployment Method: deploymentconfig
Secure Registry: True
Registry Backend Storage: cinder
Load Balancer: None
Docker System Container: False
CRI-O Enable: False
Firewall Service: iptables
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0619