Description of problem:
After restarting the system (following a yum update), the registry pod fails to mount its PV and does not start correctly.

Version-Release number of selected component (if applicable):
3.1.1.x

How reproducible:
Restart the system; not 100% reproducible.

Actual results:

Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: E0316 13:14:01.148425 2624 nfs.go:178] IsLikelyNotMountPoint check failed: stat /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume: no such file or directory
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: E0316 13:14:01.148461 2624 kubelet.go:1521] Unable to mount volumes for pod "docker-registry-13-5fist_default": Mount failed: exit status 32
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: Mounting arguments: 10.31.132.45:/IPC1_ePaasv2_eng2_DReg /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume nfs []
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: Output: mount.nfs: mounting 10.31.132.45:/IPC1_ePaasv2_eng2_DReg failed, reason given by server: No such file or directory
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: ; skipping pod
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: E0316 13:14:01.149884 2624 pod_workers.go:113] Error syncing pod e707918c-eba9-11e5-992b-fa163e37a7eb, skipping: Mount failed: exit status 32
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: Mounting arguments: 10.31.132.45:/IPC1_ePaasv2_eng2_DReg /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume nfs []
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: Output: mount.nfs: mounting 10.31.132.45:/IPC1_ePaasv2_eng2_DReg failed, reason given by server: No such file or directory
Mar 16 13:14:02 node1.example.com atomic-openshift-node[2624]: W0316 13:14:02.046270 2624 kubelet.go:1750] Orphaned volume "e707918c-eba9-11e5-992b-fa163e37a7eb/registry-volume" found, tearing down volume
Mar 16 13:15:10 node1.example.com atomic-openshift-node[2624]: E0316 13:15:10.228935 2624 nfs.go:178] IsLikelyNotMountPoint check failed: stat /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume: no such file or directory

Expected results:
The registry pod should mount the NFS PV without error.

Additional info:
Upstream (k8s) has a similar report: https://github.com/kubernetes/kubernetes/issues/20734
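The failure pattern above can be checked by hand on an affected node. A minimal diagnostic sketch, reusing the pod UID and volume name from the log (hypothetical for any other cluster): verify whether the kubelet's mount target directory still exists, and whether a stale `kubernetes.io~nfs` entry lingers in the kernel mount table.

```shell
# Pod UID and volume name taken from the log above (replace for your cluster).
POD_UID="e707918c-eba9-11e5-992b-fa163e37a7eb"
VOL_DIR="/var/lib/origin/openshift.local.volumes/pods/${POD_UID}/volumes/kubernetes.io~nfs/registry-volume"

# Does the mount target directory still exist after the reboot?
# A missing directory matches the 'no such file or directory' stat error.
if [ -d "$VOL_DIR" ]; then
    echo "volume dir present"
else
    echo "volume dir missing"
fi

# Is a stale NFS entry still recorded in the kernel mount table?
grep "kubernetes.io~nfs" /proc/mounts || echo "no kubernetes.io~nfs mounts"
```

If the directory is missing while a mount-table entry survives (or vice versa), the kubelet's IsLikelyNotMountPoint check and the actual mount state have diverged, which is the condition this bug describes.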
Exit status 32 from mount(8) is a generic mount failure, commonly seen when the mount point is busy or already in use -- what does the mount table look like _after_ you restart the system, but _before_ you restart openshift?
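To answer that question, the mount table can be captured and filtered for NFS entries before openshift is restarted. A small sketch, run here against a saved sample line built from the export path in the log above (the real check would read /proc/mounts directly):

```shell
# Print "<server:export> -> <mount point>" for every NFS entry in a
# mount table, so stale registry-volume mounts are easy to spot.
# The sample line reuses the export and pod path from the log above.
cat > /tmp/mounts.sample <<'EOF'
10.31.132.45:/IPC1_ePaasv2_eng2_DReg /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume nfs rw,relatime 0 0
EOF

# Field 3 of /proc/mounts is the filesystem type; match nfs and nfs4.
awk '$3 ~ /^nfs/ { print $1, "->", $2 }' /tmp/mounts.sample
```

On a live node, substituting `/proc/mounts` for the sample file shows exactly which NFS exports are still mounted between the reboot and the openshift restart.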
This sure looks like https://github.com/kubernetes/kubernetes/issues/20734
I'm working on a backport for the fix of this to 3.1.1
Verified with:

[root@openshift-143 ~]# openshift version
openshift v3.1.1.6-33-g81eabcc
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

atomic-openshift-sdn-ovs-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
atomic-openshift-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
tuned-profiles-atomic-openshift-node-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
atomic-openshift-node-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
atomic-openshift-clients-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
atomic-openshift-master-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64

Steps:
1. Stop atomic-openshift-node.
2. On the node where docker-registry is hosted:
   mount | grep nfs
   openshift-143.lab.sjc.redhat.com:/var/lib/exports/regpv on /var/lib/origin/openshift.local.volumes/pods/7ff08b81-eff0-11e5-9c0a-fa163e88dd03/volumes/kubernetes.io~nfs/regpv-volume type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.10,local_lock=none,addr=10.14.6.143)
3. Reboot the node.
4. atomic-openshift-node starts after the node reboots.
5. On the node where docker-registry is hosted:
   mount | grep nfs
   openshift-143.lab.sjc.redhat.com:/var/lib/exports/regpv on /var/lib/origin/openshift.local.volumes/pods/7ff08b81-eff0-11e5-9c0a-fa163e88dd03/volumes/kubernetes.io~nfs/regpv-volume type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.10,local_lock=none,addr=10.14.6.143)
6. oc get pods -n default
   The docker-registry pod is healthy and running.
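The before/after comparison in steps 2 and 5 can be sketched as a small check: extract the NFS export (the first field of the `mount` line) captured before and after the reboot and compare them. The sample line reuses the export and pod path from the verification output above; on a live node both values would come from `mount | grep nfs`.

```shell
# Export recorded before the reboot (step 2 of the verification).
export_before="openshift-143.lab.sjc.redhat.com:/var/lib/exports/regpv"

# Sample 'mount | grep nfs' line as seen after the reboot (step 5).
line='openshift-143.lab.sjc.redhat.com:/var/lib/exports/regpv on /var/lib/origin/openshift.local.volumes/pods/7ff08b81-eff0-11e5-9c0a-fa163e88dd03/volumes/kubernetes.io~nfs/regpv-volume type nfs4 (rw,relatime)'

# Field 1 of the mount line is the server:export pair.
export_after=$(echo "$line" | awk '{print $1}')

if [ "$export_after" = "$export_before" ]; then
    echo "PV re-mounted from the same export"
else
    echo "export changed or PV not re-mounted"
fi
```

A matching export confirms the kubelet re-mounted the registry PV cleanly after the reboot, which is what distinguishes the fixed build from the failure in the original report.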
Jianwei, were you able to reproduce the original issue on a build from before the patch went in?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:0510