Bug 1318472 - Registry pod doesn't mount persistent volume(NFS) after restarting system
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.1.1
Assignee: Paul Morie
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-17 01:08 UTC by Kenjiro Nakayama
Modified: 2019-10-10 11:34 UTC
CC List: 7 users

Fixed In Version: atomic-openshift-3.1.1.6-4.git.32.adf8ec9.el7aos
Doc Type: Bug Fix
Doc Text:
Cause: The name of the Persistent Volume Claim, rather than the name of the Persistent Volume bound to it, was added to the list of volumes to preserve.
Consequence: If a pod using the Persistent Volume Claim had not yet entered the running state, the periodic cleanup process considered the volume orphaned and unmounted it.
Fix: The actual name of the Persistent Volume bound to a Persistent Volume Claim is now used when determining which volumes can be cleaned up, so the cleanup process no longer considers them orphaned.
Result: Persistent Volumes are no longer unmounted while the pod requiring the volume is starting.
Clone Of:
Environment:
Last Closed: 2016-03-24 15:54:02 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2016:0510 (normal, SHIPPED_LIVE): Red Hat OpenShift Enterprise bug fix update, last updated 2016-03-24 19:53:32 UTC

Description Kenjiro Nakayama 2016-03-17 01:08:52 UTC
Description of problem:

- After restarting the system (with yum update), the registry pod does not mount its persistent volume and therefore does not start correctly.

Version-Release number of selected component (if applicable):

- 3.1.1.x


How reproducible:

- Restart the system; the issue is not 100% reproducible.

Actual results:

Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: E0316 13:14:01.148425    2624 nfs.go:178] IsLikelyNotMountPoint check failed: stat /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume: no such file or directory
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: E0316 13:14:01.148461    2624 kubelet.go:1521] Unable to mount volumes for pod "docker-registry-13-5fist_default": Mount failed: exit status 32
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: Mounting arguments: 10.31.132.45:/IPC1_ePaasv2_eng2_DReg /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume nfs []
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: Output: mount.nfs: mounting 10.31.132.45:/IPC1_ePaasv2_eng2_DReg failed, reason given by server: No such file or directory
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: ; skipping pod
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: E0316 13:14:01.149884    2624 pod_workers.go:113] Error syncing pod e707918c-eba9-11e5-992b-fa163e37a7eb, skipping: Mount failed: exit status 32
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: Mounting arguments: 10.31.132.45:/IPC1_ePaasv2_eng2_DReg /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume nfs []
Mar 16 13:14:01 node1.example.com atomic-openshift-node[2624]: Output: mount.nfs: mounting 10.31.132.45:/IPC1_ePaasv2_eng2_DReg failed, reason given by server: No such file or directory
Mar 16 13:14:02 node1.example.com atomic-openshift-node[2624]: W0316 13:14:02.046270    2624 kubelet.go:1750] Orphaned volume "e707918c-eba9-11e5-992b-fa163e37a7eb/registry-volume" found, tearing down volume
Mar 16 13:15:10 node1.example.com atomic-openshift-node[2624]: E0316 13:15:10.228935    2624 nfs.go:178] IsLikelyNotMountPoint check failed: stat /var/lib/origin/openshift.local.volumes/pods/e707918c-eba9-11e5-992b-fa163e37a7eb/volumes/kubernetes.io~nfs/registry-volume: no such file or directory

Expected results:

- The registry pod should mount the NFS persistent volume without errors.

Additional info:

- Upstream (Kubernetes) has a similar report: https://github.com/kubernetes/kubernetes/issues/20734

Comment 1 Paul Morie 2016-03-17 15:31:54 UTC
Exit code 32 means the mount point is busy or already in use -- what does the mount table look like _after_ you restart the system, but _before_ you restart OpenShift?

Comment 4 Andy Goldstein 2016-03-18 16:11:32 UTC
This sure looks like https://github.com/kubernetes/kubernetes/issues/20734

Comment 6 Paul Morie 2016-03-18 19:20:55 UTC
I'm working on a backport of the fix for this to 3.1.1.

Comment 11 Jianwei Hou 2016-03-22 06:02:58 UTC
Verified with:

[root@openshift-143 ~]# openshift version
openshift v3.1.1.6-33-g81eabcc
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

atomic-openshift-sdn-ovs-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
atomic-openshift-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
tuned-profiles-atomic-openshift-node-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
atomic-openshift-node-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
atomic-openshift-clients-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64
atomic-openshift-master-3.1.1.6-4.git.32.adf8ec9.el7aos.x86_64

Steps:
1. Stop atomic-openshift-node
2. On node where docker-registry is hosted: mount|grep nfs
openshift-143.lab.sjc.redhat.com:/var/lib/exports/regpv on /var/lib/origin/openshift.local.volumes/pods/7ff08b81-eff0-11e5-9c0a-fa163e88dd03/volumes/kubernetes.io~nfs/regpv-volume type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.10,local_lock=none,addr=10.14.6.143)

3. Reboot the node
4. atomic-openshift-node starts after node reboots
5. On node where docker-registry is hosted: mount|grep nfs
openshift-143.lab.sjc.redhat.com:/var/lib/exports/regpv on /var/lib/origin/openshift.local.volumes/pods/7ff08b81-eff0-11e5-9c0a-fa163e88dd03/volumes/kubernetes.io~nfs/regpv-volume type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.10,local_lock=none,addr=10.14.6.143)
6. oc get pods -n default
The docker-registry pod is healthy and running

Comment 12 Paul Morie 2016-03-23 20:56:26 UTC
Jianwei-

Were you able to reproduce the original issue on a build from before the patch went in?

Comment 15 errata-xmlrpc 2016-03-24 15:54:02 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:0510

