Description of problem:
NFS recycling error: "Pod was active on the node longer than specified deadline", while no pod is using the PV.

Version-Release number of selected component (if applicable):
openshift v3.2.0.5
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

How reproducible:
Always

Steps to Reproduce:
1. Create an NFS PV with the reclaim policy set to Recycle
2. Create a PVC
3. Check the PV and PVC
4. Delete the PVC
5. Check the PV status

Actual results:
In step 5, the PV stays in Released status for a long time (1 hour+), then becomes Failed.

$ oc describe pv nfs --config=/home/lxia/tmp/admin.kubeconfig
Name:            nfs
Labels:          <none>
Status:          Failed
Claim:           lxiap/nfsc
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        5Gi
Message:         Recycling error: Pod was active on the node longer than specified deadline
Source:
    Type:        NFS (an NFS mount that lasts the lifetime of a pod)
    Server:      10.66.79.133
    Path:        /nfsshare
    ReadOnly:    false

Expected results:
The PV should return to Available status.

Additional info:
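For reference, the PV and PVC from these steps can be created along the following lines (a sketch; field values are copied from the describe output above, file and object names are illustrative):

# Step 1: an NFS PV with the reclaim policy set to Recycle
cat > pv.json <<'EOF'
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": { "name": "nfs" },
  "spec": {
    "capacity": { "storage": "5Gi" },
    "accessModes": ["ReadWriteOnce"],
    "persistentVolumeReclaimPolicy": "Recycle",
    "nfs": { "server": "10.66.79.133", "path": "/nfsshare" }
  }
}
EOF
oc create -f pv.json

# Step 2: a PVC that binds to it
cat > pvc.json <<'EOF'
{
  "apiVersion": "v1",
  "kind": "PersistentVolumeClaim",
  "metadata": { "name": "nfsc" },
  "spec": {
    "accessModes": ["ReadWriteOnce"],
    "resources": { "requests": { "storage": "5Gi" } }
  }
}
EOF
oc create -f pvc.json

# Steps 3-5: check both objects, delete the claim, then watch the PV phase
oc get pv,pvc
oc delete pvc nfsc
oc get pv nfs -w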
Liang, can you please post logs from the openshift-master process and from the openshift-node where the pod was scheduled, both at log level 5 (openshift start --loglevel 5 ... &>log)? Also, is the NFS server working? Can you start a pod with the PVC and check that it can really access the NFS volume at step 3? I have no idea what could be wrong here.
I'm not able to reproduce the bug today, but I would like to give it another try tomorrow. I will provide the logs requested in comment 1 if I can reproduce it; otherwise I will close this as not a bug.
I'm not able to reproduce it again today. This can be closed as not a bug now.
Finally, reproduced.

I0325 08:18:19.201993 16249 kubelet.go:2439] SyncLoop (SYNC): 1 pods; pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)
I0325 08:18:19.202033 16249 kubelet.go:3254] Generating status for "pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)"
I0325 08:18:19.202058 16249 kubelet.go:3221] pod waiting > 0, pending
I0325 08:18:19.202130 16249 manager.go:277] Ignoring same status for pod "pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)", status: {Phase:Pending Conditions:[{Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2016-03-25 08:16:46 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [pv-recycler]}] Message: Reason: HostIP:10.66.136.78 PodIP: StartTime:2016-03-25 08:16:46 +0000 UTC ContainerStatuses:[{Name:pv-recycler State:{Waiting:0xc826f08600 Running:<nil> Terminated:<nil>} LastTerminationState:{Waiting:<nil> Running:<nil> Terminated:<nil>} Ready:false RestartCount:0 Image:openshift/origin-recycler:v1.1.4 ImageID: ContainerID:}]}
I0325 08:18:19.202274 16249 nsenter_mount.go:174] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target --noheadings --target /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol]
I0325 08:18:19.217954 16249 nsenter_mount.go:179] Failed findmnt command: exit status 1
I0325 08:18:19.217986 16249 nfs.go:167] NFS mount set up: /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol false <nil>
I0325 08:18:19.218112 16249 nsenter_mount.go:114] nsenter Mounting 10.66.79.133:/nfsshare /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol nfs []
I0325 08:18:19.218126 16249 nsenter_mount.go:117] Mount command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/mount -t nfs 10.66.79.133:/nfsshare /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol]
I0325 08:18:19.239073 16249 nsenter_mount.go:121] Output from mount command: mount.nfs: mount point /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol does not exist
I0325 08:18:19.239108 16249 nsenter_mount.go:174] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target --noheadings --target /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol]
I0325 08:18:19.253980 16249 nsenter_mount.go:179] Failed findmnt command: exit status 1
E0325 08:18:19.254087 16249 kubelet.go:1796] Unable to mount volumes for pod "pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)": exit status 32; skipping pod
E0325 08:18:19.254099 16249 pod_workers.go:138] Error syncing pod ea71be7d-f261-11e5-9fc4-54ee75528544, skipping: exit status 32
I0325 08:18:19.254141 16249 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"openshift-infra", Name:"pv-recycler-nfs-rc9xz", UID:"ea71be7d-f261-11e5-9fc4-54ee75528544", APIVersion:"v1", ResourceVersion:"613", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: exit status 32
I0325 08:18:19.254183 16249 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"openshift-infra", Name:"pv-recycler-nfs-rc9xz", UID:"ea71be7d-f261-11e5-9fc4-54ee75528544", APIVersion:"v1", ResourceVersion:"613", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)": exit status 32
I0325 08:18:19.254421 16249 request.go:541] Request Body: {"count":8,"lastTimestamp":"2016-03-25T08:18:19Z"}
I0325 08:18:19.254469 16249 round_trippers.go:264] PATCH https://10.66.136.78:8443/api/v1/namespaces/openshift-infra/events/pv-recycler-nfs-rc9xz.143f082ff9c43822
I0325 08:18:19.254475 16249 round_trippers.go:271] Request Headers:
I0325 08:18:19.254478 16249 round_trippers.go:274] Accept: application/json, */*
I0325 08:18:19.254481 16249 round_trippers.go:274] Content-Type: application/strategic-merge-patch+json
I0325 08:18:19.254484 16249 round_trippers.go:274] User-Agent: openshift/v1.2.0 (linux/amd64) kubernetes/1b00abe
I0325 08:18:19.257292 16249 round_trippers.go:289] Response Status: 200 OK in 2 milliseconds
I0325 08:18:19.257301 16249 round_trippers.go:292] Response Headers:
I0325 08:18:19.257304 16249 round_trippers.go:295] Content-Length: 808
I0325 08:18:19.257309 16249 round_trippers.go:295] Cache-Control: no-store
I0325 08:18:19.257315 16249 round_trippers.go:295] Content-Type: application/json
I0325 08:18:19.257322 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 08:18:19 GMT
I0325 08:18:19.257356 16249 request.go:870] Response Body: {"kind":"Event","apiVersion":"v1","metadata":{"name":"pv-recycler-nfs-rc9xz.143f082ff9c43822","namespace":"openshift-infra","selfLink":"/api/v1/namespaces/openshift-infra/events/pv-recycler-nfs-rc9xz.143f082ff9c43822","uid":"ea7bb16e-f261-11e5-9fc4-54ee75528544","resourceVersion":"639","creationTimestamp":"2016-03-25T08:16:46Z","deletionTimestamp":"2016-03-25T10:18:19Z"},"involvedObject":{"kind":"Pod","namespace":"openshift-infra","name":"pv-recycler-nfs-rc9xz","uid":"ea71be7d-f261-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"613"},"reason":"FailedSync","message":"Error syncing pod, skipping: exit status 32","source":{"component":"kubelet","host":"dhcp-136-78.nay.redhat.com"},"firstTimestamp":"2016-03-25T08:16:46Z","lastTimestamp":"2016-03-25T08:18:19Z","count":8,"type":"Warning"}
I0325 08:18:19.257740 16249 request.go:541] Request Body: {"count":8,"lastTimestamp":"2016-03-25T08:18:19Z"}
I0325 08:18:19.257773 16249 round_trippers.go:264] PATCH https://10.66.136.78:8443/api/v1/namespaces/openshift-infra/events/pv-recycler-nfs-rc9xz.143f082ff9c382ca
I0325 08:18:19.257778 16249 round_trippers.go:271] Request Headers:
I0325 08:18:19.257782 16249 round_trippers.go:274] User-Agent: openshift/v1.2.0 (linux/amd64) kubernetes/1b00abe
I0325 08:18:19.257786 16249 round_trippers.go:274] Accept: application/json, */*
I0325 08:18:19.257789 16249 round_trippers.go:274] Content-Type: application/strategic-merge-patch+json
I0325 08:18:19.276684 16249 round_trippers.go:289] Response Status: 200 OK in 18 milliseconds
I0325 08:18:19.276699 16249 round_trippers.go:292] Response Headers:
I0325 08:18:19.276703 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 08:18:19 GMT
I0325 08:18:19.276706 16249 round_trippers.go:295] Content-Length: 893
I0325 08:18:19.276710 16249 round_trippers.go:295] Cache-Control: no-store
I0325 08:18:19.276712 16249 round_trippers.go:295] Content-Type: application/json
I0325 08:18:19.276749 16249 request.go:870] Response Body: {"kind":"Event","apiVersion":"v1","metadata":{"name":"pv-recycler-nfs-rc9xz.143f082ff9c382ca","namespace":"openshift-infra","selfLink":"/api/v1/namespaces/openshift-infra/events/pv-recycler-nfs-rc9xz.143f082ff9c382ca","uid":"ea7c02a1-f261-11e5-9fc4-54ee75528544","resourceVersion":"640","creationTimestamp":"2016-03-25T08:16:46Z","deletionTimestamp":"2016-03-25T10:18:19Z"},"involvedObject":{"kind":"Pod","namespace":"openshift-infra","name":"pv-recycler-nfs-rc9xz","uid":"ea71be7d-f261-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"613"},"reason":"FailedMount","message":"Unable to mount volumes for pod \"pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)\": exit status 32","source":{"component":"kubelet","host":"dhcp-136-78.nay.redhat.com"},"firstTimestamp":"2016-03-25T08:16:46Z","lastTimestamp":"2016-03-25T08:18:19Z","count":8,"type":"Warning"}

I0325 09:23:53.061337 16249 persistentvolume_recycler_controller.go:158] Recycler: checking PersistentVolume[nfs]
I0325 09:23:53.061380 16249 round_trippers.go:264] GET https://10.66.136.78:8443/api/v1/persistentvolumes/nfs
I0325 09:23:53.061383 16249 round_trippers.go:271] Request Headers:
I0325 09:23:53.061386 16249 round_trippers.go:274] Accept: application/json, */*
I0325 09:23:53.061389 16249 round_trippers.go:274] User-Agent: system:serviceaccount:openshift-infra:pv-recycler-controller
I0325 09:23:53.061392 16249 round_trippers.go:274] Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoicHYtcmVjeWNsZXItY29udHJvbGxlci10b2tlbi1ieXFyMyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJwdi1yZWN5Y2xlci1jb250cm9sbGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmMwOTY5ZjMtZjI1ZS0xMWU1LTlmYzQtNTRlZTc1NTI4NTQ0Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1pbmZyYTpwdi1yZWN5Y2xlci1jb250cm9sbGVyIn0.UwYQ2sy9Wkcl2pmfMyZM5eGOgRoI3-tK9oJItvDxEFGYT4s3Wy2EeAPKTGhrgsDKKlqJchiBC5wte5J539x36iNyA4kBow__RXO_6m5Gu8tKaM4SwMAcCud8wXjsi25WUq75zNOL0HPH0Tm_-_YKFkUInxvtwntFx14KrLvpJXIDIx6RtVVtEZEHOCIB_EOmx89_BCwGuoHbe1l4C9HrXuFcnADKFHGYxj0-gN_RCz2ahhJEMjQOdwucS2GRwxG5IGXAXU2tMnf3Jl9Lw2ZcHc9H-KiK49KOX7UKM7XdltWgkbSkFs7JndsldgpzMAB1ku4jXVA8MIDFpbYvErQtHw
I0325 09:23:53.062188 16249 round_trippers.go:289] Response Status: 200 OK in 1 milliseconds
I0325 09:23:53.062196 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.062198 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.062201 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.062248 16249 round_trippers.go:289] Response Status: 200 OK in 0 milliseconds
I0325 09:23:53.062252 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.062255 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.062257 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.065250 16249 round_trippers.go:264] GET https://10.66.136.78:8443/api/v1/persistentvolumes?resourceVersion=0
I0325 09:23:53.065258 16249 round_trippers.go:271] Request Headers:
I0325 09:23:53.065263 16249 round_trippers.go:274] Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXItdG9rZW4tdTlueTgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJiYmYwZjBjMy1mMjVlLTExZTUtOWZjNC01NGVlNzU1Mjg1NDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6b3BlbnNoaWZ0LWluZnJhOnB2LWJpbmRlci1jb250cm9sbGVyIn0.dEzCT2sK_9g6NqxlK3ZghBlgSWHJnNHQFLP_BqYxndHD2rAikEagwuEX240OKBLdDS7ASyFQR8Px1gbmFhTWcRH8b7SrHkkXq9aLd9B9fNiNqdpK4eYL462_G_VqjHRpAxYxW82QTt8EBsBlfsrE_aOQLJFa9fwV92GLw9fKTe43cRBG9hAcPBWpUaUhkdgfKM20PPH9bd5d6dGksZwzVjnL030ch4BbMKImDJvWRNa6ho8wXOcbXuUoQheA5wRmEuCWQWrmQwMedotZLxg2t_yFgkY9SUM89pIJ1aYlPnhllbfK4u8dStzRA-Zo4hZawPnOOwj-UPlCt2Z-QLJCjw
I0325 09:23:53.065268 16249 round_trippers.go:274] Accept: application/json, */*
I0325 09:23:53.065273 16249 round_trippers.go:274] User-Agent: system:serviceaccount:openshift-infra:pv-binder-controller
I0325 09:23:53.069782 16249 round_trippers.go:289] Response Status: 200 OK in 8 milliseconds
I0325 09:23:53.069791 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.069794 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.069797 16249 round_trippers.go:295] Content-Length: 673
I0325 09:23:53.069799 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.069801 16249 round_trippers.go:295] Content-Type: application/json
I0325 09:23:53.069822 16249 request.go:870] Response Body: {"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"nfs","selfLink":"/api/v1/persistentvolumes/nfs","uid":"d8561af3-f25e-11e5-9fc4-54ee75528544","resourceVersion":"1206","creationTimestamp":"2016-03-25T07:54:47Z"},"spec":{"capacity":{"storage":"5Gi"},"nfs":{"server":"10.66.79.133","path":"/nfsshare"},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"nfsc","uid":"dc7aa9f0-f25e-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"269"},"persistentVolumeReclaimPolicy":"Recycle"},"status":{"phase":"Failed","message":"Recycling error: Pod was active on the node longer than specified deadline"}}
I0325 09:23:53.069843 16249 round_trippers.go:289] Response Status: 200 OK in 4 milliseconds
I0325 09:23:53.069849 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.069853 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.069856 16249 round_trippers.go:295] Content-Type: application/json
I0325 09:23:53.069859 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.069861 16249 round_trippers.go:295] Content-Length: 766
I0325 09:23:53.069885 16249 request.go:870] Response Body: {"kind":"PersistentVolumeList","apiVersion":"v1","metadata":{"selfLink":"/api/v1/persistentvolumes","resourceVersion":"1238"},"items":[{"metadata":{"name":"nfs","selfLink":"/api/v1/persistentvolumes/nfs","uid":"d8561af3-f25e-11e5-9fc4-54ee75528544","resourceVersion":"1206","creationTimestamp":"2016-03-25T07:54:47Z"},"spec":{"capacity":{"storage":"5Gi"},"nfs":{"server":"10.66.79.133","path":"/nfsshare"},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"nfsc","uid":"dc7aa9f0-f25e-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"269"},"persistentVolumeReclaimPolicy":"Recycle"},"status":{"phase":"Failed","message":"Recycling error: Pod was active on the node longer than specified deadline"}}]}
I0325 09:23:53.070022 16249 persistentvolume_recycler_controller.go:189] PersistentVolume[nfs] phase Failed - skipping: The volume is not in 'Released' phase
I0325 09:23:53.070104 16249 round_trippers.go:264] GET https://10.66.136.78:8443/api/v1/watch/persistentvolumes?resourceVersion=1238&timeoutSeconds=596
I0325 09:23:53.070110 16249 round_trippers.go:271] Request Headers:
I0325 09:23:53.070112 16249 round_trippers.go:274] User-Agent: system:serviceaccount:openshift-infra:pv-binder-controller
I0325 09:23:53.070115 16249 round_trippers.go:274] Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXItdG9rZW4tdTlueTgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJiYmYwZjBjMy1mMjVlLTExZTUtOWZjNC01NGVlNzU1Mjg1NDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6b3BlbnNoaWZ0LWluZnJhOnB2LWJpbmRlci1jb250cm9sbGVyIn0.dEzCT2sK_9g6NqxlK3ZghBlgSWHJnNHQFLP_BqYxndHD2rAikEagwuEX240OKBLdDS7ASyFQR8Px1gbmFhTWcRH8b7SrHkkXq9aLd9B9fNiNqdpK4eYL462_G_VqjHRpAxYxW82QTt8EBsBlfsrE_aOQLJFa9fwV92GLw9fKTe43cRBG9hAcPBWpUaUhkdgfKM20PPH9bd5d6dGksZwzVjnL030ch4BbMKImDJvWRNa6ho8wXOcbXuUoQheA5wRmEuCWQWrmQwMedotZLxg2t_yFgkY9SUM89pIJ1aYlPnhllbfK4u8dStzRA-Zo4hZawPnOOwj-UPlCt2Z-QLJCjw
I0325 09:23:53.070168 16249 persistentvolume_claim_binder_controller.go:194] Synchronizing PersistentVolume[nfs], current phase: Failed
I0325 09:23:53.070223 16249 round_trippers.go:264] GET https://10.66.136.78:8443/api/v1/persistentvolumes/nfs
I0325 09:23:53.070238 16249 round_trippers.go:271] Request Headers:
I0325 09:23:53.070247 16249 round_trippers.go:274] User-Agent: system:serviceaccount:openshift-infra:pv-binder-controller
I0325 09:23:53.070252 16249 round_trippers.go:274] Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXItdG9rZW4tdTlueTgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJiYmYwZjBjMy1mMjVlLTExZTUtOWZjNC01NGVlNzU1Mjg1NDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6b3BlbnNoaWZ0LWluZnJhOnB2LWJpbmRlci1jb250cm9sbGVyIn0.dEzCT2sK_9g6NqxlK3ZghBlgSWHJnNHQFLP_BqYxndHD2rAikEagwuEX240OKBLdDS7ASyFQR8Px1gbmFhTWcRH8b7SrHkkXq9aLd9B9fNiNqdpK4eYL462_G_VqjHRpAxYxW82QTt8EBsBlfsrE_aOQLJFa9fwV92GLw9fKTe43cRBG9hAcPBWpUaUhkdgfKM20PPH9bd5d6dGksZwzVjnL030ch4BbMKImDJvWRNa6ho8wXOcbXuUoQheA5wRmEuCWQWrmQwMedotZLxg2t_yFgkY9SUM89pIJ1aYlPnhllbfK4u8dStzRA-Zo4hZawPnOOwj-UPlCt2Z-QLJCjw
I0325 09:23:53.070258 16249 round_trippers.go:274] Accept: application/json, */*
I0325 09:23:53.071194 16249 round_trippers.go:289] Response Status: 200 OK in 1 milliseconds
I0325 09:23:53.071202 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.071207 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.071213 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.074072 16249 round_trippers.go:289] Response Status: 200 OK in 3 milliseconds
I0325 09:23:53.074082 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.074087 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.074091 16249 round_trippers.go:295] Content-Type: application/json
I0325 09:23:53.074095 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.074100 16249 round_trippers.go:295] Content-Length: 673
I0325 09:23:53.074127 16249 request.go:870] Response Body: {"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"nfs","selfLink":"/api/v1/persistentvolumes/nfs","uid":"d8561af3-f25e-11e5-9fc4-54ee75528544","resourceVersion":"1206","creationTimestamp":"2016-03-25T07:54:47Z"},"spec":{"capacity":{"storage":"5Gi"},"nfs":{"server":"10.66.79.133","path":"/nfsshare"},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"nfsc","uid":"dc7aa9f0-f25e-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"269"},"persistentVolumeReclaimPolicy":"Recycle"},"status":{"phase":"Failed","message":"Recycling error: Pod was active on the node longer than specified deadline"}}
I0325 09:23:53.074381 16249 persistentvolume_claim_binder_controller.go:334] PersistentVolume[nfs] previously failed recycling. Skipping.
> Pod was active on the node longer than specified deadline

Looking at the code now, it looks like the recycler pod was running rm -rf on the volume for longer than expected. The recycler calculates the timeout from the controller-manager parameters --pv-recycler-increment-timeout-nfs and --pv-recycler-minimum-timeout-nfs: we wait <pv-recycler-increment-timeout-nfs> * <volume size in GiB> seconds, which by default is 30 * <volume size in GiB> seconds, with a minimum of 300 seconds. The recycler is doing the right thing by giving up after this timeout - we can't wait forever - the message is just not clear enough.

How many files are on the volume? What is its size? How long does it take if you mount the volume and run a simple rm -rf /mnt/volume/*?

What I can do is rephrase the message, and that would be it.
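(For the 5Gi volume in this report, those defaults work out to max(300, 30 * 5) = max(300, 150) = 300 seconds, i.e. the recycler pod had a five-minute deadline to finish scrubbing the volume.)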
We don't have many files, and the files are really small.

[root@NAY-NFS ~]# ls -lhR /nfsshare/
/nfsshare/:
total 0
-rw-r--r--. 1 1001100000 nfsnobody 0 Mar 25 14:15 file
-rw-r--r--. 1 1000050000 nfsnobody 0 Mar 30 11:21 file-rox
[root@NAY-NFS ~]# ls -lhd /nfsshare/
drwxrwx---. 2 nfsnobody nfsnobody 32 Mar 30 11:21 /nfsshare/
Liang, could you provide me access to a machine where you can reproduce this? There must be something wrong in the NFS setup there - it can't take 500 seconds to delete two files. Also, it might be related to bug 1313210 (containerized OpenShift; it's not Cinder-specific).
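To measure that manually, something along these lines run directly on an NFS client should show how long the cleanup actually takes (a minimal sketch using the export from this report; /mnt/volume is an arbitrary mount point):

# Mount the same export the recycler scrubs, time the same cleanup, unmount
mkdir -p /mnt/volume
mount -t nfs 10.66.79.133:/nfsshare /mnt/volume
time rm -rf /mnt/volume/*
umount /mnt/volume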
A simpler way to reproduce:

1. docker pull openshift/origin
2. docker run -d --name "origin" --privileged --pid=host --net=host -v /:/rootfs:ro -v /var/run:/var/run:rw -v /sys:/sys -v /var/lib/docker:/var/lib/docker:rw -v /var/lib/origin/openshift.local.volumes:/var/lib/origin/openshift.local.volumes openshift/origin start --loglevel=5
3. docker exec -it origin bash
4. Create a PV with a valid NFS server IP and an invalid path (e.g. https://github.com/openshift-qe/v3-testfiles/blob/master/persistent-volumes/nfs/pv-invalid-path.json, replacing #NFS-Service-IP with a valid IP; a reconstructed example follows this comment).
5. Create a PVC (e.g. https://github.com/openshift-qe/v3-testfiles/blob/master/persistent-volumes/nfs/claim-rwx.json).
6. Check that the PV and PVC are bound.
7. Delete the PVC.
8. Wait, then check the PV:

# oc get pv ; oc describe pv nfs
NAME      CAPACITY   ACCESSMODES   STATUS    CLAIM          REASON    AGE
nfs       1Gi        RWX           Failed    default/nfsc             1h
Name:            nfs
Labels:          <none>
Status:          Failed
Claim:           default/nfsc
Reclaim Policy:  Recycle
Access Modes:    RWX
Capacity:        1Gi
Message:         Recycling error: Pod was active on the node longer than specified deadline
Source:
    Type:        NFS (an NFS mount that lasts the lifetime of a pod)
    Server:      10.66.79.133
    Path:        /non-exist-path
    ReadOnly:    false

I will provide an environment if the above steps do not reproduce the issue.
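For reference, the linked pv-invalid-path.json reduces to something like this (a sketch reconstructed from the describe output above, with #NFS-Service-IP already replaced by the valid server):

# A PV whose NFS path does not exist on the server - the recycler pod can
# never mount it, so it retries until its deadline expires and the PV fails
cat > pv-invalid-path.json <<'EOF'
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": { "name": "nfs" },
  "spec": {
    "capacity": { "storage": "1Gi" },
    "accessModes": ["ReadWriteMany"],
    "persistentVolumeReclaimPolicy": "Recycle",
    "nfs": { "server": "10.66.79.133", "path": "/non-exist-path" }
  }
}
EOF
oc create -f pv-invalid-path.json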
1) This fails with:

persistentvolume_recycler_controller.go:245] PersistentVolume[nfs] failed recycling: Unexpected error creating recycler pod: pods "pv-recycler-nfs-" is forbidden: unable to validate against any security context constraint: [securityContext.runAsUser: Invalid value: 0: UID on container pv-recycler does not match required range. Found 0, required min: 1000020000 max: 1000029999]

2) That's not what you reported in comment 6 - there you say that the recycler can't remove 2 files in 5 minutes, not that it times out when mounting an unavailable export!
The failure Jan pointed out is due to bug 1324418, which is fixed by https://github.com/openshift/origin/pull/8432 -- I suggest recreating after that PR is merged.
With PR 8432 and PersistentVolume.Spec pointing to an invalid NFS share, we schedule a recycler pod that gets assigned to a kubelet. The kubelet tries to mount the NFS share every now and then, and after 5 minutes it gives up with "Pod was active on the node longer than specified deadline". IMO this is correct behavior - we can't retry forever. On the other hand, the message in `oc describe pv` should be clearer. Paul, can we get a more descriptive error from the terminated pod? Something like "Could not mount XXX for YYY seconds"?
Anyway, this is not a blocker; everything works as expected, it's just that the failure message shown to users is not clear.
Started working on this. The easiest and, IMO, best approach is to pass all Warning events from a recycler pod to the PV that is being recycled. When the pod dies, the events are still available on the PV. `kubectl describe pv` would show:

Name:            nfs
Labels:          <none>
Status:          Failed
Claim:           default/nfs
Reclaim Policy:  Recycle
Access Modes:    RWX
Capacity:        1Mi
Message:         Recycling error: Pod was active on the node longer than specified deadline
Source:
    Type:        NFS (an NFS mount that lasts the lifetime of a pod)
    Server:      10.999.999.999
    Path:        /
    ReadOnly:    false
Events:
  FirstSeen  LastSeen  Count  From                  SubobjectPath  Type     Reason            Message
  ---------  --------  -----  ----                  -------------  ----     ------            -------
  1m         1m        3      {default-scheduler }                 Warning  FailedScheduling  Recycling: no nodes available to schedule pods
  1m         1m        1      {default-scheduler }                 Normal   Scheduled         Recycling: Successfully assigned recycler-for-nfs to 127.0.0.1
  1m         12s       5      {kubelet 127.0.0.1}                  Warning  FailedMount       Recycling: Unable to mount volumes for pod "recycler-for-nfs_default(e061c5cb-32fa-11e6-80e2-3c970e965218)": Mount failed: exit status 32
Mounting arguments: 10.999.999.999:/ /var/lib/kubelet/pods/e061c5cb-32fa-11e6-80e2-3c970e965218/volumes/kubernetes.io~nfs/vol nfs []
Output: mount.nfs: Failed to resolve server 10.999.999.999: No address associated with hostname
You can still see the rather cryptic "Pod was active on the node longer than specified deadline" in the Message field, but the events should help diagnose the problem further. Also, these events could be sent to the PV as they appear on the pod, even before the PV goes "Failed".
Filed https://github.com/kubernetes/kubernetes/pull/27714
It has been merged in https://github.com/openshift/origin/pull/11258
Oops, sorry, this is the right one: https://github.com/openshift/origin/pull/11259
Verified the issue has been fixed.

# openshift version
openshift v3.4.0.28+dfe3a66
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

# oc describe pv nfs-sh3oy
Name:            nfs-sh3oy
Labels:          <none>
StorageClass:
Status:          Released
Claim:           sh3oy/nfsc-sh3oy
Reclaim Policy:  Recycle
Access Modes:    RWX
Capacity:        5Gi
Message:
Source:
    Type:        NFS (an NFS mount that lasts the lifetime of a pod)
    Server:      172.30.3.151
    Path:        /non-exist-path
    ReadOnly:    false
Events:
  FirstSeen  LastSeen  Count  From                            SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                            -------------  ----     ------       -------
  2m         2m        1      {persistentvolume-controller }                 Normal   RecyclerPod  Recycler pod: Successfully assigned recycler-for-nfs-sh3oy to qe-lxia-ocp34-node-registry-router-1
  17s        17s       1      {persistentvolume-controller }                 Warning  RecyclerPod  Recycler pod: Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs-sh3oy"/"openshift-infra". list of unattached/unmounted volumes=[vol]
  16s        16s       1      {persistentvolume-controller }                 Warning  RecyclerPod  Recycler pod: Unable to mount volumes for pod "recycler-for-nfs-sh3oy_openshift-infra(8f30b0d4-afce-11e6-9347-42010af00019)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs-sh3oy"/"openshift-infra". list of unattached/unmounted volumes=[vol]
  16s        16s       1      {persistentvolume-controller }                 Warning  RecyclerPod  Recycler pod: MountVolume.SetUp failed for volume "kubernetes.io/nfs/8f30b0d4-afce-11e6-9347-42010af00019-vol" (spec.Name: "vol") pod "8f30b0d4-afce-11e6-9347-42010af00019" (UID: "8f30b0d4-afce-11e6-9347-42010af00019") with: mount failed: exit status 32
Mounting arguments: 172.30.3.151:/non-exist-path /var/lib/origin/openshift.local.volumes/pods/8f30b0d4-afce-11e6-9347-42010af00019/volumes/kubernetes.io~nfs/vol nfs []
Output: mount.nfs: Connection timed out
This has been merged into OCP and is in OCP v3.4.0.29 or newer.
Moving to VERIFIED based on comment 20.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066