Description of problem:
NFS recycling error: "Pod was active on the node longer than specified deadline", while no pod is using the PV.

Version-Release number of selected component (if applicable):
openshift v3.2.0.5
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

How reproducible:
Always

Steps to Reproduce:
1. Create an NFS PV with the reclaim policy set to Recycle
2. Create a PVC
3. Check the PV and PVC
4. Delete the PVC
5. Check the PV status

Actual results:
In step 5, the PV stays in Released status for a long time (1 hour+), then becomes Failed.

$ oc describe pv nfs --config=/home/lxia/tmp/admin.kubeconfig
Name:            nfs
Labels:          <none>
Status:          Failed
Claim:           lxiap/nfsc
Reclaim Policy:  Recycle
Access Modes:    RWO
Capacity:        5Gi
Message:         Recycling error: Pod was active on the node longer than specified deadline
Source:
    Type:        NFS (an NFS mount that lasts the lifetime of a pod)
    Server:      10.66.79.133
    Path:        /nfsshare
    ReadOnly:    false

Expected results:
The PV should return to Available status.

Additional info:
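For reference, the PV and PVC from these steps can be created along the following lines (a sketch; field values are copied from the describe output above, file and object names are illustrative):

# Step 1: an NFS PV with the reclaim policy set to Recycle
cat > pv.json <<'EOF'
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": { "name": "nfs" },
  "spec": {
    "capacity": { "storage": "5Gi" },
    "accessModes": ["ReadWriteOnce"],
    "persistentVolumeReclaimPolicy": "Recycle",
    "nfs": { "server": "10.66.79.133", "path": "/nfsshare" }
  }
}
EOF
oc create -f pv.json

# Step 2: a PVC that binds to it
cat > pvc.json <<'EOF'
{
  "apiVersion": "v1",
  "kind": "PersistentVolumeClaim",
  "metadata": { "name": "nfsc" },
  "spec": {
    "accessModes": ["ReadWriteOnce"],
    "resources": { "requests": { "storage": "5Gi" } }
  }
}
EOF
oc create -f pvc.json

# Steps 3-5: check both objects, delete the claim, then watch the PV phase
oc get pv,pvc
oc delete pvc nfsc
oc get pv nfs -w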
Liang, can you please post logs from the openshift-master process and from the openshift-node where the pod was scheduled, both at log level 5 (openshift start --loglevel 5 ... &>log)? Also, is the NFS server working? Can you start a pod with the PVC and check that it can really access the NFS volume at step 3? I have no idea what could be wrong here.
I'm not able to reproduce the bug today, but I would like to give it another try tomorrow. I will provide the logs requested in comment 1 if I can reproduce it; otherwise I will close this as not a bug.
I'm not able to reproduce it again today. This can be closed as not a bug now.
Finally, reproduced.

I0325 08:18:19.201993 16249 kubelet.go:2439] SyncLoop (SYNC): 1 pods; pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)
I0325 08:18:19.202033 16249 kubelet.go:3254] Generating status for "pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)"
I0325 08:18:19.202058 16249 kubelet.go:3221] pod waiting > 0, pending
I0325 08:18:19.202130 16249 manager.go:277] Ignoring same status for pod "pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)", status: {Phase:Pending Conditions:[{Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2016-03-25 08:16:46 +0000 UTC Reason:ContainersNotReady Message:containers with unready status: [pv-recycler]}] Message: Reason: HostIP:10.66.136.78 PodIP: StartTime:2016-03-25 08:16:46 +0000 UTC ContainerStatuses:[{Name:pv-recycler State:{Waiting:0xc826f08600 Running:<nil> Terminated:<nil>} LastTerminationState:{Waiting:<nil> Running:<nil> Terminated:<nil>} Ready:false RestartCount:0 Image:openshift/origin-recycler:v1.1.4 ImageID: ContainerID:}]}
I0325 08:18:19.202274 16249 nsenter_mount.go:174] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target --noheadings --target /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol]
I0325 08:18:19.217954 16249 nsenter_mount.go:179] Failed findmnt command: exit status 1
I0325 08:18:19.217986 16249 nfs.go:167] NFS mount set up: /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol false <nil>
I0325 08:18:19.218112 16249 nsenter_mount.go:114] nsenter Mounting 10.66.79.133:/nfsshare /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol nfs []
I0325 08:18:19.218126 16249 nsenter_mount.go:117] Mount command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/mount -t nfs 10.66.79.133:/nfsshare /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol]
I0325 08:18:19.239073 16249 nsenter_mount.go:121] Output from mount command: mount.nfs: mount point /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol does not exist
I0325 08:18:19.239108 16249 nsenter_mount.go:174] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target --noheadings --target /var/lib/origin/openshift.local.volumes/pods/ea71be7d-f261-11e5-9fc4-54ee75528544/volumes/kubernetes.io~nfs/vol]
I0325 08:18:19.253980 16249 nsenter_mount.go:179] Failed findmnt command: exit status 1
E0325 08:18:19.254087 16249 kubelet.go:1796] Unable to mount volumes for pod "pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)": exit status 32; skipping pod
E0325 08:18:19.254099 16249 pod_workers.go:138] Error syncing pod ea71be7d-f261-11e5-9fc4-54ee75528544, skipping: exit status 32
I0325 08:18:19.254141 16249 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"openshift-infra", Name:"pv-recycler-nfs-rc9xz", UID:"ea71be7d-f261-11e5-9fc4-54ee75528544", APIVersion:"v1", ResourceVersion:"613", FieldPath:""}): type: 'Warning' reason: 'FailedSync' Error syncing pod, skipping: exit status 32
I0325 08:18:19.254183 16249 server.go:606] Event(api.ObjectReference{Kind:"Pod", Namespace:"openshift-infra", Name:"pv-recycler-nfs-rc9xz", UID:"ea71be7d-f261-11e5-9fc4-54ee75528544", APIVersion:"v1", ResourceVersion:"613", FieldPath:""}): type: 'Warning' reason: 'FailedMount' Unable to mount volumes for pod "pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)": exit status 32
I0325 08:18:19.254421 16249 request.go:541] Request Body: {"count":8,"lastTimestamp":"2016-03-25T08:18:19Z"}
I0325 08:18:19.254469 16249 round_trippers.go:264] PATCH https://10.66.136.78:8443/api/v1/namespaces/openshift-infra/events/pv-recycler-nfs-rc9xz.143f082ff9c43822
I0325 08:18:19.254475 16249 round_trippers.go:271] Request Headers:
I0325 08:18:19.254478 16249 round_trippers.go:274] Accept: application/json, */*
I0325 08:18:19.254481 16249 round_trippers.go:274] Content-Type: application/strategic-merge-patch+json
I0325 08:18:19.254484 16249 round_trippers.go:274] User-Agent: openshift/v1.2.0 (linux/amd64) kubernetes/1b00abe
I0325 08:18:19.257292 16249 round_trippers.go:289] Response Status: 200 OK in 2 milliseconds
I0325 08:18:19.257301 16249 round_trippers.go:292] Response Headers:
I0325 08:18:19.257304 16249 round_trippers.go:295] Content-Length: 808
I0325 08:18:19.257309 16249 round_trippers.go:295] Cache-Control: no-store
I0325 08:18:19.257315 16249 round_trippers.go:295] Content-Type: application/json
I0325 08:18:19.257322 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 08:18:19 GMT
I0325 08:18:19.257356 16249 request.go:870] Response Body: {"kind":"Event","apiVersion":"v1","metadata":{"name":"pv-recycler-nfs-rc9xz.143f082ff9c43822","namespace":"openshift-infra","selfLink":"/api/v1/namespaces/openshift-infra/events/pv-recycler-nfs-rc9xz.143f082ff9c43822","uid":"ea7bb16e-f261-11e5-9fc4-54ee75528544","resourceVersion":"639","creationTimestamp":"2016-03-25T08:16:46Z","deletionTimestamp":"2016-03-25T10:18:19Z"},"involvedObject":{"kind":"Pod","namespace":"openshift-infra","name":"pv-recycler-nfs-rc9xz","uid":"ea71be7d-f261-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"613"},"reason":"FailedSync","message":"Error syncing pod, skipping: exit status 32","source":{"component":"kubelet","host":"dhcp-136-78.nay.redhat.com"},"firstTimestamp":"2016-03-25T08:16:46Z","lastTimestamp":"2016-03-25T08:18:19Z","count":8,"type":"Warning"}
I0325 08:18:19.257740 16249 request.go:541] Request Body: {"count":8,"lastTimestamp":"2016-03-25T08:18:19Z"}
I0325 08:18:19.257773 16249 round_trippers.go:264] PATCH https://10.66.136.78:8443/api/v1/namespaces/openshift-infra/events/pv-recycler-nfs-rc9xz.143f082ff9c382ca
I0325 08:18:19.257778 16249 round_trippers.go:271] Request Headers:
I0325 08:18:19.257782 16249 round_trippers.go:274] User-Agent: openshift/v1.2.0 (linux/amd64) kubernetes/1b00abe
I0325 08:18:19.257786 16249 round_trippers.go:274] Accept: application/json, */*
I0325 08:18:19.257789 16249 round_trippers.go:274] Content-Type: application/strategic-merge-patch+json
I0325 08:18:19.276684 16249 round_trippers.go:289] Response Status: 200 OK in 18 milliseconds
I0325 08:18:19.276699 16249 round_trippers.go:292] Response Headers:
I0325 08:18:19.276703 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 08:18:19 GMT
I0325 08:18:19.276706 16249 round_trippers.go:295] Content-Length: 893
I0325 08:18:19.276710 16249 round_trippers.go:295] Cache-Control: no-store
I0325 08:18:19.276712 16249 round_trippers.go:295] Content-Type: application/json
I0325 08:18:19.276749 16249 request.go:870] Response Body: {"kind":"Event","apiVersion":"v1","metadata":{"name":"pv-recycler-nfs-rc9xz.143f082ff9c382ca","namespace":"openshift-infra","selfLink":"/api/v1/namespaces/openshift-infra/events/pv-recycler-nfs-rc9xz.143f082ff9c382ca","uid":"ea7c02a1-f261-11e5-9fc4-54ee75528544","resourceVersion":"640","creationTimestamp":"2016-03-25T08:16:46Z","deletionTimestamp":"2016-03-25T10:18:19Z"},"involvedObject":{"kind":"Pod","namespace":"openshift-infra","name":"pv-recycler-nfs-rc9xz","uid":"ea71be7d-f261-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"613"},"reason":"FailedMount","message":"Unable to mount volumes for pod \"pv-recycler-nfs-rc9xz_openshift-infra(ea71be7d-f261-11e5-9fc4-54ee75528544)\": exit status 32","source":{"component":"kubelet","host":"dhcp-136-78.nay.redhat.com"},"firstTimestamp":"2016-03-25T08:16:46Z","lastTimestamp":"2016-03-25T08:18:19Z","count":8,"type":"Warning"}

I0325 09:23:53.061337 16249 persistentvolume_recycler_controller.go:158] Recycler: checking PersistentVolume[nfs]
I0325 09:23:53.061380 16249 round_trippers.go:264] GET https://10.66.136.78:8443/api/v1/persistentvolumes/nfs
I0325 09:23:53.061383 16249 round_trippers.go:271] Request Headers:
I0325 09:23:53.061386 16249 round_trippers.go:274] Accept: application/json, */*
I0325 09:23:53.061389 16249 round_trippers.go:274] User-Agent: system:serviceaccount:openshift-infra:pv-recycler-controller
I0325 09:23:53.061392 16249 round_trippers.go:274] Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoicHYtcmVjeWNsZXItY29udHJvbGxlci10b2tlbi1ieXFyMyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJwdi1yZWN5Y2xlci1jb250cm9sbGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmMwOTY5ZjMtZjI1ZS0xMWU1LTlmYzQtNTRlZTc1NTI4NTQ0Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1pbmZyYTpwdi1yZWN5Y2xlci1jb250cm9sbGVyIn0.UwYQ2sy9Wkcl2pmfMyZM5eGOgRoI3-tK9oJItvDxEFGYT4s3Wy2EeAPKTGhrgsDKKlqJchiBC5wte5J539x36iNyA4kBow__RXO_6m5Gu8tKaM4SwMAcCud8wXjsi25WUq75zNOL0HPH0Tm_-_YKFkUInxvtwntFx14KrLvpJXIDIx6RtVVtEZEHOCIB_EOmx89_BCwGuoHbe1l4C9HrXuFcnADKFHGYxj0-gN_RCz2ahhJEMjQOdwucS2GRwxG5IGXAXU2tMnf3Jl9Lw2ZcHc9H-KiK49KOX7UKM7XdltWgkbSkFs7JndsldgpzMAB1ku4jXVA8MIDFpbYvErQtHw
I0325 09:23:53.062188 16249 round_trippers.go:289] Response Status: 200 OK in 1 milliseconds
I0325 09:23:53.062196 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.062198 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.062201 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.062248 16249 round_trippers.go:289] Response Status: 200 OK in 0 milliseconds
I0325 09:23:53.062252 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.062255 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.062257 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.065250 16249 round_trippers.go:264] GET https://10.66.136.78:8443/api/v1/persistentvolumes?resourceVersion=0
I0325 09:23:53.065258 16249 round_trippers.go:271] Request Headers:
I0325 09:23:53.065263 16249 round_trippers.go:274] Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXItdG9rZW4tdTlueTgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJiYmYwZjBjMy1mMjVlLTExZTUtOWZjNC01NGVlNzU1Mjg1NDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6b3BlbnNoaWZ0LWluZnJhOnB2LWJpbmRlci1jb250cm9sbGVyIn0.dEzCT2sK_9g6NqxlK3ZghBlgSWHJnNHQFLP_BqYxndHD2rAikEagwuEX240OKBLdDS7ASyFQR8Px1gbmFhTWcRH8b7SrHkkXq9aLd9B9fNiNqdpK4eYL462_G_VqjHRpAxYxW82QTt8EBsBlfsrE_aOQLJFa9fwV92GLw9fKTe43cRBG9hAcPBWpUaUhkdgfKM20PPH9bd5d6dGksZwzVjnL030ch4BbMKImDJvWRNa6ho8wXOcbXuUoQheA5wRmEuCWQWrmQwMedotZLxg2t_yFgkY9SUM89pIJ1aYlPnhllbfK4u8dStzRA-Zo4hZawPnOOwj-UPlCt2Z-QLJCjw
I0325 09:23:53.065268 16249 round_trippers.go:274] Accept: application/json, */*
I0325 09:23:53.065273 16249 round_trippers.go:274] User-Agent: system:serviceaccount:openshift-infra:pv-binder-controller
I0325 09:23:53.069782 16249 round_trippers.go:289] Response Status: 200 OK in 8 milliseconds
I0325 09:23:53.069791 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.069794 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.069797 16249 round_trippers.go:295] Content-Length: 673
I0325 09:23:53.069799 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.069801 16249 round_trippers.go:295] Content-Type: application/json
I0325 09:23:53.069822 16249 request.go:870] Response Body: {"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"nfs","selfLink":"/api/v1/persistentvolumes/nfs","uid":"d8561af3-f25e-11e5-9fc4-54ee75528544","resourceVersion":"1206","creationTimestamp":"2016-03-25T07:54:47Z"},"spec":{"capacity":{"storage":"5Gi"},"nfs":{"server":"10.66.79.133","path":"/nfsshare"},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"nfsc","uid":"dc7aa9f0-f25e-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"269"},"persistentVolumeReclaimPolicy":"Recycle"},"status":{"phase":"Failed","message":"Recycling error: Pod was active on the node longer than specified deadline"}}
I0325 09:23:53.069843 16249 round_trippers.go:289] Response Status: 200 OK in 4 milliseconds
I0325 09:23:53.069849 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.069853 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.069856 16249 round_trippers.go:295] Content-Type: application/json
I0325 09:23:53.069859 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.069861 16249 round_trippers.go:295] Content-Length: 766
I0325 09:23:53.069885 16249 request.go:870] Response Body: {"kind":"PersistentVolumeList","apiVersion":"v1","metadata":{"selfLink":"/api/v1/persistentvolumes","resourceVersion":"1238"},"items":[{"metadata":{"name":"nfs","selfLink":"/api/v1/persistentvolumes/nfs","uid":"d8561af3-f25e-11e5-9fc4-54ee75528544","resourceVersion":"1206","creationTimestamp":"2016-03-25T07:54:47Z"},"spec":{"capacity":{"storage":"5Gi"},"nfs":{"server":"10.66.79.133","path":"/nfsshare"},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"nfsc","uid":"dc7aa9f0-f25e-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"269"},"persistentVolumeReclaimPolicy":"Recycle"},"status":{"phase":"Failed","message":"Recycling error: Pod was active on the node longer than specified deadline"}}]}
I0325 09:23:53.070022 16249 persistentvolume_recycler_controller.go:189] PersistentVolume[nfs] phase Failed - skipping: The volume is not in 'Released' phase
I0325 09:23:53.070104 16249 round_trippers.go:264] GET https://10.66.136.78:8443/api/v1/watch/persistentvolumes?resourceVersion=1238&timeoutSeconds=596
I0325 09:23:53.070110 16249 round_trippers.go:271] Request Headers:
I0325 09:23:53.070112 16249 round_trippers.go:274] User-Agent: system:serviceaccount:openshift-infra:pv-binder-controller
I0325 09:23:53.070115 16249 round_trippers.go:274] Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXItdG9rZW4tdTlueTgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJiYmYwZjBjMy1mMjVlLTExZTUtOWZjNC01NGVlNzU1Mjg1NDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6b3BlbnNoaWZ0LWluZnJhOnB2LWJpbmRlci1jb250cm9sbGVyIn0.dEzCT2sK_9g6NqxlK3ZghBlgSWHJnNHQFLP_BqYxndHD2rAikEagwuEX240OKBLdDS7ASyFQR8Px1gbmFhTWcRH8b7SrHkkXq9aLd9B9fNiNqdpK4eYL462_G_VqjHRpAxYxW82QTt8EBsBlfsrE_aOQLJFa9fwV92GLw9fKTe43cRBG9hAcPBWpUaUhkdgfKM20PPH9bd5d6dGksZwzVjnL030ch4BbMKImDJvWRNa6ho8wXOcbXuUoQheA5wRmEuCWQWrmQwMedotZLxg2t_yFgkY9SUM89pIJ1aYlPnhllbfK4u8dStzRA-Zo4hZawPnOOwj-UPlCt2Z-QLJCjw
I0325 09:23:53.070168 16249 persistentvolume_claim_binder_controller.go:194] Synchronizing PersistentVolume[nfs], current phase: Failed
I0325 09:23:53.070223 16249 round_trippers.go:264] GET https://10.66.136.78:8443/api/v1/persistentvolumes/nfs
I0325 09:23:53.070238 16249 round_trippers.go:271] Request Headers:
I0325 09:23:53.070247 16249 round_trippers.go:274] User-Agent: system:serviceaccount:openshift-infra:pv-binder-controller
I0325 09:23:53.070252 16249 round_trippers.go:274] Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtaW5mcmEiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXItdG9rZW4tdTlueTgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHYtYmluZGVyLWNvbnRyb2xsZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJiYmYwZjBjMy1mMjVlLTExZTUtOWZjNC01NGVlNzU1Mjg1NDQiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6b3BlbnNoaWZ0LWluZnJhOnB2LWJpbmRlci1jb250cm9sbGVyIn0.dEzCT2sK_9g6NqxlK3ZghBlgSWHJnNHQFLP_BqYxndHD2rAikEagwuEX240OKBLdDS7ASyFQR8Px1gbmFhTWcRH8b7SrHkkXq9aLd9B9fNiNqdpK4eYL462_G_VqjHRpAxYxW82QTt8EBsBlfsrE_aOQLJFa9fwV92GLw9fKTe43cRBG9hAcPBWpUaUhkdgfKM20PPH9bd5d6dGksZwzVjnL030ch4BbMKImDJvWRNa6ho8wXOcbXuUoQheA5wRmEuCWQWrmQwMedotZLxg2t_yFgkY9SUM89pIJ1aYlPnhllbfK4u8dStzRA-Zo4hZawPnOOwj-UPlCt2Z-QLJCjw
I0325 09:23:53.070258 16249 round_trippers.go:274] Accept: application/json, */*
I0325 09:23:53.071194 16249 round_trippers.go:289] Response Status: 200 OK in 1 milliseconds
I0325 09:23:53.071202 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.071207 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.071213 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.074072 16249 round_trippers.go:289] Response Status: 200 OK in 3 milliseconds
I0325 09:23:53.074082 16249 round_trippers.go:292] Response Headers:
I0325 09:23:53.074087 16249 round_trippers.go:295] Cache-Control: no-store
I0325 09:23:53.074091 16249 round_trippers.go:295] Content-Type: application/json
I0325 09:23:53.074095 16249 round_trippers.go:295] Date: Fri, 25 Mar 2016 09:23:53 GMT
I0325 09:23:53.074100 16249 round_trippers.go:295] Content-Length: 673
I0325 09:23:53.074127 16249 request.go:870] Response Body: {"kind":"PersistentVolume","apiVersion":"v1","metadata":{"name":"nfs","selfLink":"/api/v1/persistentvolumes/nfs","uid":"d8561af3-f25e-11e5-9fc4-54ee75528544","resourceVersion":"1206","creationTimestamp":"2016-03-25T07:54:47Z"},"spec":{"capacity":{"storage":"5Gi"},"nfs":{"server":"10.66.79.133","path":"/nfsshare"},"accessModes":["ReadWriteOnce"],"claimRef":{"kind":"PersistentVolumeClaim","namespace":"default","name":"nfsc","uid":"dc7aa9f0-f25e-11e5-9fc4-54ee75528544","apiVersion":"v1","resourceVersion":"269"},"persistentVolumeReclaimPolicy":"Recycle"},"status":{"phase":"Failed","message":"Recycling error: Pod was active on the node longer than specified deadline"}}
I0325 09:23:53.074381 16249 persistentvolume_claim_binder_controller.go:334] PersistentVolume[nfs] previously failed recycling. Skipping.
> Pod was active on the node longer than specified deadline

Looking at the code now, it looks like the recycler pod was running rm -rf on the volume for longer than expected. The recycler calculates the timeout from the controller-manager parameters --pv-recycler-increment-timeout-nfs and --pv-recycler-minimum-timeout-nfs: we wait <pv-recycler-increment-timeout-nfs> * <volume size in GiB> seconds, which by default is 30 * <volume size in GiB> seconds, with a minimum of 300 seconds. The recycler is doing the right thing by giving up after this timeout - we can't wait forever - the message is just not clear enough.

How many files are on the volume? What is its size? How long does it take if you mount the volume and run a simple rm -rf /mnt/volume/*?

What I can do is rephrase the message, and that would be it.
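(For the 5Gi volume in this report, those defaults work out to max(300, 30 * 5) = max(300, 150) = 300 seconds, i.e. the recycler pod had a five-minute deadline to finish scrubbing the volume.)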
We don't have many files, and the files are really small.

[root@NAY-NFS ~]# ls -lhR /nfsshare/
/nfsshare/:
total 0
-rw-r--r--. 1 1001100000 nfsnobody 0 Mar 25 14:15 file
-rw-r--r--. 1 1000050000 nfsnobody 0 Mar 30 11:21 file-rox
[root@NAY-NFS ~]# ls -lhd /nfsshare/
drwxrwx---. 2 nfsnobody nfsnobody 32 Mar 30 11:21 /nfsshare/
Liang, could you provide me access to a machine where you can reproduce this? There must be something wrong in the NFS setup there - it can't take 500 seconds to delete two files. Also, it might be related to bug 1313210 (containerized OpenShift; it's not Cinder-specific).
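To measure that manually, something along these lines run directly on an NFS client should show how long the cleanup actually takes (a minimal sketch using the export from this report; /mnt/volume is an arbitrary mount point):

# Mount the same export the recycler scrubs, time the same cleanup, unmount
mkdir -p /mnt/volume
mount -t nfs 10.66.79.133:/nfsshare /mnt/volume
time rm -rf /mnt/volume/*
umount /mnt/volume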
A simpler way to reproduce:

1. docker pull openshift/origin
2. docker run -d --name "origin" --privileged --pid=host --net=host -v /:/rootfs:ro -v /var/run:/var/run:rw -v /sys:/sys -v /var/lib/docker:/var/lib/docker:rw -v /var/lib/origin/openshift.local.volumes:/var/lib/origin/openshift.local.volumes openshift/origin start --loglevel=5
3. docker exec -it origin bash
4. Create a PV with a valid NFS server IP and an invalid path (e.g. https://github.com/openshift-qe/v3-testfiles/blob/master/persistent-volumes/nfs/pv-invalid-path.json, replacing #NFS-Service-IP with a valid IP; a reconstructed example follows this comment).
5. Create a PVC (e.g. https://github.com/openshift-qe/v3-testfiles/blob/master/persistent-volumes/nfs/claim-rwx.json).
6. Check that the PV and PVC are bound.
7. Delete the PVC.
8. Wait, then check the PV:

# oc get pv ; oc describe pv nfs
NAME      CAPACITY   ACCESSMODES   STATUS    CLAIM          REASON    AGE
nfs       1Gi        RWX           Failed    default/nfsc             1h
Name:            nfs
Labels:          <none>
Status:          Failed
Claim:           default/nfsc
Reclaim Policy:  Recycle
Access Modes:    RWX
Capacity:        1Gi
Message:         Recycling error: Pod was active on the node longer than specified deadline
Source:
    Type:        NFS (an NFS mount that lasts the lifetime of a pod)
    Server:      10.66.79.133
    Path:        /non-exist-path
    ReadOnly:    false

I will provide an environment if the above steps do not reproduce the issue.
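For reference, the linked pv-invalid-path.json reduces to something like this (a sketch reconstructed from the describe output above, with #NFS-Service-IP already replaced by the valid server):

# A PV whose NFS path does not exist on the server - the recycler pod can
# never mount it, so it retries until its deadline expires and the PV fails
cat > pv-invalid-path.json <<'EOF'
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": { "name": "nfs" },
  "spec": {
    "capacity": { "storage": "1Gi" },
    "accessModes": ["ReadWriteMany"],
    "persistentVolumeReclaimPolicy": "Recycle",
    "nfs": { "server": "10.66.79.133", "path": "/non-exist-path" }
  }
}
EOF
oc create -f pv-invalid-path.json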
1) This fails with:

persistentvolume_recycler_controller.go:245] PersistentVolume[nfs] failed recycling: Unexpected error creating recycler pod: pods "pv-recycler-nfs-" is forbidden: unable to validate against any security context constraint: [securityContext.runAsUser: Invalid value: 0: UID on container pv-recycler does not match required range. Found 0, required min: 1000020000 max: 1000029999]

2) That's not what you reported in comment 6 - there you say that the recycler can't remove 2 files in 5 minutes, not that it times out when mounting an unavailable export!
The failure Jan pointed out is due to bug 1324418, which is fixed by https://github.com/openshift/origin/pull/8432 -- I suggest recreating after that PR is merged.
With PR 8432 and PersistentVolume.Spec pointing to an invalid NFS share, we schedule a recycler pod that gets assigned to a kubelet. The kubelet tries to mount the NFS share every now and then, and after 5 minutes it gives up with "Pod was active on the node longer than specified deadline". IMO this is correct behavior - we can't retry forever. On the other hand, the message in `oc describe pv` should be clearer. Paul, can we get a more descriptive error from the terminated pod? Something like "Could not mount XXX for YYY seconds"?
Anyway, this is not a blocker; everything works as expected, it's just that the failure message shown to users is not clear.
Started working on this. The easiest and, IMO, best approach is to pass all Warning events from a recycler pod to the PV that is being recycled. When the pod dies, the events are still available on the PV. `kubectl describe pv` would show:

Name:            nfs
Labels:          <none>
Status:          Failed
Claim:           default/nfs
Reclaim Policy:  Recycle
Access Modes:    RWX
Capacity:        1Mi
Message:         Recycling error: Pod was active on the node longer than specified deadline
Source:
    Type:        NFS (an NFS mount that lasts the lifetime of a pod)
    Server:      10.999.999.999
    Path:        /
    ReadOnly:    false
Events:
  FirstSeen  LastSeen  Count  From                  SubobjectPath  Type     Reason            Message
  ---------  --------  -----  ----                  -------------  ----     ------            -------
  1m         1m        3      {default-scheduler }                 Warning  FailedScheduling  Recycling: no nodes available to schedule pods
  1m         1m        1      {default-scheduler }                 Normal   Scheduled         Recycling: Successfully assigned recycler-for-nfs to 127.0.0.1
  1m         12s       5      {kubelet 127.0.0.1}                  Warning  FailedMount       Recycling: Unable to mount volumes for pod "recycler-for-nfs_default(e061c5cb-32fa-11e6-80e2-3c970e965218)": Mount failed: exit status 32
Mounting arguments: 10.999.999.999:/ /var/lib/kubelet/pods/e061c5cb-32fa-11e6-80e2-3c970e965218/volumes/kubernetes.io~nfs/vol nfs []
Output: mount.nfs: Failed to resolve server 10.999.999.999: No address associated with hostname
You can still see the rather cryptic "Pod was active on the node longer than specified deadline" in the Message field, but the events should help diagnose the problem further. Also, these events could be sent to the PV as they appear on the pod, even before the PV goes "Failed".
Filed https://github.com/kubernetes/kubernetes/pull/27714
It has been merged in https://github.com/openshift/origin/pull/11258
Oops, sorry, this is the right one: https://github.com/openshift/origin/pull/11259
Verified the issue has been fixed.

# openshift version
openshift v3.4.0.28+dfe3a66
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

# oc describe pv nfs-sh3oy
Name:            nfs-sh3oy
Labels:          <none>
StorageClass:
Status:          Released
Claim:           sh3oy/nfsc-sh3oy
Reclaim Policy:  Recycle
Access Modes:    RWX
Capacity:        5Gi
Message:
Source:
    Type:        NFS (an NFS mount that lasts the lifetime of a pod)
    Server:      172.30.3.151
    Path:        /non-exist-path
    ReadOnly:    false
Events:
  FirstSeen  LastSeen  Count  From                            SubobjectPath  Type     Reason       Message
  ---------  --------  -----  ----                            -------------  ----     ------       -------
  2m         2m        1      {persistentvolume-controller }                 Normal   RecyclerPod  Recycler pod: Successfully assigned recycler-for-nfs-sh3oy to qe-lxia-ocp34-node-registry-router-1
  17s        17s       1      {persistentvolume-controller }                 Warning  RecyclerPod  Recycler pod: Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs-sh3oy"/"openshift-infra". list of unattached/unmounted volumes=[vol]
  16s        16s       1      {persistentvolume-controller }                 Warning  RecyclerPod  Recycler pod: Unable to mount volumes for pod "recycler-for-nfs-sh3oy_openshift-infra(8f30b0d4-afce-11e6-9347-42010af00019)": timeout expired waiting for volumes to attach/mount for pod "recycler-for-nfs-sh3oy"/"openshift-infra". list of unattached/unmounted volumes=[vol]
  16s        16s       1      {persistentvolume-controller }                 Warning  RecyclerPod  Recycler pod: MountVolume.SetUp failed for volume "kubernetes.io/nfs/8f30b0d4-afce-11e6-9347-42010af00019-vol" (spec.Name: "vol") pod "8f30b0d4-afce-11e6-9347-42010af00019" (UID: "8f30b0d4-afce-11e6-9347-42010af00019") with: mount failed: exit status 32
Mounting arguments: 172.30.3.151:/non-exist-path /var/lib/origin/openshift.local.volumes/pods/8f30b0d4-afce-11e6-9347-42010af00019/volumes/kubernetes.io~nfs/vol nfs []
Output: mount.nfs: Connection timed out
This has been merged into OCP and is in OCP v3.4.0.29 or newer.
Moving to VERIFIED based on comment 20.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066