Description of problem:
Related issue: https://bugzilla.redhat.com/show_bug.cgi?id=1277101

After uninstalling the logging stacks on OCP 3.6.0, the fluentd pod stays in Terminating status. Deleting the namespace does not remove it:

logging   logging-fluentd-hthjp   0/1   Terminating   0   1h

Node logs:

atomic-openshift-node[19296]: I0509 07:50:34.147561 19368 reconciler.go:190] UnmountVolume operation started for volume "kubernetes.io/secret/813cefa3-3483-11e7-b539-fa163ee9df62-certs" (spec.Name: "certs") from pod "813cefa3-3483-11e7-b539-fa163ee9df62" (UID: "813cefa3-3483-11e7-b539-fa163ee9df62").
atomic-openshift-node[19296]: I0509 07:50:34.147691 19368 util.go:367] Tearing down volume certs for pod 813cefa3-3483-11e7-b539-fa163ee9df62 at /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/certs
atomic-openshift-node[19296]: I0509 07:50:34.147745 19368 empty_dir_linux.go:38] Determining mount medium of /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/certs
atomic-openshift-node[19296]: I0509 07:50:34.147765 19368 nsenter_mount.go:183] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target,fstype --noheadings --first-only --target /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/certs]
atomic-openshift-node[19296]: I0509 07:50:34.153341 19368 nsenter_mount.go:196] IsLikelyNotMountPoint findmnt output for path /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/certs: /sys:
atomic-openshift-node[19296]: I0509 07:50:34.153375 19368 nsenter_mount.go:202] IsLikelyNotMountPoint: /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/certs is not a mount point
atomic-openshift-node[19296]: I0509 07:50:34.153393 19368 empty_dir_linux.go:48] Statfs_t of /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/certs: {Type:1481003842 Bsize:4096 Blocks:2618880 Bfree:2040566 Bavail:2040566 Files:5242880 Ffree:5219099 Fsid:{X__val:[64768 0]} Namelen:255 Frsize:4096 Flags:4128 Spare:[0 0 0 0]}
atomic-openshift-node[19296]: E0509 07:50:34.153715 19368 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/813cefa3-3483-11e7-b539-fa163ee9df62-certs\" (\"813cefa3-3483-11e7-b539-fa163ee9df62\")" failed. No retries permitted until 2017-05-09 07:52:34.153682479 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/813cefa3-3483-11e7-b539-fa163ee9df62-certs" (volume.spec.Name: "certs") pod "813cefa3-3483-11e7-b539-fa163ee9df62" (UID: "813cefa3-3483-11e7-b539-fa163ee9df62") with: rename /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/certs /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/wrapped_certs.deleting~645411556: device or resource busy
atomic-openshift-node[19296]: I0509 07:50:34.247947 19368 reconciler.go:190] UnmountVolume operation started for volume "kubernetes.io/secret/813cefa3-3483-11e7-b539-fa163ee9df62-aggregated-logging-fluentd-token-d4c7h" (spec.Name: "aggregated-logging-fluentd-token-d4c7h") from pod "813cefa3-3483-11e7-b539-fa163ee9df62" (UID: "813cefa3-3483-11e7-b539-fa163ee9df62").
atomic-openshift-node[19296]: I0509 07:50:34.248030 19368 util.go:367] Tearing down volume aggregated-logging-fluentd-token-d4c7h for pod 813cefa3-3483-11e7-b539-fa163ee9df62 at /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-d4c7h
atomic-openshift-node[19296]: I0509 07:50:34.248065 19368 empty_dir_linux.go:38] Determining mount medium of /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-d4c7h
atomic-openshift-node[19296]: I0509 07:50:34.248077 19368 nsenter_mount.go:183] findmnt command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/findmnt -o target,fstype --noheadings --first-only --target /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-d4c7h]
atomic-openshift-node[19296]: I0509 07:50:34.252525 19368 nsenter_mount.go:196] IsLikelyNotMountPoint findmnt output for path /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-d4c7h: /sys:
atomic-openshift-node[19296]: I0509 07:50:34.252547 19368 nsenter_mount.go:202] IsLikelyNotMountPoint: /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-d4c7h is not a mount point
atomic-openshift-node[19296]: I0509 07:50:34.252562 19368 empty_dir_linux.go:48] Statfs_t of /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-d4c7h: {Type:1481003842 Bsize:4096 Blocks:2618880 Bfree:2040566 Bavail:2040566 Files:5242880 Ffree:5219098 Fsid:{X__val:[64768 0]} Namelen:255 Frsize:4096 Flags:4128 Spare:[0 0 0 0]}
atomic-openshift-node[19296]: E0509 07:50:34.252728 19368 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/813cefa3-3483-11e7-b539-fa163ee9df62-aggregated-logging-fluentd-token-d4c7h\" (\"813cefa3-3483-11e7-b539-fa163ee9df62\")" failed. No retries permitted until 2017-05-09 07:52:34.252695141 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/813cefa3-3483-11e7-b539-fa163ee9df62-aggregated-logging-fluentd-token-d4c7h" (volume.spec.Name: "aggregated-logging-fluentd-token-d4c7h") pod "813cefa3-3483-11e7-b539-fa163ee9df62" (UID: "813cefa3-3483-11e7-b539-fa163ee9df62") with: rename /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-d4c7h /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/wrapped_aggregated-logging-fluentd-token-d4c7h.deleting~053939955: device or resource busy
atomic-openshift-node[19296]: I0509 07:50:35.083769 19368 kubelet_pods.go:1564] Orphaned pod "813cefa3-3483-11e7-b539-fa163ee9df62" found, but volumes not yet removed. Reducing cpu to minimum
atomic-openshift-node[19296]: I0509 07:50:37.085321 19368 kubelet_pods.go:1564] Orphaned pod "813cefa3-3483-11e7-b539-fa163ee9df62" found, but volumes not yet removed. Reducing cpu to minimum
atomic-openshift-node[19296]: I0509 07:50:39.085331 19368 kubelet_pods.go:1564] Orphaned pod "813cefa3-3483-11e7-b539-fa163ee9df62" found, but volumes not yet removed. Reducing cpu to minimum
atomic-openshift-node[19296]: I0509 07:50:41.079437 19368 kubelet_pods.go:741] Pod "logging-fluentd-hthjp_logging(813cefa3-3483-11e7-b539-fa163ee9df62)" is terminated, but some volumes have not been cleaned up
atomic-openshift-node[19296]: I0509 07:50:41.083960 19368 kubelet_pods.go:1564] Orphaned pod "813cefa3-3483-11e7-b539-fa163ee9df62" found, but volumes not yet removed. Reducing cpu to minimum
atomic-openshift-node[19296]: I0509 07:50:43.085417 19368 kubelet_pods.go:1564] Orphaned pod "813cefa3-3483-11e7-b539-fa163ee9df62" found, but volumes not yet removed. Reducing cpu to minimum

Version-Release number of selected component (if applicable):
openshift v3.6.70
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy the logging 3.6.0 stacks on an OpenShift 3.6.0 master
2. Undeploy the logging 3.6.0 stacks
3. Check the fluentd pod status

Actual results:
The fluentd pod stays in Terminating status.

Expected results:
The fluentd pod should be deleted.

Additional info:
The issue did not happen on OCP 3.5.0.
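To confirm the state described above on a reproducer, something like the following can be used (a minimal sketch; the namespace and pod name are the ones from this report and will differ in other environments):

# Check whether the fluentd pod is stuck in Terminating after the uninstall
oc get pods -n logging -o wide | grep fluentd

# Show the deletion timestamp and the node the pod was scheduled on,
# i.e. the node whose atomic-openshift-node logs need to be checked
oc get pod logging-fluentd-hthjp -n logging -o yaml | grep -E 'deletionTimestamp|nodeName'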
Is this something you can reproduce, or is the pod still stuck? Can you run https://github.com/rhvgoyal/misc/blob/master/find-busy-mnt.sh on the node where the pod was running? For this particular case I think you could run:

find-busy-mnt.sh 813cefa3-3483-11e7-b539-fa163ee9df62

If you reproduce with another pod, the UID will be different. Hopefully that will tell us which process is holding the mount point...
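If the script is not available on the node, roughly equivalent data can be collected by scanning /proc for processes whose mount namespace still references the pod's volumes (a manual sketch only, assuming the pod UID from the log above; this is not the find-busy-mnt.sh script itself):

# Pod UID taken from the node log above
POD_UID=813cefa3-3483-11e7-b539-fa163ee9df62
# Print every process that still has the pod's volume paths in its mount namespace
for pid in $(ls /proc | grep -E '^[0-9]+$'); do
  if grep -q "$POD_UID" /proc/$pid/mountinfo 2>/dev/null; then
    echo "pid=$pid comm=$(cat /proc/$pid/comm)"
  fi
done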
The root cause here seems to be a busy mount point:

nestedpendingoperations.go:262] Operation for "\"kubernetes.io/secret/813cefa3-3483-11e7-b539-fa163ee9df62-aggregated-logging-fluentd-token-d4c7h\" (\"813cefa3-3483-11e7-b539-fa163ee9df62\")" failed. No retries permitted until 2017-05-09 07:52:34.252695141 +0000 UTC (durationBeforeRetry 2m0s). Error: UnmountVolume.TearDown failed for volume "kubernetes.io/secret/813cefa3-3483-11e7-b539-fa163ee9df62-aggregated-logging-fluentd-token-d4c7h" (volume.spec.Name: "aggregated-logging-fluentd-token-d4c7h") pod "813cefa3-3483-11e7-b539-fa163ee9df62" (UID: "813cefa3-3483-11e7-b539-fa163ee9df62") with: rename /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/aggregated-logging-fluentd-token-d4c7h /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/wrapped_aggregated-logging-fluentd-token-d4c7h.deleting~053939955: device or resource busy

Note the "device or resource busy" error: something still holds the secret mount, so the kubelet cannot tear down the volume and the pod never finishes terminating.
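As a hedged workaround sketch only (not a fix for the underlying problem): once the holding process or container has been identified, releasing the leaked mount on the node usually lets the kubelet finish the teardown on its next retry. For example:

# On the node: check whether the pod's secret volumes are still mounted
grep 813cefa3-3483-11e7-b539-fa163ee9df62 /proc/mounts

# Stopping the container that still holds the mount normally releases it;
# as a last resort, lazily detach the mount so the kubelet retry can succeed
umount -l /var/lib/origin/openshift.local.volumes/pods/813cefa3-3483-11e7-b539-fa163ee9df62/volumes/kubernetes.io~secret/certs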
I filed https://bugzilla.redhat.com/show_bug.cgi?id=1450426 as a request for the docker team to add logging so that we get more data when this happens. For now there is not much we can do here unless you can reproduce the issue and/or collect the data requested in comment #3.
I'm going to close this as WONTFIX. We know about a few busy-mount issues, and I can't immediately tell which one this was, but if you can reproduce it, please reopen this bug or open a new one. Thank you, and I'm sorry we were so slow to collect more data.