Description of problem:
Many pods, after being deleted, get stuck indefinitely in Terminating status because the kubelet cannot unmount volumes (UnmountVolume.TearDown failed) that are mounted from a ConfigMap via subPath. The error is:

Error: "error cleaning subPath mounts for volume \"example-config\" (UniqueName: \"kubernetes.io/configmap/a87e947c-a51e-11e8-b942-00505688494d-application-config\") pod \"a87e947c-a51e-11e8-b942-00505688494d\" (UID: \"a87e947c-a51e-11e8-b942-00505688494d\") : error deleting /var/lib/origin/openshift.local.volumes/pods/a87e947c-a51e-11e8-b942-00505688494d/volume-subpaths/example-config/test/1: remove /var/lib/origin/openshift.local.volumes/pods/a87e947c-a51e-11e8-b942-00505688494d/volume-subpaths/example-config/test/1: device or resource busy"

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
Pods fail to be removed. Even when the pods are deleted with --force --grace-period=0, the ConfigMaps remain in the system and cannot be deleted because they are still mounted.

Expected results:
The pod is removed and the ConfigMap is unmounted.

Additional info:
https://github.com/kubernetes/kubernetes/issues/65879
https://github.com/kubernetes/kubernetes/issues/65110
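For reference, a minimal reproducer along the lines of the linked upstream issues would be a pod that mounts a ConfigMap key via subPath and is deleted while its container is still running. The names below (subpath-test, example-config, app.conf and the mount path) are illustrative only, not taken from the affected cluster:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config
data:
  app.conf: |
    key=value
---
apiVersion: v1
kind: Pod
metadata:
  name: subpath-test
spec:
  containers:
  - name: busybox
    image: gcr.io/google_containers/busybox
    command: ["/bin/sleep", "10000"]
    volumeMounts:
    - name: example-config
      mountPath: /etc/app/app.conf
      subPath: app.conf
  volumes:
  - name: example-config
    configMap:
      name: example-config

On an affected node, deleting such a pod leaves it in Terminating with the TearDown error above, since the kubelet fails to clean up the bind-mounted subPath directory under volume-subpaths/ ("device or resource busy").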
Checked on the version below, following the steps mentioned in #comment 7; the issue no longer occurs. Moving the bug to VERIFIED.

# oc version
oc v3.9.59
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://preserve-qe-lxia-39-master-etcd-1:8443
openshift v3.9.59
kubernetes v1.9.1+a0ce1bc657

# oc describe pod testpod
Name:         testpod
Namespace:    test
Node:         preserve-qe-lxia-39-nrr-1/172.16.122.17
Start Time:   Wed, 19 Dec 2018 03:02:52 -0500
Labels:       name=test
Annotations:  openshift.io/scc=anyuid
Status:       Running
IP:           10.129.0.93
Containers:
  busybox:
    Container ID:  docker://dc1696e73c1cd63e89e28411b393734f5090244ccce22062cecd5c51ab0b5b2d
    Image:         gcr.io/google_containers/busybox
    Image ID:      docker-pullable://gcr.io/google_containers/busybox@sha256:d8d3bc2c183ed2f9f10e7258f84971202325ee6011ba137112e01e30f206de67
    Port:          <none>
    Command:
      /bin/sleep
      10000
    State:          Running
      Started:      Wed, 19 Dec 2018 03:02:55 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/test from vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-92jcj (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  vol:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-config
    Optional:  false
  default-token-92jcj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-92jcj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type    Reason                 Age   From                                Message
  ----    ------                 ----  ----                                -------
  Normal  Scheduled              35s   default-scheduler                   Successfully assigned testpod to preserve-qe-lxia-39-nrr-1
  Normal  SuccessfulMountVolume  35s   kubelet, preserve-qe-lxia-39-nrr-1  MountVolume.SetUp succeeded for volume "vol"
  Normal  SuccessfulMountVolume  35s   kubelet, preserve-qe-lxia-39-nrr-1  MountVolume.SetUp succeeded for volume "default-token-92jcj"
  Normal  Pulling                33s   kubelet, preserve-qe-lxia-39-nrr-1  pulling image "gcr.io/google_containers/busybox"
  Normal  Pulled                 32s   kubelet, preserve-qe-lxia-39-nrr-1  Successfully pulled image "gcr.io/google_containers/busybox"
  Normal  Created                32s   kubelet, preserve-qe-lxia-39-nrr-1  Created container
  Normal  Started                32s   kubelet, preserve-qe-lxia-39-nrr-1  Started container

# oc delete pod testpod
pod "testpod" deleted

# oc get pods
No resources found.
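For completeness, a manifest roughly matching the pod shown above would look like the sketch below. The exact spec used for verification is not attached to this comment, and the subPath field is assumed based on the bug description (oc describe does not display subPath in the Mounts section):

apiVersion: v1
kind: Pod
metadata:
  name: testpod
  namespace: test
  labels:
    name: test
spec:
  nodeSelector:
    node-role.kubernetes.io/compute: "true"
  containers:
  - name: busybox
    image: gcr.io/google_containers/busybox
    command: ["/bin/sleep", "10000"]
    volumeMounts:
    - name: vol
      mountPath: /mnt/test
      subPath: test        # assumed; the actual key name is not visible in the describe output
  volumes:
  - name: vol
    configMap:
      name: my-config

With the fixed version, oc delete pod testpod returns promptly and oc get pods shows no pod left in Terminating, matching the output above.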
Also, QE did regression testing for subPath; no issues were found.

# realpath /var/lib/origin
/var/lib/origin_via_link

# ls -ld /var/lib/origin
lrwxrwxrwx. 1 root root 24 Dec 19 02:56 /var/lib/origin -> /var/lib/origin_via_link

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

# uname -a
Linux preserve-qe-lxia-39-master-etcd-1 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0028