Bug 1623053 - Pods stuck in Terminating status when using configmap mounted using subpath volume
Summary: Pods stuck in Terminating status when using configmap mounted using subpath volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: Jan Safranek
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks: 1640077 1640078 1640079 1696207 1696591
 
Reported: 2018-08-28 12:19 UTC by Jaspreet Kaur
Modified: 2020-05-20 19:50 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1640077 1640078 1640079 1696207
Environment:
Last Closed: 2019-01-10 08:55:23 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:0028  0        None      None    None     2019-01-10 08:55:28 UTC

Description Jaspreet Kaur 2018-08-28 12:19:38 UTC
Description of problem: After deletion, many pods remain stuck indefinitely in Terminating status because the kubelet cannot unmount their volumes (UnmountVolume.TearDown failed) when a ConfigMap is mounted into the pod using subPath.

Error is as below:

Error: "error cleaning subPath mounts for volume \"example-config\" (UniqueName: \"kubernetes.io/configmap/a87e947c-a51e-11e8-b942-00505688494d-application-config\") pod \"a87e947c-a51e-11e8-b942-00505688494d\" (UID: \"a87e947c-a51e-11e8-b942-00505688494d\") : error deleting /var/lib/origin/openshift.local.volumes/pods/a87e947c-a51e-11e8-b942-00505688494d/volume-subpaths/example-config/test/1: remove /var/lib/origin/openshift.local.volumes/pods/a87e947c-a51e-11e8-b942-00505688494d/volume-subpaths/example-config/test/1: device or resource busy"


Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a ConfigMap and a pod that mounts it using a subPath volume mount (see the sketch below).
2. Delete the pod.
3. Watch the pod status.
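
A minimal reproducer consistent with the verification pod in comment 12 (a sketch: the names my-config, testpod, busybox and vol come from the describe output below, while the key test.conf and the ConfigMap contents are assumptions for illustration):

# cat <<'EOF' | oc create -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  # test.conf is a hypothetical key; any key mounted via subPath should do
  test.conf: |
    key=value
---
apiVersion: v1
kind: Pod
metadata:
  name: testpod
  labels:
    name: test
spec:
  containers:
  - name: busybox
    image: gcr.io/google_containers/busybox
    command:
    - /bin/sleep
    - "10000"
    volumeMounts:
    - name: vol
      mountPath: /mnt/test
      subPath: test.conf   # the subPath mount is what triggers the stuck unmount
  volumes:
  - name: vol
    configMap:
      name: my-config
EOF
# oc delete pod testpod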

Actual results: The pods fail to be removed. Even when deleted with --force --grace-period=0, the subPath bind mounts of the ConfigMap remain on the node, so the ConfigMap cannot be cleaned up while it is still mounted.

Expected results: The pod is removed and the ConfigMap volume is unmounted.


Additional info:

https://github.com/kubernetes/kubernetes/issues/65879
https://github.com/kubernetes/kubernetes/issues/65110
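
On an affected node, the stale bind mounts that block cleanup can be spotted directly (a hedged check derived from the path pattern in the error above; subPath mounts are bind mounts, so they show up in the mount table):

# mount | grep volume-subpaths

Any entries surviving under .../volume-subpaths/ after the pod's containers have exited are the mounts the kubelet fails to tear down.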

Comment 12 Liang Xia 2018-12-19 08:07:22 UTC
Checked on the version below with the steps mentioned in comment 7; the issue no longer occurs. Moving the bug to VERIFIED.
# oc version
oc v3.9.59
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://preserve-qe-lxia-39-master-etcd-1:8443
openshift v3.9.59
kubernetes v1.9.1+a0ce1bc657


# oc describe pod testpod
Name:         testpod
Namespace:    test
Node:         preserve-qe-lxia-39-nrr-1/172.16.122.17
Start Time:   Wed, 19 Dec 2018 03:02:52 -0500
Labels:       name=test
Annotations:  openshift.io/scc=anyuid
Status:       Running
IP:           10.129.0.93
Containers:
  busybox:
    Container ID:  docker://dc1696e73c1cd63e89e28411b393734f5090244ccce22062cecd5c51ab0b5b2d
    Image:         gcr.io/google_containers/busybox
    Image ID:      docker-pullable://gcr.io/google_containers/busybox@sha256:d8d3bc2c183ed2f9f10e7258f84971202325ee6011ba137112e01e30f206de67
    Port:          <none>
    Command:
      /bin/sleep
      10000
    State:          Running
      Started:      Wed, 19 Dec 2018 03:02:55 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/test from vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-92jcj (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  vol:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-config
    Optional:  false
  default-token-92jcj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-92jcj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type    Reason                 Age   From                                Message
  ----    ------                 ----  ----                                -------
  Normal  Scheduled              35s   default-scheduler                   Successfully assigned testpod to preserve-qe-lxia-39-nrr-1
  Normal  SuccessfulMountVolume  35s   kubelet, preserve-qe-lxia-39-nrr-1  MountVolume.SetUp succeeded for volume "vol"
  Normal  SuccessfulMountVolume  35s   kubelet, preserve-qe-lxia-39-nrr-1  MountVolume.SetUp succeeded for volume "default-token-92jcj"
  Normal  Pulling                33s   kubelet, preserve-qe-lxia-39-nrr-1  pulling image "gcr.io/google_containers/busybox"
  Normal  Pulled                 32s   kubelet, preserve-qe-lxia-39-nrr-1  Successfully pulled image "gcr.io/google_containers/busybox"
  Normal  Created                32s   kubelet, preserve-qe-lxia-39-nrr-1  Created container
  Normal  Started                32s   kubelet, preserve-qe-lxia-39-nrr-1  Started container

# oc delete pod testpod
pod "testpod" deleted

# oc get pods
No resources found.
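
Once the pod is gone, the ConfigMap is no longer pinned by any mount and can be removed as well (a follow-up check addressing the original complaint that ConfigMaps could not be cleaned up; this command is a sketch, not part of QE's recorded output, with my-config taken from the describe output above):

# oc delete configmap my-config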

Comment 13 Liang Xia 2018-12-19 08:11:15 UTC
QE also ran regression testing for subPath, with /var/lib/origin resolving through a symlink; no issues were found.

# realpath /var/lib/origin
/var/lib/origin_via_link

# ls -ld /var/lib/origin
lrwxrwxrwx. 1 root root 24 Dec 19 02:56 /var/lib/origin -> /var/lib/origin_via_link
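
That layout could have been prepared with something like the following before starting the node service (a sketch; the exact commands QE used are not recorded here):

# mv /var/lib/origin /var/lib/origin_via_link
# ln -s /var/lib/origin_via_link /var/lib/origin

Testing with the volume root behind a symlink is relevant because the subPath cleanup code has to resolve the real paths of the bind mounts it removes.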


# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.6 (Maipo)

# uname -a
Linux preserve-qe-lxia-39-master-etcd-1 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Comment 15 errata-xmlrpc 2019-01-10 08:55:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0028

