Bug 1623053

Summary:	Pods stuck in Terminating status when using configmap mounted using subpath volume
Product:	OpenShift Container Platform	Reporter:	Jaspreet Kaur <jkaur>
Component:	Storage	Assignee:	Jan Safranek <jsafrane>
Status:	CLOSED ERRATA	QA Contact:	Liang Xia <lxia>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.9.0	CC:	aos-bugs, aos-storage-staff, bchilds, jkaur, jmalde, jsafrane, lxia, rekhan
Target Milestone:	---	Keywords:	Rebase
Target Release:	3.9.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1640077 1640078 1640079 1696207		Environment:
Last Closed:	2019-01-10 08:55:23 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1640077, 1640078, 1640079, 1696207, 1696591

Description Jaspreet Kaur 2018-08-28 12:19:38 UTC

Description of problem: After deletion, many pods remain stuck indefinitely in Terminating status because their volumes cannot be unmounted (UnmountVolume.TearDown failed). The affected volumes are configmap volumes mounted as subpath volumes.

The error is as follows:

Error: "error cleaning subPath mounts for volume \"example-config\" (UniqueName: \"kubernetes.io/configmap/a87e947c-a51e-11e8-b942-00505688494d-application-config\") pod \"a87e947c-a51e-11e8-b942-00505688494d\" (UID: \"a87e947c-a51e-11e8-b942-00505688494d\") : error deleting /var/lib/origin/openshift.local.volumes/pods/a87e947c-a51e-11e8-b942-00505688494d/volume-subpaths/example-config/test/1: remove /var/lib/origin/openshift.local.volumes/pods/a87e947c-a51e-11e8-b942-00505688494d/volume-subpaths/example-config/test/1: device or resource busy"
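The "device or resource busy" failure on remove typically means that something is still mounted at that path. A minimal sketch, in Python, of one way to look for such leftover subpath mounts on a node. It assumes the directory layout seen in the error message above (pods/<pod UID>/volume-subpaths/...) and the usual /proc/mounts format. The function name find_subpath_mounts is hypothetical and is not part of any OpenShift or Kubernetes tooling; escaped characters in mount points (for example \040 for a space) are not decoded.

```python
def find_subpath_mounts(mounts_text, pod_uid):
    """Return mount points that sit under a pod's volume-subpaths directory.

    mounts_text is text in /proc/mounts format: one mount per line,
    whitespace-separated fields, with the mount point as the second field.
    The "/pods/<uid>/volume-subpaths/" layout is an assumption taken from
    the error message in this report.
    """
    marker = "/pods/" + pod_uid + "/volume-subpaths/"
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        if marker in mount_point:
            found.append(mount_point)
    return found
```

On a node, the function could be given the contents of /proc/mounts and the UID of the stuck pod; any path it returns is still mounted and would have to be unmounted before the directory can be removed.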


Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results: Pods fail to be removed. Even when the pods are deleted with --force --grace-period=0, the configmaps remain on the system and cannot be deleted because they are still mounted.

Expected results: The pod is removed and the configmap is unmounted.


Additional info:

https://github.com/kubernetes/kubernetes/issues/65879
https://github.com/kubernetes/kubernetes/issues/65110

Comment 12 Liang Xia 2018-12-19 08:07:22 UTC

Checked on the version below with the steps mentioned in comment 7; the issue no longer occurs. Moving the bug to VERIFIED.
# oc version
oc v3.9.59
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://preserve-qe-lxia-39-master-etcd-1:8443
openshift v3.9.59
kubernetes v1.9.1+a0ce1bc657


# oc describe pod testpod
Name:         testpod
Namespace:    test
Node:         preserve-qe-lxia-39-nrr-1/172.16.122.17
Start Time:   Wed, 19 Dec 2018 03:02:52 -0500
Labels:       name=test
Annotations:  openshift.io/scc=anyuid
Status:       Running
IP:           10.129.0.93
Containers:
  busybox:
    Container ID:  docker://dc1696e73c1cd63e89e28411b393734f5090244ccce22062cecd5c51ab0b5b2d
    Image:         gcr.io/google_containers/busybox
    Image ID:      docker-pullable://gcr.io/google_containers/busybox@sha256:d8d3bc2c183ed2f9f10e7258f84971202325ee6011ba137112e01e30f206de67
    Port:          <none>
    Command:
      /bin/sleep
      10000
    State:          Running
      Started:      Wed, 19 Dec 2018 03:02:55 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/test from vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-92jcj (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  vol:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-config
    Optional:  false
  default-token-92jcj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-92jcj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type    Reason                 Age   From                                Message
  ----    ------                 ----  ----                                -------
  Normal  Scheduled              35s   default-scheduler                   Successfully assigned testpod to preserve-qe-lxia-39-nrr-1
  Normal  SuccessfulMountVolume  35s   kubelet, preserve-qe-lxia-39-nrr-1  MountVolume.SetUp succeeded for volume "vol"
  Normal  SuccessfulMountVolume  35s   kubelet, preserve-qe-lxia-39-nrr-1  MountVolume.SetUp succeeded for volume "default-token-92jcj"
  Normal  Pulling                33s   kubelet, preserve-qe-lxia-39-nrr-1  pulling image "gcr.io/google_containers/busybox"
  Normal  Pulled                 32s   kubelet, preserve-qe-lxia-39-nrr-1  Successfully pulled image "gcr.io/google_containers/busybox"
  Normal  Created                32s   kubelet, preserve-qe-lxia-39-nrr-1  Created container
  Normal  Started                32s   kubelet, preserve-qe-lxia-39-nrr-1  Started container

# oc delete pod testpod
pod "testpod" deleted

# oc get pods
No resources found.

Comment 13 Liang Xia 2018-12-19 08:11:15 UTC

QE also ran regression testing for subpath; no issues were found.

# realpath /var/lib/origin
/var/lib/origin_via_link

# ls -ld /var/lib/origin
lrwxrwxrwx. 1 root root 24 Dec 19 02:56 /var/lib/origin -> /var/lib/origin_via_link


# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.6 (Maipo)

# uname -a
Linux preserve-qe-lxia-39-master-etcd-1 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Comment 15 errata-xmlrpc 2019-01-10 08:55:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0028