Bug 1580485 - pod stuck in Terminating after adding/removing volume for non-existent secret
Summary: pod stuck in Terminating after adding/removing volume for non-existent secret
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.10.0
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: 3.10.z
Assignee: Hemant Kumar
QA Contact: Qin Ping
Depends On:
Reported: 2018-05-21 15:09 UTC by Mike Fiedler
Modified: 2018-07-30 19:16 UTC
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-07-30 19:16:09 UTC
Target Upstream Version:

Attachments (Terms of Use)
node log (268.41 KB, text/x-vhdl)
2018-05-21 15:09 UTC, Mike Fiedler

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:16:29 UTC

Description Mike Fiedler 2018-05-21 15:09:46 UTC
Created attachment 1439659 [details]
node log

Description of problem:

Runtime is CRI-O. I mistakenly added a volume for a non-existent secret to a dc. The redeploy got stuck in ContainerCreating. I removed the volume and the dc redeployed OK, but the pod that never started got stuck in Terminating and never went away.

On the CRIO side, crictl ps shows that the container is gone.

Version-Release number of selected component (if applicable): 3.10.0-0.47.0

How reproducible: Always

Steps to Reproduce:
1.  oc project default
2.  oc volume dc/docker-registry --add --name=dockersecrets -m /etc/registryconfig --type=secret --secret-name=badsecret
3.  oc get pods - new docker-registry pod is stuck in ContainerCreating
4.  oc volume dc/docker-registry --remove --name=dockersecrets
5.  wait a minute or two, oc get pods
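For reference, a sketch of what step 2 adds to the dc's pod template (volume name, mount path, and secret name match the pod dump later in this bug; the secret "badsecret" does not exist, which is what leaves the new pod in ContainerCreating):

```yaml
# Sketch only: fragment of the dc pod template after step 2
spec:
  template:
    spec:
      containers:
      - name: registry
        volumeMounts:
        - mountPath: /etc/registryconfig
          name: dockersecrets
      volumes:
      - name: dockersecrets
        secret:
          defaultMode: 420
          secretName: badsecret   # this secret does not exist
```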

Actual results:

root@ip-172-31-39-189: ~ # oc get pods -o wide -w
NAME                       READY     STATUS        RESTARTS   AGE       IP             NODE
docker-registry-2-crc7s    0/1       Terminating   0          1m        <none>         ip-172-31-15-75.us-west-2.compute.internal
docker-registry-3-s76rz    1/1       Running       0          21s     ip-172-31-15-75.us-west-2.compute.internal
registry-console-1-55n6p   1/1       Running       0          1h     ip-172-31-28-198.us-west-2.compute.internal
router-1-kt9pd             1/1       Running       0          1h   ip-172-31-15-75.us-west-2.compute.internal

Expected results:

Terminating pod eventually goes away

Additional info:

node log attached

Comment 1 Mike Fiedler 2018-05-21 15:11:10 UTC
Restarting the node makes the pod go away.

Comment 2 Mike Fiedler 2018-05-22 00:14:12 UTC
The output in the description shows the pod stuck for only a minute.   Here is the same scenario with the pod stuck for a longer period of time:

NAME                       READY     STATUS        RESTARTS   AGE
docker-registry-4-7qplq    0/1       Terminating   0          7h
docker-registry-5-bph4l    1/1       Running       0          7h
registry-console-1-55n6p   1/1       Running       0          10h
router-1-kt9pd             1/1       Running       0          10h

Comment 3 Avesh Agarwal 2018-05-23 22:11:04 UTC
I am able to reproduce this issue by following the given steps. Here is the YAML of the pod that got stuck in Terminating:

apiVersion: v1
kind: Pod
    openshift.io/deployment-config.latest-version: "2"
    openshift.io/deployment-config.name: docker-registry
    openshift.io/deployment.name: docker-registry-2
    openshift.io/scc: restricted
  creationTimestamp: 2018-05-23T21:54:10Z
  deletionGracePeriodSeconds: 30
  deletionTimestamp: 2018-05-23T21:56:25Z
  generateName: docker-registry-2-
    deployment: docker-registry-2
    deploymentconfig: docker-registry
    docker-registry: default
  name: docker-registry-2-2fks5
  namespace: default
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicationController
    name: docker-registry-2
    uid: d0f0415c-5ed3-11e8-9e6d-0eaff8b454fc
  resourceVersion: "13353"
  selfLink: /api/v1/namespaces/default/pods/docker-registry-2-2fks5
  uid: d2da4c0b-5ed3-11e8-9e6d-0eaff8b454fc
  - env:
      value: :5000
      value: tcp
      value: pCQCy5mQjxdRkFbhZMOuLEoStJQUquucJGJOF/WpK+8=
      value: "false"
      value: docker-registry.default.svc:5000
      value: /etc/secrets/registry.crt
      value: /etc/secrets/registry.key
    image: registry.reg-aws.openshift.com/openshift3/ose-docker-registry:v3.10
    imagePullPolicy: IfNotPresent
      failureThreshold: 3
        path: /healthz
        port: 5000
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: registry
    - containerPort: 5000
      protocol: TCP
      failureThreshold: 3
        path: /healthz
        port: 5000
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
        cpu: 100m
        memory: 256Mi
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      privileged: false
      runAsUser: 1000000000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    - mountPath: /registry
      name: registry-storage
    - mountPath: /etc/secrets
      name: registry-certificates
    - mountPath: /etc/registryconfig
      name: dockersecrets
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: registry-token-2n6gc
      readOnly: true
  dnsPolicy: ClusterFirst
  - name: registry-dockercfg-79875
  nodeName: ip-172-18-12-134.ec2.internal
    node-role.kubernetes.io/infra: "true"
  restartPolicy: Always
  schedulerName: default-scheduler
    fsGroup: 1000000000
      level: s0:c1,c0
  serviceAccount: registry
  serviceAccountName: registry
  terminationGracePeriodSeconds: 30
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - emptyDir: {}
    name: registry-storage
  - name: registry-certificates
      defaultMode: 420
      secretName: registry-certificates
  - name: dockersecrets
      defaultMode: 420
      secretName: badsecret
  - name: registry-token-2n6gc
      defaultMode: 420
      secretName: registry-token-2n6gc
  - lastProbeTime: null
    lastTransitionTime: 2018-05-23T21:54:10Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-05-23T21:54:10Z
    message: 'containers with unready status: [registry]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-05-23T21:54:10Z
    status: "True"
    type: PodScheduled
  - image: registry.reg-aws.openshift.com/openshift3/ose-docker-registry:v3.10
    imageID: ""
    lastState: {}
    name: registry
    ready: false
    restartCount: 0
        exitCode: 0
        finishedAt: null
        startedAt: null
  phase: Pending
  qosClass: Burstable
  startTime: 2018-05-23T21:54:10Z

Comment 4 Avesh Agarwal 2018-05-24 18:03:17 UTC
This issue is reproducible with docker too.

Comment 5 Seth Jennings 2018-05-25 21:08:33 UTC
Pod cleanup is bailing out here:

kubelet_pods.go:953] Pod "docker-registry-4-xdqvp_default(e2cd7904-6058-11e8-aa81-0ee0a45883f2)" is terminated, but some volumes have not been cleaned up

May 25 20:19:27 ip-172-18-11-127.ec2.internal atomic-openshift-node[16302]: I0525 20:19:27.167788   16302 server.go:435] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-4-xdqvp", UID:"e2cd7904-6058-11e8-aa81-0ee0a45883f2", APIVersion:"v1", ResourceVersion:"6312", FieldPath:""}): type: 'Warning' reason: 'FailedMount' MountVolume.SetUp failed for volume "dockersecrets" : secrets "badsecret" not found

        // If the directory has a different device than its parent, then it is a mountpoint.
        if stat.Sys().(*syscall.Stat_t).Dev != rootStat.Sys().(*syscall.Stat_t).Dev {
                return false, nil
        }

My theory is that the memory-backed emptydir that wraps the secret is mounted, but when the secret is not found, the secret volume plugin doesn't populate it and the secret mount fails.  However the emptydir _did_ mount successfully even if the higher-level secret did not.  Thus emptydir is mounted, blocking the pod from cleaning up and deleting cleanly.

kubelet_volumes.go:69] pod "8483696c-605c-11e8-aa81-0ee0a45883f2" found, but volumes are still mounted on disk [/var/lib/origin/openshift.local.volumes/pods/8483696c-605c-11e8-aa81-0ee0a45883f2/volumes/kubernetes.io~secret/dockersecrets]

Comment 6 Seth Jennings 2018-05-25 22:13:54 UTC
Small correction so I don't forget before the weekend. The emptydir is not mounted, but the _path_ does exist. I think TearDown() on the emptydir would have removed the path, but I'm not sure it is called if SetUp() on the secret failed.

Comment 8 Avesh Agarwal 2018-06-05 15:32:34 UTC
An update about this:

I tried to reproduce this on the latest kube master and kube 1.10 but was not able to. Since the "set volume" command does not exist in kube, I followed these steps:

1. I manually edited (kubectl edit) the deployment, adding a bad volumeMount and volume to the container and pod specs respectively.

2. After the update to the deployment, a new pod is created and gets stuck in the ContainerCreating state.

3. After step 2, the deployment is edited again to remove the bad volumeMount and volume.

4. The pod goes into the Terminating state and gets deleted completely.

When I tried the above steps with Openshift, in step 4 the pod remained in the Terminating state.

I am assuming that the above steps in kube are similar to the original steps reported in this bz. I noticed some PRs that looked like they might have addressed it in kube, but those PRs are already in openshift too.

So still working on it.

Comment 9 Avesh Agarwal 2018-06-05 16:17:42 UTC
Please ignore my previous comment https://bugzilla.redhat.com/show_bug.cgi?id=1580485#c8. Sorry for any confusion.

While debugging kube, I noticed that the flag keepTerminatedPodVolumes (which is deprecated and false by default) was true, and that caused the (erroneous) pods stuck in Terminating to be deleted completely. The flag was true because I was using hack/local-up-cluster.sh, where the flag is still set to true. To avoid this, we should get https://github.com/kubernetes/kubernetes/pull/57355 in asap.

OTOH, I am wondering whether we should keep the behaviour exhibited by keepTerminatedPodVolumes the same for erroneous volumes. Anyway, I will continue looking into this.
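For reference, the flag in question is a kubelet command-line flag; when set to true the kubelet keeps terminated pods' volumes on the node instead of cleaning them up (invocation below is a sketch, not a recommended configuration):

```
# Deprecated kubelet flag, defaults to false. hack/local-up-cluster.sh
# was still passing it as true, which masked the bug in local testing.
kubelet --keep-terminated-pod-volumes=true ...
```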

Comment 10 Avesh Agarwal 2018-06-05 16:56:03 UTC
While I am debugging this, I have also created an upstream issue: https://github.com/kubernetes/kubernetes/issues/64779

Please note the above issue has more problems than the problem noticed here.

Comment 11 Hemant Kumar 2018-06-06 22:08:54 UTC
Opened a PR to fix this: https://github.com/kubernetes/kubernetes/pull/64855

Comment 12 Hemant Kumar 2018-06-08 02:41:15 UTC
Openshift PR - https://github.com/openshift/origin/pull/19939

Comment 14 Qin Ping 2018-06-13 02:58:42 UTC
Verified in openshift:
oc v3.10.0-0.66.0
openshift v3.10.0-0.66.0
kubernetes v1.10.0+b81c8f8

# uname -a
Linux qe-piqin-master-etcd-nfs-1 3.10.0-862.6.1.el7.x86_64 #1 SMP Mon Jun 4 15:33:25 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 16 errata-xmlrpc 2018-07-30 19:16:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

