Bug 1580485 - pod stuck in Terminating after adding/removing volume for non-existent secret
Summary: pod stuck in Terminating after adding/removing volume for non-existent secret
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.10.0
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: 3.10.z
Assignee: Hemant Kumar
QA Contact: Qin Ping
Depends On:
Reported: 2018-05-21 15:09 UTC by Mike Fiedler
Modified: 2018-07-30 19:16 UTC
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2018-07-30 19:16:09 UTC
Target Upstream Version:

Attachments (Terms of Use)
node log (268.41 KB, text/x-vhdl)
2018-05-21 15:09 UTC, Mike Fiedler

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:16:29 UTC

Description Mike Fiedler 2018-05-21 15:09:46 UTC
Created attachment 1439659 [details]
node log

Description of problem:

Runtime is CRI-O. I mistakenly added a volume for a non-existent secret to a dc. The redeploy got stuck in ContainerCreating. I removed the volume and the dc redeployed OK, but the pod that never started got stuck in Terminating and never went away.

On the CRIO side, crictl ps shows that the container is gone.

Version-Release number of selected component (if applicable): 3.10.0-0.47.0

How reproducible: Always

Steps to Reproduce:
1.  oc project default
2.  oc volume dc/docker-registry --add --name=dockersecrets -m /etc/registryconfig --type=secret --secret-name=badsecret
3.  oc get pods - new docker-registry pod is stuck in ContainerCreating
4.  oc volume dc/docker-registry --remove --name=dockersecrets
5.  wait a minute or two, oc get pods
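For reference, a sketch of what step 2 adds to the dc's pod template (volume name, mount path, and secret name match the pod dump later in this bug; the secret "badsecret" does not exist, which is what leaves the new pod in ContainerCreating):

```yaml
# Sketch only: fragment of the dc pod template after step 2
spec:
  template:
    spec:
      containers:
      - name: registry
        volumeMounts:
        - mountPath: /etc/registryconfig
          name: dockersecrets
      volumes:
      - name: dockersecrets
        secret:
          defaultMode: 420
          secretName: badsecret   # this secret does not exist
```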

Actual results:

root@ip-172-31-39-189: ~ # oc get pods -o wide -w
NAME                       READY     STATUS        RESTARTS   AGE       IP             NODE
docker-registry-2-crc7s    0/1       Terminating   0          1m        <none>         ip-172-31-15-75.us-west-2.compute.internal
docker-registry-3-s76rz    1/1       Running       0          21s     ip-172-31-15-75.us-west-2.compute.internal
registry-console-1-55n6p   1/1       Running       0          1h     ip-172-31-28-198.us-west-2.compute.internal
router-1-kt9pd             1/1       Running       0          1h   ip-172-31-15-75.us-west-2.compute.internal

Expected results:

Terminating pod eventually goes away

Additional info:

node log attached

Comment 1 Mike Fiedler 2018-05-21 15:11:10 UTC
Restarting the node makes the pod go away.

Comment 2 Mike Fiedler 2018-05-22 00:14:12 UTC
The output in the description shows the pod stuck for only a minute.   Here is the same scenario with the pod stuck for a longer period of time:

NAME                       READY     STATUS        RESTARTS   AGE
docker-registry-4-7qplq    0/1       Terminating   0          7h
docker-registry-5-bph4l    1/1       Running       0          7h
registry-console-1-55n6p   1/1       Running       0          10h
router-1-kt9pd             1/1       Running       0          10h

Comment 3 Avesh Agarwal 2018-05-23 22:11:04 UTC
I am able to reproduce this issue by following the given steps. Here is the YAML of the pod that got stuck in Terminating:

apiVersion: v1
kind: Pod
    openshift.io/deployment-config.latest-version: "2"
    openshift.io/deployment-config.name: docker-registry
    openshift.io/deployment.name: docker-registry-2
    openshift.io/scc: restricted
  creationTimestamp: 2018-05-23T21:54:10Z
  deletionGracePeriodSeconds: 30
  deletionTimestamp: 2018-05-23T21:56:25Z
  generateName: docker-registry-2-
    deployment: docker-registry-2
    deploymentconfig: docker-registry
    docker-registry: default
  name: docker-registry-2-2fks5
  namespace: default
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicationController
    name: docker-registry-2
    uid: d0f0415c-5ed3-11e8-9e6d-0eaff8b454fc
  resourceVersion: "13353"
  selfLink: /api/v1/namespaces/default/pods/docker-registry-2-2fks5
  uid: d2da4c0b-5ed3-11e8-9e6d-0eaff8b454fc
  - env:
      value: :5000
      value: tcp
      value: pCQCy5mQjxdRkFbhZMOuLEoStJQUquucJGJOF/WpK+8=
      value: "false"
      value: docker-registry.default.svc:5000
      value: /etc/secrets/registry.crt
      value: /etc/secrets/registry.key
    image: registry.reg-aws.openshift.com/openshift3/ose-docker-registry:v3.10
    imagePullPolicy: IfNotPresent
      failureThreshold: 3
        path: /healthz
        port: 5000
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: registry
    - containerPort: 5000
      protocol: TCP
      failureThreshold: 3
        path: /healthz
        port: 5000
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
        cpu: 100m
        memory: 256Mi
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      privileged: false
      runAsUser: 1000000000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    - mountPath: /registry
      name: registry-storage
    - mountPath: /etc/secrets
      name: registry-certificates
    - mountPath: /etc/registryconfig
      name: dockersecrets
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: registry-token-2n6gc
      readOnly: true
  dnsPolicy: ClusterFirst
  - name: registry-dockercfg-79875
  nodeName: ip-172-18-12-134.ec2.internal
    node-role.kubernetes.io/infra: "true"
  restartPolicy: Always
  schedulerName: default-scheduler
    fsGroup: 1000000000
      level: s0:c1,c0
  serviceAccount: registry
  serviceAccountName: registry
  terminationGracePeriodSeconds: 30
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - emptyDir: {}
    name: registry-storage
  - name: registry-certificates
      defaultMode: 420
      secretName: registry-certificates
  - name: dockersecrets
      defaultMode: 420
      secretName: badsecret
  - name: registry-token-2n6gc
      defaultMode: 420
      secretName: registry-token-2n6gc
  - lastProbeTime: null
    lastTransitionTime: 2018-05-23T21:54:10Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-05-23T21:54:10Z
    message: 'containers with unready status: [registry]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-05-23T21:54:10Z
    status: "True"
    type: PodScheduled
  - image: registry.reg-aws.openshift.com/openshift3/ose-docker-registry:v3.10
    imageID: ""
    lastState: {}
    name: registry
    ready: false
    restartCount: 0
        exitCode: 0
        finishedAt: null
        startedAt: null
  phase: Pending
  qosClass: Burstable
  startTime: 2018-05-23T21:54:10Z

Comment 4 Avesh Agarwal 2018-05-24 18:03:17 UTC
This issue is reproducible with docker too.

Comment 5 Seth Jennings 2018-05-25 21:08:33 UTC
Pod cleanup is bailing out here:

kubelet_pods.go:953] Pod "docker-registry-4-xdqvp_default(e2cd7904-6058-11e8-aa81-0ee0a45883f2)" is terminated, but some volumes have not been cleaned up

May 25 20:19:27 ip-172-18-11-127.ec2.internal atomic-openshift-node[16302]: I0525 20:19:27.167788   16302 server.go:435] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"docker-registry-4-xdqvp", UID:"e2cd7904-6058-11e8-aa81-0ee0a45883f2", APIVersion:"v1", ResourceVersion:"6312", FieldPath:""}): type: 'Warning' reason: 'FailedMount' MountVolume.SetUp failed for volume "dockersecrets" : secrets "badsecret" not found

        // If the directory has a different device than its parent, then it is a mountpoint.
        if stat.Sys().(*syscall.Stat_t).Dev != rootStat.Sys().(*syscall.Stat_t).Dev {
                return false, nil
        }

My theory is that the memory-backed emptydir that wraps the secret is mounted, but when the secret is not found, the secret volume plugin doesn't populate it and the secret mount fails.  However the emptydir _did_ mount successfully even if the higher-level secret did not.  Thus emptydir is mounted, blocking the pod from cleaning up and deleting cleanly.

kubelet_volumes.go:69] pod "8483696c-605c-11e8-aa81-0ee0a45883f2" found, but volumes are still mounted on disk [/var/lib/origin/openshift.local.volumes/pods/8483696c-605c-11e8-aa81-0ee0a45883f2/volumes/kubernetes.io~secret/dockersecrets]

Comment 6 Seth Jennings 2018-05-25 22:13:54 UTC
Small correction so I don't forget before the weekend. The emptydir is not mounted, but the _path_ does exist. I think TearDown() on the emptydir would have removed the path, but I'm not sure it is called if SetUp() on the secret failed.

Comment 8 Avesh Agarwal 2018-06-05 15:32:34 UTC
An update about this:

I tried to reproduce this on the latest kube master and kube 1.10 but was not able to. Since the "set volume" command does not exist in kube, I followed these steps:

1. I manually edited (kubectl edit) the deployment, adding a bad volumeMount and volume to the container and pod specs respectively.

2. After the update to the deployment, a new pod is created and gets stuck in the ContainerCreating state.

3. After step 2, the deployment is edited again to remove the bad volumeMount and volume.

4. The pod goes into the Terminating state and gets deleted completely.

When I tried the above steps with Openshift, in step 4 the pod remained in the Terminating state.

I am assuming that the above steps in kube are similar to the original steps reported in this bz. I noticed some PRs that looked like they might have addressed it in kube, but those PRs are already in openshift too.

So still working on it.

Comment 9 Avesh Agarwal 2018-06-05 16:17:42 UTC
Please ignore my previous comment https://bugzilla.redhat.com/show_bug.cgi?id=1580485#c8. Sorry for any confusion.

While debugging kube, I noticed that the flag keepTerminatedPodVolumes (which is deprecated and false by default) was true, and that caused the (erroneous) pods stuck in Terminating to be deleted completely. The flag was true because I was using hack/local-up-cluster.sh, where the flag is still set to true. To avoid this, we should get https://github.com/kubernetes/kubernetes/pull/57355 in asap.

OTOH, I am wondering whether we should keep the behaviour exhibited by keepTerminatedPodVolumes the same for erroneous volumes. Anyway, I will continue looking into this.
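For reference, the flag in question is a kubelet command-line flag; when set to true the kubelet keeps terminated pods' volumes on the node instead of cleaning them up (invocation below is a sketch, not a recommended configuration):

```
# Deprecated kubelet flag, defaults to false. hack/local-up-cluster.sh
# was still passing it as true, which masked the bug in local testing.
kubelet --keep-terminated-pod-volumes=true ...
```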

Comment 10 Avesh Agarwal 2018-06-05 16:56:03 UTC
While I am debugging this, I have also created an upstream issue: https://github.com/kubernetes/kubernetes/issues/64779

Please note the above issue has more problems than the problem noticed here.

Comment 11 Hemant Kumar 2018-06-06 22:08:54 UTC
Opened a PR to fix this: https://github.com/kubernetes/kubernetes/pull/64855

Comment 12 Hemant Kumar 2018-06-08 02:41:15 UTC
Openshift PR - https://github.com/openshift/origin/pull/19939

Comment 14 Qin Ping 2018-06-13 02:58:42 UTC
Verified in openshift:
oc v3.10.0-0.66.0
openshift v3.10.0-0.66.0
kubernetes v1.10.0+b81c8f8

# uname -a
Linux qe-piqin-master-etcd-nfs-1 3.10.0-862.6.1.el7.x86_64 #1 SMP Mon Jun 4 15:33:25 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 16 errata-xmlrpc 2018-07-30 19:16:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

