Bug 1502129
Summary: | OpenShift Container Platform and CNS, pods stuck in terminating state | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Magnus Glantz <sudo> | ||||||||
Component: | Node | Assignee: | Joel Smith <joelsmith> | ||||||||
Status: | CLOSED DUPLICATE | QA Contact: | DeShuai Ma <dma> | ||||||||
Severity: | high | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 3.6.1 | CC: | aos-bugs, jokerman, mmccomas, sudo, tahonen, wmeng | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2017-10-20 16:54:14 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
Description
Magnus Glantz
2017-10-14 11:59:57 UTC
Created attachment 1338538
sosreport from master
Created attachment 1338539
sosreport from node associated to stuck pod
This node also runs CNS/infra, but that does not seem to be related; I've seen pods stuck on nodes that do not run CNS.
Adding tahonen (SSA/OpenShift), who has also seen this issue.

Restarting atomic-openshift-master-api and atomic-openshift-master-controllers does not resolve the issue.

Output for the stuck pod (which is detailed in the sosreports):

[root@ocpm-0 ~]# oc project test
Now using project "test" on server "https://ocpb.eazdhewkr11upilhlavwerjpeb.fx.internal.cloudapp.net:8443".
[root@ocpm-0 ~]# oc get all
NAME                 READY     STATUS        RESTARTS   AGE
po/jenkins-1-pjhzn   0/1       Terminating   0          1h
[root@ocpm-0 ~]# oc describe pod jenkins-1-pjhzn
Name:                      jenkins-1-pjhzn
Namespace:                 test
Security Policy:           restricted
Node:                      ocpi-2.eazdhewkr11upilhlavwerjpeb.fx.internal.cloudapp.net/192.168.2.5
Start Time:                Sat, 14 Oct 2017 11:15:11 +0000
Labels:                    deployment=jenkins-1
                           deploymentconfig=jenkins
                           name=jenkins
Annotations:               kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"test","name":"jenkins-1","uid":"ecdb2278-b0d0-11e7-8117-000d3ab79a02",...
                           openshift.io/deployment-config.latest-version=1
                           openshift.io/deployment-config.name=jenkins
                           openshift.io/deployment.name=jenkins-1
                           openshift.io/scc=restricted
Status:                    Terminating (expires Sat, 14 Oct 2017 11:20:29 +0000)
Termination Grace Period:  30s
IP:
Controllers:               ReplicationController/jenkins-1
Containers:
  jenkins:
    Container ID:   docker://625c14709fb05348b896bb0078ebed4f352e7cfa537a2a3fe8f358be6cf79b57
    Image:          registry.access.redhat.com/openshift3/jenkins-2-rhel7@sha256:c47b5d8c9ba8a57255e5191cbf0ed9e0cb998bc823846ba52c34cca11a3cf2a0
    Image ID:       docker-pullable://registry.access.redhat.com/openshift3/jenkins-2-rhel7@sha256:c47b5d8c9ba8a57255e5191cbf0ed9e0cb998bc823846ba52c34cca11a3cf2a0
    Port:
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  512Mi
    Requests:
      memory:  512Mi
    Liveness:   http-get http://:8080/login delay=420s timeout=3s period=10s #success=1 #failure=30
    Readiness:  http-get http://:8080/login delay=3s timeout=3s period=10s #success=1 #failure=3
    Environment:
      OPENSHIFT_ENABLE_OAUTH:            true
      OPENSHIFT_ENABLE_REDIRECT_PROMPT:  true
      OPENSHIFT_JENKINS_JVM_ARCH:        i386
      KUBERNETES_MASTER:                 https://kubernetes.default:443
      KUBERNETES_TRUST_CERTIFICATES:     true
      JNLP_SERVICE_NAME:                 jenkins-jnlp
    Mounts:
      /var/lib/jenkins from jenkins-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from jenkins-token-82fgt (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  jenkins-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  jenkins
    ReadOnly:   false
  jenkins-token-82fgt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  jenkins-token-82fgt
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     <none>
Events:          <none>

Please note that this Terminating state does not seem to expire, at least not within two hours.

Registry is not on CNS, if this matters.

Joel, PTAL. Might be related to (or a dup of) https://bugzilla.redhat.com/show_bug.cgi?id=1489082

Magnus, can you confirm whether just deleting the pod triggers the issue? Or does it only occur if you delete the namespace while the pod still exists?

We're marking this as a duplicate for now. If fixes for 1489082 don't remedy this, we'll re-open it.

*** This bug has been marked as a duplicate of bug 1489082 ***
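
For anyone triaging a similar case, a rough way to confirm that a pod is past its "expires" time, rather than still inside its grace period, is to compare the pod's `metadata.deletionTimestamp` with the current time. The sketch below is not part of the original report. It assumes the JSON shape produced by `oc get pods -o json`, and that `deletionTimestamp` is the deadline shown as "expires" in `oc describe`. The function name `overdue_terminating_pods` and the sample data are hypothetical; the sample values are taken from the timestamps quoted in this bug.

```python
from datetime import datetime, timezone


def overdue_terminating_pods(pod_list, now):
    """Return (name, seconds_overdue) for pods past their deletion deadline.

    pod_list is assumed to be the parsed output of `oc get pods -o json`;
    now must be a timezone-aware datetime in UTC.
    """
    overdue = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        deadline_text = meta.get("deletionTimestamp")
        if deadline_text is None:
            continue  # no deletion requested for this pod
        # Assumption: timestamps are serialized as RFC 3339 in UTC, e.g. 2017-10-14T11:20:29Z
        deadline = datetime.strptime(deadline_text, "%Y-%m-%dT%H:%M:%SZ")
        deadline = deadline.replace(tzinfo=timezone.utc)
        seconds = (now - deadline).total_seconds()
        if seconds > 0:
            overdue.append((meta.get("name"), int(seconds)))
    return overdue


# Hypothetical sample built from the values in this report: the pod's
# "expires" time and the time the bug was filed.
sample = {
    "items": [
        {"metadata": {"name": "jenkins-1-pjhzn",
                      "deletionTimestamp": "2017-10-14T11:20:29Z"}},
        {"metadata": {"name": "not-being-deleted"}},
    ]
}
filed_at = datetime(2017, 10, 14, 11, 59, 57, tzinfo=timezone.utc)
print(overdue_terminating_pods(sample, filed_at))
```

Against a live cluster, the same function could be fed the parsed output of `oc get pods -n test -o json`. A pod that stays in this list long after its deadline matches the symptom described above.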