Description of problem:
Several days ago, webconsole pods were evicted due to some sort of disk pressure. Even though that disk pressure is no longer present, the pods are not being scheduled. This condition persists even after restarting master-controllers.
[root@free-stg-master-9fec9 ~]# oc get pods -n openshift-web-console
NAME READY STATUS RESTARTS AGE
webconsole-6768b679b8-2j4wk 0/1 Evicted 0 5d
webconsole-6768b679b8-6pjl9 0/1 Evicted 0 7d
webconsole-6768b679b8-brv29 0/1 Evicted 0 7d
webconsole-6768b679b8-bt5wv 0/1 Evicted 0 5d
webconsole-6768b679b8-csdwp 0/1 Evicted 0 5d
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. cluster was running acceptably
2. pods were evicted due to disk pressure
3. pods were never scheduled after disk pressure abated
My triage results:
The pods were evicted days ago, yet the deployment controller does not appear to be starting replacements for some reason.
I started a deployment in a test project on that cluster and the pods were created as expected, so the controller is working at least to some degree. I couldn't find any panics or other errors in the controller logs on the current leader, ip-172-31-78-254.us-east-2.compute.internal.
I am not sure why this happens: the RS is scaled down to 0, I have no idea why, and there are no logs.
free-stg runs with loglevel=2, which is insufficient for debugging controller issues.
Justin, can you please raise it to V4? I am afraid the bug will go away when the controllers restart, which should be a good reason to run with V4 by default, at least in testing environments.
my current findings:
- A Deployment with the Recreate strategy waits for all old pods to be deleted here:
- The Deployment does that by scaling the RS to 0
- The Deployment waits until the pods are deleted, but the RS leaves them around for some reason
I need to find out tomorrow why the RS leaves that pod around when scaled to 0.
*** Bug 1547604 has been marked as a duplicate of this bug. ***
The RS ignores failed pods and leaves them lying around.
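A minimal sketch of that behavior (illustrative only, not the actual kube-controller-manager source; all names here are made up): the ReplicaSet controller only counts and manages pods it considers active, so a pod in a terminal phase such as Failed (which is what an evicted pod reports) is invisible to a scale-down and is never deleted.

```go
package main

import "fmt"

// Pod is a toy stand-in for the real Pod object; only the phase matters here.
type Pod struct {
	Name  string
	Phase string // "Pending", "Running", "Succeeded", or "Failed"
}

// filterActivePods mimics the controller's filtering: pods in a terminal
// phase are dropped, so they are not targets for deletion on scale-down.
func filterActivePods(pods []Pod) []Pod {
	var active []Pod
	for _, p := range pods {
		if p.Phase != "Succeeded" && p.Phase != "Failed" {
			active = append(active, p)
		}
	}
	return active
}

func main() {
	pods := []Pod{
		{Name: "webconsole-2j4wk", Phase: "Failed"}, // evicted pod
		{Name: "webconsole-xyz12", Phase: "Running"},
	}
	// Scaling the RS to 0 deletes only the active pods,
	// so the evicted pod is left behind.
	fmt.Println(len(filterActivePods(pods)))
}
```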
upstream now has issue regarding failed pods:
I'll see whether upstream accepts a hotfix before the RS and other controllers are fixed in a generic way.
upstream PR for unblocking the Deployment:
Additional fixes will follow for the leftover pods, depending on which way upstream decides to go, but this should fix the stuck deployment.
firstname.lastname@example.org yes, we have the same issue with DCs, but we also have a timeout, so they will reconcile eventually; it's not as severe. Please file a separate issue to track it.
Picked to origin 3.9: https://github.com/openshift/origin/pull/18760
I confirmed this works now with the new puddle built out, but I found that the new deployment also took a long time; see the events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 1h deployment-controller Scaled up replica set busybox-5f9469d8dc to 1
Normal ScalingReplicaSet 14m deployment-controller Scaled down replica set busybox-5f9469d8dc to 0
Normal ScalingReplicaSet 5m deployment-controller Scaled up replica set busybox-67c646dff5 to 1
[root@host-172-16-120-6 ~]# oc get pod
NAME READY STATUS RESTARTS AGE
busybox-5f9469d8dc-m54jl 0/1 Evicted 0 1h
busybox-67c646dff5-n477f 1/1 Running 0 12m
[root@host-172-16-120-6 ~]# oc get rs
NAME DESIRED CURRENT READY AGE
busybox-5f9469d8dc 0 0 0 1h
busybox-67c646dff5 1 1 1 12m
Is this expected?
I am not sure what went wrong in your case, but I recall that when looking into the env, you tested it by filling up the disk, and at least journald was broken; I'm not sure whether something else was broken as well.
Would you mind trying to recreate that state again and leaving the env around if it happens?
I have tried it locally, going the failed-pods route from https://bugzilla.redhat.com/show_bug.cgi?id=1547604 by creating failed pods with reason MatchNodeSelector, and the deployment worked without any delay.
(To create such a pod, you need to specify a valid nodeName, to bypass the scheduler, and a non-matching nodeSelector so the kubelet rejects the pod and fails it.)
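The repro described in that parenthetical can be sketched as a manifest (the pod name, node placeholder, and busybox image are illustrative, not from the original report):

```yaml
# Hypothetical repro: nodeName bypasses the scheduler, while the
# non-matching nodeSelector makes the kubelet reject the pod at
# admission, leaving it Failed with reason MatchNodeSelector.
apiVersion: v1
kind: Pod
metadata:
  name: failed-pod-repro
spec:
  nodeName: <an-existing-node>       # valid node, so the scheduler is skipped
  nodeSelector:
    no-such-label: does-not-exist    # no node has this label
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
```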
It works well now and journald is not broken this time; verified with: