Description of problem:
If nodeSelector and replicas are changed at the same time, only one of the replicated image-registry pods obeys the nodeSelector rule:
$ oc get pods -n openshift-image-registry -o wide
NAME                                               READY   STATUS    RESTARTS   AGE     IP            NODE                                        NOMINATED NODE
cluster-image-registry-operator-54ff44b885-dk6j9   1/1     Running   0          3h21m   10.130.0.21   ip-10-0-12-181.us-east-2.compute.internal   <none>
image-registry-5dd88dd48b-kjvqj                    0/1     Pending   0          9m53s   <none>        <none>                                      <none>
image-registry-78b4d6b48f-4pxcz                    1/1     Running   0          9m53s   10.130.0.56   ip-10-0-12-181.us-east-2.compute.internal   <none>
image-registry-78b4d6b48f-znbnb                    1/1     Running   0          3h21m   10.129.0.23   ip-10-0-24-186.us-east-2.compute.internal   <none>
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. oc edit configs.imageregistry.operator.openshift.io (change nodeSelector and replicas in the same edit)
2. oc get pods -n openshift-image-registry
Actual results:
Only one pod obeys the nodeSelector rule.
Expected results:
All replicated pods should obey it.
I think this is expected. The rollout of the changes cannot continue until the first new pod goes from pending to available, and your pod is pending because no nodes match the nodeselector.
You can confirm that the right settings were applied by looking at the image-registry deployment object. Assuming you see the correct replica count and nodeselector value there, this is working as expected. To fully test it, you'll need to set the nodeselector to a node that can be scheduled to.
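As a rough illustration of that check, the sketch below inspects a Deployment object in JSON form and compares its replica count and nodeSelector against the values you expect. The sample document, the label `node: invalid`, and the helper name `check_deployment` are assumptions made up for this example. In practice the JSON would come from something like `oc get deployment image-registry -n openshift-image-registry -o json`.

```python
import json

# Hypothetical, trimmed-down Deployment object used only for illustration.
SAMPLE = """
{
  "kind": "Deployment",
  "metadata": {"name": "image-registry", "namespace": "openshift-image-registry"},
  "spec": {
    "replicas": 2,
    "template": {
      "spec": {
        "nodeSelector": {"node": "invalid"}
      }
    }
  }
}
"""


def check_deployment(deployment, want_replicas, want_selector):
    """Compare spec.replicas and spec.template.spec.nodeSelector with expected values."""
    spec = deployment.get("spec", {})
    pod_spec = spec.get("template", {}).get("spec", {})
    # A missing nodeSelector is treated the same as an empty one.
    got_selector = pod_spec.get("nodeSelector") or {}
    return {
        "replicas_ok": spec.get("replicas") == want_replicas,
        "node_selector_ok": got_selector == want_selector,
    }


if __name__ == "__main__":
    print(check_deployment(json.loads(SAMPLE), 2, {"node": "invalid"}))
```

If both values come back as matching, the operator applied the configuration, and what remains is scheduling and rollout behavior rather than a registry operator problem.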
I expected the first newly created pod to also be pending, for the same reason as the pending pod, since both were created at the same time, after the invalid nodeSelector was applied.
I think this is a bug in deployments (or at least deployment behavior that needs to be explained).
I'm able to see similar behavior by simply:
1) oc create deployment --image=someimage mydeployment (results in 1 running pod, as expected)
2) oc edit deployment mydeployment
- set replicas: 2
- set nodeselector as you did (i.e. set it to something that can't be scheduled)
3) result: I see 3 pods: the original pod from the deployment (running) and 2 new pods, one with the nodeselector and the other without it.
My theory is that the deployment controller first scales the deployment up to 2 replicas (which is why you see a second pod running), then starts rolling out the nodeselector change, at which point it gets stuck because the first new pod with the nodeselector stays pending. If it had proceeded, you'd have seen the oldest pod and the new running pod both get removed and replaced with another pod that has the nodeselector set. But I don't know for sure.
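To make that theory concrete, here is a toy model of the "scale first, then start the rollout" ordering. It is only a sketch of the behavior described in this comment, not the actual deployment controller logic: the names `Pod`, `schedule`, and `simulate`, the single-node label set, and the `max_surge=1` default are all assumptions for illustration, and the model stops after the first surge step instead of completing a rollout.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    node_selector: dict
    status: str  # "Running" or "Pending"


def schedule(name, node_selector, node_labels):
    """Create a pod that runs only if every selector entry matches the node's labels."""
    fits = all(node_labels.get(k) == v for k, v in node_selector.items())
    return Pod(name, dict(node_selector), "Running" if fits else "Pending")


def simulate(old_replicas, new_replicas, old_selector, new_selector,
             node_labels, max_surge=1):
    """Toy model: scale the old template first, then surge pods with the new template."""
    # Pods that already exist from the old template.
    pods = [schedule(f"old-{i}", old_selector, node_labels)
            for i in range(old_replicas)]
    # Step 1 (theory): scale up using the OLD template.
    for i in range(old_replicas, new_replicas):
        pods.append(schedule(f"old-{i}", old_selector, node_labels))
    # Step 2 (theory): begin the rollout by surging pods with the NEW template.
    if new_selector != old_selector:
        for i in range(max_surge):
            pods.append(schedule(f"new-{i}", new_selector, node_labels))
    return pods


if __name__ == "__main__":
    for pod in simulate(1, 2, {}, {"node": "invalid"}, {"kubernetes.io/os": "linux"}):
        print(pod.name, pod.status, pod.node_selector)
```

Run with 1 old replica, 2 desired replicas, and an unschedulable selector, the model ends with two running pods from the old template and one pending pod from the new template, which matches the 3 pods seen in the reproduction above.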
Assigning to master team to confirm/explain this behavior.
Master team, if you can confirm this is working as expected/designed, please set it back to ON_QA so the QE team can update their test case expectations/procedure.
Ben is right. This seems like a cosmetic bug in the upstream Deployment controller, which should also check that the pod template didn't change: https://github.com/kubernetes/kubernetes/blob/954996e231074dc7429f7be1256a579bedd8344c/pkg/controller/deployment/deployment_controller.go#L632-L638
Looking at this again, I now think scaling first is the correct approach.
It feels better to first create the pods that are easy, or scale the old ones down, before starting the update on the rest. Consider the case where you switch from, say, 2 replicas to 1 with the Recreate strategy and don't have leader election: when you apply the config, you expect the old pods to be scaled down to 1 first, and then the remaining pod to be recreated.
I can now see that all scaled-up pods obey the node selector rule with 4.4.0-0.nightly-2020-02-06-230833:
$ oc get pods
NAME                                              READY   STATUS    RESTARTS   AGE
cluster-image-registry-operator-df4ccc5c9-jzhnj   2/2     Running   0          7h39m
image-registry-94bbd669f-5zcvl                    0/1     Pending   0          5m1s
image-registry-94bbd669f-7w95z                    0/1     Pending   0          5m1s
Verified on 4.4.0-0.nightly-2020-02-12-191550.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.