Description of problem: Since upgrading to OSE 3.4, during startup of the router we see pods created in a PodFitsHostPorts state, as follows:
NAME              READY     STATUS             RESTARTS   AGE
router-91-9pb9j   1/1       Running            0          18m
router-91-agqeh   0/1       PodFitsHostPorts   0          18m
router-91-aioz5   1/1       Running            0          18m
router-91-fi92p   1/1       Running            0          18m
router-91-h5w2v   1/1       Running            0          18m
router-91-xvnrd   0/1       PodFitsHostPorts   0          30m
The number of such pods can run into the hundreds, and the issue is very intermittent and hard to reproduce.
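To see why the scheduler is rejecting these pods, the events on one of the failed pods and the host ports defined on the router DC are worth checking. A minimal sketch, assuming the default DC name "router" (the pod name is taken from the listing above):

    oc describe pod router-91-agqeh               # Events section should show the scheduling failure reason
    oc get dc router -o yaml | grep -B2 hostPort  # the host ports the router binds (80/443/1936 by default)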
Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
Steps to Reproduce:
Actual results: Hundreds of pods are created, ending up in the PodFitsHostPorts state.
Expected results: Router pods should schedule normally; this should not happen just after an upgrade.
The router is probably scaled to a number of replicas matching the hosts selected by its node selector, but for some reason some of those hosts already have services bound to the router's host ports, which creates pod conflicts. I'd suggest scaling the router down to the minimum number of pods desired. The update strategy on the DC may be bad as well; see the sketch below.
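A sketch of that suggestion, assuming the default DC name "router" and a region=infra node selector (both are assumptions; adjust to the actual environment):

    oc get nodes -l region=infra                          # count the nodes the router's selector actually matches
    oc scale dc/router --replicas=3                       # keep replicas at or below that node count; 3 is a placeholder
    oc get dc router -o jsonpath='{.spec.strategy.type}'  # Rolling vs Recreate; Rolling needs spare host-port headroom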
I think this is a master-side (scheduler) problem.
I think the scheduler cannot find a node that satisfies the host-port predicate, so the kubelet deletes the pod after some time and the replication controller then recreates it, which is why so many pods are created...
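If that is what's happening, the event stream should show the cycle of scheduling failures followed by the RC creating replacements; something like this would confirm it (RC name inferred from the pod names above):

    oc describe rc router-91                  # desired/current counts plus pod-creation events
    oc get events | grep -i failedscheduling  # scheduler rejections, including the failed predicate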
I don't remember the exact 3.4 behavior here, but I don't think this is an update-strategy problem, since the deployer pod only scales the RC up and down (it does not create or manage the pods itself).
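One way to verify that, assuming the deployer pod for this deployment still exists (deployer pods are named <dc-name>-<version>-deploy, so the name below is an inference from the pod names above):

    oc logs router-91-deploy    # should show only RC scale-up/scale-down steps, never direct pod creation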