Description of problem: Since upgrading to OSE 3.4, during startup of the router we see pods created in a PodFitsHostPorts state, as follows:
NAME              READY     STATUS             RESTARTS   AGE
router-91-9pb9j   1/1       Running            0          18m
router-91-agqeh   0/1       PodFitsHostPorts   0          18m
router-91-aioz5   1/1       Running            0          18m
router-91-fi92p   1/1       Running            0          18m
router-91-h5w2v   1/1       Running            0          18m
router-91-xvnrd   0/1       PodFitsHostPorts   0          30m
The number of such pods can run into the hundreds, and the issue is very intermittent and hard to reproduce.
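To see why the scheduler is rejecting these pods, the events on one of the failed pods and the host ports defined on the router DC are worth checking. A minimal sketch, assuming the default DC name "router" (the pod name is taken from the listing above):

    oc describe pod router-91-agqeh               # Events section should show the scheduling failure reason
    oc get dc router -o yaml | grep -B2 hostPort  # the host ports the router binds (80/443/1936 by default)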
Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
Steps to Reproduce:
Actual results: Hundreds of pods are created, ending up in the PodFitsHostPorts state.
Expected results: Router pods should schedule normally; this should not happen just after an upgrade.
The router is probably scaled to a number of replicas matching the hosts selected by its node selector, but for some reason some of those hosts already have services bound to the router's host ports, which creates pod conflicts. I'd suggest scaling the router down to the minimum number of pods desired. The update strategy on the DC may be bad as well; see the sketch below.
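A sketch of that suggestion, assuming the default DC name "router" and a region=infra node selector (both are assumptions; adjust to the actual environment):

    oc get nodes -l region=infra                          # count the nodes the router's selector actually matches
    oc scale dc/router --replicas=3                       # keep replicas at or below that node count; 3 is a placeholder
    oc get dc router -o jsonpath='{.spec.strategy.type}'  # Rolling vs Recreate; Rolling needs spare host-port headroom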
I think this is a master-side (scheduler) problem.
I think the scheduler cannot find a node that satisfies the host-port predicate, so the kubelet deletes the pod after some time and the replication controller then recreates it, which is why so many pods are created...
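If that is what's happening, the event stream should show the cycle of scheduling failures followed by the RC creating replacements; something like this would confirm it (RC name inferred from the pod names above):

    oc describe rc router-91                  # desired/current counts plus pod-creation events
    oc get events | grep -i failedscheduling  # scheduler rejections, including the failed predicate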
I don't remember the exact 3.4 behavior here, but I don't think this is an update-strategy problem, since the deployer pod only scales the RC up and down (it does not create or manage the pods itself).
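One way to verify that, assuming the deployer pod for this deployment still exists (deployer pods are named <dc-name>-<version>-deploy, so the name below is an inference from the pod names above):

    oc logs router-91-deploy    # should show only RC scale-up/scale-down steps, never direct pod creation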