Description of problem: Since upgrading to OSE 3.4, during startup of the router we see pods created in a PodFitsHostPorts state, as follows:
NAME              READY     STATUS             RESTARTS   AGE
router-91-9pb9j   1/1       Running            0          18m
router-91-agqeh   0/1       PodFitsHostPorts   0          18m
router-91-aioz5   1/1       Running            0          18m
router-91-fi92p   1/1       Running            0          18m
router-91-h5w2v   1/1       Running            0          18m
router-91-xvnrd   0/1       PodFitsHostPorts   0          30m
The number of such pods can run into the hundreds, and the issue is very intermittent and hard to reproduce.
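To see why the scheduler is rejecting these pods, the events on one of the failed pods and the host ports defined on the router DC are worth checking. A minimal sketch, assuming the default DC name "router" (the pod name is taken from the listing above):

    oc describe pod router-91-agqeh               # Events section should show the scheduling failure reason
    oc get dc router -o yaml | grep -B2 hostPort  # the host ports the router binds (80/443/1936 by default)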
Version-Release number of the following components:
rpm -q openshift-ansible
rpm -q ansible
Steps to Reproduce:
Actual results: Hundreds of pods are created, ending up in the PodFitsHostPorts state.
Expected results: Router pods should schedule normally; this should not happen just after an upgrade.
The router is probably scaled to a number of replicas matching the hosts selected by its node selector, but for some reason some of those hosts already have services bound to the router's host ports, which creates pod conflicts. I'd suggest scaling the router down to the minimum number of pods desired. The update strategy on the DC may be bad as well; see the sketch below.
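A sketch of that suggestion, assuming the default DC name "router" and a region=infra node selector (both are assumptions; adjust to the actual environment):

    oc get nodes -l region=infra                          # count the nodes the router's selector actually matches
    oc scale dc/router --replicas=3                       # keep replicas at or below that node count; 3 is a placeholder
    oc get dc router -o jsonpath='{.spec.strategy.type}'  # Rolling vs Recreate; Rolling needs spare host-port headroom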
I think this is a master-side (scheduler) problem.
I think the scheduler cannot find a node that satisfies the host-port predicate, so the kubelet deletes the pod after some time and the replication controller then recreates it, which is why so many pods are created...
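If that is what's happening, the event stream should show the cycle of scheduling failures followed by the RC creating replacements; something like this would confirm it (RC name inferred from the pod names above):

    oc describe rc router-91                  # desired/current counts plus pod-creation events
    oc get events | grep -i failedscheduling  # scheduler rejections, including the failed predicate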
I don't remember the exact 3.4 behavior here, but I don't think this is an update-strategy problem, since the deployer pod only scales the RC up and down (it does not create or manage the pods itself).
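One way to verify that, assuming the deployer pod for this deployment still exists (deployer pods are named <dc-name>-<version>-deploy, so the name below is an inference from the pod names above):

    oc logs router-91-deploy    # should show only RC scale-up/scale-down steps, never direct pod creation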