Bug 1647674

Summary: FailedScheduling didn't have free ports
Product: OpenShift Container Platform
Reporter: Mahesh Taru <mtaru>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED ERRATA
QA Contact: Weinan Liu <weinliu>
Severity: medium
Priority: medium
Docs Contact:
Version: 3.11.0
CC: aabhishe, adhingra, aos-bugs, jhou, jokerman, mmccomas, pdwyer, sfu, sgarciam
Target Milestone: ---
Target Release: 3.11.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text: Fixes an issue where pods would fail to schedule because nodes were reported as not having free ports.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-30 15:19:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Mahesh Taru 2018-11-08 06:08:52 UTC
Description of problem:
Deployed router pods, but one pod didn't schedule:
***************************
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  6s (x25 over 34s)  default-scheduler  0/9 nodes are available: 4 node(s) didn't have free ports for the requested pod ports, 7 node(s) didn't match node selector.
***************************

Version-Release number of selected component (if applicable):
oc v3.11.16
kubernetes v1.11.0+d4cacc0

How reproducible:
Sometimes

Steps to Reproduce:
1. Deploy pods during installation or post-installation, OR
2. Deploy an application with 'oc new-app', OR
3. Deploy a pod with 'oc create -f <pod.yaml>'

Actual results:
Scheduling fails even when ports and resources are available.

Expected results:
Scheduling should succeed without any error.

Additional info:
Upstream issue: https://github.com/kubernetes/kubernetes/issues/66568

Workaround:
Restart the control plane.
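On a 3.10/3.11 cluster, where the control-plane components run as static pods on the master hosts, the workaround can be sketched roughly as below. This is a hedged example, not the author's exact procedure: the `master-restart` helper is specific to the static-pod control plane, and on older service-based installs you would restart the master API/controllers services instead. Restarting the API server and controllers discards the scheduler's in-memory cache, which is where the stale host-port reservations live.

```shell
# Run on each master host (OpenShift 3.10/3.11 static-pod control plane).
# Restarting these components rebuilds the scheduler's cache of
# assumed pods and reserved host ports.
master-restart api
master-restart controllers

# Confirm the control-plane pods came back up:
oc get pods -n kube-system
```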

Comment 3 Avesh Agarwal 2018-12-11 20:03:48 UTC
https://github.com/openshift/origin/pull/21649

Comment 4 Anil Dhingra 2018-12-26 02:05:15 UTC
Tried with the latest v3.11.51; the problem is still the same, and even restarting the control plane didn't help.

  Type     Reason            Age              From                         Message
  ----     ------            ----             ----                         -------
  Warning  FailedScheduling  1m (x9 over 1m)  default-scheduler            0/5 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match node selector.
  Normal   Scheduled         1m               default-scheduler            Successfully assigned default/router-1-4kq8j to infra1.example.com

The pod should not reach the Running state if the requested ports are not available. It is confusing that the state changes to Running, and even 'docker ps' shows the container running. (The pod should be in a Failed state if it cannot get its ports.)
[root@master ~]#  oc get pods -o wide -n default | grep -i router
router-1-4kq8j             1/1       Running   0          21m       192.168.122.15   infra1.example.com   <none>
router-1-dnm7j             1/1       Running   3          15h       192.168.122.14   infra.example.com    <none>

[root@infra1 ~]# docker ps -a | grep -i route
208feb1113be        faa4d9ee67c4                                             "/usr/bin/openshif..."   22 minutes ago      Up 22 minutes                                   k8s_router_router-1-4kq8j_default_62a619da-08af-11e9-af02-525400f7e27c_0
7fc75b7510da        registry.access.redhat.com/openshift3/ose-pod:v3.11.51   "/usr/bin/pod"           22 minutes ago      Up 22 minutes                                   k8s_POD_router-1-4kq8j_default_62a619da-08af-11e9-af02-525400f7e27c_0
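To distinguish a genuine port conflict from the stale scheduler-cache state tracked in the upstream issue, it helps to check on the node itself whether the router's host ports are actually bound. A minimal sketch, assuming the default router host ports 80, 443, and 1936 and that `ss` from iproute2 is available on the node (the script name and port list are illustrative):

```shell
#!/bin/bash
# check-ports.sh: report whether the given TCP ports have a listener
# on this host. Exits 0 if all ports are free, 1 otherwise.
rc=0
for p in "$@"; do
  # -H: no header, -l: listening sockets, -t: TCP, -n: numeric output
  if ss -Hltn "sport = :$p" | grep -q .; then
    echo "port $p is in use"
    rc=1
  else
    echo "port $p is free"
  fi
done
exit "$rc"
```

Running `./check-ports.sh 80 443 1936` on the node named in the FailedScheduling event and finding all ports free would indicate the "didn't have free ports" message comes from stale scheduler state rather than a real conflict.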

Comment 10 errata-xmlrpc 2019-01-30 15:19:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0096

Comment 11 Sergio G. 2019-08-28 09:02:45 UTC
Was this backported to 3.10?