Bug 1647674

Summary: FailedScheduling didn't have free ports
Product: OpenShift Container Platform
Reporter: Mahesh Taru <mtaru>
Component: Node
Assignee: Seth Jennings <sjenning>
Status: CLOSED ERRATA
QA Contact: Weinan Liu <weinliu>
Severity: medium
Priority: medium
Docs Contact:
Version: 3.11.0
CC: aabhishe, adhingra, aos-bugs, jhou, jokerman, mmccomas, pdwyer, sfu, sgarciam
Target Milestone: ---
Target Release: 3.11.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text: Fixes an issue where pods would fail to schedule because nodes were reported as not having free ports.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-30 15:19:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Mahesh Taru 2018-11-08 06:08:52 UTC
Description of problem:
Deployed router pods, but one pod didn't schedule:
***************************
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  6s (x25 over 34s)  default-scheduler  0/9 nodes are available: 4 node(s) didn't have free ports for the requested pod ports, 7 node(s) didn't match node selector.
***************************

Version-Release number of selected component (if applicable):
oc v3.11.16
kubernetes v1.11.0+d4cacc0

How reproducible:
Sometimes

Steps to Reproduce:
1. Deploy pods during installation or post-installation, OR
2. Deploy an application with 'oc new-app', OR
3. Deploy a pod with 'oc create -f <pod.yaml>'

Actual results:
Scheduling fails even when ports and resources are available.

Expected results:
Scheduling should succeed without any error.

Additional info:
Upstream issue: https://github.com/kubernetes/kubernetes/issues/66568

Workaround:
Restart the control plane.
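On a 3.10/3.11 cluster, where the control-plane components run as static pods on the master hosts, the workaround can be sketched roughly as below. This is a hedged example, not the author's exact procedure: the `master-restart` helper is specific to the static-pod control plane, and on older service-based installs you would restart the master API/controllers services instead. Restarting the API server and controllers discards the scheduler's in-memory cache, which is where the stale host-port reservations live.

```shell
# Run on each master host (OpenShift 3.10/3.11 static-pod control plane).
# Restarting these components rebuilds the scheduler's cache of
# assumed pods and reserved host ports.
master-restart api
master-restart controllers

# Confirm the control-plane pods came back up:
oc get pods -n kube-system
```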

Comment 3 Avesh Agarwal 2018-12-11 20:03:48 UTC
https://github.com/openshift/origin/pull/21649

Comment 4 Anil Dhingra 2018-12-26 02:05:15 UTC
Tried with the latest v3.11.51; the problem is still the same, and even restarting the control plane didn't help.

  Type     Reason            Age              From                         Message
  ----     ------            ----             ----                         -------
  Warning  FailedScheduling  1m (x9 over 1m)  default-scheduler            0/5 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 3 node(s) didn't match node selector.
  Normal   Scheduled         1m               default-scheduler            Successfully assigned default/router-1-4kq8j to infra1.example.com

The pod should not reach the Running state if the requested ports are not available. It is confusing that the state changes to Running, and even 'docker ps' shows the container running. (The pod should be in a Failed state if it cannot get its ports.)
[root@master ~]#  oc get pods -o wide -n default | grep -i router
router-1-4kq8j             1/1       Running   0          21m       192.168.122.15   infra1.example.com   <none>
router-1-dnm7j             1/1       Running   3          15h       192.168.122.14   infra.example.com    <none>

[root@infra1 ~]# docker ps -a | grep -i route
208feb1113be        faa4d9ee67c4                                             "/usr/bin/openshif..."   22 minutes ago      Up 22 minutes                                   k8s_router_router-1-4kq8j_default_62a619da-08af-11e9-af02-525400f7e27c_0
7fc75b7510da        registry.access.redhat.com/openshift3/ose-pod:v3.11.51   "/usr/bin/pod"           22 minutes ago      Up 22 minutes                                   k8s_POD_router-1-4kq8j_default_62a619da-08af-11e9-af02-525400f7e27c_0
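To distinguish a genuine port conflict from the stale scheduler-cache state tracked in the upstream issue, it helps to check on the node itself whether the router's host ports are actually bound. A minimal sketch, assuming the default router host ports 80, 443, and 1936 and that `ss` from iproute2 is available on the node (the script name and port list are illustrative):

```shell
#!/bin/bash
# check-ports.sh: report whether the given TCP ports have a listener
# on this host. Exits 0 if all ports are free, 1 otherwise.
rc=0
for p in "$@"; do
  # -H: no header, -l: listening sockets, -t: TCP, -n: numeric output
  if ss -Hltn "sport = :$p" | grep -q .; then
    echo "port $p is in use"
    rc=1
  else
    echo "port $p is free"
  fi
done
exit "$rc"
```

Running `./check-ports.sh 80 443 1936` on the node named in the FailedScheduling event and finding all ports free would indicate the "didn't have free ports" message comes from stale scheduler state rather than a real conflict.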

Comment 10 errata-xmlrpc 2019-01-30 15:19:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0096

Comment 11 Sergio G. 2019-08-28 09:02:45 UTC
Was this backported to 3.10?