Bug 1261448

Summary: The "PodFitsResources" predicate rule will only be applied to the 1st pod created in the project
Product: OKD
Component: Pod
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 3.x
Keywords: Regression
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Reporter: Meng Bo <bmeng>
Assignee: Abhishek Gupta <abhgupta>
QA Contact: Meng Bo <bmeng>
CC: aos-bugs, mmccomas, pweil
Type: Bug
Last Closed: 2015-11-23 21:15:45 UTC
Bug Blocks: 1261712 (view as bug list)

Description Meng Bo 2015-09-09 10:55:07 UTC
Description of problem:
Configure the scheduler to use the PodFitsResources predicate rule, then create pods whose resource requests do not fit on any node. The 1st pod fails scheduling as expected due to the predicate rule, but subsequent pods with the same resource request are created successfully.

The issue can be reproduced on both the latest Origin and the latest OSE builds.

Version-Release number of selected component (if applicable):
Origin:
openshift v1.0.5-301-ge61cc31
kubernetes v1.1.0-alpha.0-1605-g44c91b1

OSE:
openshift v3.0.1.900-185-g2f7757a
kubernetes v1.1.0-alpha.0-1605-g44c91b1


How reproducible:
always

Steps to Reproduce:
1. Set the scheduler.json as:
{
        "kind" : "Policy",
        "version" : "v1",
        "predicates" : [
                {"name" : "PodFitsPorts"},
                {"name" : "PodFitsResources"},
                {"name" : "NoDiskConflict"},
                {"name" : "MatchNodeSelector"},
                {"name" : "HostName"}
        ],
        "priorities" : [
                {"name" : "LeastRequestedPriority", "weight" : 1},
                {"name" : "BalancedResourceAllocation", "weight" : 1},
                {"name" : "ServiceSpreadingPriority", "weight" : 1}
        ]
}
and restart the master so the scheduler reloads the policy.
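
For reference, the scheduler policy file is referenced from the master configuration. The field name, file path, and service name below are assumptions for a default OSE 3.0.x install and may differ on Origin:

# grep schedulerConfigFile /etc/openshift/master/master-config.yaml
# systemctl restart openshift-master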

2. Check the node resource capability
# oc get node -o yaml | grep -e cpu -e memory
      cpu: "1"
      memory: 1884532Ki
      cpu: "2"
      memory: 1884420Ki
      cpu: "2"
      memory: 1884420Ki
      cpu: "1"
      memory: 1533300Ki

3. Create a pod with the following definition
{
  "apiVersion":"v1",
  "kind": "Pod",
  "metadata": {
    "name": "resource-pod",
    "labels": {
      "name": "resource-pod"
    }
  },
  "spec": {
    "containers": [{
      "name": "resource-pod",
      "image": "openshift/hello-openshift",
      "resources":{
        "limits":{
          "cpu":1,
          "memory":2000000000      # 2GiB in total
        }
      }
    }]
  }
}

4. Create another pod from the same JSON, changing only the name (see the commands below)
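
Assuming the pod definition above is saved to a file (the file names here are hypothetical), both pods can be created with standard oc commands:

# oc create -f resource-pod.json
# oc create -f resource-1-pod.json

where resource-1-pod.json is a copy of the first definition with only metadata.name changed.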



Actual results:
The 1st pod fails the PodFitsResources predicate check, but all subsequent pods are scheduled onto nodes and run normally.

Expected results:
Every pod created should be validated against the scheduler predicates, not just the first one.
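
To compare the two outcomes, the pod phases and the scheduling events can be inspected with standard oc commands (the namespace is taken from the log below; the exact event text may vary):

# oc get pods -n u1p2
# oc describe pod resource-pod -n u1p2
# oc get events -n u1p2

If the predicate were applied to every pod, each of these pods would stay Pending with a failed-scheduling event instead of reaching Running.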



Additional info:
The master log shows that predicates.go is not invoked from the 2nd pod creation onward:

Sep 09 18:31:43 master.bmeng.local openshift[9078]: I0909 18:31:43.670308    9078 factory.go:216] About to try and schedule pod resource-pod
Sep 09 18:31:43 master.bmeng.local openshift[9078]: I0909 18:31:43.670328    9078 scheduler.go:119] Attempting to schedule: &{{ } {resource-pod  u1p2 /api/v1/namespac
es/u1p2/pods/resource-pod f7038900-56dd-11e5-b517-5254009c1bd6 11649 0 2015-09-09 18:31:43 +0800 CST <nil> map[name:resource-pod] map[openshift.io/scc:restricted]} {[
{default-token-9quuy {<nil> <nil> <nil> <nil> <nil> 0xc209c1e040 <nil> <nil> <nil> <nil> <nil> <nil> <nil>}}] [{resource-pod bmeng/hello-openshift [] []  [] [] {map[c
pu:{1.000 DecimalSI} memory:{800000000.000 DecimalSI}] map[cpu:{1.000 DecimalSI} memory:{800000000.000 DecimalSI}]} [{default-token-9quuy true /var/run/secrets/kubern
etes.io/serviceaccount}] <nil> <nil> <nil> /dev/termination-log IfNotPresent 0xc210148360 false false}] Always <nil> <nil> ClusterFirst map[] default  false [{default
-dockercfg-px49k}]} {Pending []     <nil> []}}
Sep 09 18:31:43 master.bmeng.local openshift[9078]: I0909 18:31:43.670481    9078 predicates.go:176] Schedule Pod &{{ } {resource-pod  u1p2 /api/v1/namespaces/u1p2/po
ds/resource-pod f7038900-56dd-11e5-b517-5254009c1bd6 11649 0 2015-09-09 18:31:43 +0800 CST <nil> map[name:resource-pod] map[openshift.io/scc:restricted]} {[{default-t
oken-9quuy {<nil> <nil> <nil> <nil> <nil> 0xc209c1e040 <nil> <nil> <nil> <nil> <nil> <nil> <nil>}}] [{resource-pod bmeng/hello-openshift [] []  [] [] {map[cpu:{1.000 
DecimalSI} memory:{800000000.000 DecimalSI}] map[memory:{800000000.000 DecimalSI} cpu:{1.000 DecimalSI}]} [{default-token-9quuy true /var/run/secrets/kubernetes.io/se
rviceaccount}] <nil> <nil> <nil> /dev/termination-log IfNotPresent 0xc210148360 false false}] Always <nil> <nil> ClusterFirst map[] default  false [{default-dockercfg
-px49k}]} {Pending []     <nil> []}} on Node node1.bmeng.local is allowed, Node is running only 1 out of 40 Pods.
Sep 09 18:31:43 master.bmeng.local openshift[9078]: I0909 18:31:43.670594    9078 predicates.go:176] Schedule Pod &{{ } {resource-pod  u1p2 /api/v1/namespaces/u1p2/po
ds/resource-pod f7038900-56dd-11e5-b517-5254009c1bd6 11649 0 2015-09-09 18:31:43 +0800 CST <nil> map[name:resource-pod] map[openshift.io/scc:restricted]} {[{default-t
oken-9quuy {<nil> <nil> <nil> <nil> <nil> 0xc209c1e040 <nil> <nil> <nil> <nil> <nil> <nil> <nil>}}] [{resource-pod bmeng/hello-openshift [] []  [] [] {map[cpu:{1.000 
DecimalSI} memory:{800000000.000 DecimalSI}] map[cpu:{1.000 DecimalSI} memory:{800000000.000 DecimalSI}]} [{default-token-9quuy true /var/run/secrets/kubernetes.io/se
rviceaccount}] <nil> <nil> <nil> /dev/termination-log IfNotPresent 0xc210148360 false false}] Always <nil> <nil> ClusterFirst map[] default  false [{default-dockercfg
-px49k}]} {Pending []     <nil> []}} on Node node2.bmeng.local is allowed, Node is running only 2 out of 40 Pods.
Sep 09 18:31:43 master.bmeng.local openshift[9078]: I0909 18:31:43.670686    9078 predicates.go:176] Schedule Pod &{{ } {resource-pod  u1p2 /api/v1/namespaces/u1p2/po
ds/resource-pod f7038900-56dd-11e5-b517-5254009c1bd6 11649 0 2015-09-09 18:31:43 +0800 CST <nil> map[name:resource-pod] map[openshift.io/scc:restricted]} {[{default-t
oken-9quuy {<nil> <nil> <nil> <nil> <nil> 0xc209c1e040 <nil> <nil> <nil> <nil> <nil> <nil> <nil>}}] [{resource-pod bmeng/hello-openshift [] []  [] [] {map[cpu:{1.000 
DecimalSI} memory:{800000000.000 DecimalSI}] map[cpu:{1.000 DecimalSI} memory:{800000000.000 DecimalSI}]} [{default-token-9quuy true /var/run/secrets/kubernetes.io/se
rviceaccount}] <nil> <nil> <nil> /dev/termination-log IfNotPresent 0xc210148360 false false}] Always <nil> <nil> ClusterFirst map[] default  false [{default-dockercfg
-px49k}]} {Pending []     <nil> []}} on Node node3.bmeng.local is allowed, Node is running only 1 out of 40 Pods.






Sep 09 18:31:53 master.bmeng.local openshift[9078]: I0909 18:31:53.046350    9078 factory.go:216] About to try and schedule pod resource-1-pod
Sep 09 18:31:53 master.bmeng.local openshift[9078]: I0909 18:31:53.046367    9078 scheduler.go:119] Attempting to schedule: &{{ } {resource-1-pod  u1p2 /api/v1/namesp
aces/u1p2/pods/resource-1-pod fc9a4114-56dd-11e5-b517-5254009c1bd6 11663 0 2015-09-09 18:31:53 +0800 CST <nil> map[name:resource-1-pod] map[openshift.io/scc:restricte
d]} {[{default-token-9quuy {<nil> <nil> <nil> <nil> <nil> 0xc20fd6e420 <nil> <nil> <nil> <nil> <nil> <nil> <nil>}}] [{resource-1-pod bmeng/hello-openshift [] []  [] [
] {map[] map[]} [{default-token-9quuy true /var/run/secrets/kubernetes.io/serviceaccount}] <nil> <nil> <nil> /dev/termination-log IfNotPresent 0xc20fd774a0 false false}] Always <nil> <nil> ClusterFirst map[] default  false [{default-dockercfg-px49k}]} {Pending []     <nil> []}}

Comment 1 Meng Bo 2015-09-10 02:50:11 UTC
Note:
The master log above was not captured while creating a pod that does not fit the node resources; it is only meant to show that predicates.go is not invoked for the following pods even in the normal situation.

Comment 2 Paul Weil 2015-09-10 15:44:23 UTC
Please see my comment on https://bugzilla.redhat.com/show_bug.cgi?id=1261712#c2

Comment 3 Abhishek Gupta 2015-09-10 23:40:20 UTC
The clone of this bug was marked ON_QA and lowered in severity.

Comment 4 Meng Bo 2015-09-11 06:49:33 UTC
Hi Paul and Abhishek,

I can no longer reproduce this issue on either the latest Origin or the current OSE environment.
I am not sure what went wrong when I reported this bug. I will close it and re-open it if I encounter the problem again.

Thanks.