Bug 1324888

Summary: after scaling a replicationController, pods get scheduled to nodes with SchedulingDisabled
Product: OpenShift Container Platform
Reporter: Christoph Görn <cgoern>
Component: Node
Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED NOTABUG
QA Contact: DeShuai Ma <dma>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.1.0
CC: agoldste, aos-bugs, jokerman, mmccomas
Target Milestone: ---
Keywords: UpcomingRelease
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-12 06:21:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: ---
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Christoph Görn 2016-04-07 13:50:09 UTC
Description of problem:
After a node is marked as unschedulable and its pods have been evacuated, scaling a replicationController (up or down) results in pods getting scheduled on the node that is unschedulable.
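
For context, the behavior expected here is that the scheduler filters out any node whose spec marks it unschedulable (the `SchedulingDisabled` condition shown by `oc get nodes`). A minimal Python sketch of that filtering rule, using a hypothetical data model rather than the actual Kubernetes scheduler code:

```python
# Sketch of the scheduler's "node unschedulable" filter.
# The node dicts below are illustrative; the real predicate operates on
# Node objects and checks node.spec.unschedulable.

def schedulable_nodes(nodes):
    """Return nodes eligible for new pods: Ready and not cordoned."""
    return [n for n in nodes
            if n.get("ready") and not n.get("unschedulable")]

nodes = [
    {"name": "test-node-primary-0.example.com", "ready": True, "unschedulable": False},
    {"name": "test-node-primary-1.example.com", "ready": True, "unschedulable": False},
    # Cordoned via `oadm manage-node ... --schedulable=false`:
    {"name": "test-node-primary-2.example.com", "ready": True, "unschedulable": True},
]

# test-node-primary-2 should never appear in the eligible list
print([n["name"] for n in schedulable_nodes(nodes)])
```

If this filter were applied on every scheduling pass, a scale-up of the replicationController should never place a pod on test-node-primary-2.example.com.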

Version-Release number of selected component (if applicable):
atomic-openshift-3.1.0.4-1.git.15.5e061c3.el7aos.x86_64

How reproducible:
Evacuate pods from a node, then scale a replicationController.

Steps to Reproduce:
[root@test-master-0 ~]# oc -n testing-2 get pods -o wide 
NAME                             READY     STATUS    RESTARTS   AGE       NODE
mongodb-1-0zxcg                  1/1       Running   0          1m        test-node-primary-0.example.com
mongodb-1-60t65                  1/1       Running   0          46s       test-node-primary-2.example.com
mongodb-1-ru4ru                  1/1       Running   0          1m        test-node-primary-2.example.com
mongodb-1-wk558                  1/1       Running   0          17h       test-node-primary-1.example.com
nodejs-mongodb-example-6-6mmze   1/1       Running   2          20h       test-node-primary-1.example.com
nodejs-mongodb-example-6-e6j0w   1/1       Running   0          3m        test-node-primary-0.example.com

[root@test-master-0 ~]# oadm manage-node test-node-primary-2.example.com --schedulable=false
NAME                                             LABELS                                                                                              STATUS                     AGE
test-node-primary-2.example.com   kubernetes.io/hostname=test-node-primary-2.example.com,region=primary,zone=default   Ready,SchedulingDisabled   16d

[root@test-master-0 ~]# oadm manage-node test-node-primary-2.example.com --evacuate 

Migrating these pods on node: test-node-primary-2.example.com

NAME              READY     STATUS    RESTARTS   AGE
mongodb-1-60t65   1/1       Running   0          1m
mongodb-1-ru4ru   1/1       Running   0         2m

[root@test-master-0 ~]# oc -n testing-2 get pods -o wide 
NAME                             READY     STATUS    RESTARTS   AGE       NODE
mongodb-1-0zxcg                  1/1       Running   0          2m        test-node-primary-0.example.com
mongodb-1-g5o4r                  1/1       Running   0          22s       test-node-primary-1.example.com
mongodb-1-rxrdb                  1/1       Running   0          22s       test-node-primary-0.example.com
mongodb-1-wk558                  1/1       Running   0          17h       test-node-primary-1.example.com
nodejs-mongodb-example-6-6mmze   1/1       Running   2          20h       test-node-primary-1.example.com
nodejs-mongodb-example-6-e6j0w   1/1       Running   0          4m        test-node-primary-0.example.com
[root@test-master-0 ~]# oc -n testing-2 scale --replicas=3 rc mongodb-1
replicationcontroller "mongodb-1" scaled

[root@test-master-0 ~]# oc -n testing-2 get pods -o wide 
NAME                             READY     STATUS    RESTARTS   AGE       NODE
mongodb-1-rxrdb                  1/1       Running   0          1m        test-node-primary-0.example.com
mongodb-1-v4l7b                  1/1       Running   0          26s       test-node-primary-2.example.com
mongodb-1-wk558                  1/1       Running   0          17h       test-node-primary-1.example.com
nodejs-mongodb-example-6-6mmze   1/1       Running   2          20h       test-node-primary-1.example.com
nodejs-mongodb-example-6-e6j0w   1/1       Running   0          4m        test-node-primary-0.example.com

[root@test-master-0 ~]# oc get nodes
NAME                                             LABELS                                                                                              STATUS                     AGE
test-master-0.example.com         kubernetes.io/hostname=test-master-0.example.com,region=master,zone=default          Ready,SchedulingDisabled   16d
test-node-infra-0.example.com     kubernetes.io/hostname=test-node-infra-0.example.com,region=infra,zone=default       Ready                      16d
test-node-infra-1.example.com     kubernetes.io/hostname=test-node-infra-1.example.com,region=infra,zone=default       Ready                      16d
test-node-primary-0.example.com   kubernetes.io/hostname=test-node-primary-0.example.com,region=primary,zone=default   Ready                      16d
test-node-primary-1.example.com   kubernetes.io/hostname=test-node-primary-1.example.com,region=primary,zone=default   Ready                      16d
test-node-primary-2.example.com   kubernetes.io/hostname=test-node-primary-2.example.com,region=primary,zone=default   Ready,SchedulingDisabled   16d


Actual results:
Pods get scheduled on test-node-primary-2.example.com.

Expected results:
No pod is scheduled on test-node-primary-2.example.com.
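
One way to check this mechanically is to cross-reference the `oc get pods -o wide` output against the cordoned node. A small sketch (the node name and sample output are taken from this report; the parsing assumes NODE is the last column, as in the transcripts above):

```python
# Flag pods that landed on a cordoned node, given `oc get pods -o wide` output.
CORDONED = {"test-node-primary-2.example.com"}

def pods_on_cordoned(oc_output, cordoned=CORDONED):
    """Return names of pods whose NODE column is a cordoned node."""
    misplaced = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, node = fields[0], fields[-1]           # NODE is the last column
        if node in cordoned:
            misplaced.append(name)
    return misplaced

sample = """\
NAME                             READY     STATUS    RESTARTS   AGE       NODE
mongodb-1-rxrdb                  1/1       Running   0          1m        test-node-primary-0.example.com
mongodb-1-v4l7b                  1/1       Running   0          26s       test-node-primary-2.example.com
mongodb-1-wk558                  1/1       Running   0          17h       test-node-primary-1.example.com
"""

print(pods_on_cordoned(sample))  # expect only mongodb-1-v4l7b
```

Run against the post-scale output above, this flags mongodb-1-v4l7b, which is exactly the misplaced pod described in this report.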

Additional info:

[root@test-master-0 ~]# oc -n testing-2 describe pod mongodb-1-v4l7b
Name:				mongodb-1-v4l7b
Namespace:			testing-2
Image(s):			registry.access.redhat.com/rhscl/mongodb-26-rhel7:latest
Node:				test-node-primary-2.example.com/10.19.0.249
Start Time:			Thu, 07 Apr 2016 07:56:06 +0000
Labels:				deployment=mongodb-1,deploymentconfig=mongodb,name=mongodb
Status:				Running
Reason:				
Message:			
IP:				10.1.5.70
Replication Controllers:	mongodb-1 (3/3 replicas created)
Containers:
  mongodb:
    Container ID:	docker://65299d2d4e6362dc7c8ff09c6b9fa02a7b95809f798bd76b4e8bd92061bd20f8
    Image:		registry.access.redhat.com/rhscl/mongodb-26-rhel7:latest
    Image ID:		docker://19c92ed464ccfaa085af8ed8cca18edfa242c337c9fcab6c9c7dd8b5cb2b9c3c
    QoS Tier:
      cpu:	BestEffort
      memory:	Guaranteed
    Limits:
      memory:	512Mi
    Requests:
      memory:		512Mi
    State:		Running
      Started:		Thu, 07 Apr 2016 07:56:09 +0000
    Ready:		True
    Restart Count:	0
    Environment Variables:
      MONGODB_USER:		userLUV
      MONGODB_PASSWORD:		SgNJjwXpFoaO2Cm2
      MONGODB_DATABASE:		sampledb
      MONGODB_ADMIN_PASSWORD:	YgfBUE2gH4CpNKNc
Conditions:
  Type		Status
  Ready 	True 
Volumes:
  default-token-uu427:
    Type:	Secret (a secret that should populate this volume)
    SecretName:	default-token-uu427
Events:
  FirstSeen	LastSeen	Count	From								SubobjectPath				Reason			Message
  ─────────	────────	─────	────								─────────────				──────			───────
  10m		10m		1	{scheduler }												Scheduled		Successfully assigned mongodb-1-v4l7b to test-node-primary-2.example.com
  7m		7m		1	{scheduler }												FailedScheduling	Failed for reason Region and possibly others
  7m		7m		1	{kubelet test-node-primary-2.example.com}	implicitly required container POD	Pulled			Container image "openshift3/ose-pod:v3.1.0.4" already present on machine
  7m		7m		1	{kubelet test-node-primary-2.example.com}	implicitly required container POD	Started			Started with docker id 1ab9cda791de
  7m		7m		1	{kubelet test-node-primary-2.example.com}	implicitly required container POD	Created			Created with docker id 1ab9cda791de
  6m		6m		1	{kubelet test-node-primary-2.example.com}	spec.containers{mongodb}		Pulled			Container image "registry.access.redhat.com/rhscl/mongodb-26-rhel7:latest" already present on machine
  6m		6m		1	{kubelet test-node-primary-2.example.com}	spec.containers{mongodb}		Created			Created with docker id 65299d2d4e63
  6m		6m		1	{kubelet test-node-primary-2.example.com}	spec.containers{mongodb}		Started			Started with docker id 65299d2d4e63

Comment 1 Andy Goldstein 2016-04-08 18:45:18 UTC
I haven't been able to reproduce this.

Comment 2 Christoph Görn 2016-04-12 06:21:12 UTC
As the environment has died, I can't replicate this. I will reopen if I manage to reproduce it on a new environment. Thanks for the help!

Comment 3 Jan Chaloupka 2016-04-12 17:20:56 UTC
Tried to reproduce the issue on origin v1.1.6 and ose v3.1.1.6, with the same result: not able to reproduce it.