Bug 1327520

Summary: Pod should not disappear unexpectedly after being created with a nonexistent nodeName
Product: OpenShift Container Platform
Component: Node
Version: 3.2.0
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Reporter: Weihua Meng <wmeng>
Assignee: Andy Goldstein <agoldste>
QA Contact: DeShuai Ma <dma>
CC: aos-bugs, decarr, jokerman, mmccomas
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-04-15 20:49:13 UTC

Description Weihua Meng 2016-04-15 09:52:58 UTC
Description of problem:
Pod disappeared unexpectedly after being created.

Version-Release number of selected component (if applicable):
openshift v3.2.0.15
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

How reproducible:
Always

Steps to Reproduce:
1. Create a pod JSON file hello-openshift-nodename.json with a nonexistent node name set in spec.nodeName.
2. oc create -f hello-openshift-nodename.json ; oc get pod ; oc describe pod ; oc get pod ; oc get pod


Actual results:
The pod is created, then disappears shortly afterward.

[root@dhcp-128-70 test]# oc create -f hello-openshift-nodename.json ; oc get pod ; oc describe pod ; oc get pod ; oc get pod
pod "hello-openshift" created
NAME              READY     STATUS    RESTARTS   AGE
hello-openshift   0/1       Pending   0          <invalid>
Name:		hello-openshift
Namespace:	wmeng01
Node:		name-not-exist/
Labels:		name=hello-openshift
Status:		Pending
IP:		
Controllers:	<none>
Containers:
  hello-openshift:
    Image:	deshuai/hello-openshift:root
    Port:	8080/TCP
    QoS Tier:
      cpu:	BestEffort
      memory:	BestEffort
    Environment Variables:
Volumes:
  tmp:
    Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:	
  default-token-icbwq:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-icbwq
No events.

[root@dhcp-128-70 test]#

Expected results:
The pod should NOT disappear after being created.

Additional info:
cat hello-openshift-nodename.json 
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "hello-openshift",
    "creationTimestamp": null,
    "labels": {
      "name": "hello-openshift"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "hello-openshift",
        "image": "deshuai/hello-openshift:root",
        "ports": [
          {
            "containerPort": 8080,
            "protocol": "TCP"
          }
        ],
        "resources": {},
        "volumeMounts": [
          {
            "name":"tmp",
            "mountPath":"/tmp"
          }
        ],
        "terminationMessagePath": "/dev/termination-log",
        "imagePullPolicy": "IfNotPresent",
        "capabilities": {},
        "securityContext": {
          "capabilities": {},
          "privileged": false
        }
      }
    ],
    "volumes": [
      {
        "name":"tmp",
        "emptyDir": {}
      }
    ],
    "restartPolicy": "Always",
    "nodeName": "name-not-exist",
    "dnsPolicy": "ClusterFirst",
    "serviceAccount": ""
  },
  "status": {}
}

Comment 1 Derek Carr 2016-04-15 20:49:13 UTC
This is working as designed in Kubernetes.

If the node controller observes a pod that is scheduled to a non-existent node, it will evict the pod from that node and delete it. It does not matter whether the pod was scheduled to a node that previously existed or one that never existed; the logic is the same.

From the user's standpoint, the current behavior is desirable, especially if the pod is backed by a replication controller. The replication controller would otherwise believe it has satisfied the user's intent, while the pod can never actually run because its bound node does not exist. As a result, it makes sense to evict those pods from the cluster and force the replication controller to observe the loss and create a replacement.
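The eviction decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual Kubernetes node-controller code; the names `reconcile_pods`, `known_nodes`, and the pod dicts are invented for this sketch.

```python
# Sketch of the node controller's rule: a pod bound to a node the
# controller does not currently observe is deleted, regardless of
# whether that node ever existed. Illustrative only -- not the real
# Kubernetes implementation.

def reconcile_pods(pods, known_nodes):
    """Return the pods that survive reconciliation.

    pods: list of dicts with a "name" and an optional "nodeName".
    known_nodes: set of node names the controller currently observes.
    """
    surviving = []
    for pod in pods:
        node = pod.get("nodeName")
        # Unscheduled pods are left for the scheduler to handle.
        if node is None or node in known_nodes:
            surviving.append(pod)
        # Otherwise the pod is bound to a non-existent node: evict
        # (delete) it so a replication controller can recreate it.
    return surviving

pods = [
    {"name": "hello-openshift", "nodeName": "name-not-exist"},
    {"name": "web", "nodeName": "node1"},
    {"name": "pending-pod"},  # not yet scheduled
]
print([p["name"] for p in reconcile_pods(pods, {"node1"})])
# → ['web', 'pending-pod']
```

In this sketch, `hello-openshift` is dropped exactly as in the report: its `nodeName` of `name-not-exist` matches no known node, so the controller deletes it rather than leaving it Pending forever.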