Bug 1339975
| Summary: | NoDiskConflicts is not enforced when pod is created with Persistent Volume Claim instead of rbd image | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jianwei Hou <jhou> |
| Component: | Documentation | Assignee: | Vikram Goyal <vigoyal> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Vikram Goyal <vigoyal> |
| Severity: | medium | Docs Contact: | Vikram Goyal <vigoyal> |
| Priority: | medium | | |
| Version: | 3.2.0 | CC: | aos-bugs, bchilds, eparis, jhou, jokerman, kalexand, lxia, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-04-24 14:58:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Jianwei Hou, 2016-05-26 09:29:40 UTC)
Do you know where the two Pods are running? If they are running on the same host, you have to set the NoDiskConflict predicate for the kube scheduler to ensure only one pod is using the RBD image.
I tested your pv/pvc/pod on my setup, and I see the lock is held once the Pod is created; the lock is deleted after the Pod is removed:
```
[root@host10-rack07 kubernetes]# kubectl create -f rbd-pod.json
pod "rbd" created
[root@host10-rack07 kubernetes]# rbd lock list foo -p kube
There is 1 exclusive lock on this image.
Locker ID Address
client.14130 kubelet_lock_magic_host10-rack07.scale.openstack.engineering.redhat.com 10.1.4.10:0/1005456
[root@host10-rack07 kubernetes]# kubectl describe pod
Name: rbd
Namespace: default
Node: 127.0.0.1/127.0.0.1
Start Time: Thu, 16 Jun 2016 16:45:46 +0000
Labels: name=rbd
Status: Running
IP: 172.17.0.2
Controllers: <none>
Containers:
rbd:
Container ID: docker://8eeb97ba498c5e14b846681ca97c2b6a1718a6dc68c4e72625178722bc16a94c
Image: aosqe/hello-openshift
Image ID: docker://sha256:caa46d03cf599cd2e98f40accd8256efa362e2212e70a903beb5b6380d2c461c
Port:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Running
Started: Thu, 16 Jun 2016 16:45:48 +0000
Ready: True
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
rbd:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pvc-rbd-server
ReadOnly: false
default-token-1015v:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-1015v
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
13s 13s 1 {default-scheduler } Normal Scheduled Successfully assigned rbd to 127.0.0.1
11s 11s 1 {kubelet 127.0.0.1} spec.containers{rbd} Normal Pulled Container image "aosqe/hello-openshift" already present on machine
11s 11s 1 {kubelet 127.0.0.1} spec.containers{rbd} Normal Created Created container with docker id 8eeb97ba498c
11s 11s 1 {kubelet 127.0.0.1} spec.containers{rbd} Normal Started Started container with docker id 8eeb97ba498c
[root@host10-rack07 kubernetes]# kubectl delete -f rbd-pod.json
pod "rbd" deleted
[root@host10-rack07 kubernetes]# rbd lock list foo -p kube
[root@host10-rack07 kubernetes]#
```
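The rbd-pod.json file itself is not attached to this bug. A minimal sketch consistent with the describe output above (the pod name, label, container name, image, and claim name are taken from that output; the mountPath is an assumption) would look roughly like:

```json
{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "rbd",
        "labels": {
            "name": "rbd"
        }
    },
    "spec": {
        "containers": [
            {
                "name": "rbd",
                "image": "aosqe/hello-openshift",
                "volumeMounts": [
                    {
                        "name": "rbd",
                        "mountPath": "/mnt/rbd"
                    }
                ]
            }
        ],
        "volumes": [
            {
                "name": "rbd",
                "persistentVolumeClaim": {
                    "claimName": "pvc-rbd-server"
                }
            }
        ]
    }
}
```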
My kubernetes version is the latest master code.
Tested on:
```
oc v3.3.0.9
kubernetes v1.3.0+57fb9ac
```
I have configured only one schedulable node, so both pods are running on the same host.
Pod 1
```
Name: rbd
Namespace: jhou
Node: host-8-172-115.host.centralci.eng.rdu2.redhat.com/172.16.120.39
Start Time: Tue, 26 Jul 2016 10:13:16 +0800
Labels: name=rbd
Status: Running
IP: 10.1.1.4
Controllers: <none>
Containers:
rbd:
Container ID: docker://e5c71fa8b8131b61d2cdb20746ea230a50ae1093dbc75c479b83934a1ca54753
Image: aosqe/hello-openshift
Image ID: docker://sha256:caa46d03cf599cd2e98f40accd8256efa362e2212e70a903beb5b6380d2c461c
Port:
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Running
Started: Tue, 26 Jul 2016 10:13:19 +0800
Ready: True
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
rbd:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pvc-rbd-server
ReadOnly: false
default-token-cvpbc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cvpbc
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
6m 6m 1 {default-scheduler } Normal Scheduled Successfully assigned rbd to host-8-172-115.host.centralci.eng.rdu2.redhat.com
6m 6m 1 {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com} spec.containers{rbd} Normal Pulled Container image "aosqe/hello-openshift" already present on machine
6m 6m 1 {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com} spec.containers{rbd} Normal Created Created container with docker id e5c71fa8b813
6m 6m 1 {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com} spec.containers{rbd} Normal Started Started container with docker id e5c71fa8b813
```
Pod 2
```
0 % oc describe pod rbd1
Name: rbd1
Namespace: jhou
Node: host-8-172-115.host.centralci.eng.rdu2.redhat.com/172.16.120.39
Start Time: Tue, 26 Jul 2016 10:14:37 +0800
Labels: name=rbd
Status: Running
IP: 10.1.1.6
Controllers: <none>
Containers:
rbd:
Container ID: docker://593ad1fd6480d752fdcf30958fa77b7e353969edc0087965222b77a519b7fc60
Image: aosqe/hello-openshift
Image ID: docker://sha256:caa46d03cf599cd2e98f40accd8256efa362e2212e70a903beb5b6380d2c461c
Port:
QoS Tier:
memory: BestEffort
cpu: BestEffort
State: Running
Started: Tue, 26 Jul 2016 10:14:39 +0800
Ready: True
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
rbd:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pvc-rbd-server
ReadOnly: false
default-token-cvpbc:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cvpbc
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
6m 6m 1 {default-scheduler } Normal Scheduled Successfully assigned rbd1 to host-8-172-115.host.centralci.eng.rdu2.redhat.com
6m 6m 1 {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com} spec.containers{rbd} Normal Pulled Container image "aosqe/hello-openshift" already present on machine
6m 6m 1 {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com} spec.containers{rbd} Normal Created Created container with docker id 593ad1fd6480
6m 6m 1 {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com} spec.containers{rbd} Normal Started Started container with docker id 593ad1fd6480
```
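Both pods bind the same claim, pvc-rbd-server. The PV/PVC definitions are likewise not attached to this bug; a minimal sketch consistent with the rbd lock output (image foo in pool kube) follows, where the PV name, capacity, access mode, monitor address, Ceph user, and secret name are all assumptions:

```json
{
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {
            "apiVersion": "v1",
            "kind": "PersistentVolume",
            "metadata": {
                "name": "pv-rbd-server"
            },
            "spec": {
                "capacity": {
                    "storage": "1Gi"
                },
                "accessModes": [
                    "ReadWriteOnce"
                ],
                "rbd": {
                    "monitors": [
                        "192.168.1.1:6789"
                    ],
                    "pool": "kube",
                    "image": "foo",
                    "user": "admin",
                    "secretRef": {
                        "name": "ceph-secret"
                    },
                    "fsType": "ext4",
                    "readOnly": false
                }
            }
        },
        {
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {
                "name": "pvc-rbd-server"
            },
            "spec": {
                "accessModes": [
                    "ReadWriteOnce"
                ],
                "resources": {
                    "requests": {
                        "storage": "1Gi"
                    }
                }
            }
        }
    ]
}
```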
On the OSE env, the 'NoDiskConflict' predicate is enabled by default:
```
[root@host-8-172-116 ~]# cat /etc/origin/master/scheduler.json | python -m json.tool
{
"apiVersion": "v1",
"kind": "Policy",
"predicates": [
{
"name": "MatchNodeSelector"
},
{
"name": "PodFitsResources"
},
{
"name": "PodFitsPorts"
},
{
"name": "NoDiskConflict"
},
{
"argument": {
"serviceAffinity": {
"labels": [
"region"
]
}
},
"name": "Region"
}
],
"priorities": [
{
"name": "LeastRequestedPriority",
"weight": 1
},
{
"name": "SelectorSpreadPriority",
"weight": 1
},
{
"argument": {
"serviceAntiAffinity": {
"label": "zone"
}
},
"name": "Zone",
"weight": 2
}
]
}
```
The lock is held when the pod is created:
```
[root@host-8-172-115 /]# rbd lock list foo
There is 1 exclusive lock on this image.
Locker ID Address
client.4143 kubelet_lock_magic_host-8-172-115.host.centralci.eng.rdu2.redhat.com 172.16.120.39:0/1032569
```
Fencing is used to prevent multiple nodes from accessing the same RBD image. Fencing does not help on a single node, though; there you have to set the NoDiskConflicts scheduler predicate: https://trello.com/c/rrSzOx20/39-13-support-nodiskconflicts-scheduler-predicates-for-rbd-origin

Sorry, I described it incorrectly. It is the NoDiskConflicts scheduler predicate that is not working when a pod references a PVC. In my env, I have 1 schedulable node. When I create 2 pods referencing the same rbd image in their pod.spec.volumes.rbd, the 2nd pod cannot be scheduled because NoDiskConflicts is enabled. When I create 2 pods referencing the same PVC (which references the same PV and thus the same rbd image), both pods are scheduled and created.

I don't think NoDiskConflicts works for PVCs of any volume type (EBS/PD/RBD/Cinder). I'll open a kubernetes issue.

kube issue filed: https://github.com/kubernetes/kubernetes/issues/29670

Kube doc update is here: https://github.com/kubernetes/kubernetes/pull/30817

Openshift doc PR is here: https://github.com/openshift/openshift-docs/pull/2672
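For contrast, the working case described above (where the 2nd pod cannot be scheduled) defines the rbd volume inline in the pod spec rather than through a claim. A minimal sketch of such a pod follows; the pod name rbd-inline is hypothetical, and the monitor address, Ceph user, secret name, and mountPath are assumed as in the earlier sketches:

```json
{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "rbd-inline",
        "labels": {
            "name": "rbd"
        }
    },
    "spec": {
        "containers": [
            {
                "name": "rbd",
                "image": "aosqe/hello-openshift",
                "volumeMounts": [
                    {
                        "name": "rbd",
                        "mountPath": "/mnt/rbd"
                    }
                ]
            }
        ],
        "volumes": [
            {
                "name": "rbd",
                "rbd": {
                    "monitors": [
                        "192.168.1.1:6789"
                    ],
                    "pool": "kube",
                    "image": "foo",
                    "user": "admin",
                    "secretRef": {
                        "name": "ceph-secret"
                    },
                    "fsType": "ext4",
                    "readOnly": false
                }
            }
        ]
    }
}
```

With two such pods, the scheduler's NoDiskConflict predicate can see the identical pool/image pair in both pod specs and refuses to co-schedule them on one node; with the persistentVolumeClaim form it only saw the claim reference at the time of this bug, which is why both PVC-backed pods landed on the same node.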