Description of problem:
A pod can use an RBD image either by specifying it inline in the pod.spec.volumes.rbd section or by referencing a Persistent Volume Claim. When two pods are created with the same inline RBD image, the NoDiskConflict predicate prevents the second one from scheduling. But if two pods are created with the same PVC, which is bound to the same PV (and therefore the same RBD image), both pods are created, so the RBD image is not protected from concurrent use. See the manifest sketch after this report for the two variants.

Version-Release number of selected component (if applicable):
openshift v3.2.0.44
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

How reproducible:
Always

Steps to Reproduce:
1. Set up a Ceph RBD server.
2. Create the PV, PVC and RBD secret:
   oc create -f https://raw.githubusercontent.com/openshift-qe/docker-rbd/master/pv-rwo.json
   oc create -f https://raw.githubusercontent.com/openshift-qe/docker-rbd/master/pvc-rwo.json
   oc create -f https://raw.githubusercontent.com/openshift-qe/docker-rbd/master/rbd-secret.yaml
3. Create two pods with different names, using the same template:
   https://raw.githubusercontent.com/openshift-qe/docker-rbd/master/pod.json
4. oc get pods

Actual results:
Both RBD pods are running.

Expected results:
We may want to support RBD fencing for pods created using a PVC as well.

Additional info:
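For reference, here is a minimal sketch of the two volume definitions being compared. It reuses the image "foo" in pool "kube" and the claim "pvc-rbd-server" that appear in the outputs later in this thread; the monitor address and secret name are illustrative placeholders, not values from the templates above:

```
# Variant 1: inline RBD volume in pod.spec.volumes.
# NoDiskConflict compares monitors/pool/image across pods on a node and
# refuses to co-schedule a second pod declaring the same image.
volumes:
- name: rbd
  rbd:
    monitors:
    - "192.168.0.1:6789"    # placeholder monitor address
    pool: kube
    image: foo
    secretRef:
      name: rbd-secret      # placeholder secret name

# Variant 2: the same image reached through a claim.
# The scheduler only sees the claim name and does not resolve it to the
# underlying PV (and RBD image), so no conflict is detected.
volumes:
- name: rbd
  persistentVolumeClaim:
    claimName: pvc-rbd-server
```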
Do you know where the two pods are running? If they are running on the same host, you have to enable the NoDiskConflict predicate for the kube scheduler to ensure only one of them is using the RBD image.

I tested your PV/PVC/pod on my setup, and I see the lock is held once the pod is created and released after the pod is removed:

```
[root@host10-rack07 kubernetes]# kubectl create -f rbd-pod.json
pod "rbd" created
[root@host10-rack07 kubernetes]# rbd lock list foo -p kube
There is 1 exclusive lock on this image.
Locker        ID                                                                        Address
client.14130  kubelet_lock_magic_host10-rack07.scale.openstack.engineering.redhat.com  10.1.4.10:0/1005456
[root@host10-rack07 kubernetes]# kubectl describe pod
Name:           rbd
Namespace:      default
Node:           127.0.0.1/127.0.0.1
Start Time:     Thu, 16 Jun 2016 16:45:46 +0000
Labels:         name=rbd
Status:         Running
IP:             172.17.0.2
Controllers:    <none>
Containers:
  rbd:
    Container ID:       docker://8eeb97ba498c5e14b846681ca97c2b6a1718a6dc68c4e72625178722bc16a94c
    Image:              aosqe/hello-openshift
    Image ID:           docker://sha256:caa46d03cf599cd2e98f40accd8256efa362e2212e70a903beb5b6380d2c461c
    Port:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Running
      Started:          Thu, 16 Jun 2016 16:45:48 +0000
    Ready:              True
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  rbd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-rbd-server
    ReadOnly:   false
  default-token-1015v:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-1015v
Events:
  FirstSeen  LastSeen  Count  From                 SubobjectPath         Type    Reason     Message
  ---------  --------  -----  ----                 -------------         ------  ------     -------
  13s        13s       1      {default-scheduler }                       Normal  Scheduled  Successfully assigned rbd to 127.0.0.1
  11s        11s       1      {kubelet 127.0.0.1}  spec.containers{rbd}  Normal  Pulled     Container image "aosqe/hello-openshift" already present on machine
  11s        11s       1      {kubelet 127.0.0.1}  spec.containers{rbd}  Normal  Created    Created container with docker id 8eeb97ba498c
  11s        11s       1      {kubelet 127.0.0.1}  spec.containers{rbd}  Normal  Started    Started container with docker id 8eeb97ba498c
[root@host10-rack07 kubernetes]# kubectl delete -f rbd-pod.json
pod "rbd" deleted
[root@host10-rack07 kubernetes]# rbd lock list foo -p kube
[root@host10-rack07 kubernetes]#
```
My Kubernetes version is the latest master code.
Tested on:
oc v3.3.0.9
kubernetes v1.3.0+57fb9ac

I have configured only one schedulable node, so both pods are running on the same host.

Pod 1
```
Name:           rbd
Namespace:      jhou
Node:           host-8-172-115.host.centralci.eng.rdu2.redhat.com/172.16.120.39
Start Time:     Tue, 26 Jul 2016 10:13:16 +0800
Labels:         name=rbd
Status:         Running
IP:             10.1.1.4
Controllers:    <none>
Containers:
  rbd:
    Container ID:       docker://e5c71fa8b8131b61d2cdb20746ea230a50ae1093dbc75c479b83934a1ca54753
    Image:              aosqe/hello-openshift
    Image ID:           docker://sha256:caa46d03cf599cd2e98f40accd8256efa362e2212e70a903beb5b6380d2c461c
    Port:
    QoS Tier:
      cpu:              BestEffort
      memory:           BestEffort
    State:              Running
      Started:          Tue, 26 Jul 2016 10:13:19 +0800
    Ready:              True
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  rbd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-rbd-server
    ReadOnly:   false
  default-token-cvpbc:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-cvpbc
Events:
  FirstSeen  LastSeen  Count  From                                                          SubobjectPath         Type    Reason     Message
  ---------  --------  -----  ----                                                          -------------         ------  ------     -------
  6m         6m        1      {default-scheduler }                                                                Normal  Scheduled  Successfully assigned rbd to host-8-172-115.host.centralci.eng.rdu2.redhat.com
  6m         6m        1      {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com}  spec.containers{rbd}  Normal  Pulled     Container image "aosqe/hello-openshift" already present on machine
  6m         6m        1      {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com}  spec.containers{rbd}  Normal  Created    Created container with docker id e5c71fa8b813
  6m         6m        1      {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com}  spec.containers{rbd}  Normal  Started    Started container with docker id e5c71fa8b813
```

Pod 2
```
0 % oc describe pod rbd1
Name:           rbd1
Namespace:      jhou
Node:           host-8-172-115.host.centralci.eng.rdu2.redhat.com/172.16.120.39
Start Time:     Tue, 26 Jul 2016 10:14:37 +0800
Labels:         name=rbd
Status:         Running
IP:             10.1.1.6
Controllers:    <none>
Containers:
  rbd:
    Container ID:       docker://593ad1fd6480d752fdcf30958fa77b7e353969edc0087965222b77a519b7fc60
    Image:              aosqe/hello-openshift
    Image ID:           docker://sha256:caa46d03cf599cd2e98f40accd8256efa362e2212e70a903beb5b6380d2c461c
    Port:
    QoS Tier:
      memory:           BestEffort
      cpu:              BestEffort
    State:              Running
      Started:          Tue, 26 Jul 2016 10:14:39 +0800
    Ready:              True
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  rbd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-rbd-server
    ReadOnly:   false
  default-token-cvpbc:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-cvpbc
Events:
  FirstSeen  LastSeen  Count  From                                                          SubobjectPath         Type    Reason     Message
  ---------  --------  -----  ----                                                          -------------         ------  ------     -------
  6m         6m        1      {default-scheduler }                                                                Normal  Scheduled  Successfully assigned rbd1 to host-8-172-115.host.centralci.eng.rdu2.redhat.com
  6m         6m        1      {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com}  spec.containers{rbd}  Normal  Pulled     Container image "aosqe/hello-openshift" already present on machine
  6m         6m        1      {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com}  spec.containers{rbd}  Normal  Created    Created container with docker id 593ad1fd6480
  6m         6m        1      {kubelet host-8-172-115.host.centralci.eng.rdu2.redhat.com}  spec.containers{rbd}  Normal  Started    Started container with docker id 593ad1fd6480
```

On the OSE env, the NoDiskConflict predicate is enabled by default.
```
[root@host-8-172-116 ~]# cat /etc/origin/master/scheduler.json | python -m json.tool
{
    "apiVersion": "v1",
    "kind": "Policy",
    "predicates": [
        {
            "name": "MatchNodeSelector"
        },
        {
            "name": "PodFitsResources"
        },
        {
            "name": "PodFitsPorts"
        },
        {
            "name": "NoDiskConflict"
        },
        {
            "argument": {
                "serviceAffinity": {
                    "labels": [
                        "region"
                    ]
                }
            },
            "name": "Region"
        }
    ],
    "priorities": [
        {
            "name": "LeastRequestedPriority",
            "weight": 1
        },
        {
            "name": "SelectorSpreadPriority",
            "weight": 1
        },
        {
            "argument": {
                "serviceAntiAffinity": {
                    "label": "zone"
                }
            },
            "name": "Zone",
            "weight": 2
        }
    ]
}
```

The lock is held when the pod is created:

```
[root@host-8-172-115 /]# rbd lock list foo
There is 1 exclusive lock on this image.
Locker       ID                                                                     Address
client.4143  kubelet_lock_magic_host-8-172-115.host.centralci.eng.rdu2.redhat.com  172.16.120.39:0/1032569
```
Fencing is used to prevent multiple nodes from accessing the same RBD image. It does not help when the pods land on a single node; for that you have to set the NoDiskConflict scheduler predicate: https://trello.com/c/rrSzOx20/39-13-support-nodiskconflicts-scheduler-predicates-for-rbd-origin
Sorry, I described it incorrectly. It is the NoDiskConflict scheduler predicate that is not working when a pod references a PVC. In my env I have 1 schedulable node. When I create 2 pods referencing the same RBD image in their pod.spec.volumes.rbd, the 2nd pod can't be scheduled because NoDiskConflict is enabled. When I create 2 pods referencing the same PVC (which references the same PV and thus the same RBD image), both pods are scheduled and created.
I don't think NoDiskConflict works for PVC-backed volumes of any type (EBS/PD/RBD/Cinder). I'll open a Kubernetes issue.
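To make the gap concrete, here is a minimal, self-contained Go sketch of the comparison the predicate performs. This is not the actual scheduler code; the types and function names (RBDSource, Volume, noDiskConflict) are simplified stand-ins, but it illustrates why inline RBD volumes conflict while PVC references slip through:

```
// nodiskconflict_sketch.go -- simplified illustration, not scheduler source.
package main

import "fmt"

// RBDSource identifies an RBD image the way an inline volume does:
// by its monitors, pool and image name.
type RBDSource struct {
	Monitors []string
	Pool     string
	Image    string
}

// Volume is either an inline RBD source or a reference to a
// PersistentVolumeClaim by name -- never both.
type Volume struct {
	RBD       *RBDSource // pod.spec.volumes.rbd
	ClaimName string     // pod.spec.volumes.persistentVolumeClaim
}

// haveOverlap reports whether the two monitor lists share an endpoint.
func haveOverlap(a, b []string) bool {
	for _, x := range a {
		for _, y := range b {
			if x == y {
				return true
			}
		}
	}
	return false
}

// sameRBD reports whether two inline RBD sources point at the same image.
// A PVC-backed volume has RBD == nil and therefore can never match.
func sameRBD(a, b *RBDSource) bool {
	if a == nil || b == nil {
		return false
	}
	return haveOverlap(a.Monitors, b.Monitors) && a.Pool == b.Pool && a.Image == b.Image
}

// noDiskConflict mimics the predicate: the candidate pod's volumes are
// checked against the volumes of pods already on the node. Claim names
// are never resolved to their bound PV, so two pods sharing a PVC pass.
func noDiskConflict(candidate, existing []Volume) bool {
	for _, c := range candidate {
		for _, e := range existing {
			if sameRBD(c.RBD, e.RBD) {
				return false // same image declared inline: do not schedule here
			}
		}
	}
	return true
}

func main() {
	inline := Volume{RBD: &RBDSource{Monitors: []string{"192.168.0.1:6789"}, Pool: "kube", Image: "foo"}}
	viaPVC := Volume{ClaimName: "pvc-rbd-server"} // bound to a PV using the same image

	fmt.Println(noDiskConflict([]Volume{inline}, []Volume{inline})) // false: second pod is rejected
	fmt.Println(noDiskConflict([]Volume{viaPVC}, []Volume{viaPVC})) // true: both pods schedule (the bug)
}
```

Presumably a fix needs the predicate to dereference the claim to its bound PV before comparing volume sources, which is what the issue below tracks.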
kube issue filed https://github.com/kubernetes/kubernetes/issues/29670
https://github.com/kubernetes/kubernetes/issues/26567
Kube doc update is here: https://github.com/kubernetes/kubernetes/pull/30817
OpenShift doc PR is here: https://github.com/openshift/openshift-docs/pull/2672