Description of problem:
Pod is rescheduled due to some conflict, which causes the local volume to fail to mount.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-07-28-222114
local-storage-operator.v4.2.0

How reproducible:
Not sure

Steps to Reproduce:
1. Deploy local-storage-operator with several disks.
2. Create a pod and PVC.
3. Check the pod.

Actual results:
Pod is stuck in ContainerCreating status.

Expected results:
Pod is up and running with the local volume.

Additional info:
$ oc describe pod
Name:               mypod
Namespace:          6bgu5
Priority:           0
PriorityClassName:  <none>
Node:               qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb/10.0.32.4
Start Time:         Wed, 31 Jul 2019 13:38:24 +0800
Labels:             <none>
Annotations:        openshift.io/scc: restricted
Status:             Pending
IP:
Containers:
  mycontainer:
    Container ID:
    Image:          aosqe/hello-openshift
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pdwkv (ro)
    Devices:
      /dev/myblock from myvolume
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  myvolume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc
    ReadOnly:   false
  default-token-pdwkv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pdwkv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  FailedScheduling  2m21s                   default-scheduler  Operation cannot be fulfilled on persistentvolumes "local-pv-7f58a50f": the object has been modified; please apply your changes to the latest version and try again
  Warning  FailedScheduling  2m21s                   default-scheduler  AssumePod failed: pod 6926f0bd-b355-11e9-8232-000d3a92e41c is in the cache, so can't be assumed
  Warning  FailedScheduling  2m20s                   default-scheduler  pod "6bgu5/mypod" does not exist any more
  Warning  FailedScheduling  2m14s (x2 over 2m14s)   default-scheduler  AssumePod failed: pod 6926f0bd-b355-11e9-8232-000d3a92e41c is in the cache, so can't be assumed
  Warning  FailedScheduling  2m13s (x2 over 2m14s)   default-scheduler  AssumePod failed: pod 6926f0bd-b355-11e9-8232-000d3a92e41c is in the cache, so can't be assumed
  Normal   Scheduled         2m13s                   default-scheduler  Successfully assigned 6bgu5/mypod to qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb
  Warning  FailedScheduling  2m13s                   default-scheduler  Binding rejected: Operation cannot be fulfilled on pods/binding "mypod": pod mypod is already assigned to node "qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb"
  Warning  FailedMount       2m9s (x25 over 2m12s)   kubelet, qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb  MapVolume.NodeAffinity check failed for volume "local-pv-158dfe47" : No matching NodeSelectorTerms

$ oc get pv local-pv-158dfe47 local-pv-7f58a50f -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx-740f8302-b1a4-11e9-9ac3-000d3a92e440
    creationTimestamp: "2019-07-31T06:54:16Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: local-storage
    name: local-pv-158dfe47
    resourceVersion: "910222"
    selfLink: /api/v1/persistentvolumes/local-pv-158dfe47
    uid: 033566b5-b360-11e9-a378-000d3a92e02d
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 2Gi
    local:
      path: /mnt/local-storage/local-block-sc/sdd
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx
    persistentVolumeReclaimPolicy: Delete
    storageClassName: local-block-sc
    volumeMode: Block
  status:
    phase: Available
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      pv.kubernetes.io/provisioned-by: local-volume-provisioner-qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb-8790b985-b1a4-11e9-bde8-000d3a92e02d
    creationTimestamp: "2019-07-31T06:53:25Z"
    finalizers:
    - kubernetes.io/pv-protection
    labels:
      storage.openshift.com/local-volume-owner-name: local-disks
      storage.openshift.com/local-volume-owner-namespace: local-storage
    name: local-pv-7f58a50f
    resourceVersion: "909927"
    selfLink: /api/v1/persistentvolumes/local-pv-7f58a50f
    uid: e50c7cfa-b35f-11e9-a378-000d3a92e02d
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 1Gi
    local:
      path: /mnt/local-storage/local-block-sc/sdd
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb
    persistentVolumeReclaimPolicy: Delete
    storageClassName: local-block-sc
    volumeMode: Block
  status:
    phase: Available
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc get nodes --show-labels
NAME                                                STATUS   ROLES    AGE    VERSION             LABELS
qe-lxia-0728-222114-d82x6-master-0                  Ready    master   2d5h   v1.14.0+2e9d4a117   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D4s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-2,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0728-222114-d82x6-master-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lxia-0728-222114-d82x6-master-1                  Ready    master   2d5h   v1.14.0+2e9d4a117   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D4s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-1,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0728-222114-d82x6-master-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lxia-0728-222114-d82x6-master-2                  Ready    master   2d5h   v1.14.0+2e9d4a117   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D4s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-3,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0728-222114-d82x6-master-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx   Ready    worker   2d5h   v1.14.0+2e9d4a117   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D2s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-1,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb   Ready    worker   2d5h   v1.14.0+2e9d4a117   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D2s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-2,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
The project is cleaned up by our automation framework, so I don't have the bound PVC YAML.

> Provide the YAML which is used to create the PVC.

---

{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "mypvc"
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "volumeMode": "Block",
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}
Maybe it looks like the one below. (Taken from another project, so it might be different.)

$ oc get pvc mypvc -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2019-07-31T08:25:27Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    name: dynamic-pvc
  name: mypvc
  namespace: t5
  resourceVersion: "936299"
  selfLink: /api/v1/namespaces/t5/persistentvolumeclaims/mypvc
  uid: c0676594-b36c-11e9-8232-000d3a92e41c
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: local-block-sc
  volumeMode: Block
  volumeName: local-pv-158dfe47
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 2Gi
  phase: Bound
Got this reproduced.

$ oc get pvc,pod,pv
NAME                          STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS     AGE
persistentvolumeclaim/mypvc   Bound    local-pv-7f58a50f   1Gi        RWO            local-block-sc   136m

NAME        READY   STATUS              RESTARTS   AGE
pod/mypod   0/1     ContainerCreating   0          137m

NAME                                  CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM         STORAGECLASS          REASON   AGE
persistentvolume/local-pv-158dfe47    2Gi        RWO            Delete           Available                 local-block-sc                 136m
persistentvolume/local-pv-7f58a50f    1Gi        RWO            Delete           Bound       1i893/mypvc   local-block-sc                 21h
persistentvolume/local-pv-c692d3f2    2Gi        RWO            Delete           Available                 local-filesystem-sc            22h
persistentvolume/local-pv-f012ba9e    1Gi        RWO            Delete           Available                 local-filesystem-sc            24h

$ oc describe pod
Name:               mypod
Namespace:          1i893
Priority:           0
PriorityClassName:  <none>
Node:               qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx/10.0.32.5
Start Time:         Thu, 01 Aug 2019 11:37:59 +0800
Labels:             <none>
Annotations:        openshift.io/scc: restricted
Status:             Pending
IP:
Containers:
  mycontainer:
    Container ID:
    Image:          aosqe/hello-openshift
    Image ID:
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tktlc (ro)
    Devices:
      /dev/myblock from myvolume
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  myvolume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc
    ReadOnly:   false
  default-token-tktlc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tktlc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                      From               Message
  ----     ------            ----                     ----               -------
  Warning  FailedScheduling  137m (x2 over 137m)      default-scheduler  AssumePod failed: pod ba98a0cd-b40d-11e9-a378-000d3a92e02d is in the cache, so can't be assumed
  Warning  FailedScheduling  137m                     default-scheduler  Binding rejected: Operation cannot be fulfilled on pods/binding "mypod": pod mypod is already assigned to node "qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx"
  Warning  FailedScheduling  137m                     default-scheduler  Operation cannot be fulfilled on persistentvolumes "local-pv-158dfe47": the object has been modified; please apply your changes to the latest version and try again
  Warning  FailedScheduling  137m                     default-scheduler  AssumePod failed: pod ba98a0cd-b40d-11e9-a378-000d3a92e02d is in the cache, so can't be assumed
  Warning  FailedScheduling  137m (x2 over 137m)      default-scheduler  AssumePod failed: pod ba98a0cd-b40d-11e9-a378-000d3a92e02d is in the cache, so can't be assumed
  Warning  FailedScheduling  137m                     default-scheduler  pod "1i893/mypod" does not exist any more
  Normal   Scheduled         137m                     default-scheduler  Successfully assigned 1i893/mypod to qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx
  Warning  FailedMount       2m30s (x80214 over 137m) kubelet, qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx  MapVolume.NodeAffinity check failed for volume "local-pv-7f58a50f" : No matching NodeSelectorTerms
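The repeated FailedMount event comes down to kubelet comparing the bound PV's required nodeSelectorTerms against the labels of the node the pod actually landed on: the PVC is bound to local-pv-7f58a50f, whose affinity names the centralus2 worker, while the pod was assigned to the centralus1 worker. A minimal sketch of that kind of check, with made-up types rather than the real Kubernetes API:

```go
package main

import "fmt"

// requirement is a simplified stand-in for a matchExpressions entry with
// operator "In": the node's label value for key must be one of values.
type requirement struct {
	key    string
	values []string
}

// matches mirrors the nodeSelectorTerms semantics: terms are ORed, and the
// requirements inside a single term are ANDed.
func matches(nodeLabels map[string]string, terms [][]requirement) bool {
	for _, term := range terms {
		ok := true
		for _, req := range term {
			found := false
			for _, v := range req.values {
				if nodeLabels[req.key] == v {
					found = true
					break
				}
			}
			if !found {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// The node the pod was scheduled to (centralus1)...
	node := map[string]string{
		"kubernetes.io/hostname": "qe-lxia-0728-222114-d82x6-worker-centralus1-hnlcx",
	}
	// ...versus the affinity of the PV the PVC is bound to (centralus2).
	terms := [][]requirement{{
		{key: "kubernetes.io/hostname",
			values: []string{"qe-lxia-0728-222114-d82x6-worker-centralus2-g79gb"}},
	}}
	if !matches(node, terms) {
		fmt.Println("NodeAffinity check failed: No matching NodeSelectorTerms")
	}
}
```

With these inputs the check fails, which is why kubelet keeps emitting the FailedMount event instead of ever mapping the block device.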
Typo in last message - "The entire point of pods that use local-storage is that they can't be rescheduled on another Node. I am curious how did we manage to reschedule them!"
I got this to reproduce in my own Amazon cluster too.

Getting multiple schedulers to start acting as "leaders" (even if they don't hold the leader lock) requires a loss of connectivity from a scheduler to the api-server. The loss of connection to the api-server causes the leader lock lease to move around. For example:

# When the cluster first came up
scheduler A --> waiting on lease lock
scheduler B --> has leader lock
scheduler C --> waiting on lease lock

# After loss of connectivity to the api-server
scheduler A --> new leader
scheduler B --> lost leader lock (because of api-server connectivity)
scheduler C --> waiting on lease lock

At this point, both scheduler A and scheduler B can schedule pods: scheduler A is the new leader, but scheduler B, although no longer the leader, still has its `Run` loop running. It is still watching pods/PVCs etc. and doing everything a scheduler does. Unlike controller-managers (https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-controller-manager/app/controllermanager.go#L282), schedulers don't crash when they lose the leader lock (https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-scheduler/app/server.go#L265).
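The contrast described above can be sketched roughly as follows. This is a simplified illustration with invented types, not the real client-go leaderelection API; the one behavioural difference it models is whether losing the lease stops the component's run loop:

```go
package main

import "fmt"

// component stands in for a control-plane process that holds (or held) the
// leader lease.
type component struct {
	name       string
	exitOnLoss bool // controller-manager: true; scheduler (before the fix): false
	running    bool // whether the Run loop is still doing work
}

// onStoppedLeading models the OnStoppedLeading callback. The controller-manager
// treats lease loss as fatal and exits; the scheduler (before the upstream fix)
// only logged it, so its Run loop kept watching and scheduling.
func (c *component) onStoppedLeading() {
	if c.exitOnLoss {
		c.running = false
	}
}

func main() {
	cm := component{name: "kube-controller-manager", exitOnLoss: true, running: true}
	sched := component{name: "kube-scheduler", exitOnLoss: false, running: true}

	// api-server connectivity loss moves the lease; both lose leadership.
	cm.onStoppedLeading()
	sched.onStoppedLeading()

	fmt.Printf("%s running after losing lock: %v\n", cm.name, cm.running)
	fmt.Printf("%s running after losing lock: %v\n", sched.name, sched.running)
}
```

In this sketch the scheduler is still "running" after losing the lock, which is exactly the window in which two schedulers can both act on the same pod and bind it to PVs on different nodes.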
Upstream PR: https://github.com/kubernetes/kubernetes/pull/81306
Verified with: 4.2.0-0.nightly-2019-08-29-170426

No multiple schedulers are active at the same time any more.
*** Bug 1734612 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922