Bug 1731059
| Summary: | Pod with persistent volumes failed scheduling on Azure due to volume node affinity conflict | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Liang Xia <lxia> |
| Component: | Storage | Assignee: | Jan Safranek <jsafrane> |
| Status: | CLOSED ERRATA | QA Contact: | Liang Xia <lxia> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.2.0 | CC: | abudavis, aos-bugs, aos-storage-staff, bchilds, chaoyang, jialiu, skordas |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:29:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
How did you create the cluster? By default, our installer creates one master and one node in each zone, so any PVC can be scheduled. I can see that you have only 5 nodes (3 masters, 2 nodes) instead of 6. Can you please check what happened to the 6th node?

*** Bug 1732901 has been marked as a duplicate of this bug. ***

Verified the issue has been fixed.
Tested on a cluster with 5 nodes (3 masters, 2 nodes), creating dynamic PVCs and pods several times; the pods come up and run with their volumes.
$ oc get co storage
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
storage 4.2.0-0.nightly-2019-07-28-222114 True False False 35m
$ oc get sc managed-premium -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2019-07-29T02:00:13Z"
  name: managed-premium
  ownerReferences:
  - apiVersion: v1
    kind: clusteroperator
    name: storage
    uid: 5f960bf0-b1a3-11e9-bb54-000d3a92e279
  resourceVersion: "9674"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/managed-premium
  uid: 9a64e4a3-b1a4-11e9-9ac3-000d3a92e440
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
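For reference, a minimal sketch of the kind of PVC/Pod pair used for this verification; the names (verify-pvc, verify-pod) are illustrative and not taken from the bug, and the image is reused from the reproducer below:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: verify-pvc          # illustrative name, not from the bug
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName omitted: the default managed-premium class is used
---
apiVersion: v1
kind: Pod
metadata:
  name: verify-pod          # illustrative name, not from the bug
spec:
  containers:
  - name: test
    image: aosqe/hello-openshift
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: verify-pvc

With volumeBindingMode: WaitForFirstConsumer, the PV is provisioned only after the pod has been scheduled, so the Azure disk is created in the zone of the node the pod landed on.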
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

Facing the same problem on OpenShift 4.3.28 (5-node cluster); what is the solution / the root cause of the problem?
Description of problem:
Pod with persistent volumes failed scheduling on Azure due to volume node affinity conflict

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-07-18-003042

How reproducible:
Often

Steps to Reproduce:
1. Log in and create a project.
2. Create a PVC.
3. Create a pod with the above PVC.

Actual results:
Pod failed scheduling:
  Warning  FailedScheduling  37s (x2 over 37s)  default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taints that the pod didn't tolerate.

Expected results:
Pod is up and running.

Additional info:
$ oc get pvc
NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
mypvc   Bound    pvc-a0e2b137-a92d-11e9-b160-000d3a92c15a   1Gi        RWO            managed-premium   28s

$ oc get pods
NAME    READY   STATUS    RESTARTS   AGE
mypod   0/1     Pending   0          31s

$ oc describe pod
Name:               mypod
Namespace:          xg17v
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             <none>
Annotations:        openshift.io/scc: restricted
Status:             Pending
IP:
Containers:
  mycontainer:
    Image:        aosqe/hello-openshift
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-986qk (ro)
    Devices:
      /dev/myblock from myvolume
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  myvolume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc
    ReadOnly:   false
  default-token-986qk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-986qk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  38s (x2 over 38s)  default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  37s                default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling  37s (x2 over 37s)  default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taints that the pod didn't tolerate.
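Reconstructed from the describe output above, the reproducer roughly corresponds to manifests like the following. The names (mypvc, mypod, mycontainer, myvolume) and image come from the output; volumeMode: Block is an assumption inferred from the /dev/myblock device mount, and the exact original manifests are not attached to the bug:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Block          # assumption, inferred from the /dev/myblock device in the pod
  resources:
    requests:
      storage: 1Gi
  # storageClassName omitted: managed-premium is the default class
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: mycontainer
    image: aosqe/hello-openshift
    volumeDevices:
    - name: myvolume
      devicePath: /dev/myblock
  volumes:
  - name: myvolume
    persistentVolumeClaim:
      claimName: mypvc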
$ oc get sc -o yaml
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    creationTimestamp: "2019-07-18T01:06:12Z"
    name: managed-premium
    ownerReferences:
    - apiVersion: v1
      kind: clusteroperator
      name: storage
      uid: 1b2163c5-a8f7-11e9-98a3-000d3a93cf81
    resourceVersion: "8866"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/managed-premium
    uid: 3c267dfa-a8f8-11e9-8236-000d3a93c3f4
  parameters:
    kind: Managed
    storageaccounttype: Premium_LRS
  provisioner: kubernetes.io/azure-disk
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc get nodes --show-labels
NAME                                                STATUS   ROLES    AGE     VERSION             LABELS
qe-lxia-0718-003042-bnwll-master-0                  Ready    master   7h      v1.14.0+c47409b6f   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D4s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-2,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0718-003042-bnwll-master-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lxia-0718-003042-bnwll-master-1                  Ready    master   7h      v1.14.0+c47409b6f   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D4s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-1,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0718-003042-bnwll-master-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lxia-0718-003042-bnwll-master-2                  Ready    master   7h1m    v1.14.0+c47409b6f   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D4s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-3,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0718-003042-bnwll-master-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lxia-0718-003042-bnwll-worker-centralus1-qpjkr   Ready    worker   6h53m   v1.14.0+c47409b6f   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D2s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-1,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0718-003042-bnwll-worker-centralus1-qpjkr,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
qe-lxia-0718-003042-bnwll-worker-centralus2-652vr   Ready    worker   6h53m   v1.14.0+c47409b6f   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D2s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=centralus,failure-domain.beta.kubernetes.io/zone=centralus-2,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lxia-0718-003042-bnwll-worker-centralus2-652vr,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos
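Note that the failing cluster's default class uses volumeBindingMode: Immediate, while the verified 4.2.0 build above ships WaitForFirstConsumer. With Immediate binding the Azure disk can be provisioned in a zone that has no worker node (here only centralus-1 and centralus-2 have workers), which matches the "2 node(s) had volume node affinity conflict" event. Purely for illustration, a storage class with delayed binding looks like the sketch below; the name is hypothetical and this is not the operator-managed manifest, which is shown in the verification output above:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-premium-wffc          # hypothetical name, for illustration only
provisioner: kubernetes.io/azure-disk
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # delay PV provisioning until a pod using the PVC is scheduled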