Description of problem:
Ever since we upgraded the OpenShift cluster from 4.3.22 to 4.3.28, we are unable to bind new pods to a new PVC. This appears to be caused by a mismatch in the "failure-domain.beta.kubernetes.io/region" label between the PV and the worker nodes, where matching seems to be case sensitive ("norwayeast" versus "NorwayEast", shown below). The issue can be temporarily worked around by editing the worker node label from "NorwayEast" to "norwayeast", but we do not know whether this has other implications, and the change does not survive a cluster restart.

Platform:
OpenShift 4.3.28 IPI install on Azure Cloud

How reproducible:
Always on OpenShift 4.3.28; not reproducible on 4.3.22, where the issue does not exist.

Attachments:
Please let me know where to upload the 700MB must-gather.

Steps to Reproduce (in project/namespace "default"):

1. Check the storage class and create a PVC:

----
$ oc get sc azureblock -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2020-05-07T09:31:40Z"
  name: azureblock
  resourceVersion: "272711"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/azureblock
  uid: c4fdbc7b-7ca2-4926-a07b-93ede3a63c2b
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Retain
volumeBindingMode: Immediate
----

----
oc create --filename - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-vol-1
spec:
  storageClassName: "azureblock"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
----

2. Create a pod:

----
oc create --filename - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c', 'echo Container 1 is Running ; sleep 3600']
    volumeMounts:
    - name: var-storage
      mountPath: /var
  volumes:
  - name: var-storage
    persistentVolumeClaim:
      claimName: my-vol-1
EOF
----

3.
Describe pod "busybox" and run "oc get pods".

Actual results:
Pod is stuck in Pending state with the errors below.

----
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Warning  FailedScheduling  <unknown>  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Warning  FailedScheduling  <unknown>  default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taints that the pod didn't tolerate.
----

----
$ oc get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
busybox   0/1     Pending   0          28m   <none>   <none>   <none>           <none>
----

----
$ oc get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-vol-1   Bound    pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002   1Gi        RWO            azureblock     29m
----

----
$ oc get pv pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk
    volumehelper.VolumeDynamicallyCreatedByKey: azure-disk-dynamic-provisioner
  creationTimestamp: "2020-08-05T09:46:32Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: norwayeast
  name: pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002
  resourceVersion: "45571973"
  selfLink: /api/v1/persistentvolumes/pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002
  uid: df4723f3-729d-4e35-bb7e-8e4ee013ca2e
spec:
  accessModes:
  - ReadWriteOnce
  azureDisk:
    cachingMode: ReadOnly
    diskName: testcluster-d68nv-dynamic-pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002
    diskURI: /subscriptions/aa809679-9463-4a06-a484-5601797acc97/resourceGroups/testcluster-d68nv-rg/providers/Microsoft.Compute/disks/testcluster-d68nv-dynamic-pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002
    fsType: ""
    kind: Managed
    readOnly: false
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind:
PersistentVolumeClaim
    name: my-vol-1
    namespace: default
    resourceVersion: "45571895"
    uid: 49af9d53-72c1-4a3b-8cc3-48162e50a002
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - norwayeast
  persistentVolumeReclaimPolicy: Retain
  storageClassName: azureblock
  volumeMode: Filesystem
status:
  phase: Bound
----

----
$ oc describe node testcluster-d68nv-worker-norwayeast-csr9b
Name:               testcluster-d68nv-worker-norwayeast-csr9b
Roles:              cp-management,cp-master,cp-proxy,worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_D48s_v3
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=NorwayEast
                    failure-domain.beta.kubernetes.io/zone=0
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=testcluster-d68nv-worker-norwayeast-csr9b
                    kubernetes.io/os=linux
                    management=true
                    master=true
                    node-role.kubernetes.io/cp-management=true
                    node-role.kubernetes.io/cp-master=true
                    node-role.kubernetes.io/cp-proxy=true
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
                    proxy=true
Annotations:        machine.openshift.io/machine: openshift-machine-api/testcluster-d68nv-worker-norwayeast-csr9b
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-abea61d4b2f9970a6897d8803384188c
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-abea61d4b2f9970a6897d8803384188c
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 06 May 2020 20:07:26 +0000
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 05 Aug 2020 10:16:49 +0000   Thu, 16 Jul 2020 22:09:39 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 05 Aug
2020 10:16:49 +0000   Thu, 16 Jul 2020 22:09:39 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 05 Aug 2020 10:16:49 +0000   Thu, 16 Jul 2020 22:09:39 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 05 Aug 2020 10:16:49 +0000   Thu, 16 Jul 2020 22:09:39 +0000   KubeletReady                 kubelet is posting ready status
----

Expected results:
The pod schedules and binds to the PVC; the PV's "failure-domain.beta.kubernetes.io/region" label matches the region label on the worker nodes.
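For what it's worth, the "volume node affinity conflict" above can be confirmed without editing any node: the scheduler's "In" operator is an exact, case-sensitive string match, so the PV's required region value never matches the node's label. A minimal sketch of that comparison, using the two values taken from the outputs in this report:

```shell
# Region value required by the PV's nodeAffinity (from the PV YAML above).
pv_region="norwayeast"
# Region label actually present on the worker node (from "oc describe node" above).
node_region="NorwayEast"

# Label matching is an exact string comparison, so case matters:
if [ "$pv_region" = "$node_region" ]; then
  echo "match: pod can be scheduled on this node"
else
  echo "mismatch: volume node affinity conflict"
fi
```

To list the live values on a cluster, `oc get nodes -L failure-domain.beta.kubernetes.io/region` prints the node label as a column and `oc get pv --show-labels` shows the PV label. The temporary workaround described in this report corresponds to `oc label node <node-name> failure-domain.beta.kubernetes.io/region=norwayeast --overwrite` (which, as noted, does not survive a cluster restart).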
*** Bug 1866302 has been marked as a duplicate of this bug. ***
Not sure if this should go to Storage or machine-api but... Storage first.
We upgraded our other cluster from 4.3.22 to 4.3.28 as well and now have the exact same problem. In addition, we tried the same on an OpenShift 4.3.27 cluster and faced the same problem. The issue does NOT exist on a 4.3.22 cluster. It therefore appears the issue was introduced by upgrading from fix level 4.3.22 to a higher version (verified on 4.3.27 and 4.3.28).
It seems this is already being fixed: the 4.4 fix will be released soon, and 4.3 will follow; see bug #1860832. *** This bug has been marked as a duplicate of bug 1860832 ***