Description of problem:
Ever since we upgraded the OpenShift cluster from 4.3.22 to 4.3.28, we are unable to bind new pods to a new PVC. This appears to be caused by a mismatch in the "failure-domain.beta.kubernetes.io/region" label between the PV and the worker nodes, where matching seems to be case sensitive ("norwayeast" versus "NorwayEast", shown below). The issue can be temporarily worked around by editing the worker node label from "NorwayEast" to "norwayeast", but we do not know whether this has other implications, and the change does not survive a cluster restart.

Platform:
OpenShift 4.3.28 IPI install on Azure Cloud

How reproducible:
Always on OpenShift 4.3.28; not reproducible on 4.3.22, where the issue does not exist.

Attachments:
Please let me know where to upload the 700MB must-gather.

Steps to Reproduce (in project/namespace "default"):

1. Check the storage class and create a PVC:

----
$ oc get sc azureblock -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2020-05-07T09:31:40Z"
  name: azureblock
  resourceVersion: "272711"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/azureblock
  uid: c4fdbc7b-7ca2-4926-a07b-93ede3a63c2b
parameters:
  kind: Managed
  storageaccounttype: Premium_LRS
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Retain
volumeBindingMode: Immediate
----

----
oc create --filename - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-vol-1
spec:
  storageClassName: "azureblock"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
----

2. Create a pod:

----
oc create --filename - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: busybox
    command: ['sh', '-c', 'echo Container 1 is Running ; sleep 3600']
    volumeMounts:
    - name: var-storage
      mountPath: /var
  volumes:
  - name: var-storage
    persistentVolumeClaim:
      claimName: my-vol-1
EOF
----

3.
Describe pod "busybox" and run "oc get pods".

Actual results:
Pod is stuck in Pending state with the errors below.

----
Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Warning  FailedScheduling  <unknown>  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Warning  FailedScheduling  <unknown>  default-scheduler  0/5 nodes are available: 2 node(s) had volume node affinity conflict, 3 node(s) had taints that the pod didn't tolerate.
----

----
$ oc get pods -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
busybox   0/1     Pending   0          28m   <none>   <none>   <none>           <none>
----

----
$ oc get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
my-vol-1   Bound    pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002   1Gi        RWO            azureblock     29m
----

----
$ oc get pv pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/bound-by-controller: "yes"
    pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk
    volumehelper.VolumeDynamicallyCreatedByKey: azure-disk-dynamic-provisioner
  creationTimestamp: "2020-08-05T09:46:32Z"
  finalizers:
  - kubernetes.io/pv-protection
  labels:
    failure-domain.beta.kubernetes.io/region: norwayeast
  name: pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002
  resourceVersion: "45571973"
  selfLink: /api/v1/persistentvolumes/pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002
  uid: df4723f3-729d-4e35-bb7e-8e4ee013ca2e
spec:
  accessModes:
  - ReadWriteOnce
  azureDisk:
    cachingMode: ReadOnly
    diskName: testcluster-d68nv-dynamic-pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002
    diskURI: /subscriptions/aa809679-9463-4a06-a484-5601797acc97/resourceGroups/testcluster-d68nv-rg/providers/Microsoft.Compute/disks/testcluster-d68nv-dynamic-pvc-49af9d53-72c1-4a3b-8cc3-48162e50a002
    fsType: ""
    kind: Managed
    readOnly: false
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind:
PersistentVolumeClaim
    name: my-vol-1
    namespace: default
    resourceVersion: "45571895"
    uid: 49af9d53-72c1-4a3b-8cc3-48162e50a002
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - norwayeast
  persistentVolumeReclaimPolicy: Retain
  storageClassName: azureblock
  volumeMode: Filesystem
status:
  phase: Bound
----

----
$ oc describe node testcluster-d68nv-worker-norwayeast-csr9b
Name:               testcluster-d68nv-worker-norwayeast-csr9b
Roles:              cp-management,cp-master,cp-proxy,worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_D48s_v3
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=NorwayEast
                    failure-domain.beta.kubernetes.io/zone=0
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=testcluster-d68nv-worker-norwayeast-csr9b
                    kubernetes.io/os=linux
                    management=true
                    master=true
                    node-role.kubernetes.io/cp-management=true
                    node-role.kubernetes.io/cp-master=true
                    node-role.kubernetes.io/cp-proxy=true
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
                    proxy=true
Annotations:        machine.openshift.io/machine: openshift-machine-api/testcluster-d68nv-worker-norwayeast-csr9b
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-abea61d4b2f9970a6897d8803384188c
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-abea61d4b2f9970a6897d8803384188c
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 06 May 2020 20:07:26 +0000
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 05 Aug 2020 10:16:49 +0000   Thu, 16 Jul 2020 22:09:39 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 05 Aug
2020 10:16:49 +0000   Thu, 16 Jul 2020 22:09:39 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 05 Aug 2020 10:16:49 +0000   Thu, 16 Jul 2020 22:09:39 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Wed, 05 Aug 2020 10:16:49 +0000   Thu, 16 Jul 2020 22:09:39 +0000   KubeletReady                 kubelet is posting ready status
----

Expected results:
The pod schedules and binds to the PVC; the PV's "failure-domain.beta.kubernetes.io/region" label matches the region label on the worker nodes.
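For what it's worth, the "volume node affinity conflict" above can be confirmed without editing any node: the scheduler's "In" operator is an exact, case-sensitive string match, so the PV's required region value never matches the node's label. A minimal sketch of that comparison, using the two values taken from the outputs in this report:

```shell
# Region value required by the PV's nodeAffinity (from the PV YAML above).
pv_region="norwayeast"
# Region label actually present on the worker node (from "oc describe node" above).
node_region="NorwayEast"

# Label matching is an exact string comparison, so case matters:
if [ "$pv_region" = "$node_region" ]; then
  echo "match: pod can be scheduled on this node"
else
  echo "mismatch: volume node affinity conflict"
fi
```

To list the live values on a cluster, `oc get nodes -L failure-domain.beta.kubernetes.io/region` prints the node label as a column and `oc get pv --show-labels` shows the PV label. The temporary workaround described in this report corresponds to `oc label node <node-name> failure-domain.beta.kubernetes.io/region=norwayeast --overwrite` (which, as noted, does not survive a cluster restart).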
*** Bug 1866302 has been marked as a duplicate of this bug. ***
Not sure if this should go to Storage or machine-api but... Storage first.
We upgraded our other cluster from 4.3.22 to 4.3.28 as well and now have the exact same problem. In addition, we tried the same on an OpenShift 4.3.27 cluster and faced the same problem. The issue does NOT exist on a 4.3.22 cluster. It therefore appears the issue was introduced by upgrading from fix level 4.3.22 to a higher version (verified on 4.3.27 and 4.3.28).
It seems this is already being fixed: the 4.4 fix will be released soon, and 4.3 will follow; see bug #1860832. *** This bug has been marked as a duplicate of bug 1860832 ***