Bug 1858328 - Azure France Central Region node label doesn't match with PV Node Affinity
Summary: Azure France Central Region node label doesn't match with PV Node Affinity
Keywords:
Status: CLOSED DUPLICATE of bug 1860830
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Alberto
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-17 14:45 UTC by Cansu Kavili
Modified: 2020-07-29 15:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 15:28:38 UTC
Target Upstream Version:
Embargoed:



Description Cansu Kavili 2020-07-17 14:45:28 UTC
Description of problem:

We installed OCP 4.4 in the Azure France Central region with IPI. When we deployed an app that uses a PV, the pod failed with the following error:
"Failed to bind volumes: pv "pvc-113b4883-ef9e-496f-a527-4ebbe4b0c4e1" node affinity doesn't match node "who-tdjh4-worker-francecentral3-j4zw6": No matching NodeSelectorTerms"

PV has following affinity:

    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/region
            operator: In
            values:
            - francecentral
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - francecentral-3


but the node's region label value is FranceCentral (mixed case), so the case-sensitive comparison against francecentral fails. The zone label matches.

$ oc get node who-tdjh4-worker-francecentral3-j4zw6  --show-labels

NAME                                    STATUS   ROLES    AGE   VERSION           LABELS
who-tdjh4-worker-francecentral3-j4zw6   Ready    worker   9h    v1.17.1+1aa1c48   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_D4s_v3,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=FranceCentral,failure-domain.beta.kubernetes.io/zone=francecentral-3,kubernetes.io/arch=amd64,kubernetes.io/hostname=who-tdjh4-worker-francecentral3-j4zw6,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=Standard_D4s_v3,node.openshift.io/os_id=rhcos,topology.kubernetes.io/region=FranceCentral,topology.kubernetes.io/zone=francecentral-3

When I overwrite the region label on the node with the lowercase value, it works fine.
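
The mismatch above can be illustrated with a minimal Go sketch. It assumes only that the In operator compares label values as exact, case-sensitive strings and that all matchExpressions within one term must hold. The requirement type and the matchIn and matchesAll helpers are hypothetical names made up for this illustration; they are not the scheduler's actual code.

```go
package main

import "fmt"

// requirement is a hypothetical, simplified stand-in for one matchExpression
// that uses the "In" operator.
type requirement struct {
	key    string
	values []string
}

// matchIn reports whether value equals one of values, compared as exact,
// case-sensitive strings.
func matchIn(value string, values []string) bool {
	for _, v := range values {
		if v == value {
			return true
		}
	}
	return false
}

// matchesAll reports whether the labels satisfy every requirement in a
// single term (the expressions within one term are ANDed).
func matchesAll(labels map[string]string, reqs []requirement) bool {
	for _, r := range reqs {
		v, ok := labels[r.key]
		if !ok || !matchIn(v, r.values) {
			return false
		}
	}
	return true
}

func main() {
	// Requirements copied from the PV's nodeAffinity in this report.
	reqs := []requirement{
		{key: "failure-domain.beta.kubernetes.io/region", values: []string{"francecentral"}},
		{key: "failure-domain.beta.kubernetes.io/zone", values: []string{"francecentral-3"}},
	}
	// Labels copied from the node in this report.
	node := map[string]string{
		"failure-domain.beta.kubernetes.io/region": "FranceCentral",
		"failure-domain.beta.kubernetes.io/zone":   "francecentral-3",
	}

	fmt.Println(matchesAll(node, reqs)) // false: "FranceCentral" != "francecentral"

	// Overwriting the region label with the lowercase value makes the term match.
	node["failure-domain.beta.kubernetes.io/region"] = "francecentral"
	fmt.Println(matchesAll(node, reqs)) // true
}
```

Only the region expression fails here; the zone value is already lowercase on both sides.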

Version-Release number of the following components:
$ oc version
Client Version: openshift-clients-4.4.0-202004250654
Server Version: 4.4.11


How reproducible:


Steps to Reproduce:
1. Create a cluster on Azure France Central region
2. Deploy an app that requires a PV

Actual results:
Pods are not scheduled because no NodeSelectorTerms match, unless the node labels are overwritten with the expected values.

Expected results:
Pods are placed without changing any node label.


Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Abhinav Dahiya 2020-07-20 16:54:41 UTC
The installer does not handle the tags or annotations that end up on PVs or nodes, so I am moving this to the storage team to triage.

Comment 2 Hemant Kumar 2020-07-23 13:41:31 UTC
This appears to be fixed in the latest/1.18 release, but the PRs needed to bring the changes back verbatim are pretty big, because upstream has introduced a new ARM (Azure Resource Manager) client - https://github.com/kubernetes/kubernetes/pull/86740

The crux of the fix for our purpose is:

clientRegion: NormalizeAzureRegion(clientRegion),

which seems to lowercase the region before applying it to the node, whereas in the OCP 4.4 code we handle the zone correctly but not the region:

func (az *Cloud) makeZone(location string, zoneID int) string {
	return fmt.Sprintf("%s-%d", strings.ToLower(location), zoneID)
}


I would defer to the cloud team on whether we want to "carry" a simple fix or backport the larger PRs that fix this problem. Also setting the target to 4.4.z, because this problem should already be fixed in 4.5 and 4.6.
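
As a rough illustration of the "simple fix" option, the sketch below normalizes the region the same way makeZone already normalizes the location. The normalizeRegion helper is a hypothetical name for this example and is not the upstream NormalizeAzureRegion implementation; stripping whitespace in addition to lowercasing is an assumption, made so that a display name such as "France Central" yields the same value.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeRegion is a hypothetical helper for this sketch: it lowercases the
// region name and drops whitespace (the whitespace handling is an assumption).
func normalizeRegion(name string) string {
	var b strings.Builder
	for _, r := range name {
		if !unicode.IsSpace(r) {
			b.WriteRune(unicode.ToLower(r))
		}
	}
	return b.String()
}

// makeZone follows the snippet quoted above, written as a plain function
// instead of a method on *Cloud so the example is self-contained.
func makeZone(location string, zoneID int) string {
	return fmt.Sprintf("%s-%d", strings.ToLower(location), zoneID)
}

func main() {
	fmt.Println(normalizeRegion("FranceCentral"))  // francecentral
	fmt.Println(normalizeRegion("France Central")) // francecentral
	fmt.Println(makeZone("FranceCentral", 3))      // francecentral-3
}
```

With both values normalized, the region and zone labels on the node would agree with the values in the PV's node affinity from the description.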

Comment 4 Joel Speed 2020-07-29 15:00:57 UTC
This seems to be the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1860832, which already has PRs open to resolve it. I believe we can close this as a duplicate.

Comment 5 Alberto 2020-07-29 15:28:38 UTC

*** This bug has been marked as a duplicate of bug 1860830 ***

