Bug 1877681

Summary: Manually created PV can not be used
Product: OpenShift Container Platform Reporter: Qin Ping <piqin>
Component: Storage Assignee: Tomas Smetana <tsmetana>
Storage sub component: Kubernetes QA Contact: Qin Ping <piqin>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aos-bugs, jsafrane
Version: 4.6   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: On the OpenStack platform, the admission plugin always added the failure domain and region labels even when they were not configured properly (empty). This caused issues with statically provisioned persistent volumes. Consequence: Pods using statically provisioned persistent volumes failed to start on OpenStack clusters configured with an empty region. Fix: The labels are now added to the volume only when they contain a valid region and failure domain, just as for dynamically provisioned persistent volumes. Result: Pods using statically provisioned volumes behave the same as those using dynamically provisioned volumes on OpenStack clusters configured with an empty region or failure domain.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-24 15:17:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1900446    
Bug Blocks:    

Description Qin Ping 2020-09-10 07:55:01 UTC
Description of Problem:
A manually created PV cannot be used because the PV is assigned a wrong nodeAffinity.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-07-224533

How Reproducible:
Always


Steps to Reproduce:
1. Create a PV manually in an OSP16 cluster with Kuryr
2. Create a PVC
3. Create a Pod to use this PVC

Actual Results:
Pod cannot be scheduled.
  Warning  FailedScheduling  <unknown>        0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had volume node affinity conflict.
  Warning  FailedScheduling  <unknown>        0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had volume node affinity conflict.


Expected Results:
Pod can run successfully.

Additional info:
$ cat pv.yaml 
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-test
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  cinder:
    volumeID: d3c7c54a-bb69-447f-ae45-c5af2033a885
    fsType: ext4
  persistentVolumeReclaimPolicy: Delete
  storageClassName: sc-81qdr

Manually created PV's nodeAffinity:
$ oc get pv pv-test -ojson |jq .spec.nodeAffinity
{
  "required": {
    "nodeSelectorTerms": [
      {
        "matchExpressions": [
          {
            "key": "failure-domain.beta.kubernetes.io/zone",
            "operator": "In",
            "values": [
              "nova"
            ]
          },
          {
            "key": "failure-domain.beta.kubernetes.io/region",
            "operator": "In",
            "values": [
              ""
            ]
          }
        ]
      }
    ]
  }
}

Dynamically provisioned PV's nodeAffinity:
$ oc get pv pvc-547ba420-f426-4db4-b336-b8b961e259d1 -ojson|jq .spec.nodeAffinity
{
  "required": {
    "nodeSelectorTerms": [
      {
        "matchExpressions": [
          {
            "key": "failure-domain.beta.kubernetes.io/zone",
            "operator": "In",
            "values": [
              "nova"
            ]
          }
        ]
      }
    ]
  }
}

Node labels:
$ oc get node --show-labels 
NAME                          STATUS   ROLES    AGE   VERSION                LABELS
ostest-rfp79-master-0         Ready    master   44h   v1.19.0-rc.2+068702d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/arch=amd64,kubernetes.io/hostname=ostest-rfp79-master-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=nova
ostest-rfp79-master-1         Ready    master   44h   v1.19.0-rc.2+068702d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/arch=amd64,kubernetes.io/hostname=ostest-rfp79-master-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=nova
ostest-rfp79-master-2         Ready    master   44h   v1.19.0-rc.2+068702d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/arch=amd64,kubernetes.io/hostname=ostest-rfp79-master-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=nova
ostest-rfp79-worker-0-bqr47   Ready    worker   44h   v1.19.0-rc.2+068702d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,efp6i=testfor510646,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/arch=amd64,kubernetes.io/hostname=ostest-rfp79-worker-0-bqr47,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=nova
ostest-rfp79-worker-0-fs2ms   Ready    worker   44h   v1.19.0-rc.2+068702d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/arch=amd64,kubernetes.io/hostname=ostest-rfp79-worker-0-fs2ms,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=nova
ostest-rfp79-worker-0-w64p6   Ready    worker   44h   v1.19.0-rc.2+068702d   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=nova,kubernetes.io/arch=amd64,kubernetes.io/hostname=ostest-rfp79-worker-0-w64p6,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=nova
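
The conflict follows from the two outputs above: every node carries failure-domain.beta.kubernetes.io/zone=nova but no failure-domain.beta.kubernetes.io/region label at all, while the manually created PV requires the region key to be In [""]. An "In" requirement can never be satisfied by a node that lacks the key, hence the "volume node affinity conflict". A minimal sketch of that matching behaviour (plain Go, simplified; not the scheduler's actual code, type and variable names are illustrative):

package main

import "fmt"

// matchExpr mirrors a single nodeAffinity matchExpression with operator "In".
type matchExpr struct {
	key    string
	values []string
}

// matches reports whether the node's labels satisfy the expression: the key
// must be present and its value must be one of the listed values.
func (e matchExpr) matches(nodeLabels map[string]string) bool {
	v, ok := nodeLabels[e.key]
	if !ok {
		return false
	}
	for _, want := range e.values {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	// Labels as on the worker nodes above: zone is set, region is absent.
	node := map[string]string{
		"failure-domain.beta.kubernetes.io/zone": "nova",
	}

	zone := matchExpr{key: "failure-domain.beta.kubernetes.io/zone", values: []string{"nova"}}
	region := matchExpr{key: "failure-domain.beta.kubernetes.io/region", values: []string{""}}

	fmt.Println(zone.matches(node))   // true
	fmt.Println(region.matches(node)) // false -> "volume node affinity conflict"
}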

Comment 3 Tomas Smetana 2020-09-14 10:06:09 UTC
It looks like there's a discrepancy between the dynamic provisioner and the OpenStack PVLabeler:

While the provisioner doesn't add the labels if they're empty:
https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/volume/cinder/cinder_util.go#L228-L237

The cloud provider's PVLabeler (which is used by the admission controller for statically provisioned volumes) always returns the zone/region labels even if they are empty strings:
https://github.com/openshift/origin/blob/master/vendor/k8s.io/legacy-cloud-providers/openstack/openstack_volumes.go#L745-L749

The fix should be trivial... However, I need to find a way to test this too.
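
A minimal sketch of the kind of change implied here (not the actual upstream patch; the function name and wiring are illustrative): build the label map only from non-empty zone/region values, mirroring what the dynamic provisioner already does:

package main

import "fmt"

// labelsForCinderVolume is an illustrative stand-in for the PVLabeler logic:
// emit the zone/region labels only when the underlying values are non-empty,
// so statically provisioned PVs get the same treatment as dynamic ones.
func labelsForCinderVolume(zone, region string) map[string]string {
	labels := map[string]string{}
	if zone != "" {
		labels["failure-domain.beta.kubernetes.io/zone"] = zone
	}
	if region != "" {
		labels["failure-domain.beta.kubernetes.io/region"] = region
	}
	return labels
}

func main() {
	// With an empty region (as in this cluster's configuration) only the
	// zone label is produced, so no empty-valued nodeAffinity term is added.
	fmt.Println(labelsForCinderVolume("nova", "")) // map[failure-domain.beta.kubernetes.io/zone:nova]
}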

Comment 4 Tomas Smetana 2020-09-15 14:14:13 UTC
I failed to reproduce this. I thought it should happen with every statically provisioned PV, but the PV I created in the PSI OpenStack has no nodeAffinity set at all.

Comment 5 Tomas Smetana 2020-09-15 15:09:19 UTC
I also compared the PV from the must-gather with mine. The one from the must-gather has failure-domain.beta.kubernetes.io/<region|zone> set while mine does not (i.e. no labels at all). The PVLabeler code is the same, so there's something else altering the PV object.

Comment 9 Tomas Smetana 2020-09-30 09:12:39 UTC
Upstream PR: https://github.com/kubernetes/kubernetes/pull/95174

Comment 12 Qin Ping 2020-11-25 03:33:40 UTC
Verified with: 4.7.0-0.nightly-2020-11-24-080601

Comment 15 errata-xmlrpc 2021-02-24 15:17:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633