Bug 1936871 - [Cinder CSI] Topology aware provisioning doesn't work when Nova and Cinder AZs are different
Summary: [Cinder CSI] Topology aware provisioning doesn't work when Nova and Cinder AZs are different
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Mike Fedosin
QA Contact: rlobillo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-09 11:45 UTC by Mike Fedosin
Modified: 2021-07-27 22:52 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: In 4.8, the Cinder CSI driver operator automatically detects the OpenStack cloud parameters related to availability zones and configures the driver accordingly. Reason: Previously, users could not deploy a PV in a volume availability zone whose name differs from the compute availability zone, because that required additional configuration of the Cinder CSI driver. Result: Users can provision PVs and then mount them to pods in different availability zones.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:51:56 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-storage-operator pull 141 0 None closed Bug 1936871: Add volumes with credentials to Cinder operator deployment 2021-06-04 15:09:58 UTC
Github openshift openstack-cinder-csi-driver-operator pull 43 0 None closed Bug 1936871: support clouds with multiple availability zones 2021-06-04 15:09:56 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:52:20 UTC

Description Mike Fedosin 2021-03-09 11:45:04 UTC
Description of problem:
When the Cinder and Nova availability zones have different names, a pod with an attached volume cannot be provisioned.

  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m5s  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: pv "pvc-fda419fc-5dbc-4878-ab89-5cb1541a33a5" node affinity doesn't match node "ostest-dl27b-worker-0-cbdbh": no matching NodeSelectorTerms
  Warning  FailedScheduling  2m5s  default-scheduler  0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had volume node affinity conflict.
  Warning  FailedScheduling  2m2s  default-scheduler  0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had volume node affinity conflict.

It happens because Cinder CSI driver adds a node affinity to the created PV like:

Node Affinity:                                                                                                                                                                                
  Required Terms:                                                                                                                                                                             
    Term 0:        topology.cinder.csi.openstack.org/zone in [AZ1]
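
In miniature, the conflict the scheduler reports can be sketched like this (a hypothetical Python helper, not the actual kube-scheduler logic):

```python
# Sketch of why the PV's required node affinity can never match any node:
# the Cinder CSI driver pins the PV to the Cinder AZ name, while nodes
# carry their Nova AZ name under the same topology key.

ZONE_KEY = "topology.cinder.csi.openstack.org/zone"

def zone_matches(required_terms, node_labels):
    """True if any required NodeSelectorTerm (OR semantics) accepts the node's zone."""
    return any(node_labels.get(ZONE_KEY) in term.get(ZONE_KEY, ())
               for term in required_terms)

pv_affinity = [{ZONE_KEY: ["AZ1"]}]   # derived from the volume's Cinder AZ
node_labels = {ZONE_KEY: "AZ-0"}      # derived from the node's Nova AZ

print(zone_matches(pv_affinity, node_labels))  # False -> "volume node affinity conflict"
```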

To avoid this we need to set `ignore-volume-az = true` in the driver config.
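
A minimal cloud.conf fragment with this flag set (mirroring the multiaz-cloud.conf that the operator generates in 4.8) would look like:

```ini
[Global]
use-clouds = true
clouds-file = /etc/kubernetes/secret/clouds.yaml
cloud = openstack

[BlockStorage]
ignore-volume-az = yes
```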

How reproducible:
Always

Steps to Reproduce:

Create the following objects in the cluster:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware-standard
provisioner: cinder.csi.openstack.org
parameters:
  availability: AZ1
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.cinder.csi.openstack.org/zone
    values:
    - "AZ-0"

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: topology-aware-standard

---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - image: nginx
    imagePullPolicy: IfNotPresent
    name: nginx
    ports:
    - containerPort: 80
      protocol: TCP
    volumeMounts:
      - mountPath: /var/lib/www/data
        name: mydata
  volumes:
  - name: mydata
    persistentVolumeClaim:
      claimName: pvc1
      readOnly: false

Here, AZ-0 is a Nova AZ and AZ1 is a Cinder AZ.

Actual results:

Provisioning of the Pod fails because of scheduling issues

Expected results:

The PV is provisioned and the Pod reaches the Running state.

Additional info:

Upstream Issue: https://github.com/kubernetes/cloud-provider-openstack/issues/1300

Comment 4 rlobillo 2021-06-09 14:04:22 UTC

Verified on 4.8.0-0.nightly-2021-06-08-034312 over OSP16.1 (RHOS-16.1-RHEL-8-20210323.n.0)

Test #1: One compute zone and three volume zones, all with different names; rootVolume enabled for all nodes.

install-config includes:

	compute:
	- name: worker
	  platform:
		openstack:
		  zones: ['AZ-0', 'AZ-0', 'AZ-0']
		  additionalNetworkIDs: []
		  rootVolume:
			size: 25
			type: tripleo
			zones: ['cinderAZ0', 'cinderAZ1', 'cinderAZ0']
	  replicas: 3
	controlPlane:
	  name: master
	  platform:
		openstack:
		  zones: ['AZ-0', 'AZ-0', 'AZ-0']
		  rootVolume:
			size: 25
			type: tripleo
			zones: ['cinderAZ0', 'cinderAZ1', 'cinderAZ0']
	  replicas: 3

where the project has the following availability zones:

	(shiftstack) [stack@undercloud-0 ~]$ openstack availability zone list --compute
	+-----------+-------------+
	| Zone Name | Zone Status |
	+-----------+-------------+
	| AZ-0      | available   |
	+-----------+-------------+
	(shiftstack) [stack@undercloud-0 ~]$ openstack availability zone list --volume
	+-----------+-------------+
	| Zone Name | Zone Status |
	+-----------+-------------+
	| nova      | available   |
	| cinderAZ0 | available   |
	| cinderAZ1 | available   |
	+-----------+-------------+

Once the cluster is up, ignore-volume-az is set:

	$ oc get cm -n openshift-cluster-csi-drivers openstack-cinder-config -o yaml
	apiVersion: v1
	data:
	  cloud.conf: |
		[Global]
		use-clouds = true
		clouds-file = /etc/kubernetes/secret/clouds.yaml
		cloud = openstack
	  multiaz-cloud.conf: |
		[Global]
		use-clouds = true
		clouds-file = /etc/kubernetes/secret/clouds.yaml
		cloud = openstack
		[BlockStorage]
		ignore-volume-az = yes
	kind: ConfigMap
	metadata:
	  creationTimestamp: "2021-06-03T15:30:21Z"
	  name: openstack-cinder-config
	  namespace: openshift-cluster-csi-drivers
	  resourceVersion: "13561"
	  uid: 5d1ab6c0-d4b7-44a2-95ce-073869da611e

Manual test: After loading the manifests below, the pods reach Running status while using PVCs created in different Cinder AZs:

	$ cat test_pvc_azs.yaml
	---
	apiVersion: storage.k8s.io/v1
	kind: StorageClass
	metadata:
	  name: topology-aware-cinder-az0
	provisioner: cinder.csi.openstack.org
	parameters:
	  availability: cinderAZ0
	volumeBindingMode: WaitForFirstConsumer
	---
	apiVersion: v1
	kind: PersistentVolumeClaim
	metadata:
	  name: pvc-cinder-az0
	  namespace: demo
	spec:
	  accessModes:
	  - ReadWriteOnce
	  resources:
		requests:
		  storage: 1Gi
	  storageClassName: topology-aware-cinder-az0
	---
	apiVersion: apps/v1
	kind: Deployment
	metadata:
	  name: demo-0
	  namespace: demo
	spec:
	  replicas: 1
	  selector:
		matchLabels:
		  app: demo-0
		  cinder-az: cinderAZ0
		  nova-az: AZ-0
	  template:
		metadata:
		  labels:
			app: demo-0
			cinder-az: cinderAZ0
			nova-az: AZ-0
		spec:
		  containers:
		  - name: demo
			image: quay.io/kuryr/demo
			ports:
			- containerPort: 80
			  protocol: TCP
			volumeMounts:
			  - mountPath: /var/lib/www/data
				name: mydata
		  nodeSelector:
			topology.cinder.csi.openstack.org/zone: AZ-0
		  volumes:
			- name: mydata
			  persistentVolumeClaim:
				claimName: pvc-cinder-az0
				readOnly: false

	$ cat test_pvc_azs2.yaml
	---
	apiVersion: storage.k8s.io/v1
	kind: StorageClass
	metadata:
	  name: topology-aware-cinder-az1
	provisioner: cinder.csi.openstack.org
	parameters:
	  availability: cinderAZ1
	volumeBindingMode: WaitForFirstConsumer
	---
	apiVersion: v1
	kind: PersistentVolumeClaim
	metadata:
	  name: pvc-cinder-az1
	  namespace: demo
	spec:
	  accessModes:
	  - ReadWriteOnce
	  resources:
		requests:
		  storage: 1Gi
	  storageClassName: topology-aware-cinder-az1
	---
	apiVersion: apps/v1
	kind: Deployment
	metadata:
	  name: demo-1
	  namespace: demo
	spec:
	  replicas: 1
	  selector:
		matchLabels:
		  app: demo-1
	  template:
		metadata:
		  labels:
			app: demo-1
		spec:
		  containers:
		  - name: demo
			image: quay.io/kuryr/demo
			ports:
			- containerPort: 80
			  protocol: TCP
			volumeMounts:
			  - mountPath: /var/lib/www/data
				name: mydata
		  nodeSelector:
			topology.cinder.csi.openstack.org/zone: AZ-0
		  volumes:
			- name: mydata
			  persistentVolumeClaim:
				claimName: pvc-cinder-az1
				readOnly: false

	$ oc get pods -n demo -o wide
	NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE                          NOMINATED NODE   READINESS GATES
	demo-0-857fb67fb7-v6tqp   1/1     Running   0          4d23h   10.129.2.20   ostest-wjzt5-worker-2-cccmm   <none>           <none>
	demo-1-7859fdc774-d96p2   1/1     Running   0          4d23h   10.129.2.19   ostest-wjzt5-worker-2-cccmm   <none>           <none>

	$ oc get machines -A
	NAMESPACE               NAME                          PHASE     TYPE        REGION      ZONE   AGE
	openshift-machine-api   ostest-wjzt5-master-0         Running   m4.xlarge   regionOne   AZ-0   5d1h
	openshift-machine-api   ostest-wjzt5-master-1         Running   m4.xlarge   regionOne   AZ-0   5d1h
	openshift-machine-api   ostest-wjzt5-master-2         Running   m4.xlarge   regionOne   AZ-0   5d1h
	openshift-machine-api   ostest-wjzt5-worker-0-4gjsv   Running   m4.xlarge   regionOne   AZ-0   5d
	openshift-machine-api   ostest-wjzt5-worker-1-7qqcw   Running   m4.xlarge   regionOne   AZ-0   5d
	openshift-machine-api   ostest-wjzt5-worker-2-cccmm   Running   m4.xlarge   regionOne   AZ-0   5d

	$ oc get pvc -n demo 
	NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
	pvc-cinder-az0   Bound    pvc-f5b28358-c883-415f-95d5-51ef458e85a8   1Gi        RWO            topology-aware-cinder-az0   4d23h
	pvc-cinder-az1   Bound    pvc-174a5a6e-e66e-4b30-9f7d-e085e2039f29   1Gi        RWO            topology-aware-cinder-az1   4d23h

	$ openstack volume show pvc-f5b28358-c883-415f-95d5-51ef458e85a8 -c availability_zone
	+-------------------+-----------+
	| Field             | Value     |
	+-------------------+-----------+
	| availability_zone | cinderAZ0 |
	+-------------------+-----------+
	$ openstack volume show pvc-174a5a6e-e66e-4b30-9f7d-e085e2039f29 -c availability_zone
	+-------------------+-----------+
	| Field             | Value     |
	+-------------------+-----------+
	| availability_zone | cinderAZ1 |
	+-------------------+-----------+


The full CSI test suite [1] passed with the availability parameter set to 'cinderAZ0' in the StorageClasses, except for the two TCs mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1917710.

Test #2: Three compute zones and one volume zone, all with different names.

install-config.yaml includes:

compute:
- name: worker
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
      additionalNetworkIDs: []
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['AZ-0','AZ-1','AZ-2']
  replicas: 3

where the project has the following availability zones:

	(shiftstack) [stack@undercloud-0 ~]$ openstack availability zone list --compute
	+-----------+-------------+
	| Zone Name | Zone Status |
	+-----------+-------------+
	| AZ-0      | available   |
	| AZ-1      | available   |
	| AZ-2      | available   |
	+-----------+-------------+

	(shiftstack) [stack@undercloud-0 ~]$ openstack availability zone list --volume
	+-----------+-------------+
	| Zone Name | Zone Status |
	+-----------+-------------+
	| nova      | available   |
	+-----------+-------------+

Once the cluster is up, ignore-volume-az is set:

	$ oc get cm -n openshift-cluster-csi-drivers openstack-cinder-config -o yaml
	apiVersion: v1
	data:
	  cloud.conf: |
		[Global]
		use-clouds = true
		clouds-file = /etc/kubernetes/secret/clouds.yaml
		cloud = openstack
	  multiaz-cloud.conf: |
		[Global]
		use-clouds = true
		clouds-file = /etc/kubernetes/secret/clouds.yaml
		cloud = openstack
		[BlockStorage]
		ignore-volume-az = yes
	kind: ConfigMap
	metadata:
	  creationTimestamp: "2021-06-08T10:10:55Z"
	  name: openstack-cinder-config
	  namespace: openshift-cluster-csi-drivers
	  resourceVersion: "6576"
	  uid: 0eb890e7-e084-443e-9f48-4dc77aa86c07


Test #3: One compute zone and three volume zones; the compute zone and the first volume zone share the same name.

install-config.yaml includes:

compute:
- name: worker
  platform:
    openstack:
      zones: ['nova']
      additionalNetworkIDs: []
  replicas: 3
controlPlane:
  name: master
  platform:
    openstack:
      zones: ['nova']
  replicas: 3

(shiftstack) [stack@undercloud-0 ~]$ openstack availability zone list --compute
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| nova      | available   |
+-----------+-------------+

(shiftstack) [stack@undercloud-0 ~]$ openstack availability zone list --volume
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| nova      | available   |
| cinderAZ0 | available   |
| cinderAZ1 | available   |
+-----------+-------------+


Once the cluster is up, ignore-volume-az is set:

	$ oc get cm -n openshift-cluster-csi-drivers openstack-cinder-config -o yaml
	apiVersion: v1
	data:
	  cloud.conf: |
		[Global]
		use-clouds = true
		clouds-file = /etc/kubernetes/secret/clouds.yaml
		cloud = openstack
	  multiaz-cloud.conf: |
		[Global]
		use-clouds = true
		clouds-file = /etc/kubernetes/secret/clouds.yaml
		cloud = openstack
		[BlockStorage]
		ignore-volume-az = yes
	kind: ConfigMap
	metadata:
	  creationTimestamp: "2021-06-08T15:39:08Z"
	  name: openstack-cinder-config
	  namespace: openshift-cluster-csi-drivers
	  resourceVersion: "5403"
	  uid: f27b4ce2-0319-46b3-88fc-24c8886c8170

Test #4: One compute zone and one volume zone with the same name:

	compute:
	- name: worker
	  platform:
		openstack:
		  zones: []
		  additionalNetworkIDs: []
	  replicas: 2
	controlPlane:
	  name: master
	  platform:
		openstack:
		  zones: []
	  replicas: 3


	(shiftstack) [stack@undercloud-0 ~]$ openstack availability zone list --compute
	+-----------+-------------+
	| Zone Name | Zone Status |
	+-----------+-------------+
	| nova      | available   |
	+-----------+-------------+
	(shiftstack) [stack@undercloud-0 ~]$ openstack availability zone list --volume                                                                                                                                                               
	+-----------+---------------+
	| Zone Name | Zone Status   |
	+-----------+---------------+
	| nova      | available     |
	| cinderAZ0 | not available |
	| cinderAZ1 | not available |
	+-----------+---------------+

Unexpectedly, the flag is enabled, so the following BZ has been filed:
https://bugzilla.redhat.com/show_bug.cgi?id=1969945

However, the impact of having the flag enabled is minor.


[1] https://github.com/openshift/openstack-cinder-csi-driver-operator/blob/master/hack/e2e.sh

Comment 7 errata-xmlrpc 2021-07-27 22:51:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

