Bug 2065597 - Cinder CSI is not configurable
Summary: Cinder CSI is not configurable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Stephen Finucane
QA Contact: rlobillo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-18 09:57 UTC by Martin André
Modified: 2022-08-10 10:55 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The OpenStack Cinder CSI Driver Operator is responsible for configuring the Cinder CSI driver, but it did not provide any way to configure the driver. Consequence: It was not possible to configure this CSI driver, which was problematic for some deployments. Fix: The OpenStack Cinder CSI Driver Operator now attempts to read configuration from the 'cloud-provider-config' config map in the 'openshift-config' namespace, which is user-configurable. The contents must be valid configuration for the Cinder CSI driver. Result: Users can now configure the Cinder CSI driver.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:54:40 UTC
Target Upstream Version:
Embargoed:
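The fix described in the Doc Text reads user-supplied driver configuration from the 'cloud-provider-config' config map. A minimal sketch of what such a configuration could look like (the 'oc' commands are shown as illustrative comments; the INI keys are the ones exercised later in this report, and the local file path is hypothetical):

```shell
# Inspect and edit the user-facing config map the operator reads from
# (requires a live cluster; shown here as comments for illustration):
#   oc get cm cloud-provider-config -n openshift-config -o yaml
#   oc edit cm cloud-provider-config -n openshift-config
#
# The 'config' key must hold valid cinder CSI driver configuration,
# i.e. an INI file along these lines:
cat > /tmp/cinder-csi-config.ini <<'EOF'
[Global]
secret-name = openstack-credentials
secret-namespace = kube-system

[BlockStorage]
ignore-volume-az = no
EOF
grep -c '^\[' /tmp/cinder-csi-config.ini   # count of INI sections
```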




Links
System ID Private Priority Status Summary Last Updated
Github openshift openstack-cinder-csi-driver-operator pull 78 0 None open Bug 2065597: Add support for dynamic, user-managed config 2022-05-05 12:10:45 UTC
Red Hat Issue Tracker RFE-2587 0 None None None 2022-05-05 13:17:36 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:55:14 UTC

Description Martin André 2022-03-18 09:57:19 UTC
The Cinder CSI driver operator uses a static cloud.conf file [1]. This means that it's currently not possible to set options [2] for the driver.

This is essentially the same problem we have with the cloud provider (bug 2049775) but with Cinder CSI.

We should use a similar strategy as detailed in the cloud-provider's cloud.conf upgrade enhancement proposal [3] for the cloud.conf used by Cinder CSI.

[1] https://github.com/openshift/openstack-cinder-csi-driver-operator/blob/master/assets/configmap.yaml
[2] https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/using-cinder-csi-plugin.md#driver-config
[3] https://github.com/openshift/enhancements/pull/1009

Comment 1 ShiftStack Bugwatcher 2022-03-24 07:06:32 UTC
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing

Comment 2 Gregory Charot 2022-05-05 13:18:32 UTC
This BZ solves the following Jira RFE https://issues.redhat.com/browse/RFE-2587

Comment 5 rlobillo 2022-06-08 11:43:26 UTC
Verified on 4.11.0-0.nightly-2022-06-06-025509 on top of RHOS-16.2-RHEL-8-20220311.n.1.

On a cluster that requires the 'ignore-volume-az = yes' parameter, the user can now disable it, and the change is confirmed to be applied.

 *Note1: rescan-on-resize: configuring this parameter breaks the cluster. Kubelet does not accept it and the node is stuck in NotReady. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2077933
 
 *Note2: node-volume-attach-limit: despite appearing in cloud-conf, the change does not appear to be applied; creating more PVCs than the limit on the same worker is accepted. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2094829

Verification steps:

Cluster running with multiAZ+rootVolumes+Manila after successful IPI installation:

	$ oc get clusterversion
	NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
	version   4.11.0-0.nightly-2022-06-06-025509   True        False         5m17s   Cluster version is 4.11.0-0.nightly-2022-06-06-025509

	$ oc get machines -A
	NAMESPACE               NAME                          PHASE     TYPE        REGION      ZONE      AGE
	openshift-machine-api   ostest-6w4hg-master-0         Running   m4.xlarge   regionOne   AZhci-0   62m
	openshift-machine-api   ostest-6w4hg-master-1         Running   m4.xlarge   regionOne   AZhci-2   62m
	openshift-machine-api   ostest-6w4hg-master-2         Running   m4.xlarge   regionOne   AZhci-1   62m
	openshift-machine-api   ostest-6w4hg-worker-0-tzjpg   Running   m4.xlarge   regionOne   AZhci-0   54m
	openshift-machine-api   ostest-6w4hg-worker-1-rjx42   Running   m4.xlarge   regionOne   AZhci-2   54m
	openshift-machine-api   ostest-6w4hg-worker-2-hmrvk   Running   m4.xlarge   regionOne   AZhci-1   54m

	$ openstack server list --long -c Name -c Networks -c 'Availability Zone'
	+-----------------------------+--------------------------------------------------------------+-------------------+
	| Name                        | Networks                                                     | Availability Zone |
	+-----------------------------+--------------------------------------------------------------+-------------------+
	| ostest-6w4hg-worker-0-tzjpg | StorageNFS=172.17.5.168; ostest-6w4hg-openshift=10.196.2.213 | AZhci-0           |
	| ostest-6w4hg-worker-2-hmrvk | StorageNFS=172.17.5.184; ostest-6w4hg-openshift=10.196.2.71  | AZhci-1           |
	| ostest-6w4hg-worker-1-rjx42 | StorageNFS=172.17.5.221; ostest-6w4hg-openshift=10.196.0.22  | AZhci-2           |
	| ostest-6w4hg-master-2       | ostest-6w4hg-openshift=10.196.2.220                          | AZhci-1           |
	| ostest-6w4hg-master-1       | ostest-6w4hg-openshift=10.196.2.80                           | AZhci-2           |
	| ostest-6w4hg-master-0       | ostest-6w4hg-openshift=10.196.0.205                          | AZhci-0           |
	+-----------------------------+--------------------------------------------------------------+-------------------+


	$ for i in $(openstack volume list -c Name -f value); do echo "# $i"; openstack volume show $i -c availability_zone -c name -c description -f value; echo ; done
	# ostest-6w4hg-worker-0-tzjpg-root
	cinderAZ0
	Root volume for ostest-6w4hg-worker-0-tzjpg
	ostest-6w4hg-worker-0-tzjpg-root

	# ostest-6w4hg-worker-2-hmrvk-root
	cinderAZ0
	Root volume for ostest-6w4hg-worker-2-hmrvk
	ostest-6w4hg-worker-2-hmrvk-root

	# ostest-6w4hg-worker-1-rjx42-root
	cinderAZ1
	Root volume for ostest-6w4hg-worker-1-rjx42
	ostest-6w4hg-worker-1-rjx42-root

	# ostest-6w4hg-master-1
	cinderAZ1
	Created By OpenShift Installer
	ostest-6w4hg-master-1

	# ostest-6w4hg-master-2
	cinderAZ0
	Created By OpenShift Installer
	ostest-6w4hg-master-2

	# ostest-6w4hg-master-0
	cinderAZ0
	Created By OpenShift Installer
	ostest-6w4hg-master-0

Existing config before performing any change:

	$ oc get cm cloud-provider-config -n openshift-config -o yaml
	apiVersion: v1
	data:
	  ca-bundle.pem: |
		-----BEGIN CERTIFICATE-----
		MIIEjjCCAvagAwIBAgIBATANBgkqhkiG9w0BAQsFADA3MRUwEwYDVQQKDAxSRURI
		QVQuTE9DQUwxHjAcBgNVBAMMFUNlcnRpZmljYXRlIEF1dGhvcml0eTAeFw0yMjA2
		MDMyMDA2MTFaFw00MjA2MDMyMDA2MTFaMDcxFTATBgNVBAoMDFJFREhBVC5MT0NB
		TDEeMBwGA1UEAwwVQ2VydGlmaWNhdGUgQXV0aG9yaXR5MIIBojANBgkqhkiG9w0B
		AQEFAAOCAY8AMIIBigKCAYEA1Etjh3AD96m9m7+SSo34m4LED1e8kfGOHDxWZju+
		DlxqRW/ziS7pscGobgH9I5En7ALmBo68kx1Lq9XA9epDv63spuwJzYHS/L8v3+0l
		7RdBe/BeoWcbLha9QcWSaLYkR45hyZF1apHto5xutYTV4VBiUzNCQWoXhg0FaP/t
		qNkLM/CURuMI6LX50odl3IUFgiF3+/j4F5EJzApfU2bBMXXXn6Tt5PkXysrjRitz
		nfPd/j+Ygw8LJEiTz8fl6qysXjyeWgovurBGcfL1OZt29G7bwMu7XRpTxsD6JcNp
		3KT6RTkS9U/9YQeFM320meJ1Ieuh/FZk7Mt/yZaPVOE+pl01deINWHtk5eP5sgu0
		3ivI3VCqjAaP0SYAdEBNvo4A3cN9Kh/g4B/ihDScMpR5vNjwJBTHRIV3qMdNvJUW
		171NuDbT6mHe/LMQMaHWaK86zUtkyAg3INjk1vY6rJNAZw9sTY4OLW0I/kNE3bHK
		9WQWMlf/WZIJKF/gje3x1pjfAgMBAAGjgaQwgaEwHwYDVR0jBBgwFoAUAp8QxbZh
		Qa5wdoVf+PB2MjE7W6swDwYDVR0TAQH/BAUwAwEB/zAOBgNVHQ8BAf8EBAMCAcYw
		HQYDVR0OBBYEFAKfEMW2YUGucHaFX/jwdjIxO1urMD4GCCsGAQUFBwEBBDIwMDAu
		BggrBgEFBQcwAYYiaHR0cDovL2lwYS1jYS5yZWRoYXQubG9jYWwvY2Evb2NzcDAN
		BgkqhkiG9w0BAQsFAAOCAYEAn6DTN9hlQOCifBR1kSHywd3wJOnrUUCeKGs6I8xK
		LSMyiHKBdPe7zt4L1/yL6H8KQayzThgKR2rUCdUn7eFXgbXcK5GYAuJ82AZPxb4H
		mxB4CLNCkNAKNCKn6pjHZa39wnnOjdTPCjSiklk1lkZyiTNeiE37wuWA469wugNE
		5o/rcS0UM1BAT6dLcFHOJPWm1J1aXDBuhHYl7e3wWjHAR5QwijvMUnguAvu3Qber
		LHTBxqD/qN2fR1WmfVZ2NVu5t8eAzzJOBlJs/eTA6gBLUgLgA38mx1i67wSbAclC
		b/9gIUZKKr2ZSB+gmkDtkbtznql9NMO0NLdwGJqdlvfFrG+WM8ZV8MuUChoUgS4P
		kvZyAy8/e0gRLi5WH5ig+JvTdknZN+eE2UL9JKnNpefXskDljQkltXrGiKnwpIPk
		bxYCbFNx53aoWw1pjQu+2zodoRE5x2KpHOOvKrKk7eiz02qk4vgVfWMq0qwat9or
		9dGditlV0YGYoQxGUn8TJjfQ
		-----END CERTIFICATE-----
	  config: |
		[Global]
		secret-name = openstack-credentials
		secret-namespace = kube-system
		region = regionOne
		ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
		[LoadBalancer]
		use-octavia = True
	kind: ConfigMap
	metadata:
	  creationTimestamp: "2022-06-08T09:22:04Z"
	  name: cloud-provider-config
	  namespace: openshift-config
	  resourceVersion: "1864"
	  uid: 3eff3b69-379a-4cfb-8bd8-2b4672bbe11c


	$ oc get cm -n openshift-cluster-csi-drivers cloud-conf -o yaml
	apiVersion: v1
	data:
	  cloud.conf: |+
		[Global]
		region      = regionOne
		ca-file     = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
		use-clouds  = true
		clouds-file = /etc/kubernetes/secret/clouds.yaml
		cloud       = openstack

		[LoadBalancer]
		use-octavia = True

		[BlockStorage]
		ignore-volume-az = yes

	kind: ConfigMap
	metadata:
	  creationTimestamp: "2022-06-08T09:26:41Z"
	  name: cloud-conf
	  namespace: openshift-cluster-csi-drivers
	  resourceVersion: "9119"
	  uid: e1ca6a43-8e63-49f3-a5d4-26c8761a2b55

A pod running in a Nova AZ, using a PVC in a Cinder AZ with a different name, is up and running (meaning 'ignore-volume-az = yes' is working as expected):

	$ oc get pod -o wide
	NAME                      READY   STATUS    RESTARTS   AGE    IP            NODE                          NOMINATED NODE   READINESS GATES
	demo-0-78677fb76d-qggxm   1/1     Running   0          2m5s   10.128.2.13   ostest-6w4hg-worker-0-tzjpg   <none>           <none>

	$ oc get pvc -o wide
	NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE     VOLUMEMODE
	pvc-0   Bound    pvc-7d214c10-16c9-4d1f-91b7-29668cdae5cb   1Gi        RWO            topology-aware-0   2m40s   Filesystem

	$ openstack volume show pvc-7d214c10-16c9-4d1f-91b7-29668cdae5cb -c 'availability_zone'
	+-------------------+-----------+
	| Field             | Value     |
	+-------------------+-----------+
	| availability_zone | cinderAZ0 |
	+-------------------+-----------+

Performing the change below on cloud-provider-config (per https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/using-cinder-csi-plugin.md#driver-config), with the goal of disabling the ignore-volume-az flag:
	  [...]
	  config: |
		[Global]
		secret-name = openstack-credentials
		secret-namespace = kube-system
		region = regionOne
		ca-file = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
		[LoadBalancer]
		use-octavia = True

		[BlockStorage]
		node-volume-attach-limit = 5
		ignore-volume-az = no

		[Metadata]
		search-order = metadataService

	kind: ConfigMap
	metadata:
	  creationTimestamp: "2022-06-07T13:53:31Z"
	  name: cloud-provider-config
	  namespace: openshift-config
	  resourceVersion: "1609"
	  uid: 8cd96602-c562-4d9e-917d-82360428e40f
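The edit above was made by hand via the config map. A scripted equivalent could look like the sketch below (assuming the 'config' key holds the full INI text; the file name is illustrative, and the cluster command is shown as a comment):

```shell
# Toggle ignore-volume-az in a local copy of the 'config' key, then
# push it back. The text manipulation is the runnable part of the sketch.
cat > /tmp/config.ini <<'EOF'
[BlockStorage]
node-volume-attach-limit = 5
ignore-volume-az = yes
EOF
sed -i 's/^ignore-volume-az = yes$/ignore-volume-az = no/' /tmp/config.ini
# On a live cluster, one way to push the updated key back (illustrative):
#   oc set data cm/cloud-provider-config -n openshift-config \
#     --from-file=config=/tmp/config.ini
grep '^ignore-volume-az' /tmp/config.ini
```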

Re-configuration starts:

	$ date
	Wed Jun  8 10:39:38 UTC 2022
	$ oc get nodes
	NAME                          STATUS                     ROLES    AGE   VERSION
	ostest-6w4hg-master-0         Ready,SchedulingDisabled   master   76m   v1.24.0+bb9c2f1
	ostest-6w4hg-master-1         Ready                      master   76m   v1.24.0+bb9c2f1
	ostest-6w4hg-master-2         Ready                      master   76m   v1.24.0+bb9c2f1
	ostest-6w4hg-worker-0-tzjpg   Ready,SchedulingDisabled   worker   62m   v1.24.0+bb9c2f1
	ostest-6w4hg-worker-1-rjx42   Ready                      worker   63m   v1.24.0+bb9c2f1
	ostest-6w4hg-worker-2-hmrvk   Ready                      worker   61m   v1.24.0+bb9c2f1

Re-configuration ends:

	$ date 
	Wed Jun  8 11:07:04 UTC 2022
	$ oc get nodes
	NAME                          STATUS   ROLES    AGE    VERSION
	ostest-6w4hg-master-0         Ready    master   104m   v1.24.0+bb9c2f1
	ostest-6w4hg-master-1         Ready    master   104m   v1.24.0+bb9c2f1
	ostest-6w4hg-master-2         Ready    master   104m   v1.24.0+bb9c2f1
	ostest-6w4hg-worker-0-tzjpg   Ready    worker   89m    v1.24.0+bb9c2f1
	ostest-6w4hg-worker-1-rjx42   Ready    worker   91m    v1.24.0+bb9c2f1
	ostest-6w4hg-worker-2-hmrvk   Ready    worker   89m    v1.24.0+bb9c2f1

Change is applied:

	$ oc get cm -n openshift-cluster-csi-drivers cloud-conf -o yaml
	apiVersion: v1
	data:
	  cloud.conf: |+
		[Global]
		region      = regionOne
		ca-file     = /etc/kubernetes/static-pod-resources/configmaps/cloud-config/ca-bundle.pem
		use-clouds  = true
		clouds-file = /etc/kubernetes/secret/clouds.yaml
		cloud       = openstack

		[LoadBalancer]
		use-octavia = True

		[BlockStorage]
		node-volume-attach-limit = 5
		ignore-volume-az         = no

	kind: ConfigMap
	metadata:
	  creationTimestamp: "2022-06-08T09:26:41Z"
	  name: cloud-conf
	  namespace: openshift-cluster-csi-drivers
	  resourceVersion: "56619"
	  uid: e1ca6a43-8e63-49f3-a5d4-26c8761a2b55

Creating the same pod again (running in a Nova AZ, using a PVC in a Cinder AZ with a different name) leaves it stuck in Pending (meaning 'ignore-volume-az = no' is working as expected):
	$ oc get pods -o wide
	NAME                      READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
	demo-0-78677fb76d-bwrxq   0/1     Pending   0          82s   <none>   <none>   <none>           <none>
	$ oc get pvc -o wide
	NAME    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE   VOLUMEMODE
	pvc-0   Bound    pvc-18394212-5ac8-47f6-a88d-6036f64dd5ad   1Gi        RWO            topology-aware-0   86s   Filesystem
	$ openstack volume show pvc-18394212-5ac8-47f6-a88d-6036f64dd5ad -c 'availability_zone'
	+-------------------+-----------+
	| Field             | Value     |
	+-------------------+-----------+
	| availability_zone | cinderAZ0 |
	+-------------------+-----------+
	$ oc describe pod/demo-0-78677fb76d-bwrxq | tail -5
	Events:
	  Type     Reason            Age    From               Message
	  ----     ------            ----   ----               -------
	  Warning  FailedScheduling  2m17s  default-scheduler  running PreBind plugin "VolumeBinding": binding volumes: pv "pvc-18394212-5ac8-47f6-a88d-6036f64dd5ad" node affinity doesn't match node "ostest-6w4hg-worker-0-tzjpg": no matching NodeSelectorTerms
	  Warning  FailedScheduling  2m15s  default-scheduler  0/6 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 4 node(s) didn't match Pod's node affinity/selector, 6 node(s) had volume node affinity conflict. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
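The 'volume node affinity conflict' above follows from the scheduler comparing the PV's topology requirement with each node's zone: with 'ignore-volume-az = no', the PV carries the Cinder AZ, and no node advertises a matching zone. A toy model of that comparison (AZ values taken from this report; the real check uses node topology labels, which this sketch simplifies):

```shell
# Simplified model of the scheduler's volume node affinity check:
# the PV requires the Cinder AZ, nodes only expose Nova AZs, so no
# node matches and every node reports a conflict.
volume_az="cinderAZ0"                     # from 'openstack volume show'
node_zones="AZhci-0 AZhci-1 AZhci-2"      # Nova AZs of the nodes above
conflict=yes
for z in $node_zones; do
  [ "$z" = "$volume_az" ] && conflict=no
done
echo "volume node affinity conflict: $conflict"
```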

Comment 8 errata-xmlrpc 2022-08-10 10:54:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

