Bug 2035027 - IBM VPC block CSI driver uses deprecated in-tree topology keys
Summary: IBM VPC block CSI driver uses deprecated in-tree topology keys
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: OpenShift Storage Bugzilla Bot
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-22 18:35 UTC by Jonathan Dobson
Modified: 2023-02-10 16:38 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-10 16:38:15 UTC
Target Upstream Version:
Embargoed:


Description Jonathan Dobson 2021-12-22 18:35:58 UTC
The IBM VPC block CSI driver is using the following zone and region labels:

https://github.com/IBM/ibm-csi-common/blob/ce654faf168d6c4d9f90c5d8bc99ee4b2bd33ea2/pkg/utils/constants.go#L55-L59

	// NodeZoneLabel  Zone Label attached to node
	NodeZoneLabel = "failure-domain.beta.kubernetes.io/zone"

	// NodeRegionLabel Region Label attached to node
	NodeRegionLabel = "failure-domain.beta.kubernetes.io/region"


Those labels are used by ibm-vpc-block-csi-driver and ibm-vpc-node-label-updater:

https://github.com/openshift/ibm-vpc-block-csi-driver/blob/d54e3706bb8b38447800aa91632a946eb6c990ec/pkg/ibmcsidriver/controller_helper.go#L440-L441

https://github.com/openshift/ibm-vpc-node-label-updater/blob/8e220983b2c2efdfb67eabe4d74cb35bbeca552e/pkg/nodeupdater/utils.go#L41-L42


And as a result, the e2e test manifest for the operator references the same label:

https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/32c4cb8817becc47133d3b0a556cdd706117412a/test/e2e/manifest.yaml#L17


But those are deprecated in-tree labels:

https://github.com/kubernetes/kubernetes/blob/ce9219688ff46b59cc210d880c4cd3af15516c73/staging/src/k8s.io/cloud-provider/cloud.go#L306


They should be updated to provider-specific labels for IBM Cloud, similar to what the AWS and GCP drivers do:

https://github.com/openshift/aws-ebs-csi-driver-operator/blob/ef20d086a1efcbe9f6b1b716de83c1cc734b6519/test/e2e/manifest.yaml#L17
https://github.com/openshift/gcp-pd-csi-driver-operator/blob/10a76a928fe316537bd86e208e79a302e2095a5d/test/e2e/manifest.yaml#L17
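To illustrate the suggested direction, here is a minimal Go sketch of a helper that reads a node's zone and region and reports them under driver-owned topology keys, in the style of keys such as topology.ebs.csi.aws.com/zone. The two driverZoneKey/driverRegionKey names and the helper functions are hypothetical and are not taken from the IBM driver; the sketch prefers the GA topology.kubernetes.io labels and only falls back to the deprecated beta labels.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	// Deprecated beta labels, as quoted in this report.
	betaZoneLabel   = "failure-domain.beta.kubernetes.io/zone"
	betaRegionLabel = "failure-domain.beta.kubernetes.io/region"

	// GA well-known labels that superseded the beta ones upstream.
	stableZoneLabel   = "topology.kubernetes.io/zone"
	stableRegionLabel = "topology.kubernetes.io/region"

	// HYPOTHETICAL provider-specific topology keys (assumed names,
	// not taken from the IBM VPC block CSI driver).
	driverZoneKey   = "topology.vpc.block.csi.ibm.io/zone"
	driverRegionKey = "topology.vpc.block.csi.ibm.io/region"
)

// firstLabel returns the value of the first non-empty key found in labels.
func firstLabel(labels map[string]string, keys ...string) (string, bool) {
	for _, k := range keys {
		if v, ok := labels[k]; ok && v != "" {
			return v, true
		}
	}
	return "", false
}

// topologySegments derives the topology segments a node plugin could
// report, preferring the GA labels over the deprecated beta labels.
func topologySegments(nodeLabels map[string]string) (map[string]string, error) {
	zone, ok := firstLabel(nodeLabels, stableZoneLabel, betaZoneLabel)
	if !ok {
		return nil, errors.New("node has no zone label")
	}
	region, ok := firstLabel(nodeLabels, stableRegionLabel, betaRegionLabel)
	if !ok {
		return nil, errors.New("node has no region label")
	}
	return map[string]string{driverZoneKey: zone, driverRegionKey: region}, nil
}

func main() {
	// Example node that still carries only the deprecated labels
	// (the label values are illustrative).
	labels := map[string]string{betaZoneLabel: "eu-gb-2", betaRegionLabel: "eu-gb"}
	segs, err := topologySegments(labels)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(segs[driverZoneKey], segs[driverRegionKey])
}
```

With driver-owned keys, the e2e manifest and any StorageClass allowedTopologies would reference the driver's key instead of the in-tree label, so the fallback to the beta labels could later be dropped without changing what the driver advertises.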

Comment 1 Jonathan Dobson 2022-03-28 17:25:32 UTC
Trying to get a passing test run on https://github.com/openshift/release/pull/24720 and the topology tests in e2e-ibmcloud-csi are failing each time. I think it's related to this bug?

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/24720/rehearse-24720-pull-ci-openshift-ibm-vpc-block-csi-driver-operator-master-e2e-ibmcloud-csi/1508442934234583040

External Storage [Driver: vpc.block.csi.ibm.io] [Testpattern: Dynamic PV (delayed binding)] topology should provision a volume and schedule a pod with AllowedTopologies (6m36s)
{  fail [k8s.io/kubernetes.0/test/e2e/storage/testsuites/topology.go:180]: Unexpected error:
    <*errors.errorString | 0xc000300c60>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred}

External Storage [Driver: vpc.block.csi.ibm.io] [Testpattern: Dynamic PV (immediate binding)] topology should provision a volume and schedule a pod with AllowedTopologies (6m26s)
{  fail [k8s.io/kubernetes.0/test/e2e/storage/testsuites/topology.go:180]: Unexpected error:
    <*errors.errorString | 0xc000344c70>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred}

We find the failure-domain.beta.kubernetes.io/zone:eu-gb-2 domain and use it in the AllowedTopologies of the StorageClass created for the PVC:

blob:https://prow.ci.openshift.org/119bde49-651c-46be-9c8f-4c50cf188191

Mar 28 15:06:48.855: INFO: found topology map[failure-domain.beta.kubernetes.io/zone:eu-gb-1]
Mar 28 15:06:48.855: INFO: found topology map[failure-domain.beta.kubernetes.io/zone:eu-gb-2]
Mar 28 15:06:48.953: INFO: Creating storage class object and pvc object for driver - sc: &StorageClass{ObjectMeta:{e2e-topology-8510-e2e-scj6dv6    1f53768b-fdf4-4ca6-a729-183a411e314f  0 2022-03-28 14:30:04 +0000 UTC <nil> <nil> map[addonmanager.kubernetes.io/mode:Reconcile app:ibm-vpc-block-csi-driver razee/force-apply:true] map[storageclass.kubernetes.io/is-default-class:true] [] []  [{ibm-vpc-block-csi-driver-operator Update storage.k8s.io/v1 2022-03-28 14:30:04 +0000 UTC FieldsV1 {"f:allowVolumeExpansion":{},"f:metadata":{"f:annotations":{".":{},"f:storageclass.kubernetes.io/is-default-class":{}},"f:labels":{".":{},"f:addonmanager.kubernetes.io/mode":{},"f:app":{},"f:razee/force-apply":{}}},"f:parameters":{".":{},"f:csi.storage.k8s.io/fstype":{},"f:encrypted":{},"f:encryptionKey":{},"f:profile":{},"f:region":{},"f:resourceGroup":{},"f:tags":{},"f:zone":{}},"f:provisioner":{},"f:reclaimPolicy":{},"f:volumeBindingMode":{}} }]},Provisioner:vpc.block.csi.ibm.io,Parameters:map[string]string{csi.storage.k8s.io/fstype: ext4,encrypted: false,encryptionKey: ,profile: 10iops-tier,region: ,resourceGroup: ,tags: ,zone: ,},ReclaimPolicy:*Delete,MountOptions:[],AllowVolumeExpansion:*true,VolumeBindingMode:*WaitForFirstConsumer,AllowedTopologies:[]TopologySelectorTerm{{[{failure-domain.beta.kubernetes.io/zone [eu-gb-2]}]},},}, pvc: &PersistentVolumeClaim{ObjectMeta:{ pvc- e2e-topology-8510    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{10737418240 0} {<nil>} 10Gi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*e2e-topology-8510-e2e-scj6dv6,VolumeMode:nil,DataSource:nil,DataSourceRef:nil,},Status:PersistentVolumeClaimStatus{Phase:,AccessModes:[],Capacity:ResourceList{},Conditions:[]PersistentVolumeClaimCondition{},AllocatedResources:ResourceList{},ResizeStatus:nil,},}

But the pod fails to schedule with "no matching NodeSelectorTerms":

STEP: Collecting events from namespace "e2e-topology-8510".
STEP: Found 8 events.
Mar 28 15:13:22.131: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod-b2efbccd-6e14-48f7-8c79-5238e5764a53: { } FailedScheduling: running PreBind plugin "VolumeBinding": binding volumes: pv "pvc-9a1ebaea-cf98-4647-91b3-76db18ba2c58" node affinity doesn't match node "ci-op-vpizx7dl-baab4-d7chf-worker-2-tgfwq": no matching NodeSelectorTerms
Mar 28 15:13:22.131: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod-b2efbccd-6e14-48f7-8c79-5238e5764a53: { } FailedScheduling: 0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had volume node affinity conflict.
Mar 28 15:13:22.131: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod-b2efbccd-6e14-48f7-8c79-5238e5764a53: { } FailedScheduling: 0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had volume node affinity conflict.
Mar 28 15:13:22.131: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for pod-b2efbccd-6e14-48f7-8c79-5238e5764a53: { } FailedScheduling: skip schedule deleting pod: e2e-topology-8510/pod-b2efbccd-6e14-48f7-8c79-5238e5764a53
Mar 28 15:13:22.131: INFO: At 2022-03-28 15:06:49 +0000 UTC - event for pvc-t4bhj: {persistentvolume-controller } WaitForFirstConsumer: waiting for first consumer to be created before binding
Mar 28 15:13:22.131: INFO: At 2022-03-28 15:06:49 +0000 UTC - event for pvc-t4bhj: {vpc.block.csi.ibm.io_ibm-vpc-block-csi-controller-6496fb9dff-sqhfn_b57afa79-91de-4dfc-a1aa-0a82953825c6 } Provisioning: External provisioner is provisioning volume for claim "e2e-topology-8510/pvc-t4bhj"
Mar 28 15:13:22.131: INFO: At 2022-03-28 15:06:49 +0000 UTC - event for pvc-t4bhj: {persistentvolume-controller } ExternalProvisioning: waiting for a volume to be created, either by external provisioner "vpc.block.csi.ibm.io" or manually created by system administrator
Mar 28 15:13:22.131: INFO: At 2022-03-28 15:07:14 +0000 UTC - event for pvc-t4bhj: {vpc.block.csi.ibm.io_ibm-vpc-block-csi-controller-6496fb9dff-sqhfn_b57afa79-91de-4dfc-a1aa-0a82953825c6 } ProvisioningSucceeded: Successfully provisioned volume pvc-9a1ebaea-cf98-4647-91b3-76db18ba2c58
Mar 28 15:13:22.226: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Mar 28 15:13:22.226: INFO: 
Mar 28 15:13:22.416: INFO: skipping dumping cluster info - cluster too large
STEP: Destroying namespace "e2e-topology-8510" for this suite.
fail [k8s.io/kubernetes.0/test/e2e/storage/testsuites/topology.go:180]: Unexpected error:
    <*errors.errorString | 0xc000300c60>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred
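
For readers unfamiliar with the "no matching NodeSelectorTerms" event above, the following Go sketch shows, in simplified form, how a volume's required topology terms are compared against a node's labels. It is not the scheduler's code: the term type, the helper names, and the node labels are hypothetical, only exact "value in list" matching is modelled, and it makes no claim about why this particular run failed.

```go
package main

import "fmt"

// term is a simplified stand-in for one NodeSelectorTerm: every key
// must be present on the node with one of the allowed values.
type term map[string][]string

// matchesTerm reports whether nodeLabels satisfy every requirement in t.
// An empty term matches nothing, mirroring the Kubernetes API rule.
func matchesTerm(t term, nodeLabels map[string]string) bool {
	if len(t) == 0 {
		return false
	}
	for key, allowed := range t {
		val, ok := nodeLabels[key]
		if !ok {
			return false
		}
		found := false
		for _, a := range allowed {
			if a == val {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}

// matchesAny reports whether at least one term matches (terms are ORed).
func matchesAny(terms []term, nodeLabels map[string]string) bool {
	for _, t := range terms {
		if matchesTerm(t, nodeLabels) {
			return true
		}
	}
	return false
}

func main() {
	const zoneKey = "failure-domain.beta.kubernetes.io/zone"

	// A volume that may only be used from zone eu-gb-2.
	pvTerms := []term{{zoneKey: {"eu-gb-2"}}}

	// Hypothetical nodes, for illustration only.
	nodeInZone1 := map[string]string{zoneKey: "eu-gb-1"}
	nodeInZone2 := map[string]string{zoneKey: "eu-gb-2"}

	fmt.Println(matchesAny(pvTerms, nodeInZone1))
	fmt.Println(matchesAny(pvTerms, nodeInZone2))
}
```

A node is rejected whenever any key in the term is missing from its labels or carries a different value, which is the condition the "volume node affinity conflict" events describe.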

Comment 2 Jonathan Dobson 2022-04-08 23:17:11 UTC
(In reply to Jonathan Dobson from comment #1)
> Trying to get a passing test run on
> https://github.com/openshift/release/pull/24720 and the topology tests in
> e2e-ibmcloud-csi are failing each time. I think it's related to this bug?

Nope, this turned out to be a completely separate issue.
See https://bugzilla.redhat.com/show_bug.cgi?id=2073617 for details.

Leaving this bug (2035027) open to address the original issue of using deprecated failure-domain.beta labels.

