Description of problem:
After upgrading OCP from 4.9 to 4.10 and making the appropriate changes for the CSI deployment (changing the VMX version and setting the proper permissions), the CSI driver was deployed. After deployment, the CSI controller pods are constantly restarted and several ReplicaSets exist for the controller:

NAME                                              DESIRED   CURRENT   READY   AGE
vmware-vsphere-csi-driver-controller-5f768dbffb   0         0         0       16m
vmware-vsphere-csi-driver-controller-6c47778856   1         1         1       23m
vmware-vsphere-csi-driver-controller-6fcf8d669d   0         0         0       17m
vmware-vsphere-csi-driver-controller-7d4d7dc494   1         0         0       17m

The provisioning and attachment operations are working fine so far.

Version-Release number of selected component (if applicable):
OCP 4.10.17

How reproducible:
All the time in a specific environment.

Steps to Reproduce:
1. Upgrade OCP from 4.9 to 4.10
2. Make the required changes for the CSI deployment

Actual results:
The CSI controller pod is constantly redeployed.

Expected results:
The CSI controller pod is not restarted.

Additional info:
I am still unsure what is causing this behaviour, but I found the following very strange entries in the KCM logs:

./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:12.324375839Z I0713 05:35:12.324358 1 replica_set.go:563] "Too few replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" need=2 creating=1
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:12.352374793Z I0713 05:35:12.352333 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:12.362557790Z I0713 05:35:12.362517 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: vmware-vsphere-csi-driver-controller-6c47778856-bn42l"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:12.385904435Z I0713 05:35:12.385838 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.266629870Z I0713 05:35:13.266578 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on replicasets.apps \"vmware-vsphere-csi-driver-controller-5f768dbffb\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.292255781Z I0713 05:35:13.292210 1 replica_set.go:599] "Too many replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" need=1 deleting=1
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.292284049Z I0713 05:35:13.292257 1 replica_set.go:227] "Found related ReplicaSets" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" relatedReplicaSets=[vmware-vsphere-csi-driver-controller-7d4d7dc494 vmware-vsphere-csi-driver-controller-6fcf8d669d vmware-vsphere-csi-driver-controller-5f768dbffb vmware-vsphere-csi-driver-controller-6c47778856]
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.292370500Z I0713 05:35:13.292350 1 controller_utils.go:592] "Deleting pod" controller="vmware-vsphere-csi-driver-controller-6c47778856" pod="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856-bn42l"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.293773800Z I0713 05:35:13.293719 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled down replica set vmware-vsphere-csi-driver-controller-6c47778856 to 1"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.303790403Z I0713 05:35:13.303742 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.316508686Z I0713 05:35:13.316470 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulDelete" message="Deleted pod: vmware-vsphere-csi-driver-controller-6c47778856-bn42l"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.327327787Z I0713 05:35:13.324156 1 replica_set.go:563] "Too few replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" need=1 creating=1
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.327327787Z I0713 05:35:13.325246 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set vmware-vsphere-csi-driver-controller-5f768dbffb to 1"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.346966178Z I0713 05:35:13.346907 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: vmware-vsphere-csi-driver-controller-5f768dbffb-476sn"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.348045597Z I0713 05:35:13.348001 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.398192442Z I0713 05:35:13.398145 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:14.262704020Z I0713 05:35:14.262661 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on replicasets.apps \"vmware-vsphere-csi-driver-controller-6fcf8d669d\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:14.290775599Z I0713 05:35:14.290733 1 replica_set.go:599] "Too many replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" need=0 deleting=1

It appears that the replica count of the deployment is fluctuating between 0, 1 and 2 very frequently. I am not sure why this could be happening.
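To make the fluctuation easier to see, the "Too few replicas" / "Too many replicas" entries can be reduced to the sequence of need= values per ReplicaSet. The following is only a rough sketch: the regular expression is inferred from the sample lines quoted in this comment, not from any documented log format, and desired_replica_history is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Field layout assumed from the replica_set.go lines quoted in this report.
PATTERN = re.compile(
    r'"Too (?:few|many) replicas" replicaSet="(?P<rs>[^"]+)" need=(?P<need>\d+)'
)


def desired_replica_history(log_text):
    """Return {replicaSet: [need, ...]} in the order the entries appear."""
    history = defaultdict(list)
    for match in PATTERN.finditer(log_text):
        history[match.group("rs")].append(int(match.group("need")))
    return dict(history)


# Four entries copied from the KCM log above (grep file prefix removed).
SAMPLE = '''
I0713 05:35:12.324358 1 replica_set.go:563] "Too few replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" need=2 creating=1
I0713 05:35:13.292210 1 replica_set.go:599] "Too many replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" need=1 deleting=1
I0713 05:35:13.324156 1 replica_set.go:563] "Too few replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" need=1 creating=1
I0713 05:35:14.290733 1 replica_set.go:599] "Too many replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" need=0 deleting=1
'''

if __name__ == "__main__":
    for replica_set, needs in desired_replica_history(SAMPLE).items():
        print(replica_set, needs)
```

Within about two seconds, one ReplicaSet goes from 2 to 1 desired replicas and another from 1 to 0, which is consistent with the deployment being rolled out repeatedly.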
Hi Hemant, the customer commented on the upgrade method: they updated from 4.6 (via 4.8) to 4.10. Is there any information I can ask them for to help you investigate this issue?
Thanks for the update! The customer has provided the requested data:

https://attachments.access.redhat.com/hydra/rest/cases/03243224/attachments/e8902454-afaf-4ae5-a255-2701eb179118?usePresignedUrl=true
https://attachments.access.redhat.com/hydra/rest/cases/03243224/attachments/a8adad24-0aef-4f7c-babd-81f97a01c0f3?usePresignedUrl=true

Please let me know if I should ask for any further data.
Hi @hekumar @jsafrane, is there any other data I should ask the customer for? Do we have everything we need to work on the bug? Thanks!
This was happening because, when more than one set of credentials is present in the secret, the code can arbitrarily pick one of them, so the secret it generates may keep changing (thereby causing deployment rollouts). For now, we are going to warn users when this happens, via https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/104
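As a rough illustration of the kind of check involved (not the operator's actual code from the PR above), the sketch below groups secret keys by vCenter host and returns a warning when more than one host is present. It assumes the key layout `<vcenter-host>.username` / `<vcenter-host>.password`; the function names vcenter_hosts and check_single_vcenter are hypothetical.

```python
def vcenter_hosts(secret_data):
    """Return the sorted vCenter hostnames found in the secret's keys.

    Assumes keys look like '<host>.username' or '<host>.password';
    any other key is ignored.
    """
    hosts = set()
    for key in secret_data:
        for suffix in (".username", ".password"):
            if key.endswith(suffix):
                hosts.add(key[: -len(suffix)])
    return sorted(hosts)


def check_single_vcenter(secret_data):
    """Return a warning string if several credential sets exist, else None."""
    hosts = vcenter_hosts(secret_data)
    if len(hosts) > 1:
        return (
            f"more than 1 set of credentials found ({', '.join(hosts)}); "
            "the CSI driver can only connect to one vCenter"
        )
    return None


if __name__ == "__main__":
    data = {
        "vcenter.xxx-1.vmwarevmc.com.password": "xxx",
        "vcenter.xxx-1.vmwarevmc.com.username": "xxx",
        "vcenter.xxx-2.vmwarevmc.com.password": "xxx",
        "vcenter.xxx-2.vmwarevmc.com.username": "xxx",
    }
    print(check_single_vcenter(data))
```

Sorting the hostnames keeps the result stable between runs, which is the property the arbitrary pick described above lacked.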
Setting another vCenter hostname in the secret, as below:

$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -ojson | jq .data
{
  "vcenter.xxx-1.vmwarevmc.com.password": "xxx",
  "vcenter.xxx-1.vmwarevmc.com.username": "xxx",
  "vcenter.xxx-2.vmwarevmc.com.password": "xxx",
  "vcenter.xxx-2.vmwarevmc.com.username": "xxx"
}

The continuous updating of vmware-vsphere-csi-driver-controller could be reproduced; the deployment's generation went from 179 to 204 within 30 seconds:

$ oc -n openshift-cluster-csi-drivers get deployment.apps/vmware-vsphere-csi-driver-controller -ojson | jq .metadata.generation;sleep 30;oc -n openshift-cluster-csi-drivers get deployment.apps/vmware-vsphere-csi-driver-controller -ojson | jq .metadata.generation
179
204

The operator log now contains a clear message that alerts us to this unsupported configuration:

W0824 05:59:00.732442 1 driver_starter.go:151] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W0824 05:59:01.113462 1 driver_starter.go:151] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W0824 05:59:02.280448 1 driver_starter.go:151] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W0824 05:59:02.299887 1 driver_starter.go:151] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver

Marked as "Verified".
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399