Bug 2108473
Summary: [vSphere CSI driver operator] CSI controller pod restarting constantly
Product: OpenShift Container Platform
Reporter: Miguel Blach <mblach>
Component: Storage
Assignee: Hemant Kumar <hekumar>
Storage sub component: Operators
QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA
Docs Contact: Olivia Payne <opayne>
Severity: low
Priority: unspecified
CC: hekumar, jsafrane, opayne, parodrig
Version: 4.10
Target Milestone: ---
Target Release: 4.12.0
Hardware: All
OS: All
Doc Type: Bug Fix
Doc Text:
* Previously, if more than one secret was present for vSphere, the vSphere CSI Operator randomly picked a secret and sometimes caused the Operator to restart. With this update, a warning appears when there is more than one secret on the vCenter CSI Operator. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2108473[*BZ#2108473*])
Story Points: ---
Last Closed: 2023-01-17 19:53:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Description
Miguel Blach
2022-07-19 07:18:59 UTC
Still unsure what is causing this behaviour, but I found the following very strange entries in the KCM logs:

./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:12.324375839Z I0713 05:35:12.324358 1 replica_set.go:563] "Too few replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" need=2 creating=1
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:12.352374793Z I0713 05:35:12.352333 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:12.362557790Z I0713 05:35:12.362517 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: vmware-vsphere-csi-driver-controller-6c47778856-bn42l"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:12.385904435Z I0713 05:35:12.385838 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.266629870Z I0713 05:35:13.266578 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on replicasets.apps \"vmware-vsphere-csi-driver-controller-5f768dbffb\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.292255781Z I0713 05:35:13.292210 1 replica_set.go:599] "Too many replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" need=1 deleting=1
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.292284049Z I0713 05:35:13.292257 1 replica_set.go:227] "Found related ReplicaSets" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" relatedReplicaSets=[vmware-vsphere-csi-driver-controller-7d4d7dc494 vmware-vsphere-csi-driver-controller-6fcf8d669d vmware-vsphere-csi-driver-controller-5f768dbffb vmware-vsphere-csi-driver-controller-6c47778856]
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.292370500Z I0713 05:35:13.292350 1 controller_utils.go:592] "Deleting pod" controller="vmware-vsphere-csi-driver-controller-6c47778856" pod="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856-bn42l"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.293773800Z I0713 05:35:13.293719 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled down replica set vmware-vsphere-csi-driver-controller-6c47778856 to 1"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.303790403Z I0713 05:35:13.303742 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.316508686Z I0713 05:35:13.316470 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulDelete" message="Deleted pod: vmware-vsphere-csi-driver-controller-6c47778856-bn42l"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.327327787Z I0713 05:35:13.324156 1 replica_set.go:563] "Too few replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" need=1 creating=1
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.327327787Z I0713 05:35:13.325246 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" kind="Deployment" apiVersion="apps/v1" type="Normal" reason="ScalingReplicaSet" message="Scaled up replica set vmware-vsphere-csi-driver-controller-5f768dbffb to 1"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.346966178Z I0713 05:35:13.346907 1 event.go:294] "Event occurred" object="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: vmware-vsphere-csi-driver-controller-5f768dbffb-476sn"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.348045597Z I0713 05:35:13.348001 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:13.398192442Z I0713 05:35:13.398145 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on deployments.apps \"vmware-vsphere-csi-driver-controller\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:14.262704020Z I0713 05:35:14.262661 1 deployment_controller.go:490] "Error syncing deployment" deployment="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller" err="Operation cannot be fulfilled on replicasets.apps \"vmware-vsphere-csi-driver-controller-6fcf8d669d\": the object has been modified; please apply your changes to the latest version and try again"
./namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ocp-l-77f2m-master-1/kube-controller-manager/kube-controller-manager/logs/current.log:2022-07-13T05:35:14.290775599Z I0713 05:35:14.290733 1 replica_set.go:599] "Too many replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" need=0 deleting=1

It appears that the replica count of the deployment is fluctuating between 0, 1 and 2 very frequently. I am not sure why this could be happening.

Hi Hemant, the customer commented on the upgrade method: they updated from 4.6 (via 4.8) to 4.10. Is there any information I can ask them for to help you investigate this issue?

Thanks for the update! The customer provided the requested data:
https://attachments.access.redhat.com/hydra/rest/cases/03243224/attachments/e8902454-afaf-4ae5-a255-2701eb179118?usePresignedUrl=true
https://attachments.access.redhat.com/hydra/rest/cases/03243224/attachments/a8adad24-0aef-4f7c-babd-81f97a01c0f3?usePresignedUrl=true
Please let me know if I should ask for any further data.

Hi @hekumar @jsafrane, is there any other data I should ask the customer for? Do we have everything we need to work on the bug? Thanks!
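The replica fluctuation noted in the KCM log excerpt can be made easier to see by extracting only the ReplicaSet scaling decisions. The following is a minimal Go sketch, not part of any OpenShift tooling: the type `scaleEvent`, the function `parseScaleEvents`, and the regular expression are hypothetical names chosen for illustration, and the sample lines are abridged from the log above (file-path prefix and timestamps dropped).

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// scaleEvent is one "Too few replicas" / "Too many replicas" decision
// logged by the ReplicaSet controller. Hypothetical type for this sketch.
type scaleEvent struct {
	Direction  string // "few" or "many"
	ReplicaSet string // namespace/name as it appears in the log line
	Need       int    // value of the need= field
}

// scaleRe matches the message, replicaSet and need fields of a log line.
var scaleRe = regexp.MustCompile(`"Too (few|many) replicas" replicaSet="([^"]+)" need=(\d+)`)

// parseScaleEvents returns the scaling decisions found in lines, in order.
// Lines that do not match (for example "Error syncing deployment") are skipped.
func parseScaleEvents(lines []string) []scaleEvent {
	var events []scaleEvent
	for _, line := range lines {
		m := scaleRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		need, err := strconv.Atoi(m[3])
		if err != nil {
			continue // number too large to be a real replica count
		}
		events = append(events, scaleEvent{Direction: m[1], ReplicaSet: m[2], Need: need})
	}
	return events
}

func main() {
	// Abridged from the KCM log above.
	lines := []string{
		`replica_set.go:563] "Too few replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" need=2 creating=1`,
		`replica_set.go:599] "Too many replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-6c47778856" need=1 deleting=1`,
		`replica_set.go:563] "Too few replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" need=1 creating=1`,
		`replica_set.go:599] "Too many replicas" replicaSet="openshift-cluster-csi-drivers/vmware-vsphere-csi-driver-controller-5f768dbffb" need=0 deleting=1`,
	}
	for _, e := range parseScaleEvents(lines) {
		fmt.Printf("%-4s %s need=%d\n", e.Direction, e.ReplicaSet, e.Need)
	}
}
```

Seeing two different ReplicaSets of the same deployment scaled up and down within about two seconds suggests that the deployment spec itself keeps changing, rather than pods crashing.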
This was happening because, when more than one set of credentials is present in the secret, the code can arbitrarily pick one of them, so the secret used by the driver may keep changing (thereby causing deployment rollouts). For now, we are going to warn users if this happens, via https://github.com/openshift/vmware-vsphere-csi-driver-operator/pull/104

Setting another vCenter hostname in the secret, as below:
$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -ojson | jq .data
{
  "vcenter.xxx-1.vmwarevmc.com.password": "xxx",
  "vcenter.xxx-1.vmwarevmc.com.username": "xxx",
  "vcenter.xxx-2.vmwarevmc.com.password": "xxx",
  "vcenter.xxx-2.vmwarevmc.com.username": "xxx"
}

The repeated updating of vmware-vsphere-csi-driver-controller could be reproduced; the deployment generation advanced from 179 to 204 within 30 seconds:
$ oc -n openshift-cluster-csi-drivers get deployment.apps/vmware-vsphere-csi-driver-controller -ojson | jq .metadata.generation;sleep 30;oc -n openshift-cluster-csi-drivers get deployment.apps/vmware-vsphere-csi-driver-controller -ojson | jq .metadata.generation
179
204

The operator log now gives a clear message that alerts us to this unsupported configuration:
W0824 05:59:00.732442 1 driver_starter.go:151] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W0824 05:59:01.113462 1 driver_starter.go:151] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W0824 05:59:02.280448 1 driver_starter.go:151] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W0824 05:59:02.299887 1 driver_starter.go:151] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
Marked as "Verified".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:7399
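To illustrate the root cause described in the comments (more than one set of vCenter credentials in the secret, with one picked arbitrarily), here is a minimal Go sketch of how such a configuration could be detected from the secret's keys. It is not the operator's actual code: `vCenterHosts` and `pickHost` are hypothetical helpers, and the key layout `<host>.username` / `<host>.password` is assumed from the `jq .data` output in this report. The sorted, deterministic pick is an addition made for illustration only; according to the comments, the fix in this bug only adds a warning. One relevant Go fact: iteration order over a map is unspecified, so code that takes "the first" entry of a map can return a different entry on each run.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// vCenterHosts returns the sorted, de-duplicated host names found in the
// keys of a credentials secret. Keys are assumed to look like
// "<host>.username" and "<host>.password"; other keys are ignored.
func vCenterHosts(data map[string][]byte) []string {
	seen := make(map[string]struct{})
	for key := range data {
		for _, suffix := range []string{".username", ".password"} {
			if strings.HasSuffix(key, suffix) && len(key) > len(suffix) {
				seen[strings.TrimSuffix(key, suffix)] = struct{}{}
			}
		}
	}
	hosts := make([]string, 0, len(seen))
	for host := range seen {
		hosts = append(hosts, host)
	}
	sort.Strings(hosts) // sorting removes the dependence on map iteration order
	return hosts
}

// pickHost returns a stable choice of host (the first in sorted order) and
// reports whether more than one set of credentials was present.
func pickHost(data map[string][]byte) (host string, multiple bool) {
	hosts := vCenterHosts(data)
	if len(hosts) == 0 {
		return "", false
	}
	return hosts[0], len(hosts) > 1
}

func main() {
	// Same key layout as the secret shown in this report; values are placeholders.
	data := map[string][]byte{
		"vcenter.xxx-1.vmwarevmc.com.password": []byte("xxx"),
		"vcenter.xxx-1.vmwarevmc.com.username": []byte("xxx"),
		"vcenter.xxx-2.vmwarevmc.com.password": []byte("xxx"),
		"vcenter.xxx-2.vmwarevmc.com.username": []byte("xxx"),
	}
	host, multiple := pickHost(data)
	if multiple {
		fmt.Println("warning: more than one set of vCenter credentials found")
	}
	fmt.Println("using", host)
}
```

With a stable choice, repeated syncs would derive the same configuration from the same secret, which is the property whose absence led to the continuous rollouts seen here.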