Bug 1929733
| Summary: | oVirt CSI driver operator is constantly restarting | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Benny Zlotnik <bzlotnik> |
| Component: | Storage | Assignee: | Benny Zlotnik <bzlotnik> |
| Storage sub component: | oVirt CSI Driver | QA Contact: | michal <mgold> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | urgent | CC: | aos-bugs, bbennett, lleistne, mburman, mgold, pelauter |
| Version: | 4.7 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 22:45:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1929777 | | |
Steps to reproduce:

1) oc project openshift-cluster-csi-drivers
2) oc status

        In project openshift-cluster-csi-drivers on server https://api.primary.ocp.rhev.lab.eng.brq.redhat.com:6443

        deployment/ovirt-csi-driver-controller deploys
          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0feb29efe901393bf80594af53ec8bbef34bbc6303c71cdfb7c779bacc461531,
          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:def80d6439c31c03f4d5e5bfa4f209bddfd3b7423d38d90f483a1ad1a10c0e01,
          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b3a0f319143cdd04122e50490ffa60e93024e18ace3c105041c432f2daf961fa,
          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f14455d69f404747e4458528744fd8ab9c2f5243004b3f7bff0323e73072b681
          deployment #1 running for 13 days - 1 pod

        deployment/ovirt-csi-driver-operator deploys
          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86a675ddbace0069c6d860629724f1dcebccc639fc032093afa04ec7e13b1940
          deployment #1 running for 13 days - 1 pod (warning: 974 restarts)

3) The restart warning above is shown.
4) oc get pods

        [root@ocp-qe-1 primary]# oc get pods
        NAME                                           READY   STATUS    RESTARTS   AGE
        ovirt-csi-driver-controller-7db477884c-tflht   4/4     Running   0          7d14h
        ovirt-csi-driver-node-8qnxw                    3/3     Running   0          13d
        ovirt-csi-driver-node-h5xvc                    3/3     Running   0          13d
        ovirt-csi-driver-node-jtf7s                    3/3     Running   1          13d
        ovirt-csi-driver-node-lnxmx                    3/3     Running   0          13d
        ovirt-csi-driver-node-sg2td                    3/3     Running   0          13d
        ovirt-csi-driver-node-wvnbm                    3/3     Running   0          13d
        ovirt-csi-driver-operator-89d7bb77b-rn2m5      1/1     Running   975        7d14h

5) oc logs pod/ovirt-csi-driver-operator-89d7bb77b-rn2m5 -n openshift-cluster-csi-drivers
6) oc describe pod/ovirt-csi-driver-operator-89d7bb77b-rn2m5 -n openshift-cluster-csi-drivers

This is not a blocker for OCP 4.7.0, but it needs to be fixed in the first available OCP 4.7.z stream.

Verified with:
- ocp: 4.8.0-0.nightly-2021-02-22-111248
- ovirt: 4.4.2.6-1.el8

Steps to verify:

1) Install a 4.8 cluster.
2) oc project openshift-cluster-csi-drivers
3) oc status -> no warning is shown.
4) oc get pods

        NAME                                           READY   STATUS    RESTARTS   AGE
        ovirt-csi-driver-controller-5bcbbd4c47-7kvld   4/4     Running   0          127m
        ovirt-csi-driver-node-4r2mg                    3/3     Running   0          127m
        ovirt-csi-driver-node-6tx6f                    3/3     Running   0          127m
        ovirt-csi-driver-node-7prgg                    3/3     Running   1          112m
        ovirt-csi-driver-node-bf54l                    3/3     Running   0          113m
        ovirt-csi-driver-node-r5jxd                    3/3     Running   0          127m
        ovirt-csi-driver-operator-8487469d4f-j72ft     1/1     Running   1          127m

No pod shows a large number of restarts (a small sketch for checking restart counts programmatically follows after this comment).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
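For reference, here is a minimal sketch of the same restart-count check done programmatically with client-go instead of reading the RESTARTS column of `oc get pods`. The kubeconfig handling and output format are illustrative assumptions and are not part of the original report:

```go
// Sketch: list per-container restart counts in the openshift-cluster-csi-drivers
// namespace. Assumes the default kubeconfig location; adjust as needed.
package main

import (
	"context"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client config from the default kubeconfig (e.g. ~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// List the pods and print how often each container has restarted.
	pods, err := client.CoreV1().Pods("openshift-cluster-csi-drivers").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, pod := range pods.Items {
		for _, cs := range pod.Status.ContainerStatuses {
			fmt.Printf("%s/%s restarts=%d\n", pod.Name, cs.Name, cs.RestartCount)
		}
	}
}
```

A pod such as ovirt-csi-driver-operator reporting hundreds of restarts reproduces the symptom; after the fix the count should stay at or near zero.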
Description of problem:

The oVirt CSI driver operator has been restarting constantly since its inception:

    Containers:
      ovirt-csi-driver-operator:
        Container ID:   cri-o://4aded21619cec53cd9c6c06ffd1988909059d66acc23720a0895431ce775968d
        Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86a675ddbace0069c6d860629724f1dcebccc639fc032093afa04ec7e13b1940
        Image ID:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:86a675ddbace0069c6d860629724f1dcebccc639fc032093afa04ec7e13b1940
        Port:           <none>
        Host Port:      <none>
        Args:
          start
          --node=$(KUBE_NODE_NAME)
          -v=2
        State:          Running
          Started:      Wed, 17 Feb 2021 13:12:15 +0200
        Last State:     Terminated
          Reason:       Error
          Exit Code:    1
          Started:      Wed, 17 Feb 2021 13:01:08 +0200
          Finished:     Wed, 17 Feb 2021 13:12:14 +0200
        Ready:          True
        Restart Count:  945

This happens because the configInformers in the operator code are never started[1]; as a result, the ConfigObserver controller is unable to sync its caches, so it reports an error and the operator process exits.

The error in the operator log:

    706171       1 shared_informer.go:266] stop requested
    707427       1 base_controller.go:95] unable to sync caches for ConfigObserver

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Should happen on any cluster >4.7 with the oVirt CSI driver operator.

Actual results:
The oVirt CSI driver operator pod keeps restarting.

Expected results:
The operator should not restart unless there is a real issue.

[1] https://github.com/openshift/ovirt-csi-driver-operator/blob/master/pkg/operator/starter.go#L128
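The linked starter.go[1] is where the informer factories are wired up. The following is a minimal sketch of the pattern the fix needs (starting the config informer factory so ConfigObserver's caches can sync), not the actual patch: the function shape, variable names, and the 20-minute resync period are assumptions for illustration, and it assumes the operator uses the generated config informers from openshift/client-go, as library-go-based CSI operators typically do.

```go
// Sketch: why a missing configInformers.Start leads to the crash loop, and
// where a Start call would go. Names and resync period are illustrative.
package operator

import (
	"context"
	"time"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
	configinformers "github.com/openshift/client-go/config/informers/externalversions"
	"k8s.io/client-go/rest"
)

func runOperator(ctx context.Context, restConfig *rest.Config) error {
	// Client and informer factory for config.openshift.io resources
	// (Infrastructure, Proxy, ...), which ConfigObserver watches.
	configClient, err := configclient.NewForConfig(restConfig)
	if err != nil {
		return err
	}
	configInformers := configinformers.NewSharedInformerFactory(configClient, 20*time.Minute)

	// ... controllers (including ConfigObserver) are constructed here using
	// informers obtained from configInformers ...

	// Without this call the factory's informers never run, ConfigObserver can
	// never sync its caches, the controller logs
	// "unable to sync caches for ConfigObserver", and the process exits with
	// code 1 -- the restart loop reported in this bug.
	configInformers.Start(ctx.Done())

	// ... run the controllers, then block until the operator is asked to stop ...
	<-ctx.Done()
	return nil
}
```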