Bug 2186145 - [OCS Client Operator] csi-rbdplugin stuck in CrashLoopBackOff
Summary: [OCS Client Operator] csi-rbdplugin stuck in CrashLoopBackOff
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: build
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Tamil
QA Contact: Petr Balogh
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-12 08:32 UTC by Rewant
Modified: 2023-08-09 16:37 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-04-17 09:12:10 UTC
Embargoed:


Description Rewant 2023-04-12 08:32:10 UTC
Description of problem:
The csi-rbdplugin pods are stuck in the CrashLoopBackOff state:

resoni:~/redhat/odf-managed-service-migration(deatch)$ oc get pods | grep csi-rbd
csi-rbdplugin-kwqd8                                             2/3     CrashLoopBackOff   37 (2m15s ago)   167m
csi-rbdplugin-provisioner-6b98546ccb-gg47n                      5/5     Running            0                167m
csi-rbdplugin-provisioner-6b98546ccb-h85x7                      5/5     Running            0                167m
csi-rbdplugin-ql2xr                                             2/3     CrashLoopBackOff   37 (3m13s ago)   167m
csi-rbdplugin-vwl5z                                             2/3     CrashLoopBackOff   37 (3m35s ago)   167m

  Normal   Pulled     167m                    kubelet            Container image "registry.redhat.io/odf4/cephcsi-rhel8@sha256:f0ebadcad72a0733bd37be09914136fee2b25fd91ae2488919b0c5b2a2a407e0" already present on machine
  Normal   Created    167m                    kubelet            Created container csi-rbdplugin
  Normal   Started    167m                    kubelet            Started container csi-rbdplugin
  Normal   Created    166m (x4 over 167m)     kubelet            Created container csi-addons
  Normal   Started    166m (x4 over 167m)     kubelet            Started container csi-addons
  Warning  BackOff    7m32s (x744 over 167m)  kubelet            Back-off restarting failed container
  Normal   Pulled     2m32s (x38 over 167m)   kubelet            Container image "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel8@sha256:6cd79e0674fb244fac02dc871ddcca09ff7ffdbcbf5f52fa53e55667c1b6d113" already present on machine


resoni:~/redhat/odf-managed-service-migration(deatch)$ oc logs csi-rbdplugin-ql2xr -ccsi-addons
F0412 08:27:03.987481  264778 main.go:58] Failed to validate controller endpoint: %!w(*errors.errorString=&{invalid controller ip address ""})
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:860 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x257b1c0, 0x3, 0x0, 0xc00052a770, 0x1, {0x1cf65ba, 0x1}, 0x257b880, 0x0)
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:825 +0x686
k8s.io/klog/v2.(*loggingT).printfDepth(0x257b1c0, 0x4, 0x0, {0x0, 0x0}, 0x58, {0x177d314, 0x2a}, {0xc000518160, 0x1, ...})
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:630 +0x1f2
k8s.io/klog/v2.(*loggingT).printf(...)
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:612
k8s.io/klog/v2.Fatalf(...)
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:1516
main.main()
	/remote-source/app/sidecar/main.go:58 +0x3e7



Version-Release number of selected component (if applicable):
resoni:~/redhat/odf-managed-service-migration(deatch)$ oc get csv
NAME                                         DISPLAY                            VERSION             REPLACES                                  PHASE
observability-operator.v0.0.20               Observability Operator             0.0.20              observability-operator.v0.0.19            Succeeded
ocs-client-operator.v4.13.0-130.stable       OpenShift Data Foundation Client   4.13.0-130.stable                                             Succeeded
odf-csi-addons-operator.v4.13.0-130.stable   CSI Addons                         4.13.0-130.stable                                             Succeeded
route-monitor-operator.v0.1.493-a866e7c      Route Monitor Operator             0.1.493-a866e7c     route-monitor-operator.v0.1.489-7d9fe90   Succeed

How reproducible:
1/1

Steps to Reproduce:
1.
2.
3.

Actual results:
csi-rbdplugin pods are in CrashLoopBackOff state

Expected results:
csi-rbdplugin should be in running state

Additional info:

Comment 1 Madhu Rajanna 2023-04-12 08:59:03 UTC
```
[🎩]mrajanna@fedora test2 $]oc get cm ocs-client-operator-csi-images -oyaml
apiVersion: v1
data:
  csi-images.yaml: |
    ---
    - version: v4.11
      containerImages:
        provisionerImageURL: "registry.redhat.io/openshift4/ose-csi-external-provisioner@sha256:f9557586ec491e56d8c61b9aed238973df7f37e9aac0552ab363e44beed0589c"
        attacherImageURL: "registry.redhat.io/openshift4/ose-csi-external-attacher-rhel8@sha256:cb2cf8b141c03b1cab4e8ccd44040a3094d29f14b7303e2f93860177d8ebc194"
        resizerImageURL: "registry.redhat.io/openshift4/ose-csi-external-resizer@sha256:513ded6fd8afe672d4873ec04abdcf473809bb2cbad7de82d43e6e244958bac7"
        snapshotterImageURL: "registry.redhat.io/openshift4/ose-csi-external-snapshotter-rhel8@sha256:a6da5e8d135c2391b366ef517e35995b07d27fb3402d95f0bdfc08befa22505c"
        driverRegistrarImageURL: "registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:c2a61cc939fb595cb3deadcf4ec47bf6bcf3ebb43e6e8b839c8f7fc9074ef33e"
        cephCSIImageURL: "registry.redhat.io/odf4/cephcsi-rhel8@sha256:f0ebadcad72a0733bd37be09914136fee2b25fd91ae2488919b0c5b2a2a407e0"
        csiaddonsImageURL: "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel8@sha256:6cd79e0674fb244fac02dc871ddcca09ff7ffdbcbf5f52fa53e55667c1b6d113"

    - version: v4.12
      containerImages:
        provisionerImageURL: "registry.redhat.io/openshift4/ose-csi-external-provisioner@sha256:9c16e6592b28d997bb0143b942405bb598b615c96366cb376f7490928fdc5fa2"
        attacherImageURL: "registry.redhat.io/openshift4/ose-csi-external-attacher-rhel8@sha256:be0b1a6efe7329c609023468257f65b4f0d4923de875fb254fdf835425356b9a"
        resizerImageURL: "registry.redhat.io/openshift4/ose-csi-external-resizer@sha256:d25e9fa7a238f1fb039e3404647f883003b810f4af32d254a88f62add08e56a6"
        snapshotterImageURL: "registry.redhat.io/openshift4/ose-csi-external-snapshotter-rhel8@sha256:9be4bb95d8ab9ff5189a09fdb73034e716ddbab2426c58ab237a53bba36d0853"
        driverRegistrarImageURL: "registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:579c48d339a8adc88bf102ebf4a36f5bce903872e7ceaed09b405b77b8167f8e"
        cephCSIImageURL: "registry.redhat.io/odf4/cephcsi-rhel8@sha256:8261812220fba8c647b5d23d359bef58b4c6710fd0c75a0c3d4bd99d4b88435a"
        csiaddonsImageURL: "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel8@sha256:a24d3872fd745cdfa08300458f75b81050405fc60be34169ad0d7df8837a284a"

    - version: v4.13
      containerImages:
        provisionerImageURL: "registry.redhat.io/openshift4/ose-csi-external-provisioner@sha256:9c16e6592b28d997bb0143b942405bb598b615c96366cb376f7490928fdc5fa2"
        attacherImageURL: "registry.redhat.io/openshift4/ose-csi-external-attacher-rhel8@sha256:be0b1a6efe7329c609023468257f65b4f0d4923de875fb254fdf835425356b9a"
        resizerImageURL: "registry.redhat.io/openshift4/ose-csi-external-resizer@sha256:d25e9fa7a238f1fb039e3404647f883003b810f4af32d254a88f62add08e56a6"
        snapshotterImageURL: "registry.redhat.io/openshift4/ose-csi-external-snapshotter-rhel8@sha256:9be4bb95d8ab9ff5189a09fdb73034e716ddbab2426c58ab237a53bba36d0853"
        driverRegistrarImageURL: "registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:579c48d339a8adc88bf102ebf4a36f5bce903872e7ceaed09b405b77b8167f8e"
        cephCSIImageURL: "registry.redhat.io/odf4/cephcsi-rhel9@sha256:78e6f5e0de77aa557ebd61af0604b2806c4dcc8c1cda37f63d0fa5f5ab3f1bda"
        csiaddonsImageURL: "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160"
kind: ConfigMap
metadata:
  creationTimestamp: "2023-04-12T05:42:10Z"
  labels:
    operators.coreos.com/ocs-client-operator.ocs-client-ns: ""
  name: ocs-client-operator-csi-images
  namespace: ocs-client-ns
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: ClusterServiceVersion
    name: ocs-client-operator.v4.13.0-130.stable
    uid: 1c78d9d0-32b0-44f0-81b8-ae3477f2f24c
  resourceVersion: "113869"
  uid: a55c176b-f54d-4b88-88b3-17f5a20d8f45
[🎩]mrajanna@fedora test2 $]oc logs po/csi-rbdplugin-kwqd8 -c csi-addons
F0412 08:28:01.904986  841171 main.go:58] Failed to validate controller endpoint: %!w(*errors.errorString=&{invalid controller ip address ""})
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:860 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x257b1c0, 0x3, 0x0, 0xc0005aa770, 0x1, {0x1cf65ba, 0x1}, 0x257b880, 0x0)
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:825 +0x686
k8s.io/klog/v2.(*loggingT).printfDepth(0x257b1c0, 0x4, 0x0, {0x0, 0x0}, 0x58, {0x177d314, 0x2a}, {0xc000598160, 0x1, ...})
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:630 +0x1f2
k8s.io/klog/v2.(*loggingT).printf(...)
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:612
k8s.io/klog/v2.Fatalf(...)
	/remote-source/app/vendor/k8s.io/klog/v2/klog.go:1516
main.main()
	/remote-source/app/sidecar/main.go:58 +0x3e7

```
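
A note for reading the fatal line above: klog prefixes each line with a severity letter, month and day, time, process ID and source location, and Go's fmt package prints `%!w(type=value)` when the `%w` verb is used outside `fmt.Errorf`, so the underlying error here is `invalid controller ip address ""`. The following is a minimal Python sketch for pulling those fields out of such a line. The regular expressions and the `parse_klog_line` helper are illustrative assumptions and are not part of any ODF tooling.

```python
import re

# klog header: severity letter, mmdd, hh:mm:ss.uuuuuu, process ID, file:line, "] message"
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<pid>\d+) "
    r"(?P<source>[^\s:]+:\d+)\] (?P<message>.*)$"
)

# Go's fmt renders a misused verb as %!verb(type=value); capture the value part.
BAD_VERB = re.compile(r"%!\w\([^=]+=&?\{?(?P<value>.*?)\}?\)$")


def parse_klog_line(line):
    """Return the klog header fields as a dict, or None for non-klog lines
    (for example the goroutine stack trace that follows a fatal message)."""
    match = KLOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    bad = BAD_VERB.search(fields["message"])
    # wrapped_error holds the error text hidden inside %!w(...), if present.
    fields["wrapped_error"] = bad.group("value") if bad else None
    return fields


if __name__ == "__main__":
    fatal = ('F0412 08:28:01.904986  841171 main.go:58] Failed to validate '
             'controller endpoint: %!w(*errors.errorString=&{invalid controller ip address ""})')
    print(parse_klog_line(fatal))
```

Filtering a pod log for entries whose `severity` is `F` is usually enough to find the line that caused the container to exit.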


```
[🎩]mrajanna@fedora test2 $]oc logs po/csi-rbdplugin-kwqd8 -c csi-rbdplugin
I0412 05:42:59.283023  226585 cephcsi.go:182] Driver version: release-4.11 and Git version: eb91a378e3d1b7a6af2b5c1c109af3a3b7798234
I0412 05:42:59.283248  226585 cephcsi.go:200] Initial PID limit is set to 100015
I0412 05:42:59.381867  226585 cephcsi.go:206] Reconfigured PID limit to -1 (max)
I0412 05:42:59.382174  226585 cephcsi.go:231] Starting driver type: rbd with name: ocs-client-ns.rbd.csi.ceph.com
I0412 05:42:59.382560  226585 server.go:114] listening for CSI-Addons requests on address: &net.UnixAddr{Name:"/csi/csi-addons.sock", Net:"unix"}
I0412 05:42:59.683408  226585 mount_linux.go:218] Cannot run systemd-run, assuming non-systemd OS
I0412 05:42:59.683434  226585 mount_linux.go:219] systemd-run output: System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
, failed with: exit status 1
I0412 05:42:59.683525  226585 rbd_attach.go:231] nbd module loaded
I0412 05:42:59.683553  226585 rbd_attach.go:245] kernel version "4.18.0-372.43.1.el8_6.x86_64" supports cookie feature
W0412 05:42:59.683619  226585 rbd_attach.go:251] running rbd-nbd --help failed with error:an error (exec: "rbd-nbd": executable file not found in $PATH) occurred while running rbd-nbd args: [--help], stderr:
I0412 05:42:59.683867  226585 server.go:126] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0412 05:43:00.083808  226585 utils.go:191] ID: 1 GRPC call: /csi.v1.Identity/GetPluginInfo
I0412 05:43:00.085016  226585 utils.go:195] ID: 1 GRPC request: {}
I0412 05:43:00.085036  226585 identityserver-default.go:38] ID: 1 Using default GetPluginInfo
I0412 05:43:00.085083  226585 utils.go:202] ID: 1 GRPC response: {"name":"ocs-client-ns.rbd.csi.ceph.com","vendor_version":"release-4.11"}
I0412 05:43:00.902127  226585 utils.go:191] ID: 2 GRPC call: /csi.v1.Node/NodeGetInfo
I0412 05:43:00.902173  226585 utils.go:195] ID: 2 GRPC request: {}
I0412 05:43:00.902180  226585 nodeserver-default.go:46] ID: 2 Using default NodeGetInfo
I0412 05:43:00.902254  226585 utils.go:202] ID: 2 GRPC response: {"accessible_topology":{},"node_id":"ip-10-0-12-43.ap-south-1.compute.internal"}
```


It looks like we are selecting the cephcsi and csiaddons images based on the OCP version: if the OCP version is 4.11, we pull the 4.11 CSI images. We need to use the latest cephcsi and csiaddons versions on all OCP releases; only the sidecar versions should be chosen based on the OCP version. @Ohad Mitrani, are we on the same page here?
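
To make the two selection strategies concrete, here is a minimal Python sketch. It assumes a trimmed-down copy of the csi-images.yaml structure with placeholder image names; the function names `images_by_ocp_version` and `images_with_latest_driver` are hypothetical and do not come from the ocs-client-operator code.

```python
# Trimmed-down stand-in for csi-images.yaml; image names are placeholders.
CSI_IMAGES = [
    {"version": "v4.11", "containerImages": {
        "provisionerImageURL": "example/provisioner:4.11",
        "cephCSIImageURL": "example/cephcsi:4.11",
        "csiaddonsImageURL": "example/csi-addons-sidecar:4.11"}},
    {"version": "v4.12", "containerImages": {
        "provisionerImageURL": "example/provisioner:4.12",
        "cephCSIImageURL": "example/cephcsi:4.12",
        "csiaddonsImageURL": "example/csi-addons-sidecar:4.12"}},
    {"version": "v4.13", "containerImages": {
        "provisionerImageURL": "example/provisioner:4.13",
        "cephCSIImageURL": "example/cephcsi:4.13",
        "csiaddonsImageURL": "example/csi-addons-sidecar:4.13"}},
]

# Keys that the proposal would always take from the newest entry.
DRIVER_KEYS = ("cephCSIImageURL", "csiaddonsImageURL")


def version_key(version):
    """Turn 'v4.13' into (4, 13) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def images_by_ocp_version(entries, ocp_version):
    """Observed behaviour: every image comes from the entry matching the OCP version."""
    for entry in entries:
        if entry["version"] == ocp_version:
            return dict(entry["containerImages"])
    raise KeyError(f"no image set for OCP {ocp_version}")


def images_with_latest_driver(entries, ocp_version):
    """Proposed behaviour: sidecars follow the OCP version,
    cephcsi and csiaddons come from the newest entry."""
    images = images_by_ocp_version(entries, ocp_version)
    latest = max(entries, key=lambda entry: version_key(entry["version"]))
    for key in DRIVER_KEYS:
        images[key] = latest["containerImages"][key]
    return images


if __name__ == "__main__":
    print(images_by_ocp_version(CSI_IMAGES, "v4.11")["cephCSIImageURL"])
    print(images_with_latest_driver(CSI_IMAGES, "v4.11")["cephCSIImageURL"])
```

With the first function an OCP 4.11 cluster gets the 4.11 driver, which matches the `Driver version: release-4.11` line in the log above; with the second it would get the 4.13 driver while keeping the 4.11 sidecars.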


Moving this to the build team so that the latest cephcsi and csiaddons sidecar images are used in the ocs-client-operator configuration (images.yaml).

Comment 3 Ohad 2023-04-17 07:26:44 UTC
As discussed offline, taking newer CSI versions and running them on older OCP releases is an untested scenario.
I think we might want to stick to tested CSI versions for any specific OCP release.

If the concern is that new features will not be available on older OCP releases, I believe that is fine. We never claim forward compatibility, and users who want access to newer features should upgrade to a later OCP release.
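
The policy of sticking to tested combinations can be sketched as a strict lookup that refuses to fall back to a newer driver. This is only an illustration in Python: the `TESTED_CSI` table, the placeholder image names and the `UntestedCombination` exception are assumptions, not operator code.

```python
# Hypothetical table of CSI driver images that were tested per OCP release.
TESTED_CSI = {
    "v4.11": "example/cephcsi:4.11",
    "v4.12": "example/cephcsi:4.12",
    "v4.13": "example/cephcsi:4.13",
}


class UntestedCombination(Exception):
    """Raised when no CSI driver was tested for the requested OCP release."""


def csi_image_for(ocp_version, tested=TESTED_CSI):
    """Return the tested CSI image for an OCP release; never substitute another one."""
    try:
        return tested[ocp_version]
    except KeyError:
        raise UntestedCombination(
            f"no tested CSI driver for OCP {ocp_version}"
        ) from None


if __name__ == "__main__":
    print(csi_image_for("v4.11"))
```

Failing loudly for an unknown release keeps an untested driver from being deployed silently.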

