Description of problem:
The csi-rbdplugin pod is stuck in CrashLoopBackOff state

resoni:~/redhat/odf-managed-service-migration(deatch)$ oc get pods | grep csi-rbd
csi-rbdplugin-kwqd8                          2/3   CrashLoopBackOff   37 (2m15s ago)   167m
csi-rbdplugin-provisioner-6b98546ccb-gg47n   5/5   Running            0                167m
csi-rbdplugin-provisioner-6b98546ccb-h85x7   5/5   Running            0                167m
csi-rbdplugin-ql2xr                          2/3   CrashLoopBackOff   37 (3m13s ago)   167m
csi-rbdplugin-vwl5z                          2/3   CrashLoopBackOff   37 (3m35s ago)   167m

Normal   Pulled   167m                    kubelet  Container image "registry.redhat.io/odf4/cephcsi-rhel8@sha256:f0ebadcad72a0733bd37be09914136fee2b25fd91ae2488919b0c5b2a2a407e0" already present on machine
Normal   Created  167m                    kubelet  Created container csi-rbdplugin
Normal   Started  167m                    kubelet  Started container csi-rbdplugin
Normal   Created  166m (x4 over 167m)     kubelet  Created container csi-addons
Normal   Started  166m (x4 over 167m)     kubelet  Started container csi-addons
Warning  BackOff  7m32s (x744 over 167m)  kubelet  Back-off restarting failed container
Normal   Pulled   2m32s (x38 over 167m)   kubelet  Container image "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel8@sha256:6cd79e0674fb244fac02dc871ddcca09ff7ffdbcbf5f52fa53e55667c1b6d113" already present on machine

resoni:~/redhat/odf-managed-service-migration(deatch)$ oc logs csi-rbdplugin-ql2xr -ccsi-addons
F0412 08:27:03.987481  264778 main.go:58] Failed to validate controller endpoint: %!w(*errors.errorString=&{invalid controller ip address ""})
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:860 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x257b1c0, 0x3, 0x0, 0xc00052a770, 0x1, {0x1cf65ba, 0x1}, 0x257b880, 0x0)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:825 +0x686
k8s.io/klog/v2.(*loggingT).printfDepth(0x257b1c0, 0x4, 0x0, {0x0, 0x0}, 0x58, {0x177d314, 0x2a}, {0xc000518160, 0x1, ...})
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:630 +0x1f2
k8s.io/klog/v2.(*loggingT).printf(...)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:612
k8s.io/klog/v2.Fatalf(...)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1516
main.main()
    /remote-source/app/sidecar/main.go:58 +0x3e7

Version-Release number of selected component (if applicable):

resoni:~/redhat/odf-managed-service-migration(deatch)$ oc get csv
NAME                                         DISPLAY                            VERSION             REPLACES                                  PHASE
observability-operator.v0.0.20               Observability Operator             0.0.20              observability-operator.v0.0.19            Succeeded
ocs-client-operator.v4.13.0-130.stable       OpenShift Data Foundation Client   4.13.0-130.stable                                             Succeeded
odf-csi-addons-operator.v4.13.0-130.stable   CSI Addons                         4.13.0-130.stable                                             Succeeded
route-monitor-operator.v0.1.493-a866e7c      Route Monitor Operator             0.1.493-a866e7c     route-monitor-operator.v0.1.489-7d9fe90   Succeed

How reproducible:
1/1

Steps to Reproduce:
1.
2.
3.

Actual results:
csi-rbdplugin pods are in CrashLoopBackOff state

Expected results:
csi-rbdplugin should be in running state

Additional info:
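The fatal message above shows the csi-addons sidecar rejecting an empty controller IP at startup. As a rough illustration of that kind of check (a sketch only, not the sidecar's actual code, which is Go), the following validates an IP and port pair. The function name `validate_controller_endpoint` and the sample values are hypothetical.

```python
import ipaddress


def validate_controller_endpoint(raw_ip: str, raw_port: str) -> str:
    """Return "host:port", or raise ValueError if either part is invalid.

    Hypothetical helper: it only mirrors the shape of the error seen in
    the sidecar log, it is not taken from the csi-addons source.
    """
    try:
        ip = ipaddress.ip_address(raw_ip)
    except ValueError:
        # An empty string lands here, matching: invalid controller ip address ""
        raise ValueError(f'invalid controller ip address "{raw_ip}"') from None

    # Accept only plain ASCII digits within the valid TCP port range.
    if not (raw_port.isascii() and raw_port.isdigit() and 0 < int(raw_port) <= 65535):
        raise ValueError(f'invalid controller port "{raw_port}"')

    # IPv6 literals need brackets when combined with a port.
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"{host}:{raw_port}"


if __name__ == "__main__":
    print(validate_controller_endpoint("10.0.0.1", "9070"))  # sample values
    try:
        validate_controller_endpoint("", "9070")
    except ValueError as err:
        print(err)
```

With an empty IP the helper fails immediately, which is consistent with the container exiting on every start and the pod ending up in CrashLoopBackOff. Why the IP is empty in this cluster is not shown by the log itself.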
```
mrajanna@fedora test2 $ oc get cm ocs-client-operator-csi-images -oyaml
apiVersion: v1
data:
  csi-images.yaml: |
    ---
    - version: v4.11
      containerImages:
        provisionerImageURL: "registry.redhat.io/openshift4/ose-csi-external-provisioner@sha256:f9557586ec491e56d8c61b9aed238973df7f37e9aac0552ab363e44beed0589c"
        attacherImageURL: "registry.redhat.io/openshift4/ose-csi-external-attacher-rhel8@sha256:cb2cf8b141c03b1cab4e8ccd44040a3094d29f14b7303e2f93860177d8ebc194"
        resizerImageURL: "registry.redhat.io/openshift4/ose-csi-external-resizer@sha256:513ded6fd8afe672d4873ec04abdcf473809bb2cbad7de82d43e6e244958bac7"
        snapshotterImageURL: "registry.redhat.io/openshift4/ose-csi-external-snapshotter-rhel8@sha256:a6da5e8d135c2391b366ef517e35995b07d27fb3402d95f0bdfc08befa22505c"
        driverRegistrarImageURL: "registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:c2a61cc939fb595cb3deadcf4ec47bf6bcf3ebb43e6e8b839c8f7fc9074ef33e"
        cephCSIImageURL: "registry.redhat.io/odf4/cephcsi-rhel8@sha256:f0ebadcad72a0733bd37be09914136fee2b25fd91ae2488919b0c5b2a2a407e0"
        csiaddonsImageURL: "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel8@sha256:6cd79e0674fb244fac02dc871ddcca09ff7ffdbcbf5f52fa53e55667c1b6d113"
    - version: v4.12
      containerImages:
        provisionerImageURL: "registry.redhat.io/openshift4/ose-csi-external-provisioner@sha256:9c16e6592b28d997bb0143b942405bb598b615c96366cb376f7490928fdc5fa2"
        attacherImageURL: "registry.redhat.io/openshift4/ose-csi-external-attacher-rhel8@sha256:be0b1a6efe7329c609023468257f65b4f0d4923de875fb254fdf835425356b9a"
        resizerImageURL: "registry.redhat.io/openshift4/ose-csi-external-resizer@sha256:d25e9fa7a238f1fb039e3404647f883003b810f4af32d254a88f62add08e56a6"
        snapshotterImageURL: "registry.redhat.io/openshift4/ose-csi-external-snapshotter-rhel8@sha256:9be4bb95d8ab9ff5189a09fdb73034e716ddbab2426c58ab237a53bba36d0853"
        driverRegistrarImageURL: "registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:579c48d339a8adc88bf102ebf4a36f5bce903872e7ceaed09b405b77b8167f8e"
        cephCSIImageURL: "registry.redhat.io/odf4/cephcsi-rhel8@sha256:8261812220fba8c647b5d23d359bef58b4c6710fd0c75a0c3d4bd99d4b88435a"
        csiaddonsImageURL: "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel8@sha256:a24d3872fd745cdfa08300458f75b81050405fc60be34169ad0d7df8837a284a"
    - version: v4.13
      containerImages:
        provisionerImageURL: "registry.redhat.io/openshift4/ose-csi-external-provisioner@sha256:9c16e6592b28d997bb0143b942405bb598b615c96366cb376f7490928fdc5fa2"
        attacherImageURL: "registry.redhat.io/openshift4/ose-csi-external-attacher-rhel8@sha256:be0b1a6efe7329c609023468257f65b4f0d4923de875fb254fdf835425356b9a"
        resizerImageURL: "registry.redhat.io/openshift4/ose-csi-external-resizer@sha256:d25e9fa7a238f1fb039e3404647f883003b810f4af32d254a88f62add08e56a6"
        snapshotterImageURL: "registry.redhat.io/openshift4/ose-csi-external-snapshotter-rhel8@sha256:9be4bb95d8ab9ff5189a09fdb73034e716ddbab2426c58ab237a53bba36d0853"
        driverRegistrarImageURL: "registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:579c48d339a8adc88bf102ebf4a36f5bce903872e7ceaed09b405b77b8167f8e"
        cephCSIImageURL: "registry.redhat.io/odf4/cephcsi-rhel9@sha256:78e6f5e0de77aa557ebd61af0604b2806c4dcc8c1cda37f63d0fa5f5ab3f1bda"
        csiaddonsImageURL: "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160"
kind: ConfigMap
metadata:
  creationTimestamp: "2023-04-12T05:42:10Z"
  labels:
    operators.coreos.com/ocs-client-operator.ocs-client-ns: ""
  name: ocs-client-operator-csi-images
  namespace: ocs-client-ns
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: ClusterServiceVersion
    name: ocs-client-operator.v4.13.0-130.stable
    uid: 1c78d9d0-32b0-44f0-81b8-ae3477f2f24c
  resourceVersion: "113869"
  uid: a55c176b-f54d-4b88-88b3-17f5a20d8f45
mrajanna@fedora test2 $ oc logs po/csi-rbdplugin-kwqd8 -c csi-addons
F0412 08:28:01.904986  841171 main.go:58] Failed to validate controller endpoint: %!w(*errors.errorString=&{invalid controller ip address ""})
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0x1)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:860 +0x8a
k8s.io/klog/v2.(*loggingT).output(0x257b1c0, 0x3, 0x0, 0xc0005aa770, 0x1, {0x1cf65ba, 0x1}, 0x257b880, 0x0)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:825 +0x686
k8s.io/klog/v2.(*loggingT).printfDepth(0x257b1c0, 0x4, 0x0, {0x0, 0x0}, 0x58, {0x177d314, 0x2a}, {0xc000598160, 0x1, ...})
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:630 +0x1f2
k8s.io/klog/v2.(*loggingT).printf(...)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:612
k8s.io/klog/v2.Fatalf(...)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1516
main.main()
    /remote-source/app/sidecar/main.go:58 +0x3e7
```

```
mrajanna@fedora test2 $ oc logs po/csi-rbdplugin-kwqd8 -c csi-rbdplugin
I0412 05:42:59.283023  226585 cephcsi.go:182] Driver version: release-4.11 and Git version: eb91a378e3d1b7a6af2b5c1c109af3a3b7798234
I0412 05:42:59.283248  226585 cephcsi.go:200] Initial PID limit is set to 100015
I0412 05:42:59.381867  226585 cephcsi.go:206] Reconfigured PID limit to -1 (max)
I0412 05:42:59.382174  226585 cephcsi.go:231] Starting driver type: rbd with name: ocs-client-ns.rbd.csi.ceph.com
I0412 05:42:59.382560  226585 server.go:114] listening for CSI-Addons requests on address: &net.UnixAddr{Name:"/csi/csi-addons.sock", Net:"unix"}
I0412 05:42:59.683408  226585 mount_linux.go:218] Cannot run systemd-run, assuming non-systemd OS
I0412 05:42:59.683434  226585 mount_linux.go:219] systemd-run output: System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
, failed with: exit status 1
I0412 05:42:59.683525  226585 rbd_attach.go:231] nbd module loaded
I0412 05:42:59.683553  226585 rbd_attach.go:245] kernel version "4.18.0-372.43.1.el8_6.x86_64" supports cookie feature
W0412 05:42:59.683619  226585 rbd_attach.go:251] running rbd-nbd --help failed with error:an error (exec: "rbd-nbd": executable file not found in $PATH) occurred while running rbd-nbd args: [--help], stderr:
I0412 05:42:59.683867  226585 server.go:126] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0412 05:43:00.083808  226585 utils.go:191] ID: 1 GRPC call: /csi.v1.Identity/GetPluginInfo
I0412 05:43:00.085016  226585 utils.go:195] ID: 1 GRPC request: {}
I0412 05:43:00.085036  226585 identityserver-default.go:38] ID: 1 Using default GetPluginInfo
I0412 05:43:00.085083  226585 utils.go:202] ID: 1 GRPC response: {"name":"ocs-client-ns.rbd.csi.ceph.com","vendor_version":"release-4.11"}
I0412 05:43:00.902127  226585 utils.go:191] ID: 2 GRPC call: /csi.v1.Node/NodeGetInfo
I0412 05:43:00.902173  226585 utils.go:195] ID: 2 GRPC request: {}
I0412 05:43:00.902180  226585 nodeserver-default.go:46] ID: 2 Using default NodeGetInfo
I0412 05:43:00.902254  226585 utils.go:202] ID: 2 GRPC response: {"accessible_topology":{},"node_id":"ip-10-0-12-43.ap-south-1.compute.internal"}
```

It looks like we are selecting the cephcsi and csiaddons images based on the OCP version: if the OCP version is 4.11, we pull the 4.11 CSI images. We need to use the latest cephcsi and csiaddons versions on all OCP releases. Only the sidecar versions should be chosen based on the OCP version.

@Ohad Mitrani are we on the same page here?

Moving this to the build team to use the latest CSI and csiaddons sidecar images in the ocs-client-operator configuration (images.yaml).
As discussed offline, taking newer CSI versions and running them on older OCP releases is an untested scenario. I think we might want to stick to the tested CSI versions for each specific OCP release.
If the concern is that new features will not be available on older OCP releases, I believe that is fine. We never claim forward compatibility, and if users want access to newer features they should upgrade to a later OCP release.
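To make the two positions in this thread concrete, here is a minimal sketch of both selection policies over a shortened, hypothetical stand-in for `csi-images.yaml`. The image values are placeholder labels rather than real digests, and the function names `images_pinned` and `images_latest_driver` are made up for illustration. This is not the ocs-client-operator implementation.

```python
from typing import Dict, Tuple

ImageSet = Dict[str, str]

# Hypothetical, abbreviated stand-in for csi-images.yaml (labels, not digests).
CSI_IMAGES: Dict[str, ImageSet] = {
    "v4.11": {
        "provisionerImageURL": "provisioner:4.11",
        "cephCSIImageURL": "cephcsi:4.11",
        "csiaddonsImageURL": "csiaddons:4.11",
    },
    "v4.12": {
        "provisionerImageURL": "provisioner:4.12",
        "cephCSIImageURL": "cephcsi:4.12",
        "csiaddonsImageURL": "csiaddons:4.12",
    },
    "v4.13": {
        "provisionerImageURL": "provisioner:4.13",
        "cephCSIImageURL": "cephcsi:4.13",
        "csiaddonsImageURL": "csiaddons:4.13",
    },
}

# Images that the first proposal would always take from the newest entry.
DRIVER_KEYS = ("cephCSIImageURL", "csiaddonsImageURL")


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "v4.11" into (4, 11) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def images_pinned(ocp_version: str) -> ImageSet:
    """Policy A: every image comes from the entry matching the OCP version."""
    return dict(CSI_IMAGES[ocp_version])


def images_latest_driver(ocp_version: str) -> ImageSet:
    """Policy B: sidecars follow the OCP version, driver images use the newest entry."""
    latest = max(CSI_IMAGES, key=version_key)
    images = dict(CSI_IMAGES[ocp_version])
    for key in DRIVER_KEYS:
        images[key] = CSI_IMAGES[latest][key]
    return images


if __name__ == "__main__":
    print(images_pinned("v4.11"))
    print(images_latest_driver("v4.11"))
```

Policy A corresponds to keeping only tested combinations per OCP release. Policy B corresponds to using the latest cephcsi and csiaddons images everywhere, which runs combinations on older OCP releases that, per the comment above, have not been tested.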