Description of problem (please be as detailed as possible and provide log snippets):

In an OCP 4.16 and ODF 4.16 cluster, the csi-cephfsplugin-provisioner and csi-rbdplugin-provisioner pods are stuck in CrashLoopBackOff state after enabling featuregate in OCP.

$ oc get pod -n openshift-storage | grep provisioner
csi-cephfsplugin-provisioner-54cd9c86f6-42bpx   5/6   CrashLoopBackOff   15 (3m32s ago)   55m
csi-cephfsplugin-provisioner-54cd9c86f6-jtqtl   5/6   CrashLoopBackOff   15 (3m21s ago)   55m
csi-rbdplugin-provisioner-d8b4c77cf-g7tj2       5/6   CrashLoopBackOff   15 (3m23s ago)   55m
csi-rbdplugin-provisioner-d8b4c77cf-n4mlv       5/6   CrashLoopBackOff   15 (3m39s ago)   55m

Logs from one of the provisioner pods (csi-snapshotter container):
---
flag provided but not defined: -enable-volume-group-snapshots
Usage of /usr/bin/csi-snapshotter:
  -add_dir_header
        If true, adds the file directory to the header of the log messages
  -alsologtostderr
        log to standard error as well as files (no effect when -logtostderr=true)
  -csi-address string
        Address of the CSI driver socket. (default "/run/csi/socket")
  -extra-create-metadata
        If set, add snapshot metadata to plugin snapshot requests as parameters.
  -groupsnapshot-name-prefix string
        Prefix to apply to the name of a created group snapshot (default "groupsnapshot")
  -groupsnapshot-name-uuid-length int
        Length in characters for the generated uuid of a created group snapshot. Defaults behavior is to NOT truncate. (default -1)
  -http-endpoint :8080
        The TCP network address where the HTTP server for diagnostics, including metrics and leader election health check, will listen (example: :8080). The default is empty string, which means the server is disabled. Only one of `--metrics-address` and `--http-endpoint` can be set.
  -kube-api-burst int
        Burst to use while communicating with the kubernetes apiserver. Defaults to 10. (default 10)
  -kube-api-qps float
        QPS to use while communicating with the kubernetes apiserver. Defaults to 5.0. (default 5)
  -kubeconfig string
        Absolute path to the kubeconfig file. Required only when running out of cluster.
  -leader-election
        Enables leader election.
  -leader-election-lease-duration duration
        Duration, in seconds, that non-leader candidates will wait to force acquire leadership. Defaults to 15 seconds. (default 15s)
  -leader-election-namespace string
        The namespace where the leader election resource exists. Defaults to the pod namespace if not set.
  -leader-election-renew-deadline duration
        Duration, in seconds, that the acting leader will retry refreshing leadership before giving up. Defaults to 10 seconds. (default 10s)
  -leader-election-retry-period duration
        Duration, in seconds, the LeaderElector clients should wait between tries of actions. Defaults to 5 seconds. (default 5s)
  -log_backtrace_at value
        when logging hits line file:N, emit a stack trace
  -log_dir string
        If non-empty, write log files in this directory (no effect when -logtostderr=true)
  -log_file string
        If non-empty, use this log file (no effect when -logtostderr=true)
  -log_file_max_size uint
        Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
  -logtostderr
        log to standard error instead of files (default true)
  -metrics-address :8080
        (deprecated) The TCP network address where the prometheus metrics endpoint will listen (example: :8080). The default is empty string, which means metrics endpoint is disabled. Only one of `--metrics-address` and `--http-endpoint` can be set.
  -metrics-path /metrics
        The HTTP path where prometheus metrics will be exposed. Default is /metrics. (default "/metrics")
  -node-deployment
        Enables deploying the sidecar controller together with a CSI driver on nodes to manage snapshots for node-local volumes.
  -one_output
        If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
  -resync-period duration
        Resync interval of the controller. Default is 15 minutes (default 15m0s)
  -retry-interval-max duration
        Maximum retry interval of failed volume snapshot creation or deletion. Default is 5 minutes. (default 5m0s)
  -retry-interval-start duration
        Initial retry interval of failed volume snapshot creation or deletion. It doubles with each failure, up to retry-interval-max. Default is 1 second. (default 1s)
  -skip_headers
        If true, avoid header prefixes in the log messages
  -skip_log_headers
        If true, avoid headers when opening log files (no effect when -logtostderr=true)
  -snapshot-name-prefix string
        Prefix to apply to the name of a created snapshot (default "snapshot")
  -snapshot-name-uuid-length int
        Length in characters for the generated uuid of a created snapshot. Defaults behavior is to NOT truncate. (default -1)
  -stderrthreshold value
        logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=false) (default 2)
  -timeout duration
        The timeout for any RPCs to the CSI driver. Default is 1 minute. (default 1m0s)
  -v value
        number for the log level verbosity
  -version
        Show version.
  -vmodule value
        comma-separated list of pattern=N settings for file-filtered logging
  -worker-threads int
        Number of worker threads. (default 10)
---

Version of all relevant components (if applicable):
OCP: 4.16.0-0.nightly-2024-05-01-111315
ODF: 4.16.0-90.stable

Images from csi pods:
registry.redhat.io/odf4/cephcsi-rhel9@sha256:d851bc4896e3666ba4d965eac89010ed5eea6c59d55027a5f5a01f9b079aeafe
registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:d0ca282694892d6caf025a35a593a3633785d2a40f4f8984e7f94a6906bb4236
registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:bce20ed64dbee694666b75a96fd505223e8eed193d5cd40a607d871d0cc8b9c0
registry.redhat.io/openshift4/ose-csi-external-provisioner@sha256:2da32b524163a1e046bdde7750fe71a2f1175e509357db3cd1300ef849f4f0b6
registry.redhat.io/openshift4/ose-csi-external-resizer@sha256:927629fd0731988d52d5bb1094b650bc5def609bacb406dac5e60905e4c9ca26
registry.redhat.io/openshift4/ose-csi-external-snapshotter@sha256:965111171af569965e07b724eb93ea77077c6272023c02d0f1aa80ebcdef48fa
registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:b7eacc160fcce0881a00be2eb8d050a66b6cf68bcac2ef9da72d7c0297f77c0f

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, pods are in CrashLoopBackOff state

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy OCP 4.16 and ODF 4.16
2. Enable featuregate in OCP
3. Observe the pod status

Actual results:
csi-cephfsplugin-provisioner and csi-rbdplugin-provisioner pods are stuck in CrashLoopBackOff state

Expected results:
Pods should be in Running state

Additional info:

$ oc get csv -n openshift-storage
NAME                                        DISPLAY                            VERSION            REPLACES   PHASE
mcg-operator.v4.16.0-90.stable              NooBaa Operator                    4.16.0-90.stable              Succeeded
ocs-client-operator.v4.16.0-90.stable       OpenShift Data Foundation Client   4.16.0-90.stable              Succeeded
ocs-operator.v4.16.0-90.stable              OpenShift Container Storage        4.16.0-90.stable              Succeeded
odf-csi-addons-operator.v4.16.0-90.stable   CSI Addons                         4.16.0-90.stable              Succeeded
odf-operator.v4.16.0-90.stable              OpenShift Data Foundation          4.16.0-90.stable              Succeeded
odf-prometheus-operator.v4.16.0-90.stable   Prometheus Operator                4.16.0-90.stable              Succeeded
rook-ceph-operator.v4.16.0-90.stable        Rook-Ceph                          4.16.0-90.stable              Succeeded
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4591