+++ This bug was initially created as a clone of Bug #2221488 +++

This bug was initially created as a copy of Bug #2203795

I am copying this bug because:
----
ODF 4.14.1-14
The same list of 142 metrics is missing, on a non-external mode deployment.
https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/557/17055/827985/827986/827990/log?logParams=history%3D827990%26page.page%3D1
----

Cloned bug: even though the missing metric names differ from the 4.13 missing metrics, the description of the problem and the participants in the discussion should be the same.

Description of problem (please be as detailed as possible and provide log snippets):

ODF Monitoring is missing some of the ceph_* metric values. No related epic introducing a change/rename was found.

List of missing metric values:

'ceph_bluestore_state_aio_wait_lat_sum', 'ceph_paxos_store_state_latency_sum', 'ceph_osd_op_out_bytes', 'ceph_bluestore_txc_submit_lat_sum', 'ceph_paxos_commit', 'ceph_paxos_new_pn_latency_count', 'ceph_osd_op_r_process_latency_count', 'ceph_bluestore_txc_submit_lat_count', 'ceph_bluestore_kv_final_lat_sum', 'ceph_paxos_collect_keys_sum', 'ceph_paxos_accept_timeout', 'ceph_paxos_begin_latency_count', 'ceph_bluefs_wal_total_bytes', 'ceph_paxos_refresh', 'ceph_bluestore_read_lat_count', 'ceph_mon_num_sessions', 'ceph_bluefs_bytes_written_wal', 'ceph_mon_num_elections', 'ceph_rocksdb_compact', 'ceph_bluestore_kv_sync_lat_sum', 'ceph_osd_op_process_latency_count', 'ceph_osd_op_w_prepare_latency_count', 'ceph_paxos_begin_latency_sum', 'ceph_osd_op_r', 'ceph_osd_op_rw_prepare_latency_sum', 'ceph_paxos_new_pn', 'ceph_rocksdb_get_latency_count', 'ceph_paxos_commit_latency_count', 'ceph_bluestore_txc_throttle_lat_count', 'ceph_paxos_lease_ack_timeout', 'ceph_bluestore_txc_commit_lat_sum', 'ceph_paxos_collect_bytes_sum', 'ceph_osd_op_rw_latency_count', 'ceph_paxos_collect_uncommitted', 'ceph_osd_op_rw_latency_sum', 'ceph_paxos_share_state', 'ceph_osd_op_r_prepare_latency_sum', 'ceph_bluestore_kv_flush_lat_sum', 'ceph_osd_op_rw_process_latency_sum', 'ceph_rocksdb_rocksdb_write_memtable_time_count', 'ceph_paxos_collect_latency_count', 'ceph_osd_op_rw_prepare_latency_count', 'ceph_paxos_collect_latency_sum', 'ceph_rocksdb_rocksdb_write_delay_time_count', 'ceph_paxos_begin_bytes_sum', 'ceph_osd_numpg', 'ceph_osd_stat_bytes', 'ceph_rocksdb_submit_sync_latency_sum', 'ceph_rocksdb_compact_queue_merge', 'ceph_paxos_collect_bytes_count', 'ceph_osd_op', 'ceph_paxos_commit_keys_sum', 'ceph_osd_op_rw_in_bytes', 'ceph_osd_op_rw_out_bytes', 'ceph_bluefs_bytes_written_sst', 'ceph_osd_op_rw_process_latency_count', 'ceph_rocksdb_compact_queue_len', 'ceph_bluestore_txc_throttle_lat_sum', 'ceph_bluefs_slow_used_bytes', 'ceph_osd_op_r_latency_sum', 'ceph_bluestore_kv_flush_lat_count', 'ceph_rocksdb_compact_range', 'ceph_osd_op_latency_sum', 'ceph_mon_session_add', 'ceph_paxos_share_state_keys_count', 'ceph_paxos_collect', 'ceph_osd_op_w_in_bytes', 'ceph_osd_op_r_process_latency_sum', 'ceph_paxos_start_peon', 'ceph_mon_session_trim', 'ceph_rocksdb_get_latency_sum', 'ceph_osd_op_rw', 'ceph_paxos_store_state_keys_count', 'ceph_rocksdb_rocksdb_write_delay_time_sum', 'ceph_osd_recovery_ops', 'ceph_bluefs_logged_bytes', 'ceph_bluefs_db_total_bytes', 'ceph_osd_op_w_latency_count', 'ceph_bluestore_txc_commit_lat_count', 'ceph_bluestore_state_aio_wait_lat_count', 'ceph_paxos_begin_bytes_count', 'ceph_paxos_start_leader', 'ceph_mon_election_call', 'ceph_rocksdb_rocksdb_write_pre_and_post_time_count', 'ceph_mon_session_rm', 'ceph_paxos_store_state', 'ceph_paxos_store_state_bytes_count', 'ceph_osd_op_w_latency_sum', 'ceph_rocksdb_submit_latency_count', 'ceph_paxos_commit_latency_sum', 'ceph_rocksdb_rocksdb_write_memtable_time_sum', 'ceph_paxos_share_state_bytes_sum', 'ceph_osd_op_process_latency_sum', 'ceph_paxos_begin_keys_sum', 'ceph_rocksdb_rocksdb_write_pre_and_post_time_sum', 'ceph_bluefs_wal_used_bytes', 'ceph_rocksdb_rocksdb_write_wal_time_sum', 'ceph_osd_op_wip', 'ceph_paxos_lease_timeout', 'ceph_osd_op_r_out_bytes', 'ceph_paxos_begin_keys_count', 'ceph_bluestore_kv_sync_lat_count', 'ceph_osd_op_prepare_latency_count', 'ceph_bluefs_bytes_written_slow', 'ceph_rocksdb_submit_latency_sum', 'ceph_osd_op_r_latency_count', 'ceph_paxos_share_state_keys_sum', 'ceph_paxos_store_state_bytes_sum', 'ceph_osd_op_latency_count', 'ceph_paxos_commit_bytes_count', 'ceph_paxos_restart', 'ceph_bluefs_slow_total_bytes', 'ceph_paxos_collect_timeout', 'ceph_osd_op_w_process_latency_sum', 'ceph_paxos_collect_keys_count', 'ceph_paxos_share_state_bytes_count', 'ceph_osd_op_w_prepare_latency_sum', 'ceph_bluestore_read_lat_sum', 'ceph_osd_stat_bytes_used', 'ceph_paxos_begin', 'ceph_mon_election_win', 'ceph_osd_op_w_process_latency_count', 'ceph_rocksdb_rocksdb_write_wal_time_count', 'ceph_paxos_store_state_keys_sum', 'ceph_osd_numpg_removing', 'ceph_paxos_commit_keys_count', 'ceph_paxos_new_pn_latency_sum', 'ceph_osd_op_in_bytes', 'ceph_paxos_store_state_latency_count', 'ceph_paxos_refresh_latency_count', 'ceph_osd_op_r_prepare_latency_count', 'ceph_bluefs_num_files', 'ceph_mon_election_lose', 'ceph_osd_op_prepare_latency_sum', 'ceph_bluefs_db_used_bytes', 'ceph_bluestore_kv_final_lat_count', 'ceph_paxos_refresh_latency_sum', 'ceph_osd_recovery_bytes', 'ceph_osd_op_w', 'ceph_paxos_commit_bytes_sum', 'ceph_bluefs_log_bytes', 'ceph_rocksdb_submit_sync_latency_count'

Ceph metrics which should be present on a healthy cluster:
https://github.com/red-hat-storage/ocs-ci/blob/81ca20aed067a30dd109e0f29e026f2a18c752ee/ocs_ci/ocs/metrics.py#L70

Polarion documentation:
https://polarion.engineering.redhat.com/polarion/#/project/OpenShiftContainerStorage/workitem?id=OCS-958

Version of all relevant components (if applicable):

OC version:
Client Version: 4.13.4
Kustomize Version: v4.5.7
Server Version: 4.14.0-0.nightly-2023-06-30-131338
Kubernetes Version: v1.27.3+ab0b8ee

OCS version:
ocs-operator.v4.14.0-36.stable   OpenShift Container Storage   4.14.0-36.stable   Succeeded

Cluster version:
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-06-30-131338   True        False         4d1h    Cluster version is 4.14.0-0.nightly-2023-06-30-131338

Rook version:
rook: v4.14.0-0.d8ce011027a26218154bcedf63a54e97f020df40
go: go1.20.4

Ceph version:
ceph version 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)

Is this issue reproducible? Yes, repeatable in CI runs and with local runs.

Steps to Reproduce:
1. Install an OCP/ODF cluster.
2. After installation, check whether Prometheus provides values for the metrics listed above.

Actual results:
OCP Prometheus provides no values for any of the metrics listed above.

Expected results:
OCP Prometheus provides values for all metrics listed above.
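For anyone re-running step 2 by hand, the sketch below shows one way to ask the OCP Prometheus API which of the expected ceph_* metrics return no samples. This is not the ocs-ci implementation; the Prometheus route URL and bearer token are placeholders (the route lives in openshift-monitoring and a token can be taken from `oc whoami -t`):

```python
# Minimal sketch: query OCP Prometheus for each expected ceph_* metric and
# report the ones that return no data. PROM_URL and TOKEN are placeholders.
import requests

PROM_URL = "https://prometheus-k8s-openshift-monitoring.apps.example.com"  # placeholder route
TOKEN = "sha256~..."  # placeholder bearer token

EXPECTED_METRICS = [
    "ceph_osd_op_out_bytes",
    "ceph_mon_num_sessions",
    # ... remaining metric names from the list above
]

def missing_metrics(metrics):
    """Return the subset of `metrics` for which Prometheus has no samples."""
    missing = []
    for name in metrics:
        resp = requests.get(
            f"{PROM_URL}/api/v1/query",
            params={"query": name},
            headers={"Authorization": f"Bearer {TOKEN}"},
            verify=False,  # test clusters often use self-signed certs
        )
        resp.raise_for_status()
        if not resp.json()["data"]["result"]:
            missing.append(name)
    return missing

if __name__ == "__main__":
    print(missing_metrics(EXPECTED_METRICS))
```

A metric is counted as missing when the instant query returns an empty result set, which matches the "provides no values" behaviour described in the actual results above.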
Logs of the test run:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-034aikt1c33-t1/j-034aikt1c33-t1_20230704T064403/logs/ocs-ci-logs-1688456400/by_outcome/failed/tests/manage/monitoring/prometheusmetrics/test_monitoring_defaults.py/test_ceph_metrics_available/logs

Must-gather logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-034aikt1c33-t1/j-034aikt1c33-t1_20230704T064403/logs/testcases_1688456400/

--- Additional comment from RHEL Program Management on 2023-07-09 12:02:36 UTC ---

This bug, which previously had no release flag set, now has the release flag 'odf-4.14.0' set to '?', and so is being proposed to be fixed in the ODF 4.14.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.

--- Additional comment from RHEL Program Management on 2023-07-09 12:02:36 UTC ---

The 'Target Release' is not to be set manually for the Red Hat OpenShift Data Foundation product. The 'Target Release' will be auto-set appropriately after the 3 Acks (pm, devel, qa) are set to "+" for a specific release flag and that release flag gets auto-set to "+".

--- Additional comment from Travis Nielsen on 2023-07-10 16:47:34 UTC ---

Avan PTAL

--- Additional comment from avan on 2023-07-25 17:26:37 UTC ---

@Daniel, Is this still reproducible?

--- Additional comment from Daniel Osypenko on 2023-07-26 08:21:32 UTC ---

@athakkar OCS 4.14.0-77 still fails

--- Additional comment from avan on 2023-08-01 10:45:40 UTC ---

(In reply to Daniel Osypenko from comment #5)
> @athakkar OCS 4.14.0-77 still fails

Currently the ceph-exporter is disabled for the 4.14 build, as some issues were detected in upstream Ceph. The plan is to get the fix delivered in 6.1z2 and then enable the exporter in the 4.14 release branch of the rook repo this week.

--- Additional comment from Travis Nielsen on 2023-08-01 15:12:18 UTC ---

Avan, was the exporter disabled in Ceph? If so, we can move this BZ over to the ceph component.

--- Additional comment from avan on 2023-08-02 09:37:26 UTC ---

(In reply to Travis Nielsen from comment #7)
> Avan, was the exporter disabled in Ceph? If so, we can move this BZ over to
> the ceph component.

No, I mean it was disabled on the Rook end. By the way, the exporter fixes are merged upstream, so they will soon be backported to downstream 6.1z2.

--- Additional comment from Travis Nielsen on 2023-08-02 16:32:13 UTC ---

Oh right, upstream requires a minimum of v18 for the exporter to be enabled, which means it is disabled in 4.14 until we change the MinVersionForCephExporter again.

--- Additional comment from Red Hat Bugzilla on 2023-08-03 08:28:02 UTC ---

Account disabled by LDAP Audit

--- Additional comment from Mudit Agarwal on 2023-08-08 05:35:34 UTC ---

Avan, please add the link to the Ceph BZ/PR that has the exporter changes. Also, when are we planning to enable it from the Rook side?

Elad, please provide qa ack.
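For context on the MinVersionForCephExporter comment above: Rook's actual gate is Go code in the rook repo, but the behaviour it describes amounts to a simple minimum-version check, sketched below in Python purely as an illustration (the constant's name and tuple form here are assumptions, not Rook's real definitions):

```python
# Conceptual illustration only: the ceph-exporter is deployed only when the
# detected Ceph version meets a minimum-version constant. Upstream that
# minimum was v18 (Reef), while ODF 4.14 shipped Quincy (17.2.6).
MIN_VERSION_FOR_CEPH_EXPORTER = (18, 0, 0)  # assumed representation of the gate

def exporter_enabled(ceph_version: tuple) -> bool:
    """Enable the exporter only at or above the minimum supported Ceph release."""
    return ceph_version >= MIN_VERSION_FOR_CEPH_EXPORTER

print(exporter_enabled((17, 2, 6)))  # False -> exporter (and its ceph_* metrics) absent
print(exporter_enabled((18, 2, 0)))  # True
```

While such a gate is in place, the exporter and all of the metrics it serves stay disabled until either the minimum-version constant is changed or the shipped Ceph version crosses it, which is the situation discussed in the comments above.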
--- Additional comment from RHEL Program Management on 2023-08-08 06:31:57 UTC ---

This BZ is being approved for the ODF 4.14.0 release, upon receipt of the 3 ACKs (PM, Devel, QA) for the release flag 'odf-4.14.0'.

--- Additional comment from RHEL Program Management on 2023-08-08 06:31:57 UTC ---

Since this bug has been approved for the ODF 4.14.0 release through release flag 'odf-4.14.0+', the Target Release is being set to 'ODF 4.14.0'.

--- Additional comment from avan on 2023-08-08 06:33:39 UTC ---

(In reply to Mudit Agarwal from comment #11)
> Avan, please add the link to the Ceph BZ/PR that has the exporter changes.
> Also, when are we planning to enable it from the Rook side?
>
> Elad, please provide qa ack.

Ceph BZs:
https://bugzilla.redhat.com/show_bug.cgi?id=2217817
https://bugzilla.redhat.com/show_bug.cgi?id=2229267

Once these BZs are moved to ON_QA (once we have a new build), the exporter can be enabled in the Rook 4.14 release branch.

--- Additional comment from avan on 2023-08-09 11:20:02 UTC ---

@kdreyer @branto Given that we have the new Ceph image ready with the required exporter changes (https://bugzilla.redhat.com/show_bug.cgi?id=2217817#c3), can you help make sure that ODF 4.14 uses this new image for testing?

--- Additional comment from Boris Ranto on 2023-08-09 11:53:41 UTC ---

Done, I updated the defaults to use the first RHCS 6.1z2 build (6-200). We should have it in our builds starting from tomorrow.

--- Additional comment from errata-xmlrpc on 2023-08-10 03:50:33 UTC ---

This bug has been added to advisory RHBA-2023:115514 by the ceph-build service account (ceph-build.COM)

--- Additional comment from Daniel Osypenko on 2023-08-31 11:29:03 UTC ---

Fixed. The same automation test that previously failed (test_monitoring_reporting_ok_when_idle) now passes:

13:42:54 - MainThread - /Users/danielosypenko/Work/automation_4/ocs-ci/ocs_ci/utility/prometheus.py - INFO - No bad values detected
13:42:54 - MainThread - /Users/danielosypenko/Work/automation_4/ocs-ci/ocs_ci/utility/prometheus.py - INFO - No invalid values detected
13:42:54 - MainThread - test_monitoring_defaults - INFO - ceph_osd_in metric does indicate no problems with OSDs
PASSED

--- Additional comment from Daniel Osypenko on 2023-09-04 10:13:06 UTC ---

The BZ was moved to Verified by mistake.
List of missing metrics on OCP 4.14.0-0.nightly-2023-09-02-132842 ODF 4.14.0-125.stable ['ceph_bluestore_state_aio_wait_lat_sum', 'ceph_paxos_store_state_latency_sum', 'ceph_osd_op_out_bytes', 'ceph_bluestore_txc_submit_lat_sum', 'ceph_paxos_commit', 'ceph_paxos_new_pn_latency_count', 'ceph_osd_op_r_process_latency_count', 'ceph_bluestore_txc_submit_lat_count', 'ceph_bluestore_kv_final_lat_sum', 'ceph_paxos_collect_keys_sum', 'ceph_paxos_accept_timeout', 'ceph_paxos_begin_latency_count', 'ceph_bluefs_wal_total_bytes', 'ceph_paxos_refresh', 'ceph_bluestore_read_lat_count', 'ceph_mon_num_sessions', 'ceph_objecter_op_rmw', 'ceph_bluefs_bytes_written_wal', 'ceph_mon_num_elections', 'ceph_rocksdb_compact', 'ceph_bluestore_kv_sync_lat_sum', 'ceph_osd_op_process_latency_count', 'ceph_osd_op_w_prepare_latency_count', 'ceph_objecter_op_active', 'ceph_paxos_begin_latency_sum', 'ceph_osd_op_r', 'ceph_osd_op_rw_prepare_latency_sum', 'ceph_paxos_new_pn', 'ceph_rgw_qlen', 'ceph_rgw_req', 'ceph_rocksdb_get_latency_count', 'ceph_rgw_cache_miss', 'ceph_paxos_commit_latency_count', 'ceph_bluestore_txc_throttle_lat_count', 'ceph_paxos_lease_ack_timeout', 'ceph_bluestore_txc_commit_lat_sum', 'ceph_paxos_collect_bytes_sum', 'ceph_osd_op_rw_latency_count', 'ceph_paxos_collect_uncommitted', 'ceph_osd_op_rw_latency_sum', 'ceph_paxos_share_state', 'ceph_osd_op_r_prepare_latency_sum', 'ceph_bluestore_kv_flush_lat_sum', 'ceph_osd_op_rw_process_latency_sum', 'ceph_rocksdb_rocksdb_write_memtable_time_count', 'ceph_paxos_collect_latency_count', 'ceph_osd_op_rw_prepare_latency_count', 'ceph_paxos_collect_latency_sum', 'ceph_rocksdb_rocksdb_write_delay_time_count', 'ceph_objecter_op_rmw', 'ceph_paxos_begin_bytes_sum', 'ceph_osd_numpg', 'ceph_osd_stat_bytes', 'ceph_rocksdb_submit_sync_latency_sum', 'ceph_rocksdb_compact_queue_merge', 'ceph_paxos_collect_bytes_count', 'ceph_osd_op', 'ceph_paxos_commit_keys_sum', 'ceph_osd_op_rw_in_bytes', 'ceph_osd_op_rw_out_bytes', 'ceph_bluefs_bytes_written_sst', 'ceph_rgw_put', 'ceph_osd_op_rw_process_latency_count', 'ceph_rocksdb_compact_queue_len', 'ceph_bluestore_txc_throttle_lat_sum', 'ceph_bluefs_slow_used_bytes', 'ceph_osd_op_r_latency_sum', 'ceph_bluestore_kv_flush_lat_count', 'ceph_rocksdb_compact_range', 'ceph_osd_op_latency_sum', 'ceph_mon_session_add', 'ceph_paxos_share_state_keys_count', 'ceph_paxos_collect', 'ceph_osd_op_w_in_bytes', 'ceph_osd_op_r_process_latency_sum', 'ceph_paxos_start_peon', 'ceph_mon_session_trim', 'ceph_rocksdb_get_latency_sum', 'ceph_osd_op_rw', 'ceph_paxos_store_state_keys_count', 'ceph_rocksdb_rocksdb_write_delay_time_sum', 'ceph_objecter_op_r', 'ceph_objecter_op_active', 'ceph_objecter_op_w', 'ceph_osd_recovery_ops', 'ceph_bluefs_logged_bytes', 'ceph_bluefs_db_total_bytes', 'ceph_rgw_put_initial_lat_sum', 'ceph_osd_op_w_latency_count', 'ceph_rgw_put_initial_lat_count', 'ceph_bluestore_txc_commit_lat_count', 'ceph_bluestore_state_aio_wait_lat_count', 'ceph_paxos_begin_bytes_count', 'ceph_paxos_start_leader', 'ceph_mon_election_call', 'ceph_rocksdb_rocksdb_write_pre_and_post_time_count', 'ceph_mon_session_rm', 'ceph_paxos_store_state', 'ceph_paxos_store_state_bytes_count', 'ceph_osd_op_w_latency_sum', 'ceph_rgw_keystone_token_cache_hit', 'ceph_rocksdb_submit_latency_count', 'ceph_paxos_commit_latency_sum', 'ceph_rocksdb_rocksdb_write_memtable_time_sum', 'ceph_paxos_share_state_bytes_sum', 'ceph_osd_op_process_latency_sum', 'ceph_paxos_begin_keys_sum', 'ceph_rgw_qactive', 'ceph_rocksdb_rocksdb_write_pre_and_post_time_sum', 
'ceph_bluefs_wal_used_bytes', 'ceph_rocksdb_rocksdb_write_wal_time_sum', 'ceph_osd_op_wip', 'ceph_rgw_get_initial_lat_sum', 'ceph_paxos_lease_timeout', 'ceph_osd_op_r_out_bytes', 'ceph_paxos_begin_keys_count', 'ceph_bluestore_kv_sync_lat_count', 'ceph_osd_op_prepare_latency_count', 'ceph_bluefs_bytes_written_slow', 'ceph_rocksdb_submit_latency_sum', 'ceph_osd_op_r_latency_count', 'ceph_paxos_share_state_keys_sum', 'ceph_paxos_store_state_bytes_sum', 'ceph_osd_op_latency_count', 'ceph_paxos_commit_bytes_count', 'ceph_paxos_restart', 'ceph_rgw_get_initial_lat_count', 'ceph_bluefs_slow_total_bytes', 'ceph_paxos_collect_timeout', 'ceph_osd_op_w_process_latency_sum', 'ceph_paxos_collect_keys_count', 'ceph_paxos_share_state_bytes_count', 'ceph_osd_op_w_prepare_latency_sum', 'ceph_bluestore_read_lat_sum', 'ceph_osd_stat_bytes_used', 'ceph_paxos_begin', 'ceph_mon_election_win', 'ceph_osd_op_w_process_latency_count', 'ceph_rgw_get_b', 'ceph_rgw_failed_req', 'ceph_rocksdb_rocksdb_write_wal_time_count', 'ceph_rgw_keystone_token_cache_miss', 'ceph_paxos_store_state_keys_sum', 'ceph_osd_numpg_removing', 'ceph_paxos_commit_keys_count', 'ceph_paxos_new_pn_latency_sum', 'ceph_osd_op_in_bytes', 'ceph_paxos_store_state_latency_count', 'ceph_paxos_refresh_latency_count', 'ceph_rgw_get', 'ceph_osd_op_r_prepare_latency_count', 'ceph_rgw_cache_hit', 'ceph_objecter_op_w', 'ceph_objecter_op_r', 'ceph_bluefs_num_files', 'ceph_rgw_put_b', 'ceph_mon_election_lose', 'ceph_osd_op_prepare_latency_sum', 'ceph_bluefs_db_used_bytes', 'ceph_bluestore_kv_final_lat_count', 'ceph_paxos_refresh_latency_sum', 'ceph_osd_recovery_bytes', 'ceph_osd_op_w', 'ceph_paxos_commit_bytes_sum', 'ceph_bluefs_log_bytes', 'ceph_rocksdb_submit_sync_latency_count'] --- Additional comment from avan on 2023-09-05 12:24:27 UTC --- There's a fix under review currently https://github.com/red-hat-storage/rook/pull/516 --- Additional comment from Travis Nielsen on 2023-09-05 15:29:15 UTC --- PR 516 was merged now. --- Additional comment from Daniel Osypenko on 2023-09-07 10:12:21 UTC --- Verified, PASSED: test_ceph_metrics_available http://pastebin.test.redhat.com/1108991 test_ceph_rbd_metrics_available http://pastebin.test.redhat.com/1108993 --- Additional comment from Sunil Kumar Acharya on 2023-09-21 05:54:14 UTC --- Please update the requires_doc_text(RDT) flag/text appropriately. --- Additional comment from errata-xmlrpc on 2023-11-08 17:53:45 UTC --- Bug report changed to RELEASE_PENDING status by Errata System. Advisory RHSA-2023:115514-11 has been changed to PUSH_READY status. https://errata.devel.redhat.com/advisory/115514 --- Additional comment from errata-xmlrpc on 2023-11-08 18:52:23 UTC --- Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6832
*** Bug 2253428 has been marked as a duplicate of this bug. ***
Avan, please don't change the bug state to ON_QA until the discussion is concluded and the bug has all the acks. Also, this bug has kept moving between engineering and QA. Can we please set up a meeting to discuss and close it, because the turnaround time for having this discussion via the bug page is too long.

Filip, if the fix is not working in 4.14.z, then we need a separate bug for 4.14.z.
I have reproduced the issue on a post-upgrade deployment of an IBM Cloud cluster; it represents the failure history of the tests test_ceph_metrics_available and test_ceph_rbd_metrics_available very well and may explain why we did not see it on live deployments.

The issue happens only when we have OCP 4.15, on both ODF 4.14 and 4.15. The issue happens only with ceph metrics (not with rbd metrics). The issue happened on these platforms: IBM Cloud, Azure, AWS and GCP.

Tested: upgrade from OCP 4.14 & ODF 4.14 to OCP 4.14 & ODF 4.14 and a post-upgrade check.
Before the upgrade, data for the metrics were available.
After the upgrade, data are not available for 167 metrics.

Along with this, ocs-storagecluster is Progressing, Data resiliency is Progressing, and one worker node is not ready (the VM is running; no errors on odf-operator-controller-manager; no errors on ocs-metrics-exporter other than health report issues).

The issue was also observed recently on vSphere with OCP 4.15 & ODF 4.15, NOT post-upgrade.

oc get pods -n openshift-storage
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-58746c86d6-qjggl 2/2 Running 0 110m
csi-cephfsplugin-gzgxr 2/2 Running 2 31h
csi-cephfsplugin-provisioner-85f5789c76-pn2vh 5/5 Running 0 3h14m
csi-cephfsplugin-provisioner-85f5789c76-pnltj 5/5 Running 0 166m
csi-cephfsplugin-thprw 2/2 Running 2 31h
csi-cephfsplugin-zjlj7 2/2 Running 0 31h
csi-rbdplugin-6cblq 3/3 Running 3 3h27m
csi-rbdplugin-6r674 3/3 Running 0 3h26m
csi-rbdplugin-provisioner-5fb5cc859b-g94br 6/6 Running 0 3h14m
csi-rbdplugin-provisioner-5fb5cc859b-qts6t 6/6 Running 0 166m
csi-rbdplugin-qlc7x 3/3 Running 3 3h27m
noobaa-core-0 1/1 Running 0 166m
noobaa-db-pg-0 1/1 Running 0 166m
noobaa-endpoint-845d6d9998-lz296 1/1 Running 0 166m
noobaa-operator-5bcf546c-mmlpr 2/2 Running 0 166m
ocs-metrics-exporter-64755696fb-766qm 1/1 Running 0 3h14m
ocs-operator-78c8fb9446-4g9x8 1/1 Running 2 (177m ago) 3h14m
odf-console-76b8fd5784-wptm4 1/1 Running 0 166m
odf-operator-controller-manager-7bff4bf5cf-5ldz9 2/2 Running 0 166m
rook-ceph-crashcollector-dosypenk-281-i-fd2hc-worker-1-9kvcf5r5 1/1 Running 0 3h14m
rook-ceph-crashcollector-dosypenk-281-i-fd2hc-worker-2-8zwczqwf 1/1 Running 0 166m
rook-ceph-exporter-dosypenk-281-i-fd2hc-worker-1-9kv6q-689sx5vx 1/1 Running 0 3h14m
rook-ceph-exporter-dosypenk-281-i-fd2hc-worker-2-8zwns-874khbrp 1/1 Running 0 166m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7f6766b5774n4 2/2 Running 11 (150m ago) 3h14m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5b6dd7cb2hfsz 2/2 Running 5 (148m ago) 164m
rook-ceph-mgr-a-7bcc5c969-xzsm2 2/2 Running 0 166m
rook-ceph-mon-a-d7d7bf65f-hjk48 2/2 Running 0 3h27m
rook-ceph-mon-b-8f64c96cf-gqsll 2/2 Running 0 167m
rook-ceph-mon-c-6f6f79d5dc-ztv72 0/2 Pending 0 5m30s
rook-ceph-operator-7b7b6b8d5c-c26t5 1/1 Running 0 3h14m
rook-ceph-osd-0-66948789f4-klprs 2/2 Running 0 3h12m
rook-ceph-osd-1-79b6766cff-5nltx 0/2 Pending 0 163m
rook-ceph-osd-2-747c74d944-xzzh6 2/2 Running 0 3h27m
rook-ceph-tools-57fd4d4d68-9kjgw 1/1 Running 0 3h14m

The OCS must-gather is stuck; adding an OCP must-gather and a partial OCS must-gather.
ocp must-gather https://drive.google.com/file/d/16MisoUMeBZJ--Ilju5wTsBptwXUyDq21/view?usp=sharing ocs must-gather https://drive.google.com/file/d/1iviZ_tPOPlfEy-leX1yo05XF0mHnUrng/view?usp=sharing
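As a side note on the pod listing above, a quick way to pull out the not-ready pods (rook-ceph-mon-c and rook-ceph-osd-1 in this capture) is a short script against the Kubernetes API. This is only a hedged sketch using the kubernetes Python client, not part of the ocs-ci test suite:

```python
# Sketch: list openshift-storage pods that are not Running/Ready, the same
# check done manually with `oc get pods -n openshift-storage` above.
# Assumes the `kubernetes` Python client and a usable kubeconfig context.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod("openshift-storage").items:
    phase = pod.status.phase
    ready = all(cs.ready for cs in (pod.status.container_statuses or []))
    if phase != "Running" or not ready:
        print(f"{pod.metadata.name}: phase={phase} ready={ready}")
```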
The root cause is similar to https://bugzilla.redhat.com/show_bug.cgi?id=2258861. Divyansh will link the backport PR (4.14.z) and move the BZ to POST.
OCP 4.14.0-0.nightly-2024-03-11-023324
OCS 4.14.6-1

Tests passed:
test_ceph_metrics_available - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/34898/consoleFull
test_ceph_rbd_metrics_available - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/34896/console
*** Bug 2262307 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.14.6 Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:1579