Bug 2250995
Summary: | [Tracker][29079] rook-ceph-exporter pod restarts multiple times on a fresh installed HCI cluster | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Mudit Agarwal <muagarwa> |
Component: | rook | Assignee: | Divyansh Kamboj <dkamboj> |
Status: | CLOSED ERRATA | QA Contact: | Daniel Osypenko <dosypenk> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.14 | CC: | athakkar, dkamboj, dosypenk, ebenahar, jolmomar, kbg, lgangava, muagarwa, nberry, nthomas, odf-bz-bot, omitrani, rohgupta, sapillai, tnielsen |
Target Milestone: | --- | ||
Target Release: | ODF 4.15.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | isf-provider | ||
Fixed In Version: | 4.15.0-112 | Doc Type: | Bug Fix |
Doc Text: |
.Deployment strategy to avoid rook-ceph-exporter pod restarts
Previously, the `rook-ceph-exporter` pod restarted multiple times on a freshly installed HCI cluster, which crashed the exporter pod and left Ceph health in the WARN status. This happened because restarting the exporter with the `RollingUpdate` strategy caused a race condition that crashed the exporter.
With this fix, the deployment strategy is changed to `Recreate`. As a result, the exporter pods no longer crash and Ceph no longer reports the WARN health status. (A quick way to check the configured strategy is sketched just after the metadata table below.)
|
Story Points: | --- |
Clone Of: | 2248850 | Environment: | |
Last Closed: | 2024-03-19 15:29:07 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 2248850 | ||
Bug Blocks: | 2246375 |
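
As a quick way to confirm the change described in the Doc Text above, the update strategy can be read directly from the exporter Deployments. This is a minimal sketch, assuming the exporter Deployments live in the `openshift-storage` namespace and carry an `app=rook-ceph-exporter` label (adjust the selector if your cluster labels them differently):

```
# Print the update strategy of each rook-ceph-exporter Deployment.
# Assumes the app=rook-ceph-exporter label; adjust the selector if needed.
oc -n openshift-storage get deployments -l app=rook-ceph-exporter \
  -o custom-columns=NAME:.metadata.name,STRATEGY:.spec.strategy.type

# Expected on builds carrying the fix (4.15.0-112 and later): STRATEGY is "Recreate".
# Affected builds use the rolling-update default instead.
```

With `Recreate`, the old exporter pod is fully terminated before its replacement starts, so old and new exporter processes never run side by side on the same node, which is the general reason this strategy avoids that class of race.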
Description
Mudit Agarwal
2023-11-22 07:32:51 UTC
Fresh installation of ODF 4.14.1-13: after 3h there are no restarts, and the age of the rook-ceph-exporter pods stays the same as that of the other rook resources.

oc -n openshift-storage get csv odf-operator.v4.14.1-rhodf -ojsonpath={.metadata.labels.full_version}
4.14.1-13

oc get pods -n openshift-storage
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-57c78f8dcc-qhhrd 2/2 Running 0 4m
noobaa-core-0 1/1 Running 0 3h2m
noobaa-db-pg-0 1/1 Running 0 3h2m
noobaa-endpoint-7b4cc64766-p675z 1/1 Running 0 58m
noobaa-operator-5db6879bd8-j4hpr 2/2 Running 0 3h8m
ocs-metrics-exporter-78cdb76d7f-t2rlf 1/1 Running 0 3h7m
ocs-operator-747cb68d6d-lmjzc 1/1 Running 10 (2m13s ago) 3h7m
ocs-provider-server-5c96bd9959-xthtk 1/1 Running 0 3h3m
odf-console-84798894d9-fx75k 1/1 Running 0 3h7m
odf-operator-controller-manager-f6954947-hw55k 2/2 Running 8 (2m36s ago) 3h7m
rook-ceph-crashcollector-00-50-56-8f-2e-87-b8cdbc894-khc5j 1/1 Running 0 3h1m
rook-ceph-crashcollector-00-50-56-8f-7d-c3-5985d47bdc-m9865 1/1 Running 0 3h
rook-ceph-crashcollector-00-50-56-8f-bc-1d-65c989c856-hgp77 1/1 Running 0 3h1m
rook-ceph-exporter-00-50-56-8f-2e-87-9f9fb4f5d-ljs22 1/1 Running 0 3h1m
rook-ceph-exporter-00-50-56-8f-7d-c3-6f97d76b6c-cbrh2 1/1 Running 0 3h
rook-ceph-exporter-00-50-56-8f-bc-1d-58bfc4bfbd-spj2r 1/1 Running 0 3h1m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6b859c66q5nxn 2/2 Running 0 3h1m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6f75f8c7m6gqh 2/2 Running 0 3h1m
rook-ceph-mgr-a-5679c86dd7-8x96z 2/2 Running 0 3h2m
rook-ceph-mon-a-85cb58fdb9-zlggc 2/2 Running 1 (78m ago) 3h3m
rook-ceph-mon-b-8695bcb7cb-qkgbd 2/2 Running 0 3h3m
rook-ceph-mon-c-d5cf44b-b6tdz 2/2 Running 1 (78m ago) 3h3m
rook-ceph-operator-57dc54fc8-v6sjv 1/1 Running 0 3h3m
rook-ceph-osd-0-69dcb6bf7d-zrkc4 2/2 Running 0 3h2m
rook-ceph-osd-1-67589dff6f-sm4sj 2/2 Running 0 3h2m
rook-ceph-osd-2-5dd888dcd9-g6d4f 2/2 Running 0 3h2m
rook-ceph-osd-prepare-3456f52398cce3c85a50f4ba965cf80f-7mhxh 0/1 Completed 0 3h2m
rook-ceph-osd-prepare-45478423b0aa91d96006e92e856edbaa-swwj5 0/1 Completed 0 3h2m
rook-ceph-osd-prepare-e5260a8ca7e86f1362df996245319234-rmgwm 0/1 Completed 0 3h2m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7fbd8497qv6t 2/2 Running 0 3h
rook-ceph-tools-67c876b65c-qhszm 1/1 Running 0 114m

OCP version: 4.15.0-0.nightly-2024-01-25-051548
ODF version: 4.15.0-126.stable

No restarts since the Provider / Client deployment:

oc get pods -n openshift-storage | awk 'NR==1 || /rook-ceph-exporter/'
NAME READY STATUS RESTARTS AGE
rook-ceph-exporter-b2.fd.3da9.ip4.static.sl-reverse.com-cbjrjc7 1/1 Running 0 74m
rook-ceph-exporter-b5.fd.3da9.ip4.static.sl-reverse.com-59m477k 1/1 Running 0 25h
rook-ceph-exporter-b8.fd.3da9.ip4.static.sl-reverse.com-86mgcm8 1/1 Running 0 25h
rook-ceph-exporter-b9.fd.3da9.ip4.static.sl-reverse.com-75smhd2 1/1 Running 0 68m
rook-ceph-exporter-bd.fd.3da9.ip4.static.sl-reverse.com-6dppb2r 1/1 Running 0 20m

oc rsh -n openshift-storage $TOOLBOX
sh-5.1$ ceph status
  cluster:
    id:     e0028f51-9387-49b2-9cd9-ec7a20ebb8a6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 62m)
    mgr: a(active, since 25h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 12 osds: 12 up (since 13m), 12 in (since 2d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   15 pools, 1227 pgs
    objects: 44.34k objects, 169 GiB
    usage:   509 GiB used, 10 TiB / 10 TiB avail
    pgs:     1227 active+clean

  io:
    client: 1.2 KiB/s rd, 5.1 MiB/s wr, 2 op/s rd, 478 op/s wr

Moving to VERIFIED.
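
The awk filter above works for spot checks; an equivalent check that selects the exporter pods by label and prints only the restart counters is sketched below, again assuming an `app=rook-ceph-exporter` label, which is not visible in the listing itself:

```
# Show only the exporter pods and their restart counters.
# Assumes the pods carry the app=rook-ceph-exporter label.
oc -n openshift-storage get pods -l app=rook-ceph-exporter \
  -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount,STARTED:.status.startTime

# On a fixed build every RESTARTS value should stay at 0, as in the listings above.
```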
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.