Bug 2266621
| Summary: | mon pod scaledown is skipped if the mons are portable | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Joy John Pinto <jopinto> |
| Component: | rook | Assignee: | Subham Rai <srai> |
| Status: | CLOSED ERRATA | QA Contact: | Joy John Pinto <jopinto> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.15 | CC: | nladha, odf-bz-bot, srai, tnielsen |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.16.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.16.0-89 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-07-17 13:14:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Moving to 4.16, not a blocker.

Based on the previous comment/description this seems like a rook-specific issue; transferring it to rook.

Verified with OCP 4.16.0-0.nightly-2024-05-08-222442 and ODF 4.16.0-96
Verification steps:
1. Installed OCP 4.16 and ODF 4.16.0-96 on a 6-worker-node, 6-failure-domain cluster on vSphere
2. Updated the mon count to 5 and then changed it back to 3 in the storagecluster CR ('monCount' attribute; a patch sketch follows these steps)
3. The monCount value was updated in the storagecluster and cephcluster CRs, and the mon pod count was reduced to three
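A minimal sketch of step 2 as CLI commands, assuming the default storagecluster name ocs-storagecluster (adjust the name if your cluster differs):

$ oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge -p '{"spec":{"managedResources":{"cephCluster":{"monCount":5}}}}'
$ oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge -p '{"spec":{"managedResources":{"cephCluster":{"monCount":3}}}}'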
storagecluster yaml:
kms: {}
externalStorage: {}
managedResources:
cephBlockPools: {}
cephCluster:
monCount: 5
[jopinto@jopinto 5mbug]$ oc get pods -n openshift-storage | grep mon
rook-ceph-mon-a-5cdd784484-zp6bl 2/2 Running 0 35m
rook-ceph-mon-b-5fdd68b844-4lb44 2/2 Running 0 34m
rook-ceph-mon-c-5f55dfb6bb-ch9ld 2/2 Running 0 34m
rook-ceph-mon-d-65ddfd5556-rlbdc 2/2 Running 0 8m38s
rook-ceph-mon-e-58d8475cd8-ndxdg 2/2 Running 0 8m18s
storagecluster yaml:
kms: {}
externalStorage: {}
managedResources:
cephBlockPools: {}
cephCluster:
monCount: 3
[jopinto@jopinto 5mbug]$ oc get pods -n openshift-storage | grep mon
rook-ceph-mon-c-5f55dfb6bb-ch9ld 2/2 Running 0 41m
rook-ceph-mon-d-65ddfd5556-rlbdc 2/2 Running 0 15m
rook-ceph-mon-e-58d8475cd8-ndxdg 2/2 Running 0 15m
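To confirm the change propagated from the storagecluster to the cephcluster CR, the mon count can be read back with a jsonpath query; the cephcluster name ocs-storagecluster-cephcluster is the default and is an assumption here. After the scale-down it should print 3:

$ oc get cephcluster ocs-storagecluster-cephcluster -n openshift-storage -o jsonpath='{.spec.mon.count}'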
sh-5.1$ ceph health
HEALTH_OK
sh-5.1$
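Quorum can also be inspected from the same toolbox shell the health check was run in; ceph mon stat lists the monitor daemons currently in quorum (daemon names only, so mapping them back to pods is manual):

sh-5.1$ ceph mon stat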
Also, upon changing the monCount back to three, the CephMonLowNumber alert is triggered, which is expected.
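For a rough cross-check, the CephMonLowNumber rule can be located in the PrometheusRule objects shipped with ODF (assuming they are deployed in the openshift-storage namespace); this only confirms the rule exists, not that it is currently firing:

$ oc get prometheusrules -n openshift-storage -o yaml | grep -B2 -A6 CephMonLowNumber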
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591
Description of problem (please be detailed as possible and provide log snippets):
mon pod scaledown is skipped if the mons are portable

Version of all relevant components (if applicable):
OCP 4.15 and ODF 4.15.0-150

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
NA

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Install OCP 4.15 and ODF 4.15.0-150 on a 6-worker-node, 6-failure-domain cluster on vSphere
2. Update the mon count to 5 and then change it back to 3 in the storagecluster CR ('monCount' attribute)
3. The monCount value is updated in the storagecluster and cephcluster CRs, but five mons still exist

Actual results:
Even after scaling the mon count down to 3, five mon pods keep running

Expected results:
Upon scaling the mon count down to three, only three mon pods should be running

Additional info:

storagecluster CR:
spec:
  arbiter: {}
  enableCephTools: true
  encryption:
    kms: {}
  externalStorage: {}
  managedResources:
    cephBlockPools: {}
    cephCluster:
      monCount: 3
    cephConfig: {}

cephcluster CR (excerpt):
  name: balancer
mon:
  count: 3
  volumeClaimTemplate:

rook ceph operator log:
2024-02-28 14:40:10.746033 I | ceph-cluster-controller: done reconciling ceph cluster in namespace "openshift-storage"
2024-02-28 14:40:10.796538 I | ceph-cluster-controller: reporting cluster telemetry
2024-02-28 14:40:10.804990 I | ceph-cluster-controller: enabling ceph mon monitoring goroutine for cluster "openshift-storage"
2024-02-28 14:40:16.823208 I | ceph-cluster-controller: reporting node telemetry
2024-02-28 14:40:56.290615 I | op-mon: removing an extra mon. currently 5 are in quorum and only 3 are desired
2024-02-28 14:40:56.290662 I | op-mon: removing arbitrary extra mon ""
2024-02-28 14:40:56.290666 I | op-mon: did not identify a mon to remove
2024-02-28 14:41:41.744893 I | op-mon: removing an extra mon. currently 5 are in quorum and only 3 are desired
2024-02-28 14:41:41.744951 I | op-mon: removing arbitrary extra mon ""
2024-02-28 14:41:41.744954 I | op-mon: did not identify a mon to remove
2024-02-28 14:42:27.196955 I | op-mon: removing an extra mon. currently 5 are in quorum and only 3 are desired
2024-02-28 14:42:27.196997 I | op-mon: removing arbitrary extra mon ""
2024-02-28 14:42:27.197000 I | op-mon: did not identify a mon to remove
2024-02-28 14:43:12.623301 I | op-mon: removing an extra mon. currently 5 are in quorum and only 3 are desired
2024-02-28 14:43:12.623450 I | op-mon: removing arbitrary extra mon ""
2024-02-28 14:43:12.623470 I | op-mon: did not identify a mon to remove
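The op-mon messages above come from the rook-ceph operator log; when reproducing, they can be pulled with a plain grep against the operator deployment (rook-ceph-operator is the default deployment name in openshift-storage):

$ oc logs -n openshift-storage deploy/rook-ceph-operator | grep op-mon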