Description of problem:
While analyzing the failure of the test case test_ceph_osd_stopped_pd, it was noticed (thanks to Filip Balak) that with the dev addon containing the topology changes, if one of the rook-ceph-osd-* deployments is scaled down to 0, it is scaled back up to 1 automatically within a minute. This behavior differs from what happens without the topology change, where the deployment remains scaled down to 0. Cluster size was 4Ti.

The ocs-ci test case below failed due to this behavior:
tests/manage/monitoring/pagerduty/test_deployment_status.py::test_ceph_osd_stopped_pd

must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-o14-pr/jijoy-o14-pr_20221014T041828/logs/failed_testcase_ocs_logs_1665734797/test_ceph_osd_stopped_pd_ocs_logs/

=======================================================================
Version-Release number of selected component (if applicable):
OCP 4.10.35
ocs-osd-deployer.v2.0.8
odf-operator.v4.10.5

======================================================================
How reproducible:
3/3

Steps to Reproduce:
1. Scale down any one of the rook-ceph-osd-* deployments to 0 in a Managed Service provider cluster installed with the ocs-provider-dev addon.
2. Wait for some time and verify the replica count of the rook-ceph-osd-* deployment (see the sketch under Additional info below).

Actual results:
Within a minute of step 1, the rook-ceph-osd-* deployment is scaled back up to 1 automatically.

Expected results:
The deployment remains scaled down to 0.

Additional info:
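A minimal sketch of the scale-down-and-observe check, using the 'kubernetes' Python client. The namespace (openshift-storage) and the target deployment name (rook-ceph-osd-0) are assumptions for illustration, not taken from the must-gather:

    # Sketch only: scale one OSD deployment to 0 and watch whether it is
    # reconciled back to 1. Namespace and deployment name are assumptions.
    import time
    from kubernetes import client, config

    NAMESPACE = "openshift-storage"      # assumption: default ODF namespace
    OSD_DEPLOYMENT = "rook-ceph-osd-0"   # hypothetical: any rook-ceph-osd-* deployment

    config.load_kube_config()
    apps = client.AppsV1Api()

    # Step 1: scale the OSD deployment down to 0.
    apps.patch_namespaced_deployment_scale(
        name=OSD_DEPLOYMENT,
        namespace=NAMESPACE,
        body={"spec": {"replicas": 0}},
    )

    # Step 2: poll the replica count for roughly two minutes.
    for _ in range(24):                  # ~2 minutes at 5 s intervals
        time.sleep(5)
        scale = apps.read_namespaced_deployment_scale(OSD_DEPLOYMENT, NAMESPACE)
        print(f"{OSD_DEPLOYMENT} spec.replicas = {scale.spec.replicas}")
        if scale.spec.replicas and scale.spec.replicas > 0:
            print("Deployment was scaled back up automatically (actual result).")
            break
    else:
        print("Deployment stayed at 0 (expected result).")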
While the timing of scaling an OSD back up may have changed between releases, if you expect an OSD deployment to stay scaled down, you'll need to scale down the Rook operator as well; otherwise the operator reconciles the OSD deployment back to its desired replica count.
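For reference, a minimal sketch of that workaround with the 'kubernetes' Python client. The openshift-storage namespace, the rook-ceph-operator deployment name, and rook-ceph-osd-0 are assumptions for illustration:

    # Sketch only: scale the operator down first so it stops reconciling,
    # then scale the OSD deployment down; it should then stay at 0.
    from kubernetes import client, config

    NAMESPACE = "openshift-storage"      # assumption: default ODF namespace
    OSD_DEPLOYMENT = "rook-ceph-osd-0"   # hypothetical: any rook-ceph-osd-* deployment

    def scale_deployment(apps, name, replicas):
        """Patch the deployment's replica count in NAMESPACE."""
        apps.patch_namespaced_deployment_scale(
            name=name,
            namespace=NAMESPACE,
            body={"spec": {"replicas": replicas}},
        )

    config.load_kube_config()
    apps = client.AppsV1Api()

    scale_deployment(apps, "rook-ceph-operator", 0)  # stop operator reconciliation
    scale_deployment(apps, OSD_DEPLOYMENT, 0)        # OSD deployment now stays down

    # To restore, scale rook-ceph-operator back to 1 (and the OSD deployment if needed).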
- Based on the information from the Rook team, this isn't a regression; please update the bug if that turns out not to be the case.