Bug 2135384

Summary: OSD deployment is not remaining in scaled down state
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Jilju Joy <jijoy>
Component: odf-managed-service
Assignee: Leela Venkaiah Gangavarapu <lgangava>
Status: CLOSED NOTABUG
QA Contact: Neha Berry <nberry>
Severity: high
Docs Contact:
Priority: medium
Version: 4.10
CC: aeyal, fbalak, lgangava, ocs-bugs, odf-bz-bot, tnielsen
Target Milestone: ---
Keywords: Regression
Target Release: ---
Flags: jijoy: needinfo? (fbalak)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-02-06 10:25:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Embargoed:

Description Jilju Joy 2022-10-17 13:26:51 UTC
Description of problem:
While analyzing the failure of the test case test_ceph_osd_stopped_pd, it was noticed (thanks to Filip Balak) that with the dev addon containing the topology changes, if one of the rook-ceph-osd-* deployments is scaled down to 0, it is automatically scaled back up to 1 within a minute. This behavior differs from what happens without the topology change, where the deployment remains scaled down to 0. Cluster size was 4Ti.

The ocs-ci test case given below failed due to this behavior.
tests/manage/monitoring/pagerduty/test_deployment_status.py::test_ceph_osd_stopped_pd

must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-o14-pr/jijoy-o14-pr_20221014T041828/logs/failed_testcase_ocs_logs_1665734797/test_ceph_osd_stopped_pd_ocs_logs/


=======================================================================
Version-Release number of selected component (if applicable):

OCP 4.10.35
ocs-osd-deployer.v2.0.8
odf-operator.v4.10.5

======================================================================
How reproducible:
3/3

Steps to Reproduce:
1. Scale down any one of the rook-ceph-osd-* deployments to 0 in a Managed Service provider cluster installed with the ocs-provider-dev addon (see the sketch after these steps).
2. Wait for some time and check the replica count of that rook-ceph-osd-* deployment.
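
A minimal reproduction sketch using the oc CLI; the namespace openshift-storage and the deployment name rook-ceph-osd-0 are assumptions, adjust them to the cluster being tested:

    # Scale one OSD deployment down to 0 replicas
    oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage

    # Watch the deployment; with the topology-change addon the replica
    # count was observed to return to 1 within about a minute
    oc get deployment rook-ceph-osd-0 -n openshift-storage -w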

Actual results:
Within a minute after step 1, the rook-ceph-osd-* deployment is automatically scaled back up to 1.

Expected results:
The deployment remains scaled down to 0.

Additional info:

Comment 1 Travis Nielsen 2022-10-18 13:43:05 UTC
While the timing of scaling an OSD back up may have changed between releases, if you expect an OSD to stay scaled down, you'll need to scale down the Rook operator as well.
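
A sketch of the workaround described above; the deployment names rook-ceph-operator and rook-ceph-osd-0 and the openshift-storage namespace are assumptions, adjust them to the cluster:

    # Stop the Rook operator first so it does not reconcile the OSD back up
    oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage

    # The OSD deployment should now stay scaled down
    oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage

    # When finished, scale the operator back up
    oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage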

Comment 3 Leela Venkaiah Gangavarapu 2022-11-28 06:12:57 UTC
- Based on the info from the Rook team, this isn't a regression; please update otherwise.