Bug 2135384

Summary: OSD deployment is not remaining in scaled down state
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Jilju Joy <jijoy>
Component: odf-managed-service
Assignee: Leela Venkaiah Gangavarapu <lgangava>
Status: CLOSED NOTABUG
QA Contact: Neha Berry <nberry>
Severity: high
Docs Contact:
Priority: medium
Version: 4.10
CC: aeyal, fbalak, lgangava, ocs-bugs, odf-bz-bot, tnielsen
Target Milestone: ---
Keywords: Regression
Target Release: ---
Flags: jijoy: needinfo? (fbalak)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-02-06 10:25:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
Embargoed:

Description Jilju Joy 2022-10-17 13:26:51 UTC
Description of problem:
While analyzing the failure of the test case test_ceph_osd_stopped_pd, it was noticed (thanks to Filip Balak) that with the dev addon containing the topology changes, if one of the rook-ceph-osd-* deployments is scaled down to 0, it is automatically scaled back up to 1 within a minute. This behavior differs from what happens without the topology change, where the deployment remains scaled down to 0. Cluster size was 4Ti.

The ocs-ci test case given below failed due to this behavior.
tests/manage/monitoring/pagerduty/test_deployment_status.py::test_ceph_osd_stopped_pd

must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-o14-pr/jijoy-o14-pr_20221014T041828/logs/failed_testcase_ocs_logs_1665734797/test_ceph_osd_stopped_pd_ocs_logs/


=======================================================================
Version-Release number of selected component (if applicable):

OCP 4.10.35
ocs-osd-deployer.v2.0.8
odf-operator.v4.10.5

======================================================================
How reproducible:
3/3

Steps to Reproduce:
1. Scale down any one of the rook-ceph-osd-* deployments to 0 in a Managed Service provider cluster installed with the ocs-provider-dev addon (see the sketch after these steps).
2. Wait for some time and check the replica count of that rook-ceph-osd-* deployment.
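
A minimal reproduction sketch using the oc CLI; the namespace openshift-storage and the deployment name rook-ceph-osd-0 are assumptions, adjust them to the cluster being tested:

    # Scale one OSD deployment down to 0 replicas
    oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage

    # Watch the deployment; with the topology-change addon the replica
    # count was observed to return to 1 within about a minute
    oc get deployment rook-ceph-osd-0 -n openshift-storage -w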

Actual results:
Within a minute after step 1, the rook-ceph-osd-* deployment is automatically scaled back up to 1.

Expected results:
The deployment remains scaled down to 0.

Additional info:

Comment 1 Travis Nielsen 2022-10-18 13:43:05 UTC
While the timing of scaling an OSD back up may have changed between releases, if you expect an OSD to stay scaled down, you'll need to scale down the Rook operator as well.
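
A sketch of the workaround described above; the deployment names rook-ceph-operator and rook-ceph-osd-0 and the openshift-storage namespace are assumptions, adjust them to the cluster:

    # Stop the Rook operator first so it does not reconcile the OSD back up
    oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage

    # The OSD deployment should now stay scaled down
    oc scale deployment rook-ceph-osd-0 --replicas=0 -n openshift-storage

    # When finished, scale the operator back up
    oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage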

Comment 3 Leela Venkaiah Gangavarapu 2022-11-28 06:12:57 UTC
- Based on the info from the Rook team, this isn't a regression; please update otherwise.