Description of problem:
While analyzing the failure of the test case test_ceph_osd_stopped_pd, it was noticed (thanks to Filip Balak) that with the dev addon containing the topology changes, if one of the rook-ceph-osd-* deployments is scaled down to 0, it is scaled back up to 1 automatically within a minute. This behavior differs from what happens without the topology change, where the deployment remains scaled down to 0. Cluster size was 4Ti.

The ocs-ci test case below failed due to this behavior:
tests/manage/monitoring/pagerduty/test_deployment_status.py::test_ceph_osd_stopped_pd

must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-o14-pr/jijoy-o14-pr_20221014T041828/logs/failed_testcase_ocs_logs_1665734797/test_ceph_osd_stopped_pd_ocs_logs/

=======================================================================
Version-Release number of selected component (if applicable):
OCP 4.10.35
ocs-osd-deployer.v2.0.8
odf-operator.v4.10.5

======================================================================
How reproducible:
3/3

Steps to Reproduce:
1. Scale down any one of the rook-ceph-osd-* deployments to 0 in a Managed Service provider cluster installed with the ocs-provider-dev addon.
2. Wait for some time and verify the replica count of the rook-ceph-osd-* deployment (see the sketch under Additional info below).

Actual results:
Within a minute of step 1, the rook-ceph-osd-* deployment is scaled back up to 1 automatically.

Expected results:
The deployment remains scaled down to 0.

Additional info:
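A minimal sketch of the scale-down-and-observe check, using the 'kubernetes' Python client. The namespace (openshift-storage) and the target deployment name (rook-ceph-osd-0) are assumptions for illustration, not taken from the must-gather:

    # Sketch only: scale one OSD deployment to 0 and watch whether it is
    # reconciled back to 1. Namespace and deployment name are assumptions.
    import time
    from kubernetes import client, config

    NAMESPACE = "openshift-storage"      # assumption: default ODF namespace
    OSD_DEPLOYMENT = "rook-ceph-osd-0"   # hypothetical: any rook-ceph-osd-* deployment

    config.load_kube_config()
    apps = client.AppsV1Api()

    # Step 1: scale the OSD deployment down to 0.
    apps.patch_namespaced_deployment_scale(
        name=OSD_DEPLOYMENT,
        namespace=NAMESPACE,
        body={"spec": {"replicas": 0}},
    )

    # Step 2: poll the replica count for roughly two minutes.
    for _ in range(24):                  # ~2 minutes at 5 s intervals
        time.sleep(5)
        scale = apps.read_namespaced_deployment_scale(OSD_DEPLOYMENT, NAMESPACE)
        print(f"{OSD_DEPLOYMENT} spec.replicas = {scale.spec.replicas}")
        if scale.spec.replicas and scale.spec.replicas > 0:
            print("Deployment was scaled back up automatically (actual result).")
            break
    else:
        print("Deployment stayed at 0 (expected result).")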
While the timing of scaling an OSD back up may have changed between releases, if you expect an OSD deployment to stay scaled down, you'll need to scale down the Rook operator as well; otherwise the operator reconciles the OSD deployment back to its desired replica count.
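For reference, a minimal sketch of that workaround with the 'kubernetes' Python client. The openshift-storage namespace, the rook-ceph-operator deployment name, and rook-ceph-osd-0 are assumptions for illustration:

    # Sketch only: scale the operator down first so it stops reconciling,
    # then scale the OSD deployment down; it should then stay at 0.
    from kubernetes import client, config

    NAMESPACE = "openshift-storage"      # assumption: default ODF namespace
    OSD_DEPLOYMENT = "rook-ceph-osd-0"   # hypothetical: any rook-ceph-osd-* deployment

    def scale_deployment(apps, name, replicas):
        """Patch the deployment's replica count in NAMESPACE."""
        apps.patch_namespaced_deployment_scale(
            name=name,
            namespace=NAMESPACE,
            body={"spec": {"replicas": replicas}},
        )

    config.load_kube_config()
    apps = client.AppsV1Api()

    scale_deployment(apps, "rook-ceph-operator", 0)  # stop operator reconciliation
    scale_deployment(apps, OSD_DEPLOYMENT, 0)        # OSD deployment now stays down

    # To restore, scale rook-ceph-operator back to 1 (and the OSD deployment if needed).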
- Based on the information from the Rook team, this isn't a regression; please update the bug if that turns out not to be the case.