Bug 2135384 - OSD deployment is not remaining in scaled down state [NEEDINFO]
Summary: OSD deployment is not remaining in scaled down state
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-managed-service
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Leela Venkaiah Gangavarapu
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-10-17 13:26 UTC by Jilju Joy
Modified: 2023-08-09 17:00 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-02-06 10:25:32 UTC
Embargoed:
jijoy: needinfo? (fbalak)



Description Jilju Joy 2022-10-17 13:26:51 UTC
Description of problem:
While analyzing the failure of the test case test_ceph_osd_stopped_pd, it was noticed (thanks to Filip Balak) that with the dev addon containing the topology changes, if one of the rook-ceph-osd-* deployments is scaled down to 0, it is automatically scaled back up to 1 within a minute. This differs from the behavior without the topology change, where the deployment remains scaled down to 0. Cluster size was 4Ti.

The ocs-ci test case below failed due to this behavior.
tests/manage/monitoring/pagerduty/test_deployment_status.py::test_ceph_osd_stopped_pd

must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-o14-pr/jijoy-o14-pr_20221014T041828/logs/failed_testcase_ocs_logs_1665734797/test_ceph_osd_stopped_pd_ocs_logs/


=======================================================================
Version-Release number of selected component (if applicable):

OCP 4.10.35
ocs-osd-deployer.v2.0.8
odf-operator.v4.10.5

======================================================================
How reproducible:
3/3

Steps to Reproduce:
1. Scale down any one of the rook-ceph-osd-* deployments to 0 in a Managed Service provider cluster installed with the ocs-provider-dev addon.
2. Wait for some time and verify the replica count of the rook-ceph-osd-* deployment (a scripted sketch of these steps follows below).
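
For reference, a minimal scripted sketch of these steps, using the official kubernetes Python client (the same ecosystem as the ocs-ci test). The deployment name, namespace, and 60-second wait are illustrative assumptions, not values taken from the test:

import time
from kubernetes import client, config

config.load_kube_config()  # uses the current kubeconfig context
apps = client.AppsV1Api()

ns = "openshift-storage"   # assumed ODF namespace
name = "rook-ceph-osd-0"   # any one OSD deployment (assumed name)

# Step 1: scale the OSD deployment down to 0 replicas.
apps.patch_namespaced_deployment_scale(name, ns, {"spec": {"replicas": 0}})

# Step 2: wait, then re-read the replica count; with the dev addon it
# comes back as 1 instead of staying at 0.
time.sleep(60)
scale = apps.read_namespaced_deployment_scale(name, ns)
print(f"{name} replicas after 60s: {scale.spec.replicas}")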

Actual results:
Within a minute after step 1, the rook-ceph-osd-* deployment is automatically scaled back up to 1.

Expected results:
The deployment remains scaled down to 0.

Additional info:

Comment 1 Travis Nielsen 2022-10-18 13:43:05 UTC
While the timing of scaling an OSD back up may have changed between releases, if you expect an OSD to stay scaled down, you'll need to scale down the rook operator as well.
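
A hedged sketch of that workaround, assuming the kubernetes Python client and typical ODF names (rook-ceph-operator in openshift-storage, both assumptions): scale the operator down first so it cannot reconcile the OSD deployment back up.

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()
ns = "openshift-storage"  # assumed ODF namespace

# Stop the operator first, then the OSD; with the operator at 0 replicas
# nothing reconciles the OSD deployment back to 1.
for name in ("rook-ceph-operator", "rook-ceph-osd-0"):
    apps.patch_namespaced_deployment_scale(name, ns, {"spec": {"replicas": 0}})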

Comment 3 Leela Venkaiah Gangavarapu 2022-11-28 06:12:57 UTC
- Based on the info from the Rook team, this isn't a regression; please update otherwise.

