Bug 2292442 - [RFE] Make mon out timeout configurable
Summary: [RFE] Make mon out timeout configurable
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Malay Kumar parida
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-06-14 19:27 UTC by Ales Nosek
Modified: 2024-08-13 07:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:

Description Ales Nosek 2024-06-14 19:27:38 UTC
Rook Ceph allows setting cephcluster.spec.healthCheck.daemonHealth.mon.timeout to a custom value. It can also be set to 0, which disables the mon failover. We would like this value to be configurable in ODF, including the option to disable the failover.
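
For reference, a minimal sketch of where this setting sits in a Rook CephCluster resource. The resource name, namespace, and the 3600s value are assumptions for illustration only and are not taken from an actual cluster. Because ODF manages the CephCluster resource, a direct edit like this is likely to be reverted by the operator, which is why a supported ODF-level option is requested.

```yaml
# Illustrative fragment only; name and namespace are assumed, not confirmed.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: ocs-storagecluster-cephcluster   # assumed ODF default name
  namespace: openshift-storage           # assumed ODF default namespace
spec:
  healthCheck:
    daemonHealth:
      mon:
        # Example value: wait 60 minutes before failing over a mon.
        # Per the description above, a value of 0 disables mon failover.
        timeout: 3600s
```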

For mon failover, ODF currently uses a default timeout of 10 minutes, and it does not appear to be changeable. The 10-minute value is too low for our use case: we deploy ODF on bare metal clusters with OpenShift Virtualization. During node draining, the virtual machines are live-migrated away from the node. The live migration can take 40-60 minutes, depending on how many virtual machines are on the node and how fast their memory can be copied over the network to another cluster node. Because the mon failover timeout is shorter than the drain, all three monitors fail over on every OpenShift upgrade.

We would like the option to disable the mon failover as well. Recently, we had a scenario (https://bugzilla.redhat.com/show_bug.cgi?id=2292435) where the mon failover likely caused a Ceph mon outage. In the interim, until this issue is confirmed and fixed, we would like to disable the mon failover.

Comment 4 Ales Nosek 2024-06-14 19:29:11 UTC
This bug was originally filed in Red Hat's Jira:
https://issues.redhat.com/browse/RHSTOR-5939