Bug 1889616

Summary:	[RFE] Implement 2 pods per noobaa component (core and db)
Product:	[Red Hat Storage] Red Hat OpenShift Container Storage	Reporter:	Manjunatha <mmanjuna>
Component:	Multi-Cloud Object Gateway	Assignee:	Nimrod Becker <nbecker>
Status:	CLOSED DUPLICATE	QA Contact:	Raz Tamir <ratamir>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.5	CC:	amanzane, assingh, bkunal, ddomingu, etamir, hnallurv, nbecker, nberry, ocs-bugs, owasserm, rui.moura
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	---	Flags:	mmanjuna: needinfo?
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-04-05 12:53:41 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Manjunatha 2020-10-20 08:22:33 UTC

Description of problem (please be as detailed as possible and provide log snippets):
Support running at least 2 pods per noobaa component (endpoint, core and db). Kubernetes can take up to 10 minutes to spin up a replacement pod, and during those 10 minutes the whole object storage service is down (PUT and GET requests fail).

Comment 4 Nimrod Becker 2020-10-20 08:49:33 UTC

4.6 is at dev freeze; as an RFE, this should move to 4.7.

Comment 5 Neha Berry 2020-10-20 19:04:05 UTC

@nimrod, is the ask in this BZ different from Bug 1874243 - [RFE] Noobaa resources are impacted with a downtime during admin operations, such as upgrade, due to no HA for noobaa-core and noobaa-db?

Comment 6 Nimrod Becker 2020-10-21 06:35:01 UTC

At its core, the solution would be the same. The other bug is about admin operations and this one is about failures, but the same solution would apply to both.

Comment 7 Orit Wasserman 2020-12-01 07:27:19 UTC

Hi,
We are trying very hard to reduce the OCS footprint; doubling the Noobaa pods goes in the opposite direction.
Any addition of pods or resources to OCS should be discussed and agreed with the OCS architects.

I would recommend focusing on reducing the time it takes to detect a failure and respin a new Noobaa pod. The Rook team successfully reduced the OSD pod respin time to less than a minute, where most of the time was spent detaching and attaching PVs.

There is a misconception that having 2 instances will make recovery faster, but if you take into account the failure detection period, the failover itself, and the time required to make sure the other instance is down, you may find that the total is higher than the time needed to respin a new pod.

Regards,
Orit
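
The trade-off described in comment 7 can be written as simple arithmetic. The following is only a minimal sketch: the function names, the way each path is split into phases, and every number in the example are hypothetical placeholders, not measurements from OCS, Noobaa or Rook.

```python
def standby_failover_downtime(detect_s, fence_s, promote_s):
    """Downtime with a second instance: detect the failure, make sure the
    failed instance is really down, then fail over to the other instance."""
    return detect_s + fence_s + promote_s


def respin_downtime(detect_s, pv_reattach_s, pod_start_s):
    """Downtime with a single instance: detect the failure, detach/attach
    the PV, then start the replacement pod."""
    return detect_s + pv_reattach_s + pod_start_s


if __name__ == "__main__":
    # All values are hypothetical and in seconds; replace with measured ones.
    standby = standby_failover_downtime(detect_s=30, fence_s=60, promote_s=10)
    respin = respin_downtime(detect_s=30, pv_reattach_s=40, pod_start_s=10)
    print(f"standby failover: {standby}s, respin: {respin}s")
```

Both paths pay the same failure detection cost, so the comparison comes down to the time needed to confirm the failed instance is down and fail over versus the PV detach/attach and pod start time.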