Description of problem (please be as detailed as possible and provide log snippets): Support at least 2 pods per NooBaa component (endpoint, core and db). When a pod fails, Kubernetes can take up to 10 minutes to spin up a replacement, and during those 10 minutes the whole object storage service is down (PUT and GET requests fail).
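For illustration only, a minimal sketch of the kind of change being asked for, using the Kubernetes Python client. It assumes a default OCS install where the stateless S3 endpoint runs as a Deployment named "noobaa-endpoint" in the "openshift-storage" namespace; noobaa-core and noobaa-db are single-replica StatefulSets, so scaling them is not enough on its own and would need the HA work this RFE requests.

  # Sketch, not the fix itself: bump the stateless endpoint Deployment to 2
  # replicas so a single pod failure does not take down S3 PUT/GET traffic
  # while a replacement pod is being scheduled.
  # Names ("noobaa-endpoint", "openshift-storage") are assumptions based on
  # a default OCS install.
  from kubernetes import client, config

  config.load_kube_config()  # or load_incluster_config() when running in-cluster
  apps = client.AppsV1Api()

  apps.patch_namespaced_deployment_scale(
      name="noobaa-endpoint",
      namespace="openshift-storage",
      body={"spec": {"replicas": 2}},
  )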
4.6 is at dev freeze; as an RFE this should move to 4.7.
@nimrod, is the ask in this BZ different from Bug 1874243 - [RFE] Noobaa resources are impacted with a downtime during admin operations, such as upgrade, due to no HA for noobaa-core and noobaa-db?
At its core the solution would be the same: the other bug talks about admin operations, this one about failures, but the same solution would apply to both.
Hi, we are trying very hard to reduce the OCS footprint, and doubling the NooBaa pods goes in the opposite direction. Any addition of pods or resources to OCS should be discussed and agreed with the OCS architects. I would recommend focusing on reducing the time it takes to detect a failure and respin a new NooBaa pod. The Rook team successfully reduced the OSD pod respin to less than a minute, where most of the time was spent detaching/attaching PVs. There is a misconception that having 2 instances makes recovery faster, but once you account for the failure detection period, the failover, and the time needed to make sure the other instance is really down, you may find it is longer than respinning a new pod. Regards, Orit