Description of problem (please be as detailed as possible and provide log snippets): Support at least 2 pods per NooBaa component (endpoint, core and db). When a pod fails, Kubernetes can take up to 10 minutes to spin up a replacement, and during those 10 minutes the whole object storage service is down (PUT and GET requests fail).
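For illustration only, a minimal sketch of the kind of change being asked for, using the Kubernetes Python client. It assumes a default OCS install where the stateless S3 endpoint runs as a Deployment named "noobaa-endpoint" in the "openshift-storage" namespace; noobaa-core and noobaa-db are single-replica StatefulSets, so scaling them is not enough on its own and would need the HA work this RFE requests.

  # Sketch, not the fix itself: bump the stateless endpoint Deployment to 2
  # replicas so a single pod failure does not take down S3 PUT/GET traffic
  # while a replacement pod is being scheduled.
  # Names ("noobaa-endpoint", "openshift-storage") are assumptions based on
  # a default OCS install.
  from kubernetes import client, config

  config.load_kube_config()  # or load_incluster_config() when running in-cluster
  apps = client.AppsV1Api()

  apps.patch_namespaced_deployment_scale(
      name="noobaa-endpoint",
      namespace="openshift-storage",
      body={"spec": {"replicas": 2}},
  )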
4.6 is at dev freeze; as an RFE this should move to 4.7.
@nimrod, is the ask in this BZ different from Bug 1874243 - [RFE] Noobaa resources are impacted with a downtime during admin operations, such as upgrade, due to no HA for noobaa-core and noobaa-db?
At its core the solution would be the same: the other bug talks about admin operations, this one about failures, but the same solution would apply to both.
Hi, we are trying very hard to reduce the OCS footprint, and doubling the NooBaa pods goes in the opposite direction. Any addition of pods or resources to OCS should be discussed and agreed with the OCS architects. I would recommend focusing on reducing the time it takes to detect a failure and respin a new NooBaa pod. The Rook team successfully reduced the OSD pod respin to less than a minute, where most of the time was spent detaching/attaching PVs. There is a misconception that having 2 instances makes recovery faster, but once you account for the failure detection period, the failover, and the time needed to make sure the other instance is really down, you may find it is longer than respinning a new pod. Regards, Orit