Bug 1853638 - [RFE] - Can force deletion of noobaa-db be automatically handled in case of hosting node shutdown (similar to OSDs & MONs)
Summary: [RFE] - Can force deletion of noobaa-db be automatically handled in case of hosting node shutdown (similar to OSDs & MONs)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.9.0
Assignee: Nimrod Becker
QA Contact: Ben Eli
URL:
Whiteboard:
Duplicates: 1889616 1898969 1931940 1949727
Depends On:
Blocks: 1931936 2011326
 
Reported: 2020-07-03 12:26 UTC by Neha Berry
Modified: 2024-10-01 16:41 UTC
CC List: 14 users

Fixed In Version: v4.9.0-51.ci
Doc Type: Enhancement
Doc Text:
.Movement of Core and DB pods is enabled when a node fails
OpenShift Container Platform does not mark a node as disconnected unless it is deleted. As a result, the Core and DB pods, which belong to StatefulSets, are not automatically evicted from such failed nodes. With this update, when a node fails, the DB and Core pods are evicted and moved to a new node.
Clone Of:
Environment:
Last Closed: 2021-12-13 17:44:23 UTC
Embargoed:


Attachments


Links
- Github noobaa/noobaa-operator pull 672 (open): NooBaa Component Rescheduling - last updated 2021-08-04 14:08:42 UTC
- Github red-hat-storage/ocs-ci pull 4966 - last updated 2022-02-04 07:56:11 UTC
- Red Hat Product Errata RHSA-2021:5086 - last updated 2021-12-13 17:44:44 UTC

Description Neha Berry 2020-07-03 12:26:23 UTC
Description of problem:
===============================

Currently, if the node hosting the noobaa-db pod is shut down, one has to explicitly force delete the noobaa-db pod (see the commands below), since it stays stuck in the Terminating state (it has an RWO PVC mounted). Bug 1783961 was closed as WONTFIX in the OCS 4.2 timeframe.

Reason: both noobaa-core and noobaa-db are deployed as StatefulSets, so a pod on an unreachable node is not rescheduled automatically:

NAME                           READY   AGE
statefulset.apps/noobaa-core   1/1     4d1h
statefulset.apps/noobaa-db     1/1     4d1h
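
For reference, the manual workaround referred to above is usually along these lines. The pod name (noobaa-db-0) and namespace (openshift-storage) are the standard OCS defaults and are assumptions here; they may differ in a given cluster:

# Identify the noobaa-db pod stuck in Terminating on the shut-down node
oc get pods -n openshift-storage -o wide | grep noobaa-db

# Force delete it so the StatefulSet controller can recreate it on another node
oc delete pod noobaa-db-0 -n openshift-storage --grace-period=0 --force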



However, following a recent ask from IBM, changes were made to automate the force deletion of OSD and MON pods, which used to exhibit similar behavior when their hosting node was shut down. With the fixes for Bug 1830015 and Bug 1835908 (OCS 4.4.1 - Bug 1848184), the rook-ceph operator now force deletes an OSD or MON pod so that it can be scheduled on another spare node.


>> Can a similar approach be taken by the noobaa-operator, or another operator, to handle the force deletion of the Terminating noobaa-db pod?
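
To illustrate the kind of automation being asked for, the sketch below detects nodes whose Ready condition is Unknown (kubelet unreachable) and force deletes any noobaa-db pod stuck in Terminating on them, so the StatefulSet controller can reschedule it. This is only a rough sketch, not the noobaa-operator's actual implementation; the openshift-storage namespace and the availability of jq are assumptions:

#!/bin/sh
# Sketch only: force delete noobaa-db pods stuck in Terminating on unreachable nodes.
NS=openshift-storage

# Nodes whose kubelet has stopped reporting show Ready=Unknown
down_nodes=$(oc get nodes -o json | jq -r '
  .items[]
  | select(.status.conditions[] | select(.type=="Ready" and .status=="Unknown"))
  | .metadata.name')

for node in $down_nodes; do
  # Pods with a deletionTimestamp set are the ones displayed as "Terminating"
  oc get pods -n "$NS" --field-selector spec.nodeName="$node" -o json | jq -r '
    .items[]
    | select(.metadata.deletionTimestamp != null)
    | select(.metadata.name | startswith("noobaa-db"))
    | .metadata.name' |
  while read -r pod; do
    oc delete pod "$pod" -n "$NS" --grace-period=0 --force
  done
done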

If this ask is invalid and cannot be resolved due to other constraints, please let me know. Otherwise, I am raising it as an RFE in case it can be achieved in a later release (not necessarily as a priority).

Reasons for ask:
----------------

1. Until the pod is force deleted or the shut-down node is powered on, the noobaa-db pod stays in the Terminating state, and hence the NooBaa DB is inaccessible

2. As a result, the noobaa-endpoint and other NooBaa pods are also affected and keep going into CrashLoopBackOff (see the watch command below)
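
For reference, both symptoms can be watched live; the openshift-storage namespace is the usual OCS default:

# noobaa-db-0 stays in Terminating while the endpoint/core pods cycle into CrashLoopBackOff
oc get pods -n openshift-storage -w | grep noobaa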



Version-Release number of selected component (if applicable):
==========================================

Since OCS 4.2; the issue is documented in the Known Issues section of every release note.

How reproducible:
Always

Comment 3 Nimrod Becker 2020-09-30 12:26:22 UTC
Won't make it to 4.6, should push to 4.7

Comment 4 Nimrod Becker 2020-10-01 09:18:28 UTC
Following a triage with QE, moving to 4.7

Comment 5 Nimrod Becker 2020-11-19 11:08:40 UTC
*** Bug 1898969 has been marked as a duplicate of this bug. ***

Comment 8 Nimrod Becker 2021-03-04 13:13:17 UTC
*** Bug 1931940 has been marked as a duplicate of this bug. ***

Comment 11 Nimrod Becker 2021-04-05 12:53:41 UTC
*** Bug 1889616 has been marked as a duplicate of this bug. ***

Comment 14 Nimrod Becker 2021-04-18 06:53:12 UTC
*** Bug 1949727 has been marked as a duplicate of this bug. ***

Comment 28 errata-xmlrpc 2021-12-13 17:44:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:5086

