1901442 – Backing store in a state of IO_ERROR (when using non-production memory limits for Noobaa)

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	Multi-Cloud Object Gateway
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	---
Assignee:	Nimrod Becker
QA Contact:	Raz Tamir
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-11-25 08:57 UTC by aberner
Modified:	2023-08-09 16:49 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-09-01 06:51:07 UTC
Embargoed:

Attachments
Prometheus screenshot 1 (87.49 KB, image/png) 2020-12-02 08:19 UTC, aberner	no flags	Details
Prometheus screenshot 2 (88.01 KB, image/png) 2020-12-02 08:21 UTC, aberner	no flags	Details
Prometheus screenshot 3 (88.06 KB, image/png) 2020-12-02 08:21 UTC, aberner	no flags	Details
Prometheus screenshot 4 (191.72 KB, image/png) 2020-12-02 08:23 UTC, aberner	no flags	Details

Comment 7 aberner 2020-12-02 08:12:53 UTC

We were also able to reproduce the issue manually on vSphere on a limited cluster (endpoint with the same 500 MB memory limit), while the production cluster passed.
After further investigation with Ohad, we found that the cause of this issue is Node.js allocating memory beyond the pod's limit, which causes the pod to restart.
This is why the platform is not relevant: the issue will happen on any platform if the resource is limited.

Since the issue will not reproduce in a production environment (by default Node.js limits its memory usage to less than 2 GB), the next course of action should be to find a way to limit Node.js to the pod's memory limit (for the dev environment).
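
One possible way to approach that limiting, as a rough sketch only: derive a V8 heap cap from the pod's memory limit and pass it to Node.js through the `--max-old-space-size` flag (value in MiB), for example via `NODE_OPTIONS`. The helper names `maxOldSpaceMiB` and `nodeOptionsFor`, and the 0.75 headroom fraction, are assumptions made for illustration and are not taken from the NooBaa code. Note that this flag only caps the V8 old-generation heap; Buffers and other native allocations are not covered, which is why some headroom is left below the pod limit.

```typescript
// Hypothetical helpers: derive a V8 old-space cap (MiB) from a pod memory limit.
const MIB = 1024 * 1024;

// heapFraction is an assumed headroom factor; the remainder of the pod limit
// is left for Buffers, native allocations, and the runtime itself.
function maxOldSpaceMiB(podLimitBytes: number, heapFraction = 0.75): number {
  if (!Number.isFinite(podLimitBytes) || podLimitBytes <= 0) {
    throw new RangeError("podLimitBytes must be a positive number");
  }
  if (heapFraction <= 0 || heapFraction >= 1) {
    throw new RangeError("heapFraction must be between 0 and 1 (exclusive)");
  }
  // Never return less than 1 MiB so the resulting flag stays valid.
  return Math.max(1, Math.floor((podLimitBytes * heapFraction) / MIB));
}

// Build the value that could be placed in the NODE_OPTIONS environment variable.
function nodeOptionsFor(podLimitBytes: number, heapFraction = 0.75): string {
  return `--max-old-space-size=${maxOldSpaceMiB(podLimitBytes, heapFraction)}`;
}

// Example: the 500 MB endpoint limit from the limited (dev) cluster.
console.log(nodeOptionsFor(500 * MIB)); // prints "--max-old-space-size=375"
```

With a cap like this, V8 should hit its own heap limit and garbage-collect more aggressively (or fail with a heap out-of-memory error) before the container runtime kills the pod. Whether that is preferable to a pod restart for the endpoint is a separate decision.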


I'm attaching screenshots from Prometheus that clearly show the memory spike right before the endpoint restarts.

Comment 8 aberner 2020-12-02 08:19:29 UTC

Created attachment 1735505 [details]
Prometheus screenshot 1

Comment 9 aberner 2020-12-02 08:21:31 UTC

Created attachment 1735506 [details]
Prometheus screenshot 2

Comment 10 aberner 2020-12-02 08:21:57 UTC

Created attachment 1735507 [details]
Prometheus screenshot 3

Comment 11 aberner 2020-12-02 08:23:06 UTC

Created attachment 1735508 [details]
Prometheus screenshot 4
