Updating the QA Contact to Hemant. Hemant will reroute it to the appropriate QE Associate. Regards, Giri
About comment 1: you said you set the cgroup memory limit to 2 GB, but the OOM kill happened at 6 GB. Why wasn't it killed at 2 GB? Also, why didn't the clients fail over to a different RGW server and continue running? Perhaps a load balancer wasn't used?

About comment 3: if "we cannot identify a reliable memory limit", then the proposed workaround is not really preventing the problem from occurring later, just postponing it, right? We have to know ahead of time how much memory RGW requires, for a variety of reasons.
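To help rule out a misconfigured limit on the comment 1 question, here is a minimal sketch (not part of the original report) that checks which memory cgroup the radosgw process actually belongs to and what limit that cgroup enforces. It assumes cgroup v1 paths under /sys/fs/cgroup/memory and a daemon process named radosgw; under cgroup v2 the relevant file is memory.max instead.

```python
#!/usr/bin/env python3
"""Sketch: verify the effective cgroup memory limit for the radosgw process.

Assumes cgroup v1 (memory controller mounted at /sys/fs/cgroup/memory) and a
process named 'radosgw'. Under cgroup v2 the limit file is 'memory.max'.
"""
import os

def find_pids(name="radosgw"):
    """Return PIDs whose executable name matches `name` (from /proc/<pid>/comm)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids

def memory_cgroup(pid):
    """Return the memory-controller cgroup path for a PID (cgroup v1 format)."""
    with open(f"/proc/{pid}/cgroup") as f:
        for line in f:
            _, controllers, path = line.strip().split(":", 2)
            if "memory" in controllers.split(","):
                return path
    return None

for pid in find_pids():
    cg = memory_cgroup(pid)
    limit_file = f"/sys/fs/cgroup/memory{cg}/memory.limit_in_bytes"
    usage_file = f"/sys/fs/cgroup/memory{cg}/memory.usage_in_bytes"
    try:
        with open(limit_file) as f:
            limit = int(f.read())
        with open(usage_file) as f:
            usage = int(f.read())
        print(f"pid={pid} cgroup={cg} limit={limit/2**30:.2f}GiB usage={usage/2**30:.2f}GiB")
    except OSError as e:
        print(f"pid={pid} cgroup={cg}: could not read limit/usage ({e})")
```

If the reported cgroup path is "/" or the limit file shows the default (effectively unlimited) value, the 2 GB cap was never applied to the daemon, which would explain why the kill only happened at 6 GB.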
cc'ing Karan Singh, who has worked with RGW in some really large configurations (1 billion objects). https://docs.google.com/document/d/1uKq5TLZFDc5IWVCa5EekWQU6eoB5QOmBXE05FVpy6QU/edit Karan, any sign of RGW daemon memory usage growth during your tests?
Matt, what's the next step here?
Matt / Mkogon: I am in the middle of ingesting 10 billion objects (as I write this, 800 million have been successfully ingested). If you want me to capture this data point, please provide instructions for capturing it. Currently I do not see any RGW memory metrics in Prometheus. If you like, I can give you SSH access to the environment.
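In the absence of an RGW memory metric in Prometheus, one low-effort option is to sample the resident set size of the radosgw process directly from /proc while the ingest runs. The sketch below is only a suggestion, not an official tool; the process name "radosgw", the 60-second interval, and the output path are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Sketch: periodically log radosgw RSS to a CSV file during the ingest run.

Assumptions (not from the original report): the daemon process is named
'radosgw', a 60-second sample interval is acceptable, and writing to
/tmp/rgw_rss.csv is fine.
"""
import csv
import os
import time

PROCESS_NAME = "radosgw"   # adjust to the actual daemon name if different
INTERVAL_SECONDS = 60      # illustrative sample interval
OUTPUT_PATH = "/tmp/rgw_rss.csv"

def rss_kib_by_pid(name):
    """Map PID -> VmRSS in KiB for every process whose comm matches `name`."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() != name:
                    continue
            with open(f"/proc/{entry}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        result[int(entry)] = int(line.split()[1])  # value is in kB
                        break
        except OSError:
            continue  # process exited between listdir and open
    return result

with open(OUTPUT_PATH, "a", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["timestamp", "pid", "rss_kib"])
    while True:
        now = int(time.time())
        for pid, rss in rss_kib_by_pid(PROCESS_NAME).items():
            writer.writerow([now, pid, rss])
        out.flush()
        time.sleep(INTERVAL_SECONDS)
```

Left running in the background (e.g. under nohup or a tmux session), this would give a per-daemon memory timeline that can be correlated with the object count at each point of the ingest.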
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4144