Bug 1707488 - containerized RGW default memory too high
Summary: containerized RGW default memory too high
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: z2
Target Release: 4.1
Assignee: Guillaume Abrioux
QA Contact: Ameena Suhani S H
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-07 15:39 UTC by Tim Wilkinson
Modified: 2020-09-30 17:25 UTC (History)
CC List: 12 users

Fixed In Version: ceph-ansible-4.0.29-1.el8cp, ceph-ansible-4.0.29-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-30 17:24:49 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph ceph-ansible pull 5531 | 0 | None | closed | rgw: set container memory limit to 4g | 2020-11-21 15:51:13 UTC
Red Hat Product Errata | RHBA-2020:4144 | 0 | None | None | None | 2020-09-30 17:25:27 UTC

Comment 7 Giridhar Ramaraju 2019-08-05 13:09:13 UTC
Updating the QA Contact to Hemant. Hemant will reroute these bugs to the appropriate QE Associate.

Regards,
Giri

Comment 8 Giridhar Ramaraju 2019-08-05 13:10:32 UTC
Updating the QA Contact to Hemant. Hemant will reroute these bugs to the appropriate QE Associate.

Regards,
Giri

Comment 12 Ben England 2020-01-27 21:38:35 UTC
About comment 1:

You said you set the cgroup memory limit to 2 GB, but the OOM kill happened at 6 GB. Why wasn't it killed at 2 GB?

Also, why didn't the clients fail over to a different RGW server and continue running? Perhaps a load balancer wasn't used?

About comment 3:

If "we cannot identify a reliable memory limit", then the proposed workaround does not really prevent the problem from occurring later; it just postpones it, right? We have to know ahead of time how much memory RGW requires, for a variety of reasons.
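
To make the 2 GB versus 6 GB question concrete, here is a minimal Python sketch that converts a container memory limit string (such as the "4g" named in the linked pull request) to bytes and compares it with an observed memory figure. The helper names parse_mem_limit and exceeds_limit are hypothetical, and the suffix table assumes binary units with the b/k/m/g suffixes used by Docker-style --memory values. It does not explain why the kill happened at 6 GB; it only shows the comparison being discussed.

```python
import re

# Assumption: binary multipliers for Docker-style memory suffixes (b, k, m, g).
_UNITS = {"": 1, "b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_mem_limit(value: str) -> int:
    """Convert a limit such as '4g' or '2048m' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([bkmg]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized memory limit: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def exceeds_limit(observed_bytes: int, limit: str) -> bool:
    """Return True if the observed memory use is above the configured limit."""
    return observed_bytes > parse_mem_limit(limit)
```

For example, exceeds_limit(6 * 1024**3, "2g") is True, which is the situation the question above is about: a process observed at 6 GB under a limit that was reported as 2 GB.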

Comment 13 Ben England 2020-01-27 21:42:48 UTC
cc'ing Karan Singh, who has worked with RGW in some really large configurations (1 billion objects).

https://docs.google.com/document/d/1uKq5TLZFDc5IWVCa5EekWQU6eoB5QOmBXE05FVpy6QU/edit

Karan, any sign of RGW daemon memory usage growth during your tests?

Comment 15 Yaniv Kaul 2020-04-22 13:36:05 UTC
Matt, what's the next step here?

Comment 19 karan singh 2020-05-22 15:16:54 UTC
Matt / Mkogon

I am in the middle of ingesting 10 billion objects (as I write this, 800 million have been successfully ingested). If you want me to capture this data point, please provide the instructions for capturing it. Currently, I do not see any metrics named for RGW memory in Prometheus.

If you like, I can give you SSH access to the environment.
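
One possible way to capture this data point without a Prometheus metric is to sample the resident set size of the RGW processes straight from /proc on the RGW host. The Python sketch below is only an illustration, not instructions from the RGW team: the function names are hypothetical, and it assumes the daemon's process name is "radosgw" and that the script can read the host's /proc.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (reported in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rgw_rss_kib(proc_root: str = "/proc") -> Dict[int, int]:
    """Map PID to VmRSS for every process named 'radosgw' (assumed name)."""
    result: Dict[int, int] = {}
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            text = status.read_text()
        except OSError:
            continue  # the process exited while we were scanning
        name = re.search(r"^Name:\s+(\S+)$", text, re.MULTILINE)
        if name and name.group(1) == "radosgw":
            rss = parse_vmrss_kib(text)
            if rss is not None:
                result[int(status.parent.name)] = rss
    return result
```

Calling rgw_rss_kib() at a fixed interval during the ingest and logging the result with a timestamp would show whether memory use grows with object count.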

Comment 28 errata-xmlrpc 2020-09-30 17:24:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144

