
Bug 1707488

Summary: containerized RGW default memory too high
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tim Wilkinson <twilkins>
Component: Ceph-Ansible
Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA
QA Contact: Ameena Suhani S H <amsyedha>
Severity: high
Priority: high
Version: 4.0
Target Milestone: z2
Target Release: 4.1
Keywords: Reopened
Hardware: x86_64
OS: Linux
CC: amsyedha, aschoen, bengland, ceph-eng-bugs, gmeno, jharriga, karan, mbenjamin, mkogan, nthomas, tserlin, vereddy
Fixed In Version: ceph-ansible-4.0.29-1.el8cp, ceph-ansible-4.0.29-1.el7cp
Last Closed: 2020-09-30 17:24:49 UTC
Type: Bug

Comment 7 Giridhar Ramaraju 2019-08-05 13:09:13 UTC
Updating the QA Contact to Hemant. Hemant will be rerouting them to the appropriate QE Associate.

Regards,
Giri

Comment 12 Ben England 2020-01-27 21:38:35 UTC
about comment 1:

You said you set the cgroup memory limit to 2 GB, but the OOM kill happened at 6 GB. Why wasn't it killed at 2 GB?
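
A quick way to check which limit actually reached the container would be a sketch like the following; the container name and cgroup path are assumptions, so adjust for the deployment and cgroup version (docker inspect takes the same format string on RHEL 7 hosts):

    # Ask the runtime which memory limit it applied (bytes; 0 means unlimited)
    podman inspect --format '{{.HostConfig.Memory}}' ceph-rgw-host1-rgw0

    # Or read the cgroup v1 limit directly on the RGW host
    cat /sys/fs/cgroup/memory/system.slice/<rgw-unit>/memory.limit_in_bytes

If that reports 0 or the node's full RAM, the 2 GB setting never took effect on the container, which would be one explanation for a kill arriving only at 6 GB.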

Also, why didn't the clients fail over to a different RGW server and continue running?  Perhaps a load balancer wasn't used?

about comment 3:

if "we cannot identify a reliable memory limit" then the proposed workaround is not really preventing the problem from occurring later, just postponing it, right.   We have to know ahead of time how much memory RGW requires for a variety of reasons.

Comment 13 Ben England 2020-01-27 21:42:48 UTC
cc'ing Karan Singh, who has worked with RGW in some really large configurations (1 billion objects).

https://docs.google.com/document/d/1uKq5TLZFDc5IWVCa5EekWQU6eoB5QOmBXE05FVpy6QU/edit

Karan, any sign of RGW daemon memory usage growth during your tests?

Comment 15 Yaniv Kaul 2020-04-22 13:36:05 UTC
Matt, what's the next step here?

Comment 19 karan singh 2020-05-22 15:16:54 UTC
Matt / Mkogan,

I am in the middle of ingesting 10 billion objects (as I write this, 800 million have been successfully ingested). If you want me to capture this data point, you need to provide me with instructions for capturing it. Currently, I do not see any RGW memory metric in Prometheus.

If you like, I can give you SSH access to the environment.
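
Absent an RGW memory metric in Prometheus, a crude host-side sampler would be enough to capture this data point. A hedged sketch, assuming the daemon process is named radosgw and that shell access to each RGW host is available:

    # Log the RGW resident set size (KiB) once a minute during the ingest
    while true; do
        date '+%F %T' >> rgw_rss.log
        ps -C radosgw -o pid=,rss=,cmd= >> rgw_rss.log
        sleep 60
    done

Plotting the rss column against time would show whether the daemon's footprint grows with the object count.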

Comment 28 errata-xmlrpc 2020-09-30 17:24:49 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144