Bug 1472868 - With RGWs configured in a load balancer, quota stats cache doesn't work
Summary: With RGWs configured in a load balancer, quota stats cache doesn't work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 2.3
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 3.1
Assignee: Orit Wasserman
QA Contact: Vidushi Mishra
URL:
Whiteboard:
Depends On: 1523246
Blocks: 1473188 1473436 1584264
 
Reported: 2017-07-19 14:55 UTC by Benjamin Schmaus
Modified: 2021-06-10 12:38 UTC (History)

Fixed In Version: RHEL: ceph-12.2.5-12.el7cp Ubuntu: ceph_12.2.5-3redhat1xenial
Doc Type: Bug Fix
Doc Text:
.Quota stats cache is no longer invalid
Previously in {product}, quota values were sometimes not decremented properly. This could cause quota-exceeded errors when the quota had not actually been exceeded. With this update to Ceph, quota values are decremented properly and no incorrect errors are reported.
Clone Of:
Environment:
Last Closed: 2018-09-26 18:16:41 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 20661 0 None None None 2017-07-19 14:55:00 UTC
Ceph Project Bug Tracker 20934 0 None None None 2017-08-21 16:07:09 UTC
Red Hat Product Errata RHBA-2018:2819 0 None None None 2018-09-26 18:17:47 UTC

Description Benjamin Schmaus 2017-07-19 14:55:01 UTC
Description of problem:

With RGWs configured behind a load balancer, the quota stats cache can run into unbound values. We have found errors like the ones below in our clusters running Jewel. This happens when PUT and DELETE operations do not hit the same RGW and an eventual update_stats() from a DELETE operation tries to decrement the stats cache. This can be easily verified by having RGWs configured behind a load balancer (I've used HAProxy in RR mode) and running a script to upload/delete objects, with the user quota enabled.

20 quota: can't use cached stats, exceeded soft threshold (num objs): 18446744073709551615 >= 190000

10 quota exceeded: stats.num_kb_rounded=18446744073709549572 size_kb=1024 user_quota.max_size_kb=5242880000
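
The num objs value in the first log line, 18446744073709551615, equals 2^64 - 1, which is what an unsigned 64-bit counter holds after being decremented from zero. The following is a minimal sketch of that effect, not the actual RGW code: the class GatewayStatsCache, its fields, and the 64-bit masking are assumptions made for illustration, with only the name update_stats() taken from the description above.

```python
# Illustrative simulation only; names and structure are hypothetical,
# not taken from the Ceph source.
U64_MASK = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


class GatewayStatsCache:
    """Stand-in for the per-gateway, in-memory quota stats cache."""

    def __init__(self):
        self.num_objects = 0
        self.num_kb_rounded = 0

    def update_stats(self, obj_delta, kb_delta):
        # Apply the delta the way an unsigned 64-bit counter would,
        # so a decrement below zero wraps around instead of going negative.
        self.num_objects = (self.num_objects + obj_delta) & U64_MASK
        self.num_kb_rounded = (self.num_kb_rounded + kb_delta) & U64_MASK


# Two gateways behind a round-robin load balancer, each with its own cache.
gateways = [GatewayStatsCache(), GatewayStatsCache()]

# The PUT lands on gateway 0, the DELETE of the same object on gateway 1.
gateways[0].update_stats(+1, +1024)
gateways[1].update_stats(-1, -1024)

print(gateways[0].num_objects)  # cache that saw only the PUT
print(gateways[1].num_objects)  # cache that saw only the DELETE
```

In this sketch the gateway that handled only the DELETE ends up with a num_objects value far above any realistic soft threshold, which is consistent with the "can't use cached stats" message above.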


Version-Release number of selected component (if applicable):
2.3

How reproducible:
90%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 Orit Wasserman 2017-08-21 16:07:52 UTC
Additional fix is needed, upstream pr:
https://github.com/ceph/ceph/pull/17116

Comment 9 shilpa 2017-09-11 07:58:41 UTC
@orit,

Could you please suggest a reproducer for this?

Comment 10 Orit Wasserman 2017-09-11 08:31:56 UTC
(In reply to shilpa from comment #9)
> @orit,
> 
> Could you please suggest a reproducer for this?

Hi Shilpa,
You will need a cluster with at least two radosgw instances.
You will need to configure a load balancer, for example HAProxy.
Use a script that does several PUT objects and then several DELETE objects.
The load balancer will send the requests to different radosgws and you should hit the bug.
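
A rough sketch of such a script follows, assuming the S3 API is reached through the load balancer with the third-party boto3 library. The function names, the bucket name, the endpoint URL, and the credentials are placeholders invented for this example; the bucket is assumed to exist already and the user quota is assumed to be enabled.

```python
def put_then_delete(client, bucket, count, size_bytes=1024 * 1024,
                    prefix="quota-test-"):
    """Upload `count` objects, then delete them all, via one S3 client.

    `client` is any object with boto3-style put_object/delete_object
    methods. With the client pointed at the load balancer, successive
    requests should be spread across the radosgw instances.
    """
    keys = [f"{prefix}{i}" for i in range(count)]
    body = b"\0" * size_bytes
    for key in keys:
        client.put_object(Bucket=bucket, Key=key, Body=body)
    for key in keys:
        client.delete_object(Bucket=bucket, Key=key)
    return keys


def make_client(endpoint_url, access_key, secret_key):
    """Build an S3 client for the load balancer endpoint (needs boto3)."""
    import boto3  # imported here so the helper above has no hard dependency

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
```

A run might look like `put_then_delete(make_client("http://haproxy.example:8080", "ACCESS", "SECRET"), "quota-test-bucket", 100)`, with all of those values replaced by ones from the test cluster. Afterwards, check the radosgw logs for quota messages like the ones in the description.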

Comment 19 Christina Meno 2017-10-03 20:23:33 UTC
Moving back to assigned based on Marcus' comments above.

Matt/Orit, is this something we have the ability to fix/retest in the next week?

Comment 32 John Brier 2018-08-31 19:28:09 UTC
Thanks Orit!

Comment 34 errata-xmlrpc 2018-09-26 18:16:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819

