Bug 1472868

Summary:	With RGWs configured in a load balancer, quota stats cache doesn't work
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Benjamin Schmaus <bschmaus>
Component:	RGW	Assignee:	Orit Wasserman <owasserm>
Status:	CLOSED ERRATA	QA Contact:	Vidushi Mishra <vimishra>
Severity:	low	Docs Contact:
Priority:	low
Version:	2.3	CC:	anharris, cbodley, ceph-eng-bugs, gmeno, hnallurv, jbrier, kbader, kdreyer, mbenjamin, mhackett, mwatts, owasserm, smanjara, sweil, tchandra, tserlin
Target Milestone:	rc
Target Release:	3.1
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	RHEL: ceph-12.2.5-12.el7cp Ubuntu: ceph_12.2.5-3redhat1xenial	Doc Type:	Bug Fix
Doc Text:	.Quota stats cache is no longer invalid Previously in {product}, quota values sometimes were not properly decremented. This could cause exceed errors when the quota was not actually exceeded. With this update to Ceph, quota values are properly decremented and no incorrect errors are printed.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-09-26 18:16:41 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1523246
Bug Blocks:	1473188, 1473436, 1584264

Description Benjamin Schmaus 2017-07-19 14:55:01 UTC

Description of problem:

With RGWs configured behind a load balancer, the quota stats cache can end up holding bogus, extremely large values. We have found errors like the ones below in our clusters running Jewel. This happens when PUT and DELETE operations do not hit the same RGW and an eventual update_stats() from a DELETE operation tries to decrement the stats cache. This can be easily verified by putting the RGWs behind a load balancer (I used HAProxy in round-robin mode) and running a script that uploads and deletes objects with the user quota enabled.

20 quota: can't use cached stats, exceeded soft threshold (num objs): 18446744073709551615 >= 190000

10 quota exceeded: stats.num_kb_rounded=18446744073709549572 size_kb=1024 user_quota.max_size_kb=5242880000
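
The two logged values are consistent with unsigned 64-bit counters that were decremented below zero: 18446744073709551615 is 2^64 - 1, and 18446744073709549572 is 2^64 - 2044. The following is only a simplified sketch of that arithmetic, not RGW code. The class `GatewayStatsCache`, its `update_stats` signature, and the assumption that the second gateway's cached count starts at zero are all hypothetical and exist purely for illustration.

```python
U64_MASK = (1 << 64) - 1  # assumption: counters behave like unsigned 64-bit integers


class GatewayStatsCache:
    """Hypothetical stand-in for one gateway's in-memory quota stats cache."""

    def __init__(self):
        self.num_objects = 0

    def update_stats(self, added, removed):
        # Emulate unsigned 64-bit arithmetic: going below zero wraps around.
        self.num_objects = (self.num_objects + added - removed) & U64_MASK


def as_signed(value):
    """Reinterpret a wrapped unsigned 64-bit value as a signed integer."""
    return value - (1 << 64) if value >> 63 else value


rgw_a = GatewayStatsCache()
rgw_b = GatewayStatsCache()

rgw_a.update_stats(added=1, removed=0)  # PUT routed to gateway A
rgw_b.update_stats(added=0, removed=1)  # DELETE routed to gateway B

print(rgw_b.num_objects)                # 18446744073709551615
print(as_signed(18446744073709549572))  # -2044
```

In this model, gateway B never saw the PUT, so its decrement wraps to the same value reported in the "num objs" log line.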


Version-Release number of selected component (if applicable):
2.3

How reproducible:
90%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 Orit Wasserman 2017-08-21 16:07:52 UTC

Additional fix is needed, upstream pr:
https://github.com/ceph/ceph/pull/17116

Comment 9 shilpa 2017-09-11 07:58:41 UTC

@orit,

Could you please suggest a reproducer for this?

Comment 10 Orit Wasserman 2017-09-11 08:31:56 UTC

(In reply to shilpa from comment #9)
> @orit,
> 
> Could you please suggest a reproducer for this?

Hi Shilpa,
You will need a cluster with at least two radosgw instances.
You will need to configure a load balancer, for example HAProxy.
Use a script that does several object PUTs and then several object DELETEs.
The load balancer will send the requests to different radosgws and you should hit the bug.
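
A minimal sketch of such a script follows, assuming an S3-style client object that exposes `put_object` and `delete_object` with the keyword arguments used by boto3's S3 client. The function name `put_then_delete`, the key prefix, the endpoint URL, and the bucket name are hypothetical; the user's quota must already be enabled, and the client must point at the load balancer rather than at a single radosgw.

```python
def put_then_delete(client, bucket, count, body=b"x" * 1024):
    """Upload `count` small objects, then delete them all.

    Returns the list of keys that were used.
    """
    keys = [f"quota-test-{i}" for i in range(count)]
    for key in keys:
        client.put_object(Bucket=bucket, Key=key, Body=body)
    for key in keys:
        client.delete_object(Bucket=bucket, Key=key)
    return keys


# Example use with boto3 (endpoint and bucket are placeholders):
#
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="http://haproxy.example:8080")
#   put_then_delete(s3, "quota-test-bucket", 100)
```

With round-robin balancing, consecutive requests go to different gateways, so the DELETE for an object often reaches a radosgw that did not handle its PUT.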

Comment 19 Christina Meno 2017-10-03 20:23:33 UTC

Moving back to assigned based on Marcus' comments above.

Matt/Orit, is this something we have the ability to fix/retest in the next week?

Comment 32 John Brier 2018-08-31 19:28:09 UTC

Thanks Orit!

Comment 34 errata-xmlrpc 2018-09-26 18:16:41 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819