Bug 1238147 - Object expirer daemon times out and raises exception while attempting to expire a million objects
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-swift
Version: 3.0
Hardware: Unspecified OS: Unspecified
Priority: high Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.1
Assigned To: Thiago da Silva
QA Contact: SATHEESARAN
Keywords: ZStream
Depends On:
Blocks: 1251815
Reported: 2015-07-01 05:45 EDT by Prashanth Pai
Modified: 2015-10-05 03:17 EDT
CC List: 5 users

See Also:
Fixed In Version: swiftonfile-1.13.1-3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-05 03:17:31 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers:
Tracker ID: Red Hat Product Errata RHSA-2015:1845
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Gluster Storage 3.1 update
Last Updated: 2015-10-05 07:06:22 EDT

Description Prashanth Pai 2015-07-01 05:45:24 EDT
Description of problem:
While the object expirer in its current form works well when the number of objects to be expired is small to medium (thousands), it is not able to complete its pass and expire a million objects slated to expire at the same time.

For every object to be expired in the real data volume, there is a corresponding tracker object (a zero-byte file) in the "gsexpiring" volume. The object-expirer attempts a GET on the container in the "gsexpiring" volume, and the container-server times out before it can reply. This is because all the tracker objects end up in the same container, so the container-server has to crawl a fairly huge filesystem tree, which (not surprisingly) is a slow operation. The "bug" or flaw here is that all tracker objects end up in the same container when the expiry timestamp is the same (see the sketch below for how that container name is derived).
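To make the "same timestamp, same container" point concrete, here is a minimal Python sketch of that bucketing scheme. The helper name is illustrative and the divisor is assumed to be Swift's documented default of 86400; the real logic lives in Swift's proxy server, not in this snippet:

# Minimal sketch only: bucket an X-Delete-At timestamp into a tracker
# container name the way the text above describes.
EXPIRING_OBJECTS_CONTAINER_DIVISOR = 86400  # assumed default: one bucket per day

def tracker_container_for(x_delete_at):
    """Map an expiry timestamp to its tracker container name."""
    ts = int(x_delete_at)
    # Integer division floors the timestamp to the start of its bucket, so
    # every object expiring in the same window -- and in particular every
    # object with the *same* X-Delete-At -- lands in one container under
    # the "gsexpiring" account/volume.
    return str(ts // EXPIRING_OBJECTS_CONTAINER_DIVISOR *
               EXPIRING_OBJECTS_CONTAINER_DIVISOR)

# The timestamp from the log further below, 1431648000, is day-aligned, so
# its million tracker objects all map to the single container "1431648000":
assert tracker_container_for(1431648000) == "1431648000"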

Version-Release number of selected component (if applicable):
RHS3.0u4
gluster-swift 1.13.1

How reproducible:
1. PUT a million objects set to expire at the same time (see the sketch after these steps).
2. Run the object-expirer daemon to expire them.
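A hedged reproducer sketch for step 1, using python-swiftclient; the auth URL, credentials, and container/object names are placeholders, not details taken from this report:

# Hypothetical reproducer: PUT many zero-byte objects sharing one X-Delete-At.
import time
from swiftclient import client as swift_client

conn = swift_client.Connection(
    authurl="http://127.0.0.1:8080/auth/v1.0",  # placeholder auth endpoint
    user="test:tester",                          # placeholder credentials
    key="testing",
)

expire_at = str(int(time.time()) + 3600)  # one shared expiry timestamp
conn.put_container("bigbucket")

for i in range(1000000):
    conn.put_object(
        "bigbucket",
        "obj-%07d" % i,
        contents=b"",
        headers={"X-Delete-At": expire_at},
    )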

Actual results:

object-expirer: Unhandled exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift/obj/expirer.py", line 115, in run_once
    container):
  File "/usr/lib/python2.6/site-packages/swift/common/internal_client.py", line 247, in _iter_items
    {}, acceptable_statuses)
  File "/usr/lib/python2.6/site-packages/swift/common/internal_client.py", line 186, in make_request
    _('Unexpected response: %s') % resp.status, resp)
UnexpectedResponse: Unexpected response: 503 Internal Server Error (txn: txXXX-YYY)

From logs:
Swift: ERROR with Container server 127.0.0.1:6011/gsexpiring re: Trying to GET /v1/gsexpiring/1431648000: Timeout (10s) (txn: tx....)


Expected results:
The object expirer should finish its pass smoothly and expire the objects.


Additional info:
The enhancement is to place tracker objects into multiple containers instead of one (see the illustrative sketch below). This fix is present in later versions of Swift but not in 1.13.1; it has to be backported to 1.13.1.
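As a rough illustration of that enhancement only (this is not the upstream patch, which is linked in comment 4; the shard count, helper name, and hashing scheme here are assumptions), tracker objects can be spread by mixing a stable per-object hash into the timestamp bucket:

# Illustration only, not the real patch: fan tracker objects out over many
# containers so one shared X-Delete-At no longer maps everything to one place.
import hashlib

EXPIRING_OBJECTS_CONTAINER_DIVISOR = 86400  # assumed default divisor
SHARDS = 100                                # assumed shard count

def sharded_tracker_container(x_delete_at, account, container, obj):
    bucket = (int(x_delete_at) // EXPIRING_OBJECTS_CONTAINER_DIVISOR *
              EXPIRING_OBJECTS_CONTAINER_DIVISOR)
    # A stable per-object hash gives each object a small offset, so objects
    # expiring at the same second spread over up to SHARDS containers.
    name = "/".join((account, container, obj)).encode("utf-8")
    shard = int(hashlib.md5(name).hexdigest(), 16) % SHARDS
    return str(bucket - shard)

# Two objects with the same expiry time now usually land in different
# tracker containers:
print(sharded_tracker_container(1431648000, "AUTH_test", "bigbucket", "obj-1"))
print(sharded_tracker_container(1431648000, "AUTH_test", "bigbucket", "obj-2"))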
Comment 4 Prashanth Pai 2015-08-18 02:55:25 EDT
This fix is present in later versions of Swift but not in 1.13.1. The fix has been backported to 1.13.1 in downstream RPMs by the RHOS team (thanks to Pete Zaitcev).

Link to upstream Swift fix in master: https://review.openstack.org/113394
Comment 5 SATHEESARAN 2015-09-03 05:31:18 EDT
Tested with swiftonfile-1.13.1-4.el7rhgs.noarch with the following steps:

1. Created a million objects set to expire at the same time.
2. Checked that all the objects were cleaned up once they expired (see the verification sketch below).
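A hypothetical verification sketch along the lines of step 2, again assuming python-swiftclient and the same placeholder credentials and names as in the reproducer sketch above:

# Hypothetical check: once the expiry time has passed and the expirer has
# completed a pass, the container listing should be empty and expired
# objects should return 404.
from swiftclient import client as swift_client
from swiftclient.exceptions import ClientException

conn = swift_client.Connection(
    authurl="http://127.0.0.1:8080/auth/v1.0",  # placeholder auth endpoint
    user="test:tester",
    key="testing",
)

headers, objects = conn.get_container("bigbucket", full_listing=True)
print("objects remaining:", len(objects))  # expect 0 after expiry

try:
    conn.get_object("bigbucket", "obj-0000001")
except ClientException as exc:
    print("expired object returns:", exc.http_status)  # expect 404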
Comment 7 errata-xmlrpc 2015-10-05 03:17:31 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html
