Bug 1238147 - Object expirer daemon times out and raises exception while attempting to expire a million objects
Summary: Object expirer daemon times out and raises exception while attempting to expire a million objects
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-swift
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.1
Assignee: Thiago da Silva
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1251815
 
Reported: 2015-07-01 09:45 UTC by Prashanth Pai
Modified: 2019-10-10 09:54 UTC
CC List: 5 users

Fixed In Version: swiftonfile-1.13.1-3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-05 07:17:31 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1845 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.1 update 2015-10-05 11:06:22 UTC

Description Prashanth Pai 2015-07-01 09:45:24 UTC
Description of problem:
While the object expirer in its current form works well when the number of objects to be expired is small to medium (thousands), it is not able to complete its pass when a million objects are slated to expire at the same time.

For every object to be expired in a real data volume, there is a corresponding tracker object (a zero-byte file) in the "gsexpiring" volume. The object expirer attempts to do a GET on the container in the "gsexpiring" volume, and the container-server times out before it can reply to that request. This is because all the tracker objects end up in the same container, so the container-server has to crawl a fairly huge filesystem tree, which (not surprisingly) is a slow operation. The flaw here is that all tracker objects sharing the same expiration timestamp end up in the same container.
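To illustrate why they collide, here is a minimal Python sketch of the tracker-object naming scheme as it behaves in this version; the divisor value and exact name format are assumptions based on Swift 1.13.1 defaults, not code copied from gluster-swift:

# Illustrative sketch (assumed defaults, not copied from gluster-swift):
# the tracker container name depends only on the expiration timestamp,
# so every object sharing one X-Delete-At lands in a single container.

CONTAINER_DIVISOR = 86400  # assumed expiring_objects_container_divisor default

def tracker_location_old(x_delete_at, account, container, obj):
    # Container name: the timestamp rounded down to the divisor.
    bucket = int(x_delete_at) // CONTAINER_DIVISOR * CONTAINER_DIVISOR
    tracker_container = str(bucket)
    # Tracker object name encodes the real object's path (assumed format).
    tracker_obj = '%s-%s/%s/%s' % (int(x_delete_at), account, container, obj)
    return tracker_container, tracker_obj

# A million objects with the same X-Delete-At produce a million zero-byte
# tracker files under the same container in the gsexpiring volume.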

Version-Release number of selected component (if applicable):
RHS3.0u4
gluster-swift 1.13.1

How reproducible:
1. PUT a million objects all set to expire at the same time (see the example script below).
2. Run the object-expirer daemon to expire them.
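For reference, a minimal reproduction sketch in Python; the storage URL, token, and container name are placeholders for a real deployment, and X-Delete-At is the standard Swift header for scheduling expiration:

# Hypothetical reproduction helper: PUT objects that all expire at the same
# time by setting an identical X-Delete-At header. The storage URL, token,
# and names are placeholders for a real deployment.
import time
import requests

STORAGE_URL = 'http://127.0.0.1:8080/v1/AUTH_test'   # placeholder
TOKEN = 'AUTH_tk_placeholder'                         # placeholder
EXPIRE_AT = str(int(time.time()) + 3600)              # same timestamp for all

def put_expiring_object(container, name):
    resp = requests.put(
        '%s/%s/%s' % (STORAGE_URL, container, name),
        data=b'x',
        headers={'X-Auth-Token': TOKEN, 'X-Delete-At': EXPIRE_AT})
    resp.raise_for_status()

for i in range(1000000):
    put_expiring_object('testcontainer', 'obj-%07d' % i)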

Actual results:

object-expirer: Unhandled exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift/obj/expirer.py", line 115, in run_once
    container):
  File "/usr/lib/python2.6/site-packages/swift/common/internal_client.py", line 247, in _iter_items
    {}, acceptable_statuses)
  File "/usr/lib/python2.6/site-packages/swift/common/internal_client.py", line 186, in make_request
    _('Unexpected response: %s') % resp.status, resp)
UnexpectedResponse: Unexpected response: 503 Internal Server Error (txn: txXXX-YYY)

From logs:
Swift: ERROR with Container server 127.0.0.1:6011/gsexpiring re: Trying to GET /v1/gsexpiring/1431648000: Timeout (10s) (txn: tx....)


Expected results:
The object expirer should finish its pass smoothly and expire the objects.


Additional info:
The enhancement is to place tracker objects into multiple containers instead of one. This fix is present in later versions of Swift but not in 1.13.1, so it has to be backported to 1.13.1.
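For context, a minimal sketch of the sharding idea behind the later upstream fix: mix a hash of the object path into the tracker container name so that objects expiring at the same time spread across many containers. The shard count, divisor, and function name here are illustrative assumptions, not the exact upstream implementation.

# Illustrative sketch of the sharding approach (not the exact upstream code,
# see https://review.openstack.org/113394): mix a per-object hash into the
# container name so tracker objects for one X-Delete-At spread across many
# containers instead of one.
import hashlib

CONTAINER_DIVISOR = 86400  # assumed divisor
SHARDS = 100               # assumed number of shards per timestamp bucket

def tracker_container_sharded(x_delete_at, account, container, obj):
    path = '/%s/%s/%s' % (account, container, obj)
    shard = int(hashlib.md5(path.encode('utf-8')).hexdigest(), 16) % SHARDS
    bucket = int(x_delete_at) // CONTAINER_DIVISOR * CONTAINER_DIVISOR
    # Offsetting the bucket by the shard yields up to SHARDS distinct
    # container names for objects expiring in the same bucket, so the
    # container-server no longer has to crawl one huge directory.
    return str(bucket - shard)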

Comment 4 Prashanth Pai 2015-08-18 06:55:25 UTC
This fix is present in later versions of Swift but not in 1.13.1. The fix has been backported to 1.13.1 in downstream RPMs by the RHOS team (thanks to Pete Zaitcev).

Link to upstream Swift fix in master: https://review.openstack.org/113394

Comment 5 SATHEESARAN 2015-09-03 09:31:18 UTC
Tested with swiftonfile-1.13.1-4.el7rhgs.noarch using the following steps:

1. Created a million objects set to expire at the same time.
2. Checked that all the objects were cleaned up when they expired.

Comment 7 errata-xmlrpc 2015-10-05 07:17:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html

