Bug 1312812

Summary: [RHEL-7] gluster-swift: Object expiration feature does not scale well as object expirer daemon times out
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Prashanth Pai <ppai>
Component: gluster-swift Assignee: Prashanth Pai <ppai>
Status: CLOSED ERRATA QA Contact: surabhi <sbhaloth>
Severity: medium Docs Contact:
Priority: medium    
Version: rhgs-3.1 CC: asriram, nlevinki, rcyriac, rhinduja, rhs-bugs, thiago
Target Milestone: --- Keywords: FutureFeature, ZStream
Target Release: RHGS 3.1.3   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: swiftonfile-2.3.0-2 Doc Type: Enhancement
Doc Text:
The object expiration feature is enhanced to reduce the chance of the object expirer daemon timing out. As the number of zero-byte tracker objects in the gsexpiring volume grows into the millions, the container server takes a long time to return object listings to the object expirer daemon. The daemon would then time out on requests sent to the container server and could not proceed with expiring objects. With this enhancement, the object expirer daemon is much less likely to time out on requests sent to the container server.
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-06-23 05:31:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1311386    

Description Prashanth Pai 2016-02-29 10:00:24 UTC
Description of problem:
The gluster-swift object expiration feature needs a separate volume to store tracker objects, which are zero-byte files. Creating these zero-byte files adds a PUT performance overhead, the admin/user has to create a separate volume for the tracker files, and crawling that volume is a slow, metadata-intensive process.
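To illustrate what the expirer has to crawl: in upstream Swift the tracker objects are conventionally named `<timestamp>-<account>/<container>/<object>`, where the timestamp is the expiry time. A minimal sketch (assuming that naming convention; `parse_tracker_name` and `due_entries` are illustrative helpers, not part of gluster-swift) of how a pass might parse tracker names and select entries that are due:

```python
import time

def parse_tracker_name(name):
    """Split a tracker object name of the form
    '<timestamp>-<account>/<container>/<object>' into its parts."""
    ts, _, target = name.partition('-')
    account, container, obj = target.split('/', 2)
    return int(ts), account, container, obj

def due_entries(tracker_names, now=None):
    """Yield (account, container, object) for every tracker whose
    expiry timestamp has already passed."""
    now = time.time() if now is None else now
    for name in tracker_names:
        ts, account, container, obj = parse_tracker_name(name)
        if ts <= now:
            yield (account, container, obj)

# Example: two trackers, only the first is expired at t=2000000000.
trackers = [
    '1500000000-AUTH_test/photos/img1.jpg',
    '2500000000-AUTH_test/photos/img2.jpg',
]
print(list(due_entries(trackers, now=2000000000)))
# → [('AUTH_test', 'photos', 'img1.jpg')]
```

The cost the bug describes comes from the step this sketch glosses over: obtaining `tracker_names` at all, which requires the container server to list millions of zero-byte entries before a single expiration can proceed.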

Comment 3 Prashanth Pai 2016-04-07 06:54:57 UTC
Upstream change on review here: http://review.gluster.org/13913

Comment 4 Prashanth Pai 2016-04-15 14:14:05 UTC
Downstream change: https://code.engineering.redhat.com/gerrit/#/c/72322/ (MERGED)

Comment 5 Prashanth Pai 2016-04-19 08:36:29 UTC
After an upgrade, the new expirer can be invoked as follows:

swift-init object-expirer stop
gluster-swift-object-expirer /etc/swift/object-expirer.conf &

As of now, it cannot be invoked using systemctl.

Comment 6 Prashanth Pai 2016-04-21 11:20:16 UTC
With the decision to keep the RHGS 3.1.3 rhel6 variant on icehouse while the rhel7 variant moves to a newer release, the object expirer daemon has a different name on each platform, and consequently so does the command the user has to invoke to manage the daemon's lifecycle.

As of build swiftonfile-2.3.0-1.el7rhgs (rhel7) and swiftonfile-1.13.1-6.el6rhs (rhel6):

On el6:
# swift-init object-expirer [start|stop|restart|status]

On el7:
# gluster-swift-object-expirer -o /etc/swift/object-expirer.conf &

This creates ambiguity and confusion in the object expiration documentation for rhel6 and rhel7. Further, rhel7 users have to stop the old daemon and start the new one, which has a different name.

Fix:
On rhel7, allow the new object expirer daemon to be started the same way as on rhel6, i.e.:

# swift-init object-expirer [start|stop|restart|status]

The above-mentioned change is included in the swiftonfile-2.3.0-2 build.
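With the unified `swift-init object-expirer` entry point, configuration stays in /etc/swift/object-expirer.conf. A hedged sketch of the daemon section using standard upstream Swift expirer options (`interval`, `processes`, `process`, `concurrency`); the values shown are illustrative examples, not tuning recommendations from this BZ:

```ini
[object-expirer]
# Seconds between the start of successive expirer passes.
interval = 300
# Split the tracker-object namespace across 2 cooperating expirer
# daemons; this instance handles slice 0.
processes = 2
process = 0
# Number of expiration requests issued in parallel within one pass.
concurrency = 4
```

Splitting the work across processes and raising concurrency are the standard upstream knobs for keeping an individual pass short when the tracker listing is large.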

Comment 7 Prashanth Pai 2016-04-25 07:23:26 UTC
As per a conversation with QE, the BZ title has been rephrased to avoid confusion and to describe more accurately the problem this BZ addresses.

Comment 8 surabhi 2016-06-01 05:41:40 UTC
Executed object expiration tests with all applicable options on 500,000 objects and no issues were seen. Also verified that the new swift object expirer daemon starts and stops properly.

Basic performance relative to 3.1.2, as well as the upgrade path, still needs to be tested. Testing is in progress; if any issues are found there, another BZ will be raised. Marking this as verified.

Comment 12 errata-xmlrpc 2016-06-23 05:31:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1289