Description of problem:
While the object-expirer in its current form works well when the number of objects to be expired is small to medium (thousands), it cannot complete its pass when a million objects are slated to expire at the same time.

For every object to be expired in the real data volume, there is a corresponding tracker object (a zero-byte file) in the "gsexpiring" volume. The object-expirer performs a GET on the container in the "gsexpiring" volume, and the container-server times out before it can reply. This is because all the tracker objects end up in the same container, and the container-server has to crawl a very large filesystem tree, which is (not surprisingly) a slow operation. The "bug", or design flaw, is that all tracker objects with the same expiry timestamp end up in the same container.

Version-Release number of selected component (if applicable):
RHS3.0u4, gluster-swift 1.13.1

How reproducible:
1. PUT a million objects set to expire at the same time.
2. Run the object-expirer daemon to expire them.

Actual results:
object-expirer: Unhandled exception:
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift/obj/expirer.py", line 115, in run_once
    container):
  File "/usr/lib/python2.6/site-packages/swift/common/internal_client.py", line 247, in _iter_items
    {}, acceptable_statuses)
  File "/usr/lib/python2.6/site-packages/swift/common/internal_client.py", line 186, in make_request
    _('Unexpected response: %s') % resp.status, resp)
UnexpectedResponse: Unexpected response: 503 Internal Server Error (txn: txXXX-YYY)

From logs:
Swift: ERROR with Container server 127.0.0.1:6011/gsexpiring re: Trying to GET /v1/gsexpiring/1431648000: Timeout (10s) (txn: tx....)

Expected results:
The object-expirer finishes its pass smoothly and expires the objects.

Additional info:
The enhancement is to place tracker objects into multiple containers instead of one.
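To illustrate the difference between one tracker container and several, here is a minimal Python sketch. It is not the code from the upstream review; the helper names, the plain MD5 path hash, and the shard count of 100 are assumptions chosen for illustration. It assumes the default expiring_objects_container_divisor of 86400 seconds, which matches the container name in the log above (1431648000 = 86400 x 16570). The single-container scheme maps every object expiring on the same day to the same container; the sharded scheme subtracts a per-object offset derived from a hash of the object path, so trackers spread over up to 100 containers.

```python
import hashlib
from collections import Counter

# Assumptions for illustration: a one-day divisor (Swift's default
# expiring_objects_container_divisor) and a hypothetical shard count of 100.
DIVISOR = 86400
SHARDS = 100


def tracker_container_single(x_delete_at, divisor=DIVISOR):
    """Unsharded scheme: all trackers for the same day share one container."""
    return str(int(x_delete_at) // divisor * divisor)


def tracker_container_sharded(x_delete_at, account, container, obj,
                              divisor=DIVISOR, shards=SHARDS):
    """Hypothetical sharded scheme: offset the day-aligned name by a hash of
    the object path, spreading trackers over up to `shards` containers."""
    path = "/%s/%s/%s" % (account, container, obj)
    shard = int(hashlib.md5(path.encode("utf-8")).hexdigest(), 16) % shards
    # Subtracting keeps every shard's name at or before the day boundary.
    return str(int(x_delete_at) // divisor * divisor - shard)


if __name__ == "__main__":
    delete_at = 1431648000  # container name seen in the timeout log
    names = ["obj%07d" % i for i in range(100000)]
    single = Counter(tracker_container_single(delete_at) for _ in names)
    sharded = Counter(tracker_container_sharded(delete_at, "AUTH_test", "c", n)
                      for n in names)
    print("single:", len(single), "container(s)")  # prints 1
    print("sharded:", len(sharded), "container(s), largest holds",
          max(sharded.values()), "trackers")
```

Because the shard is derived from the object path, the same object always maps to the same tracker container, and each container-server listing only has to cover roughly one hundredth of the trackers.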
This fix is present in later versions of Swift but not in 1.13.1. The fix has to be back-ported to 1.13.1.
This fix is present in later versions of Swift but not in 1.13.1. The fix has been back-ported to 1.13.1 in downstream RPMs by the RHOS team (thanks to Pete Zaitcev). Link to the upstream Swift fix in master: https://review.openstack.org/113394
Tested with swiftonfile-1.13.1-4.el7rhgs.noarch using the following steps:
1. Created a million objects set to expire at the same time.
2. Checked that all the objects were cleaned up once they expired.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html