Bug 1302613

Summary: Replicator does not delete handoff copies, leaving extra copies in the cluster and increasing disk usage
Product: Red Hat OpenStack
Reporter: Christian Schwede (cschwede) <cschwede>
Component: openstack-swift
Assignee: Pete Zaitcev <zaitcev>
Status: CLOSED ERRATA
QA Contact: Mike Abrams <mabrams>
Severity: high
Docs Contact:
Priority: high
Version: 7.0 (Kilo)
CC: bbilgin, derekh, furlongm, mabrams, scohen, srevivo, zaitcev
Target Milestone: ---
Keywords: ZStream
Target Release: 7.0 (Kilo)
Flags: cschwede: needinfo+
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: openstack-swift-2.3.0-5.el7ost
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-10-05 19:15:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Christian Schwede (cschwede) 2016-01-28 09:33:19 UTC
Description of problem:

The Swift object replicator does not delete a handoff copy after it has been successfully replicated to its primary nodes. As a result, data is never removed when a rebalance changes the cluster layout. For example, when new disks are added, data is replicated to the new disks but not removed from the existing ones. A rebalance therefore does not actually rebalance the data distribution; total disk usage increases even though no new data is added.

This most likely happens with geo-replicated (multi-region) clusters.

Version-Release number of selected component (if applicable):

2.3
2.5

openstack-swift-object-2.3.0-2.el7ost.noarch

How reproducible:

Always.

Steps to Reproduce:
1. Create a Swift cluster with multiple regions.
2. Upload data.
3. Add or remove a node or disk and rebalance the cluster.
4. Observe that the data stays on the old disks and is not removed, even where the old node is now only a handoff node for it.

Actual results:

Data is NOT removed from a handoff node after successful replication.

Expected results:

Data is removed from a handoff node after successful replication.

Additional info:

Fixed upstream: https://github.com/openstack/swift/commit/d01cd425094c2e56e4e89dbf3eaf887815dd5b62

Also affects Swift 2.5 (as stated in the linked Launchpad bug entry).

Error message from object-replicator:

Jan 28 11:13:17 ******* object-replicator: Error syncing handoff partition:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/swift/obj/replicator.py", line 269, in update_deleted
    delete_objs = delete_objs.intersection(cand_objs)
AttributeError: 'list' object has no attribute 'intersection'
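
The traceback above shows .intersection() being called on a list, and only sets have that method. The following is a minimal sketch of that failure mode and of a set-based accumulation that avoids it. It is not the upstream patch: the function names and the input shape (one iterable of object names per primary node) are assumptions made for illustration only.

```python
def objects_safe_to_delete(candidates_per_node):
    """Return the object names reported by every node, as a set.

    candidates_per_node: one iterable of object names per primary node.
    This input shape is hypothetical, not Swift's real data structure.
    """
    delete_objs = None
    for cand_objs in candidates_per_node:
        # Normalise to a set first: a plain list has no .intersection().
        cand_objs = set(cand_objs)
        if delete_objs is None:
            delete_objs = cand_objs
        else:
            delete_objs = delete_objs.intersection(cand_objs)
    # No nodes at all means nothing was confirmed, so delete nothing.
    return delete_objs if delete_objs is not None else set()


def reproduce_error():
    """Trigger the same kind of AttributeError as in the log, using lists."""
    delete_objs = ["a.data", "b.data"]
    cand_objs = ["b.data", "c.data"]
    try:
        delete_objs.intersection(cand_objs)
    except AttributeError as exc:
        return str(exc)
    return None
```

In this sketch an object only ends up in the result if every node reported it, which mirrors the expected behaviour described above: a handoff copy is removed only after replication to the primary nodes has succeeded.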

Comment 4 Christian Schwede (cschwede) 2016-02-02 07:46:58 UTC
This has also been fixed upstream in stable/kilo: https://review.openstack.org/#/c/232696/

Comment 12 errata-xmlrpc 2016-10-05 19:15:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2028.html