Bug 1399079 - Swift replication might skip suffixes temporarily
Summary: Swift replication might skip suffixes temporarily
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-swift
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: z1
Target Release: 10.0 (Newton)
Assignee: Pete Zaitcev
QA Contact: Mike Abrams
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-28 09:08 UTC by Christian Schwede (cschwede)
Modified: 2017-04-22 04:57 UTC (History)

Fixed In Version: openstack-swift-2.10.1-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-01 14:36:10 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1634967 0 None None None 2016-11-28 09:09:08 UTC
Launchpad 1644807 0 None None None 2016-11-28 09:08:17 UTC
OpenStack gerrit 402509 0 None None None 2017-01-16 16:53:02 UTC
OpenStack gerrit 406101 0 None None None 2017-01-16 16:53:51 UTC
Red Hat Product Errata RHBA-2017:0235 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 Storage Bug Fix and Enhancement Advisory 2017-02-01 19:34:42 UTC

Description Christian Schwede (cschwede) 2016-11-28 09:08:17 UTC
Description of problem:

1. Upload an object
2. Replicator runs
3. Upload another object that uses the same partition but different suffix
4. Lose a disk before the replicator has touched this partition, then replace it
5. Run replicators again

The suffix of the object from step 1 will be replicated; the suffix of the object from step 3 won't.

10% of the hashes will be outdated and updated on every replicator run.
However, the order of partitions is randomized, which means it could take more than 10 replicator runs before a given hashes.pkl is fixed, while others will be updated much more often. This should be more deterministic.

In fact it is very likely that it takes 50..70 replication cycles until this is fixed, depending on the partition count per replicator.
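
A rough way to sanity-check the 50..70 figure is a small simulation. This is only a sketch under a simplifying assumption: every partition's hashes.pkl is refreshed independently with probability 0.1 on each pass. That is a model of the behaviour described above, not Swift's actual code, and the function names are hypothetical.

```python
import math
import random


def passes_until_all_refreshed(partitions, refresh_prob=0.1, rng=None):
    """Count replication passes until every partition was refreshed at least once.

    Assumption: each still-stale partition is refreshed independently
    with probability refresh_prob on every pass.
    """
    rng = rng or random.Random()
    stale = partitions
    passes = 0
    while stale > 0:
        passes += 1
        # Keep only the partitions that were NOT picked in this pass.
        stale = sum(1 for _ in range(stale) if rng.random() >= refresh_prob)
    return passes


def estimated_passes(partitions, refresh_prob=0.1):
    """Passes after which roughly one stale partition is still expected."""
    return math.log(partitions) / -math.log(1.0 - refresh_prob)


if __name__ == "__main__":
    rng = random.Random(42)
    for partitions in (100, 1000):
        runs = [passes_until_all_refreshed(partitions, rng=rng) for _ in range(200)]
        print(partitions, sum(runs) / len(runs), round(estimated_passes(partitions), 1))
```

Under this model a single partition needs 10 passes on average, but the slowest of many partitions takes far longer, and the number grows with the partition count per replicator.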

See upstream bugs for more details:

https://bugs.launchpad.net/swift/+bug/1634967
https://bugs.launchpad.net/swift/+bug/1644807


Version-Release number of selected component (if applicable):

Mitaka, Newton (OSP9 & OSP10)

How reproducible: Always

Steps to Reproduce: See above.

Actual results: See above.

Expected results: All hashes.pkl files are updated after the 10th replication pass.

Comment 1 Christian Schwede (cschwede) 2016-11-28 09:16:59 UTC
The bug in the hashes.pkl has been fixed on master, and backports have been submitted for review:

https://review.openstack.org/#/q/Ie2700f6e6171f2ecfa7d07b0f18b79e90cbf1c8a,n,z

The fix to make the invalidation of the hashes.pkl deterministic is ready for review on master:

https://review.openstack.org/#/c/402376/

IMO both patches are critical, and we need to backport them both for Mitaka and Newton & publish updated rpms quickly. Thoughts?

Comment 3 Elise Gafford 2016-11-30 14:10:30 UTC
Requires backports through 9.0 when repaired upstream.

Comment 5 Jon Schlueter 2017-01-16 16:53:03 UTC
updating for stable/newton reviews

Comment 11 errata-xmlrpc 2017-02-01 14:36:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0235.html

