1383268 – Deployed Swift rings use way too high partition power

Summary: Deployed Swift rings use way too high partition power

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	10.0 (Newton)
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	10.0 (Newton)
Assignee:	Christian Schwede (cschwede)
QA Contact:	Arik Chernetsky
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-10-10 10:22 UTC by Christian Schwede (cschwede)
Modified:	2016-12-14 16:15 UTC
CC List:	10 users
Fixed In Version:	openstack-tripleo-heat-templates-5.0.0-0.8.0rc3.el7ost, instack-undercloud-5.0.0-2.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-12-14 16:15:15 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1631926	None	None	None	2016-10-10 11:36:54 UTC
OpenStack gerrit	384436	None	None	None	2016-10-10 11:37:14 UTC
OpenStack gerrit	384439	None	None	None	2016-10-10 11:44:57 UTC
Red Hat Product Errata	RHEA-2016:2948	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 10 enhancement update	2016-12-14 19:55:27 UTC

Internal Links: 1381721

Description Christian Schwede (cschwede) 2016-10-10 10:22:15 UTC

Description of problem:

The partition power of the deployed Swift rings on both the undercloud and the overcloud is set to 18, which is far too high. This will cause serious problems later on small deployments, especially with replication.

Using a partition power of 18 creates 2^18 = 262,144 partitions in the cluster. If these are spread across only 3 disks (for example, only the controller nodes), each partition is replicated individually, and this takes a lot of time. Additionally, many extra inodes are created, and depending on the usage one might suffer from inode cache misses, slowing down the whole node.
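These numbers follow from the ring's definition: a ring with partition power p has 2^p partitions, and each partition is stored once per replica. A minimal Python sketch of that arithmetic, assuming 3 replicas spread evenly over 3 disks (the helper names are hypothetical, for illustration only):

```python
def partition_count(part_power: int) -> int:
    """Number of partitions in a ring with the given partition power."""
    return 2 ** part_power


def partition_replicas_per_disk(part_power: int, replicas: int, disks: int) -> float:
    """Average number of partition replicas per disk, assuming an even
    distribution (real rings also take device weights into account)."""
    return partition_count(part_power) * replicas / disks


if __name__ == "__main__":
    for power in (18, 10):
        print(
            f"part_power={power}: {partition_count(power):,} partitions, "
            f"{partition_replicas_per_disk(power, replicas=3, disks=3):,.0f} per disk"
        )
```

With 3 replicas on 3 disks every disk holds one replica of every partition, so each disk carries 262,144 partition replicas at power 18 versus 1,024 at power 10, a factor of 2^8 = 256.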

Version-Release number of selected component (if applicable):

All OSP releases that use TripleO.

How reproducible:

Always

Steps to Reproduce:
1. Deploy using TripleO (OOO)/director.
2. Check the partition power on the undercloud and the overcloud using "swift-ring-builder /etc/swift/object.builder".
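Step 2 can be scripted. A minimal sketch, assuming the `swift-ring-builder <file>` summary contains a line that starts with `<N> partitions, ...` (the exact layout may differ between Swift releases); `part_power_from_output` and `read_part_power` are hypothetical helpers. The partition power is recovered as log2 of the partition count:

```python
import re
import subprocess

# Matches the "<N> partitions" token at the start of a summary line (assumed format).
_PARTITIONS_RE = re.compile(r"^(\d+) partitions\b", re.MULTILINE)


def part_power_from_output(output: str) -> int:
    """Return the partition power encoded in swift-ring-builder summary text."""
    match = _PARTITIONS_RE.search(output)
    if match is None:
        raise ValueError("no '<N> partitions' line found in output")
    partitions = int(match.group(1))
    # The partition count of a ring is always 2 ** part_power.
    if partitions <= 0 or partitions & (partitions - 1):
        raise ValueError(f"{partitions} is not a power of two")
    return partitions.bit_length() - 1


def read_part_power(builder_path: str = "/etc/swift/object.builder") -> int:
    """Run swift-ring-builder on a builder file and return its partition power."""
    result = subprocess.run(
        ["swift-ring-builder", builder_path],
        capture_output=True, text=True, check=True,
    )
    return part_power_from_output(result.stdout)
```

On a deployment affected by this bug, `read_part_power()` would be expected to return 18.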

Actual results:
Partition power of 18. This is the default in puppet-swift if nothing else is defined.

Expected results:
A much lower partition power, for example 10 for small deployments with only a few disks.

Additional info:
Please note that it is not possible to lower the partition power once the cluster has been deployed. A patch that allows increasing the partition power later on has been proposed upstream but is not yet merged; therefore, starting with a lower value is much safer than using a high value that will cause trouble later.

A partition power of 10 is already defined in tripleo-heat-templates, but it is not used. Looking at /etc/puppet/hieradata/puppet-stack-config.yaml on the undercloud, I see this:

tripleo::ringbuilder::part_power: 10
tripleo::ringbuilder::replicas: 3
tripleo::ringbuilder::min_part_hours: 1
swift_mount_check: false
swift::ringbuilder::replicas: 1

Shouldn't the first line be "swift::ringbuilder::part_power: 10"? Also, replicas is defined twice, and the entry with "3" is unused (which makes sense, because there is only a single disk on the undercloud).

On the overcloud I see this in /etc/puppet/hieradata/service_configs.yaml:

swift::ringbuilder::part_power: 10
tripleo::profile::base::swift::ringbuilder::replicas: 3

I think this should be "tripleo::profile::base::swift::ringbuilder::part_power: 10".
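Why the value 10 is ignored on both nodes can be modeled as a lookup miss: the code that builds the rings reads one specific key, and when that key is absent the puppet-swift default of 18 applies. A deliberately simplified Python model, assuming a single-key lookup with a fallback and assuming the consumed keys are the ones suggested above (this is not how Hiera or Puppet are implemented; `effective_part_power` and `CONSUMED_KEY` are hypothetical):

```python
PUPPET_SWIFT_DEFAULT_PART_POWER = 18  # used when nothing else is defined

# Hieradata as quoted in this report (only the ring builder keys).
UNDERCLOUD = {
    "tripleo::ringbuilder::part_power": 10,
    "tripleo::ringbuilder::replicas": 3,
    "tripleo::ringbuilder::min_part_hours": 1,
    "swift::ringbuilder::replicas": 1,
}
OVERCLOUD = {
    "swift::ringbuilder::part_power": 10,
    "tripleo::profile::base::swift::ringbuilder::replicas": 3,
}

# Assumption: the key actually read on each side, per the suggestions above.
CONSUMED_KEY = {
    "undercloud": "swift::ringbuilder::part_power",
    "overcloud": "tripleo::profile::base::swift::ringbuilder::part_power",
}


def effective_part_power(hieradata: dict, consumed_key: str) -> int:
    """Value the ring builder ends up with: the consumed key if set, else the default."""
    return hieradata.get(consumed_key, PUPPET_SWIFT_DEFAULT_PART_POWER)


if __name__ == "__main__":
    for name, data in (("undercloud", UNDERCLOUD), ("overcloud", OVERCLOUD)):
        print(name, effective_part_power(data, CONSUMED_KEY[name]))
```

In this model both lookups miss and fall back to 18; setting the value under the consumed key makes the lookup return 10.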

Using a partition power of 10 creates 2^10 = 1024 partitions. Each partition is replicated 3 times, so a total of 3072 partition replicas is spread across all disks. It is recommended to have at least ~100 partition replicas per disk; 3072 is therefore enough for up to ~30 disks (more disks can be used, but data is then distributed less evenly). Bigger deployments should use a higher value, depending on the number of disks at initial deployment as well as expected growth.
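The sizing rule above can also be run in reverse to pick a partition power for a given disk count. A minimal sketch, assuming the ~100 partition replicas per disk guideline from this report and evenly weighted disks; `max_disks` and `smallest_part_power` are hypothetical helpers:

```python
def max_disks(part_power: int, replicas: int = 3, min_per_disk: int = 100) -> int:
    """Largest disk count that still gets at least min_per_disk partition replicas each."""
    return (2 ** part_power) * replicas // min_per_disk


def smallest_part_power(disks: int, replicas: int = 3, min_per_disk: int = 100) -> int:
    """Smallest partition power giving every disk at least min_per_disk partition replicas."""
    if disks < 1:
        raise ValueError("need at least one disk")
    needed = disks * min_per_disk
    power = 0
    while (2 ** power) * replicas < needed:
        power += 1
    return power


if __name__ == "__main__":
    print(max_disks(10))            # 3072 // 100
    print(smallest_part_power(30))  # 2**10 * 3 = 3072 >= 3000
```

Because the partition power cannot be lowered later, `disks` should be the expected maximum disk count including growth, not only the disks present at initial deployment.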

Comment 1 Paul Grist 2016-10-10 15:50:06 UTC

Setting target release, this is a must fix for OSP10

Comment 2 Pete Zaitcev 2016-10-11 19:45:41 UTC

I'm not sure it actually would've helped in Alex K.'s case to bump the
ring offset down to 10. It was more about concurrency and unnecessary
writes, I thought.

Comment 3 Christian Schwede (cschwede) 2016-10-12 06:07:35 UTC

Pete: actually it helped quite a bit, because replication was much faster with a lower partition power. Each partition starts its own replication pass, so a full replication run takes much longer with more partitions, adding quite a bit of I/O load to the already busy disks.

Comment 4 Christian Schwede (cschwede) 2016-10-14 13:43:32 UTC

Moving this to POST; proposed patches landed on master and stable/newton.

Comment 12 errata-xmlrpc 2016-12-14 16:15:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html
