Description of problem:
The partition power of the deployed Swift rings, both on the undercloud and the overcloud, is set to 18, which is way too high. It will cause serious problems later on small deployments, especially with replication.
Using a partition power of 18 creates 2^18 = 262,144 partitions in the cluster. If these are spread across only 3 disks (for example, only the controller nodes), each partition is replicated individually, and this takes a lot of time. Additionally, many extra inodes are created, and depending on usage one might suffer from inode cache misses, slowing down the whole node.
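The arithmetic above can be sketched quickly (assuming 3 replicas and 3 disks, as in the small-deployment example):

```python
# Compare partition counts at part power 18 (the deployed default)
# versus part power 10 on a 3-disk, 3-replica cluster.
replicas = 3
disks = 3

for part_power in (18, 10):
    partitions = 2 ** part_power          # partitions in the ring
    total = partitions * replicas         # replicated copies to place
    per_disk = total // disks             # copies each disk must hold
    print(f"part_power={part_power}: {partitions} partitions, "
          f"{per_disk} per disk")
```

At part power 18, every one of the 262,144 partitions on each disk gets its own replication pass; at part power 10 it is only 1024.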
Version-Release number of selected component (if applicable):
All OSP releases that use TripleO.
Steps to Reproduce:
1. Deploy using OOO/director.
2. Check partition power on under- and overcloud using "swift-ring-builder /etc/swift/object.builder"
Actual results:
Partition power of 18. This is the default in puppet-swift if nothing else is defined.
Expected results:
A much lower partition power, for example 10 for small deployments with only a few disks.
Additional info:
Please note that it is not possible to lower the partition power once the cluster has been deployed. Increasing the partition power later is proposed as a patch upstream, but not yet merged; therefore starting with a lower value is much safer than using a high value that will cause trouble later.
There is already a partition power of 10 defined in tripleo-heat-templates, but it is not used. Looking at /etc/puppet/hieradata/puppet-stack-config.yaml on the undercloud, I see this:
I think the first line should be "swift::ringbuilder::part_power: 10"? Also, replicas is defined twice, and the entry with "3" is unused (which makes sense, because there is only a single disk on the undercloud).
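If that suspicion is right, the corrected undercloud hieradata might look roughly like this (a sketch only; the part_power key follows the report and the puppet-swift module, and the replicas value is this sketch's assumption for a single-disk undercloud):

```yaml
# /etc/puppet/hieradata/puppet-stack-config.yaml (sketch, not the shipped file)
swift::ringbuilder::part_power: 10   # lower default for small deployments
swift::ringbuilder::replicas: 1      # single disk on the undercloud (assumption)
```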
On the overcloud I see this in /etc/puppet/hieradata/service_configs.yaml:
I think this should be "tripleo::profile::base::swift::ringbuilder::part_power: 10"?
Using a partition power of 10 creates 2^10 = 1024 partitions. Each partition is replicated 3 times, so a total of 3072 partitions will be spread across all disks. It is recommended to have at least ~100 partitions per disk; 3072 partitions is enough for up to ~30 disks (more disks are usable, but data is then distributed less evenly). Bigger deployments should use a higher value depending on the number of disks at initial deployment as well as expected growth.
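The sizing rule above can be expressed as a small helper (the function name and the ~100-partitions-per-disk target are this sketch's assumptions, following the report's reasoning, not an official Swift tool):

```python
import math

def min_part_power(disks, replicas=3, per_disk=100):
    """Smallest partition power p such that each disk receives at
    least ~per_disk replicated partitions, i.e.
    (2**p * replicas) / disks >= per_disk."""
    return math.ceil(math.log2(per_disk * disks / replicas))

# Matches the report: part power 10 covers up to ~30 disks at 3 replicas.
print(min_part_power(30))   # -> 10
```

Sizing should target the expected maximum disk count, not the initial one, since the partition power cannot be lowered (and, at the time of this report, not raised) after deployment.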
Setting target release; this is a must-fix for OSP 10.
I'm not sure if it actually would have helped Alex K.'s case to bump the ring offset down to 10. It was more about concurrency and unnecessary writes, I thought.
Pete: actually it helped quite a bit, because replication was much faster with a lower partition power. Each partition starts its own replication pass, so with more partitions a full replication run takes much longer, adding quite a bit of I/O load to already busy disks.
Moving this to POST; proposed patches landed on master and stable/newton.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.