Bug 1385483 - Tripleo::Profile::Base::Swift::Ringbuilder fails on rebalance if redeploying within 24 hours
Keywords:
Status: CLOSED DUPLICATE of bug 1437499
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Christian Schwede (cschwede)
QA Contact: Mike Abrams
URL:
Whiteboard:
Depends On: 1310865
Blocks:
Reported: 2016-10-17 07:41 UTC by Dan Macpherson
Modified: 2017-04-10 13:30 UTC

Fixed In Version: puppet-swift-10.3.0-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-10 13:30:39 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Bugzilla
ID: 1437499
Private: 0
Priority: urgent
Status: CLOSED
Summary: TripleO/Director deployment fails if Swift rings rebalanced within min_part_hours
Last Updated: 2021-02-22 00:41:40 UTC

Description Dan Macpherson 2016-10-17 07:41:25 UTC
Running "openstack overcloud deploy" within 24 hours of the last run fails at Controller Step 3. This is because swift-ring-builder refuses to rebalance until min_part_hours (24 hours here) has elapsed since the previous rebalance.

The Puppet error log from Step 3 on a controller node:

[root@overcloud-controller-0 deployed]# cat 2016-10-17-07-21-39Z-cf8c55e0-90ba-46ad-b601-4641bf17d04f-stderr.log
Warning: Scope(Class[Cinder::Api]): keystone_enabled is deprecated, use auth_strategy instead.
Warning: Scope(Class[Keystone]): Fernet token is recommended in Mitaka release. The default for token_provider will be changed to 'fernet' in O release.
Warning: Scope(Class[Keystone]): admin_password is required, please set admin_password to a value != admin_token. admin_token will be removed in a later release
Warning: Scope(Class[Keystone::Roles::Admin]): the main class is setting the admin password differently from this\
      class when calling bootstrap. This will lead to the password\
      flip-flopping and cause authentication issues for the admin user.\
      Please ensure that keystone::roles::admin::password and\
      keystone::admin_password are set the same.
Warning: Scope(Class[Heat]): keystone_user_domain_id is deprecated, use the name option instead.
Warning: Scope(Class[Heat]): keystone_project_domain_id is deprecated, use the name option instead.
Warning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::cpu_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated
Warning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::ram_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated
Warning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::disk_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated
Warning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.
Warning: Scope(Class[Ceilometer]): Both $metering_secret and $telemetry_secret defined, using $telemetry_secret
Warning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.
Error: /Stage[main]/Tripleo::Profile::Base::Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]: Failed to call refresh: swift-ring-builder /etc/swift/account.builder rebalance 999 returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[account]/Exec[rebalance_account]: swift-ring-builder /etc/swift/account.builder rebalance 999 returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[container]/Exec[rebalance_container]: Failed to call refresh: swift-ring-builder /etc/swift/container.builder rebalance 999 returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[container]/Exec[rebalance_container]: swift-ring-builder /etc/swift/container.builder rebalance 999 returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]: Failed to call refresh: swift-ring-builder /etc/swift/object.builder rebalance 999 returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Swift::Ringbuilder/Swift::Ringbuilder::Rebalance[object]/Exec[rebalance_object]: swift-ring-builder /etc/swift/object.builder rebalance 999 returned 1 instead of one of [0]

Running the rebalance command manually produces the following output:

[root@overcloud-controller-0 deployed]# swift-ring-builder /etc/swift/account.builder rebalance 999
No partitions could be reassigned.
The time between rebalances must be at least min_part_hours: 24 hours (20:59:53 remaining)
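
The refusal message above carries both the configured min_part_hours and the time remaining. A minimal Python sketch of extracting those two values, assuming the message keeps exactly the format shown in this transcript (the function name parse_rebalance_refusal is hypothetical and not part of Swift):

```python
import re
from datetime import timedelta

# Pattern derived only from the message format shown in the transcript above.
_REFUSAL = re.compile(
    r"min_part_hours:\s*(?P<hours>\d+)\s*hours?\s*"
    r"\((?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}) remaining\)"
)


def parse_rebalance_refusal(output):
    """Return (min_part_hours, remaining) or None if output is not a refusal."""
    match = _REFUSAL.search(output)
    if match is None:
        return None
    remaining = timedelta(
        hours=int(match.group("h")),
        minutes=int(match.group("m")),
        seconds=int(match.group("s")),
    )
    return int(match.group("hours")), remaining


sample = (
    "No partitions could be reassigned.\n"
    "The time between rebalances must be at least min_part_hours: "
    "24 hours (20:59:53 remaining)\n"
)
print(parse_rebalance_refusal(sample))
```

A caller could use the returned timedelta to decide whether to wait, skip the rebalance, or report the delay to the operator.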

However, you can use the -f option to force a rebalance:

[root@overcloud-controller-0 deployed]# swift-ring-builder /etc/swift/account.builder rebalance 999 -f
Reassigned 0 (0.00%) partitions. Balance is now 100.00.  Dispersion is now 0.00
-------------------------------------------------------------------------------
NOTE: Balance of 100.00 indicates you should push this 
      ring, wait at least 24 hours, and rebalance/repush.
-------------------------------------------------------------------------------

Alternatively, the deployment could check whether a rebalance is possible before attempting one.

This applies to OSP10 using puppet-tripleo 5.2.0-1.el7ost

Comment 1 Christian Schwede (cschwede) 2017-02-08 09:29:19 UTC
In Newton, min_part_hours is set to 1 hour by default. It can be set to a different value using the SwiftMinPartHours parameter.

There is one caveat with this approach: if the input rings differ slightly (because each node built its own ring and rebalanced at a slightly different time), some nodes might rebalance while others do not.
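
To illustrate this caveat, here is a deliberately simplified Python model. It is not Swift's actual bookkeeping: it assumes a single last-rebalance timestamp per node, and the node names other than overcloud-controller-0, the timestamps, and the function nodes_allowed_to_rebalance are all hypothetical.

```python
from datetime import datetime, timedelta


def nodes_allowed_to_rebalance(last_rebalance, now, min_part_hours):
    """Map each node to True if min_part_hours has elapsed since its own last rebalance."""
    window = timedelta(hours=min_part_hours)
    return {node: now - ts >= window for node, ts in last_rebalance.items()}


# Hypothetical timestamps: each controller built and rebalanced its own ring
# a few minutes apart during the previous deployment.
last_rebalance = {
    "overcloud-controller-0": datetime(2016, 10, 17, 7, 0),
    "overcloud-controller-1": datetime(2016, 10, 17, 7, 4),
    "overcloud-controller-2": datetime(2016, 10, 17, 7, 9),
}
now = datetime(2016, 10, 17, 8, 5)  # redeploy shortly after the 1 hour window opens

result = nodes_allowed_to_rebalance(last_rebalance, now, min_part_hours=1)
for node, allowed in result.items():
    print(node, "rebalances" if allowed else "is refused")
```

In this model the first two controllers are past the one-hour window while the third is not, so the rings on the nodes would diverge after the redeploy.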

To fully fix this, we need to fix bz#1310865 first, and then add some kind of preliminary check (because there might still be nodes that rebalance at a different time). A rebalance should only fail on real errors; otherwise it should simply be skipped.
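
The "fail only on real errors" behaviour could look roughly like the Python sketch below. It is not the fix that was shipped. It assumes that exit status 1 from swift-ring-builder denotes a warning (such as the min_part_hours refusal in the description) and that any other non-zero status is a real error. The names classify_rebalance and rebalance are hypothetical.

```python
import subprocess


def classify_rebalance(returncode):
    """Translate an exit status into an outcome.

    Assumption: 0 is success, 1 is a warning (e.g. the min_part_hours
    refusal shown in the description), anything else is a real error.
    """
    if returncode == 0:
        return "rebalanced"
    if returncode == 1:
        return "skipped"
    return "failed"


def rebalance(builder_file, seed="999"):
    """Run the rebalance and raise only on a real error (hypothetical wrapper).

    The default seed of 999 is copied from the command in the log above.
    """
    proc = subprocess.run(
        ["swift-ring-builder", builder_file, "rebalance", seed],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    outcome = classify_rebalance(proc.returncode)
    if outcome == "failed":
        raise RuntimeError(proc.stdout)
    return outcome
```

Calling rebalance("/etc/swift/account.builder") on a controller would then return "skipped" instead of aborting when the ring was rebalanced too recently.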

Comment 2 Red Hat Bugzilla Rules Engine 2017-02-08 09:29:31 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 3 Christian Schwede (cschwede) 2017-04-10 13:30:39 UTC
Closing this bug; there is another one which is ON_QA and is basically the same. It turns out that warnings (but not failures) during the rebalance aborted the deployment.

*** This bug has been marked as a duplicate of bug 1437499 ***

