Bug 1310865
Summary: | Director might break Swift cluster when replacing / adding new nodes | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Christian Schwede (cschwede) <cschwede> | |
Component: | openstack-tripleo-heat-templates | Assignee: | Christian Schwede (cschwede) <cschwede> | |
Status: | CLOSED ERRATA | QA Contact: | Mike Abrams <mabrams> | |
Severity: | unspecified | Docs Contact: | ||
Priority: | unspecified | |||
Version: | unspecified | CC: | augol, cschwede, dbecker, ddomingo, dmaley, egafford, felipe.alfaro, gfidente, jcoufal, jliberma, jraju, jschluet, markmc, mburns, mcornea, morazi, pgrist, rhel-osp-director-maint, scohen, thiago, zaitcev | |
Target Milestone: | beta | Keywords: | Triaged, ZStream | |
Target Release: | 11.0 (Ocata) | |||
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | openstack-tripleo-heat-templates-6.0.0-0.20170218023452.edbaaa9.el7ost | Doc Type: | Bug Fix | |
Doc Text: |
Cause: Swift rings became inconsistent when new storage or controller nodes were added or existing ones replaced.
Consequence: Data became unavailable, and replication between storage nodes increased and never completed, leading to higher load and network traffic.
Fix: A new process stores a copy of the Swift rings on the undercloud after each deployment and retrieves them before any new deployment or update, ensuring the rings are consistent across all nodes. This removes the need to maintain the rings manually and copy them between nodes.
Result: Simplified deployment of new or replaced nodes using Swift storage.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1321088 | Environment: | ||
Last Closed: | 2017-05-17 19:27:09 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1300189, 1319901, 1321088, 1385483 |
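The Fix described in the Doc Text above (saving the Swift rings on the undercloud after each deployment and restoring them before the next deployment or update) can be illustrated with a minimal Python sketch. It assumes that the ring and .builder files live in a single directory (such as /etc/swift) and that the undercloud copy is a plain gzip-compressed tarball. The function names pack_rings and unpack_rings are hypothetical and not part of openstack-tripleo-heat-templates; the real fix may store and transfer the rings differently.

```python
from __future__ import annotations

import io
import tarfile
from pathlib import Path

# Assumed layout of a Swift ring set: .builder files keep the full ring
# history, .ring.gz files are what the Swift services load at runtime.
RING_PATTERNS = ("*.builder", "*.ring.gz")


def pack_rings(swift_dir: Path) -> bytes:
    """After a deployment: bundle all ring and builder files into a tar.gz."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for pattern in RING_PATTERNS:
            for path in sorted(swift_dir.glob(pattern)):
                tar.add(str(path), arcname=path.name)
    return buf.getvalue()


def unpack_rings(archive: bytes, swift_dir: Path) -> list[str]:
    """Before the next deployment or update: restore the saved files into
    swift_dir and return the sorted names of the restored files."""
    swift_dir.mkdir(parents=True, exist_ok=True)
    restored = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Accept only regular files with a bare name (no directories or
            # paths) so nothing is written outside swift_dir.
            if not member.isfile() or Path(member.name).name != member.name:
                continue
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            (swift_dir / member.name).write_bytes(extracted.read())
            restored.append(member.name)
    return sorted(restored)
```

For example, pack_rings(Path("/etc/swift")) would run on an existing node after a deployment, and unpack_rings(archive, Path("/etc/swift")) on every node before the next one. The .builder files are included because, as noted in the comments below, the full ring history matters: a node that builds its rings from scratch can end up with a different partition layout than the rest of the cluster.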
Description
Christian Schwede (cschwede)
2016-02-22 20:38:57 UTC
Possible workarounds for this:
1. Disable ring building on the nodes; please see the linked patch review.
2. Use a customized template and copy the .builder files from another node before Puppet runs.

@cschwede, do you know if this patch will be backported to OSP 7?

@Dan: No, I don't think this will be backported to OSP 7. However, it was backported upstream to Liberty (https://review.openstack.org/#/c/295426/) and is included in OSP 8 (I just checked the latest puddle; it's included in openstack-tripleo-heat-templates/0.8.14-1.el7ost).

Today Giulio and I discussed the next steps to improve Swift support in Director. The idea for solving the issue described in this BZ is to use the ringsync mechanism already provided by puppet-swift: the ring will be managed on one node, and the other nodes will fetch the .ring.gz files from that node.
https://github.com/openstack/puppet-swift/blob/master/manifests/ringsync.pp
https://github.com/openstack/puppet-swift/blob/master/manifests/ringserver.pp
It is important that this is done on the node with the oldest ring files (including the whole history), for example the first node that was deployed. The managing node will also need information about the IPs and devices on all nodes.

There are a few more RFEs that will be worked on in the future. These are (ordered by priority):
- https://bugzilla.redhat.com/show_bug.cgi?id=1276691 Multiple disks on Swift nodes. There is already a workaround: https://mojo.redhat.com/community/consulting-customer-training/services-innovation-and-incubation/technical-advanced-content/blog/2015/11/02/director-multiple-disks-for-swift-nodes
- https://bugzilla.redhat.com/show_bug.cgi?id=1303093 Add ability to disable Swift from overcloud deployment
- https://bugzilla.redhat.com/show_bug.cgi?id=1303093 Permit usage of unmanaged Swift clusters
The ideas in the last two RFEs were used for a customer recently.

https://bugzilla.redhat.com/show_bug.cgi?id=1320185 Allow for customization of the Swift nodes' disk topology
This would make it possible to deploy a cluster with a more customized setup without manually managing the Swift rings. For example:
- a different number of disks per node
- SSDs for accounts/containers
- different regions and zones based on the datacenter layout

There is a wrong BZ reference above (thanks, Thiago!). The correct one is:
https://bugzilla.redhat.com/show_bug.cgi?id=1320209 Permit usage of unmanaged Swift clusters

This should probably be fixed by the patch from comment #5 plus documentation. Future work should go into a separate bugzilla. Correct?

This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.

It seems that the workaround from upstream (disabling ring management) is included in openstack-tripleo-heat-templates-0.8.14-7.el7ost.noarch.rpm (from the GA release puddle)? This doesn't fix the bug itself, but at least there is a known workaround for it.

I see we have the OSP 7 fix for this ON_QA (bug 1321088); however, I don't see OSP 8 or OSP 9 clones. Are they needed?

Never mind, I see openstack-tripleo-heat-templates-0.8.14-9.el7ost is available in the channel, and according to comment 12 this would include the needed changes.

Added a link to an upstream patch that actually fixes this issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245
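The ringsync approach discussed in the comments (one node, ideally the first one deployed, manages the rings, and all other nodes fetch the .ring.gz files from it) can be sketched in Python as a consistency check that decides which nodes need fresh rings. This is not the puppet-swift ringsync.pp/ringserver.pp implementation: the Node type, the deployed_order field and all function names are hypothetical, and in-memory file contents stand in for the actual transfer between nodes.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of one overcloud node and the ring files it holds."""
    name: str
    deployed_order: int  # assumption: 0 = first node deployed
    rings: dict[str, bytes] = field(default_factory=dict)  # file name -> contents


def pick_ring_server(nodes: list[Node]) -> Node:
    """Choose the managing node: the first one deployed, assumed to hold
    the oldest ring files including the whole history."""
    return min(nodes, key=lambda n: n.deployed_order)


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_ring_sync(nodes: list[Node]) -> dict[str, list[str]]:
    """Return, per node, the .ring.gz files that are missing or differ
    from the managing node's copies and therefore must be fetched."""
    server = pick_ring_server(nodes)
    reference = {
        name: _digest(data)
        for name, data in server.rings.items()
        if name.endswith(".ring.gz")  # .builder files stay on the managing node
    }
    plan: dict[str, list[str]] = {}
    for node in nodes:
        if node is server:
            continue
        stale = sorted(
            name
            for name, ref in reference.items()
            if name not in node.rings or _digest(node.rings[name]) != ref
        )
        if stale:
            plan[node.name] = stale
    return plan


def apply_ring_sync(nodes: list[Node], plan: dict[str, list[str]]) -> None:
    """Copy the planned .ring.gz files from the managing node
    (a stand-in for fetching them over the network)."""
    server = pick_ring_server(nodes)
    by_name = {n.name: n for n in nodes}
    for node_name, files in plan.items():
        for name in files:
            by_name[node_name].rings[name] = server.rings[name]
```

Only the .ring.gz files are distributed; the .builder files stay on the managing node, which is why that node has to be the one with the complete ring history. After apply_ring_sync, calling plan_ring_sync again on the same nodes returns an empty plan.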