Bug 1300189 - [RFE] Replace OpenStack node deployed by Director - Documentation
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 7.0 (Kilo)
Hardware: Unspecified OS: Unspecified
Priority: high Severity: medium
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assigned To: Dan Macpherson
QA Contact: RHOS Documentation Team
Keywords: Documentation, FutureFeature
Depends On: 1310865 1321088
Reported: 2016-01-20 03:21 EST by Ondrej
Modified: 2016-09-21 11:19 EDT (History)
14 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-09-21 11:19:00 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Ondrej 2016-01-20 03:21:05 EST
Description of problem:

The Red Hat OpenStack Director documentation currently lacks instructions on how to replace the following node roles: Controller, Block Storage, Object Storage, and Ceph Storage. Currently, the only node role that is covered is Compute.

Manual steps for replacing a Controller node have been documented (BZ 1258068), but this is a workaround for the fact that the product is not able to redeploy nodes using just the TripleO suite (Heat + Puppet), as it should by design.

Version-Release number of selected component (if applicable):

How reproducible:
Not applicable; manual steps for replacing Controller nodes are available.

Steps to Reproduce:

Actual results:
OpenStack nodes cannot be replaced using the director and templates (only Compute nodes can).

Expected results:
OpenStack nodes can be replaced using the director and templates.

Additional info:
Comment 5 Jaromir Coufal 2016-02-29 10:37:42 EST
There should be documentation on how to replace an OpenStack node, at least manually. Moving this to documentation for OSP 8; we will look into support from the director in the future, tracked by a different BZ.
Comment 6 Jaromir Coufal 2016-02-29 10:39:23 EST
Derek, can you add this to the list of your team's backlog for OSP8 GA? Thanks, Jarda
Comment 7 Dan Macpherson 2016-03-03 16:17:01 EST
So the first step in this is restructuring the scaling section so that we can add additional replacement scenarios. You can see the restructure here:


The next step is to add the missing node replacement scenarios. So far, we've only got:

* Compute Nodes: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html#sect-Replacing_Compute_Nodes

* Controller Nodes: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html#Replacing_Controller_Nodes

So we'll need to include documentation on replacing the storage node types (Cinder, Swift, and Ceph).
Comment 8 Irina Petrova 2016-03-16 11:52:56 EDT
Hi guys,

We have a request to add documentation not only for *replacing* Ceph and Swift node types but also for *adding* Swift and Controller nodes.

In case any of this is not supported, it would be nice to at least have a note about it.

Do we have any updates?

Comment 9 Felipe Alfaro Solana 2016-03-18 08:27:18 EDT
I can't believe that an Enterprise product doesn't have a procedure (or documentation) on how to replace nodes. What shall we do when one of our Ceph nodes fails and needs to get replaced? You could engage with the InkTank developers as I'm sure they know this for sure. The same holds true for Swift.
Comment 10 Pablo Caruana 2016-03-18 13:43:58 EDT
(In reply to Felipe Alfaro Solana from comment #9)
> I can't believe that an Enterprise product doesn't have a procedure (or
> documentation) on how to replace nodes. What shall we do when one of our
> Ceph nodes fails and needs to get replaced? You could engage with the
> InkTank developers as I'm sure they know this for sure. The same holds true
> for Swift.

Felipe, as already commented on your case, maybe we are missing some detail, but https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Scaling_the_Overcloud is pretty clear about which scenarios are currently supported (others are on roadmaps under different RFEs).

Now, I want to be sure: do you have any documentation for doing this manually? If not, we can work on that in the meantime while the RFE for director integration is completed.

- Replacing a Ceph node

There is an RFE raised for safely scaling down OSD/Ceph nodes from the director.
For now, we need to perform the OSD removal task manually, taking data rebalancing into consideration.
Remove Ceph nodes:
i. Remove OSDs: perform all steps in 'CHAPTER 15. REMOVING OSDS (MANUAL)'.
ii. Remove the node from the director.
Add Ceph nodes:
i. Add the node to ironic.
ii. Run a deploy from the director after incrementing the Ceph scale by the number of nodes added that are planned for Ceph.
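As a rough sketch of the director-side half of the "Add Ceph nodes" steps above, the following dry-run helper only prints the commands so they can be reviewed before running against a real undercloud. The `instackenv.json` file name and the scale value are hypothetical examples, not values from this BZ; confirm the flags against your installed tripleoclient.

```shell
#!/bin/sh
# Dry-run sketch of the director-side "Add Ceph nodes" steps.
# It only prints the commands for review; nothing is executed against
# a cloud. instackenv.json and the scale value are hypothetical.
ceph_add_commands() {
    new_scale="$1"   # total Ceph node count after the addition
    # i. Register the replacement bare-metal node with ironic:
    echo "openstack baremetal import --json instackenv.json"
    echo "openstack baremetal configure boot"
    # ii. Re-run the deploy with the Ceph scale incremented:
    echo "openstack overcloud deploy --templates --ceph-storage-scale ${new_scale}"
}

ceph_add_commands 4
```

Run on the undercloud as the stack user, the printed commands would re-deploy the overcloud with one additional Ceph Storage node.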

Let me know if you have any specific questions, as I want to be sure about the missing elements here, especially as your comment is not exactly friendly toward the product/documentation team.

Comment 11 Dan Macpherson 2016-03-21 10:56:45 EDT
Have tested and verified Pablo's process in comment #10. Am currently writing documentation for it now.
Comment 12 Dan Macpherson 2016-03-21 22:12:32 EDT
So we've hit a blocker for this documentation. There's an issue with adding and replacing Swift nodes, specifically building a ring that's the same on all nodes. If you add a new Swift node to the cluster, it has no prior information about the current nodes in the ring, so the director creates new ring files for that node.

The workaround is to disable automatic ring building and build the ring after deploying the nodes. There's a patch for the Heat template collection in BZ#1310865 that allows you to disable ring building. The only problem is that this patch seems to be targeted at OSP 8, and I'm not clear on when it will make it to OSP 7.
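As a sketch of that workaround: the BZ#1310865 patch adds Heat parameters for disabling ring building, which would be set in an environment file passed to the deploy. The file name and the exact parameter names below are assumptions based on that patch and should be confirmed against the merged template version.

```yaml
# Hypothetical environment file (e.g. swift-ring.yaml); parameter names
# are assumed from the BZ#1310865 patch, not confirmed for OSP 7.
parameter_defaults:
  SwiftRingBuild: false   # skip ring building on Object Storage nodes
  RingBuild: false        # skip ring building on Controller nodes
```

You would then include the file with `-e` on the `openstack overcloud deploy` command and build the rings by hand after deployment.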

I'm going to hunt down an alternative method of replacing Swift nodes until the patch gets merged into OSP 7.
Comment 13 Vikhyat Umrao 2016-04-06 06:12:25 EDT
Hello Dan,

In this link, we need to add one more line:


For the OSD removal steps, "perform the same steps for all the OSDs on this node", as we are talking here about replacing a whole Ceph node, and the node could have multiple OSDs.
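A hedged sketch of that point: when draining a whole node, the manual OSD removal commands from the Ceph documentation have to be repeated for every OSD ID hosted on that node. The helper below only prints the per-OSD command sequence for review; the OSD IDs are hypothetical examples, and waiting for rebalancing is left as a manual step between commands.

```shell
#!/bin/sh
# Dry-run sketch: print the manual removal commands for every OSD on
# the node being replaced. The real OSD IDs would come from
# `ceph osd tree`; the IDs used below are hypothetical.
drain_osds() {
    for osd_id in "$@"; do
        echo "ceph osd out ${osd_id}"
        # (wait here for the cluster to finish rebalancing: ceph -w)
        echo "service ceph stop osd.${osd_id}"
        echo "ceph osd crush remove osd.${osd_id}"
        echo "ceph auth del osd.${osd_id}"
        echo "ceph osd rm ${osd_id}"
    done
}

drain_osds 12 13
```

Only after all OSDs on the node have been drained and removed would the node itself be removed from the director.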

Comment 14 Dan Macpherson 2016-04-10 22:06:05 EDT
Hi Vikhyat,

As far as I know, the nodes deployed with the director use only one OSD per node.

- Dan
Comment 15 Vikhyat Umrao 2016-04-11 01:05:19 EDT
Hello Dan,

No, we can have multiple OSDs per node.

-- vikhyat
Comment 16 Dan Macpherson 2016-04-11 06:29:04 EDT
Vikhyat, you're absolutely right. My apologies! I originally thought it was one OSD per node, but I realise now it's one OSD per disk mapping. Will make a modification to the docs.
Comment 17 Vikhyat Umrao 2016-04-11 09:03:51 EDT
NP Dan! thank you for working on this doc bz.
Comment 19 Dan Macpherson 2016-09-21 11:06:58 EDT
Hi Ondrej,

What did you want to do about this BZ? Apparently we were waiting for a patch backport for OSP 7, but this doesn't seem to have happened AFAIK.

The customer portal case is closed. Are there any further actions required for this BZ?

- Dan
Comment 22 Dan Macpherson 2016-09-21 11:19:00 EDT
Thanks, Eduardo. Closing.
