Red Hat Bugzilla – Bug 1300189
[RFE] Replace OpenStack node deployed by Director - Documentation
Last modified: 2016-09-21 11:19:00 EDT
Description of problem:
The Red Hat OpenStack Platform director documentation currently lacks instructions for replacing the following node roles: Controller, Block Storage, Object Storage, and Ceph Storage. Currently, the only node role covered is Compute.
Manual steps for replacing a controller node have been documented (BZ 1258068), but this is a workaround for the fact that the product cannot redeploy using just the TripleO suite (Heat + Puppet), as it should by design.
Version-Release number of selected component (if applicable):
Not applicable; manual steps for replacing controller nodes are available.
Steps to Reproduce:
Attempt to replace an overcloud node using the director and templates.

Actual results:
OpenStack nodes cannot be replaced using the director and templates (only Compute nodes can).

Expected results:
OpenStack nodes can be replaced using the director and templates.
There should be documentation on how to replace an OpenStack node, at least manually. Moving this to documentation for OSP 8; we will look into support from the director in the future, tracked by a different BZ.
Derek, can you add this to the list of your team's backlog for OSP8 GA? Thanks, Jarda
So the first step in this is restructuring the scaling section so that we can add additional replacement scenarios. You can see the restructure here:
The next step is to add the missing node replacement scenarios. So far, we've only got:
* Compute Nodes: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html#sect-Replacing_Compute_Nodes
* Controller Nodes: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html#Replacing_Controller_Nodes
So we'll need to include documentation on replacing the storage node types (Cinder, Swift, and Ceph).
We have a request to add documentation not only in terms of *replacing* Ceph and Swift node type scenarios but also for *adding* Swift and controller nodes.
In case any of this is not supported, it would be nice at least to have a note about it.
Do we have any updates?
I can't believe that an Enterprise product doesn't have a procedure (or documentation) on how to replace nodes. What shall we do when one of our Ceph nodes fails and needs to get replaced? You could engage with the InkTank developers as I'm sure they know this for sure. The same holds true for Swift.
(In reply to Felipe Alfaro Solana from comment #9)
> I can't believe that an Enterprise product doesn't have a procedure (or
> documentation) on how to replace nodes. What shall we do when one of our
> Ceph nodes fails and needs to get replaced? You could engage with the
> InkTank developers as I'm sure they know this for sure. The same holds true
> for Swift.
Felipe, I already commented on your case. Maybe we are missing some detail, but https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Scaling_the_Overcloud
is pretty clear about which scenarios are currently supported (the others are on roadmaps, tracked by different RFEs).
Now I want to be sure: do you have any documentation for doing this manually? If not, we can work on that in the meantime, while the RFE for director integration is completed.
- Replacing a Ceph node
There is an RFE raised for safely scaling down OSD/Ceph nodes from the director.
For now, we need to perform the OSD removal manually, taking data rebalancing into consideration.
Remove Ceph nodes:
i. Removing OSDs: perform all steps in 'CHAPTER 15. REMOVING OSDS (MANUAL)'.
ii. Remove the node from the director.
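For reference, the manual OSD removal in step i generally follows the standard Ceph procedure. A minimal sketch, assuming an OSD with the hypothetical ID osd.7 lives on the node being replaced (repeat for every OSD hosted on that node):

```shell
# Mark the OSD out so Ceph rebalances its data onto the remaining OSDs
ceph osd out osd.7

# Watch cluster status and wait until rebalancing finishes
# (all placement groups report active+clean)
ceph -w

# On the Ceph node itself: stop the OSD daemon
sudo service ceph stop osd.7

# Remove the OSD from the CRUSH map, delete its auth key, and
# remove it from the cluster
ceph osd crush remove osd.7
ceph auth del osd.7
ceph osd rm osd.7
```

Only after every OSD on the node has been drained and removed this way is it safe to proceed to step ii and delete the node from the director.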
Add Ceph nodes:
i. Add the node to Ironic.
ii. Run a deploy from the director after incrementing the Ceph scale by the number of added nodes that are planned for Ceph.
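The two steps above can be sketched with the director CLI as follows. This is illustrative only: the JSON file name, environment file path, and the scale value (here, growing from 3 to 4 Ceph nodes) are hypothetical:

```shell
# Step i: register the new bare metal node with Ironic via the director
openstack baremetal import --json newnode.json
openstack baremetal configure boot

# Step ii: re-run the deploy with the Ceph storage scale incremented
# by the number of new Ceph nodes (e.g. previously 3, now 4)
openstack overcloud deploy --templates \
  -e ~/templates/storage-environment.yaml \
  --ceph-storage-scale 4
```

Re-running the deploy against the existing overcloud stack triggers a stack update, so only the additional Ceph node is provisioned.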
Let me know if you have any specific questions, as I want to be sure about the missing elements here, especially since your comment is not exactly friendly to the product/documentation team.
I have tested and verified Pablo's process in comment #10, and am currently writing documentation for it.
So we've hit a blocker for this documentation. There's an issue with adding and replacing Swift nodes, specifically building a ring that's the same on all nodes. If you add a new Swift node to the cluster, it has no prior information about the current nodes in the ring, so the director creates new ring files for that node.
The workaround is to disable automatic ring building and build the ring after deploying the nodes. There's a patch for the Heat template collection in BZ#1310865 that allows you to disable ring building. The only problem is that this patch seems to be targeted for OSP 8, and I'm not clear on when it will make it to OSP 7.
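With automatic ring building disabled, the rings would be built once by hand and distributed to all nodes. A sketch using the standard swift-ring-builder tool; the partition power (10), replica count (3), min_part_hours (1), device names, weights, and IP addresses are all hypothetical values for illustration:

```shell
# Create the object ring builder file:
# 2^10 partitions, 3 replicas, 1 hour minimum between partition moves
swift-ring-builder object.builder create 10 3 1

# Add one device per disk on each Swift node
# (format: r<region>z<zone>-<ip>:<port>/<device> <weight>)
swift-ring-builder object.builder add r1z1-192.0.2.10:6000/d1 100
swift-ring-builder object.builder add r1z1-192.0.2.11:6000/d1 100

# Distribute partitions across the devices and write object.ring.gz
swift-ring-builder object.builder rebalance
```

The same steps would be repeated for container.builder and account.builder, and the resulting *.ring.gz files copied to every Swift node so that all nodes share identical rings.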
I'm going to hunt down an alternative method of replacing Swift nodes until the patch gets merged with OSP 7.
In this link, we need to add one more line:
For the OSD removal steps, "perform the same steps for all the OSDs in this node", since we are talking about replacing a whole Ceph node, and that node could have multiple OSDs.
As far as I know, the nodes deployed with the director use only one OSD per node.
No, we can have multiple OSDs per node.
Vikhyat, you're absolutely right. My apologies! I originally thought it was one OSD per node, but I realise now it's one OSD per disk mapping. Will make a modification to the docs.
NP Dan! thank you for working on this doc bz.
What did you want to do about this BZ? Apparently we were waiting for a patch backport for OSP 7, but this doesn't seem to have happened AFAIK.
The customer portal case is closed. Are there any further actions required for this BZ?
Thanks, Eduardo. Closing.