Bug 1300189

Summary: [RFE] Replace OpenStack node deployed by Director - Documentation
Product: Red Hat OpenStack Reporter: Ondrej <ochalups>
Component: documentation    Assignee: Dan Macpherson <dmacpher>
Status: CLOSED CURRENTRELEASE QA Contact: RHOS Documentation Team <rhos-docs>
Severity: medium Docs Contact:
Priority: high    
Version: 7.0 (Kilo)    CC: dcadzow, dmacpher, ealcaniz, felipe.alfaro, hbrock, ipetrova, jcoufal, mburns, mcornea, ochalups, pcaruana, rhel-osp-director-maint, srevivo, vumrao
Target Milestone: ga    Keywords: Documentation, FutureFeature
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-09-21 15:19:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1310865, 1321088    
Bug Blocks:    

Description Ondrej 2016-01-20 08:21:05 UTC
Description of problem:

The Red Hat OpenStack Platform director documentation currently lacks documentation on how to replace the following node roles: Controller, Block Storage, Object Storage, and Ceph Storage. Currently, the only node role that is covered is Compute.

Manual steps for replacing a Controller node have been documented (BZ 1258068), but this is a workaround for the fact that the product is not able to redeploy nodes using just the TripleO suite (Heat + Puppet), as it should by design.


Version-Release number of selected component (if applicable):


How reproducible:
Not applicable; manual steps for replacing Controller nodes are available
(https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/Replacing_Controller_Nodes.html)

Steps to Reproduce:
1.
2.
3.

Actual results:
OpenStack nodes can't be replaced using the director and templates (only Compute nodes can)

Expected results:
OpenStack nodes can be replaced using the director and templates

Additional info:

Comment 5 Jaromir Coufal 2016-02-29 15:37:42 UTC
There should be documentation on how to replace an OpenStack node, at least manually. Moving this to documentation for OSP 8; we will look into support from the director in the future, tracked by a different BZ.

Comment 6 Jaromir Coufal 2016-02-29 15:39:23 UTC
Derek, can you add this to the list of your team's backlog for OSP8 GA? Thanks, Jarda

Comment 7 Dan Macpherson 2016-03-03 21:17:01 UTC
So the first step in this is restructuring the scaling section so that we can add additional replacement scenarios. You can see the restructure here:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html

The next step is to add the missing node replacement scenarios. So far, we've only got:

* Compute Nodes: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html#sect-Replacing_Compute_Nodes

* Controller Nodes: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Scaling_the_Overcloud.html#Replacing_Controller_Nodes

So we'll need to include documentation on replacing the storage node types (Cinder, Swift, and Ceph).

Comment 8 Irina Petrova 2016-03-16 15:52:56 UTC
Hi guys,

We have a request to add documentation not only for *replacing* Ceph and Swift nodes but also for *adding* Swift and Controller nodes.

If any of this is not supported, it would be nice to at least have a note about it.

Do we have any updates?

Best,
Irina

Comment 9 Felipe Alfaro Solana 2016-03-18 12:27:18 UTC
I can't believe that an Enterprise product doesn't have a procedure (or documentation) on how to replace nodes. What shall we do when one of our Ceph nodes fails and needs to get replaced? You could engage with the InkTank developers as I'm sure they know this for sure. The same holds true for Swift.

Comment 10 Pablo Caruana 2016-03-18 17:43:58 UTC
(In reply to Felipe Alfaro Solana from comment #9)
> I can't believe that an Enterprise product doesn't have a procedure (or
> documentation) on how to replace nodes. What shall we do when one of our
> Ceph nodes fails and needs to get replaced? You could engage with the
> InkTank developers as I'm sure they know this for sure. The same holds true
> for Swift.

Felipe, I already commented on your case. Maybe we are missing some detail, but https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Scaling_the_Overcloud
is pretty clear about which scenarios are currently supported (the others are on the roadmap, tracked by different RFEs).



Now I want to be sure: do you have any documentation for doing this manually? If not, we can work on this while the RFE for the director integration is completed.

- Replacing a Ceph node

There is an RFE raised for safely scaling down OSD/Ceph nodes from the director.
For now, we need to perform the OSD removal task manually, taking data rebalancing into consideration.
Remove Ceph nodes:
i. Removing OSDs: perform all steps in 'CHAPTER 15. REMOVING OSDS (MANUAL)':
https://access.redhat.com/documentation/en/red-hat-ceph-storage/version-1.3/red-hat-ceph-storage-13-red-hat-ceph-administration-guide/
ii. Remove the node from the director.
Add Ceph nodes:
i. Add the node to ironic:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#sect-Scaling_the_Overcloud
ii. Run the deploy from the director after incrementing the Ceph scale by the number of added nodes planned to be used for Ceph (see the sketch after these steps).
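
As a rough illustration only, here is a minimal shell sketch of the sequence above, assuming a single OSD with ID 0 on the node being replaced; file names and the scale value are placeholders, and the authoritative steps are in the guides linked above:

# Sketch of the manual removal/re-addition, assuming one OSD (id 0) on the
# node being replaced; repeat the removal for every OSD the node hosts.

# 1. Take the OSD out of the cluster and let Ceph rebalance the data.
ceph osd out 0
ceph -w                        # wait until the cluster reports HEALTH_OK again

# 2. Stop the OSD daemon on the Ceph Storage node, then remove it from the
#    CRUSH map, delete its authentication key, and remove the OSD entry.
sudo service ceph stop osd.0   # run on the Ceph Storage node itself
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0

# 3. After removing the bare metal node from the director, register the
#    replacement in ironic and redeploy with the Ceph scale incremented
#    (file names and the scale value below are placeholders).
openstack baremetal import --json ~/newnodes.json
openstack baremetal configure boot
openstack overcloud deploy --templates \
    -e ~/templates/storage-environment.yaml \
    --ceph-storage-scale 4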

Let me know if you have any specific questions, as I want to be sure about the missing elements here, especially since your comment is not exactly friendly to the product/documentation team.

Regards,
Pablo.

Comment 11 Dan Macpherson 2016-03-21 14:56:45 UTC
I have tested and verified Pablo's process in comment #10 and am currently writing the documentation for it.

Comment 12 Dan Macpherson 2016-03-22 02:12:32 UTC
So we've hit a blocker for this documentation. There's an issue with adding and replacing Swift nodes, specifically building a ring that's the same on all nodes. If you add a new Swift node to the cluster, it has no previous information about the current nodes in the ring, so the director creates new ring files for that node.

The workaround is to disable automatic ring building and build the ring after deploying the nodes. There's a patch for the Heat template collection in BZ#1310865 that allows you to disable ring building. The only problem is that this patch seems to be targeted for OSP 8, and I'm not clear on when it will make it to OSP 7.
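
For reference, a hedged sketch of what that workaround might look like once the patch lands. The parameter names in the environment file are my assumption based on the BZ#1310865 patch and are not confirmed for OSP 7, and the swift-ring-builder values (partition power, replicas, min_part_hours, device, IP) are placeholders:

# Assumed environment file to disable automatic ring building; the parameter
# names come from the BZ#1310865 patch and are an assumption, not confirmed here.
cat > ~/templates/swift-ring.yaml <<'EOF'
parameter_defaults:
  SwiftRingBuild: false
  RingBuild: false
EOF

# After deploying with "-e ~/templates/swift-ring.yaml", build the rings once
# and copy the same .ring.gz files to every Swift node (the account and
# container rings need the same treatment as the object ring shown here).
swift-ring-builder object.builder create 10 3 1
swift-ring-builder object.builder add r1z1-172.16.0.10:6000/d1 100
swift-ring-builder object.builder rebalance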

I'm going to hunt down an alternative method of replacing Swift nodes until the patch gets merged into OSP 7.

Comment 13 Vikhyat Umrao 2016-04-06 10:12:25 UTC
Hello Dan,

In this link, we need to add one more line:

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html-single/Director_Installation_and_Usage/index.html#Replacing_Ceph_Storage_Nodes

For the OSD removal steps, add "perform the same steps for all the OSDs on this node", since we are talking here about replacing a whole Ceph node, and this node could have multiple OSDs.
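
For illustration, a minimal loop sketch of repeating the removal for every OSD the node hosts; the OSD IDs are placeholders, and the authoritative steps remain the Ceph administration guide chapter referenced earlier:

# Placeholder IDs for the OSDs hosted on the Ceph node being replaced;
# waiting for rebalancing between OSDs is omitted for brevity.
for osd_id in 3 7 11; do
    ceph osd out "${osd_id}"
    sudo service ceph stop "osd.${osd_id}"   # run on the Ceph node itself
    ceph osd crush remove "osd.${osd_id}"
    ceph auth del "osd.${osd_id}"
    ceph osd rm "${osd_id}"
done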

Regards,
Vikhyat

Comment 14 Dan Macpherson 2016-04-11 02:06:05 UTC
Hi Vikhyat,

As far as I know, the nodes deployed with the director use only one OSD per node.

- Dan

Comment 15 Vikhyat Umrao 2016-04-11 05:05:19 UTC
Hello Dan,

No, we can have multiple OSDs per node.

-- vikhyat

Comment 16 Dan Macpherson 2016-04-11 10:29:04 UTC
Vikhyat, you're absolutely right. My apologies! I originally thought it was one OSD per node, but I realise now it's one OSD per disk mapping. I'll make a modification to the docs.
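
To make the distinction concrete, here is a hedged example of what a disk-to-OSD mapping looks like in the director's Heat templates as I understand it; the ceph::profile::params::osds hiera key is the mechanism used by the Ceph storage environment files, and the device paths are placeholders. Each mapped device becomes one OSD, so a node mapping three disks runs three OSDs:

# Assumed example environment file: each device listed under
# ceph::profile::params::osds becomes one OSD on the node.
cat > ~/templates/ceph-osd-mapping.yaml <<'EOF'
parameter_defaults:
  ExtraConfig:
    ceph::profile::params::osds:
      '/dev/sdb': {}
      '/dev/sdc': {}
      '/dev/sdd': {}
EOF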

Comment 17 Vikhyat Umrao 2016-04-11 13:03:51 UTC
NP, Dan! Thank you for working on this doc BZ.

Comment 19 Dan Macpherson 2016-09-21 15:06:58 UTC
Hi Ondrej,

What did you want to do about this BZ? Apparently we were waiting for a patch backport for OSP 7, but this doesn't seem to have happened AFAIK.

The customer portal case is closed. Are there any further actions required for this BZ?

- Dan

Comment 22 Dan Macpherson 2016-09-21 15:19:00 UTC
Thanks, Eduardo. Closing.