Bug 1314429

Summary: New cases of not ignoring updates to OS::Nova::Server causing redeployment
Product: Red Hat OpenStack Reporter: Jiri Stransky <jstransk>
Component: openstack-tripleo-common Assignee: Steve Baker <sbaker>
Status: CLOSED CURRENTRELEASE QA Contact: Alexander Chuzhoy <sasha>
Severity: unspecified Docs Contact:
Priority: high    
Version: 8.0 (Liberty) CC: dmacpher, dmatthew, jcoufal, jschluet, kbasil, mandreou, mburns, rhel-osp-director-maint, sasha, sbaker, shardy, slinaber, srevivo, zbitter
Target Milestone: ga Keywords: TestOnly
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-common-0.3.0-3.el7ost Doc Type: Bug Fix
Doc Text:
The OS::Nova::Server resource replaces servers whenever certain documented properties change during a stack update. This could cause unintentional replacement of Overcloud nodes if those properties change during a major upgrade. With this fix, the Undercloud's Heat service never replaces server resources when their properties change.
Story Points: ---
Clone Of: 1303094 Environment:
Last Closed: 2016-04-18 16:37:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1303094    
Bug Blocks:    

Description Jiri Stransky 2016-03-03 15:06:24 UTC
+++ This bug was initially created as a clone of Bug #1303094 +++

(cut out the long clone text, please look at the original bug if needed)

There are new cases of property changes causing unwanted node replacement when upgrading to Liberty.

https://review.openstack.org/#/c/286739/ -- this patch could cause node replacement, but probably doesn't unless the HostnameMap parameter is used.

https://review.openstack.org/#/c/266930/ -- this patch is probably the cause for node replacement happening.


Solution suggestion:

There is already a fix for bug #1303094 to prevent node replacement caused by changes to user_data. Expanding that fix to prevent node replacement when any OS::Nova::Server property changes would probably be desirable for TripleO. Node replacement caused by property changes could result in data loss, so it would be good to prevent it completely by default, and only enable it later if we discover that some new feature needs it and we're ready to handle node replacement correctly.
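
The difference between fixing one property at a time and blocking replacement outright can be sketched in plain Python. This is a toy model, not Heat's implementation: the function names are hypothetical, the allowed-property set is an illustrative assumption (only `name` being update-allowed is stated in this bug), and the "narrow fix" merely mimics the effect described for bug #1303094, not its actual code.

```python
from typing import Any, Iterable, Mapping, Set

# Illustrative assumption only, not Heat's real schema: properties that can
# change without replacing the server.
BASE_UPDATE_ALLOWED: Set[str] = {"name", "metadata"}


def changed_properties(before: Mapping[str, Any], after: Mapping[str, Any]) -> Set[str]:
    """Names of properties whose values differ between old and new definitions."""
    return {key for key in set(before) | set(after) if before.get(key) != after.get(key)}


def needs_replacement(before: Mapping[str, Any], after: Mapping[str, Any],
                      update_allowed: Iterable[str]) -> bool:
    """Modelled rule: replace if any changed property is not update-allowed."""
    return bool(changed_properties(before, after) - set(update_allowed))


# Narrow fix: allow one more property (user_data), leaving the rest unchanged.
NARROW_FIX: Set[str] = BASE_UPDATE_ALLOWED | {"user_data"}


def needs_replacement_blanket(before: Mapping[str, Any], after: Mapping[str, Any]) -> bool:
    """Blanket fix: treat every property as update-allowed, so nothing forces replacement."""
    return needs_replacement(before, after, set(before) | set(after))


# Hypothetical old and new property values for one compute node.
before = {"name": "overcloud-novacompute-0", "user_data": "v1", "scheduler_hints": {}}
after = {"name": "overcloud-novacompute-0", "user_data": "v2",
         "scheduler_hints": {"some": "json"}}

print(needs_replacement(before, after, NARROW_FIX))  # True: scheduler_hints still forces it
print(needs_replacement_blanket(before, after))      # False
```

In this model every newly discovered property needs another entry in the allowed set, while the blanket variant never returns True, at the cost of silently ignoring changes that might have been meant to take effect.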

Comment 3 Steve Baker 2016-03-03 20:19:53 UTC
(In reply to Jiri Stransky from comment #0)
> +++ This bug was initially created as a clone of Bug #1303094 +++
> 
> (cut out the long clone text, please look at the original bug if needed)
> 
> There are new cases of property changes causing unwanted node replacement on
> upgrade Liberty.
> 
> https://review.openstack.org/#/c/286739/ -- this patch could cause node
> replacement but probably doesn't unless the HostnameMap parameter is
> utilized.

Changing the name property does not cause replacement [1], but Heat does make a Nova call to perform the rename.

> https://review.openstack.org/#/c/266930/ -- this patch is probably the cause
> for node replacement happening.

I would argue that upstream heat should ignore changes to scheduler_hints - replacing a server for changing a boot time hint seems extreme.

> Solution suggestion:
> 
> There is already a fix for bug #1303094 to prevent node replacement because
> of user_data. Expanding that fix to prevent node replacement when changing
> any OS::Nova::Server properties would probably be desirable for TripleO.
> Node replacement because of properties changing could potentially result in
> data loss, so it would be good to have it completely prevented by default,
> and only enable it subsequently if we discover that we need it for some new
> feature, and when we're ready to deal with node replacement correctly.

Yes, we could continue to play whack-a-mole each time upgrade testing shows another property causing server replacement, but just preventing replacements altogether in our custom server resource seems like a good approach - and this would be a temporary measure until Mitaka when heat update restrictions can be used.

[1] http://docs.openstack.org/developer/heat/template_guide/openstack.html#OS::Nova::Server-prop-name

Comment 4 Zane Bitter 2016-03-03 22:40:15 UTC
(In reply to Steve Baker from comment #3)
> I would argue that upstream heat should ignore changes to scheduler_hints -
> replacing a server for changing a boot time hint seems extreme.

That thought occurred to me too, but I'm not sure I agree. Say you have a bunch of servers deployed in random places and you want to start ensuring that e.g. some of them are co-located, how would you force that to happen? (Possible answer: rebuild rather than replace the servers if possible.) And if you didn't want that to happen, why would you have changed the scheduler hint?

Comment 5 Marios Andreou 2016-03-04 07:27:16 UTC
sbaker fixup @ https://review.openstack.org/#/c/288273/ "Prevent any property change from replacing OS::Nova::Server"

Comment 6 Steve Baker 2016-03-14 20:46:01 UTC
The upstream stable/liberty change has landed, so this will make it into 8.0 with the next tripleo-common rebase (or backport; I don't know which process is being followed here).

https://review.openstack.org/#/c/291771/

Comment 13 Alexander Chuzhoy 2016-04-14 15:55:38 UTC
Verified:

Environment:
openstack-tripleo-common-0.3.1-1.el7ost.noarch


Re-ran the deployment command with an additional environment YAML file that sets NovaComputeSchedulerHints: {"some": "json"}.
Got:
Stack overcloud UPDATE_COMPLETE

The Nova server UUIDs are the same as before.
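
The final check (same Nova server UUIDs before and after the update) can be scripted. A minimal sketch, assuming you have already recorded a server-name-to-ID mapping before and after the stack update, for example by copying them from `nova list`; the helper name, server names and IDs below are hypothetical placeholders.

```python
from typing import Dict, List


def replaced_servers(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return, sorted, the names of servers whose ID changed or which disappeared."""
    return sorted(name for name, server_id in before.items()
                  if after.get(name) != server_id)


# Hypothetical placeholder data; substitute the IDs you recorded.
before = {"overcloud-controller-0": "id-1", "overcloud-novacompute-0": "id-2"}
after_ok = dict(before)
after_bad = {"overcloud-controller-0": "id-1", "overcloud-novacompute-0": "id-3"}

print(replaced_servers(before, after_ok))   # []
print(replaced_servers(before, after_bad))  # ['overcloud-novacompute-0']
```

An empty result corresponds to the verification above: no server was replaced by the update.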