Bug 1379007

Summary: OSP 8 to 9 upgrade with Director - failures with step 3.4.2. Installing Aodh
Product: Red Hat OpenStack
Reporter: Matt Flusche <mflusche>
Component: rhosp-director
Assignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED NOTABUG
QA Contact: Omri Hochman <ohochman>
Severity: urgent
Priority: urgent
Version: 9.0 (Mitaka)
CC: augol, dbecker, emacchi, mburns, mflusche, morazi, rhel-osp-director-maint, tvignaud
Keywords: Triaged
Target Release: 9.0 (Mitaka)
Hardware: x86_64
OS: Linux
Last Closed: 2016-12-13 20:40:53 UTC
Type: Bug

Description Matt Flusche 2016-09-23 22:14:42 UTC
Description of problem:

During an OSP 8 to OSP 9 upgrade with Director, if the Heat stack update in step "3.4.2. Installing Aodh" (document: Upgrading Red Hat OpenStack Platform) fails or hangs, several issues can occur.

- Nested stacks cannot be listed from Director:

$ source ~stack/stackrc
$ heat stack-list -n
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
ERROR: could not convert string to float: 

This issue is caused by a mismatch between the OSP 8 and OSP 9 templates, specifically in the NeutronTenantMtu Heat parameter.

OSP 8 Heat template parameter:

  NeutronTenantMtu:
    type: number
    default: 1400

OSP 9 Heat template parameter:

  NeutronTenantMtu:
    type: string
    default: ''
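The failure mode can be stated as a simple check: a parameter whose OSP 8 definition is type: number cannot be read if the value stored with it is the OSP 9 default, which is not numeric. The Python sketch below encodes the two definitions quoted above as dictionaries and flags such parameters. The helper names are hypothetical, and the sketch does not read Heat's database.

```python
# Hypothetical check: the dicts mirror the OSP 8 and OSP 9 definitions
# quoted in this report; nothing here reads the Heat database.
OSP8_PARAMS = {"NeutronTenantMtu": {"type": "number", "default": 1400}}
OSP9_PARAMS = {"NeutronTenantMtu": {"type": "string", "default": ""}}


def is_number(value):
    """Return True if float() can parse the value."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def stale_number_conflicts(old_params, new_params):
    """Names typed 'number' in the old template whose new default is not numeric.

    This models the raw_template mismatch: OSP 8 schema paired with an OSP 9 value.
    """
    return sorted(
        name
        for name in old_params.keys() & new_params.keys()
        if old_params[name].get("type") == "number"
        and not is_number(new_params[name].get("default"))
    )


print(stale_number_conflicts(OSP8_PARAMS, OSP9_PARAMS))  # prints ['NeutronTenantMtu']
```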

Tracing this issue down in the Heat database, I see a mismatch in the raw_template table for the nested stack associated with the OS::TripleO::Controller (or Compute) resources. The template data contains the OSP 8 definition (type: number), while the environment data contains the OSP 9 default, an empty string (''). This causes the following exception in heat-engine.log:

2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher [req-64a7ef3c-16bc-4ca5-bee1-87b4abb8d678 - admin - default default] Exception during message handling: could not convert string to float:
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     incoming.message))
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _dispatch
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 117, in wrapper
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     return f(*args, **kwargs)
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 329, in wrapped
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     return func(self, ctx, *args, **kwargs)
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 518, in show_stack
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     stack, resolve_outputs=resolve_outputs) for stack in stacks]
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/heat/engine/api.py", line 221, in format_stack
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     rpc_api.STACK_PARAMETERS: stack.parameters.map(six.text_type),
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/heat/engine/parameters.py", line 540, in map
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     for n, p in six.iteritems(self.params) if filter_func(p))
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/heat/engine/parameters.py", line 540, in <genexpr>
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     for n, p in six.iteritems(self.params) if filter_func(p))
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/heat/engine/parameters.py", line 290, in __str__
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     value = self.value()
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/heat/engine/parameters.py", line 316, in value
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     return Schema.str_to_num(super(NumberParam, self).value())
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/heat/engine/constraints.py", line 181, in str_to_num
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher     return float(value)
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher ValueError: could not convert string to float:
2016-09-23 17:26:14.216 12320 ERROR oslo_messaging.rpc.dispatcher
2016-09-23 17:26:14.218 12320 ERROR oslo_messaging._drivers.common [req-64a7ef3c-16bc-4ca5-bee1-87b4abb8d678 - admin - default default] Returning exception could not convert string to float:  to caller
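The last frames show the conversion that fails: NumberParam.value() passes the stored value to Schema.str_to_num(), which ends in float(value). The sketch below is modeled on that frame; trying int() first and falling back to float() is an assumption inferred from the traceback, not a copy of Heat's source. It accepts the OSP 8 default and reproduces the same ValueError for the empty OSP 9 default.

```python
def str_to_num(value):
    """Sketch modeled on Schema.str_to_num from the traceback.

    Assumption: int() is tried first and float() is the fallback, matching
    the 'return float(value)' frame; this is not Heat's actual source.
    """
    if isinstance(value, (int, float)):
        return value
    try:
        return int(value)
    except ValueError:
        return float(value)  # raises ValueError for '' as heat-engine did


print(str_to_num("1400"))  # OSP 8 default value parses cleanly

try:
    str_to_num("")  # OSP 9 default read through an OSP 8 'number' schema
except ValueError as exc:
    print("ValueError:", exc)
```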

The next issues were observed when attempting another deployment after the initial failure:

- Metadata appears to be broken on all overcloud nodes; any software deployment hangs indefinitely.
- I noticed that the os-collect-config configuration changes during this initial upgrade step.

In OSP 8, os-collect-config polls a heat-cfn endpoint; in OSP 9, it pulls Heat software deployment data from a Swift container. Something in this conversion seems to be causing the problem.
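To confirm which transport a node is using, its os-collect-config configuration can be inspected. The sketch below assumes the file (typically /etc/os-collect-config.conf) is INI-style, that the heat-cfn setup has a [cfn] section, and that the Swift setup uses a [request] section with a metadata_url option. These section names and the sample contents are assumptions for illustration, with placeholder addresses.

```python
import configparser

# Placeholder samples; real files will differ (addresses are from 192.0.2.0/24).
CFN_SAMPLE = """
[DEFAULT]
command = os-refresh-config
collectors = cfn
collectors = local

[cfn]
metadata_url = http://192.0.2.1:8000/v1/
"""

SWIFT_SAMPLE = """
[DEFAULT]
command = os-refresh-config
collectors = request
collectors = local

[request]
metadata_url = http://192.0.2.1:8080/v1/AUTH_example/container/object
"""


def metadata_transport(conf_text):
    """Guess the metadata transport from os-collect-config.conf contents.

    Assumption: a [request] section with metadata_url means Swift polling,
    and a [cfn] section means the heat-cfn endpoint.
    """
    # strict=False tolerates repeated keys such as multiple 'collectors' lines.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if parser.has_option("request", "metadata_url"):
        return "swift"
    if parser.has_section("cfn"):
        return "heat-cfn"
    return "unknown"


print(metadata_transport(CFN_SAMPLE), metadata_transport(SWIFT_SAMPLE))
```

On a node, passing the contents of the real configuration file to metadata_transport() would show which of the two setups is active before and after the failed step.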

In this broken state, os-collect-config loops on the software deployment data and re-applies the same configuration every minute, even when no Heat stack update is running.
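A change-detecting poller should apply unchanged data only once. The toy model below (all names hypothetical; it is not os-collect-config's code) shows two ways a re-apply loop like this can arise: either the fetched data keeps looking new, or the "last applied" marker is never recorded because the apply step never reports success. This report does not establish which, if either, is the cause here.

```python
import hashlib
import json


class PollingAgent:
    """Toy model of a change-detecting config agent (hypothetical)."""

    def __init__(self, apply):
        self.apply = apply          # callable(metadata) -> bool, True on success
        self.last_applied = None    # digest of the last successfully applied data
        self.runs = 0               # how many times apply() was invoked

    def poll(self, metadata):
        digest = hashlib.sha256(
            json.dumps(metadata, sort_keys=True).encode()
        ).hexdigest()
        if digest == self.last_applied:
            return False            # unchanged since last success: skip
        self.runs += 1
        if self.apply(metadata):
            self.last_applied = digest  # commit only after a successful apply
        return True


healthy = PollingAgent(apply=lambda metadata: True)
failing = PollingAgent(apply=lambda metadata: False)
for _ in range(3):  # three polls of identical data
    healthy.poll({"deployments": ["config-1"]})
    failing.poll({"deployments": ["config-1"]})
print(healthy.runs, failing.runs)  # prints 1 3
```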

Version-Release number of selected component (if applicable):
Current OSP 9 bits


How reproducible:
100%

Steps to Reproduce:
1. Create an OSP 8 deployment.
2. Upgrade Director to OSP 9 (or use an OSP 9 Director to deploy OSP 8 and skip this step).
3. Shut down os-collect-config on an overcloud node to force a deployment failure (timeout).
4. Run step 3.4.2. Installing Aodh from the upgrade guide.
5. Let the deployment fail.
6. Observe the issue with: heat stack-list -n
7. Restart os-collect-config on the overcloud node(s).
8. Restart the deployment.
9. Observe that Heat software deployments hang and never complete.
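Step 6 can be scripted. The sketch below runs the same heat stack-list -n command shown earlier and searches its output for the error string. It assumes stackrc has already been sourced in the shell that launches Python, so the credentials are in the environment; the helper names are hypothetical.

```python
import subprocess

FLOAT_ERROR = "could not convert string to float"


def has_float_conversion_error(output):
    """Return True if CLI output contains the error shown in this report."""
    return FLOAT_ERROR in output


def nested_stack_list_is_broken():
    """Run 'heat stack-list -n' and check its combined stdout and stderr.

    Assumes stackrc was sourced in the shell that started Python.
    """
    result = subprocess.run(
        ["heat", "stack-list", "-n"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return has_float_conversion_error(result.stdout + result.stderr)
```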

Actual results:
The deployment breaks and is not recoverable by re-applying the stack update after resolving the failure cause.


Expected results:
The Heat stack update should recover and complete the upgrade step.


Additional info:
I believe setting the NeutronTenantMtu parameter will resolve the "could not convert string to float" issue, but os-collect-config remains broken.
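A minimal sketch of that workaround, assuming the parameter is pinned through parameter_defaults in an extra environment file. The value 1400 (the OSP 8 default) is an assumption. It is passed as a string because Heat appears to parse number parameters from strings, so "1400" should satisfy both the OSP 8 number type and the OSP 9 string type; this has not been verified against this deployment, and it does not address the os-collect-config problem.

```python
import json


def mtu_environment(mtu="1400"):
    """Heat environment pinning NeutronTenantMtu (hypothetical workaround).

    "1400" is the OSP 8 default; using a string is an assumption meant to
    satisfy both the OSP 8 'number' and the OSP 9 'string' parameter types.
    """
    return {"parameter_defaults": {"NeutronTenantMtu": mtu}}


if __name__ == "__main__":
    # JSON output is also valid YAML for a simple document like this one.
    print(json.dumps(mtu_environment(), indent=2))
```

The output can be redirected to a file and passed with -e to the deploy command used in step 3.4.2.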