Bug 1496782

Summary: [mix-version] fail to deploy osp 11 overcloud from osp 12 undercloud using compat tht
Product: Red Hat OpenStack Reporter: Raviv Bar-Tal <rbartal>
Component: openstack-tripleo-common Assignee: Dan Prince <dprince>
Status: CLOSED ERRATA QA Contact: Raviv Bar-Tal <rbartal>
Severity: high Docs Contact:
Priority: high    
Version: 12.0 (Pike) CC: aschultz, brad, ccamacho, dbecker, dprince, jcoufal, jschluet, mandreou, m.andre, mbracho, mburns, morazi, ohochman, rhel-osp-director-maint, slinaber, tdunnon, tvignaud
Target Milestone: beta Keywords: AutomationBlocker, TestBlocker, Triaged
Target Release: 12.0 (Pike) Flags: rbartal: needinfo+
rbartal: needinfo+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-common-7.6.3-0.20171010234828.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-13 22:11:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
deploy comman, files and logs
none
sosreport
none
new deploy
none
new sos report
none
Patch to modify tripleo-common to resolve the issue
none
sosreport
none

Description Raviv Bar-Tal 2017-09-28 12:08:45 UTC
Description of problem:
Deploying an osp11 overcloud from an osp12 undercloud using the compat tht rpm (openstack-tripleo-heat-templates-compat-6.2.0-4) fails with the error:
"The environment is not a valid YAML mapping data type."


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install an osp12 undercloud plus the openstack-tripleo-heat-templates-compat rpm.
2. Point --templates to /usr/share/openstack-tripleo-heat-templates/ocata (and update any other reference to the tht dir)
3. Deploy overcloud
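
For illustration only, steps 2 and 3 roughly correspond to a deploy command like the one assembled below. This is a sketch under assumptions: the actual command is in the attached files, `build_deploy_command` is a hypothetical helper, and the extra environment file is just an example of "any other reference to tht dir".

```python
import shlex

# Ocata (osp11) templates directory named in step 2.
THT_DIR = "/usr/share/openstack-tripleo-heat-templates/ocata"


def build_deploy_command(tht_dir, environment_files=()):
    """Assemble an 'openstack overcloud deploy' argument list (hypothetical helper)."""
    cmd = ["openstack", "overcloud", "deploy", "--templates", tht_dir]
    for env_file in environment_files:
        # Every -e path must also point into the ocata tree, not the default one.
        cmd += ["-e", env_file]
    return cmd


# Assumed example environment file; the real list is in the attachment.
command = build_deploy_command(
    THT_DIR, [THT_DIR + "/environments/network-isolation.yaml"]
)
print(" ".join(shlex.quote(part) for part in command))
```

The command is only printed here, not executed, so it can be compared against the one in the attachment.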

Actual results:
Deployment fails.

Expected results:
Deployment succeeds.

Additional info:
Attached are sosreport and command related files.

Comment 1 Raviv Bar-Tal 2017-09-28 12:15:15 UTC
Created attachment 1331943 [details]
deploy comman, files and logs

Comment 2 Raviv Bar-Tal 2017-09-28 12:15:57 UTC
Created attachment 1331944 [details]
sosreport

Comment 3 Alex Schultz 2017-09-28 14:26:49 UTC
According to the screen log in the tarball, the deployment seems to fail because MySQL goes away during the deployment.

 Stack overcloud CREATE_FAILED 

overcloud.AllNodesDeploySteps.ControllerDeployment_Step3.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 6c3b8b30-a414-428d-85cd-d78b4454677f
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
  deploy_stdout: |
    ...
    Debug: Prefetching nova_manage resources for nova_cell_v2
    Debug: Executing: '/usr/bin/nova-manage cell_v2 list_cells --verbose'
    Debug: Storing state
    Debug: Stored state in 0.16 seconds
    Debug: Applying settings catalog for sections reporting, metrics
    Debug: Finishing transaction 120889140
    Debug: Received report to process from controller-0
    Debug: Evicting cache entry for environment 'production'
    Debug: Caching environment 'production' (ttl = 0 sec)
    Debug: Processing report from controller-0 with processor Puppet::Reports::Store
    (truncated, view all with --long)
  deploy_stderr: |
    ...
      File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 484, in _start
        engine_args, maker_args)
      File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 508, in _setup_for_connection
        sql_connection=sql_connection, **engine_kwargs)
      File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 179, in create_engine
        test_conn = _test_connection(engine, max_retries, retry_interval)
      File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py", line 365, in _test_connection
        six.reraise(type(de_ref), de_ref)
      File "<string>", line 2, in reraise
    DBConnectionError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
    (truncated, view all with --long)
Heat Stack create failed.
Heat Stack create failed.

Comment 4 Raviv Bar-Tal 2017-10-03 14:24:20 UTC
This BZ should be a blocker: the deployment fails and we don't have a workaround.

From the customer's point of view, they cannot deploy an older-version overcloud from the current undercloud.

Comment 5 Raviv Bar-Tal 2017-10-03 14:35:07 UTC
Created attachment 1333761 [details]
new deploy

Comment 6 Raviv Bar-Tal 2017-10-03 14:36:12 UTC
Created attachment 1333762 [details]
new sos report

Comment 7 Raviv Bar-Tal 2017-10-03 14:41:45 UTC
Hi Alex.
I saw your comment and realized that I had attached the wrong files to the BZ.

I have added the correct files with a 'new' prefix.

Please review them and reassign the BZ as necessary.

Thanks.
Raviv

Comment 8 Alex Schultz 2017-10-03 15:01:01 UTC
2017-09-28 07:10:40.814 27653 DEBUG tripleo_common.actions.templates [req-9242beca-9a8a-4f01-84bd-12f69f46cd09 e0d78fa7a4e440b39b433afb033ffea6 309c65c7ac4f4dfc81e919b22f2a6677 - default default] Environments: [{'path': 'overcloud-resource-registry-puppet.yaml'}, {'path': 'environments/docker.yaml'}, {'path': 'environments/docker-ha.yaml'}] run /usr/lib/python2.7/site-packages/tripleo_common/actions/templates.py:323
2017-09-28 07:10:40.967 27653 DEBUG tripleo_common.actions.templates [req-9242beca-9a8a-4f01-84bd-12f69f46cd09 e0d78fa7a4e440b39b433afb033ffea6 309c65c7ac4f4dfc81e919b22f2a6677 - default default] _env_path_is_object https://192.168.24.2:13808/v1/AUTH_309c65c7ac4f4dfc81e919b22f2a6677/overcloud/overcloud-resource-registry-puppet.yaml: True _env_path_is_object /usr/lib/python2.7/site-packages/tripleo_common/actions/templates.py:359
2017-09-28 07:10:46.739 27653 DEBUG tripleo_common.actions.templates [req-9242beca-9a8a-4f01-84bd-12f69f46cd09 e0d78fa7a4e440b39b433afb033ffea6 309c65c7ac4f4dfc81e919b22f2a6677 - default default] _env_path_is_object https://192.168.24.2:13808/v1/AUTH_309c65c7ac4f4dfc81e919b22f2a6677/overcloud/environments/docker.yaml: True _env_path_is_object /usr/lib/python2.7/site-packages/tripleo_common/actions/templates.py:359
2017-09-28 07:10:47.347 27653 DEBUG tripleo_common.actions.templates [req-9242beca-9a8a-4f01-84bd-12f69f46cd09 e0d78fa7a4e440b39b433afb033ffea6 309c65c7ac4f4dfc81e919b22f2a6677 - default default] _env_path_is_object https://192.168.24.2:13808/v1/AUTH_309c65c7ac4f4dfc81e919b22f2a6677/overcloud/environments/docker-ha.yaml: True _env_path_is_object /usr/lib/python2.7/site-packages/tripleo_common/actions/templates.py:359
2017-09-28 07:10:47.363 27653 ERROR tripleo_common.actions.templates [req-9242beca-9a8a-4f01-84bd-12f69f46cd09 e0d78fa7a4e440b39b433afb033ffea6 309c65c7ac4f4dfc81e919b22f2a6677 - default default] Error occurred while processing plan files.: ValueError: The environment is not a valid YAML mapping data type.


I wonder if this is related to the hardcoding of docker deploy for OSP12. Moving over to containers for them to take a look at it.
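
For context on the error text in the log above, here is a minimal sketch of the kind of check that produces it. It assumes the environment file has already been parsed from YAML into a Python object; `check_environment` is a hypothetical name, not the actual Heat or tripleo-common code.

```python
def check_environment(parsed):
    """Accept a parsed environment only if it is a mapping (hypothetical sketch)."""
    if parsed is None:
        # An empty document is treated as an empty environment.
        return {}
    if not isinstance(parsed, dict):
        # A list or a bare string at the top level is rejected.
        raise ValueError(
            "The environment is not a valid YAML mapping data type."
        )
    return parsed


# A mapping passes through unchanged.
ok = check_environment({"resource_registry": {}, "parameter_defaults": {}})

# A top-level list (not a mapping) triggers the error seen in the log.
try:
    check_environment(["environments/docker.yaml"])
    message = None
except ValueError as exc:
    message = str(exc)
```

So any environment path in the list that resolves to something other than a top-level mapping would be enough to trigger this failure.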

Comment 9 Raviv Bar-Tal 2017-10-09 13:42:04 UTC
+'ed the needinfo for myself and Alex; I think we have answered our "needinfo" requests.

Comment 11 Dan Prince 2017-10-17 21:15:40 UTC
Created attachment 1339873 [details]
Patch to modify tripleo-common to resolve the issue

See the attached patch, which is under review; this should be resolved soon.

Comment 13 Raviv Bar-Tal 2017-11-02 13:45:56 UTC
Hi Toure,
I have tested this bug on my system and got the same failure.
The rpm I have is openstack-tripleo-common-7.6.3-0.20171028055750.el7ost.noarch,
which is newer and should work.
This bug is also relevant to scaling up the overcloud after an undercloud upgrade,
a scenario where a customer realizes they need more nodes for live migration after starting the upgrade procedure.

New sos report is attached.

Comment 14 Raviv Bar-Tal 2017-11-02 13:51:00 UTC
Created attachment 1347031 [details]
sosreport

Comment 15 Brad P. Crochet 2017-11-02 14:34:32 UTC
Assigning back to dprince, since he is the one that handled this to begin with.

Comment 17 Martin André 2017-11-02 16:03:38 UTC
We've found the issue: in the original patch we checked for the existence of the environments/docker.yaml file in the swift plan; however, this file *exists* in OSP11. When looking for environments/docker-ha.yaml instead, we get past the error. I'm going to submit a follow-up patch.
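
The difference between the two checks can be sketched as follows. This is not the tripleo-common code: `plan_has_marker` and the plan file listings are hypothetical, and the listings only assume what is stated above (docker.yaml is present in OSP11) plus the implication that docker-ha.yaml is not.

```python
# Marker file checked by the original patch; it also ships in OSP11.
ORIGINAL_MARKER = "environments/docker.yaml"
# Marker file proposed for the follow-up patch.
FOLLOW_UP_MARKER = "environments/docker-ha.yaml"


def plan_has_marker(plan_files, marker):
    """Return True if the marker file is among the plan's object names."""
    return marker in set(plan_files)


# Assumed (abbreviated) object listings for the two plans.
osp11_plan = [
    "overcloud.yaml",
    "overcloud-resource-registry-puppet.yaml",
    "environments/docker.yaml",
]
osp12_plan = osp11_plan + ["environments/docker-ha.yaml"]

for name, plan in (("osp11", osp11_plan), ("osp12", osp12_plan)):
    print(
        name,
        "original check:", plan_has_marker(plan, ORIGINAL_MARKER),
        "follow-up check:", plan_has_marker(plan, FOLLOW_UP_MARKER),
    )
```

Under these assumptions the original check cannot tell the two plans apart, while the follow-up check matches only the osp12 plan.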

Comment 20 Jon Schlueter 2017-11-10 14:03:48 UTC
openstack-tripleo-common-7.6.3-2.el7ost has been built and includes the patch for this. Please check and update the status if this resolves it.

Comment 21 Raviv Bar-Tal 2017-11-13 09:26:41 UTC
Hi,
The patch is not included in the osp12 beta puddle.
I have installed the undercloud and have this package:
openstack-tripleo-common-7.6.3-0.20171028055750.el7ost.noarch

But it does not include the fix.

Comment 22 Dan Prince 2017-11-13 15:30:18 UTC
Per Jon Schlueter's latest comment, perhaps the latest build (openstack-tripleo-common-7.6.3-2.el7ost) resolves this?

Comment 26 errata-xmlrpc 2017-12-13 22:11:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462