Bug 1427569 - OSP10 -> OSP11 upgrade fails when Nova services are running on a standalone node
Summary: OSP10 -> OSP11 upgrade fails when Nova services are running on a standalone node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 11.0 (Ocata)
Assignee: Sofer Athlan-Guyot
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-28 16:00 UTC by Marius Cornea
Modified: 2017-05-17 20:02 UTC
CC List: 10 users

Fixed In Version: openstack-tripleo-heat-templates-6.0.0-4.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-17 20:02:33 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 452830 0 None MERGED Ensure upgrade step orchestration accross roles. 2020-09-19 07:34:23 UTC
Red Hat Product Errata RHEA-2017:1245 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory 2017-05-17 23:01:50 UTC

Description Marius Cornea 2017-02-28 16:00:27 UTC
Description of problem:
OSP10 -> OSP11 upgrade fails when Nova services are running on a standalone role. 

roles_data file:
http://paste.openstack.org/show/600798/

Upgrade fails during major-upgrade-composable-steps.yaml with the following error:

stdout: overcloud.AllNodesDeploySteps.ControllerUpgrade_Step2:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: 170d8e1d-58e0-4720-8149-a9fd4f2b9e1d
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.AllNodesDeploySteps.NovacontrolUpgrade_Step5.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 5cf72dbf-9b22-4f63-8b98-25061864df35
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [Run puppet apply to set tranport_url in nova.conf] ***********************
    changed: [localhost]
    
    TASK [Setup cell_v2 (map cell0)] ***********************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["nova-manage", "cell_v2", "map_cell0"], "delta": "0:02:12.569490", "end": "2017-02-28 15:41:23.802908", "failed": true, "rc": 1, "start": "2017-02-28 15:39:11.233418", "stderr": "", "stdout": "An error has occurred:
Traceback (most recent call last):
  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1594, in main
    ret = fn(*fn_args, **fn_kwargs)
  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1140, in map_cell0
    self._map_cell0(database_connection=database_connection)
  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1170, in _map_cell0
    cell_mapping.create()
  File \"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 226, in wrapper
    return fn(self, *args, **kwargs)
  File \"/usr/lib/python2.7/site-packages/nova/objects/cell_mapping.py\", line 71, in create
    db_mapping = self._create_in_db(self._context, self.obj_get_changes())
  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 893, in wrapper
    with self._transaction_scope(context):
  File \"/usr/lib64/python2.7/contextlib.py\", line 17, in __enter__
    return self.gen.next()
  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 944, in _transaction_scope
    allow_async=self._allow_async) as resource:
  File \"/usr/lib64/python2.7/contextlib.py\", line 17, in __enter__
    return self.gen.next()
  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 558, in _session
    bind=self.connection, mode=self.mode)
  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 317, in _create_session
    self._start()
  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 403, in _start
    engine_args, maker_args)
  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 427, in _setup_for_connection
    sql_connection=sql_connection, **engine_kwargs)
  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py\", line 155, in create_engine
    test_conn = _test_connection(engine, max_retries, retry_interval)
  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py\", line 339, in _test_connection
    six.reraise(type(de_ref), de_ref)
  File \"<string>\", line 2, in reraise
DBConnectionError: (pymysql.err.OperationalError) (2003, \"Can't connect to MySQL server on '172.17.1.13' ([Errno 113] EHOSTUNREACH)\")", "stdout_lines": ["An error has occurred:", "Traceback (most recent call last):", "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1594, in main", "    ret = fn(*fn_args, **fn_kwargs)", "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1140, in map_cell0", "    self._map_cell0(database_connection=database_connection)", "  File \"/usr/lib/python2.7/site-packages/nova/cmd/manage.py\", line 1170, in _map_cell0", "    cell_mapping.create()", "  File \"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py\", line 226, in wrapper", "    return fn(self, *args, **kwargs)", "  File \"/usr/lib/python2.7/site-packages/nova/objects/cell_mapping.py\", line 71, in create", "    db_mapping = self._create_in_db(self._context, self.obj_get_changes())", "  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 893, in wrapper", "    with self._transaction_scope(context):", "  File \"/usr/lib64/python2.7/contextlib.py\", line 17, in __enter__", "    return self.gen.next()", "  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 944, in _transaction_scope", "    allow_async=self._allow_async) as resource:", "  File \"/usr/lib64/python2.7/contextlib.py\", line 17, in __enter__", "    return self.gen.next()", "  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 558, in _session", "    bind=self.connection, mode=self.mode)", "  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 317, in _create_session", "    self._start()", "  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 403, in _start", "    engine_args, maker_args)", "  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py\", line 427, in _setup_for_connection", "    sql_connection=sql_connection, **engine_kwargs)", "  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py\", line 155, in create_engine", "    test_conn = _test_connection(engine, max_retries, retry_interval)", "  File \"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/engines.py\", line 339, in _test_connection", "    six.reraise(type(de_ref), de_ref)", "  File \"<string>\", line 2, in reraise", "DBConnectionError: (pymysql.err.OperationalError) (2003, \"Can't connect to MySQL server on '172.17.1.13' ([Errno 113] EHOSTUNREACH)\")"], "warnings": []}
    	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/b106b80f-8c24-4896-98d3-06ddf74f7508_playbook.retry



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy an OSP10 overcloud with a standalone role running the Nova control plane services
2. Upgrade OSP10 to OSP11

Actual results:
Upgrade fails while running the 'Setup cell_v2 (map cell0)' step.

Expected results:
Upgrade succeeds.

Additional info:
172.17.1.13 is the internal API VIP, but it cannot be reached because the cluster is not running when this step runs.
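
For illustration, a hypothetical Ansible guard that would surface this failure faster by waiting for the database port behind the internal API VIP before the cell_v2 mapping runs. The VIP address and task name are taken from the output above; the port, timeout, and module parameters are assumptions, and this only makes the symptom clearer, it does not address the root cause:

    - name: Wait for the database behind the internal API VIP
      wait_for:
        host: 172.17.1.13   # internal API VIP from this report
        port: 3306          # assumes Galera/MySQL on the default port
        timeout: 300

    - name: Setup cell_v2 (map cell0)
      command: nova-manage cell_v2 map_cell0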

Comment 2 Sofer Athlan-Guyot 2017-03-24 12:20:57 UTC
Got a successful run in CI, so moving this one to POST.  Checking if it's still working with the latest puddle.

Comment 3 Marius Cornea 2017-03-24 15:26:09 UTC
(In reply to Sofer Athlan-Guyot from comment #2)
> Got a successful run in CI, so moving this one to POST.  Checking if it's
> still working with the latest puddle.

I wasn't able to reproduce this issue with the latest puddle. I think we're good on this one.

Comment 5 Sofer Athlan-Guyot 2017-04-03 11:01:40 UTC
Adding compute for visibility.

Comment 6 Sofer Athlan-Guyot 2017-04-03 11:33:11 UTC
Removing compute, as it's unrelated. The pcs cluster is not started, which makes the database migration fail because the VIP configured in nova::cell0_database_connection isn't reachable. But this happens at step5, while all the databases should be back up by step4.

Comment 7 Sofer Athlan-Guyot 2017-04-03 15:59:02 UTC
Hi,

so the upgrade of the custom Novacontrol role is happening at the same
time as the upgrade of the controller node:

I prefix Novacontrol log entries with N and controller log entries with C:

 - C: step0: Apr 03 09:08:55

 - N: step0: Apr 03 09:07:36
 - N: step1: Apr 03 09:08:27
 - N: step2: Apr 03 09:08:52

 - C: step1: Apr 03 09:13:56

 - N: step3: Apr 03 09:14:24
 - N: step4: Apr 03 09:14:40

 - C: step2: Apr 03 09:15:01

 - N: step5: Apr 03 09:17:28

 - C: step3: Apr 03 09:20:48
 - C: step4: never happened
 - C: step5: never happened
 

So the Novacontrol role had time to reach step5 while the
controller was still at step3.

We shouldn't have this kind of intermixed upgrade happening. I will
check further into why this happens.
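
For reference, the resource names in the failure output (ControllerUpgrade_Step2, NovacontrolUpgrade_Step5) come from the jinja2-generated upgrade template. Below is an illustrative sketch of the kind of ordering constraint that is missing: every role's step N should depend on step N-1 of all roles, so no role can race ahead. This is only a sketch of the idea, not the literal template; names such as upgrade_steps_max and {{role.name}}UpgradeConfig_Step{{step}} are assumptions:

    {% for step in range(0, upgrade_steps_max) %}
      {% for role in roles %}
      # One deployment group per role and per upgrade step.
      {{role.name}}Upgrade_Step{{step}}:
        type: OS::Heat::SoftwareDeploymentGroup
        {% if step > 0 %}
        # Depend on the previous step of every role, not only this role's,
        # so e.g. NovacontrolUpgrade_Step5 cannot start before ControllerUpgrade_Step4.
        depends_on:
          {% for dep in roles %}
          - {{dep.name}}Upgrade_Step{{step - 1}}
          {% endfor %}
        {% endif %}
        properties:
          servers: {get_param: [servers, {{role.name}}]}
          config: {get_resource: {{role.name}}UpgradeConfig_Step{{step}}}
      {% endfor %}
    {% endfor %}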

Comment 8 Sofer Athlan-Guyot 2017-04-06 09:57:55 UTC
In stable/ocata.

Comment 11 errata-xmlrpc 2017-05-17 20:02:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245

