Bug 1479327

Summary: OSP10 -> OSP11 upgrade: major upgrade composable step fails on composable roles deployment while running cinder-manage db sync
Product: Red Hat OpenStack Reporter: Marius Cornea <mcornea>
Component: openstack-tripleo-heat-templatesAssignee: Sofer Athlan-Guyot <sathlang>
Status: CLOSED ERRATA QA Contact: Marius Cornea <mcornea>
Severity: urgent Docs Contact:
Priority: high    
Version: 11.0 (Ocata)CC: ccollett, dbecker, jjoyce, mbultel, mburns, morazi, rhel-osp-director-maint, sathlang
Target Milestone: z3Keywords: Regression, Triaged, ZStream
Target Release: 11.0 (Ocata)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-6.2.1-2.el7ost Doc Type: Bug Fix
Doc Text:
Previously, upgrading a composable roles deployment running cinder-manage db sync would fail with duplicate columns in the cinder database. This was because cinder-manage db sync was run multiple times. With this update, the sync only runs only one controller. As a result, the cinder database schema successfully upgrades.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-10-31 17:37:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Marius Cornea 2017-08-08 11:24:50 UTC
Description of problem:
OSP10 -> OSP11 upgrade: major upgrade composable step fails on composable roles deployment while running cinder-manage db sync:

[stack@undercloud-0 ~]$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerUpgrade_Step5.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 2c5739e7-6099-4830-98ef-eb610d747866
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [Gathering Facts] *********************************************************
    ok: [localhost]
    
    TASK [Sync cinder DB] **********************************************************
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["cinder-manage", "db", "sync"], "delta": "0:00:01.579537", "end": "2017-08-08 11:10:19.235581", "failed": true, "rc": 1, "start": "2017-08-08 11:10:17.656044", "stderr": "Option \"logdir\" from group \"DEFAULT\" is deprecated. Use option \"log-dir\" from group \"DEFAULT\".", "stderr_lines": ["Option \"logdir\" from group \"DEFAULT\" is deprecated. Use option \"log-dir\" from group \"DEFAULT\"."], "stdout": "", "stdout_lines": []}
    	to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/84534f12-7925-4c8c-96c8-b2fd69857f54_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=1    changed=0    unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |

overcloud.AllNodesDeploySteps.ServiceApiUpgrade_Step5:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: f3714032-4cce-4f76-9797-18303b065d5e
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.AllNodesDeploySteps.CephStorageUpgrade_Step5:
  resource_type: OS::Heat::SoftwareDeploymentGroup
  physical_resource_id: ae7b3aab-8b18-433a-bbff-97aa4148ce83
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted



Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.1.0-2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP11 with custom role running systemd managed services

#!/bin/bash

timeout 100m openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -r /home/stack/composable_roles/roles/roles_data.yaml \
  -e /home/stack/composable_roles/roles/nodes.yaml \
  -e /home/stack/composable_roles/roles/network-config.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /home/stack/composable_roles/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/composable_roles/network/network-environment.yaml \
-e /home/stack/composable_roles/enable-tls.yaml \
-e /home/stack/composable_roles/inject-trust-anchor.yaml \
-e /home/stack/composable_roles/public_vip.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/tls-endpoints-public-ip.yaml \
-e /home/stack/composable_roles/debug.yaml \
--log-file overcloud_deployment_36.log

roles_data: http://paste.openstack.org/show/617763/

Actual results:
major upgrade composable step fails

Expected results:
Upgrade completes fine

Additional info:
Attaching sosreports.

Comment 2 Sofer Athlan-Guyot 2017-08-08 13:22:11 UTC
We have this exception in the cinder-manage.log of ctl0, 

2017-08-08 11:10:19.121 359275 INFO migrate.versioning.api [-] 89 -> 90...
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters [-] DBAPIError exception wrapped from (pymysql.err.InternalError) (1060, u"Duplicate column name 'race_preventer'") [SQL: u'\nALTER TABLE workers ADD race_preventer INTEGER NOT NULL DEFAULT 0']
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters context)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters cursor.execute(statement, parameters)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters result = self._query(query)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters conn.query(q)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters result.read()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters first_packet = self.connection._read_packet()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters packet.check_error()
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters err.raise_mysql_exception(self._data)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters raise errorclass(errno, errval)
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters InternalError: (1060, u"Duplicate column name 'race_preventer'")
2017-08-08 11:10:19.137 359275 ERROR oslo_db.sqlalchemy.exc_filters
2017-08-08 11:10:19.140 359275 CRITICAL cinder [-] DBError: (pymysql.err.InternalError) (1060, u"Duplicate column name 'race_preventer'") [SQL: u'\nALTER TABLE workers ADD race_preventer INTEGER NOT NULL DEFAULT 0']

and we can see that the migration already happened on ctl1:

2017-08-08 11:10:18.959 349866 DEBUG migrate.versioning.repository [-] Config: OrderedDict([('db_settings', OrderedDict([('__name__', 'db_settings'), ('repository_id', 'cinder'), ('version_table', 'migrate_version'), ('required_dbs', '[]')]))]) __init__ /usr/lib/python2.7/site-packages/migrate/versioning/repository.py:83
2017-08-08 11:10:18.969 349866 INFO migrate.versioning.api [-] 79 -> 80...
2017-08-08 11:10:18.977 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.977 349866 INFO migrate.versioning.api [-] 80 -> 81...
2017-08-08 11:10:18.982 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.983 349866 INFO migrate.versioning.api [-] 81 -> 82...
2017-08-08 11:10:18.988 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.989 349866 INFO migrate.versioning.api [-] 82 -> 83...
2017-08-08 11:10:18.994 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:18.995 349866 INFO migrate.versioning.api [-] 83 -> 84...
2017-08-08 11:10:19.000 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.000 349866 INFO migrate.versioning.api [-] 84 -> 85...
2017-08-08 11:10:19.025 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.025 349866 INFO migrate.versioning.api [-] 85 -> 86...
2017-08-08 11:10:19.053 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.053 349866 INFO migrate.versioning.api [-] 86 -> 87...
2017-08-08 11:10:19.068 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.068 349866 INFO migrate.versioning.api [-] 87 -> 88...
2017-08-08 11:10:19.101 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.101 349866 INFO migrate.versioning.api [-] 88 -> 89...
2017-08-08 11:10:19.117 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.117 349866 INFO migrate.versioning.api [-] 89 -> 90...
2017-08-08 11:10:19.142 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.142 349866 INFO migrate.versioning.api [-] 90 -> 91...
2017-08-08 11:10:19.175 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.175 349866 INFO migrate.versioning.api [-] 91 -> 92...
2017-08-08 11:10:19.181 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.182 349866 INFO migrate.versioning.api [-] 92 -> 93...
2017-08-08 11:10:19.187 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.188 349866 INFO migrate.versioning.api [-] 93 -> 94...
2017-08-08 11:10:19.194 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.194 349866 INFO migrate.versioning.api [-] 94 -> 95...
2017-08-08 11:10:19.202 349866 INFO migrate.versioning.api [-] done
2017-08-08 11:10:19.202 349866 INFO migrate.versioning.api [-] 95 -> 96...
2017-08-08 11:10:19.210 349866 INFO migrate.versioning.api [-] done

hence the duplicate error.

Comment 3 Sofer Athlan-Guyot 2017-08-18 07:11:32 UTC
Backported to stable/ocata upstream.

Comment 7 errata-xmlrpc 2017-10-31 17:37:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3098