osp-director-9: upgrade from 8.0 -> 9.0 fails due to down pcs cluster on controllers (UPDATE_FAILED: ControllerPacemakerUpgradeDeployment_Step_1)

Description:
-------------
Upgrade attempt: 8.0 Async -> 9.0, puddle 2016-06-02.1.

The upgrade failed on ControllerPacemakerUpgradeDeployment_Step_1. It seems the pcs cluster was down on the controllers; when I tried to start the cluster manually I couldn't, because the openstack-keystone service refused to start (see logs).

Environment:
-------------
openstack-puppet-modules-8.0.4-1.el7ost.noarch
openstack-tripleo-puppet-elements-2.0.0-2.el7ost.noarch
puppet-3.6.2-2.el7.noarch
instack-undercloud-4.0.0-2.el7ost.noarch
instack-0.0.8-3.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-8.el7ost.noarch
openstack-heat-engine-6.0.0-3.el7ost.noarch
openstack-tripleo-heat-templates-kilo-2.0.0-8.el7ost.noarch
python-heatclient-1.2.0-1.el7ost.noarch
openstack-heat-common-6.0.0-3.el7ost.noarch
openstack-heat-api-cfn-6.0.0-3.el7ost.noarch
openstack-heat-api-cloudwatch-6.0.0-3.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-heat-templates-0-0.8.20150605git.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-8.el7ost.noarch
openstack-heat-api-6.0.0-3.el7ost.noarch

After running the command:
-----------------------------
openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
  -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /home/stack/network-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml

The console view:
------------------
2016-02-10 23:10:41 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-02-10 23:10:43 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-02-10 23:10:44 [0]: SIGNAL_COMPLETE Unknown
2016-02-10 23:10:46 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-02-10 23:10:46 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-02-10 23:10:47 [1]: SIGNAL_COMPLETE Unknown
2016-02-10 23:10:47 [2]: SIGNAL_COMPLETE Unknown
ERROR: Authentication failed: Authentication required
There was an error running Upgrade overcloud - step #3:upgrade. Exiting....

Troubleshooting:
---------------
[stack@undercloud72 ~]$ openstack stack resource list -n5 overcloud | grep -v COMPLETE
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
| UpdateWorkflow | 5dfa85d2-27e6-404b-a9d9-cab976c9735f | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_FAILED | 2016-02-10T22:58:41 | overcloud |
| ControllerPacemakerUpgradeDeployment_Step1 | 305c60ab-fb0d-4b75-8333-e78d095fe012 | OS::Heat::SoftwareDeploymentGroup | CREATE_FAILED | 2016-02-10T22:58:51 | overcloud-UpdateWorkflow-hoca4u7thveg |
| 1 | 81c45d3d-ce78-4a4b-8352-00b967a16ac9 | OS::Heat::SoftwareDeployment | CREATE_IN_PROGRESS | 2016-02-10T22:58:52 | overcloud-UpdateWorkflow-hoca4u7thveg-ControllerPacemakerUpgradeDeployment_Step1-cjtsjqiwxqyl |
| 0 | 01adb662-c21a-478e-b22d-128205676f9a | OS::Heat::SoftwareDeployment | CREATE_IN_PROGRESS | 2016-02-10T22:58:52 | overcloud-UpdateWorkflow-hoca4u7thveg-ControllerPacemakerUpgradeDeployment_Step1-cjtsjqiwxqyl |
| 2 | 666147f7-e933-47ce-9868-127445043787 | OS::Heat::SoftwareDeployment | CREATE_IN_PROGRESS | 2016-02-10T22:58:52 | overcloud-UpdateWorkflow-hoca4u7thveg-ControllerPacemakerUpgradeDeployment_Step1-cjtsjqiwxqyl |

pcs status on the controller:
-----------------------------
[root@overcloud-controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Last updated: Fri Jun 3 18:34:34 2016
Last change: Fri Jun 3 17:30:30 2016 by root via crm_resource on overcloud-controller-0
Stack: corosync
Current DC: overcloud-controller-1 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 112 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 ip-10.19.184.210 (ocf::heartbeat:IPaddr2): (target-role:Stopped) Stopped
 ip-192.168.200.10 (ocf::heartbeat:IPaddr2): (target-role:Stopped) Stopped
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-192.168.0.6 (ocf::heartbeat:IPaddr2): (target-role:Stopped) Stopped
 ip-10.19.105.10 (ocf::heartbeat:IPaddr2): (target-role:Stopped) Stopped
 ip-10.19.104.11 (ocf::heartbeat:IPaddr2): (target-role:Stopped) Stopped
 ip-10.19.104.10 (ocf::heartbeat:IPaddr2): (target-role:Stopped) Stopped
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-1 ]
     Slaves: [ overcloud-controller-0 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-notifier-clone [openstack-ceilometer-alarm-notifier]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-keystone-clone [openstack-keystone]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: delay-clone [delay]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-server-clone [neutron-server]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: httpd-clone [httpd]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-alarm-evaluator-clone [openstack-ceilometer-alarm-evaluator]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 openstack-cinder-volume (systemd:openstack-cinder-volume): Stopped
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Stopped: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Failed Actions:
* openstack-keystone_start_0 on overcloud-controller-0 'not running' (7): call=234, status=complete, exitreason='none', last-rc-change='Fri Jun 3 18:04:06 2016', queued=0ms, exec=2173ms
* httpd_start_0 on overcloud-controller-0 'not running' (7): call=211, status=complete, exitreason='none', last-rc-change='Fri Jun 3 18:02:32 2016', queued=0ms, exec=10541ms
* openstack-keystone_start_0 on overcloud-controller-1 'not running' (7): call=233, status=complete, exitreason='none', last-rc-change='Fri Jun 3 18:04:06 2016', queued=0ms, exec=2179ms
* httpd_start_0 on overcloud-controller-1 'not running' (7): call=209, status=complete, exitreason='none', last-rc-change='Fri Jun 3 18:02:32 2016', queued=0ms, exec=10495ms
* openstack-keystone_start_0 on overcloud-controller-2 'not running' (7): call=228, status=complete, exitreason='none', last-rc-change='Fri Jun 3 18:04:06 2016', queued=0ms, exec=2086ms
* httpd_start_0 on overcloud-controller-2 'not running' (7): call=209, status=complete, exitreason='none', last-rc-change='Fri Jun 3 18:02:32 2016', queued=0ms, exec=10511ms

PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

keystone.log (after attempting to start the service manually):
----------------------
2016-06-03 18:04:08.571 29355 WARNING keystone.assignment.core [req-f3f23f35-da11-421b-88fd-f76f38208b8b - - - - -] Deprecated: Use of the identity driver config to automatically configure the same assignment driver has been deprecated, in the "O" release, the assignment driver will need to be expicitly configured if different than the default (SQL).
2016-06-03 18:04:08.591 29355 WARNING keystone.common.manager [req-f3f23f35-da11-421b-88fd-f76f38208b8b - - - - -] Deprecated: Direct import of driver 'keystone.catalog.backends.sql.Catalog' is deprecated as of Liberty in favor of its entrypoint from 'keystone.catalog' and may be removed in N.
2016-06-03 18:04:08.616 29355 WARNING oslo_log.versionutils [req-f3f23f35-da11-421b-88fd-f76f38208b8b - - - - -] Deprecated: keystone.contrib.federation.backends.sql.Federation is deprecated as of Mitaka in favor of sql and may be removed in O.
2016-06-03 18:04:08.621 29355 CRITICAL keystone [req-f3f23f35-da11-421b-88fd-f76f38208b8b - - - - -] ValueError: Empty module name
2016-06-03 18:04:08.621 29355 ERROR keystone Traceback (most recent call last):
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/bin/keystone-all", line 10, in <module>
2016-06-03 18:04:08.621 29355 ERROR keystone     sys.exit(main())
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/cmd/all.py", line 38, in main
2016-06-03 18:04:08.621 29355 ERROR keystone     eventlet_server.run(possible_topdir)
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/server/eventlet.py", line 155, in run
2016-06-03 18:04:08.621 29355 ERROR keystone     startup_application_fn=create_servers)
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/server/common.py", line 49, in setup_backends
2016-06-03 18:04:08.621 29355 ERROR keystone     drivers = backends.load_backends()
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/server/backends.py", line 62, in load_backends
2016-06-03 18:04:08.621 29355 ERROR keystone     shadow_users_api=identity.ShadowUsersManager(),
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/dependency.py", line 96, in __wrapped_init__
2016-06-03 18:04:08.621 29355 ERROR keystone     init(self, *args, **kwargs)
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/identity/core.py", line 1572, in __init__
2016-06-03 18:04:08.621 29355 ERROR keystone     super(ShadowUsersManager, self).__init__(CONF.shadow_users.driver)
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/manager.py", line 183, in __init__
2016-06-03 18:04:08.621 29355 ERROR keystone     self.driver = load_driver(self.driver_namespace, driver_name)
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/keystone/common/manager.py", line 80, in load_driver
2016-06-03 18:04:08.621 29355 ERROR keystone     driver = importutils.import_object(driver_name, *args)
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/oslo_utils/importutils.py", line 44, in import_object
2016-06-03 18:04:08.621 29355 ERROR keystone     return import_class(import_str)(*args, **kwargs)
2016-06-03 18:04:08.621 29355 ERROR keystone   File "/usr/lib/python2.7/site-packages/oslo_utils/importutils.py", line 30, in import_class
2016-06-03 18:04:08.621 29355 ERROR keystone     __import__(mod_str)
2016-06-03 18:04:08.621 29355 ERROR keystone ValueError: Empty module name
2016-06-03 18:04:08.621 29355 ERROR keystone
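For context on the final error: "ValueError: Empty module name" is what Python's __import__() raises when handed an empty string. A minimal sketch of that failure mode, under the assumption (not confirmed by these logs) that the configured shadow_users driver name resolved to an empty string before oslo_utils tried to import it:

```python
# Hypothetical reduction of the crash at the bottom of the keystone traceback.
# oslo_utils.importutils.import_class() splits the dotted path and calls
# __import__() on the module part; an empty driver name (assumed here, the
# actual config is not shown in the logs) makes that call __import__('').

def import_class(import_str):
    """Simplified stand-in for oslo_utils.importutils.import_class()."""
    mod_str, _sep, class_str = import_str.rpartition('.')
    module = __import__(mod_str)  # __import__('') raises ValueError
    return getattr(module, class_str)

try:
    import_class('')  # empty driver name reproduces the logged error
except ValueError as exc:
    print(exc)  # -> Empty module name
```

This would point at the keystone.conf driver settings rather than keystone's code itself, which matches the observation below that the keystone error is probably not the root cause.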
Created attachment 1164621 [details] Messages from controller
Created attachment 1164622 [details] keystone.log from controller
Hi Omri, just had a poke at the logs as promised yesterday, since QE is blocking here. Thanks for the logs; /var/log/messages in particular is helpful. I am not sure what the root cause is yet, but AFAICS it isn't the keystone error you pasted in the description.

From the logs I can see that the original upgrade was run at ~ Jun 3 01:12:54, and I can also see when you tried to stop/start the cluster again manually at ~ Jun 3 13:29:59. During that second attempt you may also be hitting https://bugzilla.redhat.com/show_bug.cgi?id=1343905, as I see a trace like:

Jun 3 13:29:47 localhost rabbitmq-cluster(rabbitmq)[30714]: INFO: RabbitMQ server is not running
Jun 3 13:29:47 localhost lrmd[15185]: notice: rabbitmq_stop_0:30714:stderr [ Error: {aborted,{no_exists,[rabbit_user, ]
Jun 3 13:29:47 localhost lrmd[15185]: notice: rabbitmq_stop_0:30714:stderr [ [{{internal_user,'$1','_','_'}, ]
Jun 3 13:29:47 localhost lrmd[15185]: notice: rabbitmq_stop_0:30714:stderr [ [{'/=','$1',<<"guest">>}], ]
Jun 3 13:29:47 localhost lrmd[15185]: notice: rabbitmq_stop_0:30714:stderr [ ['$_']}]]}} ]
Jun 3 13:29:47 localhost crmd[15188]: notice: Operation rabbitmq_stop_0: ok (node=overcloud-controller-0, call=759, rc=0, cib-update=296, confirmed=true)
Jun 3 13:29:57 localhost galera(galera)[32334]: INFO: MySQL is not running

I initially thought this was the root cause, but then I put the timestamps together. So, in any case, right now I think the most likely root cause is a problem stopping openstack-heat-engine during extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh ([1] for reference), and this concerns me as I haven't seen/noticed it before (though let's first see if we can reproduce?). AFAICS:

Cluster being stopped correctly, line 41@[1]:

Jun 3 01:12:54 localhost systemd: Stopped Pacemaker High Availability Cluster Manager.
Jun 3 01:12:55 localhost systemd: Stopping Corosync Cluster Engine...
The swift services being brought down, line 45@[1]:

Jun 3 01:12:56 localhost systemd: Stopped OpenStack Object Storage (swift) - Proxy Server

Packages being updated, line 58@[1]:

Jun 3 01:15:41 localhost yum[36749]: Updated: openstack-utils-2015.2-1.el7ost.noarch

So this confused me... it looks like everything was executed (not much is left in [1] at this point apart from the crudini calls). Then I spotted the heat-engine problem: basically an error stopping heat-engine, which does eventually stop, but the full trace here repeats a few times and the stop is marked failed:

Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.794 42827 ERROR heat.engine.service [-] Failed to stop engine service, Cannot switch to MAINLOOP from MAINLOOP
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.833 42817 ERROR heat.engine.service [-] Failed to stop engine service, Cannot switch to MAINLOOP from MAINLOOP
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters [req-823967b7-242e-4e4f-8c86-a6f441fd9728 - -] DB exception wrapped.
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     context)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 146, in execute
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     result = self._query(query)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 296, in _query
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     conn.query(q)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 781, in query
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 942, in _read_query_result
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     result.read()
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1138, in read
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     first_packet = self.connection._read_packet()
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 895, in _read_packet
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     packet_header = self._read_bytes(4)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 912, in _read_bytes
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     data = self._rfile.read(num_bytes)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/pymysql/_socketio.py", line 59, in readinto
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     return self._sock.recv_into(b)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 346, in recv_into
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     timeout_exc=socket.timeout("timed out"))
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 201, in _trampoline
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     mark_as_closed=self._mark_as_closed)
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python2.7/site-packages/eventlet/hubs/__init__.py", line 144, in trampoline
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters     assert hub.greenlet is not current, 'do not call blocking functions from the mainloop'
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters AssertionError: do not call blocking functions from the mainloop
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.850 42794 ERROR oslo_db.sqlalchemy.exc_filters
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.821 42827 ERROR oslo_db.sqlalchemy.exc_filters [req-8d154c6d-9552-4162-bea0-fab590b14e7d - -] DB exception wrapped.
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.821 42827 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
Jun 3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.821 42827 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
Jun 3 01:10:27 localhost heat-engine: 2016-06-03 05:10:27.067 42827 ERROR sqlalchemy.pool.QueuePool OperationalError: (2014, 'Command Out of Sync')
Jun 3 01:10:27 localhost systemd: Stopped Openstack Heat Engine Service.
Jun 3 01:10:27 localhost systemd: Unit openstack-heat-engine.service entered failed state.
Jun 3 01:10:27 localhost systemd: openstack-heat-engine.service failed.
Jun 3 01:10:27 localhost systemd: Stopping OpenStack Cinder Volume Server...

SO: can you please see if you can reproduce this? In particular, if we see the same heat-engine issue, then it is something to investigate further.
As mentioned, there are/were other issues, like rabbit not rejoining the cluster (which you did hit, and would likely have hit anyway had the upgrade proceeded here), that have since been fixed.

thanks, marios

[1] https://github.com/openstack/tripleo-heat-templates/blob/a47d580018625d1b6363228f4254ded317c2091e/extraconfig/tasks/major_upgrade_controller_pacemaker_1.sh
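As an aside, the timeline reconstruction above (cluster stop at ~01:12, package update at ~01:15, manual restart at ~13:29) can be pulled out of the log with a single grep instead of reading the whole file. A rough sketch, run here against an inline sample rather than a controller's real /var/log/messages (on a controller you would point it at the real file):

```shell
# Hypothetical triage helper: extract the upgrade-related events in time order.
# The sample below stands in for /var/log/messages, which is what you would
# actually grep on the controller.
cat > /tmp/sample-messages <<'EOF'
Jun  3 01:10:26 localhost heat-engine: 2016-06-03 05:10:26.794 42827 ERROR heat.engine.service [-] Failed to stop engine service, Cannot switch to MAINLOOP from MAINLOOP
Jun  3 01:12:54 localhost systemd: Stopped Pacemaker High Availability Cluster Manager.
Jun  3 01:15:41 localhost yum[36749]: Updated: openstack-utils-2015.2-1.el7ost.noarch
EOF
grep -E 'Stopped Pacemaker|yum\[|Failed to stop engine' /tmp/sample-messages
```

Since syslog writes entries in order, the grep output preserves the event sequence without any extra sorting.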
Sorry, one more thing: can you please include ALL commands used up to this point so we can get a solid reproducer? I mean, the fail/bug here is about major_upgrade_controller_pacemaker_1.sh, i.e. during the controller upgrade. So were the init, aodh migration, keystone migration etc. completed correctly, and can you please include all the CLI commands for those?
(In reply to marios from comment #4)
> Can you please see if you can reproduce this, particularly if we see the
> same heat-engine issue then it is something to investigate more. As
> mentioned there are/where other issues, like the rabbit not rejoining the
> cluster (which you did hit, and would likely have hit anyway had the upgrade
> proceeded here) that have since been fixed.
>
> thank, marios

Thanks Marios - while trying to reproduce this issue I encountered this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1351338, so it seems we may have made some progress, but now we are failing on a different step in the overcloud upgrade process (see comment #7 on bug #1351338).

(In reply to marios from comment #5)
> sorry, one more thing, can you please include ALL commands used to this
> point so we can get a solid reproducer. I mean, the fail/bug here is about
> major_upgrade_controller_pacemaker_1.sh i.e. during the controller upgrade.
>
> So was init, aodh migration, keystone migration etc completed correctly and
> can you please include all the cli for those.

Sure thing, we have an automation script for it, and the steps that I took are documented in this etherpad: http://etherpad.corp.redhat.com/ospd-9-upgrade-ohochman