rhel-osp-director: 8.0->9.0 upgrade fails during major-upgrade-pacemaker.yaml step: ERROR: cluster remained unstable for more than 1800 seconds, exiting.

Environment:
instack-undercloud-4.0.0-10.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-24.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-16.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-24.el7ost.noarch
openstack-puppet-modules-8.1.5-1.el7ost.noarch

Steps to reproduce:
1. Deploy 8.0 with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml
2. Populate the overcloud.
3. Attempt to upgrade to 9.0.

Result:
The upgrade fails at the major-upgrade-pacemaker.yaml step:

2016-06-29 04:12:20 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-06-29 04:12:22 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-06-29 04:12:23 [2]: SIGNAL_COMPLETE Unknown
2016-06-29 04:48:30 [0]: SIGNAL_COMPLETE Unknown
Stack overcloud UPDATE_FAILED
Deployment failed: Heat Stack update failed.

heat resource-list shows:

+--------------------------------------------+--------------------------------------+------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+
| resource_name                              | physical_resource_id                 | resource_type                      | resource_status | updated_time        | stack_name                                                                                    |
+--------------------------------------------+--------------------------------------+------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+
| UpdateWorkflow                             | f7d617f0-2169-4aa4-b4df-bf46fc5b93c4 | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_FAILED   | 2016-06-29T03:46:14 | overcloud                                                                                     |
| 0                                          | fce68cc0-623a-4a77-b6bd-667a796e9419 | OS::Heat::SoftwareDeployment       | CREATE_FAILED   | 2016-06-29T04:10:31 | overcloud-UpdateWorkflow-pmws3iu75y6x-ControllerPacemakerUpgradeDeployment_Step2-uwcdndxbfyhk |
| ControllerPacemakerUpgradeDeployment_Step2 | 3851abff-44a5-4702-b466-4b411baf6055 | OS::Heat::SoftwareDeploymentGroup  | CREATE_FAILED   | 2016-06-29T04:10:31 | overcloud-UpdateWorkflow-pmws3iu75y6x                                                         |
+--------------------------------------------+--------------------------------------+------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+

[stack@undercloud72 ~]$ echo -e `heat deployment-show fce68cc0-623a-4a77-b6bd-667a796e9419`
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.0.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.0.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
{
  "status": "FAILED",
  "server_id": "a091da3b-978f-4b73-bea2-318d89ee27b9",
  "config_id": "68c5ddd1-16d6-42c5-817b-07978ac753a6",
  "output_values": {
    "deploy_stdout": "overcloud-controller-1: Starting Cluster...
overcloud-controller-0: Starting Cluster...
overcloud-controller-2: Starting Cluster...
OFFLINE: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
OFFLINE: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
OFFLINE: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
OFFLINE: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
ip-192.168.200.10 has started
ip-10.19.94.10 has started
ip-192.168.0.6 has started
ip-10.19.95.10 has started
ip-10.19.184.180 has started
ip-10.19.94.11 has started
galera has started
mongod has started
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 32
Galera cluster node is synced.
Running upgrade for neutron ... OK
Running upgrade for bsn_extensions ... OK
Running upgrade for networking-cisco ... OK
Running upgrade for networking-odl ... OK
Running upgrade for neutron-lbaas ... OK
memcached has started
rabbitmq has started
redis has started
ERROR: cluster remained unstable for more than 1800 seconds, exiting.
",
    "deploy_stderr": "Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
Option \"os_endpoint_type\" from group \"service_credentials\" is deprecated. Use option \"interface\" from group \"service_credentials\".
Value 'password-ceilometer-legacy' for '[service_credentials]/auth_type' is deprecated. And will be removed in Ceilometer 7.0. Use 'password' instead.
2016-07-30 02:13:48.665 10797 WARNING oslo_reports.guru_meditation_report [-] Guru mediation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Option \"logdir\" from group \"DEFAULT\" is deprecated. Use option \"log-dir\" from group \"DEFAULT\".
2016-07-30 02:13:52.938 10900 WARNING py.warnings [-] /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:241: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported
  exception.NotSupportedWarning
2016-07-30 02:13:53.149 10900 INFO migrate.versioning.api [-] 60 -> 61...
2016-07-30 02:13:53.184 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.185 10900 INFO migrate.versioning.api [-] 61 -> 62...
2016-07-30 02:13:53.214 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.214 10900 INFO migrate.versioning.api [-] 62 -> 63...
2016-07-30 02:13:53.223 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.223 10900 INFO migrate.versioning.api [-] 63 -> 64...
2016-07-30 02:13:53.245 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.246 10900 INFO migrate.versioning.api [-] 64 -> 65...
2016-07-30 02:13:53.293 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.294 10900 INFO migrate.versioning.api [-] 65 -> 66...
2016-07-30 02:13:53.336 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.337 10900 INFO migrate.versioning.api [-] 66 -> 67...
2016-07-30 02:13:53.365 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.365 10900 INFO migrate.versioning.api [-] 67 -> 68...
2016-07-30 02:13:53.374 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.374 10900 INFO migrate.versioning.api [-] 68 -> 69...
2016-07-30 02:13:53.383 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.383 10900 INFO migrate.versioning.api [-] 69 -> 70...
2016-07-30 02:13:53.392 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.392 10900 INFO migrate.versioning.api [-] 70 -> 71...
2016-07-30 02:13:53.401 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.401 10900 INFO migrate.versioning.api [-] 71 -> 72...
2016-07-30 02:13:53.410 10900 INFO migrate.versioning.api [-] done
Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:1056: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade
  expire_on_commit=expire_on_commit, _conf=conf)
2016-07-30 02:13:56.042 11000 WARNING oslo_config.cfg [-] Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
2016-07-30 02:13:56.108 11000 INFO migrate.versioning.api [-] 65 -> 66...
2016-07-30 02:13:56.116 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.116 11000 INFO migrate.versioning.api [-] 66 -> 67...
2016-07-30 02:13:56.124 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.124 11000 INFO migrate.versioning.api [-] 67 -> 68...
2016-07-30 02:13:56.131 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.132 11000 INFO migrate.versioning.api [-] 68 -> 69...
2016-07-30 02:13:56.138 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.139 11000 INFO migrate.versioning.api [-] 69 -> 70...
2016-07-30 02:13:56.146 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.146 11000 INFO migrate.versioning.api [-] 70 -> 71...
2016-07-30 02:13:56.177 11000 INFO migrate.versioning.api [-] done
Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
No handlers could be found for logger \"oslo_config.cfg\"
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 34af2b5c5a59 -> 59cb5b6cf4d, Add availability zone
INFO [alembic.runtime.migration] Running upgrade 59cb5b6cf4d -> 13cfb89f881a, add is_default to subnetpool
INFO [alembic.runtime.migration] Running upgrade 13cfb89f881a -> 32e5974ada25, Add standard attribute table
INFO [alembic.runtime.migration] Running upgrade 32e5974ada25 -> ec7fcfbf72ee, Add network availability zone
INFO [alembic.runtime.migration] Running upgrade ec7fcfbf72ee -> dce3ec7a25c9, Add router availability zone
INFO [alembic.runtime.migration] Running upgrade dce3ec7a25c9 -> c3a73f615e4, Add ip_version to AddressScope
INFO [alembic.runtime.migration] Running upgrade c3a73f615e4 -> 659bf3d90664, Add tables and attributes to support external DNS integration
INFO [alembic.runtime.migration] Running upgrade 659bf3d90664 -> 1df244e556f5, add_unique_ha_router_agent_port_bindings
INFO [alembic.runtime.migration] Running upgrade 1df244e556f5 -> 19f26505c74f, Auto Allocated Topology - aka Get-Me-A-Network
INFO [alembic.runtime.migration] Running upgrade 19f26505c74f -> 15be73214821, add dynamic routing model data
INFO [alembic.runtime.migration] Running upgrade 15be73214821 -> b4caf27aae4, add_bgp_dragent_model_data
INFO [alembic.runtime.migration] Running upgrade b4caf27aae4 -> 15e43b934f81, rbac_qos_policy
INFO [alembic.runtime.migration] Running upgrade 15e43b934f81 -> 31ed664953e6, Add resource_versions row to agent table
INFO [alembic.runtime.migration] Running upgrade 31ed664953e6 -> 2f9e956e7532, tag support
INFO [alembic.runtime.migration] Running upgrade 2f9e956e7532 -> 3894bccad37f, add_timestamp_to_base_resources
INFO [alembic.runtime.migration] Running upgrade 3894bccad37f -> 0e66c5227a8a, Add desc to standard attr table
INFO [alembic.runtime.migration] Running upgrade 4af11ca47297 -> 1b294093239c, Drop embrane plugin table
INFO [alembic.runtime.migration] Running upgrade 1b294093239c, 32e5974ada25 -> 8a6d8bdae39, standardattributes migration
INFO [alembic.runtime.migration] Running upgrade 8a6d8bdae39 -> 2b4c2465d44b, DVR sheduling refactoring
INFO [alembic.runtime.migration] Running upgrade 2b4c2465d44b -> e3278ee65050, Drop NEC plugin tables
INFO [alembic.runtime.migration] Running upgrade e3278ee65050, 15e43b934f81 -> c6c112992c9, rbac_qos_policy
INFO [alembic.runtime.migration] Running upgrade c6c112992c9 -> 5ffceebfada, network_rbac_external
INFO [alembic.runtime.migration] Running upgrade 5ffceebfada, 0e66c5227a8a -> 4ffceebfcdc, standard_desc
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 11ba2d65c8de -> 2e89171ea204, Add baremetal channel-group and is_native
INFO [alembic.runtime.migration] Running upgrade 2e89171ea204 -> 13bd9ebffbf5, Add support for UCSM Service Profile Templates.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> b89a299e19f9, Initial odl db, branchpoint
INFO [alembic.runtime.migration] Running upgrade b89a299e19f9 -> 383acb0d38a0, Start of odl contract branch
INFO [alembic.runtime.migration] Running upgrade b89a299e19f9 -> 247501328046, Start of odl expand branch
INFO [alembic.runtime.migration] Running upgrade 247501328046 -> 37e242787ae5, Opendaylight Neutron mechanism driver refactor
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 3345facd0452 -> 4a408dd491c2, Addition of Name column to lbaas_members and lbaas_healthmonitors table
INFO [alembic.runtime.migration] Running upgrade 4a408dd491c2 -> 3426acbc12de, Add flavor id
INFO [alembic.runtime.migration] Running upgrade 3426acbc12de -> 6aee0434f911, independent pools
INFO [alembic.runtime.migration] Running upgrade 6aee0434f911 -> 3543deab1547, add_l7_tables
INFO [alembic.runtime.migration] Running upgrade 3543deab1547 -> 62deca5010cd, Add tenant-id index for L7 tables
Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
Option \"notification_driver\" from group \"DEFAULT\" is deprecated. Use option \"driver\" from group \"oslo_messaging_notifications\".
Option \"notification_topics\" from group \"DEFAULT\" is deprecated. Use option \"topics\" from group \"oslo_messaging_notifications\".
",
    "deploy_status_code": 1
  },
  "creation_time": "2016-06-29T04:10:35",
  "updated_time": "2016-06-29T04:48:26",
  "input_values": {
    "update_identifier": "",
    "deploy_identifier": 1467171750
  },
  "action": "CREATE",
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1",
  "id": "fce68cc0-623a-4a77-b6bd-667a796e9419"
}

It looks like the root cause is bug 1343905. RabbitMQ is among the resources that failed to start in pacemaker, and there are no running_nodes in the cluster status:

[root@overcloud-controller-0 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller-0' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0', 'rabbit@overcloud-controller-1']}]},
 {alarms,[]}]

The likely cause is that the resource-agents package in this environment is not recent enough:

[root@overcloud-controller-0 ~]# rpm -q resource-agents
resource-agents-3.9.5-76.el7.x86_64

To pull in the latest fixes that allow RabbitMQ to re-form the cluster properly, we need resource-agents-3.9.5-80.el7.x86_64 or newer.
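
For anyone triaging a similar failure, the two checks described above can be sketched in Python. This is only an illustration, not part of the upgrade tooling: the function names are hypothetical, the inputs are assumed to be the pasted text of `rabbitmqctl cluster_status` and `rpm -q resource-agents`, and the package comparison is a simplified numeric comparison of version and leading release number rather than RPM's own version comparison.

```python
import re

# Hypothetical helpers for illustration only; they work on pasted command output.

def has_running_nodes(cluster_status):
    """Return True if the rabbitmqctl output lists at least one running node."""
    match = re.search(r"\{running_nodes,\s*\[(.*?)\]\}", cluster_status, re.S)
    return bool(match and match.group(1).strip())


def parse_resource_agents(nvr):
    """Split 'resource-agents-3.9.5-76.el7.x86_64' into ((3, 9, 5), 76).

    Simplification: only the dotted version and the leading number of the
    release are used; the rest of the release string is ignored.
    """
    match = re.match(r"resource-agents-(\d+(?:\.\d+)*)-(\d+)", nvr)
    if match is None:
        raise ValueError("unrecognised package string: %r" % nvr)
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def is_recent_enough(installed, minimum="resource-agents-3.9.5-80.el7.x86_64"):
    """Compare the installed package against the minimum named in this report."""
    return parse_resource_agents(installed) >= parse_resource_agents(minimum)


# Output copied from the controller in this report.
status = (
    "Cluster status of node 'rabbit@overcloud-controller-0' ...\n"
    "[{nodes,[{disc,['rabbit@overcloud-controller-0',"
    " 'rabbit@overcloud-controller-1']}]},\n"
    " {alarms,[]}]"
)
installed = "resource-agents-3.9.5-76.el7.x86_64"

print("running nodes present:", has_running_nodes(status))
print("resource-agents recent enough:", is_recent_enough(installed))
```

With the output from this environment both checks come back negative, which matches the diagnosis above. On a real system `rpm` itself remains the authority on package ordering.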
*** This bug has been marked as a duplicate of bug 1343905 ***