Red Hat Bugzilla – 1361750 – rhel-osp-director: 8.0->9.0 upgrade fails during major-upgrade-pacemaker.yaml step: ERROR: cluster remained unstable for more than 1800 seconds, exiting.
Description Alexander Chuzhoy
2016-07-30 03:11:46 UTC
rhel-osp-director: 8.0->9.0 upgrade fails during major-upgrade-pacemaker.yaml step: ERROR: cluster remained unstable for more than 1800 seconds, exiting.
Environment:
instack-undercloud-4.0.0-10.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-24.el7ost.noarch
openstack-tripleo-heat-templates-kilo-0.8.14-16.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-24.el7ost.noarch
openstack-puppet-modules-8.1.5-1.el7ost.noarch
Steps to reproduce:
1. Deploy 8.0 with:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 2 --ceph-storage-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server 10.5.26.10 --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml
2. Populate the overcloud.
3. Attempt to upgrade to 9.0
Result:
The upgrade fails at the major-upgrade-pacemaker.yaml step:
2016-06-29 04:12:20 [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-06-29 04:12:22 [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-06-29 04:12:23 [2]: SIGNAL_COMPLETE Unknown
2016-06-29 04:48:30 [0]: SIGNAL_COMPLETE Unknown
Stack overcloud UPDATE_FAILED
Deployment failed: Heat Stack update failed.
heat resource-list shows:
+--------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time | stack_name |
+--------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+
| UpdateWorkflow | f7d617f0-2169-4aa4-b4df-bf46fc5b93c4 | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_FAILED | 2016-06-29T03:46:14 | overcloud |
| 0 | fce68cc0-623a-4a77-b6bd-667a796e9419 | OS::Heat::SoftwareDeployment | CREATE_FAILED | 2016-06-29T04:10:31 | overcloud-UpdateWorkflow-pmws3iu75y6x-ControllerPacemakerUpgradeDeployment_Step2-uwcdndxbfyhk |
| ControllerPacemakerUpgradeDeployment_Step2 | 3851abff-44a5-4702-b466-4b411baf6055 | OS::Heat::SoftwareDeploymentGroup | CREATE_FAILED | 2016-06-29T04:10:31 | overcloud-UpdateWorkflow-pmws3iu75y6x |
+--------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+---------------------+-----------------------------------------------------------------------------------------------+
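For longer stacks it can help to filter the table down to the failed rows. The following is a minimal sketch, assuming the `heat resource-list` output has been saved as text; the function name `failed_resources`, the trimmed two-column sample, and the `HypotheticalHealthyResource` row are illustrative and not part of the original report.

```python
def failed_resources(table_text):
    """Return (resource_name, resource_status) pairs whose status ends in _FAILED."""
    header = None
    failed = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +-----+ border lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first pipe-delimited row holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("resource_status", "").endswith("_FAILED"):
            failed.append((row["resource_name"], row["resource_status"]))
    return failed


# Trimmed copy of the table above; the last row is hypothetical and only
# shows that non-failed resources are filtered out.
sample = """\
+--------------------------------------------+-----------------+
| resource_name                              | resource_status |
+--------------------------------------------+-----------------+
| UpdateWorkflow                             | UPDATE_FAILED   |
| 0                                          | CREATE_FAILED   |
| ControllerPacemakerUpgradeDeployment_Step2 | CREATE_FAILED   |
| HypotheticalHealthyResource                | CREATE_COMPLETE |
+--------------------------------------------+-----------------+
"""

for name, status in failed_resources(sample):
    print(name, status)
```

The innermost failed resource (here the SoftwareDeployment named 0) is usually the one whose ID is worth passing to `heat deployment-show`.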
[stack@undercloud72 ~]$ echo -e `heat deployment-show fce68cc0-623a-4a77-b6bd-667a796e9419`
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.0.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
SubjectAltNameWarning
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
/usr/lib/python2.7/site-packages/requests/packages/urllib3/connection.py:303: SubjectAltNameWarning: Certificate for 192.168.0.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
SubjectAltNameWarning
{ "status": "FAILED", "server_id": "a091da3b-978f-4b73-bea2-318d89ee27b9", "config_id": "68c5ddd1-16d6-42c5-817b-07978ac753a6", "output_values": { "deploy_stdout": "overcloud-controller-1: Starting Cluster...
overcloud-controller-0: Starting Cluster...
overcloud-controller-2: Starting Cluster...
OFFLINE: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
OFFLINE: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
OFFLINE: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
OFFLINE: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
ip-192.168.200.10 has started
ip-10.19.94.10 has started
ip-192.168.0.6 has started
ip-10.19.95.10 has started
ip-10.19.184.180 has started
ip-10.19.94.11 has started
galera has started
mongod has started
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 32
Galera cluster node is synced.
Running upgrade for neutron ...
OK
Running upgrade for bsn_extensions ...
OK
Running upgrade for networking-cisco ...
OK
Running upgrade for networking-odl ...
OK
Running upgrade for neutron-lbaas ...
OK
memcached has started
rabbitmq has started
redis has started
ERROR: cluster remained unstable for more than 1800 seconds, exiting.
", "deploy_stderr": "Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
Option \"os_endpoint_type\" from group \"service_credentials\" is deprecated. Use option \"interface\" from group \"service_credentials\".
Value 'password-ceilometer-legacy' for '[service_credentials]/auth_type' is deprecated. And will be removed in Ceilometer 7.0. Use 'password' instead.
2016-07-30 02:13:48.665 10797 WARNING oslo_reports.guru_meditation_report [-] Guru mediation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Option \"logdir\" from group \"DEFAULT\" is deprecated. Use option \"log-dir\" from group \"DEFAULT\".
2016-07-30 02:13:52.938 10900 WARNING py.warnings [-] /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:241: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported
exception.NotSupportedWarning
2016-07-30 02:13:53.149 10900 INFO migrate.versioning.api [-] 60 -> 61...
2016-07-30 02:13:53.184 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.185 10900 INFO migrate.versioning.api [-] 61 -> 62...
2016-07-30 02:13:53.214 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.214 10900 INFO migrate.versioning.api [-] 62 -> 63...
2016-07-30 02:13:53.223 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.223 10900 INFO migrate.versioning.api [-] 63 -> 64...
2016-07-30 02:13:53.245 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.246 10900 INFO migrate.versioning.api [-] 64 -> 65...
2016-07-30 02:13:53.293 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.294 10900 INFO migrate.versioning.api [-] 65 -> 66...
2016-07-30 02:13:53.336 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.337 10900 INFO migrate.versioning.api [-] 66 -> 67...
2016-07-30 02:13:53.365 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.365 10900 INFO migrate.versioning.api [-] 67 -> 68...
2016-07-30 02:13:53.374 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.374 10900 INFO migrate.versioning.api [-] 68 -> 69...
2016-07-30 02:13:53.383 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.383 10900 INFO migrate.versioning.api [-] 69 -> 70...
2016-07-30 02:13:53.392 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.392 10900 INFO migrate.versioning.api [-] 70 -> 71...
2016-07-30 02:13:53.401 10900 INFO migrate.versioning.api [-] done
2016-07-30 02:13:53.401 10900 INFO migrate.versioning.api [-] 71 -> 72...
2016-07-30 02:13:53.410 10900 INFO migrate.versioning.api [-] done
Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:1056: OsloDBDeprecationWarning: EngineFacade is deprecated; please use oslo_db.sqlalchemy.enginefacade
expire_on_commit=expire_on_commit, _conf=conf)
2016-07-30 02:13:56.042 11000 WARNING oslo_config.cfg [-] Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
2016-07-30 02:13:56.108 11000 INFO migrate.versioning.api [-] 65 -> 66...
2016-07-30 02:13:56.116 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.116 11000 INFO migrate.versioning.api [-] 66 -> 67...
2016-07-30 02:13:56.124 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.124 11000 INFO migrate.versioning.api [-] 67 -> 68...
2016-07-30 02:13:56.131 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.132 11000 INFO migrate.versioning.api [-] 68 -> 69...
2016-07-30 02:13:56.138 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.139 11000 INFO migrate.versioning.api [-] 69 -> 70...
2016-07-30 02:13:56.146 11000 INFO migrate.versioning.api [-] done
2016-07-30 02:13:56.146 11000 INFO migrate.versioning.api [-] 70 -> 71...
2016-07-30 02:13:56.177 11000 INFO migrate.versioning.api [-] done
Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
No handlers could be found for logger \"oslo_config.cfg\"
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 34af2b5c5a59 -> 59cb5b6cf4d, Add availability zone
INFO [alembic.runtime.migration] Running upgrade 59cb5b6cf4d -> 13cfb89f881a, add is_default to subnetpool
INFO [alembic.runtime.migration] Running upgrade 13cfb89f881a -> 32e5974ada25, Add standard attribute table
INFO [alembic.runtime.migration] Running upgrade 32e5974ada25 -> ec7fcfbf72ee, Add network availability zone
INFO [alembic.runtime.migration] Running upgrade ec7fcfbf72ee -> dce3ec7a25c9, Add router availability zone
INFO [alembic.runtime.migration] Running upgrade dce3ec7a25c9 -> c3a73f615e4, Add ip_version to AddressScope
INFO [alembic.runtime.migration] Running upgrade c3a73f615e4 -> 659bf3d90664, Add tables and attributes to support external DNS integration
INFO [alembic.runtime.migration] Running upgrade 659bf3d90664 -> 1df244e556f5, add_unique_ha_router_agent_port_bindings
INFO [alembic.runtime.migration] Running upgrade 1df244e556f5 -> 19f26505c74f, Auto Allocated Topology - aka Get-Me-A-Network
INFO [alembic.runtime.migration] Running upgrade 19f26505c74f -> 15be73214821, add dynamic routing model data
INFO [alembic.runtime.migration] Running upgrade 15be73214821 -> b4caf27aae4, add_bgp_dragent_model_data
INFO [alembic.runtime.migration] Running upgrade b4caf27aae4 -> 15e43b934f81, rbac_qos_policy
INFO [alembic.runtime.migration] Running upgrade 15e43b934f81 -> 31ed664953e6, Add resource_versions row to agent table
INFO [alembic.runtime.migration] Running upgrade 31ed664953e6 -> 2f9e956e7532, tag support
INFO [alembic.runtime.migration] Running upgrade 2f9e956e7532 -> 3894bccad37f, add_timestamp_to_base_resources
INFO [alembic.runtime.migration] Running upgrade 3894bccad37f -> 0e66c5227a8a, Add desc to standard attr table
INFO [alembic.runtime.migration] Running upgrade 4af11ca47297 -> 1b294093239c, Drop embrane plugin table
INFO [alembic.runtime.migration] Running upgrade 1b294093239c, 32e5974ada25 -> 8a6d8bdae39, standardattributes migration
INFO [alembic.runtime.migration] Running upgrade 8a6d8bdae39 -> 2b4c2465d44b, DVR sheduling refactoring
INFO [alembic.runtime.migration] Running upgrade 2b4c2465d44b -> e3278ee65050, Drop NEC plugin tables
INFO [alembic.runtime.migration] Running upgrade e3278ee65050, 15e43b934f81 -> c6c112992c9, rbac_qos_policy
INFO [alembic.runtime.migration] Running upgrade c6c112992c9 -> 5ffceebfada, network_rbac_external
INFO [alembic.runtime.migration] Running upgrade 5ffceebfada, 0e66c5227a8a -> 4ffceebfcdc, standard_desc
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 11ba2d65c8de -> 2e89171ea204, Add baremetal channel-group and is_native
INFO [alembic.runtime.migration] Running upgrade 2e89171ea204 -> 13bd9ebffbf5, Add support for UCSM Service Profile Templates.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> b89a299e19f9, Initial odl db, branchpoint
INFO [alembic.runtime.migration] Running upgrade b89a299e19f9 -> 383acb0d38a0, Start of odl contract branch
INFO [alembic.runtime.migration] Running upgrade b89a299e19f9 -> 247501328046, Start of odl expand branch
INFO [alembic.runtime.migration] Running upgrade 247501328046 -> 37e242787ae5, Opendaylight Neutron mechanism driver refactor
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 3345facd0452 -> 4a408dd491c2, Addition of Name column to lbaas_members and lbaas_healthmonitors table
INFO [alembic.runtime.migration] Running upgrade 4a408dd491c2 -> 3426acbc12de, Add flavor id
INFO [alembic.runtime.migration] Running upgrade 3426acbc12de -> 6aee0434f911, independent pools
INFO [alembic.runtime.migration] Running upgrade 6aee0434f911 -> 3543deab1547, add_l7_tables
INFO [alembic.runtime.migration] Running upgrade 3543deab1547 -> 62deca5010cd, Add tenant-id index for L7 tables
Option \"verbose\" from group \"DEFAULT\" is deprecated for removal. Its value may be silently ignored in the future.
Option \"notification_driver\" from group \"DEFAULT\" is deprecated. Use option \"driver\" from group \"oslo_messaging_notifications\".
Option \"notification_topics\" from group \"DEFAULT\" is deprecated. Use option \"topics\" from group \"oslo_messaging_notifications\".
", "deploy_status_code": 1 }, "creation_time": "2016-06-29T04:10:35", "updated_time": "2016-06-29T04:48:26", "input_values": { "update_identifier": "", "deploy_identifier": 1467171750 }, "action": "CREATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 1", "id": "fce68cc0-623a-4a77-b6bd-667a796e9419" }
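Most of the output above is migration noise; the relevant parts are the status, the exit code, and the single ERROR line in deploy_stdout. A minimal sketch of pulling those out, assuming the captured text contains one JSON object starting at the first `{` and that the warning lines before it contain no braces. The function name `deployment_errors` and the shortened sample are illustrative only.

```python
import json


def deployment_errors(raw):
    """Summarize a captured `heat deployment-show` dump."""
    # Skip the urllib3/deprecation warnings printed before the JSON object.
    payload = raw[raw.index("{"):]
    # strict=False accepts literal newlines inside JSON strings, which is
    # what `echo -e` produces when it expands the escaped "\n" sequences.
    data = json.loads(payload, strict=False)
    outputs = data.get("output_values", {})
    stdout = outputs.get("deploy_stdout", "")
    return {
        "status": data.get("status"),
        "exit_code": outputs.get("deploy_status_code"),
        "errors": [l for l in stdout.splitlines() if l.startswith("ERROR")],
    }


# Shortened copy of the output above, keeping only a few stdout lines.
sample = '''WARNING (shell) "heat deployment-show" is deprecated
{ "status": "FAILED", "output_values": { "deploy_stdout": "rabbitmq has started
redis has started
ERROR: cluster remained unstable for more than 1800 seconds, exiting.
", "deploy_status_code": 1 } }'''

summary = deployment_errors(sample)
print(summary["status"], summary["exit_code"])
for line in summary["errors"]:
    print(line)
```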
It looks like the root cause is bug 1343905: RabbitMQ is among the resources that failed to start in pacemaker, and there is no running_nodes entry in the cluster status:
[root@overcloud-controller-0 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller-0' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0',
'rabbit@overcloud-controller-1']}]},
{alarms,[]}]
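The same check can be scripted. This is a rough sketch, assuming the Erlang-term output format shown above; it is a regex scan rather than a real Erlang term parser, and the helper names are made up for illustration.

```python
import re


def nodes_in(section, text):
    """Return node names listed under {section,[...]} or [] if it is absent."""
    pattern = r"\{%s,\s*\[(.*?)\]\}" % re.escape(section)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return []
    return re.findall(r"[\w.-]+@[\w.-]+", match.group(1))


def cluster_summary(text):
    """Collect configured disc nodes and currently running nodes."""
    return {
        "disc": nodes_in("disc", text),
        "running": nodes_in("running_nodes", text),
    }


# Output copied from the controller above: no running_nodes entry at all.
broken = """Cluster status of node 'rabbit@overcloud-controller-0' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0',
'rabbit@overcloud-controller-1']}]},
{alarms,[]}]"""

summary = cluster_summary(broken)
if not summary["running"]:
    print("no running RabbitMQ nodes; configured:", summary["disc"])
```

Note that the disc list above names only controller-0 and controller-1, so a script like this could also flag that overcloud-controller-2 is missing from the configured nodes.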
The cause could be that the resource-agents package in this environment is not recent enough.
[root@overcloud-controller-0 ~]# rpm -q resource-agents
resource-agents-3.9.5-76.el7.x86_64
To pull in the latest fixes that allow RabbitMQ to reform the cluster properly, we need resource-agents-3.9.5-80.el7.x86_64 or newer.
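To check several controllers against that minimum, the comparison can be approximated as below. This is a simplified sketch and not RPM's full version-comparison algorithm: it assumes `name-version-release.arch` strings as printed by `rpm -q`, a dotted numeric version, and a release that starts with an integer. The function names are hypothetical.

```python
import re


def parse_nvra(package):
    """Split 'name-version-release.arch' into (name, version, release)."""
    base, _, _arch = package.rpartition(".")
    name, version, release = base.rsplit("-", 2)
    return name, version, release


def sort_key(version, release):
    """Numeric version tuple plus the leading integer of the release."""
    numbers = tuple(int(part) for part in version.split("."))
    build = int(re.match(r"\d+", release).group())
    return numbers, build


def is_at_least(installed, required):
    """True if the installed package is the required build or newer."""
    _, inst_version, inst_release = parse_nvra(installed)
    _, req_version, req_release = parse_nvra(required)
    return sort_key(inst_version, inst_release) >= sort_key(req_version, req_release)


installed = "resource-agents-3.9.5-76.el7.x86_64"  # from rpm -q above
required = "resource-agents-3.9.5-80.el7.x86_64"   # minimum named above

if not is_at_least(installed, required):
    print("resource-agents is too old:", installed)
```

On a real system, `rpm` or `yum` should remain the authority on version ordering; the sketch only covers the simple numeric case seen in this report.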