Created attachment 1263617[details]
sosreport controller-0
Description of problem:
OSP10 -> OSP11 upgrade fails when Ironic services are enabled on a monolithic controller deployment. The major-upgrade-composable-steps fails after 4h which is the stack update timeout which indicates that it's getting stuck. The cause for this is that during the upgrade process the openstack-nova-compute.service is trying to start on the controller nodes while the rabbitmq servers are down(pacemaker cluster is down).
Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-6.0.0-0.20170307170102.3134785.0rc2.el7ost.noarch
How reproducible:
100%
Steps to Reproduce:
1. Deploy OSP10 with monolithic controllers and Ironic overcloud services activated
2. Run OSP10->OSP11 upgrade
Actual results:
Deployment fails during major-upgrade-composable-steps, after 4h timeout.
Expected results:
Deployment doesn't get stuck.
Additional info:
Attaching sosreports.
Note: this could be the same issue as with BZ#1431988 but I reported it as a separate one because the manifestation is different (timeout vs failure) and the topology where it shows up is different.
Comment 2Sofer Athlan-Guyot
2017-03-24 12:22:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:1245
Created attachment 1263617 [details] sosreport controller-0 Description of problem: OSP10 -> OSP11 upgrade fails when Ironic services are enabled on a monolithic controller deployment. The major-upgrade-composable-steps fails after 4h which is the stack update timeout which indicates that it's getting stuck. The cause for this is that during the upgrade process the openstack-nova-compute.service is trying to start on the controller nodes while the rabbitmq servers are down(pacemaker cluster is down). Version-Release number of selected component (if applicable): openstack-tripleo-heat-templates-6.0.0-0.20170307170102.3134785.0rc2.el7ost.noarch How reproducible: 100% Steps to Reproduce: 1. Deploy OSP10 with monolithic controllers and Ironic overcloud services activated 2. Run OSP10->OSP11 upgrade Actual results: Deployment fails during major-upgrade-composable-steps, after 4h timeout. Expected results: Deployment doesn't get stuck. Additional info: Attaching sosreports.