Hide Forgot
Completed the Major Update to run the Keystone Upgrade (-e ~/pilot/templates/overcloud/environments/major-upgrade-keystone-liberty-mitaka.yaml ) and when finished, found the following error on the controllers: Failed Actions: * openstack-keystone_start_0 on overcloud-controller-0 'not running' (7): call=893, status=complete, exitreason='none', last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2068ms * openstack-keystone_start_0 on overcloud-controller-1 'not running' (7): call=893, status=complete, exitreason='none', last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2076ms * openstack-keystone_start_0 on overcloud-controller-2 'not running' (7): call=884, status=complete, exitreason='none', last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2086ms Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled Finished: Step returns complete heat resource-list shows everything COMPLETE heat deployment-list show everything COMPLETE As this step deletes the keystone service, this should not there. I tried running this two times more to the same affect. Also once I did a PCS resource delete everything cleared up and worked. Cluster passed Sanity Test.
So to be accurate, 1. Complete Major Upgrade Steps to beginning KeyStone Upgrade 2. Sanity still worked at this point 3. Ran overcloud ... with major-upgrade-keystone-liberty-mitaka.yaml 4. Ran to COMPLETE 5. All items said COMPLETE 6. pcs status showed the errors 7. Reran the step 8. Same results 9. Checked the Yaml, saw pcs resource delete keystone should happen 10. Ran that 11. Pacemaker cleaned up in full 12. Sanity now passes.
Wanted to add that this seemed to happen when the deployment hit a registration issue, which may have been related.
Just to rule out any low hanging weirdness could you run please and add the output if any. for all the controllers if you see any difference in anyone of them ssh heat-admin@overcloud-controller-0 sudo pcs resource show "openstack-core-clone" ssh heat-admin@overcloud-controller-0 sudo pcs resource show "openstack-keystone-clone"
I cannot as the Test Cloud is no long available.
One possibility that I can see is in the center of this issue is this loop https://github.com/openstack/tripleo-heat-templates/blob/stable/mitaka/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh#L49 I'll talk to people to see if someone can have an idea why that bit is failing
One possibility that we discussed was that somehow it didn't have a pacemaker service called openstack-keystone-clone in the beginning of the migration which would be very unusual. @randy am I safe to assume that we don't have the sos reports for that test cloud? it is going to be hard to pin this without them.
I'm going to CLOSED_WORKSFORME after discussion with Randy. There is CI in place that exercises this code path, but I want to ask Adriano to confirm a clean manual run one more time. @Adriano if you hit an issue with a manual run please reopen with additional details. If not, please confirm the manual run has been completed without triggering the keystone issue described and clear the needinfo flag. Thanks!
Just confirming it that I was not able to reproduce it.