Bug 1396122 - Keystone service not removed after Major Upgrade keystone-update Step
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Adriano Petrich
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks: 1305654
Reported: 2016-11-17 13:40 UTC by Randy Perryman
Modified: 2016-12-29 16:57 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-18 16:19:20 UTC
Target Upstream Version:



Description Randy Perryman 2016-11-17 13:40:11 UTC
Completed the major upgrade steps up to the Keystone upgrade, then ran the Keystone upgrade (-e ~/pilot/templates/overcloud/environments/major-upgrade-keystone-liberty-mitaka.yaml). When it finished, I found the following errors on the controllers:

Failed Actions:
* openstack-keystone_start_0 on overcloud-controller-0 'not running' (7): call=893, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2068ms
* openstack-keystone_start_0 on overcloud-controller-1 'not running' (7): call=893, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2076ms
* openstack-keystone_start_0 on overcloud-controller-2 'not running' (7): call=884, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2086ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


Finished:
Step returns complete
heat resource-list shows everything COMPLETE
heat deployment-list shows everything COMPLETE

Because this step deletes the keystone service, these failed actions should not be there. I reran the step two more times, to the same effect. Once I ran a pcs resource delete, everything cleared up and worked.
Cluster passed Sanity Test.
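
For anyone checking several controllers for the same symptom, a minimal sketch of how the "Failed Actions" entries above could be pulled out of pcs status output. The function name list_failed_actions is hypothetical, and the parsing assumes the entry layout shown in this report ("* <operation> on <node> ..."), which may differ in other pcs versions.

```shell
# Print "<operation> <node>" for each entry in the "Failed Actions:"
# section of `pcs status` output read from stdin.
# Assumption: entries look like the ones in this report:
#   * <operation> on <node> '<reason>' (<rc>): ...
list_failed_actions() {
    awk '
        /^Failed Actions:/ { in_section = 1; next }
        # A blank line ends the section.
        in_section && /^[[:space:]]*$/ { in_section = 0 }
        in_section && /^\* / { print $2, $4 }
    '
}
```

On a controller this could be used as: sudo pcs status | list_failed_actions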

Comment 1 Randy Perryman 2016-11-17 14:00:32 UTC
So to be accurate, 
1. Completed the major upgrade steps up to the beginning of the Keystone upgrade
2. Sanity still worked at this point
3. Ran overcloud ... with major-upgrade-keystone-liberty-mitaka.yaml
4. Ran to COMPLETE
5. All items said COMPLETE
6. pcs status showed the errors
7. Reran the step
8. Same results
9. Checked the YAML, saw that pcs resource delete keystone should happen
10. Ran that
11. Pacemaker cleaned up in full
12. Sanity now passes.
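
The manual workaround in steps 9 to 11 could be sketched roughly as below. This is not the command that was actually run, which the report does not quote in full. The function name delete_leftover_resource and the PCS override variable are hypothetical, and the resource name is left to the caller because the exact name that was deleted is not stated here.

```shell
# PCS can be overridden (for example PCS=echo) to preview the commands.
PCS="${PCS:-pcs}"

# Delete a Pacemaker resource only if it is still defined.
# Assumption: the caller supplies the resource name; this report only
# says that "pcs resource delete keystone" was expected to happen.
delete_leftover_resource() {
    resource="$1"
    if $PCS resource show "$resource" >/dev/null 2>&1; then
        $PCS resource delete "$resource"
    else
        echo "$resource is not defined; nothing to delete"
    fi
}
```

It would be called on one controller with the resource name, after confirming that name with pcs status.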

Comment 2 Mike Orazi 2016-11-17 14:59:59 UTC
Wanted to add that this seemed to happen when the deployment hit a registration issue, which may have been related.

Comment 5 Adriano Petrich 2016-11-17 16:48:13 UTC
Just to rule out any low-hanging weirdness, could you please run the following on all the controllers and add the output, if any? Note whether you see any difference between them.

ssh heat-admin@overcloud-controller-0 sudo pcs resource show "openstack-core-clone"

ssh heat-admin@overcloud-controller-0 sudo pcs resource show "openstack-keystone-clone"
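
To cover all the controllers in one go, the two commands above could be wrapped in a loop. A minimal sketch: the function name show_keystone_resources and the SSH override variable are hypothetical, and the host names are assumed to be the three controllers named in the failed actions.

```shell
# SSH can be overridden (for example SSH=echo) to print the commands
# instead of running them.
SSH="${SSH:-ssh}"

# Show both clone resources on every controller so the outputs can be
# compared. Host names are taken from the failed actions in this report.
show_keystone_resources() {
    for host in overcloud-controller-0 overcloud-controller-1 overcloud-controller-2; do
        for resource in openstack-core-clone openstack-keystone-clone; do
            echo "== $host: $resource =="
            # stdin is redirected so ssh cannot consume the caller's input
            $SSH "heat-admin@$host" sudo pcs resource show "$resource" </dev/null
        done
    done
}
```

Redirecting the output of show_keystone_resources to a file would make it easy to attach to the bug.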

Comment 6 Randy Perryman 2016-11-17 19:40:42 UTC
I cannot, as the test cloud is no longer available.

Comment 8 Adriano Petrich 2016-11-18 10:27:31 UTC
One possibility I can see is that this loop is at the center of the issue:

https://github.com/openstack/tripleo-heat-templates/blob/stable/mitaka/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh#L49

I'll talk to people to see if anyone has an idea why that bit is failing.

Comment 9 Adriano Petrich 2016-11-18 10:39:01 UTC
One possibility that we discussed was that, somehow, there was no Pacemaker resource called openstack-keystone-clone at the beginning of the migration, which would be very unusual.

@Randy, am I safe to assume that we don't have the sos reports for that test cloud?

It is going to be hard to pin this down without them.

Comment 10 Mike Orazi 2016-11-18 16:19:20 UTC
I'm going to close this as CLOSED WORKSFORME after discussion with Randy. There is CI in place that exercises this code path, but I want to ask Adriano to confirm a clean manual run one more time.

@Adriano if you hit an issue with a manual run please reopen with additional details.  If not, please confirm the manual run has been completed without triggering the keystone issue described and clear the needinfo flag.

Thanks!

Comment 11 Adriano Petrich 2016-11-22 14:46:11 UTC
Just confirming that I was not able to reproduce it.

