Bug 1396122

Summary: Keystone service not removed after Major Upgrade keystone-update Step
Product: Red Hat OpenStack
Reporter: Randy Perryman <randy_perryman>
Component: rhosp-director
Assignee: Adriano Petrich <apetrich>
Status: CLOSED WORKSFORME
QA Contact: Omri Hochman <ohochman>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.0 (Liberty)
CC: abeekhof, apetrich, arkady_kanevsky, cdevine, christopher_dearborn, dbecker, dcain, John_walsh, kasmith, kurt_hey, mandreou, mburns, michele, morazi, randy_perryman, rhel-osp-director-maint, smerrow, sreichar
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2016-11-18 16:19:20 UTC
Type: Bug
Bug Blocks: 1305654    

Description Randy Perryman 2016-11-17 13:40:11 UTC
Ran the Keystone upgrade step of the major upgrade (-e ~/pilot/templates/overcloud/environments/major-upgrade-keystone-liberty-mitaka.yaml). When it finished, I found the following errors on the controllers:

Failed Actions:
* openstack-keystone_start_0 on overcloud-controller-0 'not running' (7): call=893, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2068ms
* openstack-keystone_start_0 on overcloud-controller-1 'not running' (7): call=893, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2076ms
* openstack-keystone_start_0 on overcloud-controller-2 'not running' (7): call=884, status=complete, exitreason='none',
    last-rc-change='Wed Nov 16 16:16:51 2016', queued=0ms, exec=2086ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


Finished:
Step returns complete
heat resource-list shows everything COMPLETE
heat deployment-list shows everything COMPLETE

Because this step deletes the keystone service, the resource should not be there. I reran the step two more times with the same effect. Once I ran a pcs resource delete by hand, everything cleared up and worked.
The cluster then passed the Sanity Test.
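
The mismatch described above (Heat reports COMPLETE while Pacemaker still records failures) can be surfaced by pulling the failed actions out of the pcs status text. The following is only a sketch: the function name failed_actions is hypothetical, and it assumes the "Failed Actions:" layout shown in this report.

```shell
#!/bin/sh
# Hypothetical helper: reads `pcs status` text on stdin and prints one
# "<operation> <node>" pair per failed action.
# Assumes failure lines look like: * <operation> on <node> <reason> ...
failed_actions() {
    awk '
        /^Failed Actions:/ { in_failed = 1; next }       # section starts
        in_failed && /^[[:space:]]*$/ { in_failed = 0 }  # blank line ends it
        in_failed && $1 == "*" && $3 == "on" { print $2, $4 }
    '
}

# Usage on a controller (not run here): sudo pcs status | failed_actions
```

An empty result after the keystone step would be the expected state; any openstack-keystone_start_0 line means the resource is still defined in the cluster.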

Comment 1 Randy Perryman 2016-11-17 14:00:32 UTC
So, to be accurate:
1. Completed the major upgrade steps up to the beginning of the Keystone upgrade
2. Sanity still worked at this point
3. Ran overcloud ... with major-upgrade-keystone-liberty-mitaka.yaml
4. Ran to COMPLETE
5. All items said COMPLETE
6. pcs status showed the errors
7. Reran the step
8. Same results
9. Checked the YAML, saw that pcs resource delete keystone should happen
10. Ran that command by hand
11. Pacemaker cleaned up in full
12. Sanity now passes.
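
Steps 9 to 11 amount to the manual workaround sketched below. This is not the upgrade script's own logic. The resource name openstack-keystone is inferred from the failed openstack-keystone_start_0 actions, and the PCS variable and the function name are hypothetical, added only so the commands can be previewed without touching a cluster.

```shell
#!/bin/sh
# PCS is a hypothetical override for dry runs, e.g. PCS="echo pcs".
# By default the real pcs binary is used.
PCS=${PCS:-pcs}

# Delete the leftover keystone resource, then print the cluster status so
# the "Failed Actions" section can be checked by eye.
# The default name "openstack-keystone" is an assumption taken from the
# failed action names in this report.
remove_stale_keystone() {
    resource=${1:-openstack-keystone}
    $PCS resource delete "$resource" || return 1
    $PCS status
}

# Usage (as root on one controller, not run here): remove_stale_keystone
```

Because Pacemaker resources are cluster-wide, the delete only needs to run on one controller.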

Comment 2 Mike Orazi 2016-11-17 14:59:59 UTC
Wanted to add that this seemed to happen when the deployment hit a registration issue, which may have been related.

Comment 5 Adriano Petrich 2016-11-17 16:48:13 UTC
Just to rule out any low-hanging weirdness, could you please run the following and add the output, if any? Include the output for all the controllers if you see any difference between them.

ssh heat-admin@overcloud-controller-0 sudo pcs resource show "openstack-core-clone"

ssh heat-admin@overcloud-controller-0 sudo pcs resource show "openstack-keystone-clone"
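
To collect the same output from every controller in one pass, the two commands above can be wrapped in a loop. This is a sketch: the host names are the three controllers listed in the failed actions, and the SSH variable and function name are hypothetical.

```shell
#!/bin/sh
# SSH is a hypothetical override for previewing commands (e.g. SSH=echo).
SSH=${SSH:-ssh}

# Print the definition of both resources on every controller so the
# outputs can be compared side by side.
show_keystone_resources() {
    for node in overcloud-controller-0 overcloud-controller-1 overcloud-controller-2; do
        for res in openstack-core-clone openstack-keystone-clone; do
            echo "== $node: $res =="
            $SSH "heat-admin@$node" sudo pcs resource show "$res"
        done
    done
}

# Usage from the undercloud (not run here): show_keystone_resources > pcs-show.txt
```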

Comment 6 Randy Perryman 2016-11-17 19:40:42 UTC
I cannot, as the Test Cloud is no longer available.

Comment 8 Adriano Petrich 2016-11-18 10:27:31 UTC
One possibility I can see at the center of this issue is this loop:

https://github.com/openstack/tripleo-heat-templates/blob/stable/mitaka/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh#L49

I'll talk to people to see if anyone has an idea why that bit is failing.

Comment 9 Adriano Petrich 2016-11-18 10:39:01 UTC
One possibility that we discussed was that somehow there was no pacemaker service called openstack-keystone-clone at the beginning of the migration, which would be very unusual.
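
That precondition could be checked before the migration with something like the sketch below. The function name has_clone is hypothetical, and the check assumes clones are listed in pcs status as "Clone Set: <name> [<primitive>]".

```shell
#!/bin/sh
# Hypothetical pre-flight check: reads `pcs status` text on stdin and
# succeeds only if the named clone set is listed.
# Assumes the line format "Clone Set: <name> [<primitive>]".
has_clone() {
    grep -q "Clone Set: $1 \["
}

# Usage on a controller (not run here):
#   sudo pcs status | has_clone openstack-keystone-clone || echo "clone missing"
```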

@randy am I safe to assume that we don't have the sos reports for that test cloud?

It is going to be hard to pin this down without them.

Comment 10 Mike Orazi 2016-11-18 16:19:20 UTC
I'm going to close this as CLOSED_WORKSFORME after discussion with Randy. There is CI in place that exercises this code path, but I want to ask Adriano to confirm a clean manual run one more time.

@Adriano if you hit an issue with a manual run please reopen with additional details.  If not, please confirm the manual run has been completed without triggering the keystone issue described and clear the needinfo flag.

Thanks!

Comment 11 Adriano Petrich 2016-11-22 14:46:11 UTC
Just confirming that I was not able to reproduce it.