Bug 1386719
Summary: | OSP9 to OSP10 upgrade pingtest fails. | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Marios Andreou <mandreou> |
Component: | openstack-tripleo-heat-templates | Assignee: | Marios Andreou <mandreou> |
Status: | CLOSED ERRATA | QA Contact: | Omri Hochman <ohochman> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 10.0 (Newton) | CC: | dbecker, jcoufal, jjoyce, jschluet, mburns, morazi, rhel-osp-director-maint |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | 10.0 (Newton) | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-5.0.0-1.3.el7ost | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-12-14 16:22:45 UTC | Type: | Bug |
Bug Blocks: | 1337794 | ||
Attachments: |
Description
Marios Andreou
2016-10-19 13:55:14 UTC
Created attachment 1212155 [details]
pingtest output
Created attachment 1216574 [details]
pingtest_output after controllers upgraded
Update: the description is slightly inaccurate because I filed the BZ some time after the issue actually occurred. The description says that after converge I rebooted and then ran the pingtest. That is accurate; however, the pingtest issue first appears once the controllers have been upgraded successfully. So after UPDATE_COMPLETE, run the pingtest and it fails as in the attachment.

Created attachment 1216577 [details]
relevant journal messages from controller0
Some more poking today. I discovered that the swift services were down after the controllers were upgraded. We have a change in the hiera data that we use to determine which swift services to bring up. I opened a review at https://review.openstack.org/#/c/392680/ but it doesn't fully fix the problem (it gets further, but still fails; attaching a new log for this run).

Created attachment 1216586 [details]
pingtest after fixing swift, see comment #5

After some more poking today I suspect this may be related to overcloud password issues. Once the swift services are back up, and as you can see in the attachment from comment #6, the overcloud heat stack create fails on authorization. Looking at the controller0 logs I see, from the heat-engine log:

2016-11-03 10:34:44.513 7598 ERROR heat.engine.clients.keystoneclient [req-b4b24448-26c0-4618-9def-1edcc23eeb76 a3d2c3c619db4433a2da763bf966d7a3 f692f5e0499545028b7a0235d7480139 - - -] Domain admin client authentication failed

and from keystone.log:

2016-11-03 10:34:44.510 11829 WARNING keystone.auth.plugins.core [req-dd2b7ce4-56d1-48e7-ad3d-99b86f2dda5a - - - - -] Could not find domain: Default
2016-11-03 10:34:44.511 11829 WARNING keystone.common.wsgi [req-dd2b7ce4-56d1-48e7-ad3d-99b86f2dda5a - - - - -] Authorization failed. The request you have made requires authentication. from 192.0.2.14

I am going to reset the environment and include the fix from BZ 1388930 (which is about the overcloud password changing) as well as the fix for the swift services, and see if it still reproduces.

Today I included the fixup for the overcloudrc issue (BZ 1388930) but got the same result. After the controller upgrade (and with the swift services now running) the pingtest fails exactly as in the attachment from comment #6. I'll also attach some more logs, but it seems to be an issue with heat<-->keystone and the admin domain.

Created attachment 1217111 [details]
sanity check credentials are fixed with https://review.openstack.org/#/c/392593/

Created attachment 1217112 [details]
quite a bit of heat-engine.log; scroll to the end for the domain admin auth failure
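
Rather than scrolling through the full logs, the auth failures quoted in the comments above can be filtered out with a short script. This is only a rough sketch: the marker strings are copied from the heat-engine and keystone lines in this bug, while the regular expression and the function name `find_auth_failures` are assumptions made for illustration, based on the layout of those quoted lines rather than on any documented log format.

```python
import re
import sys

# Message fragments taken from the log lines quoted in this bug.
# Illustrative only; other failures may use different wording.
AUTH_FAILURE_MARKERS = (
    "Domain admin client authentication failed",
    "Could not find domain",
    "Authorization failed",
)

# Assumed line layout, inferred from the quoted lines:
# "<date> <time> <pid> <LEVEL> <logger> [<request context>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"\[(?P<ctx>[^\]]*)\] (?P<msg>.*)$"
)


def find_auth_failures(lines):
    """Return (timestamp, level, logger, message) for each auth-failure line."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not in the assumed layout (e.g. a traceback line)
        message = match.group("msg")
        if any(marker in message for marker in AUTH_FAILURE_MARKERS):
            hits.append(
                (match.group("ts"), match.group("level"),
                 match.group("logger"), message)
            )
    return hits


if __name__ == "__main__":
    # Log file paths are passed as arguments; none are assumed here.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for hit in find_auth_failures(handle):
                print(path, *hit)
```

Running it against both the heat-engine and keystone logs from controller0 and comparing timestamps should make it easier to see whether each heat failure lines up with a keystone "Could not find domain" warning, as it does in the lines quoted above.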
Thanks to shardy, I got a possible lead on the heat domain auth failure described in previous comments; it may be related to BZ 1388474. So I am going to use this bug to fix up the swift services not being started, as in comment #5 and the gerrit review linked above.

It has merged to both master and newton at https://review.openstack.org/#/c/393760/ so moving to POST. I did *not* get a chance to continue debugging the heat domain issue from comment #7, but we should file a new BZ for that.

Just pointing at the newton review rather than master.

Verified with openstack-tripleo-heat-templates-5.1.0-3.el7ost.noarch. As part of QE verification we've added a ping test against the overcloud workload to our automation and verified that it's reachable in between each of the upgrade steps.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHEA-2016-2948.html