The OSP 10 -> 11 upgrade fails when keystone is running on a separate node. Console output can be found at [1].

- Checking overcloud failures:

$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerUpgrade_Step0.2:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 3594da32-b7ca-4671-8821-df29642f4296
  status: CREATE_FAILED
  status_reason: |
    Error: resources[2]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [Check if gnocchi_statsd is deployed] *************************************
    changed: [localhost]

    TASK [PreUpgrade step0,validation: Check service openstack-gnocchi-statsd is running] ***
    fatal: [localhost]: FAILED! => {"changed": true, "cmd": "/usr/bin/systemctl show 'openstack-gnocchi-statsd' --property ActiveState | grep '\\bactive\\b'", "delta": "0:00:00.007118", "end": "2017-05-02 16:30:53.660109", "failed": true, "rc": 1, "start": "2017-05-02 16:30:53.652991", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/90e57c29-f7c1-466f-8f9d-b5bc4febc04c_playbook.retry

    PLAY RECAP *********************************************************************
    localhost : ok=41 changed=37 unreachable=0 failed=1

    (truncated, view all with --long)
  deploy_stderr: |

- /var/log/gnocchi/statsd.log displays the following:
http://paste.openstack.org/show/608617/

Manually restarting "openstack-gnocchi-statsd" works after the failure, so it may be that an Ansible step is needed to restart statsd.

[1] https://rhos-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/RHOS/view/RHOS11/job/qe-DFG-security-poc-upgrades-10-11-rhel-7.3-virt-3cont_3keystone_1comp-ipv4-vxlan-ceph-ussl-yes-ossl-no/3/consoleFull
I see 2 scenarios:

Scenario 001
============
From what I've seen in your logs, it's failing at this command:

/usr/bin/systemctl show 'openstack-gnocchi-statsd' --property ActiveState | grep '\bactive\b'

which clearly means Gnocchi Statsd wasn't running properly before the upgrade.
Could you confirm that the service was running properly beforehand?
Could you provide all Gnocchi logs from /var/log/gnocchi? Even a sosreport would be super useful.

Scenario 002
============
It's possible that during the upgrade process:
1) httpd is stopped on a host (where Keystone is running under WSGI)
2) Gnocchi Statsd is started on another host in the same step, and can't reach the Keystone endpoints on the other nodes because httpd hasn't been started yet.

It could be a race condition in the upgrade process if both tasks happen in the same upgrade step. Or it could just be an adjustment to make to the steps.
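For reference, the check from Scenario 001 can be reproduced by hand on the node before starting the upgrade. The sketch below is only an illustration: is_active_state and check_unit are hypothetical helper names, not part of the upgrade tooling, and it assumes systemctl prints a single ActiveState=<state> line for the requested property. It applies the same rule as the grep '\bactive\b' filter, where only the exact state "active" passes and states such as "inactive", "activating" or "failed" do not.

```shell
# Hypothetical helper: succeed only when the given `systemctl show` line
# reports exactly ActiveState=active.
is_active_state() {
    case "$1" in
        ActiveState=active) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: query one unit and report whether it is active.
# Assumes systemctl is available on the node where this is run.
check_unit() {
    state_line=$(systemctl show "$1" --property ActiveState)
    if is_active_state "$state_line"; then
        echo "$1: active"
    else
        echo "$1: NOT active ($state_line)"
        return 1
    fi
}
```

Running check_unit openstack-gnocchi-statsd on the affected controller before the upgrade would show whether the service was already down, which is what Scenario 001 assumes.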
Created attachment 1275726
/var/log/gnocchi/statsd.log file

Here are the complete logs for statsd. After the failure, the logs show everything working once the service was manually restarted. Also, the service appears to have been getting errors while trying to authenticate against keystone before the error about the object-store endpoint.
Looking at the logs:

2017-05-02 14:29:21.891 96273 ERROR gnocchi ClientException: Endpoint for object-store not found - have you specified a region?

This error is timestamped about two hours before the validation task failed (16:30:53), so it's pretty clear that Gnocchi Statsd wasn't working properly before the upgrade process.
After a little investigation, this is not a bug in the upgrade but in OSP10.
When deploying Keystone and Gnocchi Statsd on 2 different nodes, there is a race condition in the deployment, within step 5, where Gnocchi could be started before the Keystone endpoints are created. See:

https://github.com/openstack/puppet-tripleo/blob/stable/newton/manifests/profile/base/gnocchi/statsd.pp#L31-L35
https://github.com/openstack/puppet-tripleo/blob/stable/newton/manifests/profile/base/keystone.pp#L128

That's why Gnocchi Statsd worked well after you restarted it.

Note: the bug has been fixed in OSP11, since we now manage Keystone resources at step 3:
https://github.com/openstack/puppet-tripleo/blob/stable/ocata/manifests/profile/base/keystone.pp#L210

In other words:
1) OSP10 has a bug where Gnocchi Statsd doesn't work when Keystone is not colocated.
2) OSP11 fails to upgrade when Gnocchi Statsd and Keystone are not colocated because of 1).
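As a manual workaround on an affected OSP10 deployment, the restart mentioned earlier can be delayed until the object-store endpoint is visible in the Keystone catalog. This is only a rough sketch and not the fix itself: retry_until, object_store_endpoint_exists and restart_statsd_when_ready are hypothetical helper names, the attempt count and delay are arbitrary, and it assumes that credentials are already sourced and that "openstack catalog show object-store" exits non-zero while the endpoint is missing.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts, and succeed as soon as the command succeeds.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Assumption: the CLI returns a non-zero status when the service is not
# present in the catalog yet.
object_store_endpoint_exists() {
    openstack catalog show object-store >/dev/null 2>&1
}

# Wait up to 30 x 10 seconds for the endpoint, then restart statsd.
restart_statsd_when_ready() {
    retry_until 30 10 object_store_endpoint_exists &&
        sudo systemctl restart openstack-gnocchi-statsd
}
```

Calling restart_statsd_when_ready on the node running Gnocchi Statsd only automates the manual restart that was already shown to work; it does not remove the ordering problem in the deployment steps.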
The upstream fix in gnocchi is under review. It needs to be backported to the 3.0 branch once it merges.
For what it's worth, I was able to hit this same exact issue with an upgrade I was attempting on the Manila side of things. This was an Infrared deployment with 3 controllers and 2 compute nodes with the only extra configuration being from setting up the NetApp cDOT Manila Driver during the OSP-10 Overcloud deployment. I'll see if applying the patch referenced in Gerrit fixes the issues that I'm seeing in Gnocchi and allows the upgrade to succeed.
According to our records, this should be resolved by openstack-gnocchi-3.0.14-1.el7ost. This build is available now.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3230