Created attachment 1267575 [details]
sosreport compute node

Description of problem:
The environment went through a minor update and then a major upgrade, following the steps in the docs below:
Update: https://docs.google.com/document/d/1PUdFw3L_9J49jTjzkabfSaNOCxH8fPDrGkmYmCrmgbQ
Upgrade: https://docs.google.com/document/d/1IFJte2mjaOrvsbNNCVFMowMVFhIsKlG9g1sz8YRYGt0

The fixes mentioned in this BZ were then applied:
https://bugzilla.redhat.com/show_bug.cgi?id=1431556

The sosreport is attached.

Version-Release number of selected component (if applicable):
OSPd 11
python-openvswitch-2.6.1-10.git20161206.el7fdp.noarch
openvswitch-2.6.1-13.git20161206.el7fdp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Update the env
2. Upgrade the env
3. Try to boot an instance

Actual results:
Instance won't boot

Expected results:
Instance boots successfully

Additional info:
I did a quick check on the environment. Controller is failing on Step5 of puppet apply:

overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.0:
  resource_type: OS::Heat::StructuredDeployment

Error: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer::Collector/Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
Error: gnocchi-upgrade --config-file=/etc/gnocchi/gnocchi.conf returned 1 instead of one of [0]
Error: /Stage[main]/Tripleo::Profile::Base::Gnocchi::Api/Exec[run gnocchi upgrade with storage]/returns: change from notrun to 0 failed: gnocchi-upgrade --config-file=/etc/gnocchi/gnocchi.conf

ceilometer-upgrade is failing. This is the recent backport for this upgrade code addition - https://review.openstack.org/#/c/447735.

ceilometer-upgrade.log:
-----------------------
2017-04-03 21:58:14.376 84782 INFO ceilometer.cmd.storage [-] Skipping metering database upgrade
2017-04-03 21:58:16.830 84782 CRITICAL ceilometer [-] ClientException: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p>
<p>Please contact the server administrator at [no address given] to inform them of the time this error occurred, and the actions you performed just before this error.</p>
<p>More information about this error may be available in the server error log.</p>
</body></html>
 (HTTP 500)
2017-04-03 21:58:16.830 84782 ERROR ceilometer Traceback (most recent call last):
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/bin/ceilometer-upgrade", line 10, in <module>
2017-04-03 21:58:16.830 84782 ERROR ceilometer     sys.exit(upgrade())
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py", line 53, in upgrade
2017-04-03 21:58:16.830 84782 ERROR ceilometer     gnocchi_client.upgrade_resource_types(conf)
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/gnocchi_client.py", line 113, in upgrade_resource_types
2017-04-03 21:58:16.830 84782 ERROR ceilometer     gnocchi.resource_type.get(name=name)
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/resource_type.py", line 44, in get
2017-04-03 21:58:16.830 84782 ERROR ceilometer     headers={'Content-Type': "application/json"}).json()
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/base.py", line 37, in _get
2017-04-03 21:58:16.830 84782 ERROR ceilometer     return self.client.api.get(*args, **kwargs)
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 217, in get
2017-04-03 21:58:16.830 84782 ERROR ceilometer     return self.request(url, 'GET', **kwargs)
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/client.py", line 38, in request
2017-04-03 21:58:16.830 84782 ERROR ceilometer     raise exceptions.from_response(resp, method)
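For triage, the failing commands can be pulled out of the puppet output with a short script. This is only a minimal sketch: it assumes the error lines keep the "returned N instead of one of [...]" shape quoted above, and the function name `failed_commands` is hypothetical, not part of any tool mentioned in this BZ.

```python
import re

# Assumed shape of the puppet error lines quoted above:
#   Error: <command> returned <code> instead of one of [<expected codes>]
FAILED_EXEC = re.compile(
    r"^Error: (?P<command>.+?) returned (?P<code>\d+) "
    r"instead of one of \[(?P<expected>[\d, ]+)\]$"
)


def failed_commands(lines):
    """Return unique (command, exit_code) pairs, in order of first appearance."""
    found = []
    for line in lines:
        match = FAILED_EXEC.match(line.strip())
        if match is None:
            continue  # not an exec failure line (e.g. a regular log line)
        # The longer "/Stage[main]/..." lines repeat the same failure after
        # "failed: "; keep only the command part so duplicates collapse.
        command = match.group("command").rsplit("failed: ", 1)[-1]
        entry = (command, int(match.group("code")))
        if entry not in found:
            found.append(entry)
    return found
```

Fed the four Error lines above, this should report ceilometer-upgrade and gnocchi-upgrade once each, both with exit code 1. The last gnocchi line is cut off in the paste and is simply skipped.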
(In reply to Saravanan KR from comment #1)
> I did a quick check on the environment. Controller is failing on Step5 of
> puppet apply:

That comment was meant for https://bugzilla.redhat.com/show_bug.cgi?id=1438608; I posted it on this BZ by mistake.
Can you please paste the error you get when spawning a VM? Has any root-cause analysis been done to understand the issue?
Assaf,

We are currently verifying the RHOS 10 to 11 upgrade with a direct pass from OVS-2.5.14 to 2.6.10. The workarounds have been reduced to a few lines. Once it succeeds, we will retry and update.
We have verified the direct upgrade OSPd10 -> OSPd11, using the following updated guide:
https://gitlab.cee.redhat.com/mandreou/OSP10-OSP11-Upgrade/blob/master/README.md
with post-install.yaml:
https://github.com/krsacme/tht-dpdk/blob/master/post-install-update.yaml

Thanks.
Anjali,

This BZ needs to be assigned to the engineer who fixed the selinux issue. He would update the BZ with the right version info, and QA can then close it. I think Eyal has already closed it. Can you please re-assign it so this BZ can move to closure?

Regards,
Vijay.
So today I hit the same exact problem described in #c1 and took the sosreports [1] of all the nodes of the overcloud. The deployment is a composable one, and the machine that hit the error is overcloud-controller-0, so the sosreport to take a look at is sosreport-controller-0.localdomain-20170525102419.tar.xz. [1] http://file.rdu.redhat.com/~rscarazz/BZ1437554/
I forgot to add that this issue is a race: I deployed several times on the same exact environment without hitting it, so I can't say how reproducible it is.
(In reply to Raoul Scarazzini from comment #7)
> So today I hit the same exact problem described in #c1 and took the
> sosreports [1] of all the nodes of the overcloud.
> The deployment is a composable one, and the machine that hit the error is
> overcloud-controller-0, so the sosreport to take a look at is
> sosreport-controller-0.localdomain-20170525102419.tar.xz.
>
> [1] http://file.rdu.redhat.com/~rscarazz/BZ1437554/

Comment c1 was posted on this BZ by mistake; it belongs in https://bugzilla.redhat.com/show_bug.cgi?id=1438608
Oh I see, I reopened that bug. Thanks.
Eyal, is there anything still open on this BZ?
Hi Saravanan,

Nothing from my point of view; this issue was fixed by the selinux and socket directory fixes.

Thanks.