Bug 1437554 - Unable to boot an instance in ospd11 DPDK environment[openvswitch 2.6.1]
Summary: Unable to boot an instance in ospd11 DPDK environment[openvswitch 2.6.1]
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Saravanan KR
QA Contact: Eyal Dannon
URL:
Whiteboard:
Depends On:
Blocks: 1408224
Reported: 2017-03-30 14:08 UTC by Eyal Dannon
Modified: 2017-06-28 06:08 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-28 06:08:08 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport compute node (10.33 MB, application/x-xz)
2017-03-30 14:08 UTC, Eyal Dannon

Description Eyal Dannon 2017-03-30 14:08:22 UTC
Created attachment 1267575
sosreport compute node

Description of problem:
The environment received a minor update and then a major upgrade, following the steps in the docs below:
Update: https://docs.google.com/document/d/1PUdFw3L_9J49jTjzkabfSaNOCxH8fPDrGkmYmCrmgbQ
Upgrade: https://docs.google.com/document/d/1IFJte2mjaOrvsbNNCVFMowMVFhIsKlG9g1sz8YRYGt0

The fixes mentioned in this BZ were then applied: https://bugzilla.redhat.com/show_bug.cgi?id=1431556
The sosreport is attached.

Version-Release number of selected component (if applicable):
OSPd 11
python-openvswitch-2.6.1-10.git20161206.el7fdp.noarch
openvswitch-2.6.1-13.git20161206.el7fdp.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Update the env
2. Upgrade the env
3. Try to boot an instance

Actual results:
The instance does not boot.

Expected results:
The instance boots successfully.

Additional info:

Comment 1 Saravanan KR 2017-04-04 05:55:29 UTC
I did a quick check on the environment. Controller is failing on Step5 of puppet apply:

overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.0:
  resource_type: OS::Heat::StructuredDeployment
    Error: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
    Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer::Collector/Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
    Error: gnocchi-upgrade --config-file=/etc/gnocchi/gnocchi.conf returned 1 instead of one of [0]
    Error: /Stage[main]/Tripleo::Profile::Base::Gnocchi::Api/Exec[run gnocchi upgrade with storage]/returns: change from notrun to 0 failed: gnocchi-upgrade --config-file=/etc/gnocchi/gnocchi.conf 

ceilometer-upgrade is failing. This upgrade code was added by a recent backport: https://review.openstack.org/#/c/447735.
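
For triage, the distinct failing commands can be pulled out of Puppet error lines like the ones above with a short script. This is only a minimal sketch and not part of the original report: `FAILED_EXEC` and `failed_commands` are hypothetical names, and the pattern assumes the "returned N instead of one of [...]" wording exactly as it appears in these lines. Lines whose message is cut off before "returned" are skipped.

```python
import re

# Assumed message format, taken from the error lines above:
#   "<command> returned <code> instead of one of [<expected>]"
# The command is preceded either by "Error: " or by "... failed: ".
FAILED_EXEC = re.compile(
    r"(?:Error: |failed: )(?P<cmd>[^:\n]+?) returned (?P<code>\d+) "
    r"instead of one of \[[\d, ]+\]"
)


def failed_commands(log_text):
    """Return unique (command, exit_code) pairs in order of first appearance."""
    seen = []
    for match in FAILED_EXEC.finditer(log_text):
        entry = (match.group("cmd").strip(), int(match.group("code")))
        if entry not in seen:
            seen.append(entry)
    return seen


SAMPLE = """\
    Error: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
    Error: /Stage[main]/Tripleo::Profile::Base::Ceilometer::Collector/Exec[ceilometer-db-upgrade]/returns: change from notrun to 0 failed: ceilometer-upgrade --skip-metering-database returned 1 instead of one of [0]
    Error: gnocchi-upgrade --config-file=/etc/gnocchi/gnocchi.conf returned 1 instead of one of [0]
"""

for command, code in failed_commands(SAMPLE):
    print(f"{command} -> exit {code}")
```

Deduplicating matters here because Puppet reports each failed Exec twice: once as a bare error and once with the full resource path.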

ceilometer-upgrade.log:
-----------------------
2017-04-03 21:58:14.376 84782 INFO ceilometer.cmd.storage [-] Skipping metering database upgrade
2017-04-03 21:58:16.830 84782 CRITICAL ceilometer [-] ClientException: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at
 [no address given] to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
 (HTTP 500)
2017-04-03 21:58:16.830 84782 ERROR ceilometer Traceback (most recent call last):
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/bin/ceilometer-upgrade", line 10, in <module>
2017-04-03 21:58:16.830 84782 ERROR ceilometer     sys.exit(upgrade())
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py", line 53, in upgrade
2017-04-03 21:58:16.830 84782 ERROR ceilometer     gnocchi_client.upgrade_resource_types(conf)
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/gnocchi_client.py", line 113, in upgrade_resource_types
2017-04-03 21:58:16.830 84782 ERROR ceilometer     gnocchi.resource_type.get(name=name)
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/resource_type.py", line 44, in get
2017-04-03 21:58:16.830 84782 ERROR ceilometer     headers={'Content-Type': "application/json"}).json()
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/v1/base.py", line 37, in _get
2017-04-03 21:58:16.830 84782 ERROR ceilometer     return self.client.api.get(*args, **kwargs)
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/keystoneauth1/adapter.py", line 217, in get
2017-04-03 21:58:16.830 84782 ERROR ceilometer     return self.request(url, 'GET', **kwargs)
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/gnocchiclient/client.py", line 38, in request
2017-04-03 21:58:16.830 84782 ERROR ceilometer     raise exceptions.from_response(resp, method)
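
To see at a glance where a traceback like the one above ends up, the frames can be reduced to a compact call chain. The following is a rough sketch under stated assumptions, not a tool used in this report: `FRAME` and `call_chain` are hypothetical names, and the pattern assumes the standard Python traceback frame layout (`File "...", line N, in func`) embedded in each log line.

```python
import re

# Standard Python traceback frame header, as embedded in the log lines above.
FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def call_chain(log_text):
    """Return (file_name, line_number, function) for each frame, outermost first."""
    return [
        (m.group("path").rsplit("/", 1)[-1], int(m.group("line")), m.group("func"))
        for m in FRAME.finditer(log_text)
    ]


# Three frames copied from the log excerpt above.
SAMPLE_TRACE = """\
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/bin/ceilometer-upgrade", line 10, in <module>
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/cmd/storage.py", line 53, in upgrade
2017-04-03 21:58:16.830 84782 ERROR ceilometer   File "/usr/lib/python2.7/site-packages/ceilometer/gnocchi_client.py", line 113, in upgrade_resource_types
"""

for file_name, line_number, func in call_chain(SAMPLE_TRACE):
    print(f"{file_name}:{line_number} in {func}")
```

Read against the full traceback, the chain shows ceilometer-upgrade calling into gnocchiclient, which raises on the HTTP 500 response from the Gnocchi API.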

Comment 2 Saravanan KR 2017-04-04 06:19:37 UTC
(In reply to Saravanan KR from comment #1)
> I did a quick check on the environment. Controller is failing on Step5 of
> puppet apply:
I was supposed to post this update on https://bugzilla.redhat.com/show_bug.cgi?id=1438608 instead.

Comment 3 Assaf Muller 2017-04-05 12:53:06 UTC
Can you please paste the error you get when spawning a VM? Has any root-cause analysis been done to understand the issue?

Comment 4 Yariv 2017-04-13 10:06:41 UTC
Assaf,

We are currently verifying the RHOS 10 to 11 upgrade with a direct pass from OVS-2.5.14 to 2.6.10.

The workarounds have been reduced to a few lines.
Once we have a success, we will retry and update this BZ.

Comment 5 Eyal Dannon 2017-04-18 08:48:02 UTC
We have verified the direct upgrade OSPd10 -> OSPd11,
using the following updated guide:
https://gitlab.cee.redhat.com/mandreou/OSP10-OSP11-Upgrade/blob/master/README.md
with post-install.yaml: https://github.com/krsacme/tht-dpdk/blob/master/post-install-update.yaml

Thanks.

Comment 6 Vijay Chundury 2017-04-24 11:23:31 UTC
Anjali,
This BZ needs to be assigned to the engineer who fixed the selinux issue.
He would update the BZ with the right version info so that QA can close it.
I think Eyal has already closed it.

Can you please reassign it so that this BZ can move to closure?

Regards
Vijay.

Comment 7 Raoul Scarazzini 2017-05-25 10:54:25 UTC
So today I hit the same exact problem described in #c1 and took the sosreports [1] of all the nodes of the overcloud.
The deployment is a composable one, and the machine that hit the error is overcloud-controller-0, so the sosreport to take a look at is sosreport-controller-0.localdomain-20170525102419.tar.xz.

[1] http://file.rdu.redhat.com/~rscarazz/BZ1437554/

Comment 8 Raoul Scarazzini 2017-05-25 10:58:13 UTC
I forgot to add that this issue is a race: I deployed several times on the exact same environment without hitting it, so I can't say how reproducible it is.

Comment 9 Saravanan KR 2017-05-25 11:10:40 UTC
(In reply to Raoul Scarazzini from comment #7)
> So today I hit the same exact problem described in #c1 and took the
> sosreports [1] of all the nodes of the overcloud.
> The deployment is a composable one, and the machine that hit the error is
> overcloud-controller-0, so the sosreport to take a look at is
> sosreport-controller-0.localdomain-20170525102419.tar.xz.
> 
> [1] http://file.rdu.redhat.com/~rscarazz/BZ1437554/

Comment c1 was posted on this BZ by mistake; it was meant for https://bugzilla.redhat.com/show_bug.cgi?id=1438608

Comment 10 Raoul Scarazzini 2017-05-25 12:11:43 UTC
Oh I see, I reopened that bug. Thanks.

Comment 11 Saravanan KR 2017-06-07 09:59:50 UTC
Eyal,
Is there anything open on this BZ?

Comment 12 Eyal Dannon 2017-06-07 12:47:53 UTC
Hi Saravanan,
No, not from my point of view. This issue was fixed by the selinux and socket directory changes.
Thanks.
