Bug 1462307

Summary: Failed create Instance under load: Remote error: DBAPIError (exceptions.RuntimeError)
Product: Red Hat OpenStack Reporter: Yuri Obshansky <yobshans>
Component: openstack-novaAssignee: Eoghan Glynn <eglynn>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Joe H. Rahme <jhakimra>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 11.0 (Ocata)CC: berrange, dasmith, eglynn, kchamart, mwitt, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-06-30 14:26:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
controller nova-conductor.log
none
controller mysqld.log none

Description Yuri Obshansky 2017-06-16 17:08:49 UTC
Description of problem:
Failed create several instances under REST API load test. 
Message:
"Remote error: DBAPIError (exceptions.RuntimeError) reentrant call inside <_io.BufferedReader name=5> [u'Traceback (most recent call last):\n', u' File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 155, in _process_incoming\n re"
Details:
"File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 866, in schedule_and_build_instances request_specs[0].to_legacy_filter_properties_dict()) File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 597, in _schedule_instances hosts = self.scheduler_client.select_destinations(context, spec_obj) File "/usr/lib/python2.7/site-packages/nova/scheduler/utils.py", line 371, in wrapped return func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 51, in select_destinations return self.queryclient.select_destinations(context, spec_obj) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/__init__.py", line 37, in __run_method return getattr(self.instance, __name)(*args, **kwargs) File "/usr/lib/python2.7/site-packages/nova/scheduler/client/query.py", line 32, in select_destinations return self.scheduler_rpcapi.select_destinations(context, spec_obj) File "/usr/lib/python2.7/site-packages/nova/scheduler/rpcapi.py", line 129, in select_destinations return cctxt.call(ctxt, 'select_destinations', **msg_args) File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call retry=self.retry) File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 97, in _send timeout=timeout, retry=retry) File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 505, in send retry=retry) File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 496, in _send raise result"
nova-conductor.log has many errors (see attached file) 
Look for instance:
Name    perf-18-1-vm
ID    d967103b-8866-4e29-823a-8bfadd748d85
Status    Error
Created    June 15, 2017, 9:11 p.m.

Version-Release number of selected component (if applicable):
rhos-release 11   -p 2017-05-30.1
[root@overcloud-controller-0 log]# rpm -qa | grep nova
openstack-nova-novncproxy-15.0.3-3.el7ost.noarch
python-nova-15.0.3-3.el7ost.noarch
openstack-nova-cert-15.0.3-3.el7ost.noarch
openstack-nova-migration-15.0.3-3.el7ost.noarch
openstack-nova-api-15.0.3-3.el7ost.noarch
python-novaclient-7.1.0-1.el7ost.noarch
puppet-nova-10.4.0-5.el7ost.noarch
openstack-nova-compute-15.0.3-3.el7ost.noarch
openstack-nova-scheduler-15.0.3-3.el7ost.noarch
openstack-nova-console-15.0.3-3.el7ost.noarch
openstack-nova-common-15.0.3-3.el7ost.noarch
openstack-nova-conductor-15.0.3-3.el7ost.noarch
openstack-nova-placement-api-15.0.3-3.el7ost.noarch

How reproducible:
It could be reproduced by run the REST API performance test
which simulates load on Openstack with 20 concurrent threads.
Each thread create instance and perform actions:
01.NOVA.GET.Images
02.NOVA.GET.Flavors
03.NEUTRON.POST.Create.Network
04.NEUTRON.POST.Create.Subnet
05.NOVA.POST.Boot.Server
00.NOVA.GET.Server.Details
06.NOVA.POST.Pause.Server
07.NOVA.POST.Unpause.Server
08.NOVA.POST.Suspend.Server
09.NOVA.POST.Resume.Server
10.NOVA.POST.Soft.Reboot.Server
11.NOVA.POST.Hard.Reboot.Server
12.NOVA.POST.Stop.Server
13.NOVA.POST.Start.Server
14.NOVA.POST.Create.Image
15.NOVA.GET.Image.Id
16.NOVA.DELETE.Image
17.NOVA.DELETE.Server
18.NEUTRON.DELETE.Network

Steps to Reproduce:
1.
2.
3.

Actual results:
Several instances in state Error

Expected results:
No instances in state Error

Additional info:
Not happened in RHOS 9 and 10

Comment 1 Yuri Obshansky 2017-06-16 17:09:37 UTC
Created attachment 1288418 [details]
controller nova-conductor.log

Comment 2 Yuri Obshansky 2017-06-16 17:10:01 UTC
Created attachment 1288419 [details]
controller mysqld.log

Comment 3 melanie witt 2017-06-22 20:47:47 UTC
I just wanted to mention that we had an upstream bug relatively recently [1] where we had a significant increase in database connections starting in Newton and continuing to increase over Ocata and Pike. It might possibly be related to the performance issues you have been experiencing.

The fix for that is available starting in openstack-nova-15.0.5-1.el7ost.

We will discuss this BZ in more depth during our triage call tomorrow.

[1] https://bugs.launchpad.net/nova/+bug/1691545

Comment 4 melanie witt 2017-06-30 14:26:07 UTC
As discussed on the rh-openstack-dev@ thread [1], this is part of a larger problem with the database being overwhelmed during the load tests and we're going to need to re-run the tests and deep dive into investigating the root cause of the performance degradation in RHOSP 11.

[1] http://post-office.corp.redhat.com/archives/rh-openstack-dev/2017-June/msg00392.html