Bug 1336478

Summary: REST API Delete instances fails under load
Product: Red Hat OpenStack Reporter: Yuri Obshansky <yobshans>
Component: openstack-novaAssignee: Eoghan Glynn <eglynn>
Status: CLOSED NOTABUG QA Contact: Prasanth Anbalagan <panbalag>
Severity: high Docs Contact:
Priority: unspecified    
Version: 8.0 (Liberty)CC: berrange, dasmith, eglynn, kchamart, mbooth, sbauza, sferdjao, sgordon, srevivo, vromanso, yobshans
Target Milestone: ---Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-01-08 13:37:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Horizon screenshot
none
nova-api log
none
nova-scheduler log
none
Keystone conf file
none
Keystone log file none

Description Yuri Obshansky 2016-05-16 15:09:42 UTC
Description of problem:
During cleanup process to remove 5000 instances from stack 
156 REST API delete instance calls failed.
JMeter script to remove instances has 50 concurrent threads which 
connect to OpenStack and delete instances
Instances were stayed forever in weird status:
Status - Active, Task - Deleting, Power State - Running.
When you take a look in Horizon/Instances page 
all instances appeared in view and disappeared at the same time.
The same when you click the Next link.
(See attached screenshot).

Version-Release number of selected component (if applicable):
rhos-release 8 -p 2016-04-18.1

How reproducible:
Request ->
DELETE http://10.1.244.10:8774/v2.1/db3d1b5287554c5fb952e3e7744468e1/servers/827ead1e-3295-4efc-bd23-27d3ee203746
DELETE data:
Request Headers:
Connection: keep-alive
Content-type: application/json
Accept: application/json
X-Auth-Token: 80324db2f358443fa91bb87a4c1174ca
Content-Length: 0
Host: 10.1.244.10:8774
User-Agent: Apache-HttpClient/4.2.6 (java 1.5)

Response ->
{"computeFault": {"message": "Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.\n<class 'keystoneclient.exceptions.RequestTimeout'>", "code": 500}}

Errors in nova-api.log ->
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions [req-8a081bf5-b7f7-4dac-86bf-96155396545d 8f5fdf7162a94ee2bf4c1
a993eea7da7 4fe7f8fda59e411b8050afa7a343257f - - -] Unexpected exception in API method
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/api/openstack/ext
ensions.py", line 478, in wrapped
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return f(*args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 912, in delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self._delete(req.environ['nova.context'], req, id)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 767, in _delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self.compute_api.delete(context, instance)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 234, in wrapped
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return func(self, context, target, *args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 223, in inner
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return function(self, context, instance, *args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 251, in _wrapped
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return fn(self, context, instance, *args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 204, in inner
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return f(self, context, instance, *args, **kw)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1935, in delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self._delete_instance(context, instance)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1925, in _delete_instance
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     task_state=task_states.DELETING)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", 
line 1726, in _delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     quotas.rollback()
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     six.reraise(self.type_, self.value, self.tb)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1716, in _delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self._local_delete(context, instance, bdms, delete_type, cb)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1843, in _local_delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     instance)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 854, in deallocate_for_instance
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self._delete_ports(neutron, instance, ports, raise_if_fail=True)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 818, in _delete_ports
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     neutron.delete_port(port)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 102, in with_params
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     ret = self.function(instance, *args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 567, in delete_port
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return self.delete(self.port_path % (port))
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 289, in delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     headers=headers, params=params)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 270, in retry_request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     headers=headers, params=params)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/cli
ent.py", line 200, in do_request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     content_type=self.content_type())
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 306, in do_request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return self.request(url, method, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 294, in request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     resp = super(SessionClient, self).request(*args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/keystoneclient/adapter.py", line 95, in request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return self.session.request(url, method, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/keystoneclient/utils.py", line 337, in inner
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 386, in request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     resp = send(**kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 427, in _send_request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     raise exceptions.RequestTimeout(msg)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions RequestTimeout: Request to http://172.17.0.10:9696/v2.0/ports/8928145c-8eec-426c-83c9-d8289e874111.json timed out (HTTP 408)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions 

Steps to Reproduce:
1.Create huge amount of instances (in my case 5000)
2.Delete instances using many concurrent threads (in my case 50)
3.

Actual results:


Expected results:


Additional info:

Comment 2 Yuri Obshansky 2016-05-16 15:10:28 UTC
Created attachment 1157998 [details]
Horizon screenshot

Comment 3 Yuri Obshansky 2016-05-16 15:11:10 UTC
Created attachment 1157999 [details]
nova-api log

Comment 4 Yuri Obshansky 2016-05-16 15:12:39 UTC
Another error in nova-scheduler.log ->
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api [req-15370471-51ab-4641-a424-d82a0e4b7bb6 - - - - -] DB error.
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api Traceback (most recent call last):
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 136, in wrapper
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api     return f(*args, **kwargs)
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 3782, i
n reservation_expire
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api     reservation_query.soft_delete(synchronize_session=False)
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/orm.py", line 32, in soft_delete

Comment 5 Yuri Obshansky 2016-05-16 15:13:36 UTC
Created attachment 1158000 [details]
nova-scheduler log

Comment 6 Yuri Obshansky 2016-05-16 15:15:07 UTC
Look at date 2016-05-15 only in log files

Comment 7 Matthew Booth 2016-05-20 10:23:05 UTC
At first glance this looks like a problem in keystone. It may simply be that keystone isn't keeping up with this request rate, in which case the immediate recommendation will be to go slower.

Could you please provide keystone logs covering the same period, and also the keystone configuration.

Comment 8 Matthew Booth 2016-05-20 14:24:47 UTC
After discussion within the team, could you please also provide some additional info:

* Number of api hosts
* Number of compute hosts
* Number of keystone hosts
* sosreport from a keystone host

I'm suspecting a scaling problem, so I need to get an idea of the scale of the deployment. If you can think of any other information relevant to deployment scale, please include that too.

Comment 9 Yuri Obshansky 2016-05-22 08:24:44 UTC
That is my fault, I had to provide environment configuration in the bug.
It is HA, baremetal deployment, 3 controller nodes and 6 compute nodes where each is 24 CPUs and 64 G RAM. Volume is 5000 instances
- image: cirros (12.6 MB)
- flavor: m1.nano (1VCPU, 64MB RAM, 1GB Disk)
I was not able remove instances in any way to clean up the environment.
Only reboot all nodes (controllers and computes) released the stacked instances.
As result, I cannot provide the sosreport right now.
The keystone conf and log files attached.

Comment 10 Yuri Obshansky 2016-05-22 08:25:18 UTC
Created attachment 1160273 [details]
Keystone conf file

Comment 11 Yuri Obshansky 2016-05-22 08:25:46 UTC
Created attachment 1160274 [details]
Keystone log file

Comment 12 Matthew Booth 2018-01-08 13:37:59 UTC
Clearing out old bugs.

From looking at the keystone logs it seems that the original supposition was correct. I think a RequestTimeout is an appropriate return code under these circumstances.