Bug 1336478 - REST API Delete instances fails under load
Summary: REST API Delete instances fails under load
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Eoghan Glynn
QA Contact: Prasanth Anbalagan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-16 15:09 UTC by Yuri Obshansky
Modified: 2019-09-09 15:32 UTC
CC: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-08 13:37:59 UTC
Target Upstream Version:
Embargoed:


Attachments
Horizon screenshot (71.77 KB, image/png), 2016-05-16 15:10 UTC, Yuri Obshansky
nova-api log (1.35 MB, text/plain), 2016-05-16 15:11 UTC, Yuri Obshansky
nova-scheduler log (4.47 MB, text/plain), 2016-05-16 15:13 UTC, Yuri Obshansky
Keystone conf file (67.59 KB, text/plain), 2016-05-22 08:25 UTC, Yuri Obshansky
Keystone log file (22.21 KB, application/x-gzip), 2016-05-22 08:25 UTC, Yuri Obshansky

Description Yuri Obshansky 2016-05-16 15:09:42 UTC
Description of problem:
During a cleanup run to remove 5000 instances from the stack,
156 REST API delete-instance calls failed.
The JMeter script that removes the instances runs 50 concurrent threads,
each connecting to OpenStack and deleting instances.
The affected instances remained stuck indefinitely in an inconsistent state:
Status - Active, Task - Deleting, Power State - Running.
In the Horizon Instances page, all instances briefly appear in the view
and then disappear at the same time; the same happens when clicking the Next link.
(See attached screenshot.)

Version-Release number of selected component (if applicable):
rhos-release 8 -p 2016-04-18.1

How reproducible:
Request ->
DELETE http://10.1.244.10:8774/v2.1/db3d1b5287554c5fb952e3e7744468e1/servers/827ead1e-3295-4efc-bd23-27d3ee203746
DELETE data:
Request Headers:
Connection: keep-alive
Content-type: application/json
Accept: application/json
X-Auth-Token: 80324db2f358443fa91bb87a4c1174ca
Content-Length: 0
Host: 10.1.244.10:8774
User-Agent: Apache-HttpClient/4.2.6 (java 1.5)

Response ->
{"computeFault": {"message": "Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.\n<class 'keystoneclient.exceptions.RequestTimeout'>", "code": 500}}

Errors in nova-api.log ->
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions [req-8a081bf5-b7f7-4dac-86bf-96155396545d 8f5fdf7162a94ee2bf4c1a993eea7da7 4fe7f8fda59e411b8050afa7a343257f - - -] Unexpected exception in API method
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/api/openstack/extensions.py", line 478, in wrapped
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return f(*args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 912, in delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self._delete(req.environ['nova.context'], req, id)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 767, in _delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self.compute_api.delete(context, instance)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 234, in wrapped
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return func(self, context, target, *args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 223, in inner
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return function(self, context, instance, *args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 251, in _wrapped
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return fn(self, context, instance, *args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 204, in inner
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return f(self, context, instance, *args, **kw)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1935, in delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self._delete_instance(context, instance)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1925, in _delete_instance
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     task_state=task_states.DELETING)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1726, in _delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     quotas.rollback()
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     six.reraise(self.type_, self.value, self.tb)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1716, in _delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self._local_delete(context, instance, bdms, delete_type, cb)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 1843, in _local_delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     instance)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 854, in deallocate_for_instance
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     self._delete_ports(neutron, instance, ports, raise_if_fail=True)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 818, in _delete_ports
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     neutron.delete_port(port)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 102, in with_params
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     ret = self.function(instance, *args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 567, in delete_port
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return self.delete(self.port_path % (port))
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 289, in delete
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     headers=headers, params=params)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 270, in retry_request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     headers=headers, params=params)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 200, in do_request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     content_type=self.content_type())
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 306, in do_request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return self.request(url, method, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/neutronclient/client.py", line 294, in request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     resp = super(SessionClient, self).request(*args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/keystoneclient/adapter.py", line 95, in request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return self.session.request(url, method, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/keystoneclient/utils.py", line 337, in inner
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 386, in request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     resp = send(**kwargs)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 427, in _send_request
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions     raise exceptions.RequestTimeout(msg)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions RequestTimeout: Request to http://172.17.0.10:9696/v2.0/ports/8928145c-8eec-426c-83c9-d8289e874111.json timed out (HTTP 408)
2016-05-15 07:35:40.698 14121 ERROR nova.api.openstack.extensions 
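The traceback shows nova-api taking the local-delete path (_delete -> _local_delete -> deallocate_for_instance -> _delete_ports) and calling neutron's delete_port through a keystoneclient session; when neutron does not answer within the client timeout, keystoneclient raises RequestTimeout, which nova-api then surfaces as the HTTP 500 computeFault above. A minimal sketch of that client-side behaviour follows; the neutron URL, port UUID and token are placeholders, not values from a working setup ->

from keystoneclient import exceptions, session

sess = session.Session(timeout=30)   # seconds; roughly the order of the default client timeout
try:
    # mirrors neutronclient's delete_port() call seen in the traceback
    sess.delete('http://172.17.0.10:9696/v2.0/ports/<port-uuid>.json',
                headers={'X-Auth-Token': '<token>'})
except exceptions.RequestTimeout as exc:
    # this is the exception that ends up wrapped in the 500 computeFault
    print('neutron did not answer in time: %s' % exc)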

Steps to Reproduce:
1. Create a large number of instances (5000 in my case).
2. Delete the instances using many concurrent threads (50 in my case); a rough sketch of such a delete loop follows below.
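A rough reproduction sketch in Python (the original test uses JMeter with 50 threads; the project ID, token and server UUIDs below are placeholders) ->

import concurrent.futures
import requests

BASE = 'http://10.1.244.10:8774/v2.1/<project-id>'   # placeholder project ID
TOKEN = '<token>'                                    # placeholder auth token
SERVER_IDS = ['<uuid-1>', '<uuid-2>']                # the UUIDs of the 5000 instances

def delete_server(server_id):
    resp = requests.delete('%s/servers/%s' % (BASE, server_id),
                           headers={'X-Auth-Token': TOKEN})
    return server_id, resp.status_code

# 50 concurrent workers, matching the JMeter thread count
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    for server_id, status in pool.map(delete_server, SERVER_IDS):
        if status >= 400:
            print('delete of %s failed with HTTP %d' % (server_id, status))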

Actual results:
156 of the delete requests fail with HTTP 500 (computeFault wrapping keystoneclient RequestTimeout), and the affected instances stay stuck in Status Active / Task Deleting / Power State Running.

Expected results:
All delete requests succeed and the instances are removed.

Additional info:

Comment 2 Yuri Obshansky 2016-05-16 15:10:28 UTC
Created attachment 1157998 [details]
Horizon screenshot

Comment 3 Yuri Obshansky 2016-05-16 15:11:10 UTC
Created attachment 1157999 [details]
nova-api log

Comment 4 Yuri Obshansky 2016-05-16 15:12:39 UTC
Another error in nova-scheduler.log ->
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api [req-15370471-51ab-4641-a424-d82a0e4b7bb6 - - - - -] DB error.
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api Traceback (most recent call last):
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 136, in wrapper
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api     return f(*args, **kwargs)
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 3782, in reservation_expire
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api     reservation_query.soft_delete(synchronize_session=False)
2016-05-15 07:03:24.397 14465 ERROR oslo_db.api   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/orm.py", line 32, in soft_delete

Comment 5 Yuri Obshansky 2016-05-16 15:13:36 UTC
Created attachment 1158000 [details]
nova-scheduler log

Comment 6 Yuri Obshansky 2016-05-16 15:15:07 UTC
Only the entries dated 2016-05-15 in the log files are relevant.

Comment 7 Matthew Booth 2016-05-20 10:23:05 UTC
At first glance this looks like a problem in keystone. It may simply be that keystone isn't keeping up with this request rate, in which case the immediate recommendation would be to lower the request rate.

Could you please provide keystone logs covering the same period, and also the keystone configuration.
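
If request rate does turn out to be the problem, one way to "go slower" on the client side is to retry failed deletes with an increasing delay instead of firing them all at full rate; a rough requests-based sketch (URL and token are placeholders) ->

import time
import requests

def delete_with_backoff(url, token, attempts=5):
    # retry 408/500 responses with an increasing delay between attempts
    for attempt in range(attempts):
        resp = requests.delete(url, headers={'X-Auth-Token': token})
        if resp.status_code not in (408, 500):
            return resp
        time.sleep(2 ** attempt)   # wait 1s, 2s, 4s, ... before retrying
    return resp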

Comment 8 Matthew Booth 2016-05-20 14:24:47 UTC
After discussion within the team, could you please also provide some additional info:

* Number of api hosts
* Number of compute hosts
* Number of keystone hosts
* sosreport from a keystone host

I'm suspecting a scaling problem, so I need to get an idea of the scale of the deployment. If you can think of any other information relevant to deployment scale, please include that too.

Comment 9 Yuri Obshansky 2016-05-22 08:24:44 UTC
That is my fault, I had to provide environment configuration in the bug.
It is HA, baremetal deployment, 3 controller nodes and 6 compute nodes where each is 24 CPUs and 64 G RAM. Volume is 5000 instances
- image: cirros (12.6 MB)
- flavor: m1.nano (1VCPU, 64MB RAM, 1GB Disk)
I was not able remove instances in any way to clean up the environment.
Only reboot all nodes (controllers and computes) released the stacked instances.
As result, I cannot provide the sosreport right now.
The keystone conf and log files attached.

Comment 10 Yuri Obshansky 2016-05-22 08:25:18 UTC
Created attachment 1160273 [details]
Keystone conf file

Comment 11 Yuri Obshansky 2016-05-22 08:25:46 UTC
Created attachment 1160274 [details]
Keystone log file

Comment 12 Matthew Booth 2018-01-08 13:37:59 UTC
Clearing out old bugs.

From looking at the keystone logs it seems that the original supposition was correct. I think a RequestTimeout is an appropriate return code under these circumstances.

