1320724 – Nova does not clean up deleted instances which severely impacts horizon performance

Keywords:
Status:	CLOSED DUPLICATE of bug 1329414
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	python-django-horizon
Sub Component:
Version:	7.0 (Kilo)
Hardware:	All
OS:	All
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	8.0 (Liberty)
Assignee:	Itxaka
QA Contact:	Ido Ovadia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-03-23 20:09 UTC by Jon Jozwiak
Modified:	2023-09-14 03:23 UTC (History)
CC List:	17 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-04-22 08:21:56 UTC
Target Upstream Version:
Embargoed:

Attachments
Database Cleanup Script (3.34 KB, application/x-shellscript) 2016-03-23 20:09 UTC, Jon Jozwiak

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	238204	0	None	MERGED	Reduce the default date range on Overview panel to 1 day	2020-12-07 08:29:10 UTC
Red Hat Issue Tracker	OSP-28596	0	None	None	None	2023-09-14 03:22:03 UTC

Description Jon Jozwiak 2016-03-23 20:09:12 UTC

Created attachment 1139744
Database Cleanup Script

Description of problem:
Nova has no process to clean up records for deleted instances from the database, so the tables just continue to grow. A side effect is that logging in to Horizon gets progressively slower. With 45,000 deleted instance records in the Nova database, it takes longer than 1 minute to log in, which results in HAProxy timeouts.

Note it appears this is being addressed in Mitaka:
https://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/purge-deleted-instances-cmd.html

Version-Release number of selected component (if applicable):
RHEL OSP 7 (Deployed via Director)
openstack-nova-api-2015.1.2-13.el7ost.noarch
openstack-dashboard-2015.1.2-4.el7ost.noarch

How reproducible:
Not tested, but you can likely reproduce this by creating and deleting 45,000 VMs. Rally was used to generate and delete the VMs. If you use Rally for benchmarking for a few weeks, you'll likely run into this problem.

Steps to Reproduce:
1. Time your login to Horizon 
2. Use Rally to gradually create/delete 45,000 VMs
3. Try to log in to Horizon again and note the time (and potentially the Gateway Timeout from HAProxy)

Actual results:


Expected results:
Ideally, the SQL queries used to calculate the data for the Overview screen would be tuned so they could still function with a database of this size. However, just removing deleted instances also works.

Additional info:
I've attached a DB cleanup script used to reduce the instance count

Comment 2 Radomir Dopieralski 2016-04-01 11:33:51 UTC

I think this is a Horizon bug. Soft-deleted instances should have no effect on displaying information about existing instances, as the filtering would happen on the database side. It certainly shouldn't take more than a minute to filter 45k entries on a boolean field.
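
To put a rough number on the claim above, here is a minimal sketch using an in-memory SQLite table. The table name, columns and helper functions are simplified stand-ins (assumptions, not Nova's real schema), and SQLite is not the database backend a real deployment would use, so this only illustrates the order of magnitude of a database-side filter over 45,000 soft-deleted rows.

```python
import sqlite3
import time


def build_db(deleted_rows=45000, live_rows=50):
    """Create an in-memory table dominated by soft-deleted rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "id INTEGER PRIMARY KEY, hostname TEXT, deleted INTEGER)"
    )
    # deleted = 1 marks a soft-deleted row, deleted = 0 a live one (assumption).
    rows = [("vm-%d" % i, 1) for i in range(deleted_rows)]
    rows += [("live-%d" % i, 0) for i in range(live_rows)]
    conn.executemany(
        "INSERT INTO instances (hostname, deleted) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def count_live(conn):
    """Filter on the database side; return (live_count, elapsed_seconds)."""
    started = time.perf_counter()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM instances WHERE deleted = 0"
    ).fetchone()
    return count, time.perf_counter() - started


if __name__ == "__main__":
    conn = build_db()
    live, elapsed = count_live(conn)
    print("live instances: %d, query time: %.4f s" % (live, elapsed))
```

If a filter like this is fast even without an index, a slow page would more likely come from a query that explicitly pulls the deleted rows back to the client.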

Comment 3 Radomir Dopieralski 2016-04-01 14:34:23 UTC

For the soft-deleted instances to negatively affect the performance of Horizon, it would have to specifically ask Nova to provide the list of deleted instances. If Horizon indeed does that, this is a bug that should be solved in Horizon. I'm re-categorizing this bug to Horizon, so that the developers can check for such queries.

Comment 4 Matthias Runge 2016-04-04 07:53:04 UTC

Which page in Horizon were you visiting when you saw the timeout?

Comment 6 Itxaka 2016-04-18 13:51:09 UTC

I can confirm that with around 45,000 deleted instances, a request to the admin->overview page takes around 21 seconds on a local network (so there are no network delays that could affect the timing).

The main issue seems to come from our call to novaclient.usage.list, in which we ask for a detailed view, which includes all the deleted instances.


Unfortunately, the admin overview needs this data to provide a proper view.

There is an upstream patch for Horizon that reduces the default date range of the overview to 1 day, which lessens the problems that occur with a large number of deleted instances.
https://review.openstack.org/#/c/238204/
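
The effect of narrowing the range can be sketched as follows. The `usage.list(start, end, detailed=True)` call mirrors the novaclient call mentioned above. The helper names and the `days` parameter are hypothetical, and `client` is assumed to be an already authenticated novaclient client object. Presumably, instances deleted before the start of the window then fall outside the report, which is why a shorter range helps.

```python
import datetime


def overview_range(days=1, now=None):
    """Return (start, end) as naive UTC datetimes covering the last `days` days."""
    if now is None:
        # Naive UTC timestamp, obtained without the deprecated utcnow().
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    start = now - datetime.timedelta(days=days)
    return start, now


def fetch_usage(client, days=1, now=None):
    """Request detailed usage for a short window instead of a long default range.

    `client` is assumed to expose `usage.list(start, end, detailed=...)`,
    as python-novaclient does.
    """
    start, end = overview_range(days, now)
    return client.usage.list(start, end, detailed=True)
```

With `days=1`, the detailed usage report only has to cover instances that existed within the last day rather than the whole history that falls into a longer window.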

Comment 7 Itxaka 2016-04-18 14:09:25 UTC

But in the end, this looks like the intended output: a display of all instances, even deleted ones.

Comment 8 Itxaka 2016-04-22 08:21:56 UTC

I'm closing this and following the issue on bz 1329414, which has opened the issue upstream.

*** This bug has been marked as a duplicate of bug 1329414 ***

Comment 9 Red Hat Bugzilla 2023-09-14 03:20:02 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
