Bug 1487789 - Some resources in the opendaylightjournal table take several hours to be sent to ODL
Summary: Some resources in the opendaylightjournal table take several hours to be sent to ODL
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 12.0 (Pike)
Assignee: Mike Kolesnik
QA Contact: Itzik Brown
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-01 22:30 UTC by Sai Sindhur Malleni
Modified: 2017-10-29 07:52 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-29 07:52:06 UTC
Target Upstream Version:
Embargoed:



Description Sai Sindhur Malleni 2017-09-01 22:30:24 UTC
Description of problem:

The test environment is as follows:
3 OSP Controllers
3 Computes
3 ODLs clustered on OpenStack Controllers

We ran the Browbeat+Rally neutron test suite with concurrencies of 8, 16, and 32 and "times" set to 500. In essence, neutron resources such as ports, networks, and routers are created 500 times, with 8, 16, or 32 resources being created in parallel depending on the concurrency. After creation the resources are deleted and the cycle is repeated over and over again.
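For context, the Rally scenarios behind these runs use a "constant" runner, where "times" is the total number of iterations and "concurrency" is how many run in parallel. A task definition looks roughly like this (the scenario name and arguments are illustrative, not the exact Browbeat job definition):

{
    "NeutronNetworks.create_and_delete_ports": [
        {
            "args": {"ports_per_network": 1},
            "runner": {"type": "constant", "times": 500, "concurrency": 8}
        }
    ]
}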


Most perf measurement tools like Rally capture the time taken for the resource entry to be created in the neutron database after the create request is sent; there is no guarantee that the resource actually exists and was created in the ML2 backend. With Lucas Gomes' help, I was able to use [1] to profile the time taken for a resource entry in the opendaylightjournal table to be sent to the ODL API and for the call to return. This was done by looking at the time difference between the 'last_retried' and 'created_at' timestamps in the opendaylightjournal table. My understanding is that when a journaling thread picks up an entry in the table for processing it updates the 'last_retried' field to the current timestamp, and when the entry is sent to the ODL API and the call returns successfully the entry is marked 'completed' and the 'created_at' timestamp is updated. So the difference between these two timestamps gives us the time taken for the HTTP call to ODL to return. I reran the Browbeat+Rally neutron test suite about 4 times with "concurrency" set to 8 and 16 and "times" set to 500.

Our Python script [2] gives us the average time taken for the different operations on the database, prints the top 15 most time-consuming entries along with their resource UUIDs, and reports any failed entries in the journal.
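What the script does is conceptually equivalent to the sketch below. This is not the actual script from [2]; the connection parameters and the column names (operation, object_uuid, state) are assumptions based on the description above.

# Conceptual sketch of the journal profiling described above (not the actual
# script [2]). Column names and connection details are assumptions.
import pymysql

conn = pymysql.connect(host="localhost", user="neutron",
                       password="secret", database="neutron")
with conn.cursor() as cur:
    # Average delta (last_retried - created_at) per operation type.
    cur.execute("""
        SELECT operation,
               AVG(TIMESTAMPDIFF(SECOND, created_at, last_retried))
        FROM opendaylightjournal
        WHERE state = 'completed'
        GROUP BY operation
    """)
    print("average seconds per operation:", cur.fetchall())

    # Top 15 most time-consuming entries and their resource UUIDs.
    cur.execute("""
        SELECT object_uuid, operation,
               TIMESTAMPDIFF(SECOND, created_at, last_retried) AS delta_s
        FROM opendaylightjournal
        WHERE state = 'completed'
        ORDER BY delta_s DESC
        LIMIT 15
    """)
    print("top 15 slowest entries:", cur.fetchall())

    # Any journal entries that ended up in a failed state.
    cur.execute("SELECT COUNT(*) FROM opendaylightjournal WHERE state = 'failed'")
    print("failed entries:", cur.fetchone()[0])
conn.close()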

I also pulled in https://review.openstack.org/#/c/496125/ manually and set maintenance_interval to a large number so that the maintenance thread never runs during the test, and set completed_rows_retention to -1 so that I can average out values over the entire test run without entries being deleted in the meantime (I realize this could mean more rows for the maintenance thread to scan).
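For reference, the overrides described here go in the [ml2_odl] section of the ML2 configuration, roughly as below (the values are illustrative; the 10 s timeout is the default referred to further down):

[ml2_odl]
# Push the maintenance interval out far enough that the maintenance
# thread never fires during a test run (illustrative value).
maintenance_interval = 9999999
# Keep completed rows indefinitely so deltas can be averaged afterwards.
completed_rows_retention = -1
# Default timeout (seconds) for REST calls to ODL, mentioned later in
# this report.
timeout = 10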

Observations:

    After the first run of the test suite, we see that a maximum of 888 seconds was taken for an update port operation and the average for all operations was around 165s (https://gist.github.com/smalleni/fa05a627e29a50c68dc1d2368ba48061)
    After the second run of the test suite, we see that the maximum is now around 3400s for a create port and the average for all operations is around 950s (https://gist.github.com/smalleni/68102613516b6a3a387bc75ec5e85bbe)
    After the third run of the test suite, we see that the maximum is now around 8000s for an update port and the average for all operations is around 1500s (https://gist.github.com/smalleni/68102613516b6a3a387bc75ec5e85bbe)
    After the fourth run of the test suite, we see that the maximum is now 15700s for an update port and the average for all operations is around 2300s (https://gist.github.com/smalleni/28c4a855984cc23c5d58238639a48d1e)
    mysqld CPU usage during the entire test duration: https://snapshot.raintank.io/dashboard/snapshot/KLVyeojHtfX46pYI1YGNdIrhyeXE57FX
        We can see that it reaches a peak of around 43 cores on a 56 core machine and slowly goes back to normal when the test suite runs are finished
    MySQL slow query log: http://8.43.86.1:8088/smalleni/overcloud-controller-0-slow-profile.log

Since each test suite run did not take more than 1.5 hours, the maximum values for the 3rd and 4th iterations of the test suite indicate that entries from the previous run of the test suite were still being processed while the current iteration was running. Also, looking at the MySQL slow query log, it doesn't seem like any query takes longer than 15s. We are curious to find out why it takes so long for calls to ODL to return when the query time itself isn't *that* huge. Also, we use a default timeout of 10s in the ml2_odl configuration, which should mean that the HTTP call to ODL has to return within 10s for a journal entry to be marked completed, unless I'm reading it wrong.

Version-Release number of selected component (if applicable):
OSP12+ODL Carbon
python-networking-odl-11.0.0-0.20170806093629.2e78dca.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with ODL
2. Run Browbeat+Rally stress tests
3. Profile the opendaylightjournal table using [1]

Actual results:
Some resources take several hours to create under heavy load

Expected results:
Resources should be created fairly quickly

Additional info:
https://gist.github.com/smalleni/8da8c75f1765c23d7706706126dd4c2d

Comment 1 Sai Sindhur Malleni 2017-09-01 22:31:04 UTC
[1]- https://gist.github.com/smalleni/8da8c75f1765c23d7706706126dd4c2d

Comment 2 Michel Peterson 2017-09-04 08:32:25 UTC
> I also pulled in https://review.openstack.org/#/c/496125/ manually and set 
> maintenance_interval to a large number so that the maintenance thread never runs 
> during the test and set completed_rows_retention to -1 so  that I can average out 
> values during the entire test run and no entries are deleted during that (I realize
> this could mean more rows to scan for the maintenance thread)

There are several problems with this approach, as you mentioned:

1) Since rows are not deleted, the operations quickly pile up, and because there is no index on the status column this has a performance impact on the other processing that needs to happen for an entry to go to ODL.

2) You shouldn't set maintenance_interval to a high number, as cleanup is part of the process. One specific part of cleanup is to move rows that are stuck in PROCESSING (because a failure was not handled properly) back to PENDING; see the sketch after this list. You don't specify how big your maintenance_interval is, but what could be happening is that some of your operations are left in PROCESSING and are never reset back to PENDING until your maintenance finally runs, at which point they are processed successfully. Under these conditions you'd see a long time delta.
   If you want rows not to be deleted, completed_rows_retention=-1 should be enough without the need to raise maintenance_interval.
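For reference, the reset the maintenance task performs on stuck rows is conceptually something like the sketch below. This is only an illustration (the real logic lives in networking-odl); the state values, column names, and cutoff interval are assumptions.

# Conceptual sketch of the PROCESSING -> PENDING reset done by maintenance.
# Not the actual networking-odl code; state/column names and the cutoff
# interval are assumptions for illustration.
import pymysql

conn = pymysql.connect(host="localhost", user="neutron",
                       password="secret", database="neutron")
with conn.cursor() as cur:
    cur.execute("""
        UPDATE opendaylightjournal
        SET state = 'pending'
        WHERE state = 'processing'
          AND last_retried < NOW() - INTERVAL 10 MINUTE
    """)
conn.commit()
conn.close()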

In regard to these two problems, I propose the following:

- re (1)*: Create a DB hook or an external process that would "archive" or log the time deltas but would still allow the rows to be maintained (including their deletion). A rough sketch of such a hook follows below.
- re (2): At least allow maintenance to run, just don't allow deletion during the maintenance.


* PREFERABLE
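To make proposal (1) concrete, one possible shape for such a DB hook is a database trigger that archives the time delta whenever a journal row is deleted, so normal maintenance (including deletion) can keep running. A rough sketch follows; the archive table, trigger, and column names are all hypothetical:

# Hypothetical "DB hook" for proposal (1): archive the time delta whenever a
# journal row is deleted, so maintenance can keep deleting rows as usual.
# Table, trigger, and column names here are illustrative only.
import pymysql

ARCHIVE_TABLE = """
CREATE TABLE IF NOT EXISTS odl_journal_deltas (
    object_uuid VARCHAR(36),
    operation   VARCHAR(36),
    delta_s     INT,
    archived_at DATETIME DEFAULT CURRENT_TIMESTAMP
)
"""

ARCHIVE_TRIGGER = """
CREATE TRIGGER archive_odl_journal_delta
BEFORE DELETE ON opendaylightjournal
FOR EACH ROW
INSERT INTO odl_journal_deltas (object_uuid, operation, delta_s)
VALUES (OLD.object_uuid, OLD.operation,
        TIMESTAMPDIFF(SECOND, OLD.created_at, OLD.last_retried))
"""

conn = pymysql.connect(host="localhost", user="neutron",
                       password="secret", database="neutron")
with conn.cursor() as cur:
    cur.execute(ARCHIVE_TABLE)    # create the archive table once
    cur.execute(ARCHIVE_TRIGGER)  # install the trigger (requires TRIGGER privilege)
conn.close()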

Comment 3 Nir Yechiel 2017-09-28 13:46:49 UTC
Sai,

Can you please review comment #2 above?

Thanks,
Nir

Comment 4 Sai Sindhur Malleni 2017-10-02 12:40:17 UTC
Nir,

I will get to this in the next round of testing due to begin in the first week of November.

Comment 5 Mike Kolesnik 2017-10-29 07:52:06 UTC
Closing for now since we don't have the details; let's re-open if we have logs or something more concrete.

