Bug 848090

Summary: Thin server takes 100% CPU
Product: [Retired] CloudForms Cloud Engine
Reporter: Dave Johnson <dajohnso>
Component: aeolus-conductor
Assignee: Tzu-Mainn Chen <tzumainn>
Status: CLOSED WORKSFORME
QA Contact: Dave Johnson <dajohnso>
Severity: high
Priority: high
Version: 1.0.0
CC: athomas, morazi
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2012-08-22 15:35:55 UTC

Description Dave Johnson 2012-08-14 14:53:46 UTC
Description of problem:
============================
Conductor test automation around quotas continues to hit an issue in which the underlying Thin server consumes 100% of the CPU. I pointed eck at it; he has confirmed the problem and identified the cause (I'll let him fill in the details), but I believe there is still some question about how to resolve it properly.

Comment 1 John Eckersberg 2012-08-14 19:32:54 UTC
This seems to have something to do with eager loading in ActiveRecord.

At pools_controller.rb:80:

Pool.includes(:deployments, :instances, :quota, :catalogs).
          list_for_user(current_session, current_user, Privilege::VIEW).
          list(sort_column(Pool), sort_direction)

Somewhere inside this chain of calls the process becomes CPU-bound. It's hard to debug because the stack is more than 100 frames deep. From what I can tell, it is building objects for all of the eager-loaded records and wiring up all of the relationships between them.
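One plausible reason this goes CPU-bound can be shown with a small, hedged sketch. It assumes (this is not confirmed in the report) that ActiveRecord resolved the `Pool.includes(...)` chain above into a single query with LEFT OUTER JOINs, which it may do when the query's conditions or ordering reference the included tables, and that deployments, instances and catalogs are each joined directly on the pool. Under that assumption each has_many collection multiplies the number of rows per pool that ActiveRecord must walk and turn into objects, whereas loading each association in its own query grows only additively. `PoolCounts` and all of the counts are hypothetical.

```ruby
# Hypothetical per-pool child counts; illustrative only, not measurements
# from the affected test systems. Quota is left out because a single quota
# row per pool multiplies the row count by 1.
PoolCounts = Struct.new(:name, :deployments, :instances, :catalogs)

# Rows returned by one query that LEFT OUTER JOINs every has_many association
# on the pool id. Each child collection multiplies that pool's row count;
# an empty collection still yields one row (of NULLs) for the pool.
def joined_rows(pools)
  pools.sum do |p|
    [p.deployments, p.instances, p.catalogs].map { |n| [n, 1].max }.reduce(:*)
  end
end

# Rows returned when pools and each association are loaded by separate
# queries: one set of rows per table, so the total grows additively.
def preloaded_rows(pools)
  pools.size + pools.sum { |p| p.deployments + p.instances + p.catalogs }
end

pools = [
  PoolCounts.new("pool-a", 20, 200, 5),
  PoolCounts.new("pool-b", 10, 100, 5),
  PoolCounts.new("pool-c", 0, 0, 0)
]

puts "joined rows:    #{joined_rows(pools)}"
puts "preloaded rows: #{preloaded_rows(pools)}"
```

With these made-up counts the joined query yields 25,001 rows against 343 rows for the separate queries, even though both describe the same 3 pools and 340 child records.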

If the eager loading is removed instead, so that the code looks like this:

Pool.list_for_user(current_session, current_user, Privilege::VIEW).
     list(sort_column(Pool), sort_direction)

The requests return in a reasonable amount of time.

This is happening on some test systems that dajo is running automation against.

Comment 2 Scott Seago 2012-08-16 05:42:10 UTC
This was added with the following commit:

commit bc9ef2f278a7d96ce3b7c02e072f141a04c89d87
Author: Tzu-Mainn Chen <tzumainn>
Date:   Thu Mar 22 11:27:24 2012 -0400

    BZ 802571 added eager loading and other minor efficiency fixes

It seems we have competing scalability concerns here: the eager loading was added to address one scalability problem, and it is causing another.

It may be worth revisiting the original use case that led to the eager loading; with recent changes to the permission queries and related code, it may no longer be needed.
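The trade-off described above can be sketched as rough query arithmetic. This assumes the pool list reads all four included associations for every pool, and compares lazy loading (no `includes`), one separate query per association (the strategy Rails' `preload` uses) and a single JOIN-based query (the strategy `eager_load` uses). Which strategy the BZ 802571 change actually triggered is not stated in this report, and the function names are hypothetical.

```ruby
# Rough query counts for listing n_pools pools and touching n_assocs
# associations on each one. Illustrative arithmetic only; real counts depend
# on the Rails version and on which associations the view actually reads.

# Lazy loading: one query for the pools, then one query per association per
# pool the first time it is touched (the classic "N+1" pattern).
def lazy_queries(n_pools, n_assocs)
  1 + n_pools * n_assocs
end

# Separate preload queries: one for the pools plus one per association,
# regardless of how many pools are listed.
def preload_queries(_n_pools, n_assocs)
  1 + n_assocs
end

# Single JOIN-based eager load: always one query, but its row count grows
# with the per-pool product of the child collection sizes.
def join_queries(_n_pools, _n_assocs)
  1
end

[10, 100, 1000].each do |n|
  puts "#{n} pools: lazy=#{lazy_queries(n, 4)} " \
       "preload=#{preload_queries(n, 4)} join=#{join_queries(n, 4)}"
end
```

Lazy loading grows linearly in queries (401 for 100 pools here), the JOIN keeps the query count at one but can explode the row count, and separate preload queries keep both the query count and the row count modest.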

I'm reassigning to Tzumainn, since he added the original eager-loading code here. Tzumainn, if you're not the right person to resolve this, we can work with Angus to find who is.

Comment 3 Dave Johnson 2012-08-22 15:35:55 UTC
I have not been able to reproduce this lately, so I'm closing it out until it shows up again.