Bug 848090 - thin server takes 100% cpu
Status: CLOSED WORKSFORME
Product: CloudForms Cloud Engine
Classification: Red Hat
Component: aeolus-conductor
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Tzu-Mainn Chen
QA Contact: Dave Johnson
Docs Contact:
Depends On:
Blocks:
Reported: 2012-08-14 10:53 EDT by Dave Johnson
Modified: 2012-08-22 11:35 EDT (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-22 11:35:55 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Dave Johnson 2012-08-14 10:53:46 EDT
Description of problem:
============================
The conductor test automation around quotas keeps hitting an issue where the underlying thin server takes 100% of the CPU. I pointed eck at it; he has acknowledged the issue and identified the cause (I'll let him fill in the details), but I believe there is still some question about how to properly resolve it.
Comment 1 John Eckersberg 2012-08-14 15:32:54 EDT
This seems to have something to do with eager loading in ActiveRecord.

At pools_controller.rb:80:

Pool.includes(:deployments, :instances, :quota, :catalogs).
     list_for_user(current_session, current_user, Privilege::VIEW).
     list(sort_column(Pool), sort_direction)

Somewhere inside this chain of calls, the process goes CPU-bound. It's kinda hard to debug because the stack is 100+ frames deep. From what I can tell, it's trying to build objects for all the eager-loaded records and wire up all the inter-object relationships between them.

If instead the eager loading is removed, and the code looks like this:

Pool.list_for_user(current_session, current_user, Privilege::VIEW).
     list(sort_column(Pool), sort_direction)

then the requests return in a reasonable amount of time.
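
As a rough illustration of the cost described above, here is a toy model in plain Ruby. It is not the conductor code and does not use ActiveRecord; FakePool, build_fake_db, eager_load and lazy_load are hypothetical names, and the row counts are made up. It only counts how many objects each strategy has to build before the pool list can be returned.

```ruby
# Toy model (NOT ActiveRecord): counts the objects each loading strategy builds.
# All names and sizes here are hypothetical, chosen only for illustration.
FakePool = Struct.new(:id, :name)

# Build an in-memory stand-in for the database:
# pool_id => { association name => array of rows }.
def build_fake_db(pool_count, rows_per_association)
  associations = [:deployments, :instances, :catalogs]
  (1..pool_count).each_with_object({}) do |pool_id, db|
    db[pool_id] = associations.each_with_object({}) do |assoc, rows|
      rows[assoc] = Array.new(rows_per_association) { |i| "#{assoc}-#{pool_id}-#{i}" }
    end
  end
end

# Eager strategy: build every pool and every associated row up front,
# whether or not the caller ever looks at the associations.
def eager_load(db)
  built = 0
  pools = db.map do |pool_id, assocs|
    built += 1
    loaded = assocs.transform_values do |rows|
      rows.map { |row| built += 1; row.dup }
    end
    [FakePool.new(pool_id, "pool-#{pool_id}"), loaded]
  end
  [pools, built]
end

# Lazy strategy: build only the pools; associated rows are left untouched
# until something actually asks for them.
def lazy_load(db)
  pools = db.keys.map { |pool_id| FakePool.new(pool_id, "pool-#{pool_id}") }
  [pools, pools.size]
end

db = build_fake_db(10, 500)
_, eager_built = eager_load(db)
_, lazy_built = lazy_load(db)
puts "eager: #{eager_built} objects, lazy: #{lazy_built} objects"
```

With 10 pools and 500 rows in each of 3 associations, the eager path builds 15,010 objects against 10 for the lazy path. The real gap depends on how many deployments, instances and catalogs the test systems have accumulated, which this sketch does not know.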

This is happening on some test systems that dajo is running automation against.
Comment 2 Scott Seago 2012-08-16 01:42:10 EDT
This was added with the following commit:

commit bc9ef2f278a7d96ce3b7c02e072f141a04c89d87
Author: Tzu-Mainn Chen <tzumainn@redhat.com>
Date:   Thu Mar 22 11:27:24 2012 -0400

    BZ 802571 added eager loading and other minor efficiency fixes

It seems we have dueling scalability concerns here. The eager loading was added to address certain scalability concerns, and it is causing others.

It may be worth revisiting the original use case that led to the eager loading -- with recent changes in permissions queries, etc, the eager loading may not be needed anyway.

I'm reassigning to Tzumainn, since he added the original eager loading bits here. Tzumainn -- if you're not the right one to resolve this, we can work w/ Angus to find the right person.
Comment 3 Dave Johnson 2012-08-22 11:35:55 EDT
I have not been able to reproduce this lately, so I'm closing this out until it shows up again.