Bug 1514595 - Memory issue on appliance with Amazon provider
Summary: Memory issue on appliance with Amazon provider
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.9.0
Assignee: Ladislav Smola
QA Contact: Matouš Mojžíš
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-17 20:30 UTC by Jerome Marc
Modified: 2018-06-20 13:26 UTC
CC List: 12 users

Fixed In Version: 5.9.0.11
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-06 15:17:42 UTC
Category: ---
Cloudforms Team: AWS
Target Upstream Version:
Embargoed:


Attachments
evm.log (4.38 MB, application/x-gzip)
2017-11-17 20:35 UTC, Jerome Marc

Description Jerome Marc 2017-11-17 20:30:28 UTC
Description of problem:
I have isolated each provider in its own zone in my lab deployment. The appliance for the Amazon provider is swapping and requires a restart every few hours.

Version-Release number of selected component (if applicable):
5.9.0.9.20171115202245_7429f75

How reproducible:
Always

Steps to Reproduce:
1. Deploy CloudForms and add Amazon provider (C&U capture is on)
2. 
3.

Actual results:
The appliance starts to consume all memory and goes to swap.

Expected results:
No swap.

Additional info:
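A minimal way to watch for this, sketched below (an assumption on my part, not a tool attached to this BZ): sample /proc/meminfo on the appliance and log available memory and swap usage, so the point where the appliance starts swapping shows up in a log.

#!/usr/bin/env python
# Hypothetical monitoring helper (assumption: run directly on the appliance).
# It only reads /proc/meminfo and prints available memory and swap usage.
import time

def meminfo_kb():
    # Parse /proc/meminfo into {field_name: value_in_kB}.
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            fields[key] = int(rest.split()[0])
    return fields

while True:
    m = meminfo_kb()
    swap_used = m["SwapTotal"] - m["SwapFree"]
    print("%d available_kb=%d swap_used_kb=%d"
          % (int(time.time()), m["MemAvailable"], swap_used))
    time.sleep(300)  # sample every 5 minutes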

Comment 2 Jerome Marc 2017-11-17 20:35:52 UTC
Created attachment 1354332 [details]
evm.log

Comment 12 Greg Blomquist 2017-11-30 22:18:00 UTC
Ladas, can you get https://github.com/ManageIQ/manageiq/pull/16502 into a mergeable state?  Once that's merged, this BZ can be moved to POST.

Comment 13 Ladislav Smola 2017-12-01 11:03:45 UTC
After talking with Adam, https://github.com/ManageIQ/manageiq/pull/16432 is enough for this BZ. We might or might not merge https://github.com/ManageIQ/manageiq/pull/16502 in the future.

Comment 14 Matouš Mojžíš 2018-01-25 16:47:33 UTC
Ladas,
how can I reproduce & verify this issue?
Thanks

Comment 15 Ladislav Smola 2018-01-25 18:55:14 UTC
So part of it was caused by the memory leak, and part of it by the MiqQueue issues. So the verification is just keeping the appliance running for some time (a few days) and checking that the memory is not rising.
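One rough way to do that check (a sketch under assumptions, not a procedure from this BZ: it runs on the appliance and assumes the MIQ worker processes have "MIQ" in their command line) is to log the summed worker RSS once an hour and compare the trend over a few days:

import subprocess
import time

def miq_rss_kb():
    # Sum the RSS (kB) of every process whose command line mentions MIQ.
    out = subprocess.check_output(["ps", "-eo", "rss,args"]).decode()
    total = 0
    for line in out.splitlines()[1:]:          # skip the ps header row
        rss, _, args = line.strip().partition(" ")
        if "MIQ" in args:
            total += int(rss)
    return total

while True:
    # One sample per hour; a series that keeps rising day after day would
    # suggest the leak is still there, a flat series would not.
    print("%d miq_rss_kb=%d" % (int(time.time()), miq_rss_kb()))
    time.sleep(3600)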

Comment 16 Matouš Mojžíš 2018-02-12 10:20:37 UTC
So I had the appliance running for four days and used memory increased by 300MB. I think this should be okay?

Comment 17 Ladislav Smola 2018-02-12 11:05:11 UTC
Can you try for a couple more days? Also make sure you test it with the latest memory leak fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1535720

Comment 18 dmetzger 2018-02-12 13:07:15 UTC
What build were you testing with when you saw the 300MB growth? Also, can you post a full log set from the appliance (or share the IP/creds)?

Please test with 5.9.0.19 or 5.9.0.20

Comment 20 Matouš Mojžíš 2018-02-14 10:37:57 UTC
Memory used is the same as yesterday. Also, I am trying to reproduce it on a region that is not often used. Shall I try it with a busy region?

Comment 22 Matouš Mojžíš 2018-02-22 13:16:14 UTC
So, I ran the appliance with a busy EC2 region.
After the first day memory usage bumped by 100MB, and then it didn't increase anymore after 7 days.
So is it enough for verification?
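(An initial bump that then flattens out reads as warm-up/caching rather than a leak. A tiny sketch with made-up numbers, just to show how the day-over-day deltas would look:)

def deltas_mb(samples):
    # Change in MB between consecutive (epoch_seconds, used_kb) samples.
    return [(t2, round((kb2 - kb1) / 1024.0, 1))
            for (t1, kb1), (t2, kb2) in zip(samples, samples[1:])]

# Hypothetical samples matching the pattern above: ~100MB bump after day 1,
# then essentially flat through day 7.
samples = [(0, 4000000), (86400, 4102400), (7 * 86400, 4103000)]
print(deltas_mb(samples))   # -> [(86400, 100.0), (604800, 0.6)]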

Comment 23 dmetzger 2018-02-22 13:29:40 UTC
That's sufficient to verify the fix.

Comment 24 Matouš Mojžíš 2018-02-22 13:43:12 UTC
Verified in 5.9.0.20. The appliance with a busy EC2 region was running for a week with no sign of memory leaks.

