Bug 1514595

Summary: Memory issue on appliance with Amazon provider
Product: Red Hat CloudForms Management Engine
Reporter: Jerome Marc <jmarc>
Component: Providers
Assignee: Ladislav Smola <lsmola>
Status: CLOSED CURRENTRELEASE
QA Contact: Matouš Mojžíš <mmojzis>
Severity: high
Priority: high
Version: 5.9.0
CC: bascar, cpelland, dmetzger, gblomqui, jcheal, jfrey, jhardy, jmarc, lsmola, mmojzis, obarenbo, simaishi
Target Milestone: GA
Target Release: 5.9.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 5.9.0.11
Last Closed: 2018-03-06 15:17:42 UTC
Type: Bug
Cloudforms Team: AWS
Attachments:
evm.log (flags: none)

Description Jerome Marc 2017-11-17 20:30:28 UTC
Description of problem:
I have isolated each provider in its own zone in my lab deployment. The appliance for the Amazon provider is swapping and requires a restart every few hours.

Version-Release number of selected component (if applicable):
5.9.0.9.20171115202245_7429f75

How reproducible:
Always

Steps to Reproduce:
1. Deploy CloudForms and add an Amazon provider (with C&U capture enabled).

Actual results:
The appliance consumes all available memory and starts swapping.

Expected results:
The appliance does not swap.

Additional info:

Comment 2 Jerome Marc 2017-11-17 20:35:52 UTC
Created attachment 1354332
evm.log

Comment 12 Greg Blomquist 2017-11-30 22:18:00 UTC
Ladas, can you get https://github.com/ManageIQ/manageiq/pull/16502 into a mergeable state?  Once that's merged, this BZ can be moved to POST.

Comment 13 Ladislav Smola 2017-12-01 11:03:45 UTC
After talking with Adam, https://github.com/ManageIQ/manageiq/pull/16432 is sufficient for this BZ. We may or may not merge https://github.com/ManageIQ/manageiq/pull/16502 in the future.

Comment 14 Matouš Mojžíš 2018-01-25 16:47:33 UTC
Ladas,
how can I reproduce and verify this issue?
Thanks

Comment 15 Ladislav Smola 2018-01-25 18:55:14 UTC
Part of it was caused by the memory leak and part of it by the MiqQueue issues. So verification is just a matter of keeping the appliance running for some time (a few days) and checking that memory usage is not rising.
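A minimal sketch of that "memory is not rising" check, assuming a Linux appliance where used memory is approximated as MemTotal minus MemAvailable from /proc/meminfo, and assuming one sample is taken per day. The helper names, the 3-sample window, and the 50 MB tolerance are hypothetical illustrations, not part of the verification procedure described in this bug:

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def used_mb(info):
    """Approximate used memory in MB as MemTotal - MemAvailable (both in kB)."""
    return (info["MemTotal"] - info["MemAvailable"]) / 1024


def is_still_rising(samples_mb, window=3, tolerance_mb=50):
    """Hypothetical heuristic: True if the newest sample exceeds the sample
    `window` steps earlier by more than `tolerance_mb`."""
    if len(samples_mb) <= window:
        return False  # not enough samples yet to judge a trend
    return samples_mb[-1] - samples_mb[-1 - window] > tolerance_mb


if __name__ == "__main__":
    # Print the current reading when run on a Linux host that exposes MemAvailable.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        if "MemTotal" in info and "MemAvailable" in info:
            print(f"used memory: {used_mb(info):.0f} MB")
```

With daily readings such as [1000, 1100, 1100, 1100, 1100] (an initial bump, then flat), is_still_rising returns False; steady growth such as [1000, 1100, 1200, 1300] returns True.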

Comment 16 Matouš Mojžíš 2018-02-12 10:20:37 UTC
I had the appliance running for four days, and used memory increased by 300 MB. I think this should be okay?

Comment 17 Ladislav Smola 2018-02-12 11:05:11 UTC
Can you try for a couple more days? Also, make sure you test with the latest memory leak fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1535720

Comment 18 dmetzger 2018-02-12 13:07:15 UTC
What build were you testing with when you saw the 300 MB growth? Also, can you post a full log set from the appliance (or share the IP/creds)?

Please test with 5.9.0.19 or 5.9.0.20.

Comment 20 Matouš Mojžíš 2018-02-14 10:37:57 UTC
Memory usage is the same as yesterday. Also, I am trying to reproduce this in a region that is not used often. Should I try it with a busy region?

Comment 22 Matouš Mojžíš 2018-02-22 13:16:14 UTC
I ran the appliance against a busy EC2 region.
After the first day, memory usage rose by 100 MB and then did not increase any further over the 7-day run.
Is that enough for verification?

Comment 23 dmetzger 2018-02-22 13:29:40 UTC
That's sufficient to verify the fix.

Comment 24 Matouš Mojžíš 2018-02-22 13:43:12 UTC
Verified in 5.9.0.20. The appliance, connected to a busy EC2 region, ran for a week with no sign of memory leaks.