Bug 1514595

Summary:

Memory issue on appliance with Amazon provider

Product:

Red Hat CloudForms Management Engine

Reporter:

Jerome Marc <jmarc>

Component:

Providers

Assignee:

Ladislav Smola <lsmola>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Matouš Mojžíš <mmojzis>

Severity:

high

Docs Contact:

Priority:

high

Version:

5.9.0

CC:

bascar, cpelland, dmetzger, gblomqui, jcheal, jfrey, jhardy, jmarc, lsmola, mmojzis, obarenbo, simaishi

Target Milestone:

Target Release:

5.9.0

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

5.9.0.11

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2018-03-06 15:17:42 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

AWS

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
evm.log	none

Description Jerome Marc 2017-11-17 20:30:28 UTC

Description of problem:
I have isolated each provider in its own zone in my lab deployment. The appliance for the Amazon provider is swapping and requires a restart every few hours.

Version-Release number of selected component (if applicable):
5.9.0.9.20171115202245_7429f75

How reproducible:
Always

Steps to Reproduce:
1. Deploy CloudForms and add an Amazon provider (with C&U capture on)

Actual results:
The appliance starts to consume all memory and goes to swap.

Expected results:
No swap.

Additional info:

Comment 2 Jerome Marc 2017-11-17 20:35:52 UTC

Created attachment 1354332
evm.log

Comment 12 Greg Blomquist 2017-11-30 22:18:00 UTC

Ladas, can you get https://github.com/ManageIQ/manageiq/pull/16502 into a mergeable state? Once that's merged, this BZ can be moved to POST.

Comment 13 Ladislav Smola 2017-12-01 11:03:45 UTC

After talking with Adam, https://github.com/ManageIQ/manageiq/pull/16432 is enough for this BZ. We might or might not merge https://github.com/ManageIQ/manageiq/pull/16502 in the future.

Comment 14 Matouš Mojžíš 2018-01-25 16:47:33 UTC

Ladas,
how can I reproduce & verify this issue?
Thanks

Comment 15 Ladislav Smola 2018-01-25 18:55:14 UTC

Part of it was caused by the memory leak and part of it by the MiqQueue issues. So verification is just keeping the appliance running for some time (a few days) and checking that memory usage is not rising.
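
A minimal sketch of how that check could be scripted; it is not part of the fix or of any CloudForms tooling. It assumes a Linux appliance whose /proc/meminfo exposes MemTotal, MemAvailable, SwapTotal and SwapFree, and all function names here are hypothetical. Samples are (seconds, used kB) pairs collected periodically, and the growth rate is a least-squares slope over them.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for sized fields)."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def used_kb(fields):
    """Memory in use, taken as MemTotal minus MemAvailable (an assumed definition)."""
    return fields["MemTotal"] - fields["MemAvailable"]


def swap_used_kb(fields):
    """Swap in use, taken as SwapTotal minus SwapFree."""
    return fields["SwapTotal"] - fields["SwapFree"]


def read_sample(path="/proc/meminfo"):
    """Return one (unix_seconds, used_kb, swap_used_kb) sample from the local host."""
    with open(path) as handle:
        fields = parse_meminfo(handle.read())
    return time.time(), used_kb(fields), swap_used_kb(fields)


def growth_mb_per_day(samples):
    """Least-squares slope of (seconds, used_kb) samples, converted to MB per day."""
    count = len(samples)
    if count < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / count
    mean_u = sum(u for _, u in samples) / count
    numerator = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    kb_per_second = numerator / denominator
    return kb_per_second * 86400 / 1024
```

Calling read_sample() from cron every hour or so and feeding the first two values of each sample to growth_mb_per_day() gives a single number to watch over the multi-day run; a slope near zero with no swap in use is what the verification is looking for. What counts as "near zero" is a judgment call and is not defined in this BZ.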

Comment 16 Matouš Mojžíš 2018-02-12 10:20:37 UTC

I had the appliance running for four days and used memory increased by 300MB. I think this should be okay?

Comment 17 Ladislav Smola 2018-02-12 11:05:11 UTC

Can you try for a couple more days? Also make sure you test it with the latest memory leak fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1535720

Comment 18 dmetzger 2018-02-12 13:07:15 UTC

What build were you testing with when you saw the 300MB growth? Also, can you post a full log set from the appliance (or share the IP/creds)?

Please test with 5.9.0.19 or 5.9.0.20

Comment 20 Matouš Mojžíš 2018-02-14 10:37:57 UTC

Memory used is the same as yesterday. Also, I am trying to reproduce it on a region that is not often used. Shall I try it with a busy region?

Comment 22 Matouš Mojžíš 2018-02-22 13:16:14 UTC

So, I ran the appliance with a busy EC2 region.
After the first day memory usage went up by 100MB, and then it did not increase any further over the 7-day run.
Is that enough for verification?

Comment 23 dmetzger 2018-02-22 13:29:40 UTC

That's sufficient to verify the fix.

Comment 24 Matouš Mojžíš 2018-02-22 13:43:12 UTC

Verified in 5.9.0.20. An appliance with a busy EC2 region ran for a week with no sign of memory leaks.