Bug 1575562

Summary: Memory leak in ovirt-engine deployed as RHHI
Product: Red Hat Enterprise Virtualization Manager
Reporter: Mauro Oddi <moddi>
Component: ovirt-engine
Assignee: Sahina Bose <sabose>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: SATHEESARAN <sasundar>
Severity: medium
Priority: high
Docs Contact:
Version: 4.1.11
CC: lsurette, mgoldboi, mkalinin, moddi, mperina, Rhev-m-bugs, rnori, sabose, sasundar, srevivo, ykaul
Target Milestone: ovirt-4.3.1
Flags: lsvaty: testing_plan_complete-
Target Release: 4.3.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-11 10:38:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: engine.log after ParOldGen gets to 99% (RHV 4.1.11) before engine restart (flags: none)

Description Mauro Oddi 2018-05-07 10:17:44 UTC
Description of problem:
An RHHI infrastructure (RHV 4.1.11 + Gluster 3.3) starts to show increasing numbers of VDSNetworkException ERRORs in engine.log until the hosted_engine glusterfs Storage Domain fails and the engine is restarted.

Analysis has shown no indicators of a network issue or resource exhaustion. However, it was detected that the ParOldGen heap area reaches 99% after roughly 10 days, increasing in usage by 9-10% per day.

When usage approaches 99%, the aforementioned exceptions start to appear and the problem recurs.
The customer provided a heap dump for further analysis.
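The reported growth rate is consistent with the observed cycle length. As an illustrative sanity check (the linear-growth assumption and the `days_until_full` helper are mine, not from the report):

```python
# Illustrative arithmetic only: verify that ParOldGen growing 9-10% per
# day is consistent with the observed ~10-day interval before the heap
# reaches 99% and the VDSNetworkException errors begin.
def days_until_full(daily_growth_pct, threshold_pct=99.0):
    """Days until old-gen utilization hits the threshold, assuming
    linear growth from an empty old generation (simplifying assumption)."""
    return threshold_pct / daily_growth_pct

fast = days_until_full(10.0)  # at 10%/day
slow = days_until_full(9.0)   # at 9%/day
print(f"{fast:.1f} to {slow:.1f} days")  # roughly 10-11 days
```

This matches the "99% after 10 days more or less" observation, supporting the hypothesis of a steady leak rather than a burst allocation.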


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Yaniv Kaul 2018-05-08 02:49:00 UTC
Any logs?

Comment 12 Yaniv Kaul 2018-05-17 13:47:40 UTC
Ravi, any news?

Comment 13 Ravi Nori 2018-05-17 13:55:35 UTC
From the logs I see that in an 8-hour time frame, GlusterServersListVDSCommand and GlusterVolumesListVDSCommand are each executed 9,665 times; that is one execution of each command roughly every three seconds.

Apart from the above issue I don't see anything else in the logs or the thread dump. 

It looks like there is an issue with Gluster integration.
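The frequency cited in comment 13 can be cross-checked against engine.log with a short script. This is a sketch only; the log-line format in the sample below is a hypothetical illustration, not the actual engine.log format from the attachment:

```python
from collections import Counter

def count_vds_commands(lines, commands=("GlusterServersListVDSCommand",
                                        "GlusterVolumesListVDSCommand")):
    """Count occurrences of each Gluster monitoring command in log lines.

    A loose substring match is used deliberately, since the exact log
    layout varies between engine versions (assumption)."""
    counts = Counter()
    for line in lines:
        for cmd in commands:
            if cmd in line:
                counts[cmd] += 1
    return counts

# Hypothetical sample lines, for illustration only:
sample = [
    "2018-05-07 10:17:01 INFO  START, GlusterServersListVDSCommand(...)",
    "2018-05-07 10:17:01 INFO  START, GlusterVolumesListVDSCommand(...)",
    "2018-05-07 10:17:04 INFO  START, GlusterServersListVDSCommand(...)",
]
print(count_vds_commands(sample))

# Average interval implied by comment 13's figures:
print(8 * 3600 / 9665)  # about 3.0 seconds between executions
```

Run against the real engine.log (e.g. `count_vds_commands(open("engine.log"))`), the same counting would confirm whether the ~3-second polling cadence holds across the whole capture window.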

Comment 19 Mauro Oddi 2018-05-28 15:30:51 UTC
Created attachment 1443379 [details]
engine.log after ParOldGen gets to 99% (RHV 4.1.11) before engine restart

Comment 54 Sandro Bonazzola 2019-01-28 09:40:50 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 56 Yaniv Kaul 2019-01-28 10:26:45 UTC
What's the next step here?

Comment 57 Sahina Bose 2019-01-28 12:19:51 UTC
(In reply to Yaniv Kaul from comment #56)
> What's the next step here?

We have not been able to reproduce the memory leak issue, and there has not been any further information from the customer.

Mauro, can we close this bug?

Comment 61 Sahina Bose 2019-02-11 06:22:04 UTC
Mauro, any update? Should we continue to keep this bug open?

Comment 63 Sahina Bose 2019-02-11 10:38:13 UTC
Closing, as we do not have enough data or understanding of the customer-specific issue to proceed.