Description of problem:
The RHHI infrastructure (RHV 4.1.11 + Gluster 3.3) shows an increasing number of VDSNetworkException ERRORs in engine.log until the hosted_engine glusterfs Storage Domain fails and the engine is restarted. Analysis found no indication of a network issue or exhaustion. However, the ParOldGen heap area reaches 99% after roughly 10 days, with usage growing by about 9-10% per day. When usage is close to 99%, the aforementioned exceptions start to appear and the problem reproduces. The customer provided a heap dump for further analysis.
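The growth figures above can be sanity-checked with a short calculation. This is only a sketch: it assumes the old generation grows linearly and starts near 0% after an engine restart, neither of which is stated in the report. The function name and the `start_pct` parameter are hypothetical.

```python
def days_to_threshold(threshold_pct, growth_pct_per_day, start_pct=0.0):
    """Days until old-gen usage reaches threshold_pct, assuming linear growth."""
    if growth_pct_per_day <= 0:
        raise ValueError("growth rate must be positive")
    return (threshold_pct - start_pct) / growth_pct_per_day


# Reported range: 9% to 10% per day, failure near 99%.
fast = days_to_threshold(99, 10)  # upper end of the reported growth rate
slow = days_to_threshold(99, 9)   # lower end of the reported growth rate
print(f"99% reached after about {fast:.1f} to {slow:.1f} days")
```

Under these assumptions the result is in line with the reported "10 days more or less".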
Any logs?
Ravi, any news?
From the logs I see that within an 8-hour time frame GlusterServersListVDSCommand and GlusterVolumesListVDSCommand are each executed 9665 times, which is one execution of each command about every three seconds. Apart from that, I don't see anything else in the logs or the thread dump. It looks like there is an issue with the Gluster integration.
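For anyone wanting to repeat the count on their own engine.log, a minimal sketch follows. It assumes each execution logs a line containing `START, <CommandName>`; that marker and the sample lines are illustrative assumptions, not taken from the attached log, so adjust the marker to match the real entries.

```python
from collections import Counter

COMMANDS = ("GlusterServersListVDSCommand", "GlusterVolumesListVDSCommand")


def count_commands(lines, marker="START, "):
    """Count log lines that contain marker + command name (assumed format)."""
    counts = Counter()
    for line in lines:
        for name in COMMANDS:
            if marker + name in line:
                counts[name] += 1
    return counts


def mean_interval_seconds(count, window_hours):
    """Average seconds between executions over the given time window."""
    return window_hours * 3600 / count


# Hypothetical sample lines; real entries carry timestamps and more detail.
sample = [
    "INFO START, GlusterServersListVDSCommand(HostName = host1)",
    "INFO FINISH, GlusterServersListVDSCommand, return: []",
    "INFO START, GlusterVolumesListVDSCommand(HostName = host1)",
    "INFO START, GlusterServersListVDSCommand(HostName = host1)",
]

print(count_commands(sample))
# Figures from the comment above: 9665 executions in 8 hours.
print(round(mean_interval_seconds(9665, 8), 2), "seconds between executions")
```

To run it against a real file, pass the open file object as `lines`, for example `count_commands(open("engine.log"))`.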
Created attachment 1443379: engine.log after ParOldGen gets to 99% (RHV 4.1.11), before engine restart
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
What's the next step here?
(In reply to Yaniv Kaul from comment #56)
> What's the next step here?

We have not been able to reproduce the memory leak issue, and there has not been any further information from the customer. Mauro, can we close this bug?
Mauro, any update? Should we continue to keep this bug open?
Closing, as we do not have enough data or understanding of the customer-specific issue to proceed.