Bug 1255767 - [scale] - org.jboss.resteasy.spi.ResteasyProviderFactory potential leak
Summary: [scale] - org.jboss.resteasy.spi.ResteasyProviderFactory potential leak
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine-restapi
Version: 3.5.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.5.5
Target Release: 3.5.5
Assignee: Juan Hernández
QA Contact: Eldad Marciano
URL:
Whiteboard: infra
Depends On: 1250140
Blocks:
Reported: 2015-08-21 13:49 UTC by rhev-integ
Modified: 2016-02-10 19:14 UTC

Fixed In Version: ovirt-engine-3.5.5-1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1250140
Environment:
Last Closed: 2015-10-26 18:33:21 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2015:1938 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise Virtualization Manager 3.5.5 update | 2015-10-26 22:31:23 UTC
oVirt gerrit | 44932 | 0 | ovirt-engine-3.5 | MERGED | restapi: JAXB provider | Never
oVirt gerrit | 44933 | 0 | ovirt-engine-3.5 | MERGED | restapi: Faster JAXB element creation | Never

Description rhev-integ 2015-08-21 13:49:16 UTC
+++ This bug is a RHEV-M zstream clone. The original bug is: +++
+++   https://bugzilla.redhat.com/show_bug.cgi?id=1250140. +++
+++ Requested by "juan.hernandez" +++
======================================================================



----------------------------------------------------------------------
Following comment by emarcian on August 04 at 15:12:24, 2015

Description of problem:
The engine runs out of memory (OOM) after 1835 REST actions.
There seems to be a class loader leak around:
 org.jboss.resteasy.spi.ResteasyProviderFactory

which is run by 'Worker' threads and 'org.ovirt.thread.pool' threads.

The use case is driven by a QE engine that serves many Jenkins jobs.

Reproducing the bug on a synthetic engine is still required.

On the other hand, QE Automation will double-check their automation code for risky areas such as (api.disconnect).


link for heap file:
http://file.tlv.redhat.com/gklein/heap-20635-2015-08-02_13-01-02.bin.gz

I will update the bug with further information ASAP.
Version-Release number of selected component (if applicable):
3.5.4

How reproducible:
100%

Steps to Reproduce:
1. Run an engine with 2 GB of RAM.
2. Run 734 tests 2.5 times per day until the problem happens (1835 REST actions).

Actual results:
OOM after a while.

Expected results:
Continuous hours of operation

Additional info:

----------------------------------------------------------------------
Following comment by emarcian on August 05 at 11:18:07, 2015

Nelly, 
How many connections does Jenkins handle?
Is it one connection per test, one per team, or one connection for all?

----------------------------------------------------------------------
Following comment by juan.hernandez on August 17 at 11:59:15, 2015

What that heap dump shows is that the server has created 44 instances of the "com.sun.xml.bind.v2.runtime.JAXBContextImpl" class. Instances of this class store all the information required to convert any of the objects used by the RESTAPI to/from XML, and each of them consumes approx 2.6 MiB, for a total of approx 115 MiB.

All these "JAXBContextImpl" instances are created by the Resteasy builtin JAXB provider "org.jboss.resteasy.plugins.providers.jaxb.JAXBContextWrapper" and stored in a cache indexed by type of object:

  DataCenter -> First instance
  Cluster -> Second instance
  VM -> Third instance
  ...

In general these instances may contain different information, but in our case all of them are identical, so one would be enough, but this isn't how the builtin JAXB provider works.
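
A minimal sketch of the difference between a cache indexed by type and a single shared context. It does not use Resteasy or JAXB: HeavyContext, PerTypeProvider and SharedProvider are hypothetical stand-ins invented for illustration, and the real provider code may be organized differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ContextCacheSketch {

    // Hypothetical stand-in for an expensive object such as a JAXB context.
    static final class HeavyContext {
    }

    // One context per requested type: the number of contexts grows with the
    // number of distinct types, even if all contexts would be identical.
    static final class PerTypeProvider {
        private final Map<Class<?>, HeavyContext> cache = new ConcurrentHashMap<>();

        HeavyContext contextFor(Class<?> type) {
            return cache.computeIfAbsent(type, t -> new HeavyContext());
        }

        int instanceCount() {
            return cache.size();
        }
    }

    // One shared context for every type: the count stays at one.
    static final class SharedProvider {
        private final HeavyContext shared = new HeavyContext();

        HeavyContext contextFor(Class<?> type) {
            return shared;
        }

        int instanceCount() {
            return 1;
        }
    }

    public static void main(String[] args) {
        // String, Integer and Long stand in for DataCenter, Cluster and VM.
        Class<?>[] requested = {String.class, Integer.class, Long.class, String.class};

        PerTypeProvider perType = new PerTypeProvider();
        SharedProvider shared = new SharedProvider();
        for (Class<?> type : requested) {
            perType.contextFor(type);
            shared.contextFor(type);
        }

        System.out.println("per-type contexts: " + perType.instanceCount());
        System.out.println("shared contexts:   " + shared.instanceCount());
    }
}
```

With the per-type cache every new kind of object requested adds another context, which matches the growth seen in the heap dump. With the shared variant the count does not depend on which types are requested.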

If we want to improve this we need to backport the following change, which introduces a custom message body writer that creates only one JAXB context implementation:

  restapi: JAXB provider
  https://gerrit.ovirt.org/29789

Oved, please set the 3.5.z flag and acks if you want this backported.

----------------------------------------------------------------------
Following comment by oourfali on August 17 at 12:10:09, 2015

Sounds like we should. 
I've set flags and target release accordingly.

----------------------------------------------------------------------
Following comment by ncredi on August 17 at 12:43:23, 2015

one per execution

----------------------------------------------------------------------
Following comment by juan.hernandez on August 17 at 13:53:58, 2015

The two backported patches should fix this issue. To verify that the issue is fixed, check that the number of instances of the JAXBContextImpl class doesn't increase when new types of objects are requested via the RESTAPI:

  # ps -u ovirt | grep java
  22143 ?        00:00:24 java

  # su - ovirt -s /bin/sh

  $ jmap -histo 22143 | grep 'JAXBContextImpl$'
  1057: 4 352 com.sun.xml.bind.v2.runtime.JAXBContextImpl

Before the fix the number of instances will increase when new types of objects are requested. After the fix the number of instances (the second column, 4 in the example above) should stay constant.
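
If the check needs to be automated, the instance count can be pulled out of the histogram text. The following is only a sketch, assuming rows of the form "num: instances bytes class-name" as in the example above; HistoCount and instancesOf are hypothetical names, and the histogram text is passed in as a string rather than obtained by running jmap.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HistoCount {

    // Assumed row layout: "<num>: <#instances> <#bytes> <class name>".
    // Anything after the class name is ignored.
    private static final Pattern ROW =
            Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    // Returns the instance count (second column) for the given class,
    // or an empty result if the class does not appear in the histogram.
    static OptionalLong instancesOf(String histoOutput, String className) {
        for (String line : histoOutput.split("\\R")) {
            Matcher m = ROW.matcher(line);
            if (m.matches() && m.group(3).equals(className)) {
                return OptionalLong.of(Long.parseLong(m.group(1)));
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        String sample = "  1057: 4 352 com.sun.xml.bind.v2.runtime.JAXBContextImpl";
        OptionalLong count =
                instancesOf(sample, "com.sun.xml.bind.v2.runtime.JAXBContextImpl");
        System.out.println("instances: " + count);
    }
}
```

Comparing the value from two runs, one before and one after requesting new types of objects, gives the same check as reading the second column by eye.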

Comment 1 Gil Klein 2015-10-20 12:56:54 UTC
Verified based on the latest automation run.

Issue was not reproduced after running several test cycles on the same system.

Verified using vt17.1

Comment 3 errata-xmlrpc 2015-10-26 18:33:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1938.html

