Bug 1030460 - Seen out of memory kill in the engine.
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhsc
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: ---
Target Release: RHGS 2.1.2
Assigned To: Sahina Bose
Keywords: ZStream
Depends On: 1028966 1040049
Reported: 2013-11-14 08:50 EST by RamaKasturi
Modified: 2015-05-13 12:27 EDT (History)
11 users

See Also:
Fixed In Version: cb12
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2014-02-25 03:03:48 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
Attaching the error screenshot. (171.24 KB, image/png)
2013-11-14 08:50 EST, RamaKasturi
Output of pmap after starting engine (57.88 KB, text/plain)
2013-11-22 08:12 EST, Sahina Bose
pmap2.txt (58.33 KB, text/plain)
2013-11-22 08:13 EST, Sahina Bose

Description RamaKasturi 2013-11-14 08:50:31 EST
Created attachment 823961 [details]
Attaching the error screenshot.

Description of problem:
The ovirt-engine service is crashing; the process is being killed by the kernel's out-of-memory killer.

Version-Release number of selected component (if applicable):

How reproducible:
Not Always

Steps to Reproduce:
1. Create a distributed volume and start it.
2. Mount the volume and create a 10 GB file on it.
3. Select a brick and start removing it.
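A sketch of these steps with the gluster CLI; the volume, server, and brick names are placeholders, not taken from this report:

```shell
# Hypothetical names throughout; run on an RHS node.
gluster volume create distvol server1:/bricks/b1 server2:/bricks/b2
gluster volume start distvol

# Mount the volume and create a ~10 GB file on it.
mount -t glusterfs server1:/distvol /mnt/distvol
dd if=/dev/zero of=/mnt/distvol/bigfile bs=1M count=10240

# Start removing one brick; data migration begins.
gluster volume remove-brick distvol server2:/bricks/b2 start
gluster volume remove-brick distvol server2:/bricks/b2 status
```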

Actual results:
When data migration has completed and you click the drop-down in the Activities column, a popup appears saying "A request to the server failed. Status code: 503".

Expected results:
No crash should happen.

Additional info:
Comment 3 Sahina Bose 2013-11-17 00:17:48 EST
Please provide engine and vdsm logs
Comment 5 Prasanth 2013-11-18 04:37:23 EST
Sahina, is this bug related to Bug 1026100 in RHEVM by any chance? If so, the fix there might be useful for this bug as well.
Comment 6 Juan Hernández 2013-11-18 05:34:54 EST
This looks very similar to bug 1028966 in RHEV-M, as the engine is consuming more than 6 GiB of RSS.

To make progress we need to generate a heap dump of the engine while it is consuming this unusual amount of memory. I would suggest trying to reproduce on a machine with more RAM (the current one has 8 GiB), so that when the engine is consuming those 6 GiB we can take a heap dump before the out-of-memory killer kills it.
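A sketch of the heap-dump step, assuming OpenJDK's jmap is available on the engine host; PID 4549 matches the ps listings in the following comments, and the dump path is an arbitrary choice:

```shell
# Builds the jmap heap-dump command for a given JVM PID.
# "live" forces a full GC first, so the dump contains only reachable objects.
heap_dump_cmd() {
    echo "jmap -dump:live,format=b,file=/tmp/engine-heap.hprof $1"
}

# Run the printed command as the same user as the JVM, e.g.:
#   eval "$(heap_dump_cmd 4549)"
```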
Comment 7 Juan Hernández 2013-11-18 10:34:27 EST
In RHEV-M we are studying whether this can be caused by the 64 MiB memory areas created by the glibc "malloc" allocator (87 were detected in bug 1028966). It would be helpful if you could check whether the following setting in /etc/sysconfig/ovirt-engine helps:


Please make sure that this is effectively applied to the engine:

# ps -u ovirt
  PID TTY          TIME CMD
 1710 ?        00:00:00 ovirt-websocket
 4547 ?        00:00:00 ovirt-engine.py
 4549 ?        00:01:30 java

# strings /proc/4549/environ | grep MALLOC

This should reduce the number of 64 MiB areas to just 1.
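The setting itself is elided above. Given the MALLOC grep and the goal of reducing the arenas to one, a plausible candidate is glibc's MALLOC_ARENA_MAX; that is an assumption on my part, not confirmed by this comment. A sketch of the verification step, using tr instead of strings so it works without binutils:

```shell
# Assumption: /etc/sysconfig/ovirt-engine would carry something like
#   MALLOC_ARENA_MAX=1
# which caps the number of 64 MiB glibc malloc arenas.
# Check whether any MALLOC_* variable reached a process's environment:
check_malloc_env() {
    tr '\0' '\n' < "/proc/$1/environ" | grep '^MALLOC' || echo "no MALLOC_* set"
}
```

For example, `check_malloc_env 4549` should print the variable if the sysconfig change took effect for the java process.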
Comment 8 Juan Hernández 2013-11-18 10:36:12 EST
Other useful information you can gather from the engine when this situation arises is the memory map generated with the "pmap" command:

# ps -u ovirt
  PID TTY          TIME CMD
 1710 ?        00:00:00 ovirt-websocket
 4547 ?        00:00:00 ovirt-engine.py
 4549 ?        00:01:30 java

# pmap 4549 > mymap.txt
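Once such a map is saved, the 64 MiB areas from comment 7 can be counted with a small awk filter. A sketch, assuming the default `pmap PID` output format (address, size with a K suffix, permissions, mapping):

```shell
# Count anonymous mappings that are exactly 64 MiB (65536K) in a saved
# pmap output; these likely correspond to glibc malloc arenas.
count_arenas() {
    awk '$2 == "65536K" && /anon/ { n++ } END { print n + 0 }' "$1"
}
```

Usage: `count_arenas mymap.txt`.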
Comment 9 Sahina Bose 2013-11-22 08:10:43 EST

For what it's worth - QE has been hitting this OOM killer ever since they started testing the engine on RHEL 6.5 and EAP 6.2.

From the pmap output when the memory consumption of the engine VM was approaching the 8 GB limit:

0000000000e13000 2868564 2534820 2534820 rw---    [ anon ]
00000000aff80000 1311232  840660  840660 rw---    [ anon ]

00007fe4c2482000 3301336 2353956 2353956 rw---    [ anon ]

And from the pmap output when the engine had just started:
0000000000e13000 2868564K rw---    [ anon ]
00000000aff80000 1311232K rw---    [ anon ]

00007fe52c214000 1567120K rw---    [ anon ]

Note that the third area has roughly doubled in size; I'm not sure what it corresponds to, however. I will attach both pmap outputs to the bug.
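A comparison like the one above can be automated. This is a sketch that diffs two `pmap -x`-style outputs, with columns assumed to be address, size, RSS, dirty (as in the first excerpt; the field numbers would need adjusting for the plain `pmap` format of the second excerpt). It only catches areas at stable addresses; note that the doubled area above actually moved between runs, so a plain address diff would miss remapped regions:

```shell
# Print addresses present in both maps whose RSS grew between the two runs.
# $1 = earlier pmap -x output, $2 = later pmap -x output.
grown_areas() {
    awk 'NR == FNR { before[$1] = $3; next }
         ($1 in before) && ($3 + 0) > (before[$1] + 0) {
             print $1 ": " before[$1] "K -> " $3 "K"
         }' "$1" "$2"
}
```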
Comment 10 Sahina Bose 2013-11-22 08:12:06 EST
Created attachment 827768 [details]
Output of pmap after starting engine

Output of pmap after starting engine
Comment 11 Sahina Bose 2013-11-22 08:13:14 EST
Created attachment 827769 [details]

Output of pmap when engine was consuming close to 8GB
Comment 12 Juan Hernández 2013-11-25 06:46:40 EST
Please take a look at comment 17 in bug 1028966. If you can do the same in your environment it will help to determine the cause of this issue.

Comment 13 Juan Hernández 2013-11-28 07:24:08 EST
Sahina, I think that this bug should now be closed as a duplicate of bug 1028966, and the solution should be the same proposed there.
Comment 15 Sahina Bose 2013-12-15 21:32:56 EST
The patch which introduces "Conflicts: java-1.7.0-openjdk = 1:" (as per comment 48 on Bug 1028966) has been merged into the RHSC repository.
Comment 16 Sahina Bose 2013-12-17 00:44:39 EST
openjdk update is available in RHEL 6.5.z stream. Please ensure that you're subscribed to this.
Comment 17 Prasanth 2013-12-17 06:24:20 EST
(In reply to Sahina Bose from comment #16)
> openjdk update is available in RHEL 6.5.z stream. Please ensure that you're
> subscribed to this.


As the RHS-C server is expected to be subscribed to the base RHEL 6 channel (rhel-x86_64-server-6) to get the required child channels [1], is it possible to get the openjdk update that is available in the RHEL 6.5.z stream?

If so, please let me know how that works.
Comment 18 Sahina Bose 2013-12-17 06:34:49 EST
I'm assuming the base RHEL 6 channel will include the z-stream updates. If not, we need to check with release engineering how to get them.
Comment 19 RamaKasturi 2013-12-25 09:00:36 EST
I have not seen this issue with cb12 and the new openjdk version "java-1.7.0-openjdk-", so I am marking this verified.

Will reopen if it happens again.
Comment 21 errata-xmlrpc 2014-02-25 03:03:48 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

