Bug 1030460
| Field | Value |
|---|---|
| Summary | Seen out of memory kill in the engine. |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | RamaKasturi <knarra> |
| Component | rhsc |
| Assignee | Sahina Bose <sabose> |
| Status | CLOSED ERRATA |
| QA Contact | RamaKasturi <knarra> |
| Severity | high |
| Priority | high |
| Version | 2.1 |
| CC | dpati, dtsang, grajaiya, herrold, juan.hernandez, knarra, mmahoney, pprakash, rhs-bugs, sabose, ssampat |
| Target Milestone | --- |
| Keywords | ZStream |
| Target Release | RHGS 2.1.2 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | cb12 |
| Doc Type | Bug Fix |
| Story Points | --- |
| Last Closed | 2014-02-25 08:03:48 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| Bug Depends On | 1028966, 1040049 |
Please provide engine and vdsm logs.

Sahina, is this bug related to Bug 1026100 in RHEVM by any chance? If so, I think it might be useful for you to fix this bug!

This looks very similar to bug 1028966 in RHEV-M, as the engine is consuming more than 6 GiB of RSS. To make progress we need to generate a heap dump of the engine when it is consuming this unusual amount of memory. I would suggest trying to reproduce on a machine with more RAM (the current one has 8 GiB) so that when the engine is consuming those 6 GiB we can take a heap dump before the out-of-memory killer kills it.

In RHEV-M we are studying whether this can be caused by the 64 MiB memory areas created by the libc "malloc" allocator (87 were detected in bug 1028966). It would be helpful if you can check whether the following setting in /etc/sysconfig/ovirt-engine helps:

```
export MALLOC_ARENA_MAX=1
```

Please make sure that this is effectively applied to the engine:

```
# ps -u ovirt
  PID TTY          TIME CMD
 1710 ?        00:00:00 ovirt-websocket
 4547 ?        00:00:00 ovirt-engine.py
 4549 ?        00:01:30 java
# strings /proc/4549/environ | grep MALLOC
MALLOC_ARENA_MAX=1
```

This should reduce the number of 64 MiB areas to just 1.

Other useful information you can gather from the engine when this situation arises is the memory map generated with the "pmap" command:

```
# ps -u ovirt
  PID TTY          TIME CMD
 1710 ?        00:00:00 ovirt-websocket
 4547 ?        00:00:00 ovirt-engine.py
 4549 ?        00:01:30 java
# pmap 4549 > mymap.txt
```

Juan, for what it's worth: QE has been hitting this OOM killer ever since they started testing the engine on RHEL 6.5 and EAP 6.2.
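The `strings /proc/<pid>/environ | grep MALLOC` check above can also be done programmatically. A minimal sketch, assuming the kernel's NUL-separated `KEY=VALUE` layout of `/proc/<pid>/environ` (the function name and PID are illustrative, not from the bug):

```python
def env_has_malloc_arena_max(environ_bytes: bytes, expected: str = "1") -> bool:
    """Check a /proc/<pid>/environ blob (NUL-separated KEY=VALUE entries)
    for MALLOC_ARENA_MAX set to the expected value."""
    for entry in environ_bytes.split(b"\x00"):
        if entry.startswith(b"MALLOC_ARENA_MAX="):
            return entry.split(b"=", 1)[1].decode() == expected
    return False

# Against a live process (e.g. the engine's java PID from `ps -u ovirt`):
# with open("/proc/4549/environ", "rb") as f:
#     print(env_has_malloc_arena_max(f.read()))
```

Reading `/proc/<pid>/environ` requires the same privileges as the `strings` approach (typically root, or the process owner).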
From the pmap output when the memory consumption on the engine VM was almost approaching the 8 GB limit:

```
0000000000e13000 2868564 2534820 2534820 rw---    [ anon ]
00000000aff80000 1311232  840660  840660 rw---    [ anon ]
00007fe4c2482000 3301336 2353956 2353956 rw---    [ anon ]
```

And from the pmap output when the engine was just started:

```
0000000000e13000 2868564K rw---    [ anon ]
00000000aff80000 1311232K rw---    [ anon ]
00007fe52c214000 1567120K rw---    [ anon ]
```

If you notice, the size of the third mapping has doubled. I'm not sure what it corresponds to, however. Will attach both pmap outputs to the bug.

Created attachment 827768 [details]
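The 64 MiB arena hypothesis from bug 1028966 can be checked against the saved pmap outputs by counting anonymous mappings close to that size. A hypothetical helper, assuming plain `pmap` line format as in the second sample above (the tolerance threshold is an assumption, not from the bug):

```python
import re

# Matches plain `pmap` lines like:
# 00007fe52c214000 1567120K rw---   [ anon ]
ANON_RE = re.compile(r"^[0-9a-f]+\s+(\d+)K\s+\S+\s+\[ anon \]$")

def count_arena_sized_anon(pmap_text: str, arena_kb: int = 65536, tol_kb: int = 1024) -> int:
    """Count anonymous mappings within tol_kb of a 64 MiB glibc arena."""
    hits = 0
    for line in pmap_text.splitlines():
        m = ANON_RE.match(line.strip())
        if m and abs(int(m.group(1)) - arena_kb) <= tol_kb:
            hits += 1
    return hits
```

A count near the 87 regions reported in bug 1028966 would support the same diagnosis here; with `MALLOC_ARENA_MAX=1` applied, it should drop to about one.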
Output of pmap after starting engine
Created attachment 827769 [details]
pmap2.txt
Output of pmap when engine was consuming close to 8GB
Please take a look at comment 17 in bug 1028966. If you can do the same in your environment it will help to determine the cause of this issue. https://bugzilla.redhat.com/show_bug.cgi?id=1028966#c17

Sahina, I think that this bug should now be closed as a duplicate of bug 1028966, and the solution should be the same as proposed there.

The patch which introduces "Conflicts: java-1.7.0-openjdk = 1:1.7.0.45-2.4.3.3.el6" (as per comment 48 on Bug 1028966) has been merged into the RHSC repository.

The openjdk update is available in the RHEL 6.5.z stream. Please ensure that you're subscribed to it.

(In reply to Sahina Bose from comment #16)
> openjdk update is available in RHEL 6.5.z stream. Please ensure that you're
> subscribed to this.

Sahina, as the RHS-C server is expected to be subscribed to the base RHEL 6 channel (rhel-x86_64-server-6) to get the required child channels [1], is it possible to get the openjdk update which is available in the RHEL 6.5.z stream? If so, please let me know how that works.

[1] rhel-x86_64-server-6-rhs-rhsc-2.1, jbappplatform-6-x86_64-server-6-rpm

I'm assuming the base RHEL 6 channel will have the Z-stream updates. If not, we need to check with Rel Eng how to get these.

Have not seen this issue with cb12 and with the new OpenJDK version java-1.7.0-openjdk-1.7.0.45-2.4.3.4.el6_5.x86_64, so marking this verified. Will reopen if it happens again.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html
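For context on the fix mentioned above: the quoted "Conflicts" directive works at the RPM packaging level, making the package manager refuse to install (or keep installed) the affected OpenJDK build alongside RHSC. A sketch of how such a stanza looks in a spec file — only the Conflicts line itself is quoted from this bug; the surrounding context is an assumption:

```
# In the rhsc package spec file: refuse to coexist with the OpenJDK
# build affected by the native-memory growth (bug 1028966).
Conflicts: java-1.7.0-openjdk = 1:1.7.0.45-2.4.3.3.el6
```

This forces an upgrade to the fixed java-1.7.0-openjdk-1.7.0.45-2.4.3.4.el6_5 build (or removal of the bad one) before the patched RHSC can be installed.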
Created attachment 823961 [details]
Attaching the error screenshot.

Description of problem:
The ovirt-engine service crashes.

Version-Release number of selected component (if applicable):
rhsc-2.1.2-0.23.master.el6_5.noarch

How reproducible:
Not always

Steps to Reproduce:
1. Create a distributed volume and start it.
2. Mount the volume and create a 10 GB file in the volume.
3. Select a brick and start removing it.

Actual results:
By the time data migration is completed, clicking on the drop-down in the Activities column brings up a popup saying "A request to the server failed. Status code: 503."

Expected results:
No crash should happen.

Additional info: