Bug 169776 - OOM-Killer kill Oracle Processes then system
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Assigned To: Larry Woodman
QA Contact: Brian Brock
Reported: 2005-10-03 10:55 EDT by Thomas Tracy
Modified: 2007-11-30 17:07 EST
CC: 3 users

Doc Type: Bug Fix
Last Closed: 2005-10-18 11:54:42 EDT


Attachments
oom-killer log file from /var/log/messages (33.48 KB, application/octet-stream)
2005-10-03 11:00 EDT, Thomas Tracy
slabinfo file from 8-hour run before oom-killer starts (385.76 KB, text/plain)
2005-10-03 11:01 EDT, Thomas Tracy
CPU and memory information during a test (3.01 KB, text/plain)
2005-10-03 11:40 EDT, Thomas Tracy

Description Thomas Tracy 2005-10-03 10:55:16 EDT

Description of problem:
With either RHAS4 Update 1 or RHAS4 Update 2, the OOM-killer kills Oracle processes after 8 hours of running a single query. The query is a complex join of 2 tables that generates heavy I/O and CPU load. Other workloads (reading tables) also trigger the OOM-killer, after 18 hours of reading tables.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-16.EL

How reproducible:
Always

Steps to Reproduce:
1. Start the system.
2. Mount the ocfsv2 volumes.
3. Start the database.
4. Wait 8 hours for the OOM-killer to start.

Actual Results: System crash.

Expected Results: Queries should finish.

Additional info:

Attaching logs from /var/log/messages. For the last test, run last night, I wrote a script that copied the contents of /proc/slabinfo into a text file during the test. I have seen related but different bugs on this subject, but nothing concerning Oracle 10.1.0.4.
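The reporter's script is not included in the report. As a hypothetical sketch of the same idea, the Python function below appends timestamped copies of /proc/slabinfo to a text file at a fixed interval. The name `snapshot_slabinfo`, its parameters, and its defaults are illustrative assumptions, not taken from the report.

```python
import time
from pathlib import Path


def snapshot_slabinfo(out_path, interval_s=600.0, count=48,
                      src="/proc/slabinfo"):
    """Append `count` timestamped copies of `src` to `out_path`.

    Hypothetical helper; the defaults (one sample every 10 minutes,
    48 samples, roughly 8 hours) are illustrative only.
    """
    source = Path(src)
    with open(out_path, "a") as out:
        for i in range(count):
            # Header line marks the start of each sample.
            out.write(f"=== {time.strftime('%Y-%m-%d %H:%M:%S')} ===\n")
            # Reading /proc/slabinfo may require root on some kernels.
            out.write(source.read_text())
            # Flush so completed samples reach the file even if this
            # process is later killed.
            out.flush()
            if i < count - 1:
                time.sleep(interval_s)
```

For example, `snapshot_slabinfo("slabinfo-run.txt")` samples every 10 minutes over roughly 8 hours; comparing the first and last samples shows which cache keeps growing.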
Comment 1 Thomas Tracy 2005-10-03 11:00:49 EDT
Created attachment 119551 [details]
oom-killer log file from /var/log/messages
Comment 2 Thomas Tracy 2005-10-03 11:01:56 EDT
Created attachment 119552 [details]
slabinfo file from 8 hour run before oom-killer starts
Comment 3 Thomas Tracy 2005-10-03 11:40:16 EDT
Created attachment 119553 [details]
CPU and memory information during a test
Comment 4 Thomas Tracy 2005-10-06 10:25:05 EDT
I do not know if this will help, but I finally succeeded in completing an
Oracle complex-join query by turning off NFS and shutting down an internal
program called collectl, which gathers basic system statistics
(CPU, memory, I/O, network bandwidth). The query takes 17 hours to complete, which
is about right for a 2-CPU blade. I am now running another test with NFS turned on
and collectl off. I have seen a note in Bugzilla about setting
/proc/sys/vm/lower_zone_protection to 100; that had no effect in previous
experiments.
Comment 5 Larry Woodman 2005-10-07 15:54:26 EDT
The problem appears to be that *someone* is leaking about 700MB of lowmem via
32-byte kmalloc() allocations:

size-32 20159195 20159195 32 119 1 : tunables 120 60 8 : slabdata 169405 169405 0
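The "about 700MB" figure can be checked from the slabinfo line above. In the slabinfo 2.x layout the columns are name, active_objs, num_objs, objsize, objperslab, pagesperslab, then the tunables and slabdata (active_slabs, num_slabs, sharedavail) sections. A minimal sketch, assuming 4 KiB pages on i686 (`slab_usage` is a hypothetical helper name):

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on i686


def slab_usage(line, page_size=PAGE_SIZE):
    """Return (object_bytes, slab_page_bytes) for one /proc/slabinfo 2.x data line."""
    fields = line.split()
    num_objs = int(fields[2])        # total objects allocated in this cache
    objsize = int(fields[3])         # bytes per object
    pagesperslab = int(fields[5])    # pages backing each slab
    sd = fields.index("slabdata")    # slabdata: active_slabs num_slabs sharedavail
    num_slabs = int(fields[sd + 2])
    return num_objs * objsize, num_slabs * pagesperslab * page_size


line = ("size-32 20159195 20159195 32 119 1 : tunables 120 60 8 "
        ": slabdata 169405 169405 0")
obj, slab = slab_usage(line)
print(f"objects: {obj / 1e6:.0f} MB, slab pages: {slab / 1e6:.0f} MB")
# objects: 645 MB, slab pages: 694 MB
```

The page footprint (169405 slabs × 4096 bytes) is what actually consumes lowmem; it exceeds the raw object total because each one-page slab holds only 119 × 32 = 3808 bytes of objects.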


Please send lsmod output and AltSysrq-M output when this happens.

Larry Woodman
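As a sketch of how both requested outputs might be captured together, the hypothetical helper below saves a raw copy of /proc/modules (the file that lsmod formats) and writes `m` to /proc/sysrq-trigger, which asks the kernel to print the same memory report as Alt-SysRq-M. This assumes root privileges; the memory report goes to the kernel log (dmesg, /var/log/messages), not to the output file.

```python
from pathlib import Path


def collect_oom_diagnostics(out_path,
                            modules_path="/proc/modules",
                            trigger_path="/proc/sysrq-trigger"):
    """Save the loaded-module list and request a SysRq-M memory dump.

    Hypothetical helper. Returns the number of loaded modules recorded.
    """
    # lsmod formats /proc/modules, so a raw copy carries the same data.
    modules = Path(modules_path).read_text()
    Path(out_path).write_text(modules)
    # Writing 'm' (root only) makes the kernel log its memory information.
    with open(trigger_path, "w") as trig:
        trig.write("m")
    return len(modules.splitlines())
```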
Comment 6 Tom Tracy 2005-10-18 10:55:55 EDT
Larry
        We can close this bug as we discovered that the memory leak was caused 
by ocfsv2 version 1.0.4-1. Working with Oracle, we have tested and verified 
the fix.

Thanks
Tom
