Bug 149011 - Oracle 8 import of Oracle 9 database can lock system.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 156320
Reported: 2005-02-17 21:56 UTC by Hisashi T Fujinaka
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-28 14:47:17 UTC
Target Upstream Version:
Embargoed:


Attachments
AltSysRqM output (8.46 KB, text/plain), 2005-03-01 19:03 UTC, Hisashi T Fujinaka
AltSysRq[PWT] output from syslog (31.88 KB, text/plain), 2005-04-15 18:20 UTC, Hisashi T Fujinaka


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2005:663 0 qe-ready SHIPPED_LIVE Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6 2005-09-28 04:00:00 UTC

Description Hisashi T Fujinaka 2005-02-17 21:56:25 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:

Every time, with a particular database.

Steps to Reproduce:
1. Export data from Oracle 9 (our specific data set) to an Oracle 8 machine.
2. Import to Oracle 8.
3. Watch the system consume all memory/swap and become unresponsive.
  
Actual results:

The database imports and when integrity constraints are being applied, the kernel keeps 
allowing memory to be consumed until the system dies. This is on a patched RHEL 3 
system. I watched the free memory/swap go away with top.
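
For anyone reproducing this, free memory and swap can be sampled programmatically instead of being watched in top. The following is a minimal sketch, assuming the usual "Key: value kB" lines in /proc/meminfo; the function names meminfo_kb and read_meminfo_kb are illustrative and not part of any tool mentioned in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of a "Key:   12345 kB" line found in
 * meminfo-style text, or -1 if the key is missing or malformed. */
long meminfo_kb(const char *text, const char *key)
{
    size_t klen;
    const char *line = text;

    assert(text != NULL && key != NULL);
    klen = strlen(key);

    while (line && *line) {
        /* A match must be the whole key followed directly by ':'. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value;
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;   /* step past the newline to the next line */
    }
    return -1;
}

/* Read up to 8 KiB of a file (normally /proc/meminfo) and look up one key. */
long read_meminfo_kb(const char *path, const char *key)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_kb(buf, key);
}
```

Calling read_meminfo_kb("/proc/meminfo", "SwapFree") in a loop with a short sleep would log how quickly swap drains during the import.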

Expected results:

I would expect overcommit_memory to be set, or something would stop the process from 
consuming ALL of memory.

This has worked in the past. I don't know what has changed in the data set; however, the
kernel still should not allow oracle to keep spawning threads to consume all of memory.

Additional info:

Unfortunately, the data set has patient information and we can not provide it.

Comment 1 Suzanne Hillman 2005-02-18 19:38:34 UTC
Since this refers to a RHEL3 system, not a RHEL4 one, I'm modifying the version
number accordingly.

Comment 2 Larry Woodman 2005-03-01 14:15:10 UTC
Please provide lots more information so I can start debugging this
problem: top output, processor type, exact kernel version string,
AltSysrq-M outputs, etc.

Larry Woodman


Comment 3 Hisashi T Fujinaka 2005-03-01 19:01:42 UTC
Here is the uname output:
Linux beatle.verinform.com 2.4.21-27.0.2.EL #1 Wed Jan 12 23:46:37 EST 2005 i686 i686 i386 GNU/Linux

And the AltSysrq-M output will be attached.

I'm unclear about what else you want. Do you want the top output before the system 
hangs?


Comment 4 Hisashi T Fujinaka 2005-03-01 19:03:32 UTC
Created attachment 111542
AltSysRqM output

Comment 5 Larry Woodman 2005-04-06 18:32:00 UTC
Hisashi, the AltSysrq-M does not show any problems with memory.  Please get me
several AltSysrq-P outputs and one AltSysrq-W and one AltSysrq-T output when the
system is hung so I can see what is running on each CPU and what each process is
blocked on.

Thanks, Larry Woodman

Comment 6 Larry Woodman 2005-04-06 19:50:44 UTC
OK, I think I see the problem here:

On one CPU kswapd calls launder_page() which increments the page->count and
calls page_cache_release() with the zone->lru_lock held when that page is being
re-activated.
On another CPU, if the last process that maps that page calls exit,
page_cache_release() gets called for the same page.  If that's the last reference
to the page and it races with kswapd, launder_page() will call __free_pages_ok()
with the zone->lru_lock held and deadlock.

This patch fixes this problem:
-------------------------------------------------------------------
--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -315,7 +315,9 @@ int launder_page(zone_t * zone, int gfp_
 	if (cache_ratio(zone) > cache_limits.max && page_anon(page) &&
 			free_min(zone) < 0) {
 		add_page_to_active_list(page, INITIAL_AGE);
+		lru_unlock(zone);
 		page_cache_release(page);
+		lru_lock(zone);
 		return 0;
 	}
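
The lock ordering problem that the patch addresses can be illustrated outside the kernel. The following is a simplified single-threaded model, not kernel code: the types and functions (zone_model, page_model, model_lock and so on) are hypothetical stand-ins, and it assumes, as the analysis above implies, that freeing the last reference needs the same LRU lock the caller already holds. The lock is modelled as a flag so that the would-be deadlock is reported instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the zone and page named in the analysis. */
struct zone_model { bool lru_locked; };
struct page_model { int count; bool freed; };

/* Model of a non-recursive lock: returns false in the situation where a
 * real spinlock would spin forever because it is already held. */
bool model_lock(struct zone_model *z)
{
    if (z->lru_locked)
        return false;
    z->lru_locked = true;
    return true;
}

void model_unlock(struct zone_model *z)
{
    z->lru_locked = false;
}

/* Drop one reference. Freeing the last reference takes the LRU lock
 * (an assumption of this model). Returns false on a would-be deadlock. */
bool model_page_release(struct zone_model *z, struct page_model *p)
{
    if (--p->count > 0)
        return true;
    if (!model_lock(z))
        return false;          /* lock already held: self-deadlock */
    p->freed = true;
    model_unlock(z);
    return true;
}

/* Unpatched ordering: release while still holding the lock. */
bool release_unpatched(struct zone_model *z, struct page_model *p)
{
    assert(z->lru_locked);     /* caller holds the lock, as in launder_page() */
    return model_page_release(z, p);
}

/* Patched ordering: drop the lock around the release, then retake it. */
bool release_patched(struct zone_model *z, struct page_model *p)
{
    bool ok;

    assert(z->lru_locked);
    model_unlock(z);
    ok = model_page_release(z, p);
    (void)model_lock(z);       /* restore the state the caller expects */
    return ok;
}
```

With a page whose count is 1, release_unpatched() reports the deadlock and leaves the page unfreed, while release_patched() frees the page and returns with the lock held again, mirroring the lru_unlock()/lru_lock() pair added by the diff.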
 


Comment 7 Hisashi T Fujinaka 2005-04-15 18:17:40 UTC
I have some more AltSysRq output, but now the process dies properly.

Now, if you can forward my errors to Oracle, somehow, since their imp triggered this kernel bug.

Comment 8 Hisashi T Fujinaka 2005-04-15 18:20:22 UTC
Created attachment 113241
AltSysRq[PWT] output from syslog

Comment 9 Larry Woodman 2005-04-15 19:06:43 UTC
Hisashi, from looking at this AltSysrq-M output it appears that the system hung
because 182589 pages of anonymous memory was VM_LOCK'd:

>>>aa:182592 ac:1665 id:54 il:0 ic:0 fr:636 


Is your application doing something that mlock()s memory or something???

Larry Woodman
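
To put the counter line quoted above in perspective, the page counts can be converted to sizes. The following is a minimal sketch, assuming the 4 KiB page size of i386 and the "tag:number" layout shown in the quoted line; counter_pages and pages_to_kib are illustrative names, not kernel or tool APIs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096L   /* assumed: standard i386 page size */

/* Find "tag:<number>" in a counter line such as
 * ">>>aa:182592 ac:1665 id:54 il:0 ic:0 fr:636".
 * Returns the number, or -1 if the tag is absent or malformed. */
long counter_pages(const char *line, const char *tag)
{
    char needle[16];
    const char *p;
    size_t len;
    long value;
    int n;

    assert(line != NULL && tag != NULL);
    n = snprintf(needle, sizeof(needle), "%s:", tag);
    if (n < 0 || n >= (int)sizeof(needle))
        return -1;
    len = strlen(needle);

    for (p = strstr(line, needle); p; p = strstr(p + 1, needle)) {
        /* Accept only a match that starts a token, not the tail of one. */
        if (p == line || !isalnum((unsigned char)p[-1])) {
            if (sscanf(p + len, "%ld", &value) == 1)
                return value;
            return -1;
        }
    }
    return -1;
}

/* Convert a page count to KiB. */
long pages_to_kib(long pages)
{
    return pages * (PAGE_SIZE_BYTES / 1024);
}
```

For the quoted line, the aa field of 182592 pages works out to 730368 KiB, a little over 713 MiB, while the fr field of 636 pages is about 2.5 MiB.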


Comment 10 Hisashi T Fujinaka 2005-04-15 19:10:31 UTC
Unfortunately, all I'm doing is running "imp" from Oracle 8.1.7.0. We mere mortals aren't privy to the 
inner workings of Oracle programs.

Comment 11 Larry Woodman 2005-04-15 19:20:47 UTC
Ah, wait!  You have no swap space free!!  That's the problem!!!

>>>Free swap:            0kB 


Fix that and the problem will go away.


Larry Woodman


Comment 12 Hisashi T Fujinaka 2005-04-15 19:26:35 UTC
Please read the opening bug. Having all my swap consumed is what this bug is all about.

Comment 13 Larry Woodman 2005-04-15 19:29:55 UTC
OK, sorry.  What is /proc/sys/vm/overcommit_memory?


Larry


Comment 14 Hisashi T Fujinaka 2005-04-15 22:31:59 UTC
OK, in clarification: the latest Alt-SysRq info was from a PATCHED system, from the patch sent by Larry 
via the web page. Larry's fix now causes the program to crash after consuming all memory, which is 
better than the old behavior, which was a hang forever.

Comment 15 Hisashi T Fujinaka 2005-04-15 22:33:33 UTC
/proc/sys/vm/overcommit_memory is the default, 0.
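
The same setting can be checked from a program. The following is a minimal sketch that reads one integer from a sysctl-style file; read_sysctl_int is an illustrative helper name, and the only interpretation assumed here is the one given in this report, that 0 is the default (upstream kernel documentation describes 0 as heuristic overcommit handling).

```c
#include <assert.h>
#include <stdio.h>

/* Return the integer stored in a sysctl-style file such as
 * /proc/sys/vm/overcommit_memory, or -1 if it cannot be read.
 * The -1 sentinel is safe only because valid modes are non-negative. */
int read_sysctl_int(const char *path)
{
    FILE *f;
    int value;
    int ok;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%d", &value) == 1);
    fclose(f);
    return ok ? value : -1;
}
```

A caller would use read_sysctl_int("/proc/sys/vm/overcommit_memory") and compare the result with 0 to confirm the default is in effect.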

Comment 16 Ernie Petrides 2005-04-23 00:41:17 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.2.EL).


Comment 24 Red Hat Bugzilla 2005-09-28 14:47:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html


