149011 – Oracle 8 import of Oracle 9 database can lock system.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Larry Woodman
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	156320

Reported:	2005-02-17 21:56 UTC by Hisashi T Fujinaka
Modified:	2007-11-30 22:07 UTC (History)
CC List:	4 users (show)
Fixed In Version:	RHSA-2005-663
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-09-28 14:47:17 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
AltSysRqM output (8.46 KB, text/plain) 2005-03-01 19:03 UTC, Hisashi T Fujinaka	no flags	Details
AltSysRq[PWT] output from syslog (31.88 KB, text/plain) 2005-04-15 18:20 UTC, Hisashi T Fujinaka	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2005:663	0	qe-ready	SHIPPED_LIVE	Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6	2005-09-28 04:00:00 UTC

Description Hisashi T Fujinaka 2005-02-17 21:56:25 UTC

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:

Every time, with a particular database.

Steps to Reproduce:
1. Export data from Oracle 9 (our specific data set) to an Oracle 8 machine.
2. Import to Oracle 8.
3. Watch the system consume all memory/swap and become unresponsive.
  
Actual results:

The database imports and when integrity constraints are being applied, the kernel keeps 
allowing memory to be consumed until the system dies. This is on a patched RHEL 3 
system. I watched the free memory/swap go away with top.

Expected results:

I would expect overcommit_memory to be set, or something would stop the process from 
consuming ALL of memory.

This has worked in the past. I don't know what has changed in the data set; however, the 
kernel still should not allow oracle to keep spawning threads until all of memory is consumed.

Additional info:

Unfortunately, the data set has patient information and we can not provide it.

Comment 1 Suzanne Hillman 2005-02-18 19:38:34 UTC

Since this refers to a RHEL3 system, not a RHEL4 one, I'm modifying the version
number accordingly.

Comment 2 Larry Woodman 2005-03-01 14:15:10 UTC

Please provide lots more information so I can start debugging this
problem: top output, processor type, exact kernel version string,
AltSysrq-M outputs, etc.

Larry Woodman

Comment 3 Hisashi T Fujinaka 2005-03-01 19:01:42 UTC

Here is the uname output:
Linux beatle.verinform.com 2.4.21-27.0.2.EL #1 Wed Jan 12 23:46:37 EST 2005 i686 i686 
i386 GNU/Linux

And the AltSysrq-M output will be attached.

I'm unclear about what else you want. Do you want the top output before the system 
hangs?

Comment 4 Hisashi T Fujinaka 2005-03-01 19:03:32 UTC

Created attachment 111542 [details]
AltSysRqM output

Comment 5 Larry Woodman 2005-04-06 18:32:00 UTC

Hisashi, the AltSysrq-M does not show any problems with memory.  Please get me
several AltSysrq-P outputs and one AltSysrq-W and one AltSysrq-T output when the
system is hung so I can see what is running on each CPU and what each process is
blocked on.

Thanks, Larry Woodman

Comment 6 Larry Woodman 2005-04-06 19:50:44 UTC

OK, I think I see the problem here:

On one CPU kswapd calls launder_page() which increments the page->count and
calls page_cache_release() with the zone->lru_lock held when that page is being
re-activated.
On another CPU, if the last process that maps that page calls exit,
page_cache_release() gets called for the same page.  If that's the last reference
to the page and it races with kswapd, launder_page() will call __free_pages_ok()
with the zone->lru_lock held and deadlock.

This patch fixes this problem:
-------------------------------------------------------------------
--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -315,7 +315,9 @@ int launder_page(zone_t * zone, int gfp_
 	if (cache_ratio(zone) > cache_limits.max && page_anon(page) &&
 			free_min(zone) < 0) {
 		add_page_to_active_list(page, INITIAL_AGE);
+		lru_unlock(zone);
 		page_cache_release(page);
+		lru_lock(zone);
 		return 0;
 	}
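
The ordering problem described in this comment can be illustrated with a small user-space sketch. It is not kernel code: every name ending in `_sketch` is hypothetical, a plain flag stands in for the non-recursive zone->lru_lock, and the free path is assumed to need that lock, as the description above implies. The sketch only counts the cases where the free path would try to take a lock its caller already holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for zone->lru_lock and page->count. */
struct zone_sketch {
    bool lru_locked;     /* non-recursive lock modelled as a flag */
    int freed;           /* pages freed successfully */
    int would_deadlock;  /* times the free path found the lock already held */
};

struct page_sketch {
    int count;
    struct zone_sketch *zone;
};

static void lru_lock_sketch(struct zone_sketch *z)
{
    assert(!z->lru_locked);  /* a real spinlock would spin here forever */
    z->lru_locked = true;
}

static void lru_unlock_sketch(struct zone_sketch *z)
{
    assert(z->lru_locked);
    z->lru_locked = false;
}

/* Free path: assumed to need the LRU lock (assumption, see lead-in). */
static void free_page_sketch(struct page_sketch *p)
{
    if (p->zone->lru_locked) {   /* taking the lock again would never return */
        p->zone->would_deadlock++;
        return;
    }
    lru_lock_sketch(p->zone);
    p->zone->freed++;
    lru_unlock_sketch(p->zone);
}

/* Drop one reference; dropping the last one frees the page. */
static void release_page_sketch(struct page_sketch *p)
{
    assert(p->count > 0);
    if (--p->count == 0)
        free_page_sketch(p);
}

/* Models the re-activation branch. Returns how many times the free path
 * ran into an already-held lock (0 means the ordering is safe). */
int reactivate_sketch(bool drop_lock_first)
{
    struct zone_sketch zone = { false, 0, 0 };
    struct page_sketch page = { 1, &zone };  /* other mapper already exited */

    lru_lock_sketch(&zone);           /* the scanner holds the LRU lock */
    if (drop_lock_first)
        lru_unlock_sketch(&zone);     /* ordering used by the patch */
    release_page_sketch(&page);       /* this is now the last reference */
    if (drop_lock_first)
        lru_lock_sketch(&zone);
    lru_unlock_sketch(&zone);
    return zone.would_deadlock;
}
```

Calling `reactivate_sketch(false)` models the original ordering and reports one attempt to retake the held lock; `reactivate_sketch(true)` models the patched ordering and reports none.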

Comment 7 Hisashi T Fujinaka 2005-04-15 18:17:40 UTC

I have some more AltSysRq output, but now the process dies properly.

Now, if only you could somehow forward my errors to Oracle, since their imp triggered this kernel bug.

Comment 8 Hisashi T Fujinaka 2005-04-15 18:20:22 UTC

Created attachment 113241 [details]
AltSysRq[PWT] output from syslog

Comment 9 Larry Woodman 2005-04-15 19:06:43 UTC

Hisashi, from looking at this AltSysrq-M output it appears that the system hung
because 182589 pages of anonymous memory were VM_LOCK'd:

>>>aa:182592 ac:1665 id:54 il:0 ic:0 fr:636 


Is your application doing something that mlock()s memory or something???

Larry Woodman
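
To put the quoted counter line in more familiar units, here is a minimal sketch that pulls one `label:value` pair out of such a line and converts a page count to KiB. The helper names are hypothetical, and the 4 KiB page size is an assumption that matches the i386 hardware listed for this bug.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up one "label:value" counter in a line; returns -1 if absent. */
long counter_value(const char *line, const char *label)
{
    size_t len = strlen(label);
    const char *p = line;

    assert(len > 0);
    while ((p = strstr(p, label)) != NULL) {
        /* Accept the label only at the start of a field. */
        int at_start = (p == line) || p[-1] == ' ' || p[-1] == '>';
        if (at_start && p[len] == ':') {
            long v;
            if (sscanf(p + len + 1, "%ld", &v) == 1)
                return v;
            return -1;
        }
        p += len;
    }
    return -1;
}

/* Convert pages to KiB, assuming 4 KiB pages (i386). */
long pages_to_kib(long pages)
{
    assert(pages >= 0);
    return pages * 4;
}
```

For the line quoted above, `counter_value(line, "aa")` yields 182592 pages, and `pages_to_kib` turns that into 730368 KiB, which is roughly 713 MiB.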

Comment 10 Hisashi T Fujinaka 2005-04-15 19:10:31 UTC

Unfortunately, all I'm doing is running "imp" from Oracle 8.1.7.0. We mere mortals aren't privy to the 
inner workings of Oracle programs.

Comment 11 Larry Woodman 2005-04-15 19:20:47 UTC

Ah, wait!  You have no swap space free!!  That's the problem!!!

>>>Free swap:            0kB 


Fix that and the problem will go away.


Larry Woodman

Comment 12 Hisashi T Fujinaka 2005-04-15 19:26:35 UTC

Please read the opening bug. Having all my swap consumed is what this bug is all about.

Comment 13 Larry Woodman 2005-04-15 19:29:55 UTC

OK, sorry.  What is /proc/sys/vm/overcommit_memory?


Larry

Comment 14 Hisashi T Fujinaka 2005-04-15 22:31:59 UTC

OK, in clarification: the latest Alt-SysRq info was from a PATCHED system, using the patch sent by Larry 
via the web page. Larry's fix now causes the program to crash after consuming all memory, which is 
better than the old behavior, which was a hang forever.

Comment 15 Hisashi T Fujinaka 2005-04-15 22:33:33 UTC

/proc/sys/vm/overcommit_memory is the default, 0.
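
For reference, the setting can also be read programmatically. The sketch below is a minimal example with hypothetical function names; the mode descriptions follow the mainline kernel documentation (0, 1, 2), and the RHEL 3 kernel may define the modes differently, so treat the mapping as an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Describe an overcommit mode (mainline meanings; an assumption for RHEL 3). */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "heuristic";
    case 1:  return "always";
    case 2:  return "strict";
    default: return "unknown";
    }
}

/* Read the mode from the given file; returns -1 if it cannot be read or parsed. */
int read_overcommit_mode(const char *path)
{
    FILE *f;
    int mode = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

Passing "/proc/sys/vm/overcommit_memory" to `read_overcommit_mode` and the result to `overcommit_mode_name` would describe the default value 0 reported here as "heuristic".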

Comment 16 Ernie Petrides 2005-04-23 00:41:17 UTC

A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.2.EL).

Comment 24 Red Hat Bugzilla 2005-09-28 14:47:17 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html
