Description of problem:
IssueTracker 50542 opened to match this BZ. RHEL3 pre-U4 kernel 2.4.21-20.5 plus two patches described in bugzilla 131525 (lwoodman's patch and my free_more_memory patch). System running the OMSA omdiag system memory test. System has 256MB RAM, and was tried with 512MB, 1GB, and then 2GB of swap available. The system runs out of swap space and livelocks with each amount of swap space; failure occurs in under 3 hours. SysRq-{m,p,t} output indicates that the VM moves pages between the active anonymous, inactive dirty, and inactive laundry lists repeatedly and continuously. No disk I/O is occurring. The out-of-memory killer does kill 16 threads of the omawsd32 web server daemon during the run. I'm not sure if these all happen at once, or if the system can make forward progress after killing them. Why is the system running out of swap space?

Version-Release number of selected component (if applicable):

How reproducible:
Easy on one system at Dell. Other similar systems did not fail in the same manner.
Can you attach the show_mem() output from dmesg when the OOM kill occurs? Larry
Created attachment 104451 [details] 20040928-sysrq-outofswap.txt Sorry, it was attached in the IssueTracker, just not in the bugzilla. The first 16 or so show_mem() dumps come from the oom_killer killing omawsd32 threads. Following those are the results of my pressing sysrq-[mptw] repeatedly.
Matt, do you know what the "OMSA omdiag system memory test" does? Do you really think it uses up all of the memory and swap space or do you think we are leaking swap space? Can you grab a "ps aux" when the system is in this state so we can see the VSS and RSS of every process? Larry
Something is leaking swap space; I don't know if it's the tool itself or the kernel. omdiag (I don't have the source to it, but it is Dell-developed, just not open source) first allocates as much memory as it can, up to 95% of system RAM, via calls to malloc() in large (1MB+) blocks, reducing the block size as malloc() fails, until it's gotten to 95% of RAM. For systems with >2GB RAM, it forks itself until it can malloc up to the 95% point. It then spawns two threads per process: one touches each allocated page in a loop, which keeps its VSS=RSS and most of its pages on the active anonymous lists; the second write/read/compares each byte of each page. After a while it decides it's finished, frees all the memory, and starts over. The goal is to induce single-bit-correctable ECC memory errors; in practice it beats on the VM until it cries uncle. This is the same tool that induced the kswapd deadlock we've just fixed. The processes are named omdiag and memorytestprocess.

At the point of failure, no additional shell work is possible. I did happen to have a 'top' running on another VT, which still shows the first 15 or so processes. It's showing six instances of memorytestprocess, with SIZE=256M each and RSSes of 1396, 1336, 1364, 1264, 1500, and 1704, and one instance of omdiag with SIZE=9944 and RSS=1752.

You've got a point: there shouldn't be six memorytestprocess processes listed on this config, there should only be one at a time, because there's <2GB RAM so it need not fork to be able to allocate all RAM. /me is going to talk to the omdiag writers...
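For reference, here's a minimal userspace sketch of the allocation strategy as I've described it above. This is NOT the actual omdiag source (which I don't have); it's a reconstruction from observed behavior, and helpers like total_ram_bytes() are my own names:

/* Hypothetical sketch of the omdiag-style allocator described above.
 * Not the real tool -- names and structure are assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>

struct region { void *base; size_t len; struct region *next; };

static size_t total_ram_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long psize = sysconf(_SC_PAGE_SIZE);
    return (size_t)pages * (size_t)psize;
}

/* Toucher thread: walk every allocated page in a loop so the pages
 * stay resident (VSS == RSS, pages stay on the active anon list). */
static void *toucher(void *arg)
{
    struct region *head = arg;
    long psize = sysconf(_SC_PAGE_SIZE);
    for (;;) {
        for (struct region *r = head; r; r = r->next)
            for (size_t off = 0; off < r->len; off += (size_t)psize)
                ((volatile char *)r->base)[off]++;
    }
    return NULL;
}

int main(void)
{
    size_t target = total_ram_bytes() / 100 * 95;   /* 95% of RAM */
    size_t got = 0, block = 1 << 20;                /* start at 1MB */
    struct region *head = NULL;

    /* Allocate in large blocks, halving the block size each time
     * malloc() fails, until we've reached the 95% target. */
    while (got < target && block >= (size_t)sysconf(_SC_PAGE_SIZE)) {
        void *p = malloc(block);
        if (!p) {
            block /= 2;
            continue;
        }
        memset(p, 0, block);    /* fault the pages in */
        struct region *r = malloc(sizeof(*r));
        if (!r)
            break;
        r->base = p; r->len = block; r->next = head; head = r;
        got += block;
    }
    printf("allocated %zu bytes\n", got);

    pthread_t t;
    pthread_create(&t, NULL, toucher, head);
    pthread_join(t, NULL);      /* runs until the test is stopped */
    return 0;
}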
Even if the app does manage to accidentally run the system out of swap space, the kernel shouldn't livelock.
Agreed. I wonder if the OOM killing of tasks is failing to free memory and/or swap space the way it was designed to? Larry
Maybe, but the processes that were oom_killed were little web server threads, not the larger memorytestprocess processes. So the oom_killer may have done little good anyhow in this case.
Note: only 16 (or in another case, 11) threads of omawsd32 were killed, when there were likely 30+ threads running. We know for a fact from the sysrq-t output that there were more threads. Therefore, no memory was reclaimed during the kill. See 2.4-bk for wli's patch to mm/oom_kill.c on 13-Aug-2004, which fixes this by taking the task_lock and mmlist_lock when reading p->mm (which could otherwise wind up NULL accidentally), and calling mmput(mm) on the mm for the whole process to make sure the whole mm is freed.
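Roughly, the locking pattern of that fix looks like the sketch below. This is my paraphrase of the description above, not the verbatim 2.4-bk diff; it assumes 2.4 kernel context (task_lock(), for_each_task, tasklist_lock, mmput() all exist there), and I've left the mmlist_lock handling out of the sketch:

/* Paraphrased sketch of the 2.4 oom_kill fix pattern -- illustrative,
 * not the actual patch. */
static void oom_kill(void)
{
	struct mm_struct *mm;
	struct task_struct *p, *q;

	p = select_bad_process();
	if (p == NULL)
		return;

	/* p->mm can go NULL if the task exits between selection and
	 * the kill; read it under task_lock() and pin it with a
	 * reference so the mm can't vanish underneath us. */
	task_lock(p);
	mm = p->mm;
	if (mm)
		atomic_inc(&mm->mm_users);
	task_unlock(p);
	if (mm == NULL)
		return;

	/* Kill *every* thread sharing this mm, not just the first few
	 * we happen to see -- otherwise the address space is never
	 * torn down and no memory or swap is actually reclaimed. */
	read_lock(&tasklist_lock);
	for_each_task(q) {
		if (q->mm == mm)
			oom_kill_task(q);
	}
	read_unlock(&tasklist_lock);

	/* Drop our pin: once the last user exits, the whole mm and
	 * its swap entries are freed. */
	mmput(mm);
}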
A newer build of omdiag (AppCD 4.1.0 rev A00) no longer induces this failure. It correctly cleans up the extra memorytestprocess processes on exit, and only allocates 85% of system RAM for the test rather than 95%. However, I believe the kernel failure is still real and a valid bug to fix.
The related IT was closed, so closing this.
I don't believe we actually did anything to fix this, so I'm changing the disposition to WONTFIX.