Red Hat Bugzilla – Bug 147456
Kernel doesn't handle memory exhaustion caused by processes in I/O block
Last modified: 2007-11-30 17:07:06 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Description of problem:
We have observed processes with memory leaks (an internal
issue, which has since been fixed) driving the server into an I/O block.
The kernel tries to kill (SIGTERM) the process that has exhausted
memory, but the kill is unsuccessful because the process is waiting on I/O.
The process in this case was logging to an NFS file and was doing a
lot of message handling and interactions with an Oracle install.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
I've written a small program to exhaust the memory (malloc loop), and
that case is handled correctly. I haven't yet added I/O activity to
produce an I/O block. The primary servers where this is seen are
production development systems.
1. Exhaust the memory (RAM) with a process that has network I/O
Actual Results: Kernel loops trying to kill (SIGTERM) the process
that has exhausted the memory.
Expected Results: The process is killed and the system returns to
normal operation.
Is it possible to patch the kernel to send a SIGKILL if the
application does not respond to a SIGTERM after a period of time? This
will generally be for applications that are in an I/O block.
Hello, David. In general, processes waiting for I/O completion cannot be
interrupted by signal delivery (because locks or other critical resources
need to be held during the operation). This is appropriate/desirable
behavior, and the kernel is not looping during the I/O wait even if a
signal has been posted (i.e., queued for delivery). These types of waits
are generally very short-term.
Waiting on I/O completion from a remote machine (e.g., for an NFS request)
is a bit of a special case. In certain situations, it might be helpful to
use the "intr" and/or "soft" options for NFS mounts, but this is generally
not advised because of the potential for data loss (from interrupted NFS
operations).
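For illustration, such a mount might look like the /etc/fstab fragment below (server name and paths are hypothetical; the "intr"/"soft" options carry the data-loss risk described above):

```
# "intr" lets signals interrupt NFS waits; "soft" makes requests fail
# after the retry limit instead of blocking indefinitely.
nfsserver:/export  /mnt/logs  nfs  rw,intr,soft,timeo=100,retrans=3  0 0
```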
Applications that consume large amounts of virtual memory will likely have
their physical pages "stolen" (reallocated to other processes) if they were
to become stuck in a long-term I/O wait, so this is not a problem.
We've had scenarios where a host runs an application with a memory leak
that consequently consumes all available memory; the OS then attempts to
kill the process, but is unable to because the process is blocked on I/O.
The host and process remain in this state for hours (I've observed it lasting
at least 8 hours), and a physical power-cycle is required to regain access to
the host.