Red Hat Bugzilla – Bug 147456
Kernel doesn't handle memory exhaustion caused by processes in I/O block
Last modified: 2007-11-30 17:07:06 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Description of problem:
We have observed processes with memory leaks (an internal
issue, which has since been fixed) driving the server into an I/O block.
The kernel tries to kill (SIGTERM) the process that has exhausted
memory, but the kill is unsuccessful because the process is waiting on I/O.
The process in this case was logging to an NFS file and was doing a
lot of message handling and interactions with an Oracle install.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
I've written a small program to exhaust the memory (malloc loop), and
that case is handled correctly. I haven't yet added I/O activity to
produce an I/O block. The primary servers where this is seen are
production development systems.
1. Exhaust the memory (RAM) with a process that has network I/O
Actual Results: Kernel loops trying to kill (SIGTERM) the process
that has exhausted the memory.
Expected Results: The process is killed and the system returns to
normal operation.
Is it possible to patch the kernel to send a SIGKILL if the
application does not respond to a SIGTERM after a period of time? This
will generally be for applications that are in an I/O block.
Hello, David. In general, processes waiting for I/O completion cannot be
interrupted by signal delivery (because locks or other critical resources
need to be held during the operation). This is appropriate/desirable
behavior, and the kernel is not looping during the I/O wait even if a
signal has been posted (i.e., queued for delivery). These types of waits
are generally very short-term.
Waiting on I/O completion from a remote machine (e.g., for an NFS request)
is a bit of a special case. In certain situations, it might be helpful to
use the "intr" and/or "soft" options for NFS mounts, but this is generally
not advised because of the potential for data loss (from interrupted NFS
operations).
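For illustration, such a mount might look like the /etc/fstab fragment below (server name and paths are hypothetical; the "intr"/"soft" options carry the data-loss risk described above):

```
# "intr" lets signals interrupt NFS waits; "soft" makes requests fail
# after the retry limit instead of blocking indefinitely.
nfsserver:/export  /mnt/logs  nfs  rw,intr,soft,timeo=100,retrans=3  0 0
```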
Applications that consume large amounts of virtual memory will likely have
their physical pages "stolen" (reallocated to other processes) if they were
to become stuck in a long-term I/O wait, so this is not a problem.
We've had scenarios where a host runs an application with a memory leak
that consequently consumes all available memory; the OS then attempts to
kill the process, but is unable to because the process is blocked on I/O.
The host and process remain in this state for hours (I've observed it lasting
at least 8 hours), and a physical power-cycle is required to regain access to
the host.