Bug 293641 - kswapd0 hangs the system
Summary: kswapd0 hangs the system
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: i686
OS: Linux
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Martin Jenner
Depends On:
Reported: 2007-09-17 18:34 UTC by Iftequar F Mohammed
Modified: 2012-06-20 16:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2012-06-20 16:12:37 UTC
Target Upstream Version:

Attachments
mpstat data collected on the server just before it rebooted (2.06 KB, text/plain)
2007-09-17 18:34 UTC, Iftequar F Mohammed
top data collected on the server just before it rebooted (15.70 KB, text/plain)
2007-09-17 19:16 UTC, Iftequar F Mohammed
vmstat data collected on the server before it rebooted. (2.58 KB, text/plain)
2007-09-17 19:17 UTC, Iftequar F Mohammed
This is the typical data that is collected by Oracle oswatcher utility. (791.55 KB, application/zip)
2007-09-18 20:07 UTC, Iftequar F Mohammed
dmesg output (40.01 KB, text/plain)
2007-09-25 14:25 UTC, Iftequar F Mohammed

Description Iftequar F Mohammed 2007-09-17 18:34:49 UTC
Description of problem:
I am running RHEL AS 4.0 Update 4 on an HP DL580 with 16GB of memory and 4 Xeon
processors. About once a month the kswapd0 process takes all the CPU
resources on the machine and brings it to a halt.
There are actually three such servers. They form our 3-node Oracle RAC cluster
and run our production databases. The kswapd process runs on a server and
takes all the CPU resources, and other processes hang. Because the node is
part of the Oracle cluster, it is evicted from the cluster when it can no
longer communicate.
I have seen this problem on all three nodes at different times. It has
happened several times, each time after the server had been running for
about 1 month.

Version-Release number of selected component (if applicable):
[oracle.ichotels.com] cat /etc/redhat-release 
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)

[oracle.ichotels.com] uname -a
Linux racdbp2.dcb.ichotels.com 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 23 12:49:51 EST
2007 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
It has happened several times on all three servers, each time after the
server had been up for about 1 month.

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
I will upload a few files with the data collected before the server rebooted.

Comment 1 Iftequar F Mohammed 2007-09-17 18:34:49 UTC
Created attachment 197701 [details]
mpstat data collected on the server just before it rebooted

Comment 2 Iftequar F Mohammed 2007-09-17 19:16:11 UTC
Created attachment 197721 [details]
top data collected on the server just before it rebooted

Comment 3 Iftequar F Mohammed 2007-09-17 19:17:33 UTC
Created attachment 197731 [details]
vmstat data collected on the server before it rebooted.

Comment 4 Iftequar F Mohammed 2007-09-17 19:19:24 UTC
The server has 16GB memory and 18GB of swap.

[oracle.ichotels.com] free
             total       used       free     shared    buffers     cached
Mem:      16417680   15057028    1360652          0     246984    8021748
-/+ buffers/cache:    6788296    9629384
Swap:     18876364          0   18876364
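
The "-/+ buffers/cache" row above can be re-derived from the "Mem:" row: used minus buffers minus cached. The sketch below is only an illustration and assumes the older `free` column layout shown here (total, used, free, shared, buffers, cached, all in KB); the function name `app_used_kb` is made up for this example.

```shell
# Reads `free` output on stdin and prints the memory used by applications,
# i.e. used - buffers - cached from the "Mem:" row (values in KB).
# Assumes the column order: total used free shared buffers cached.
app_used_kb() {
    awk '$1 == "Mem:" { print $3 - $6 - $7 }'
}

# Usage on a live system: free | app_used_kb
```

For the figures above this gives 15057028 - 246984 - 8021748 = 6788296 KB, which matches the "-/+ buffers/cache" row: roughly 8GB of the "used" memory is page cache, and swap is untouched.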

Comment 5 Larry Woodman 2007-09-18 19:03:01 UTC
I am working on this issue.  Can you get me AltSysrq-W and AltSysrq-M outputs
when this happens just to make sure it's the same thing I think it is?

Thanks, Larry Woodman

Comment 6 Iftequar F Mohammed 2007-09-18 19:53:52 UTC
Hello Larry,
The problem is that this happens only about once a month. The most recent
server reboot was yesterday, after an uptime of 42 days. If the system is hung
for a few minutes, it is rebooted by the Oracle cluster software. It is hard to
tell when the system will hang.
I was running Oracle's data collection utility called oswatcher. It basically
collects data about the system every 30 seconds and stores it in files. I can
upload those files to you.
I am sorry, I cannot get you AltSysrq-W and AltSysrq-M data.
If you want to talk to me please feel free to call me at 770-604-5606 or
419-290-9988. I really appreciate your help.

Comment 7 Iftequar F Mohammed 2007-09-18 20:07:10 UTC
Created attachment 198811 [details]
This is the typical data that is collected by Oracle oswatcher utility.

The oswatcher data is rolled-over every 48 hours, so if you need some old data
I can upload it.

Comment 8 Iftequar F Mohammed 2007-09-20 13:36:39 UTC
Good Morning Larry,
Please update.

Comment 9 Iftequar F Mohammed 2007-09-24 13:35:09 UTC
Please update. Thanks.

Comment 10 Iftequar F Mohammed 2007-09-25 13:54:58 UTC
Is anyone working on this? I have been asking for an update on this for the
last several days with no response from your end. This is unprofessional. Thanks.

Comment 11 Larry Woodman 2007-09-25 14:08:56 UTC
Sorry for the delay.  There is no way to get AltSysrq-M or AltSysrq-W data? 
This is what I usually need to debug a hung system.  Also, can you tell me if
the system is running with memory interleaving enabled or if it is a NUMA system?


1.) echo 1 > /proc/sys/kernel/sysrq
2.) echo m > /proc/sysrq-trigger
3.) Run dmesg and attach the output.
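
The steps above can be sketched as one small script that also requests the AltSysrq-W dump mentioned earlier in this report. This is only a sketch: the function name `capture_sysrq` and the overridable variables are assumptions made for illustration (they allow a dry run against ordinary files), and on a real system it has to run as root.

```shell
#!/bin/sh
# Sketch: enable sysrq, request the memory (m) and blocked-task (w) dumps,
# then save the kernel log to a file. Run as root on a real system.
# The three variables below are hypothetical knobs for dry runs only.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
DMESG_CMD="${DMESG_CMD:-dmesg}"

capture_sysrq() {
    out="$1"
    echo 1 > "$SYSRQ_ENABLE" || return 1          # step 1: enable sysrq
    for key in m w; do
        echo "$key" > "$SYSRQ_TRIGGER" || return 1  # step 2: trigger each dump
    done
    "$DMESG_CMD" > "$out"                         # step 3: save the kernel log
}

# Usage (as root): capture_sysrq /var/tmp/sysrq-dump.txt
```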

Thanks, Larry Woodman

Comment 12 Iftequar F Mohammed 2007-09-25 14:25:51 UTC
Created attachment 205521 [details]
dmesg output

There is no way to get AltSysrq-M or AltSysrq-W data? 
No. We don't know when the system will hang next. It will be rebooted
automagically by Oracle within a few minutes once it is hung.
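
Since the hang cannot be predicted, one possible workaround is a small watchdog that polls kswapd0's CPU time and requests the sysrq dumps automatically when it spikes. The following is a rough sketch under several assumptions: the function names, the 10-second interval, the 800-tick limit (about 80% of one CPU if the clock runs at 100 ticks per second) and the log path are all invented for illustration, and on a fully hung machine the script itself may never get scheduled.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; names, thresholds and paths are assumptions.

# Reads one /proc/<pid>/stat line on stdin and prints utime + stime
# (fields 14 and 15), in clock ticks.
stat_ticks() {
    awk '{ print $14 + $15 }'
}

# Polls kswapd0 every INTERVAL seconds. If it used more than LIMIT ticks
# during one interval, requests sysrq-m and sysrq-w and saves the kernel log.
watch_kswapd() {
    interval="${1:-10}"
    limit="${2:-800}"
    pid=$(pgrep -x kswapd0) || return 1
    prev=$(stat_ticks < "/proc/$pid/stat")
    while sleep "$interval"; do
        cur=$(stat_ticks < "/proc/$pid/stat")
        if [ $((cur - prev)) -gt "$limit" ]; then
            echo 1 > /proc/sys/kernel/sysrq
            echo m > /proc/sysrq-trigger
            echo w > /proc/sysrq-trigger
            dmesg > "/var/tmp/kswapd-$(date +%Y%m%d-%H%M%S).log"
        fi
        prev=$cur
    done
}

# Usage (as root, left running in the background): watch_kswapd 10 800 &
```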

Also, can you tell me if the system is running with memory interleaving enabled
or if it is a NUMA system?
I will ask the system admins at the data center and let you know as soon as

I have attached the dmesg output.


Comment 13 Iftequar F Mohammed 2007-09-28 13:45:14 UTC
This is not a NUMA system. This is a DL580 G3, and I do not think NUMA is an option.

[root@racdbp1]~# numactl --show
policy: default
preferred node: 0
interleavenode: 0
nodebind: 0
membind: 0
[root@racdbp1]~# numactl --hardware
available: 1 nodes (0-0)
node 0 size: 16895 MB
node 0 free: 46 MB
[root@racdbp1]~# dmesg |grep -i numa
No NUMA configuration found
[root@racdbp1]~# dmesg |grep command
Bootdata ok (command line is ro root=LABEL=/ apm=off nousb apm=off iommu=off)
Kernel command line: ro root=LABEL=/ apm=off nousb apm=off iommu=off console=tty0

Comment 14 Larry Woodman 2007-09-28 13:50:50 UTC
Not being a NUMA system is even more of a reason I need to see AltSysrq-M and
AltSysrq-W outputs when the system is hung.  Having said that, I did make changes
to RHEL4-U6 to prevent the system from getting into this state. Can you try the
latest RHEL4-U6 beta kernel?  If yes, you also need to set the new tunable
parameter /proc/sys/vm/pagecache to 10.

Larry Woodman

Comment 15 Larry Woodman 2007-10-08 14:46:29 UTC
If I give you a kernel that will print the AltSysrq-M and AltSysrq-W output to
the console when you ping the system, can you run it?  I'd REALLY like to see
exactly what the system is doing when this hang occurs so I can verify it's the
same problem I fixed in RHEL4-U6.  Also, are you considering running the latest
RHEL4-U6 kernel?

Larry Woodman

Comment 16 Iftequar F Mohammed 2007-10-08 15:50:15 UTC
Your service Sucks!! We have decided to move to Oracle Linux. Thanks for response.

Comment 17 Larry Woodman 2007-10-12 11:32:27 UTC
Did you get a chance to try the latest RHEL4-U6 kernel?  I made a change to
prevent the system from getting hung in this state.  You need to install
RHEL4-U6 and then set /proc/sys/vm/pagecache to 10%.  This will prevent kswapd
and all other callers of try_to_free_pages() from getting stuck on the
zone->lru_lock.  We have verified that this prevents the hang you are seeing
when memory becomes exhausted on x86_64 systems with lots of CPUs and lots of RAM.
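
Setting the tunable described above might look like the sketch below. The function name `set_pagecache_limit` and the overridable `PAGECACHE_TUNABLE` variable are assumptions for illustration. The existence check is there because, per this report, the file is new in RHEL4-U6 and will be missing on older kernels.

```shell
#!/bin/sh
# Sketch: write a value to the pagecache tunable if the kernel provides it.
# PAGECACHE_TUNABLE is a hypothetical override used only for dry runs.
PAGECACHE_TUNABLE="${PAGECACHE_TUNABLE:-/proc/sys/vm/pagecache}"

set_pagecache_limit() {
    pct="$1"
    if [ ! -e "$PAGECACHE_TUNABLE" ]; then
        echo "tunable not present: $PAGECACHE_TUNABLE" >&2
        return 1
    fi
    echo "$pct" > "$PAGECACHE_TUNABLE"
}

# Usage (as root): set_pagecache_limit 10
```

A value written this way does not survive a reboot. Entries under /proc/sys normally map to sysctl names, so a line such as `vm.pagecache = 10` in /etc/sysctl.conf would presumably make it persistent, but that is an assumption and not something confirmed in this report.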

Larry Woodman

Comment 18 Kathy Whyte 2008-07-21 19:26:15 UTC
I'm having issues with kswapd on RHEL 4 U6.  I've tried kernels 2.6.9-67.0.20,
2.6.9-67.0.15 and 2.6.9-67.0.4, both with and without smp.
It seems to be a memory leak. I'm going to check RHEL 5, but I can't move to
that at this particular juncture.

Any help would be greatly appreciated.

Comment 19 Larry Woodman 2008-07-21 19:39:43 UTC
Kathy, can you provide us with whatever data you have on this kswapd issue? 
Is the system hanging? If so, can you get AltSysrq-M output so I can see the
exact memory state?  Also, you seem to think the system is leaking memory. Can
you provide me with whatever data or evidence you have of this?  Finally, if you
can send me some sort of reproducer program that I can run on my system, that
would make debugging this problem much faster than going back and forth.

Thanks, Larry Woodman

Comment 20 Kathy Whyte 2008-07-21 19:59:26 UTC
I have consistently reproduced the issue with the installation of the Intel
compilers from: http://intel.com/cd/software/products/asmo-na/eng/219771.htm

2.6.9-67.0.4.ELsmp just never seems to get past the testing mode of the above

With the other kernels I used top and saw swap usage go to 2G; the system, a
Dell GX280, has 2G of actual RAM. I imagine I can reproduce it on my
Dell GX 755 as well. 

Need to do some work to get firefox and g++ on my RHEL 5 test system and can let
you know the results there.

I've also contacted Intel Product Support about this, but no response yet.

Thanks for your prompt reply.
Kathy Whyte

Comment 21 Kathy Whyte 2008-07-21 21:08:13 UTC

I just tried on a freshly installed RHEL 4 U6 Dell Optiplex 755 and it seems to

3G RAM in this baby, but it hardly seems to touch it...
certainly it never swaps.
Kernel is 2.6.9-67.0.20.ELsmp

Kathy Whyte

Comment 22 Jiri Pallich 2012-06-20 16:12:37 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.
