Description of problem:
I am running RHEL AS 4.0 Update 4 on an HP DL580 with 16 GB of memory and four Xeon processors. About once a month, the kswapd0 process takes all the CPU resources on the machine and brings it to a halt, and other processes hang. We have three such servers. Together they form our three-node Oracle RAC cluster and run our production databases. Because the affected node can no longer communicate, it is evicted from the cluster. I have seen this happen on all three nodes at different times, several times in total, each time after the server had been running for about a month.

Version-Release number of selected component (if applicable):
[oracle.ichotels.com] cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
[oracle.ichotels.com] uname -a
Linux racdbp2.dcb.ichotels.com 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 23 12:49:51 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
It has happened several times on all three servers, each time after about one month of uptime.

Additional info:
I will upload a few files with the data collected before the server rebooted.
Created attachment 197701: mpstat data collected on the server just before it rebooted
Created attachment 197721: top data collected on the server just before it rebooted
Created attachment 197731: vmstat data collected on the server just before it rebooted
The server has 16 GB of memory and 18 GB of swap.
[oracle.ichotels.com] free
             total       used       free     shared    buffers     cached
Mem:      16417680   15057028    1360652          0     246984    8021748
-/+ buffers/cache:    6788296    9629384
Swap:     18876364          0   18876364
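Reading the `free` output above: the "-/+ buffers/cache" row removes the kernel's buffers and page cache from the "Mem:" figures. On that basis, applications hold only about 6.5 GB, and swap is completely unused. The following is a minimal shell sketch of that arithmetic. `app_used_kb` and `app_free_kb` are hypothetical helper names. All values are in kB, as `free` prints them with no options.

```shell
#!/bin/sh
# Recompute the "-/+ buffers/cache" row of free(1) from the "Mem:" row.
# All values are in kB, as printed by free with no options.

# Memory used by applications = used - buffers - cached
app_used_kb() {
    echo $(( $1 - $2 - $3 ))
}

# Memory available to applications = free + buffers + cached
app_free_kb() {
    echo $(( $1 + $2 + $3 ))
}

# Figures from the racdbp2 output above.
app_used_kb 15057028 246984 8021748   # prints 6788296
app_free_kb 1360652 246984 8021748    # prints 9629384
```

Both results match the "-/+ buffers/cache" row. This confirms that most of the "used" memory was page cache rather than application data at the time of the snapshot.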
I am working on this issue. Can you get me the AltSysrq-W and AltSysrq-M outputs when this happens, just to make sure it's the same thing I think it is?
Thanks,
Larry Woodman
Hello Larry,
The problem is that this happens only about once a month. The most recent server reboot was yesterday, after an uptime of 42 days. If the system hangs for a few minutes, the Oracle cluster software reboots it. It is hard to tell when the system will hang. I was running Oracle's data collection utility, oswatcher. It collects data about the system every 30 seconds and stores it in files, which I can upload to you. I am sorry, but I cannot get you the AltSysrq-W and AltSysrq-M data. If you want to talk to me, please feel free to call me at 770-604-5606 or 419-290-9988. I really appreciate your help.
Created attachment 198811: This is the typical data collected by Oracle's oswatcher utility. The oswatcher data rolls over every 48 hours, so if you need older data I can upload it.
Good Morning Larry, Please update. Thanks.
Please update. Thanks.
Is anyone working on this? I have been asking for an update for the last several days with no response from your end. This is unprofessional. Thanks.
Sorry for the delay. There is no way to get AltSysrq-M or AltSysrq-W data? This is what I usually need to debug a hung system. Also, can you tell me if the system is running with memory interleaving enabled or if it is a NUMA system?
Please:
1.) echo 1 > /proc/sys/kernel/sysrq
2.) echo m > /proc/sysrq-trigger
3.) dmesg
and attach the output.
Thanks,
Larry Woodman
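The three steps above can be wrapped in a small script that saves the output to a file ready to attach. This is a sketch, assuming it runs as root on the affected node. `capture_sysrq` is a hypothetical name. The `SYSRQ_CTL`, `SYSRQ_TRIGGER` and `DMESG_CMD` variables are also hypothetical overrides, added only so the script can be dry-run against ordinary files.

```shell
#!/bin/sh
# Sketch: enable SysRq, fire one or more SysRq keys, and save the kernel log.
# SYSRQ_CTL, SYSRQ_TRIGGER and DMESG_CMD are hypothetical overrides for dry runs;
# by default they point at the real /proc files and dmesg (root required).
SYSRQ_CTL=${SYSRQ_CTL:-/proc/sys/kernel/sysrq}
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
DMESG_CMD=${DMESG_CMD:-dmesg}

capture_sysrq() {
    out=$1; shift
    echo 1 > "$SYSRQ_CTL" || return 1              # step 1: enable SysRq
    for key in "$@"; do
        echo "$key" > "$SYSRQ_TRIGGER" || return 1 # step 2: e.g. "m" = memory info
    done
    $DMESG_CMD > "$out"                            # step 3: save the kernel log
}

# Usage on the server: capture_sysrq /tmp/sysrq-m.txt m
```

Passing `w` after `m` would also request the AltSysrq-W dump, assuming the running kernel implements that key.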
Created attachment 205521: dmesg output

Re: "There is no way to get AltSysrq-M or AltSysrq-W data?"
No. We don't know when the system will hang next. Once it hangs, Oracle reboots it automatically within a few minutes.

Re: "Also, can you tell me if the system is running with memory interleaving enabled or if it is a NUMA system."
I will ask the system admins at the data center and let you know as soon as possible.

I have attached the dmesg output. Thanks.
This is not a NUMA system. It is a DL580 G3, and I do not think NUMA is an option on it.
[root@racdbp1]~# numactl --show
policy: default
preferred node: 0
interleavemask:
interleavenode: 0
nodebind: 0
membind: 0
[root@racdbp1]~# numactl --hardware
available: 1 nodes (0-0)
node 0 size: 16895 MB
node 0 free: 46 MB
[root@racdbp1]~# dmesg |grep -i numa
No NUMA configuration found
[root@racdbp1]~# dmesg |grep command
Bootdata ok (command line is ro root=LABEL=/ apm=off nousb apm=off iommu=off)
Kernel command line: ro root=LABEL=/ apm=off nousb apm=off iommu=off console=tty0
[root@racdbp1]~#
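To run the same check on all three RAC nodes without reading the output by eye, you can parse the `available:` line of `numactl --hardware`. The sketch below assumes the output format shown above. `numa_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the NUMA node count from "numactl --hardware" output read on stdin.
# Assumes the "available: N nodes (...)" line format shown in the transcript above.
numa_nodes() {
    awk '/^available:/ { print $2; exit }'
}

# Usage on a node: numactl --hardware | numa_nodes
# A count of 1 (as on racdbp1) means the kernel sees a single memory node.
```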
Not being a NUMA system is even more reason I need to see the AltSysrq-M and AltSysrq-W outputs when the system is hung. That said, I did make changes to RHEL4-U6 to prevent the system from getting into this state. Can you try the latest RHEL4-U6 beta kernel? If so, you also need to set the new tunable parameter /proc/sys/vm/pagecache to 10.
Larry Woodman
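Before trying the tunable, it is worth confirming which kernel each node is running. This sketch assumes RHEL 4 kernel releases follow the `2.6.9-<build>` pattern seen in this thread: 2.6.9-42.x on the U4 nodes and 2.6.9-67.x for U6. `kernel_build` and `at_least_build` are hypothetical helpers. Beta kernels may carry other build numbers, so checking for /proc/sys/vm/pagecache itself is the more direct test.

```shell
#!/bin/sh
# Assumption: RHEL 4 kernel releases look like 2.6.9-<build>.<errata>...,
# and U6 kernels start at build 67 (cf. the 2.6.9-67.0.x kernels in this thread).

# Extract the build number: "2.6.9-42.0.8.ELsmp" -> "42" (empty if no match).
kernel_build() {
    echo "$1" | sed -n 's/^2\.6\.9-\([0-9][0-9]*\).*/\1/p'
}

# Succeed if the release string's build number is >= the given minimum.
at_least_build() {
    build=$(kernel_build "$1")
    [ -n "$build" ] && [ "$build" -ge "$2" ]
}

# Usage on a node:
#   at_least_build "$(uname -r)" 67 && echo "U6 or later" || echo "older than U6"
#   [ -e /proc/sys/vm/pagecache ] || echo "no vm.pagecache tunable on this kernel"
```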
If I give you a kernel that prints the AltSysrq-M and AltSysrq-W output to the console when you ping the system, can you run it? I'd REALLY like to see exactly what the system is doing when this hang occurs, so I can verify it's the same problem I fixed in RHEL4-U6. Also, are you considering running the latest RHEL4-U6 kernel?
Larry Woodman
Your service sucks!! We have decided to move to Oracle Linux. Thanks for the response.
Did you get a chance to try the latest RHEL4-U6 kernel? I made a change to prevent the system from hanging in this state. You need to install RHEL4-U6 and then set /proc/sys/vm/pagecache to 10 (that is, 10%). This prevents kswapd and all other callers of try_to_free_pages() from getting stuck on the zone->lru_lock. We have verified that this prevents the hang you are seeing when memory is exhausted on x86_64 systems with many CPUs and lots of RAM.
Larry Woodman
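The second step above can be scripted so that the value also persists across reboots. The script writes it to /etc/sysctl.conf, where the `vm.pagecache` key maps to /proc/sys/vm/pagecache. This is a sketch, assuming root on a U6 kernel. `set_pagecache`, `PAGECACHE_FILE` and `SYSCTL_CONF` are hypothetical names, introduced only so the script can be dry-run against ordinary files.

```shell
#!/bin/sh
# Sketch: apply vm.pagecache now and persist it in sysctl.conf.
# PAGECACHE_FILE and SYSCTL_CONF are hypothetical overrides for dry runs;
# by default they point at the real paths (root required).
PAGECACHE_FILE=${PAGECACHE_FILE:-/proc/sys/vm/pagecache}
SYSCTL_CONF=${SYSCTL_CONF:-/etc/sysctl.conf}

set_pagecache() {
    value=$1
    if [ ! -e "$PAGECACHE_FILE" ]; then
        echo "no $PAGECACHE_FILE: kernel predates the tunable" >&2
        return 1
    fi
    echo "$value" > "$PAGECACHE_FILE" || return 1   # takes effect immediately
    # Drop any earlier vm.pagecache line, then append the new setting.
    tmp=$(mktemp) || return 1
    grep -v '^[[:space:]]*vm\.pagecache[[:space:]]*=' "$SYSCTL_CONF" > "$tmp"
    echo "vm.pagecache = $value" >> "$tmp"
    cat "$tmp" > "$SYSCTL_CONF" && rm -f "$tmp"
}

# Usage (as root, on a U6 kernel): set_pagecache 10
```

The script rewrites the config line instead of blindly appending one. This keeps repeated runs from piling up duplicate `vm.pagecache` entries.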
I'm having issues with kswapd on RHEL 4 U6. I've tried kernels 2.6.9-67.0.20, 2.6.9-67.0.15 and 2.6.9-67.0.4, each with and without SMP. It seems to be a memory leak. I'm going to check RHEL 5, but I can't move to it at this particular juncture. Any help would be greatly appreciated.
Kathy, can you provide us with whatever data you have on this kswapd issue? Is the system hanging? If so, can you get the AltSysrq-M output so I can see the exact memory state? Also, you seem to think the system is leaking memory; can you provide me with whatever data or evidence you have of this? Finally, if you can send me some sort of reproducer program that I can run on my system, that would make debugging this problem much faster than going back and forth.
Thanks,
Larry Woodman
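One low-effort way to gather the evidence requested above is a timestamped log of swap usage. This is a minimal sketch. `swap_used_kb` is a hypothetical helper, and `SwapTotal` and `SwapFree` are the standard /proc/meminfo fields. The logging loop is shown only as a commented, hypothetical example.

```shell
#!/bin/sh
# Print swap in use (kB) from /proc/meminfo-format text on stdin:
# SwapTotal - SwapFree.
swap_used_kb() {
    awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }'
}

# Hypothetical logging loop: one timestamped sample every 30 seconds.
#   while :; do
#       echo "$(date '+%F %T') $(swap_used_kb < /proc/meminfo)" >> /tmp/swap.log
#       sleep 30
#   done
```

A log that keeps climbing while the workload stays steady is the kind of data that would support the leak theory. A log that levels off would argue against it.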
I have consistently reproduced the issue by installing the Intel compilers from: http://intel.com/cd/software/products/asmo-na/eng/219771.htm
With 2.6.9-67.0.4.ELsmp, the installation never seems to get past its testing stage. With the other kernels, I watched in top as swap usage climbed to 2 GB; the system, a Dell GX280, has 2 GB of physical RAM. I imagine I can reproduce it on my Dell GX 755 as well. I need to do some work to get Firefox and g++ onto my RHEL 5 test system, and then I can let you know the results there. I've also contacted Intel Product Support about this, but have had no response yet.
Thanks for your prompt reply.
Kathy Whyte
Hmmm... I just tried on a freshly installed RHEL 4 U6 Dell Optiplex 755, and it seems to work... 3 GB of RAM in this baby, but the installation hardly seems to touch it, and it certainly never swaps. The kernel is 2.6.9-67.0.20.ELsmp.
Kathy Whyte
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/
If you would like Red Hat to reconsider your feature request for an active release, please reopen the request through the appropriate support channels and provide additional supporting details about the importance of this issue.