Bug 71699
Summary: malloc() fails: OOM crashes SMP kernel when swap enabled

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [Retired] Red Hat Linux | Reporter | josip |
| Component | kernel | Assignee | Arjan van de Ven <arjanv> |
| Status | CLOSED CURRENTRELEASE | QA Contact | Brian Brock <bbrock> |
| Severity | high | Priority | medium |
| Version | 7.3 | Hardware | i686 |
| OS | Linux | Doc Type | Bug Fix |
| Last Closed | 2004-09-30 15:39:50 UTC | | |
Description
josip
2002-08-16 21:00:45 UTC
Created attachment 71225 [details]
Test program: uses all successfully malloc'd memory
Some of our users' programs do not use malloc() but can still create crashes on our SMP machines with the same symptoms. While fixing malloc() would prevent some crashes, this means that other crashes may still originate in the interaction of swapping and the OOM killer under out-of-memory conditions on SMP machines.

This has nothing to do with malloc or glibc; it is about memory overcommit and the kernel's OOM handling. Reassigning. The kernel has a run-time configurable overcommit strategy: `sysctl vm.overcommit_memory=2` sets a semi-strict overcommit strategy which seems to work pretty well in practice.

/proc/sys/vm/overcommit_memory=2 does not help: our SMP machines with swap enabled still crash. Supposedly, the default value overcommit_memory=0 should result in strict malloc() checking, but this is clearly not working right. I have yet to see malloc() actually return ENOMEM on any Linux host with kernel 2.4.18-{5,5smp}. Failure to properly detect the memory situation is a normal-priority bug, but crashes of SMP hosts with swap enabled when OOM is encountered are a high-priority item. Note that OOM-related crashes occur on SMP hosts with swap enabled; if we disable swap on SMP hosts, or run the same test program on uniprocessor hosts with swap enabled, the OOM killer usually kills the test program before the machine crashes.

We use default values in /proc/sys/vm:

    bdflush:          30 500 0 0 500 3000 60 20 0
    kswapd:           512 32 8
    max-readahead:    127
    max_map_count:    65536
    min-readahead:    3
    overcommit_memory: 0
    page-cluster:     3
    pagetable_cache:  25 50

My current thinking is that the frequent SMP+swap+OOM crashes are due to a kernel bug which should be fixed at high priority. The failure of malloc() to detect the OOM condition in conformance with Unix standards is a normal-priority item that should also be fixed. FYI, malloc() operates properly on Suns running Solaris. Reliance on the OOM killer, instead of reliable OOM detection at the point of memory allocation, is very troubling.
This is an intrinsically unreliable design.

0 is not strict overcommit. At the moment I am unable to duplicate your problem report. On all my test sets the kernel correctly refuses to go out of memory. With

    echo "2" > /proc/sys/vm/overcommit_memory

I see:

    malloc() returned error: Cannot allocate memory
    Allocated 307 MB

With

    echo "3" > /proc/sys/vm/overcommit_memory

I see:

    malloc() returned error: Cannot allocate memory
    Allocated 245 MB

I have also so far been unable to duplicate a hang on SMP boxes. With the default policy I do see out-of-memory kills, as I would expect. As regards policy, my personal view agrees with yours: the default policy should be "2". That's something that may change in future Linux releases. How much memory does your test system have?

Curious. I did more testing using test code "m", and found that malloc() returns correct error codes on our Pentium IV 1.7 GHz machines with 1 GB PC800 RDRAM and two 1 GB swap partitions on /dev/hda{2,3}. However, on our uniprocessor Pentium II 400 MHz machines with 384 MB PC100 SDRAM, two 384 MB swap partitions on /dev/hda{2,3}, and the default VM settings, the test program always gets killed by the kernel's OOM killer, never by detecting any malloc() errors:

    [root@n027 root]# /usr/local/sbin/m
    Terminated

After "m" is killed, /var/log/messages shows:

    Aug 20 14:18:19 n027 kernel: Out of Memory: Killed process 9488 (m).

All our uniprocessor machines run the same kernel and glibc. To make the story more interesting, our SMP servers with SCSI disks can generally run "m" without crashing (the OOM killer acts in time), while our SMP compute nodes with IDE disks frequently crash when the OOM state is reached. No error codes are returned by malloc() on any SMP machine I've tried, and the OOM killer is activated again:

    [root@fs1 root]# /usr/local/sbin/m
    Terminated

and /var/log/messages shows:

    Aug 20 13:59:51 fs1 kernel: Out of Memory: Killed process 19927 (m).
My suspicion is that this problem may be timing related, because:

- 400 MHz single-CPU nodes (Intel 440BX chipset) fail to return malloc() errors. They do not crash, since the OOM killer terminates the test program, but fast swap drives on ATA-133 interfaces do not fix the incorrect malloc() behavior.
- 500 MHz dual-CPU nodes (Intel 440BX) fail to return malloc() errors and always crash if swapping is enabled on /dev/hda{2,3} using a UDMA-33 interface. The crash is usually avoided if swapping is disabled on the UDMA-33 drive. No crashes are seen with swapping enabled only on /dev/sd{a,b,c}2 or on /dev/hde3 using an ATA-133 interface. In all tests, malloc() never returns error codes.
- 800 MHz dual-CPU nodes (ServerWorks LE) fail to return malloc() errors and crash with about 75% probability if swapping is enabled on /dev/hda{2,3} with a UDMA-33 interface. Interestingly enough, 4 out of 16 machines survived "m".
- 1.7 GHz single-CPU nodes (Intel 850) return malloc() error codes correctly.

In other words: malloc() misbehaves if CPU speed is under 1 GHz or so, and crashes result with OOM+swapping on SMP nodes if the disk interface is UDMA-33. All machines run Red Hat kernel 2.4.18-{5,5smp} and have identical binaries installed (except servers, which have extra capabilities). We've got 68 machines of the above types, and the problem is not limited to a particular machine or two, so I do not suspect a hardware defect. Most likely, this is a genuine Linux kernel problem.

Additional experiments with the overcommit management facility on our dual Pentium III 500 MHz nodes with 440BX chipset, 512 MB PC100 SDRAM, and two 512 MB swap partitions on a UDMA-33 drive:

(1) When swap is ON, neither overcommit_memory=2 nor overcommit_memory=3 fixes the problem. The test program can get almost all physical RAM, but then, instead of continuing to allocate pages from swap space, the machine completely locks up.
No error indications are returned by malloc(), but unlike the default value overcommit_memory=0, where the machine simply dies, values 2 or 3 give some hints about where things went wrong. The system console reports a traceback starting from page_launder_zone. Apparently, while handle_mm_fault is executing for the test program, the system tries to allocate pages via try_to_free_pages, gets into page_launder, and fails. Another run (with fewer kernel modules loaded) produced slightly different results, with kswapd failing and tracebacks complaining about "EIP page_over_rsslimit", then listing refill_inactive_zone in kswapd, and eventually complaining about "Unable to handle kernel NULL pointer dereference". In both cases, the trouble appears to involve getting a page into swap. Suggestion: look for timing problems within or near page_launder or refill_inactive on SMP machines where the CPU and the disk are not very fast.

(2) When swap is OFF, overcommit_memory=3 reduces the total address-space commit to zero, which crashes the system:

    [root@n033 vm]# swapoff -a
    [root@n033 vm]# echo "3" > /proc/sys/vm/overcommit_memory
    [root@n033 vm]# sync
    bash: fork: Cannot allocate memory
    bash: xmalloc: subst.c:258: cannot allocate 5 bytes (0 bytes allocated)
    rlogin: connection closed.

Suggestion: change the behavior of overcommit_memory=3 so that when swap is off, the active constraint becomes physical RAM, thus avoiding system lockups.

(3) When swap is OFF, overcommit_memory=2 reduces the total address-space commit to about half of physical RAM, but malloc() returns correct error codes. Suggestion: swap is not relevant when off; use the physical RAM limit in vm_enough_memory() when swap is off.

If the machine is hanging while swapping, then it's not the VM overcommit handling that's the problem. Something else is messed up, and the VM overcommit is just a red herring. vm overcommit 3 cannot be changed as you suggest.
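(Editorial note: on current kernels, the mode-2 accounting discussed above can be inspected directly. The commit ceiling is roughly swap plus overcommit_ratio percent of RAM, 50% by default, which matches the "about half of physical RAM" limit observed with swap off; the 2.4-era mode 3 no longer exists. A read-only sketch, assuming a Linux /proc filesystem:)

```shell
# Semi-strict overcommit is selected with vm.overcommit_memory=2,
# e.g. persistently via an /etc/sysctl.conf entry:
#     vm.overcommit_memory = 2
# or one-off, as in the tests above:
#     echo "2" > /proc/sys/vm/overcommit_memory

# Under mode 2 the kernel enforces roughly:
#     CommitLimit = swap + (overcommit_ratio / 100) * RAM
# Inspect the current limit and committed total (read-only, safe to run):
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
cat /proc/sys/vm/overcommit_ratio
```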
What other modules are you using? (An lsmod would be useful here.) I've seen those kinds of hangs with people using OpenAFS, for example.

Agreed, the source of crashes is probably swapping, so we've turned swap off for now. If overcommit_memory=3 handling cannot be changed, perhaps it should be disabled when swap is off. Finally, user programs should get better feedback from the kernel on memory status (how much can I get, can I use what I got, can I catch OOM exceptions if I can't, etc.).

Regarding loaded modules, we run two different configurations, and the one without clan1k and lanevi is more reliable. These modules operate our Giganet cLAN network, which remains unused during the test but may want to perform path discovery or something similar on its own. Removing these modules and using "paranoid" overcommit_memory=3 (with swap enabled) helps somewhat (typically, dual P3/800 nodes stay up, but 14 or 15 out of 16 dual P3/500 nodes crash). Since we start the test program over the network, it may be that even NIC activity during swapping is a problem. My current hypothesis: crashes happen and swapping fails when there is competing device activity, e.g. the network.

Modules typically present on our dual P3/500 nodes:

    [root@n033 root]# lsmod
    Module                  Size  Used by    Tainted: P
    w83781d                18592   0  (unused)
    i2c-proc                8224   0  [w83781d]
    i2c-isa                 1892   0  (unused)
    i2c-piix4               4996   0  (unused)
    i2c-core               19360   0  [w83781d i2c-proc i2c-isa i2c-piix4]
    autofs                 12612   0  (autoclean) (unused)
    nfs                    91196   5  (autoclean)
    lockd                  58880   1  (autoclean) [nfs]
    sunrpc                 84180   1  (autoclean) [nfs lockd]
    tulip                  43840   1
    lanevi                 23036   1
    clan1k                 26736   3  [lanevi]
    usb-uhci               25604   0  (unused)
    usbcore                75904   1  [usb-uhci]
    ext3                   70944   2
    jbd                    53728   2  [ext3]

This problem is still present in Red Hat kernel 2.4.18-10smp.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/