From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040805 Firefox/0.9.2

Description of problem:
The machine, an Athlon64 with 256MB of RAM and 512MB of swap, was just running a yum upgrade through ssh. A second ssh connection ran top to check how greedy yum was. Then the OOM killer started and killed the second sshd, which was obviously not the application consuming memory; yum was using around 220MB of virtual memory. The last top output confirms the OOM kill was unjustified:

top - 22:22:18 up 4:27, 3 users, load average: 1.42, 1.12, 0.61
Tasks: 46 total, 1 running, 45 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.6% us, 2.2% sy, 0.0% ni, 0.0% id, 87.2% wa, 0.0% hi, 0.0% si
Mem: 187388k total, 185292k used, 2096k free, 224k buffers
Swap: 506036k total, 120844k used, 385192k free, 4008k cached
Connection to test64 closed by remote host.
Connection to test64 closed.
RES SHR S %CPU %MEM TIME+ COMMAND

There was still ample free swap, and top was updating every second. I don't see anything on that system that could consume 380MB of swap in a second or so.
Syslog logged the following:

Aug 12 16:16:56 test64 kernel: oom-killer: gfp_mask=0xd2
Aug 12 16:16:57 test64 kernel: DMA per-cpu:
Aug 12 16:16:58 test64 kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 12 16:16:58 test64 kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 12 16:16:59 test64 kernel: Normal per-cpu:
Aug 12 16:16:59 test64 kernel: cpu 0 hot: low 28, high 84, batch 14
Aug 12 16:16:59 test64 kernel: cpu 0 cold: low 0, high 28, batch 14
Aug 12 16:17:00 test64 kernel: HighMem per-cpu: empty
Aug 12 16:17:01 test64 kernel:
Aug 12 16:17:01 test64 kernel: Free pages: 2244kB (0kB HighMem)
Aug 12 16:17:01 test64 kernel: Active:42474 inactive:73 dirty:0 writeback:1 unstable:0 free:561 slab:2218 mapped:42472 pagetables:571
Aug 12 16:17:01 test64 kernel: DMA free:1068kB min:28kB low:56kB high:84kB active:11184kB inactive:40kB present:16384kB
Aug 12 16:17:01 test64 kernel: protections[]: 14 252 252
Aug 12 16:17:02 test64 kernel: Normal free:1176kB min:476kB low:952kB high:1428kB active:158712kB inactive:252kB present:245440kB
Aug 12 16:17:02 test64 kernel: protections[]: 0 238 238
Aug 12 16:17:02 test64 kernel: HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
Aug 12 16:17:02 test64 kernel: protections[]: 0 0 0
Aug 12 16:17:02 test64 kernel: DMA: 15*4kB 2*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1068kB
Aug 12 16:17:02 test64 kernel: Normal: 54*4kB 2*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1176kB
Aug 12 16:17:02 test64 kernel: HighMem: empty
Aug 12 16:17:02 test64 kernel: Swap cache: add 749787, delete 727050, find 167131/312132, race 0+0
Aug 12 16:17:02 test64 kernel: Out of Memory: Killed process 3084 (sshd).
Aug 12 16:21:40 test64 sshd(pam_unix)[4901]: session opened for user root by root(uid=0)

I logged back in 5 minutes later and found the connection had been closed ...
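For what it's worth, the logged gfp_mask can be decoded against the 2.6.7-era gfp.h flag bits, and the per-zone free pages compared against their min watermarks from the dump above. A small sketch (the flag table and helper names are mine, not kernel code; HighMem is skipped because the log shows it empty, present:0kB):

```python
# Decode an oom-killer gfp_mask and sanity-check zone watermarks.
# Flag values taken from include/linux/gfp.h of 2.6.x-era kernels.

GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

def decode_gfp(mask):
    """Return the names of the flag bits set in a gfp_mask value."""
    return [name for bit, name in sorted(GFP_FLAGS.items()) if mask & bit]

# 0xd2 decodes to __GFP_HIGHMEM|__GFP_WAIT|__GFP_IO|__GFP_FS,
# i.e. GFP_HIGHUSER: an ordinary, swappable user-space allocation.
print(decode_gfp(0xd2))

# Zone stats exactly as logged: (name, free kB, min kB). Both populated
# zones were still *above* their min watermark when the OOM killer
# fired, which supports the report that the kill was bogus.
zones = [("DMA", 1068, 28), ("Normal", 1176, 476)]
for name, free, minimum in zones:
    status = "below watermark" if free < minimum else "OK"
    print(f"{name}: free {free}kB vs min {minimum}kB -> {status}")
```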
Daniel

Version-Release number of selected component (if applicable):
kernel-2.6.7-1.515

How reproducible:
Didn't try

Actual Results: The OOM killer killed a process, and the wrong process at that.

Expected Results: The OOM killer should not have been invoked in that situation, and if invoked, it should have killed the right process instead (assuming there is a right process, yum, which is another hairy issue).

Additional info:
I experienced something similar on i686 (non-SMP) with this particular kernel version. In my case the OOM killer was invoked because the load went through the roof. Since I am on i686, this may be a more generic problem and not just x86_64. I can provide my /var/log/messages (by private e-mail) if necessary.
I posted a possible patch for this problem today: http://lkml.org/lkml/2004/10/25/357
OK, a better patch was created upstream and is in the current kernel. The other two bugs regarding this problem were closed already, but apparently we forgot about this one ;)