Description of problem:
After approximately 4-8 hours of light desktop use in X, oom-killer is invoked in a spiral that ultimately leads to a black screen and an unusable desktop, though plenty of RAM remains. Requires a hardware reset. Consistent across XFCE, LXDE, and KDE.

Version-Release number of selected component (if applicable):
kernel-3.7.6-201.fc18.i686
xorg-x11-server-Xorg-1.13.2-2.fc18.i686

How reproducible:
Always, within 4-6 hours of login.

Steps to Reproduce:
1. Boot machine
2. Log in to X
3. Load terminal, firefox, thunderbird, pidgin; wait 4-6 hours

Actual results:
Desktop becomes unresponsive, screen blank, no input at keyboard. After a hard reset, logs reveal oom-killer invoked dozens of times.

Expected results:
Working desktop.

Additional info:
Fedup upgrade from a fully-working Fedora 17.
Created attachment 703094 [details] Full trace of oom-killer that began recent spiral
Created attachment 703095 [details] List of all invocations of oom-killer in spiral
Created attachment 703096 [details] List of all applications that have triggered oom-killer since 2013-02-06
I have reproduced this by booting and leaving the system unattended in runlevel 3. The ensuing attachments reflect system state some 36 hours after boot with no real activity. I tracked ps, free, and vmstat every 60 seconds, and grabbed the full system log since boot. vmstat reveals iowait climbing sharply approximately 51 minutes before the end.
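For anyone wanting to reproduce the sampling, something like this shell loop is what I mean. The log directory, interval, and file naming are illustrative (not exactly what I ran); /proc/meminfo and /proc/vmstat stand in for the free and vmstat binaries so the sketch needs nothing beyond a POSIX shell:

```shell
#!/bin/sh
# Sample process and memory state once into timestamped files.
sample_once() {
    dir=$1
    ts=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dir"
    # ps may be absent in minimal chroots, so don't let it kill the loop
    ps aux            > "$dir/ps.$ts" 2>/dev/null || true
    cat /proc/meminfo > "$dir/meminfo.$ts"
    cat /proc/vmstat  > "$dir/vmstat.$ts"
}

# Run forever at 60-second intervals:
# while :; do sample_once /var/tmp/oom-watch; sleep 60; done
```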
Created attachment 732547 [details] Full log from boot to OOM death
Created attachment 732549 [details] Output of vmstat every 60s, system state last 40m
Created attachment 732550 [details] Output of free, system state 50m before end
Created attachment 732551 [details] Output of ps, system state 50m before end
The problem in every one of those traces is that you ran out of DMA memory. It doesn't matter that there's memory free, because that memory (highmem) isn't suitable for DMA. Taking just one example:

Mar 17 06:19:26 fritzdesk kernel: [133178.494441] DMA free:3464kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15796kB managed:5816kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:12kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 17 06:19:26 fritzdesk kernel: [133178.502596] Normal free:9336kB min:3720kB low:4648kB high:5580kB active_anon:0kB inactive_anon:0kB active_file:40kB inactive_file:76kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:881880kB managed:831704kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:7936kB slab_unreclaimable:36260kB kernel_stack:1136kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:16567 all_unreclaimable? yes

Here it seems that your workload used up all of the 'normal' zone memory, and once this was exhausted it fell back to using up the DMA zone memory too. Once that reached critical levels, the OOM killer does something about it. The Normal zone had reclaimable slab caches, but that wouldn't have helped any in allocating a DMA page.

There's not really anything the kernel can do when this happens other than kill a process that's using up the memory. If it didn't, the box would just hang indefinitely waiting for DMA-able memory to become free. You might try disabling all the stuff you don't need (iscsi, ksmtuned, etc.), but at best that's probably just going to put things off before it inevitably happens again.
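If you want to watch this develop before the OOM killer fires, the per-zone state the trace reports is visible through standard /proc interfaces; a sketch (the grep pattern is just an illustration):

```shell
# Free memory per zone, split by allocation order (0..10); on i686 the
# DMA zone is the first row, so watch its columns shrink toward zero.
cat /proc/buddyinfo

# The min/low/high watermarks quoted in the OOM trace come from here:
grep -A 8 'zone *DMA' /proc/zoneinfo
```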
You *might* be able to make things run a little better by tuning some of the /proc/sys/vm/ sysctls, but the effort involved may not make it a worthwhile exercise. Adding more RAM might push more allocations into highmem, freeing up lowmem too.
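To be concrete about which knobs I mean: the sysctls most relevant to protecting the low zones are below. The value shown is only illustrative, writing it needs root, and it does not persist across reboots:

```shell
# Current settings (standard Linux sysctls):
cat /proc/sys/vm/min_free_kbytes        # free-page floor the zones try to keep
cat /proc/sys/vm/lowmem_reserve_ratio   # how strongly low zones are protected
                                        # from allocations that could use highmem

# Raising min_free_kbytes makes reclaim kick in earlier (illustrative value):
# echo 8192 > /proc/sys/vm/min_free_kbytes
```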
I am far from an expert in this area, but the odd thing is that this same hardware ran Fedora 17 without issue, and practically every other Fedora before that. I attached the process list and so forth hoping that it might provide some clue, but as I indicated, even a workload of nothing more than boot to runlevel 3 will eventually trigger this, and that is something I have never experienced before on any box. I will try some of your suggestions. Unfortunately, my current less-than-acceptable mitigation is panic on oom, and reboot on panic.
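For the record, the panic-on-oom-plus-reboot mitigation I describe amounts to two standard sysctls; a sketch (the file name and the 10-second delay are my own choices, nothing canonical):

```
# /etc/sysctl.d/99-oom-panic.conf -- illustrative file name
vm.panic_on_oom = 1    # panic instead of letting the OOM killer pick victims
kernel.panic = 10      # automatically reboot 10 seconds after a panic
```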
ZONE_NORMAL is just around 830MB in available (managed) memory, of which:
- 9MB free
- 8MB reclaimable slab
- 32MB unreclaimable slab
- 1MB kernel stack
- active/inactive pages: a few kB

That means only a total of about 51MB of the 830MB has been accounted for. The DMA zone has been totally exhausted, too. This means that either some kernel driver or X is eating up your memory.

Can you check your kernel to see what is using up all the memory? You may be able to check (and exclude?) the graphics subsystem by looking at the contents of this file:

/sys/class/drm/ttm/memory_accounting/kernel/used_memory
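As a starting point for that accounting, something like the following could show where kernel-side memory is going. These are standard /proc interfaces, though on newer kernels /proc/slabinfo is readable only by root:

```shell
# Kernel-side consumers summarized in /proc/meminfo:
grep -E 'Slab|SReclaimable|SUnreclaim|KernelStack|PageTables' /proc/meminfo

# Per-cache detail, roughly active bytes per cache, largest last
# (root required on newer kernels):
# awk 'NR>2 {print $1, $2*$4}' /proc/slabinfo | sort -k2 -n | tail -20
```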
After running X for a few hours, nothing too heavy:

$ cat /sys/class/drm/ttm/memory_accounting/kernel/used_memory
210
Created attachment 742832 [details] lsmod, after a few hours of X
Created attachment 742833 [details] slabinfo, after a few hours of X
Are you still seeing this issue with an updated F18 and the 3.9.6 or newer kernels?
Still seeing the issue on 3.8.1-201.fc18.i686. Will update to the latest and report back.
I have a different issue on 3.9.9-201.fc18.i686: a system lockup, typically 15-45 minutes after boot. The "lockup" consists of a frozen X session, no keyboard input, and eventual monitor blanking. If at the CLI, a stack trace usually appears, and the attached trace usually makes it into the logs. Updated to 3.9.10-200.fc18.i686 and will report back.
Created attachment 777022 [details] Stack trace on 3.9.9-201
Running now on 3.10.7-100.fc18.i686 and that seems to have resolved the issue. Uptime is currently nearing two days, which seemed an impossible feat when I opened this issue:

$ uptime
11:36:51 up 1 day, 21:45, 9 users, load average: 0.25, 0.22, 0.17
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through, and several of them have gone stale. Because of this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19 and are still experiencing this issue, please change the version to Fedora 19. If you experience different issues, please open a new bug report for those.