Created attachment 1198061 [details]
Extract of /var/log/messages

Description of problem:
I can no longer work! Since updating to 4.7.2, the oom-killer is repeatedly killing Firefox and my KVM virtual machine. I am also running Evolution and copying some files to an NTFS external USB drive.

Version-Release number of selected component (if applicable):
kernel-4.7.2-201.fc24.x86_64
kernel-core-4.7.2-201.fc24.x86_64
kernel-devel-4.7.2-201.fc24.x86_64
kernel-headers-4.7.2-201.fc24.x86_64
kernel-modules-4.7.2-201.fc24.x86_64
kernel-modules-extra-4.7.2-201.fc24.x86_64

How reproducible:
It has killed my VM three times in the past 2 hours, and Firefox once.

Steps to Reproduce:
1. Boot. Start Evolution, Firefox, and a KVM guest; copy files with Nemo to an external HDD.
2. Work normally.
3. Out of the blue, the KVM guest is killed.

Actual results:

Expected results:
This did not happen with kernel 4.6. I expect a stable experience.

Additional info:
We have also had this problem since we updated the kernel on Fedora 23 from 4.6.4-201 to 4.7.3-100. There is plenty of memory free (14 GB), but the oom-killer kills our named daemon, and we have also seen it kill other processes. When we go back to the old 4.6.4 kernel the problem does not happen, so it looks like the problem started with the 4.7 kernels. We have now upgraded to Fedora 24 with the latest 4.7.9-200 kernel and the problem is back again, so it is not solved yet.

This thread involving Linus Torvalds is also about the oom-killer not behaving correctly since kernel 4.7:
http://www.spinics.net/lists/linux-mm/msg113661.html

This problem is not specific to Firefox and qemu-system-x86, so maybe the title should be changed.
Some additional oom-killer logs: http://pastebin.com/avbh4UH4
I have had this problem on multiple 4.7 kernels and still have it on 4.8.4-200.fc24.x86_64. For me it normally kills VirtualBox virtual machines. I have a 16 GB machine, and typically when the OOM killer kills VirtualBox there is around 95% of swap free, about 10 GB of RAM used by buff/cache, and about 6 GB of RAM in use in total (as reported by top). I've attached a /var/log/messages extract as well.
Created attachment 1217855 [details]
/var/log/messages extract of OOM Killer in action
Comment on attachment 1217855 [details]
/var/log/messages extract of OOM Killer in action

Total system RAM 16 GB, total swap 10 GB. At the time of the kill: approx. 6 GB of RAM in use, 10 GB of RAM used by buff/cache, >9 GB of swap free.
Created attachment 1217919 [details]
Extract of /var/log/messages (same as pastebin)
Of interest: I have also noticed that on an almost regular basis (something like every 15-30 minutes) I get a huge surge in the number of kworker processes. My system normally sits at around 100 kworker processes, but the regular surge takes it to around 1200. This does not seem to change the amount of allocated, free, or buffer memory in any significant way.

For that "problem" I found this link:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1626564

It talks about SLUB vs SLAB, which, being related to memory allocation, got me wondering about this bug. As of writing this comment I am running kernel 4.8.6-201.fc24.x86_64, which is configured to use SLUB, so the above "bug" is not directly relevant; it just seems curious that I am seeing similar symptoms (a large number of kworkers) alongside this memory allocation problem.

grep -iE 'sl[aou]b' /boot/config-$(uname -r)
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SLABINFO=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set
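For anyone who wants to watch for the same kworker surge, a minimal logging loop like the one below works (this is just a sketch, not from the original comment; the one-minute interval and the log file name are arbitrary):

# Log the number of kworker kernel threads once a minute.
while true; do
    echo "$(date '+%F %T') kworkers: $(ps -e -o comm= | grep -c '^kworker')" >> /tmp/kworker-count.log
    sleep 60
done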
Now testing again with the newest kernel 4.7.9-200.fc24.x86_64, but without the containers that were running on this system. The oom-killer problem has not occurred yet (I have only been testing for a day), but I think this is interesting to share.
I used kernel 4.7.9-200.fc24.x86_64 and the problem manifested itself. I am currently using 4.8.6-201.fc24.x86_64 and the problem still happens. I used to hit this problem anywhere from once a week to multiple times a day.

I may have found a workaround: for 9 days now I have been flushing the buffer/cache every 2 hours with "sync && echo 1 > /proc/sys/vm/drop_caches", and the problem has not happened again. On my system a 2-hour interval means the buffer cache never uses all available memory, so I assume allocations always succeed and the OOM killer is never needed. I used to have around 10 GB in buffer/cache; now it only gets to around 6 GB, always leaving about 4 GB of completely free memory.
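To check whether the periodic flush is actually keeping buff/cache down, a quick before/after comparison looks like this (a sketch only; run as root, since writing to drop_caches requires it):

free -m                                     # note the buff/cache column
sync && echo 1 > /proc/sys/vm/drop_caches   # write back dirty pages, then drop the clean page cache
free -m                                     # buff/cache should have shrunk and "free" grown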
Kernel: 4.7.10-100.fc23.i686+PAE
RAM: 10 GB
RAM utilization: 1.5 GB

The OOM killer randomly kills processes out of the blue:

Nov 14 00:20:53 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 00:50:42 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:06:40 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:26:56 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:27:16 s36 kernel: proftpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:27:23 s36 kernel: proftpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:27:31 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:27:38 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:27:41 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:27:45 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:27:47 s36 kernel: xe-update-guest invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:27:52 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:28:02 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:28:20 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:28:25 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:30:02 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:30:22 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:32:36 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:32:43 s36 kernel: exim invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:33:06 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:42:19 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:42:54 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 02:43:18 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 03:33:31 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:15:10 s36 kernel: /usr/sbin/munin invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:15:14 s36 kernel: exim invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:15:20 s36 kernel: kworker/u8:2 invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:15:25 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:15:28 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:15:36 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:15:49 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:15:53 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:16:41 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 06:16:48 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 07:00:02 s36 kernel: bash invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 07:42:08 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 08:54:06 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 09:00:00 s36 kernel: dovecot invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 14 09:03:47 s36 kernel: exim invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 02:31:49 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 10:48:01 s36 kernel: java invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 10:51:13 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 14:26:50 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 15:05:40 s36 kernel: systemd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 16:07:32 s36 kernel: exim invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 17:02:00 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 17:46:42 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 20:28:50 s36 kernel: exim invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 22:09:07 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 22:35:00 s36 kernel: dovecot invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 23:27:27 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 23:50:03 s36 kernel: bash invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 23:50:41 s36 kernel: exim invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 16 23:50:52 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 00:20:28 s36 kernel: java invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 02:23:51 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 02:23:58 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 02:23:58 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 02:26:24 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 06:17:06 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 06:17:08 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 06:17:27 s36 kernel: exim invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 06:17:34 s36 kernel: proftpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 06:18:29 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 06:19:01 s36 kernel: exim invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 09:15:09 s36 kernel: httpd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 09:59:52 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0
Nov 17 10:29:24 s36 kernel: mysqld invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=1, oom_score_adj=0

This needs to be fixed ASAP.
Created attachment 1221869 [details]
Extract of /var/log/messages, 2016-11-17: another oom-killer kill, this time without containers running on the system
Created attachment 1225684 [details]
Oom-killer killed named again; extract of the logs
(In reply to Jasper Siero from comment #12)
> Created attachment 1225684 [details]
> Oom-killer killed named again; extract of the logs

The OOM killer does not target daemons by name; it selects its victim by the following conditions:
1. the process is relatively new in the process list, and
2. it has gained a lot of memory in that "short" time.
So it will kill ANY process that matches this logic.
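For anyone who wants to see how a given process currently ranks for the OOM killer, the kernel exposes the per-process badness score under /proc (a quick sketch, not from the original comment; "named" is just an example target):

# Show the OOM badness score and adjustment for a given process.
pid=$(pidof named | awk '{print $1}')
cat /proc/$pid/oom_score       # higher score = more likely to be picked by the OOM killer
cat /proc/$pid/oom_score_adj   # user-set bias, from -1000 (never kill) to 1000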
Thanks, I understand. named is usually the process that gets killed, but you are right that it is because of the rules you mentioned. The numbers and statistics behind the oom-killer's decision can be found in the logs. The oom-killer should not be killing anything on this machine, because there is plenty of memory available.
https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.8.12:

commit 7838fbe25a95ce2cd6e8ae27a76d369365da89d4
Author: Michal Hocko <mhocko>
Date:   Tue Nov 29 17:25:15 2016 +0100

    mm, oom: stop pre-mature high-order OOM killer invocations

    31e49bfda184 ("mm, oom: protect !costly allocations some more for
    !CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM
    killer invocation for high order requests. It seemed to work for most
    users just fine but it is far from bullet proof and obviously not
    sufficient for Marc who has reported pre-mature OOM killer invocations
    with 4.8 based kernels.

    4.9 with all the compaction improvements seems to be behaving much
    better but that would be too intrusive to backport to 4.8 stable
    kernels. Instead this patch simply never declares OOM for !costly high
    order requests. We rely on order-0 requests to do that in case we are
    really out of memory. Order-0 requests are much more common and so a
    risk of a livelock without any way forward is highly unlikely.
I am fighting the same issue after upgrading a 32-bit PAE KVM guest to F25. It appears you have the same old problem that (many) others are having with still using a PAE kernel. 4.8.8-100.fc23 is where the problem started for me (vs. some earlier F23 kernel).

An older ticket from when the problem seems to have started:
https://bugzilla.redhat.com/show_bug.cgi?id=1075185

For my testing I have munin logging the guest, and I am currently at only 3 GB of the 10 GB of memory assigned to the guest, with no swap used... and rsync, Tomcat, and anything else that moves now gets nailed. The journalctl logs from the guest look reasonable as well (no actual low memory). Log info is available if anyone wants it.

I am testing:
echo 1 > /proc/sys/vm/overcommit_memory

All my 64-bit F25 hosts and guests are fine. Only the 32-bit PAE guest is getting OOM-killed.
I don't think it's the same bug/problem you mentioned, because we are not using a PAE kernel; we have been running 64-bit since the original installation, and Fedora is running on a physical machine (not a VM). The problem started with the new 4.7 kernel (the 4.6 kernel runs without problems).
The kernel devs introduced a new OOM algorithm in 4.7, which will be replaced in 4.9 with something more aggressive (back to the old behavior). You could try out a 4.9-rcX kernel.
(In reply to Jasper Siero from comment #17)
> I don't think it's the same bug/problem you mentioned because we are not
> using a pae kernel and running 64 bit since the original installation,
> Fedora is running on a physical machine (not a vm). The problem started with
> the new 4.7 kernel (4.6 kernel runs without problems).

Jasper, it is interesting to know whether the newer OOM killer is at fault or possibly PAE. Sorry, I didn't notice the x86_64 in the kernel list; it wasn't in the header. Given that the OOM killer is now dormant here, I am guessing it was the OOM changes rather than just PAE. Ironic, though, that none of my x86_64 hosts or guests experienced the OOM issue.

I would suggest that anyone having the problem try, after rebooting:

# cat /proc/sys/vm/overcommit_memory

and, assuming it is 0:

# echo 1 > /proc/sys/vm/overcommit_memory

For me, the VM that had the problem is now stable. After updating the system I didn't get above 4 GB of the 10 GB of memory assigned before the OOM killer started randomly killing. Now the VM is back to its old stable self after changing overcommit_memory. Hopefully the 4.9 kernel helps long term.
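If the overcommit change does help, one way to make it survive reboots is a sysctl drop-in (a sketch only; the file name 99-overcommit.conf is arbitrary):

# Persist the setting across reboots.
echo 'vm.overcommit_memory = 1' > /etc/sysctl.d/99-overcommit.conf
sysctl -p /etc/sysctl.d/99-overcommit.conf   # apply it now without rebooting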
I downgraded from the PAE kernel to the vanilla i686 kernel and have observed at least one OOM kill with that kernel (kernel-4.8.11-200.fc24.i686), although it did resolve the other problem I was having with the PAE kernel (very slow disk writes: 1 MB/s to an SSD compared to ~70 MB/s with the i686 kernel on the same hardware). I've now set vm.overcommit_memory, taken out my cron job for drop_caches, and will see if that helps.
Created attachment 1230675 [details]
console log from oom

Same problem with kernel-4.8.12-200.fc24.i686. I ran a "free; sleep 60" loop on the console and rsynced all the file systems to another machine (the regular backup). After about 10 minutes the OOM killer was invoked:

[ 1291.074643] rsync invoked oom-killer: gfp_mask=0x2420848(GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE), order=0, oom_score_adj=0
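For anyone trying to reproduce this the same way, the monitoring side is just a loop like the following on the console while the regular rsync backup runs in another terminal (my own sketch, not from the original comment):

# Print memory usage once a minute; run the regular rsync backup in another terminal.
while true; do
    date
    free -m
    sleep 60
done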
Jeff/Darren & other PAE people, please see my post here for a workaround re: slow I/O:
https://muug.ca/pipermail/roundtable/2016-June/004669.html

I'm pretty sure your bugs are unrelated to (or at least significantly different from) Louis' (the original poster's) bug, so you should start a new bug if you haven't already. I personally have stopped filing PAE bugs because no kernel dev cares anymore, especially if you are using >4GB RAM. If you can get them to care, great! Otherwise I'm looking for ways to do remote, headless updates from 32-bit PAE to 64-bit to get out of PAE completely.
Two more updates on the problem; maybe this will help someone until the OOM killer/kernel is fixed.

1. (Same 10 GB KVM guest as above.) Running 4.8.10-300.fc25.i686+PAE with /proc/sys/vm/overcommit_memory = 1: still OOM kills. The best (ugly) hack so far is:

cat /etc/crontab
* 0,2,3,4,5,6,7,9,11,13,16,18,20 * * * root sync && echo 1 > /proc/sys/vm/drop_caches

(Clear the disk cache during the times it is expected to be expanding cache use.) Disk read I/O is now a lot higher, but the last OOM kill was 12/9, before adding the cron entry above. 11 days without a workplace OOM kill!

2. A second physical machine running 4.8.8-300.fc25.i686 (non-PAE) as a 4 GB rsync server is now getting OOM kills. I updated it to the newest release, 4.8.13-300.fc25.i686, to see if it keeps killing (no overcommit_memory changes made). [Given that an upgrade from i386 to x86_64 doesn't exist, and it has a big filesystem, it will be a while before I blow it away just to run a 64-bit kernel.]

Summary: I agree it may not be a PAE problem but an i386 issue that this and other tickets are referencing in general. I have not seen any OOM kills on any x86_64 machines (physical or virtual guest).
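For what it's worth, the same periodic flush can be scheduled with a systemd timer instead of cron (my own sketch; the unit names drop-caches.service/.timer and the 2-hour interval are arbitrary):

# /etc/systemd/system/drop-caches.service
[Unit]
Description=Flush dirty pages and drop the page cache

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'sync && echo 1 > /proc/sys/vm/drop_caches'

# /etc/systemd/system/drop-caches.timer
[Unit]
Description=Run drop-caches every 2 hours

[Timer]
OnBootSec=15min
OnUnitActiveSec=2h

[Install]
WantedBy=timers.target

# Enable with:
#   systemctl daemon-reload && systemctl enable --now drop-caches.timer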
(In reply to Trevor Cordes from comment #22)
> Jeff/Darren & other PAE people, please see my post here for a workaround re:
> slow I/O:
> https://muug.ca/pipermail/roundtable/2016-June/004669.html

Wow, that makes 2 orders of magnitude of difference! The machine has 8G, and since I could try this without rebooting:

# uname -r
4.10.0-0.rc0.git4.1.vanilla.knurd.1.fc24.i686+PAE
# cat /proc/sys/vm/highmem_is_dirtyable
0
# dd if=/dev/zero of=/zero bs=1M count=8
8388608 bytes (8.4 MB, 8.0 MiB) copied, 3.41237 s, 2.5 MB/s
# echo 1 >/proc/sys/vm/highmem_is_dirtyable
# dd if=/dev/zero of=/zero bs=1M count=8
8388608 bytes (8.4 MB, 8.0 MiB) copied, 0.04042 s, 208 MB/s

> I'm pretty sure your bugs are unrelated (or at least significantly different
> from) Louis' (original poster) bug

Actually I think they might be tangentially related: slow paging caused by poor I/O performance might be driving up the memory pressure. Anyway, thanks for the tip!
> [Given an upgrade from i386 to x86_64 doesn't exist, but it being a big
> filesystem it will be a while before I blow it away just to run a 64bit
> kernel.]

Because of this situation, and because i686 will at some point be removed entirely, I played around a bit with how to more or less auto-upgrade to 64-bit. The fastest way is to save your data (dump SQL to a file), power off the VM, get direct disk access and move / to /old_system, then move a fresh 64-bit template onto the VM, move /old_system/etc/fstab back to /etc/, adjust passwd, group and shadow, and start it up. Took me 5 minutes. Then restore /home, install your RPM packages and re-import the SQL databases.

You could also use dnf to install 64-bit packages, change the boot entry to 64-bit and work your way through a lot of binary data that needs to be removed, including the old RPM packages. People have done that, but I believe the "fresh" way is easier.
1) Darren, I tested your settings in a guest VM:

80*1MB file, /proc/sys/vm/highmem_is_dirtyable=0: 604 MB/s, 572 MB/s
80*1MB file, /proc/sys/vm/highmem_is_dirtyable=1: 855 MB/s, 875 MB/s
8000*1MB file, /proc/sys/vm/highmem_is_dirtyable=0: 33.8 MB/s, 89.3 MB/s
8000*1MB file, /proc/sys/vm/highmem_is_dirtyable=1: 117 MB/s, 120 MB/s

So for smaller linear writes, roughly a 30% increase. Setup: a KVM guest with 10 GB of memory allocated, running 4.8.10-300.fc25.i686+PAE on top of a 4.8.12-300.fc25.x86_64 physical machine, with raw LVM guest partitions on top of MD-mirrored 3 TB drives. Test script used:

sync; sleep 5; sync
echo 0 >/proc/sys/vm/highmem_is_dirtyable
/bin/time dd if=/dev/zero of=/data/zero bs=1M count=80
time sync
rm -f /data/zero

sync; sleep 5; sync
echo 1 >/proc/sys/vm/highmem_is_dirtyable
/bin/time dd if=/dev/zero of=/data/zero bs=1M count=80
time sync
rm -f /data/zero

sync; sleep 5; sync
echo 0 >/proc/sys/vm/highmem_is_dirtyable
/bin/time dd if=/dev/zero of=/data/zero bs=1M count=80
time sync
rm -f /data/zero

sync; sleep 5; sync
echo 1 >/proc/sys/vm/highmem_is_dirtyable
/bin/time dd if=/dev/zero of=/data/zero bs=1M count=80
time sync
rm -f /data/zero

2) customercare: my problem with 32-bit guests is certifying the older 32-bit C code in a 64-bit world, which is not something I want to take on now. What looks like a show-stopper for me is that I can no longer use 'cp -al' on a USB drive plugged into a 32-bit PAE machine running as an rsnapshot server. The painful part of not being able to 'upgrade' from 32-bit to 64-bit in place is the 13M+ inodes and ~1 TB used. I am at the point of installing and running with 2 new (mirrored) OS drives and keeping the existing 2-drive array for the files. [Yuck. The problem, besides buying another 3 TB drive (or two) or running 4 drives, is the time to do this versus someone deciding to drop i686 and/or PAE support and install an overly aggressive OOM killer at the same time.]
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Because of this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 24 has now been rebased to 4.10.9-100.fc24. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26 and are still experiencing this issue, please change the version to Fedora 26. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE **************

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen the bug and attach the relevant data from the latest kernel you are running, along with any data that might have been requested previously.