Bug 1404519

Summary: massive OOM problems with kernel 4.8.13
Product: [Fedora] Fedora
Reporter: customercare
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 24
CC: cz172638, frank, gansalmon, ichavero, itamar, jonathan, j, kernel-maint, madhu.chinakonda, mchehab
Target Milestone: ---
Flags: jforbes: needinfo?
Target Release: ---
Hardware: i686
OS: Linux
Last Closed: 2017-04-28 17:12:56 UTC
Type: Bug
Attachments:
Description: monitor of one of the crashed servers with kernel panic
Flags: none

Description customercare 2016-12-14 01:37:23 UTC
Created attachment 1231382
monitor of one of the crashed servers with kernel panic

Description of problem:

Since I upgraded to kernel 4.8.13, on both FC 23 and FC 24,
I have had MASSIVE problems with the OOM killer.

Servers with 4 GB and 8 GB of RAM are crashing with kernel panics all over my cluster.



Version-Release number of selected component (if applicable):

4.8.13-100 
4.8.13-200

Actual results:

OOM kills with 0 (!!) bytes of swap used

Expected results:

Smooth use of RAM and swap before the OOM killer starts killing the shit out of the server.


Additional info:


Kernel 4.7.9 had next to zero problems on the same servers.
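
To back up the "0 bytes of swap used" observation with numbers, something like the following could be run on an affected server. This is only a minimal sketch: the helper names are made up for illustration, and it assumes the usual /proc/meminfo layout. LowFree is only reported by 32-bit kernels built with highmem support, so it is treated as optional.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB: SwapTotal minus SwapFree."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def report(path="/proc/meminfo"):
    """Print swap usage and, where the kernel exposes it, free low memory."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    print("swap used (kB):", swap_used_kb(info))
    if "LowFree" in info:  # hypothetical check; field absent on 64-bit kernels
        print("lowmem free (kB):", info["LowFree"])
```

Calling report() shortly before or after an OOM event and attaching the output to the bug would show whether swap was really untouched.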

Comment 1 Josh Boyer 2016-12-14 01:54:29 UTC
32-bit x86 is a low priority for the Fedora kernel team and relies on greater community effort for support.  You are more likely to get feedback by reporting issues directly upstream.

Comment 2 customercare 2016-12-14 02:01:05 UTC
This issue is not bound to 32-bit; it's a memory segmentation problem,
as Linus explained in a kernel mailing list post from September. I don't think his 16 GB laptop runs 32-bit :)

The devs changed something in the OOM killer algorithm in the 4.9 series of the kernel, but as you know yourself, it's not available at the moment in the Fedora repos.

Comment 3 Frank Crawford 2016-12-27 07:42:02 UTC
This looks like it may be getting some traction.  A patch specifically for OOM issues on 32-bit systems since the 4.8 kernels has been proposed here:
https://www.spinics.net/lists/kernel/msg2409778.html

Any chance we can get this pushed out as soon as this or a similar fix lands in the kernel officially (or even earlier)?

Comment 4 Frank Crawford 2016-12-27 08:34:11 UTC
One workaround given in the email thread linked above is to boot with the kernel option cgroup_disable=memory.

I've tried this on the system that I keep getting OOM messages on, and it seems to have worked.

Give it a go and see if it fixes your issues.
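
After rebooting, it may be worth confirming that the option actually reached the kernel command line. A minimal sketch follows; the function names are invented for illustration, and it assumes cgroup_disable= takes a comma-separated list of controller names.

```python
def memory_cgroup_disabled(cmdline):
    """Return True if a cgroup_disable= option on the command line lists 'memory'."""
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == "cgroup_disable" and sep:
            if "memory" in value.split(","):
                return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read()
```

For example, print(memory_cgroup_disabled(read_cmdline())) on the booted system shows whether the workaround is active.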

Comment 5 Justin M. Forbes 2017-04-11 14:41:37 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 24 has now been rebased to 4.10.9-100.fc24.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.
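
For anyone checking many servers, a rough way to tell whether a machine is already on 4.10.9 or newer is sketched below. The helper names are illustrative only, and the comparison deliberately looks at the upstream version alone, ignoring the Fedora package release (the part after the dash).

```python
import platform
import re


def kernel_version(release):
    """Extract the leading upstream version from a release string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part or 0) for part in match.groups())


def at_least(release, minimum=(4, 10, 9)):
    """Return True if the release's upstream version is at or above the minimum."""
    return kernel_version(release) >= minimum


def running_kernel_is_current():
    """Check the kernel of the machine this runs on."""
    return at_least(platform.release())
```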

Comment 6 Justin M. Forbes 2017-04-28 17:12:56 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.