Bug 59002
Summary: | Major memory issues on system with >2GB RAM with 2.4.9-* kernel updates
---|---
Product: | [Retired] Red Hat Linux
Reporter: | Bob Farmer <redhat>
Component: | kernel
Assignee: | Arjan van de Ven <arjanv>
Status: | CLOSED CURRENTRELEASE
QA Contact: | Brian Brock <bbrock>
Severity: | high
Priority: | medium
Version: | 7.1
CC: | gerry.morong, joao, joshua.bakerlepain, redhat, shishz, steven_bird
Hardware: | i686
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2003-06-08 00:52:46 UTC
Description
Bob Farmer
2002-01-29 03:16:55 UTC
Created attachment 43836 [details]
test-addressing.c: my test program for exposing this problem
Investigating..

Upon further probing of the 2GB systems I mentioned, with the 2.4.9 kernel updates, a similar problem appears to happen on them when total user-allocated memory exceeds 2GB. Again, memory (in this case around 500-1000MB of it) just seems to "disappear", and swap space used goes very high and stays there (and even decreases at times when allocating memory). The problem is easier to spot on 3GB systems though, as there is clearly swapping happening when it shouldn't be, and a really huge amount of memory that seems to just vanish. In case it matters, the systems I've tested (both the aforementioned 2GB and 3GB) are all dual processor.

I see sort of similar behavior; swap grows, and even when the applications exit it remains in use (while holding RAM too). It shrinks with time, but not nearly as fast as desired. Looking into the cause.

I think we have found a solution for this (well, a partial one: the swap is still reported as used, but as soon as the kernel requires real memory it is actually freed). Would you be willing to test this?

Sure.

I am experiencing a similar problem on Red Hat 7.2 with the 2.4.9-* kernels. If I run jobs past core memory into swap, significant memory and swap are still allocated when the jobs finish. I have tested many configurations (most 2-processor) with 1GB, 2GB, 3GB, and 4GB RAM. All have the same problem. Example: 2P, 4GB RAM, 8GB swap. Run 2 jobs, both asking for 2.5GB (5GB total). Memory and swap both push 4GB each. When the jobs finish, both memory and swap are still holding 2.5GB of space each. Eventually our compute farm, managed by LSF, will not allocate jobs to each machine because free memory is almost non-existent.

Please try http://people.redhat.com/arjanv/testkernels

Downloaded kernel-enterprise-2.4.9-21.4, but it requires modutils-2.4.13 or higher. Where can I get the updated modutils?

You can either --nodeps it or grab it from the URL (I just put it there).

Created attachment 45560 [details]
Stats from free command during testing 2.4.9-21.4
So the problem is still there. What is the "Swap Cached:" entry in /proc/meminfo for? It always seems to be about the same size as the memory and swap not released after the test programs finish.

The test version is maybe a little bit better but still exhibits really the same basic problem. If you follow the "Steps to Reproduce" in my original report, on step #4 the program (and system) still gets really sluggish and starts swapping a lot once the program has allocated a little over 1GB.

Anything new to test?

The kernel should give its memory up when you put some load on (it does in my tests). It's not immediate, but once there's some memory pressure the "stale" swap data is the first to go. It's just not fast enough for my taste, but it should be there. We're still looking at making the removal of stale entries happen faster.

I see where swap space gets reused under pressure, but the core memory is still a problem. When a job completes, all of its memory (especially core) should be freed, but that is not the case. When you push into swap space and allow the jobs to complete, CORE and swap space are still allocated. Any subsequent jobs start allocating core on top of what was left over from the previous job. Again, I do see where the swap gets reused, but not core. We run a lot of Electronic Design applications that use between 2 and 3GB of memory, so this is a real problem. It also causes problems for our admins and our job scheduler, which make decisions based on information reported back from the kernel. This was not a problem on Red Hat 6.1 and 6.2.

Yes, I don't understand why a program running on a 3GB system (when there's basically nothing else running except the minimal set of system daemons) would need to start heavy swapping and disk I/O when it tries to allocate more than 1-1.5GB, regardless of what was running on the system earlier...
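On the "Swap Cached:" question raised above: as far as I understand it, that entry counts pages that have been read back into RAM but still have a copy in swap, so they show up as both memory in use and swap in use until the kernel drops them. Below is a minimal Python sketch for comparing that figure with the swap the system reports as used. The function names are my own, it only reads lines of the form `Name: <number> kB`, and subtracting SwapCached from swap used is a rough approximation, not an exact accounting.

```python
def parse_meminfo(text):
    """Parse 'Name: <number> kB' lines from /proc/meminfo-style text.

    Lines in any other shape (for example a byte-count summary table)
    are skipped. Returns a dict mapping field name to an int in kB.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_summary(fields):
    """Compare reported swap usage with the swap cache (all values in kB)."""
    swap_used = fields["SwapTotal"] - fields["SwapFree"]
    swap_cached = fields.get("SwapCached", 0)
    return {
        "swap_used_kb": swap_used,
        "swap_cached_kb": swap_cached,
        # Approximation: swap slots whose pages are not also held in RAM.
        "swap_used_excluding_cache_kb": max(swap_used - swap_cached, 0),
    }


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling `swap_summary(read_meminfo())` before and after a test run would show whether the swap that looks "stuck" is roughly the same size as the swap cache, which is what the comment above observed.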
I just noticed today that the old 2.4.2 kernel that came stock with 7.1 can exhibit this problem too under extreme circumstances (like when you dip really heavily into swap, perhaps). As gerry.morong mentioned, this was never a problem with the 2.2 kernels (RH7.0 and back). We noticed it when we upgraded our compute servers from 6.2 to 7.1 a few weeks back.

I would like to check in and see whether we have an understanding of why core memory is not getting released by the kernel. I am anxious to get a resolution to this problem, as I have 40 brand new systems sitting idle. The drivers necessary for these new systems really force the use of Red Hat 7.1 or 7.2. Thank you.

So it looks like the 2.4.17 kernel fixes this problem.

Interesting, is there a Red Hat 2.4.17 kernel package somewhere, or did you compile a custom version?

FYI, I get the same results that I'm assuming gerry.morong got. I downloaded the standard Linux 2.4.18 kernel release, compiled it with the same configuration as my Red Hat kernels, and there is no trace of any of these problems...

We've backported the relevant bits of 2.4.18 and the swap behavior seems to be more as you expect: http://people.redhat.com/arjanv/testkernels

In some basic testing here, that last test version seems to perform fine (the same as 2.4.18 as far as I could tell), but there is one "glitch" that isn't in 2.4.18. As soon as swapping starts, the "swap memory used" (as listed in "vmstat" and other places) jumps up to 2GB immediately. Regular 2.4.18 builds up from 0 as you'd expect...
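For anyone who wants to repeat this kind of test, here is a minimal Python sketch of an allocate-and-touch loop. It is not the attached test-addressing.c (whose contents are not shown in this report); the function name, the chunk size, the assumed 4096-byte page size, and the callback are all my own choices for illustration.

```python
def allocate_and_touch(total_bytes, chunk_bytes=64 * 1024 * 1024,
                       page_size=4096, on_chunk=None):
    """Allocate total_bytes in chunks and write one byte into every page.

    Writing to each page makes the kernel back it with real memory
    instead of leaving it as an untouched mapping. on_chunk, if given,
    is called with the running total after each chunk, which is a
    convenient place to print statistics from free or vmstat.
    Returns the list of buffers so the caller controls when they are freed.
    """
    chunks = []
    allocated = 0
    while allocated < total_bytes:
        size = min(chunk_bytes, total_bytes - allocated)
        buf = bytearray(size)
        for offset in range(0, size, page_size):
            buf[offset] = 1
        chunks.append(buf)
        allocated += size
        if on_chunk is not None:
            on_chunk(allocated)
    return chunks
```

Something like `allocate_and_touch(2 * 1024**3, on_chunk=print)` would step up to 2GB while you watch swap usage in another terminal. Keep the returned list alive for as long as the memory should stay allocated, then drop it or exit and check what the kernel reports as still in use.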