From Bugzilla Helper:
User-Agent: Mozilla/4.79 [en] (X11; U; Linux 2.4.2-2 i686)

Description of problem:
With the latest kernel updates (both 2.4.9-12 and 2.4.9-21 have been tested here) there seem to be major issues with memory allocation/freeing on systems with >2GB RAM. We have three servers here with 3GB RAM, as well as four others with 2GB RAM. Aside from the RAM, the systems are identical.

The problem on the 3GB systems is that whenever user programs try to allocate more than 3GB of memory in total, things go haywire. In "vmstat", the swap space used suddenly jumps from 0 to 2GB. Then, as the user programs allocate more and more, this number actually starts going *down*. Finally, when the user programs exit, most of the memory they used seems to be gone forever.

For example, I have written a simple program which allocates memory in a loop and prints how much it has allocated (I will include it with this report). If I freshly reboot the 2.4.9 system and run this program, it allocates just under 3GB before it errors out (expected behavior, since the kernel doesn't support allocating more than 3GB). It uses up all the system memory, then frees it all when it exits--that's fine.

Now, if I run two instances of this program simultaneously, everything is fine until they both hit 1.5GB. Suddenly, as seen in "vmstat", "top", and other places, swap space usage jumps immediately from 0 to 2GB and starts *decreasing* as they continue allocating memory (with heavy swapping, since physical RAM is now exhausted). Aside from the odd swap numbers, that's still fine.

Now the real problem: when I kill both of those programs, swap space usage (as seen in vmstat, top, etc.) remains high, and free RAM rises to only about 1GB. 2GB of memory seems permanently gone from the system. This is confirmed when I run a single instance of the original program again: it allocates memory fine up to 1GB, but at that point heavy swapping starts, as if the system had suddenly become a 1GB machine instead of a 3GB machine.

My testing has been with the SMP and Enterprise versions of the 2.4.9 kernels. The 2.4.2 kernel (SMP) that originally came with RedHat 7.1 does not have this problem, so I have reverted to it.

How reproducible:
Always

Steps to Reproduce:
1. Install a RedHat 7.1 system with the 2.4.9-12 or 2.4.9-21 kernel updates, SMP or Enterprise versions.
2. Run two instances of my memory allocator program at the same time on a system with 3GB RAM.
3. After they've each allocated 2GB (3GB of system memory plus 1GB of swap should be in use), kill them both.
4. Observe the missing 2GB of system memory using vmstat, top, etc. Run another single instance of the allocator program to confirm that only about 1GB of physical RAM now appears to exist.
Created attachment 43836 [details] test-addressing.c: my test program for exposing this problem
Investigating..
Upon further probing of the 2GB systems I mentioned, a similar problem appears with the 2.4.9 kernel updates when total user-allocated memory exceeds 2GB. Again, memory (in this case around 500-1000MB of it) just seems to "disappear", and swap space used goes very high and stays there (and even decreases at times while memory is being allocated). The problem is easier to spot on 3GB systems, though, as there is clearly swapping happening when it shouldn't be, and a really huge amount of memory that seems to just vanish.
In case it matters, the systems I've tested (both the aforementioned 2GB and 3GB) are all dual processor.
I see somewhat similar behavior: swap grows, and even when the applications exit it remains in use (while holding RAM too). It shrinks over time, but not nearly as fast as desired. Looking into the cause.
I think we have found a (partial) solution for this: the swap is still reported as used, but as soon as the kernel requires real memory it is actually freed. Would you be willing to test this?
Sure.
I am experiencing a similar problem on Red Hat 7.2 with the 2.4.9-* kernels. If I run jobs past core memory into swap, significant memory and swap are still allocated when the jobs finish. I have tested many configurations (mostly 2-processor): 1GB, 2GB, 3GB, 4GB RAM. All have the same problem. Example: 2P, 4GB RAM, 8GB swap. Run two jobs, each asking for 2.5GB (5GB total). Memory and swap usage both push toward 4GB each. When the jobs finish, memory and swap are each still holding 2.5GB of space. Eventually our compute farm, managed by LSF, will not allocate jobs to the machines because free memory is almost non-existent.
Please try http://people.redhat.com/arjanv/testkernels
Downloaded kernel-enterprise-2.4.9-21.4, but it requires modutils-2.4.13 or higher. Where can I get the updated modutils?
You can either install it with --nodeps or grab it from the URL (I just put it there).
Created attachment 45560 [details] Stats from free command during testing 2.4.9-21.4
So the problem is still there. What is the "Swap Cached:" entry in /proc/meminfo for? It seems to always be about the same size as the memory and swap that are not released after the test programs finish.
The test version is maybe a little bit better, but it still exhibits the same basic problem. If you follow the "Steps to Reproduce" in my original report, at step #4 the program (and system) still gets really sluggish and starts swapping heavily once the program has allocated a little over 1GB.
Anything new to test?
The kernel should give the memory up when you put some load on (it does in my tests). It's not immediate, but once there's some memory pressure, the "stale" swap data is the first to go. It's just not fast enough for my taste; we're still looking at making the removal of stale entries happen faster.
I see that swap space gets reused under pressure, but core memory is still a problem. When a job completes, all of its memory (especially core) should be freed, but that is not the case. When you push into swap space and allow the jobs to complete, core and swap space are still allocated. Any subsequent jobs start allocating core on top of what was left over from the previous job. Again, I do see the swap get returned, but not core. We run a lot of Electronic Design applications that use between 2 and 3GB of memory, so this is a real problem. It also causes problems for our admins and our job scheduler, which make decisions based on information reported back from the kernel. This was not a problem on Red Hat 6.1 and 6.2.
Yes, I don't understand why a program running on a 3GB system (when there's basically nothing else running except the minimal set of system daemons) would need to start heavy swapping and disk I/O when it tries to allocate more than 1-1.5GB, regardless of what was running on the system earlier... I just noticed today that the old 2.4.2 kernel that came stock with 7.1 can exhibit this problem too under extreme circumstances (like when you dip really heavily into swap, perhaps). As gerry.morong mentioned, this was never a problem with the 2.2 kernels (RH7.0 and back). We noticed it when we upgraded our compute servers from 6.2 to 7.1 a few weeks back.
I'd like to check in and see if we have an understanding of why core memory is not getting released by the kernel. I am anxious to get a resolution to this problem, as I have 40 brand-new systems sitting idle. The drivers these new systems need really force the use of Red Hat 7.1 or 7.2. Thank you.
So it looks like the 2.4.17 kernel fixes this problem.
Interesting; is there a RedHat 2.4.17 kernel package somewhere, or did you compile up a custom version?
FYI, I get the same results that I'm assuming gerry.morong got. I downloaded the standard Linux 2.4.18 kernel release, compiled it with the same configuration as my RedHat kernels, and found no trace of any of these problems...
We've backported the relevant bits of 2.4.18 and the swap behavior seems to be more as you expect: http://people.redhat.com/arjanv/testkernels
In some basic testing here, that last test version seems to perform fine (the same as 2.4.18 as far as I can tell), but there is one "glitch" that isn't in 2.4.18: as soon as swapping starts, the "swap memory used" (as listed in "vmstat" and other places) jumps immediately to 2GB. Regular 2.4.18 builds up from 0, as you'd expect...