Description of problem:
The top utility is reporting incorrect values for the memory usage of several
long-running processes.

Version-Release number of selected component (if applicable):
procps-2.0.17-10

How reproducible:
Reported memory usage grows the longer the processes are running.

Actual results:

 16:32:50  up 14 days,  3:54, 23 users,  load average: 0.33, 0.17, 0.06
133 processes: 132 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
            total   10.4%    0.6%    5.2%   0.2%     0.0%    0.2%  182.8%
            cpu00    0.6%    0.0%    0.3%   0.0%     0.0%    0.0%   99.0%
            cpu01    9.9%    0.6%    4.9%   0.3%     0.0%    0.3%   83.8%
Mem:   509876k av,  495460k used,   14416k free,       0k shrd,   15048k buff
                    339952k actv,   62768k in_d,    6932k in_c
Swap: 1044216k av,  398044k used,  646172k free                  106360k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
24717 smithj4   15   0 1591M 1.6G  3720 S     4.6 319.4 248:32   1 gkrellm
 5148 smithj4   15   0  905M 894M 14876 S     0.0 179.7 109:55   0 mozilla-bin
 6232 smithj4   15   0  805M 785M  5460 S     0.3 157.7  99:32   1 gnome-terminal
 4625 root      15   0  870M 582M  7512 S     3.7 117.0 647:44   0 X
 4726 smithj4   15   0  467M 463M  5184 S     0.3  93.0  35:02   0 gnome-panel
 4472 root      15   0 98968  96M   540 S     0.0  19.4   0:01   0 crond

Expected results:
The ps command shows that gkrellm, for example, is using around 17MB of
memory instead of the nearly 1.6GB that top shows:

# ps aux | grep gkrellm
smithj4  24717  2.3  0.9 17080 4924 ?       S    Oct26 248:50 gkrellm

Additional info:
Compiled the attachment shown here:
https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=106011&action=view
and ran the executable:

# ./pagesize
getpagesize()=4096, PAGE_SHIFT: 12, pgshift: 2
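For context on that pgshift value: top apparently derives its SIZE/RSS columns
from the page counts in /proc/<pid>/statm, shifting them left by pgshift
(2 here, since a page is 4 kB) to get kB. A rough sketch of that conversion --
the names pgshift and statm_kb are made up for illustration, this is not the
actual procps code:

import os

# Derive the pages -> kB shift from the page size (4096 -> pgshift = 2),
# matching what ./pagesize printed above.
pagesize = os.sysconf("SC_PAGE_SIZE")
pgshift = 0
while (1024 << pgshift) < pagesize:
    pgshift += 1

def statm_kb(pid):
    # The first two fields of statm are total size and resident size, in pages.
    size, resident = open("/proc/%d/statm" % pid).read().split()[:2]
    return int(size) << pgshift, int(resident) << pgshift

ps reports gkrellm at 17080/4924 kB, i.e. about 4270/1231 pages, so for top to
show 1591M/1.6G the statm page counts must be hugely inflated.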
1591M is 319.4% of *physical* RAM, 509876k - I expect this is desired behaviour?
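Quick arithmetic check of that, assuming %MEM is simply the reported memory
use divided by total physical memory:

python -c "print 1591 * 1024 / 509876.0 * 100"

prints roughly 319.5, in line with the 319.4% top displays -- so the
percentage is just echoing the bogus size, not a separate bug.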
Joe, the problem is not %MEM -- that is probably computed correctly. The problem is SIZE and RSS: "ps" reports them correctly, while "top" shows strange values.
I think the problem has been resolved (thanks to Jason Smith). Jason runs an
unstable kernel (see:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121434#c198), and this
kernel produces wrong /proc/<pid>/statm data.

Dumps:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 6232 smithj4   15   0  913M 894M  5300 S     0.4 179.6 101:07   1 gnome-terminal

# cat /proc/6232/statm
233769 228990 1325 187 694 228109 2697
       ^^^^^^

After conversion from pages to KB it is:

python -c "print 228990 << 2"
915960

So 915M, which is almost the same as in the "top" output.

# cat /proc/6232/stat
6232 (gnome-terminal) S 1 4726 4726 0 -1 4194304 278393 3980530 51621
4394127 560022 46748 42882 17739 15 0 0 0 849277 58626048
5720 4294967295 134512640 134802800 3221203328 3221202796 45691177 0 0
^^^^
4096 66800 3222490981 0 0 17 0 0 0 560022 46748 42882 17739

The "stat" file probably contains the right value:

python -c "print 5720 << 2"
22880

So 22M -- and the "stat" file is what the "ps" utilities read. You can check
all of this with:

# cat /proc/6232/status
Name:   gnome-terminal
State:  S (sleeping)
Tgid:   6232
Pid:    6232
PPid:   1
TracerPid:      0
Uid:    1829    1829    1829    1829
Gid:    31016   31016   31016   31016
FDSize: 256
Groups: 31016
VmSize:    57252 kB
VmLck:         0 kB
VmRSS:     22884 kB
           ^^^^^^
VmData:    36472 kB
VmStk:       148 kB
VmExe:       284 kB
VmLib:     14048 kB
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 00000000800104f0
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
...

... so the problem is the reporter's unstable kernel.
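For convenience, here is a small sketch (my own throwaway helper, not part of
procps) that pulls the resident-size figure from all three files for a single
PID and converts everything to kB, so the comparison above can be repeated
without doing the shifts by hand:

#!/usr/bin/env python
# Hypothetical helper: print RSS as seen via statm, stat and status for
# one PID, in kB (assumes 4 kB pages, as on this machine).
import sys

pid = sys.argv[1]

statm_pages = int(open("/proc/%s/statm" % pid).read().split()[1])

stat_data = open("/proc/%s/stat" % pid).read()
# Skip past the "(comm)" field, which may contain spaces; the RSS page
# count is overall field 24 of stat.
stat_pages = int(stat_data[stat_data.rindex(")") + 2:].split()[21])

status_kb = 0
for line in open("/proc/%s/status" % pid):
    if line.startswith("VmRSS:"):
        status_kb = int(line.split()[1])

print "statm : %d kB" % (statm_pages << 2)
print "stat  : %d kB" % (stat_pages << 2)
print "status: %d kB" % status_kb

On a sane kernel all three values agree (give or take a few kB); on the
kernel above, statm is the outlier, which is why top (reading statm) is wrong
while ps (reading stat) is right.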
If this is deemed a kernel problem, please reassign to Dave Anderson.
We tested 2.4.21-23.ELsmp (RHEL 3 U4 beta) and found that the RSS values reported by top and ps aux do not agree. Karel Zak determined that /proc/<pid>/stat and /proc/<pid>/statm do not agree on this value. He wrote a handy script to test this and tried various kernels. The results:

It passes on:
  2.4.9-e.49smp (people.redhat.com)
  2.4.21-11.ELsmp (porkchop)
  2.4.26 #1 SMP (Debian)
  2.6.8-1.521 (my FC box)
  2.4.20-31.9smp (Red Hat Linux release 9 (Shrike))

It fails on 2.4.21-23.ELsmp.

The test is available at:
http://people.redhat.com/kzak/procps/proc-mem-test.py

usage: ps -A -opid= | ./proc-mem-test.py
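For readers who cannot reach that URL, the check is presumably along these
lines -- this is a sketch of the idea only, not the contents of Karel's
actual proc-mem-test.py:

#!/usr/bin/env python
# Sketch: compare the RSS page count from /proc/<pid>/stat (field 24)
# with the resident field of /proc/<pid>/statm (field 2) for each PID
# read from stdin, and print a PASS/FAIL summary.
# Usage: ps -A -opid= | ./rss-check.py
import sys

bad = 0
for pid in [l.strip() for l in sys.stdin if l.strip()]:
    try:
        stat = open("/proc/%s/stat" % pid).read()
        statm = open("/proc/%s/statm" % pid).read().split()
    except IOError:
        continue                       # process exited in the meantime
    rss_stat = int(stat[stat.rindex(")") + 2:].split()[21])
    rss_statm = int(statm[1])
    if rss_stat == rss_statm:
        print "%6s OK" % pid
    else:
        bad += 1
        print "%6s FAILED  stat=%d statm=%d (pages)" % (pid, rss_stat, rss_statm)
print "test", bad and "FAIL" or "PASS"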
*** Bug 136630 has been marked as a duplicate of this bug. ***
We see this behavior on kernel-smp-2.4.21-25.EL with a proprietary monitoring application, too. The numbers in "ps vx" are right, while top claims an RSS of several GB after a few hours.
Just another "me too." Look at this top listing:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 1363 user1     15   0  122G 121G  4924 S     0.2 1556.6 318:10  3 prog
 1369 user1     15   0 46.0G  45G  4876 S     0.4  586.8  19:05  3 prog
 1366 user1     15   0 45.7G  45G  4828 S     0.0  582.2  20:55  1 prog
 1360 user1     15   0 21.2G  21G  4912 S     0.0  271.0   9:40  0 prog

And it continues like that. I'm sure it goes without saying that these
processes were nowhere near this size. This is on the current RHEL3 beta
kernel (2.4.21-25.ELsmp).

Here's a more specific example...top currently gives the following output for
process 8866:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 8866 user1     20   0 1246M 1.2G  5488 S     2.3 15.5    1:04  3 prog

But ps shows the following:

user1     8866  8.5  2.8 253844 232456 ?     S    11:51   0:51 prog

And summing up the sizes from /proc/8866/maps gives 259936256 bytes, or
253844K, which exactly agrees with the ps total. Also, top's displayed size
for this process grew from 561M to 1246M during the time it took me to type
in this comment--although the ps and /proc/8866/maps values haven't budged.

If it helps, here's the output for stat and statm for this same process:

# cat /proc/8866/stat
8866 (prog) S 1 8866 8866 0 -1 256 28884 423 2238 704 298 48 1 1 20 0 0 0
7303209 259936256 58206 4294967295 134512640 135480512 3221217376
3221212292 3076385297 0 4096 528384 16395 3222736608 0 0 17 3 0 0 5267
1120 3632 916

# cat /proc/8866/statm
317160 317150 1372 272 263316 53562 785
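The maps figure above is easy to reproduce; a few lines of Python (a
throwaway helper, not a standard tool) will total the address ranges the same
way:

#!/usr/bin/env python
# Sum the sizes of all mappings listed in /proc/<pid>/maps; every line
# starts with a "start-end" address range in hex.
import sys

pid = sys.argv[1]
total = 0
for line in open("/proc/%s/maps" % pid):
    start, end = line.split()[0].split("-")
    total += int(end, 16) - int(start, 16)
print "%d bytes (%dK)" % (total, total / 1024)

Run against PID 8866 it should print 259936256 bytes (253844K), matching both
the ps VSZ and the vsize field (259936256) in the stat output above.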
BTW, Karel's test script fails on our kernel (2.4.21-25.ELsmp). 89 processes show FAILED and just 27 show OK. And one of the failed processes was in fact the python interpreter for the script itself, which obviously hadn't been running for very bloody long. :-)
A fix for this problem was committed to the RHEL3 U4 patch pool Wednesday evening (in kernel version 2.4.21-27.EL).
*** Bug 138101 has been marked as a duplicate of this bug. ***
Yes, 2.4.21-27.ELsmp seems to fix the issue completely; proc-mem-test.py shows all procs OK and the summary says "test PASS".
The fix for this problem has also been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.3.EL).
An erratum has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html
Hi, we have Linux rac5 2.4.21-27.EL #1 SMP Wed Dec 1 21:54:21 EST 2004 ia64 ia64 ia64 GNU/Linux (RH 3 update 4), and we see the same problem.
In response to comment #16, the problem originally reported in this bug was *fixed* in U4 (in 2.4.21-27.EL). If you're seeing a new problem, please open a new bug report with an exact description of the problem and a way to reproduce it. Thanks in advance.
Never mind, I just noticed that you already opened bug 157171. Thanks.