Bug 137927
| Summary: | Process memory usage incorrect in top. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 3 | Reporter: | Jason Smith <smithj4> |
| Component: | kernel | Assignee: | Rik van Riel <riel> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.0 | CC: | anderson, coughlan, george_robinson, jorton, kzak, petrides, rodrigo, v, villapla, wirth |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2004-12-20 20:56:53 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jason Smith 2004-11-02 21:34:57 UTC
1591M is 319.4% of *physical* RAM, 509876k. I expect this is desired behaviour?

Joe, the problem is not %MEM; that is probably counted correctly. The problem is SIZE and RSS: "ps" reports them correctly, while "top" shows strange output.

I think the problem has been tracked down (thanks to Jason Smith). Jason runs an unstable kernel (see: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=121434#c198) and this kernel produces wrong /proc/<pid>/statm data. Dumps:

```
 PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
6232 smithj4   15   0  913M 894M  5300 S     0.4 179.6 101:07   1 gnome-terminal

# cat /proc/6232/statm
233769 228990 1325 187 694 228109 2697
       ^^^^^^
```

After conversion from pages to KB it's:

```
python -c "print 228990 << 2"
915960
```

So 915M, which is almost the same as the "top" output.

```
# cat /proc/6232/stat
6232 (gnome-terminal) S 1 4726 4726 0 -1 4194304 278393 3980530 51621
4394127 560022 46748 42882 17739 15 0 0 0 849277 58626048 5720
                                                          ^^^^
4294967295 134512640 134802800 3221203328 3221202796 45691177 0 0 4096
66800 3222490981 0 0 17 0 0 0 560022 46748 42882 17739
```

The "stat" file probably holds the right value:

```
python -c "print 5720 << 2"
22880
```

So 22M; the "ps" utils read this "stat" file. You can check everything with:

```
# cat /proc/6232/status
Name:   gnome-terminal
State:  S (sleeping)
Tgid:   6232
Pid:    6232
PPid:   1
TracerPid:      0
Uid:    1829    1829    1829    1829
Gid:    31016   31016   31016   31016
FDSize: 256
Groups: 31016
VmSize:    57252 kB
VmLck:         0 kB
VmRSS:     22884 kB
           ^^^^^^
VmData:    36472 kB
VmStk:       148 kB
VmExe:       284 kB
VmLib:     14048 kB
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 00000000800104f0
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
```

... so the problem is the reporter's unstable kernel.

If this is deemed a kernel problem, please reassign to Dave Anderson.

We tested 2.4.21-23.ELsmp (RHEL 3 U4 beta) and found that the RSS values reported by top and ps aux do not agree.
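The pages-to-kB arithmetic above is easy to script. A minimal sketch (this is an illustration, not Karel's script; it checks the current process rather than pid 6232, and computes the page size instead of hard-coding the `<< 2` that a 4 kB page implies) that compares the resident set from /proc/<pid>/statm against the VmRSS line in /proc/<pid>/status:

```python
import os

def rss_from_statm(pid):
    # Second field of /proc/<pid>/statm is the resident set, in pages.
    with open(f"/proc/{pid}/statm") as f:
        return int(f.read().split()[1])

def rss_from_status(pid):
    # VmRSS line of /proc/<pid>/status is already in kB.
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

page_kb = os.sysconf("SC_PAGE_SIZE") // 1024  # 4 on i386, hence "<< 2"
pid = os.getpid()
statm_kb = rss_from_statm(pid) * page_kb
status_kb = rss_from_status(pid)
print(statm_kb, status_kb)
```

On a healthy kernel the two numbers should agree to within a few pages (the process keeps running between the two reads); on the broken kernels described here, the statm value is wildly larger.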
Karel Zak determined that /proc/<pid>/stat and /proc/<pid>/statm do not agree on this value. He wrote a handy script to test this, and tried various kernels. The results:

It passes on:

- 2.4.9-e.49smp (people.redhat.com)
- 2.4.21-11.ELsmp (porkchop)
- 2.4.26 #1 SMP (Debian)
- 2.6.8-1.521 (my FC box)
- 2.4.20-31.9smp (Red Hat Linux release 9 (Shrike))

It fails on 2.4.21-23.ELsmp.

The test is available at: http://people.redhat.com/kzak/procps/proc-mem-test.py

Usage:

```
ps -A -opid= | ./proc-mem-test.py
```

*** Bug 136630 has been marked as a duplicate of this bug. ***

We see this behavior on kernel-smp-2.4.21-25.EL with a proprietary monitoring application, too. The numbers in "ps vx" are right; top claims an RSS of several gigabytes after some hours.

Just another "me too." Look at this top listing:

```
 PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU  %MEM   TIME CPU COMMAND
1363 user1     15   0  122G 121G  4924 S     0.2 1556.6 318:10   3 prog
1369 user1     15   0 46.0G  45G  4876 S     0.4  586.8  19:05   3 prog
1366 user1     15   0 45.7G  45G  4828 S     0.0  582.2  20:55   1 prog
1360 user1     15   0 21.2G  21G  4912 S     0.0  271.0   9:40   0 prog
```

And it continues like that. I'm sure it goes without saying that these processes were nowhere near this size. This is on the current RHEL3 beta kernel (2.4.21-25.ELsmp).

Here's a more specific example. top currently gives the following output for process 8866:

```
 PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM  TIME CPU COMMAND
8866 user1     20   0 1246M 1.2G  5488 S     2.3 15.5  1:04   3 prog
```

But ps shows the following:

```
user1     8866  8.5  2.8 253844 232456 ?  S  11:51  0:51 prog
```

And summing the sizes from /proc/8866/maps gives 259936256 bytes, or 253844K, which exactly agrees with the ps total. Also, top's displayed size for this process grew from 561M to 1246M during the time it took me to type in this comment, although the ps and /proc/8866/maps values haven't budged.
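The maps-summation cross-check in the last comment can be reproduced with a few lines. A small sketch (an illustration only; pid 8866 belongs to the reporter's box, so this uses the current process) that sums the address ranges in /proc/<pid>/maps, which should line up with the VmSize line of /proc/<pid>/status on a healthy kernel:

```python
import os

def mapped_kb(pid):
    # Each line of /proc/<pid>/maps starts with "start-end" in hex;
    # sum (end - start) over all mappings.
    total = 0
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            start, end = line.split()[0].split("-")
            total += int(end, 16) - int(start, 16)
    return total // 1024

def vmsize_kb(pid):
    # VmSize line of /proc/<pid>/status, already in kB.
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1])

pid = os.getpid()
m, v = mapped_kb(pid), vmsize_kb(pid)
print(m, v)
```

Minor differences are possible (e.g. special mappings such as [vsyscall] appear in maps but are not counted in VmSize, and mappings can change between the two reads), but the totals should be within a few pages of each other, exactly as the reporter observed with ps.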
If it helps, here's the output of stat and statm for this same process:

```
# cat /proc/8866/stat
8866 (prog) S 1 8866 8866 0 -1 256 28884 423 2238 704 298 48 1 1 20 0 0 0
7303209 259936256 58206 4294967295 134512640 135480512 3221217376 3221212292
3076385297 0 4096 528384 16395 3222736608 0 0 17 3 0 0 5267 1120 3632 916

# cat /proc/8866/statm
317160 317150 1372 272 263316 53562 785
```

BTW, Karel's test script fails on our kernel (2.4.21-25.ELsmp): 89 processes show FAILED and just 27 show OK. And one of the failed processes was in fact the python interpreter for the script itself, which obviously hadn't been running for very bloody long. :-)

A fix for this problem was committed to the RHEL3 U4 patch pool Wednesday evening (in kernel version 2.4.21-27.EL).

*** Bug 138101 has been marked as a duplicate of this bug. ***

Yes, 2.4.21-27.ELsmp seems to fix the issue completely; proc-mem-test.py shows all processes OK and "test PASS" in the summary.

The fix for this problem has also been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.3.EL).

An erratum has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-550.html

Hi, we have Linux rac5 2.4.21-27.EL #1 SMP Wed Dec 1 21:54:21 EST 2004 ia64 ia64 ia64 GNU/Linux, and we have the same problem (RHEL 3 Update 4).

In response to comment #16: the problem originally reported in this bug was *fixed* in U4 (in 2.4.21-27.EL). If you're still having a new problem, please open a new bug report with an exact description of the problem and a way to reproduce it. Thanks in advance.

Never mind, I just noticed that you already opened bug 157171. Thanks.
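The linked proc-mem-test.py is not reproduced here, but a check in the same spirit is simple: on a healthy kernel, field 24 of /proc/<pid>/stat (rss, in pages) and the second field of /proc/<pid>/statm (resident, in pages) should agree, and on the broken 2.4.21-23/25.EL kernels they do not. A minimal sketch (an assumption-laden illustration, not the actual script), checking the current process:

```python
import os

def stat_rss_pages(pid):
    # /proc/<pid>/stat: the comm field may contain spaces, so split
    # after the closing ")". rest[0] is then field 3 (state), so
    # field 24 (rss, in pages) lands at index 21.
    with open(f"/proc/{pid}/stat") as f:
        rest = f.read().rsplit(")", 1)[1].split()
    return int(rest[21])

def statm_rss_pages(pid):
    # /proc/<pid>/statm: second field is the resident set, in pages.
    with open(f"/proc/{pid}/statm") as f:
        return int(f.read().split()[1])

pid = os.getpid()
a, b = stat_rss_pages(pid), statm_rss_pages(pid)
print("OK" if abs(a - b) < 16 else "FAILED", a, b)
```

The small tolerance allows for the process allocating a few pages between the two reads; the bug reported here produced disagreements of hundreds of thousands of pages, not a handful.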