Bug 851508 - Bogus (excessive) /proc/pid/status VmSize,VmRSS values (& ps output) for threaded processes
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: x86_64 Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
Reported: 2012-08-24 07:49 EDT by paul manser
Modified: 2015-07-13 19:49 EDT
CC: 2 users

Doc Type: Bug Fix
Last Closed: 2012-10-04 05:44:58 EDT
Type: Bug


Attachments
pgm: malloc 2gb, create 'n' threads, do nothing, quit after 100s (1.82 KB, text/plain)
2012-08-24 07:49 EDT, paul manser
corrected test case - uncommented comment in first version (1.83 KB, text/plain)
2012-08-24 09:14 EDT, paul manser

Description paul manser 2012-08-24 07:49:46 EDT
Created attachment 606839 [details]
pgm: malloc 2gb, create 'n' threads, do nothing, quit after 100s

Description of problem:
Multi-threaded programs such as the attached example report incorrect/excessive
VmSize and VmRSS values in the /proc/pid/status file and in ps command output.
We have large applications that show the problem, but the small attached
example also reproduces it.
Across machines with the same CPU architecture (but varying RAM/disk sizes),
the problem always occurs on some machines and never on others.

Version-Release number of selected component (if applicable):

example machine 
lc-cam-512 ~ > uname -a
Linux lc-cam-512 2.6.18-238.5.1.el5 #1 SMP Mon Feb 21 05:52:39 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

The issue has not been seen on the few RHEL 5.8 machines we have available.

How reproducible:

Compile and run the attached example program, then monitor its
/proc/pid/status file and/or ps output.
The program allocates 2 GB of memory and starts 'n' (first argument) threads;
it runs for 100 s, with both the threads and the master sleeping and waking
once per second, performing no other processing.

compile with : 
gcc thread_experience.c -D_REENTRANT -lpthread 
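
Since the attachment is not inlined in this report, here is a minimal sketch of
a reproducer along the lines described above; the real thread_experience.c
attachment may differ in detail, so this reconstruction is an assumption:

    /* Hypothetical sketch of the attached reproducer (thread_experience.c):
     * allocate 2 GB, start n sleeping threads, exit after ~100 s. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        int i;
        for (i = 0; i < 100; i++)      /* wake up once per second, do nothing */
            sleep(1);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int i, n = (argc > 1) ? atoi(argv[1]) : 5;
        pthread_t *tids = malloc(n * sizeof(*tids));
        /* 2 GB allocation; never touched, so RSS stays small while VSZ grows */
        void *big = malloc(2UL * 1024 * 1024 * 1024);

        if (tids == NULL || big == NULL) {
            perror("malloc");
            return 1;
        }
        for (i = 0; i < n; i++)
            pthread_create(&tids[i], NULL, worker, NULL);

        for (i = 0; i < 100; i++)      /* master also just sleeps for ~100 s */
            sleep(1);
        printf("pid %d exiting\n", getpid());
        return 0;
    }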

Steps to Reproduce:
1. a.out 5 
2. periodically inspect /proc/pid/status during execution
  
Actual results:

ps -o pid,ppid,pcpu,rss,size,vsize,cputime,lstart -Lp 8205
  PID  PPID %CPU   RSS    SZ    VSZ     TIME                  STARTED
 8205  8203  0.0   608 23068996 23074668 00:00:00 Fri Aug 24 12:43:49 2012
 8205  8203  0.0   608 23068996 23074668 00:00:00 Fri Aug 24 12:43:49 2012
 8205  8203  0.0   608 23068996 23074668 00:00:00 Fri Aug 24 12:43:49 2012
 8205  8203  0.0   608 23068996 23074668 00:00:00 Fri Aug 24 12:43:49 2012
 8205  8203  0.0   608 23068996 23074668 00:00:00 Fri Aug 24 12:43:49 2012
 8205  8203  0.0   608 23068996 23074668 00:00:00 Fri Aug 24 12:43:49 2012

Name:   a.out
State:  S (sleeping)
SleepAVG:       78%
Tgid:   823
Pid:    823
PPid:   821
TracerPid:      0
Uid:    26235   26235   26235   26235
Gid:    20      20      20      20
FDSize: 256
Groups: 20 1165 51256
VmPeak: 23074668 kB
VmSize: 23074668 kB
VmLck:         0 kB
VmHWM:       604 kB
VmRSS:       604 kB
VmData: 23068900 kB
VmStk:        96 kB
VmExe:         4 kB
VmLib:      1536 kB
VmPTE:        64 kB
StaBrk: 12670000 kB
Brk:    12691000 kB
StaStk: 7fffe24f7160 kB
Threads:        6

Expected results:

ps -o pid,ppid,pcpu,rss,size,vsize,cputime,lstart -Lp 1233
  PID  PPID %CPU   RSS    SZ    VSZ     TIME                  STARTED
 1233   840  0.0   596 2107708 2113380 00:00:00 Fri Aug 24 12:42:10 2012
 1233   840  0.0   596 2107708 2113380 00:00:00 Fri Aug 24 12:42:10 2012
 1233   840  0.0   596 2107708 2113380 00:00:00 Fri Aug 24 12:42:10 2012
 1233   840  0.0   596 2107708 2113380 00:00:00 Fri Aug 24 12:42:10 2012
 1233   840  0.0   596 2107708 2113380 00:00:00 Fri Aug 24 12:42:10 2012
 1233   840  0.0   596 2107708 2113380 00:00:00 Fri Aug 24 12:42:10 2012


Name:   a.out
State:  S (sleeping)
SleepAVG:       98%
Tgid:   28425
Pid:    28425
PPid:   28374
TracerPid:      0
Uid:    26235   26235   26235   26235
Gid:    20      20      20      20
FDSize: 64
Groups: 20 1165 51256
VmPeak:  2113380 kB
VmSize:  2113380 kB
VmLck:         0 kB
VmHWM:       596 kB
VmRSS:       596 kB
VmData:  2107620 kB
VmStk:        88 kB
VmExe:         4 kB
VmLib:      1536 kB
VmPTE:        68 kB
StaBrk: 04e81000 kB
Brk:    04ea2000 kB
StaStk: 7fff6c923a30 kB
Threads:        6



Additional info:
Comment 1 paul manser 2012-08-24 09:14:08 EDT
Created attachment 606858 [details]
corrected test case - uncommented comment in first version
Comment 2 paul manser 2012-08-24 10:51:57 EDT
Have now identified some additional details: it seems the VmSize, VmRSS (and
ps output) values are correct, and that the same binary is sometimes showing
an additional 4 GB (4194304 kB) of virtual address space per thread, which
makes the total about 2 + 4*5 = 22 GB in the above example.

The additional space is allocated only on some machines, and then only if the
job is executed by the LSF (from Platform/IBM) job distribution system, in
which case the job has a process tree like the following:

res,12350 -d /tools/lsf/Current/conf -m xl-cbga-03 /home/pmanser/.lsbatch/1345818946.181820
  |-1345818946.1818,12377 /home/pmanser/.lsbatch/1345818946.181820
  |   `-a.out,12379 5
  |       |-{a.out},12380
  |       |-{a.out},12381
  |       |-{a.out},12382
  |       |-{a.out},12383
  |       `-{a.out},12384
  `-{res},12378

If the program is run directly on the machine from a standard Linux shell,
I see no issue.

So somehow the executing LSF program is causing the behaviour of the
threads to be modified.
Inspection of the dynamically linked libraries shows the same libraries
whether the job is executed directly or via LSF, e.g.:
lc-cam-513 spice_leakage > grep lib64 /proc/12914/maps
3752600000-375261c000 r-xp 00000000 08:01 24674657                       /lib64/ld-2.5.so
375281b000-375281c000 r--p 0001b000 08:01 24674657                       /lib64/ld-2.5.so
375281c000-375281d000 rw-p 0001c000 08:01 24674657                       /lib64/ld-2.5.so
3752e00000-3752f4e000 r-xp 00000000 08:01 24674668                       /lib64/libc-2.5.so
3752f4e000-375314e000 ---p 0014e000 08:01 24674668                       /lib64/libc-2.5.so
375314e000-3753152000 r--p 0014e000 08:01 24674668                       /lib64/libc-2.5.so
3753152000-3753153000 rw-p 00152000 08:01 24674668                       /lib64/libc-2.5.so
3753a00000-3753a16000 r-xp 00000000 08:01 24674670                       /lib64/libpthread-2.5.so
3753a16000-3753c15000 ---p 00016000 08:01 24674670                       /lib64/libpthread-2.5.so
3753c15000-3753c16000 r--p 00015000 08:01 24674670                       /lib64/libpthread-2.5.so
3753c16000-3753c17000 rw-p 00016000 08:01 24674670                       /lib64/libpthread-2.5.so


Thanks for any ideas on the likely cause.

Regards, Paul
Comment 3 paul manser 2012-09-10 06:49:04 EDT
Hi,
   We believe we have identified the cause.

If the process stack size limit in the executing shell is set to a specific
value, as opposed to unlimited, then additional virtual memory equal to the
stack size multiplied by the thread count is reserved. If, however, the stack
size limit is set to unlimited, then no additional virtual memory is taken
per thread.

Regards, Paul
Comment 4 Minto Joseph 2012-10-04 05:44:58 EDT
I am able to reproduce the issue by increasing the stack size using ulimit,

# ulimit -s 1000240
# ./a.out 5
[..]

# ps -o pid,ppid,pcpu,rss,size,vsize,cputime,lstart -Lp 31913
  PID  PPID %CPU   RSS    SZ    VSZ     TIME                  STARTED
31913 31779  0.0   584 7098644 7104312 00:00:00 Wed Oct  3 11:49:57 2012
31913 31779  0.0   584 7098644 7104312 00:00:00 Wed Oct  3 11:49:57 2012
31913 31779  0.0   584 7098644 7104312 00:00:00 Wed Oct  3 11:49:57 2012
31913 31779  0.0   584 7098644 7104312 00:00:00 Wed Oct  3 11:49:57 2012
31913 31779  0.0   584 7098644 7104312 00:00:00 Wed Oct  3 11:49:57 2012
31913 31779  0.0   584 7098644 7104312 00:00:00 Wed Oct  3 11:49:57 2012


This is expected behaviour.
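
The numbers are consistent with this explanation: 5 threads * 1,000,240 kB of
reserved stack = 5,001,200 kB which, together with the ~2,097,152 kB malloc'd
block and a few MB of program text, libraries and main stack, accounts for the
observed VSZ of 7,104,312 kB. The same arithmetic applied to the original
report gives (23,074,668 - 2,113,380) / 5 ≈ 4,192,000 kB per thread, matching
the ~4 GB per-thread reservation noted in comment 2 and consistent with LSF
running the job under a ~4 GB stack limit.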

From 'man  pthread_create',

On Linux/x86-32, the default stack size for a new thread is 2 megabytes.  Under the NPTL threading implementation, if the RLIMIT_STACK soft resource  limit  at  the  time  the  program  started has any value other than "unlimited", then it determines the default stack size of new threads.  Using pthread_attr_setstacksize(3), the stack size attribute can be explicitly set in the attr argument used to create a thread, in order to obtain a stack size other than the default.
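
For illustration (not part of the original report), a minimal sketch of
creating threads with an explicitly chosen stack size via
pthread_attr_setstacksize(3), so the per-thread reservation no longer tracks
the RLIMIT_STACK value inherited from the shell; the helper name and the 1 MB
figure are arbitrary assumptions:

    #include <pthread.h>

    /* Create a thread with an explicit 1 MB stack (must be >= PTHREAD_STACK_MIN)
     * instead of inheriting the default derived from RLIMIT_STACK. */
    int create_thread_small_stack(pthread_t *tid, void *(*fn)(void *), void *arg)
    {
        pthread_attr_t attr;
        int rc;

        pthread_attr_init(&attr);
        pthread_attr_setstacksize(&attr, 1024 * 1024);
        rc = pthread_create(tid, &attr, fn, arg);
        pthread_attr_destroy(&attr);
        return rc;
    }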

Closing the bugzilla.
Comment 5 Chris Pugmire 2015-07-13 19:49:35 EDT
This bug/issue is incorrectly diagnosed. I spent many hours with a similar problem before discovering that the true cause has nothing to do with setstacksize(); I'm posting this additional info here in case others strike the same problem.

The problem is that malloc() may create multiple arenas for each new thread, based on the first thread's malloc usage. The result can be that each thread gets an address space similar to the main program's, e.g. if your main thread is using 100 MB, then each thread will get an address space of another 100 MB.

The impact of this on the system is hard to determine, but it does seem to cause performance issues in some situations.  

It appears that doing the following at the start of your program can stop this from occurring.

    rv = mallopt(-7, 1);  // -7 == M_ARENA_TEST (symbolic names may be absent from older malloc.h)
    rv = mallopt(-8, 1);  // -8 == M_ARENA_MAX: cap the number of malloc arenas at 1
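
A possible alternative, on glibc builds that expose it, is setting the
MALLOC_ARENA_MAX environment variable (e.g. MALLOC_ARENA_MAX=1) before
starting the process, which should have the same effect without a code change;
whether the glibc shipped with a given RHEL release honours this variable is
not confirmed here.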

The real issue is described here by people who know much more than me:

https://sourceware.org/bugzilla/show_bug.cgi?id=11261
