Bug 806903 - readdir fails to read entire contents of /proc when pid exceeds 32768 using 32-bit application
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: glibc
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jeff Law
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-26 13:23 UTC by John Tavares
Modified: 2016-11-24 16:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-02 17:43:39 UTC
Target Upstream Version:
Embargoed:


Attachments:
gzipped tar file containing 32-bit test program and source code for it (4.06 KB, application/x-gzip)
2012-04-02 11:22 UTC, John Tavares

Description John Tavares 2012-03-26 13:23:46 UTC
Description of problem:
A 32-bit application that uses readdir to read the contents of /proc in order to report process-level activity fails: readdir returns an "Invalid argument" error when it encounters a process directory whose PID is greater than 32768.

Version-Release number of selected component (if applicable):
$ ls -l /lib/libc.so.6 
lrwxrwxrwx 1 root root 11 Jul 14  2010 /lib/libc.so.6 -> libc-2.5.so

How reproducible:
Increase the kernel.pid_max setting to 65536, then use a 32-bit application to read the contents of /proc while one or more processes have a PID greater than 32768.

Steps to Reproduce:
1. Change kernel.pid_max setting from 32768 to 65536 by running su root -c "sysctl -w kernel.pid_max=65536"
2. Create enough processes such that the pids start to exceed 32768
3. Using a 32-bit application, attempt to read the contents of /proc
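
The steps above can be exercised with a small readdir loop. The following is a minimal sketch, not the attached test program: the helper names `is_pid_name` and `scan_proc` are hypothetical, and the "Found" output format is only modelled on the results shown in this report. The key point is that readdir returns NULL both at end-of-directory and on error, so errno has to be cleared before each call to tell the two apart.

```c
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if name is non-empty and all digits. */
static int is_pid_name(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name != '\0'; name++)
        if (!isdigit((unsigned char)*name))
            return 0;
    return 1;
}

/*
 * Hypothetical helper: prints "Found <pid>" for every numeric entry in
 * dir_path. Returns the number of numeric entries, or -1 with errno set
 * if opendir or readdir failed.
 */
static long scan_proc(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    long count = 0;
    for (;;) {
        errno = 0; /* NULL from readdir means end-of-directory OR error */
        struct dirent *entry = readdir(dir);
        if (entry == NULL)
            break;
        if (is_pid_name(entry->d_name)) {
            printf("Found %s\n", entry->d_name);
            count++;
        }
    }

    int saved_errno = errno; /* non-zero only if readdir failed */
    closedir(dir);
    if (saved_errno != 0) {
        errno = saved_errno;
        return -1;
    }
    return count;
}
```

A caller would invoke `scan_proc("/proc")` and, on a return value of -1, report `strerror(errno)`. To match the conditions in this report the program has to be built as a 32-bit binary (for example with `gcc -m32`) and run on an x86_64 kernel.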
  
Actual results:

Shortened results
Found 27712
Found 28539
Found 28540
Found 28542
Found 31147
Found 31170
Found 32162
Error - could not read all contents of /proc: Invalid argument


Expected results:

Shortened
Found 27712
Found 28539
Found 28540
Found 28542
Found 31147
Found 31170
Found 32162
Found 36268
Found 58715
Found 58716


Additional info:
I suspect that this is a kernel bug, since I have found that it works on another distribution using the same version of glibc.  Also, this does not appear to be an issue on RHEL 6.  It also does not fail when reading a pseudo /proc directory: I created a similar directory structure under /tmp/proc to simulate the top-level contents of /proc, and reading that works.

Comment 1 Jeff Law 2012-03-30 19:52:38 UTC
John, it would significantly help if you indicated what 32-bit application you are using to read the contents of /proc.  I've tried a few on a Red Hat Enterprise Linux 5.5 VM and have not managed to trigger a failure yet.

Comment 2 John Tavares 2012-04-02 11:22:43 UTC
Created attachment 574466
gzipped tar file containing 32-bit test program and source code for it

Resubmitting the test code and executable, which show the issue once the prerequisite conditions described under "How reproducible" have been met.

Comment 3 John Tavares 2012-04-02 11:25:17 UTC
I use my own 32-bit binary to read /proc.  I have attached a gzipped tar file with the source and the binary I built to make this easy.  The problem only shows up if you change the kernel.pid_max setting to, say, 65536 (sysctl -w kernel.pid_max=65536) and create enough processes that PIDs greater than 32768 start to appear.  Once this happens, the problem shows itself.  Note that you must be on an x64 system for all of this.
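
One way to get a live process with a PID above 32768 is to fork short-lived children until the kernel hands out a high enough PID. This is a rough sketch under the assumption that PIDs are allocated sequentially up to kernel.pid_max; the function names `fork_until_pid_exceeds` and `stop_child` are hypothetical and are not part of the attached test program.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks children until one receives a PID greater than threshold, giving
 * up after max_forks attempts. Children with a low PID exit at once and
 * are reaped; the first child with a high PID stays alive in pause() so
 * that it remains visible in /proc. Returns that child's PID, or -1.
 */
static pid_t fork_until_pid_exceeds(pid_t threshold, long max_forks)
{
    for (long i = 0; i < max_forks; i++) {
        pid_t child = fork();
        if (child < 0)
            return -1;               /* fork failed */
        if (child == 0) {
            if (getpid() > threshold)
                pause();             /* keep the high PID alive */
            _exit(0);
        }
        if (child > threshold)
            return child;            /* still running, caller must stop it */
        waitpid(child, NULL, 0);     /* reap the short-lived child */
    }
    return -1;
}

/* Kills and reaps the child returned by fork_until_pid_exceeds. */
static void stop_child(pid_t child)
{
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
}
```

With kernel.pid_max already raised, calling `fork_until_pid_exceeds(32768, 70000)` would leave one process with a PID above 32768 running while the 32-bit reader is tested, after which `stop_child` cleans it up.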

From strace, the problem is related to getdents:

getdents(3, /* d_reclen == 0, problem here *//* 1 entries */, 32768) = 6484
getdents(3, /* 0 entries */, 32768)     = 0
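
The strace output above suggests a record with d_reclen == 0 in the buffer returned by the kernel. The sketch below shows how such a buffer could be checked by hand. It is an illustration only: it calls getdents64 rather than the getdents call seen in the trace, the record layout follows the getdents(2) manual page, and the names `raw_dirent64`, `walk_dirents` and `check_directory` are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of one getdents64 record, as documented in getdents(2). */
struct raw_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/*
 * Walks nbytes of records in buf. Returns the number of well-formed
 * records, or -1 if a record has d_reclen == 0 or runs past the buffer.
 */
static int walk_dirents(const char *buf, long nbytes)
{
    const long header = (long)offsetof(struct raw_dirent64, d_name);
    int count = 0;
    long pos = 0;

    while (pos < nbytes) {
        unsigned short reclen;
        if (nbytes - pos < header)
            return -1;               /* truncated record header */
        memcpy(&reclen, buf + pos + offsetof(struct raw_dirent64, d_reclen),
               sizeof reclen);
        if (reclen == 0 || reclen > nbytes - pos)
            return -1;               /* the condition flagged by strace */
        pos += reclen;
        count++;
    }
    return count;
}

/* Reads every record of a directory; returns the count or -1 on a problem. */
static int check_directory(const char *path)
{
    char buf[32768];                 /* same buffer size as in the trace */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int total = 0;
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n == 0)
            break;                   /* end of directory */
        int c = (n < 0) ? -1 : walk_dirents(buf, n);
        if (c < 0) {
            close(fd);
            return -1;
        }
        total += c;
    }
    close(fd);
    return total;
}
```

Running `check_directory("/proc")` from a 32-bit build on an affected system would be one way to see whether the malformed record comes from the kernel rather than from glibc's handling of the buffer.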

Comment 4 John Tavares 2012-04-02 12:25:52 UTC
$ uname -r
2.6.18-194.el5
$ uname -m
x86_64
$ cat /etc/redhat\-release 
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
$ sysctl kernel.pid_max
kernel.pid_max = 65536
$ ps -eaf | tail -3    
qa_inst  65279  8149  0 17:53 pts/36   00:00:00 ps -eaf
qa_inst  65280  8149  0 17:53 pts/36   00:00:00 tail -3

Here is the tail of the sample program's output:

Found 31170
Found 32162
Error - could not read all contents of /proc: Invalid argument

Comment 5 Jeff Law 2012-04-02 17:24:10 UTC
Thanks John.  I've been able to reproduce the problem.  As you hinted at in the initial report, right now this appears to be a kernel problem.  I'm still doing some analysis, but the signs are pointing that direction.

Comment 6 John Tavares 2012-04-02 17:38:58 UTC
You are welcome.  The problem appears to be isolated to /proc.  Before I found a system on which I could reproduce it, I had tried to simulate it by making a copy of /proc under /tmp (/tmp/proc/...) and creating directories to make it look like there were processes with PIDs above 32768, but I could not reproduce it that way.  I have also been trying to determine whether this is a generic issue or specific to certain architectures.  So far, I have only been able to find a similar system running on s390x, and the problem does not appear to be there.  I am trying to find similar systems on IA64 and PowerPC, but I have yet to do so.  So as of now, I have only seen this on x64.

Comment 7 Jeff Law 2012-04-02 17:43:02 UTC
This was fixed in Red Hat Enterprise Linux 5.6 which was released with kernel 2.6.18-238.el5.  Simple bisection shows that 2.6.18-219.el5 fails while 2.6.18-221.el5 works.
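
For anyone who needs to check a running system against the bisection result above, the build number can be pulled out of the `uname -r` string. This is a hedged sketch: the function names `el5_build_number` and `classify_build` are hypothetical, and treating every build up to 219 as affected is an assumption, since only 2.6.18-194.el5 and 2.6.18-219.el5 are reported as failing in this bug. Build 220 was not tested.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extracts N from an EL5 kernel release string such as "2.6.18-194.el5".
 * Returns -1 if the string is not a 2.6.18 EL5 release.
 */
static int el5_build_number(const char *release)
{
    int build = -1;
    if (sscanf(release, "2.6.18-%d", &build) != 1)
        return -1;
    if (strstr(release, ".el5") == NULL)
        return -1;
    return build;
}

/*
 * Maps a build number to the status implied by the bisection:
 * 219 fails, 221 works, 220 is untested.
 */
static const char *classify_build(int build)
{
    if (build < 0)
        return "unknown";
    if (build <= 219)
        return "affected";   /* assumption for builds other than 194, 219 */
    if (build == 220)
        return "untested";
    return "fixed";
}
```

The release string would normally come from `uname(2)` (the `release` field of `struct utsname`) or from the output of `uname -r`.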

Looking at the ChangeLogs, this change stands out as potentially fixing the problem, perhaps as a side effect of the RFE.


- [fs] proc: add file position and flags info in /proc (Jerome Marchand) [498081]

Regardless of precisely which change in the 220/221 kernel fixed the bug, the errata for the kernel update is here:


http://rhn.redhat.com/errata/RHSA-2011-0017.html

