Bug 98893

Summary: top segmentation faults under heavy load
Product: [Retired] Red Hat Linux
Reporter: gerry.morong
Component: procps
Assignee: Daniel Walsh <dwalsh>
Status: CLOSED RAWHIDE
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 7.3
CC: gerry.morong
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-03-29 15:23:43 UTC Type: ---

Description gerry.morong 2003-07-09 21:58:23 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; compaq)

Description of problem:
After upgrading the kernel from 2.4.18-24.7.smp to 2.4.20-18.7.smp, top will 
segmentation fault after several hours on a loaded system.

Version-Release number of selected component (if applicable):
2.0.7-12

How reproducible:
Always

Steps to Reproduce:
1. Load up a 4-CPU system to achieve a load average of about 4.0
2. Run top with no options
3. Leave it running for several hours until a segmentation fault is reported.
    

Actual Results:  Top stops with a segmentation fault

Expected Results:  Top should continue running

Additional info:

Comment 1 Alexander Larsson 2003-07-10 10:35:14 UTC
I can't debug this without a backtrace or at least a strace of the crash.
However, the segfault is very likely to have been fixed already in later
versions of procps. Can you try the one in RHL9 (2.0.11-6) or the current
rawhide version and see if they fix the problem?

Comment 2 gerry.morong 2003-07-10 20:05:59 UTC
Rebuilt 2.0.11 from source (the binary package expected a newer 
glibc).  Should know something by Friday (7/11/2003).

Comment 3 gerry.morong 2003-07-21 18:49:24 UTC
Died again.

Core was generated by `top'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libproc.so.2.0.11...(no debugging symbols found)...
done.
Loaded symbols for /lib/libproc.so.2.0.11
Reading symbols from /usr/lib/libncurses.so.5...(no debugging symbols found)...
done.
Loaded symbols for /usr/lib/libncurses.so.5
Reading symbols from /lib/i686/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x40026eec in stat2proc () from /lib/libproc.so.2.0.11
(gdb) where
#0  0x40026eec in stat2proc () from /lib/libproc.so.2.0.11
#1  0x40027532 in readproc () from /lib/libproc.so.2.0.11
#2  0x0804f276 in readproctab2 ()
#3  0x0804bf92 in show_procs ()
#4  0x0804a1f4 in main ()
#5  0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) 


Comment 4 Alexander Larsson 2003-08-06 08:49:11 UTC
Interesting. It seems to die parsing /proc/$pid/stat. I haven't seen that
before. Unfortunately there is no debug information in the package you built, so
it's a bit hard to figure out exactly what happened.
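
For context on what parsing /proc/$pid/stat involves: the root cause of this crash was never identified in this report, so the following is only a sketch of defensive parsing, not the procps implementation and not a confirmed fix. It assumes the usual "pid (comm) state ppid ..." layout of the stat line. The names stat_fields and parse_stat_line are hypothetical and do not exist in libproc. The sketch guards against two classic hazards: a command name that itself contains ')' or spaces, and an empty or truncated line (for example when the process exits between opening and reading the file).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for illustration; this is NOT procps's proc_t. */
struct stat_fields {
    int  pid;
    char comm[64];
    char state;
    int  ppid;
};

/*
 * Parse the leading fields of a /proc/<pid>/stat line.
 * Assumed layout: "pid (comm) state ppid ...".
 * Returns 0 on success, -1 if the line is empty or malformed.
 */
static int parse_stat_line(const char *line, struct stat_fields *out)
{
    const char *open, *close;
    size_t len;

    if (line == NULL || out == NULL)
        return -1;

    open  = strchr(line, '(');
    close = strrchr(line, ')');   /* last ')': comm may itself contain ')' */
    if (open == NULL || close == NULL || close < open)
        return -1;                /* empty or truncated line */

    if (sscanf(line, "%d", &out->pid) != 1)
        return -1;

    /* Copy the command name with an explicit bound, truncating if needed. */
    len = (size_t)(close - open - 1);
    if (len >= sizeof out->comm)
        len = sizeof out->comm - 1;
    memcpy(out->comm, open + 1, len);
    out->comm[len] = '\0';

    /* The remaining fields start after the closing parenthesis. */
    if (sscanf(close + 1, " %c %d", &out->state, &out->ppid) != 2)
        return -1;

    return 0;
}
```

A caller would read the whole stat file into a NUL-terminated buffer, pass it to parse_stat_line, and skip the process when -1 is returned instead of dereferencing partially filled fields.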

Could you try building a version with debug information and try again?
Note that the makefiles for procps manually strip the binaries on install, so I
have this patch in recent rpms:

--- procps-2.0.11/Makefile.dontstrip	2002-12-04 21:49:07.000000000 +0100
+++ procps-2.0.11/Makefile	2003-01-21 10:53:40.000000000 +0100
@@ -14,7 +14,7 @@
 export USRBINDIR  =  $(DESTDIR)/usr/bin
 export PROCDIR    =  $(DESTDIR)/usr/bin# /usr/proc/bin for Solaris devotees
 export OWNERGROUP =  --owner 0 --group 0
-export INSTALLBIN =  install --strip
+export INSTALLBIN =  install
 export INSTALLLIB =  install
 export INSTALLSCT =  install
 export INSTALLMAN =  install --mode a=r


Comment 5 Alexander Larsson 2004-02-05 10:23:55 UTC
Mass reassign to new owner

Comment 6 Daniel Walsh 2004-03-29 02:26:18 UTC
Could you check to see if this problem is fixed by version 3.1.15 or
later?



Comment 7 gerry.morong 2004-03-29 15:09:08 UTC
Daniel,

I have been reassigned and am no longer tied to the IT group where I 
reported this issue.  Since I was the only one tracking this issue, I 
would just close it.

Gerry