From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; compaq)
Description of problem:
After upgrading the kernel from 2.4.18-24.7.smp to 2.4.20-18.7.smp, top dies with a segmentation fault after several hours on a loaded system.
Version-Release number of selected component (if applicable):
2.0.7-12
How reproducible:
Always
Steps to Reproduce:
1. Load up a 4-CPU system to achieve a load average of about 4.0
2. Run top with no options
3. Leave it running for several hours until a segmentation fault is reported
Actual Results: Top stops with a segmentation fault
Expected Results: Top should continue running
Additional info:
I can't debug this without a backtrace or at least an strace of the crash. However, the segfault has very likely been fixed already in later versions of procps. Can you try the one in RHL9 (2.0.11-6) or the current rawhide version and see if either fixes the problem?
Rebuilt 2.0.11 from source (the binary package expected a newer glibc). Should know something by Friday (7/11/2003).
Died again.
Core was generated by `top'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libproc.so.2.0.11...(no debugging symbols found)...done.
Loaded symbols for /lib/libproc.so.2.0.11
Reading symbols from /usr/lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libncurses.so.5
Reading symbols from /lib/i686/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x40026eec in stat2proc () from /lib/libproc.so.2.0.11
(gdb) where
#0 0x40026eec in stat2proc () from /lib/libproc.so.2.0.11
#1 0x40027532 in readproc () from /lib/libproc.so.2.0.11
#2 0x0804f276 in readproctab2 ()
#3 0x0804bf92 in show_procs ()
#4 0x0804a1f4 in main ()
#5 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
(gdb)
Interesting. It seems to die parsing /proc/$pid/stat. I haven't seen that before. Unfortunately there is no debug information in the package you built, so it's a bit hard to figure out exactly what happened. Could you try building a version with debug information and try again? Note that the makefiles for procps manually strip the binaries on install, so I have this patch in recent rpms:
--- procps-2.0.11/Makefile.dontstrip 2002-12-04 21:49:07.000000000 +0100
+++ procps-2.0.11/Makefile 2003-01-21 10:53:40.000000000 +0100
@@ -14,7 +14,7 @@
 export USRBINDIR = $(DESTDIR)/usr/bin
 export PROCDIR = $(DESTDIR)/usr/bin# /usr/proc/bin for Solaris devotees
 export OWNERGROUP = --owner 0 --group 0
-export INSTALLBIN = install --strip
+export INSTALLBIN = install
 export INSTALLLIB = install
 export INSTALLSCT = install
 export INSTALLMAN = install --mode a=r
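For context on what parsing /proc/$pid/stat involves, here is a minimal sketch of a defensive parser for one line of that file. It is not the procps stat2proc() code and not a diagnosis of this crash; the helper name parse_stat_line and its signature are made up for illustration. The only assumption about the file format is the documented layout: pid, then the command name in parentheses, then state and ppid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT the procps stat2proc() implementation.
 * Extracts pid, command name, state and parent pid from one
 * /proc/<pid>/stat line. Returns 0 on success, -1 on malformed input. */
static int parse_stat_line(const char *line, int *pid, char *comm,
                           size_t commsz, char *state, int *ppid)
{
    assert(line != NULL && comm != NULL && commsz > 0);

    /* The command name is wrapped in parentheses and may itself contain
     * spaces or ')' characters, so look for the LAST ')' in the line. */
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');
    if (open == NULL || close == NULL || close < open)
        return -1;

    /* The pid is the first field, before the opening parenthesis. */
    if (sscanf(line, "%d", pid) != 1)
        return -1;

    /* Copy the name, truncating rather than overflowing the buffer. */
    size_t len = (size_t)(close - open - 1);
    if (len >= commsz)
        len = commsz - 1;
    memcpy(comm, open + 1, len);
    comm[len] = '\0';

    /* Fields after the closing parenthesis: state, then ppid. */
    if (sscanf(close + 1, " %c %d", state, ppid) != 2)
        return -1;
    return 0;
}
```

A caller would read the file into a buffer, pass it to parse_stat_line(), and skip the process whenever the helper returns -1, which can legitimately happen if the process exits between the directory scan and the read.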
Mass reassign to new owner
Could you check to see if this problem is fixed by version 3.1.15 or later?
Daniel, I have been reassigned and am no longer tied to the IT group where I reported this issue. Since I was the only one tracking it, I would just close it. Gerry