Description of problem:
Running tuna on 24-core AMD system and encountered the following python traceback:

# tuna
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/tuna/tuna_gui.py", line 93, in refresh
    self.irqview.refresh()
  File "/usr/lib/python2.4/site-packages/tuna/gui/irqview.py", line 259, in refresh
    self.show()
  File "/usr/lib/python2.4/site-packages/tuna/gui/irqview.py", line 240, in show
    self.set_irq_columns(row, irq, irq_info, nics)
  File "/usr/lib/python2.4/site-packages/tuna/gui/irqview.py", line 188, in set_irq_columns
    pids = self.ps.find_by_regex(irq_re)
  File "/usr/lib/python2.4/site-packages/procfs/procfs.py", line 229, in find_by_regex
    if regex.match(self.processes[pid]["stat"]["comm"]):
  File "/usr/lib/python2.4/site-packages/procfs/procfs.py", line 147, in __getitem__
    setattr(self, attr, sclass(self.pid, self.basedir))
  File "/usr/lib/python2.4/site-packages/procfs/procfs.py", line 60, in __init__
    self.load(basedir)
  File "/usr/lib/python2.4/site-packages/procfs/procfs.py", line 72, in load
    f = open("%s/%d/stat" % (basedir, self.pid))
IOError: [Errno 2] No such file or directory: '/proc/9461/stat'

Version-Release number of selected component (if applicable):
# rpm -q tuna
tuna-0.9.2-1.el5rt

How reproducible:
random

Steps to Reproduce:
1. run tuna on amd-istanbul-24.farm.hsv.redhat.com
2. let run for >10 minutes
3. hope to see traceback

Additional info:
Probably just another spot that needs to be guarded by try/except to catch process disappearance.
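The race behind the traceback can be sketched outside of tuna. This is only an illustration under stated assumptions: it is written for Python 3, it uses a temporary directory as a stand-in for /proc so it runs anywhere, and read_comm and demo_race are hypothetical helpers, not functions from python-linux-procfs.

```python
import errno
import os
import shutil
import tempfile


def read_comm(basedir, pid):
    """Return the comm field from <basedir>/<pid>/stat (hypothetical helper)."""
    with open("%s/%d/stat" % (basedir, pid)) as f:
        data = f.read()
    # A stat line looks like "9461 (IRQ-58) S ..."; comm sits in parentheses.
    return data[data.index("(") + 1:data.rindex(")")]


def demo_race():
    """Snapshot the pid list, let the process vanish, then try to read it."""
    basedir = tempfile.mkdtemp()  # assumption: stand-in for /proc
    try:
        piddir = os.path.join(basedir, "9461")
        os.mkdir(piddir)
        with open(os.path.join(piddir, "stat"), "w") as f:
            f.write("9461 (IRQ-58) S 2 0 0\n")

        pids = [int(name) for name in os.listdir(basedir)]  # pid snapshot
        shutil.rmtree(piddir)  # the process exits after the snapshot

        try:
            read_comm(basedir, pids[0])
        except IOError as e:  # IOError is an alias of OSError on Python 3
            return e.errno
        return None
    finally:
        shutil.rmtree(basedir)
```

The pid list is a snapshot, so any per-pid read performed afterwards can fail with ENOENT, which matches the IOError at the bottom of the traceback.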
The problem is in python-linux-procfs. I committed a fix upstream and will provide a package to test on this machine.
Tested with the Istanbul machine, couldn't reproduce. Also tested locally on a machine with 10 Gbit/s cards. Brew build at: https://brewweb.devel.redhat.com/taskinfo?taskID=2433122 Will tag after some more testing.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
* Cause:
* Consequence: Python backtrace
* Fix:
* Result: Works properly on large (>=24 core) cpu systems
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1,4 @@
-* Cause:
+* Cause: Processes can terminate while its procfs data is being read
 * Consequence: Python backtrace
-* Fix:
+* Fix: Catch exception and remove dead process from data structures
 * Result: Works properly on large (>=24 core) cpu systems
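The fix described in the note (catch the exception and remove the dead process from the data structures) could look roughly like the sketch below. It is not the upstream python-linux-procfs patch: ProcessTable, its comm method, and the lazy-loading scheme are hypothetical names chosen for illustration, only find_by_regex mirrors a name from the traceback, and the code targets Python 3.

```python
import errno
import os
import re


class ProcessTable(object):
    """Hypothetical, simplified stand-in for a procfs process table."""

    def __init__(self, basedir="/proc"):
        self.basedir = basedir
        self.processes = {}
        self.reload()

    def reload(self):
        """Snapshot the numeric directories; comm is loaded lazily."""
        self.processes = {}
        for name in os.listdir(self.basedir):
            if name.isdigit():
                self.processes[int(name)] = None

    def comm(self, pid):
        """Read and cache the comm field from <basedir>/<pid>/stat."""
        if self.processes[pid] is None:
            with open("%s/%d/stat" % (self.basedir, pid)) as f:
                data = f.read()
            self.processes[pid] = data[data.index("(") + 1:data.rindex(")")]
        return self.processes[pid]

    def find_by_regex(self, regex):
        """Return pids whose comm matches, dropping pids that have exited."""
        found = []
        # Iterate over a copy because dead entries are deleted in the loop.
        for pid in list(self.processes):
            try:
                if regex.match(self.comm(pid)):
                    found.append(pid)
            except IOError as e:
                # Only swallow "process is gone"; re-raise anything else.
                if e.errno not in (errno.ENOENT, errno.ESRCH):
                    raise
                del self.processes[pid]
        return found
```

A call such as ProcessTable().find_by_regex(re.compile("IRQ")) would then skip a process that exited after the snapshot instead of raising. Checking errno keeps unrelated I/O errors visible rather than hiding them behind a bare except.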
Tried running tuna-0.9.2-1 and python-linux-procfs-0.4.2-1 on a 32-core box for 30 minutes without triggering this bug. Ran tuna-0.9.4-1 and python-linux-procfs-0.4.5-1 for over 1 hour without any issues. As it seems to work reliably -> moving to VERIFIED.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,4 +1 @@
-* Cause: Processes can terminate while its procfs data is being read
+On systems with a large number of CPUs (24 and more), Tuna may have attempted to read procfs data for a terminated process and terminate unexpectedly. With this update, Tuna has been modified to catch an exception and remove the terminated process from its data structures.
-* Consequence: Python backtrace
-* Fix: Catch exception and remove dead process from data structures
-* Result: Works properly on large (>=24 core) cpu systems
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2010-0762.html