Bug 577365

Summary: python-linux-procfs: python traceback while monitoring system
Product: Red Hat Enterprise MRG Reporter: Clark Williams <williams>
Component: realtime-utilities Assignee: Arnaldo Carvalho de Melo <acme>
Status: CLOSED ERRATA QA Contact: David Sommerseth <davids>
Severity: medium Docs Contact:
Priority: low    
Version: 1.2 CC: bhu, lgoncalv, ovasik
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
On systems with a large number of CPUs (24 and more), Tuna may have attempted to read procfs data for a terminated process and terminate unexpectedly. With this update, Tuna has been modified to catch an exception and remove the terminated process from its data structures.
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-10-11 15:10:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Clark Williams 2010-03-26 19:11:46 UTC
Description of problem:

Ran tuna on a 24-core AMD system and encountered the following Python traceback:

# tuna
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/tuna/tuna_gui.py", line 93, in refresh
    self.irqview.refresh()
  File "/usr/lib/python2.4/site-packages/tuna/gui/irqview.py", line 259, in refresh
    self.show()
  File "/usr/lib/python2.4/site-packages/tuna/gui/irqview.py", line 240, in show
    self.set_irq_columns(row, irq, irq_info, nics)
  File "/usr/lib/python2.4/site-packages/tuna/gui/irqview.py", line 188, in set_irq_columns
    pids = self.ps.find_by_regex(irq_re)
  File "/usr/lib/python2.4/site-packages/procfs/procfs.py", line 229, in find_by_regex
    if regex.match(self.processes[pid]["stat"]["comm"]):
  File "/usr/lib/python2.4/site-packages/procfs/procfs.py", line 147, in __getitem__
    setattr(self, attr, sclass(self.pid, self.basedir))
  File "/usr/lib/python2.4/site-packages/procfs/procfs.py", line 60, in __init__
    self.load(basedir)
  File "/usr/lib/python2.4/site-packages/procfs/procfs.py", line 72, in load
    f = open("%s/%d/stat" % (basedir, self.pid))
IOError: [Errno 2] No such file or directory: '/proc/9461/stat'


Version-Release number of selected component (if applicable):

# rpm -q tuna
tuna-0.9.2-1.el5rt


How reproducible:

random

Steps to Reproduce:
1. run tuna on amd-istanbul-24.farm.hsv.redhat.com
2. let run for >10 minutes
3. hope to see traceback


Additional info:

Probably just another spot that needs to be guarded by try/except to catch process disappearance.
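
A minimal sketch of such a guard, written for Python 3 rather than the Python 2.4 shown in the traceback. The function name `read_comm` and its `basedir` parameter are hypothetical and are not part of python-linux-procfs; the sketch only illustrates treating a vanished `/proc/<pid>/stat` as "process gone" instead of letting the exception propagate.

```python
import os


def read_comm(pid, basedir="/proc"):
    """Return the command name from <basedir>/<pid>/stat, or None if the process is gone.

    Hypothetical helper for illustration; not the python-linux-procfs API.
    """
    path = os.path.join(basedir, str(pid), "stat")
    try:
        with open(path) as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        # The process exited between being listed and being read.
        return None
    # The command name is enclosed in parentheses and may itself contain
    # spaces or parentheses, so use the first "(" and the last ")".
    start = data.find("(")
    end = data.rfind(")")
    if start == -1 or end == -1:
        return None
    return data[start + 1:end]
```

A caller would check for `None` and skip that pid rather than assume every listed process is still alive.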

Comment 1 Arnaldo Carvalho de Melo 2010-03-26 19:32:44 UTC
The problem is in python-linux-procfs. Committed a fix upstream; will provide a package to test on this machine.

Comment 2 Arnaldo Carvalho de Melo 2010-05-10 20:04:36 UTC
Tested on the Istanbul machine; couldn't reproduce. Also tested locally on a machine with 10 Gbit/s cards. Brew build at:

https://brewweb.devel.redhat.com/taskinfo?taskID=2433122

Will tag after some more testing.

Comment 3 Clark Williams 2010-10-04 19:30:42 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
* Cause: 
* Consequence: Python backtrace
* Fix: 
* Result: Works properly on large (>=24 core) cpu systems

Comment 5 Arnaldo Carvalho de Melo 2010-10-05 15:33:48 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1,4 @@
-* Cause: 
+* Cause: Processes can terminate while its procfs data is being read
 * Consequence: Python backtrace
-* Fix: 
+* Fix: Catch exception and remove dead process from data structures
 * Result: Works properly on large (>=24 core) cpu systems
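
The fix described in the note (catch the exception, drop the dead process) could look roughly like the following Python 3 sketch. This is an assumption about the general shape and not the actual upstream patch: `processes` is a plain dict standing in for the library's process table, and `read_comm` is a hypothetical callable that raises `OSError` when the process no longer exists.

```python
import re


def find_by_regex(processes, regex, read_comm):
    """Return pids whose command name matches regex, pruning pids that have exited.

    processes: dict keyed by pid (hypothetical stand-in for the process table)
    read_comm: callable returning the command name for a pid, raising
               OSError if the process has disappeared
    """
    found = []
    # Iterate over a snapshot of the keys so entries can be deleted safely.
    for pid in list(processes):
        try:
            comm = read_comm(pid)
        except OSError:
            # Process exited after it was listed: forget it and move on.
            del processes[pid]
            continue
        if regex.match(comm):
            found.append(pid)
    return found
```

Iterating over `list(processes)` matters here: deleting from a dict while iterating over it directly raises `RuntimeError` in Python 3.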

Comment 6 David Sommerseth 2010-10-07 14:32:25 UTC
Ran tuna-0.9.2-1 and python-linux-procfs-0.4.2-1 on a 32-core box for 30 minutes without triggering this bug.

Ran tuna-0.9.4-1 and python-linux-procfs-0.4.5-1 for over 1 hour without any issues. It seems to work reliably, so moving to VERIFIED.

Comment 7 Jaromir Hradilek 2010-10-11 14:20:04 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1 @@
-* Cause: Processes can terminate while its procfs data is being read
+On systems with a large number of CPUs (24 and more), Tuna may have attempted to read procfs data for a terminated process and terminate unexpectedly. With this update, Tuna has been modified to catch an exception and remove the terminated process from its data structures.
-* Consequence: Python backtrace
-* Fix: Catch exception and remove dead process from data structures
-* Result: Works properly on large (>=24 core) cpu systems

Comment 8 errata-xmlrpc 2010-10-11 15:10:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0762.html