Bug 728127

Summary: [abrt] sos-2.2-8.el6: sosreport.py:730:sosreport:IOError: [Errno 2] No such file or directory: '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_logs/sosreport-plugin-errors.txt'
Product: Red Hat Enterprise Linux 6 Reporter: Michal Nowak <mnowak>
Component: sosAssignee: Bryn M. Reeves <bmr>
Status: CLOSED CANTFIX QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.1CC: agk, bmr, djasa, dkutalek, gavin, jmoskovc, ohudlick, plambri, prc, rmarko, sos-team, tlavigne
Target Milestone: rcKeywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: abrt_hash:87ea4927202cee0eb772b09989d4d1ee4547ce54
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-04-02 16:06:59 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1056252    

Description Michal Nowak 2011-08-04 07:07:40 UTC
abrt version: 2.0.5
cmdline:        
executable:     /usr/sbin/sosreport
kernel:         2.6.32-174.el6.x86_64
reason:         sosreport.py:730:sosreport:IOError: [Errno 2] No such file or directory: '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_logs/sosreport-plugin-errors.txt'
time:           Thu Aug  4 08:59:34 2011
uid:            0
username:       root

backtrace:
:sosreport.py:730:sosreport:IOError: [Errno 2] No such file or directory: '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_logs/sosreport-plugin-errors.txt'
:
:Traceback (most recent call last):
:  File "/usr/sbin/sosreport", line 23, in <module>
:    sosreport(sys.argv[1:])
:  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 730, in sosreport
:    error_log = open(logdir + "/sosreport-plugin-errors.txt", "a")
:IOError: [Errno 2] No such file or directory: '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_logs/sosreport-plugin-errors.txt'
:
:Local variables in innermost frame:
:pluginClass: <class sos.plugins.yum.yum at 0x3c84e30>
:names: deque(['yumdebug'])
:plugins: ['__init__.py', '__init__.pyc', '__init__.pyo', 'acpid.py', 'acpid.pyc', 'acpid.pyo', 'amd.py', 'amd.pyc', 'amd.pyo', 'anaconda.py', 'anaconda.pyc', 'anaconda.pyo', 'apache.py', 'apache.pyc', 'apache.pyo', 'auditd.py', 'auditd.pyc', 'auditd.pyo', 'autofs.py', 'autofs.pyc', 'autofs.pyo', 'bootloader.py', 'bootloader.pyc', 'bootloader.pyo', 'cluster.py', 'cluster.pyc', 'cluster.pyo', 'cobbler.py', 'cobbler.pyc', 'cobbler.pyo', 'corosync.py', 'corosync.pyc', 'corosync.pyo', 'crontab.py', 'crontab.pyc', 'crontab.pyo', 'cs.py', 'cs.pyc', 'cs.pyo', 'devicemapper.py', 'devicemapper.pyc', 'devicemapper.pyo', 'dhcp.py', 'dhcp.pyc', 'dhcp.pyo', 'dovecot.py', 'dovecot.pyc', 'dovecot.pyo', 'ds.py', 'ds.pyc', 'ds.pyo', 'emc.py', 'emc.pyc', 'emc.pyo', 'filesys.py', 'filesys.pyc', 'filesys.pyo', 'ftp.py', 'ftp.pyc', 'ftp.pyo', 'gdm.py', 'gdm.pyc', 'gdm.pyo', 'general.py', 'general.pyc', 'general.pyo', 'hardware.py', 'hardware.pyc', 'hardware.pyo', 'hts.py', 'hts.pyc', 'hts.pyo', 'i18n.py', 'i18n.pyc', 'i18n.pyo', 'initrd.py', 'initrd.pyc', 'initrd.pyo', 'ipa.py', 'ipa.pyc', 'ipa.pyo', 'ipsec.py', 'ipsec.pyc', 'ipsec.pyo', 'iscsi.py', 'iscsi.pyc', 'iscsi.pyo', 'kdump.py', 'kdump.pyc', 'kdump.pyo', 'kernel.py', 'kernel.pyc', 'kernel.pyo', 'kvm.py', 'kvm.pyc', 'kvm.pyo', 'ldap.py', 'ldap.pyc', 'ldap.pyo', 'libraries.py', 'libraries.pyc', 'libraries.pyo', 'libvirt.py', 'libvirt.pyc', 'libvirt.pyo', 'logrotate.py', 'logrotate.pyc', 'logrotate.pyo', 'lsbrelease.py', 'lsbrelease.pyc', 'lsbrelease.pyo', 'memory.py', 'memory.pyc', 'memory.pyo', 'mrggrid.py', 'mrggrid.pyc', 'mrggrid.pyo', 'mrgmessg.py', 'mrgmessg.pyc', 'mrgmessg.pyo', 'mysql.py', 'mysql.pyc', 'mysql.pyo', 'named.py', 'named.pyc', 'named.pyo', 'netdump.py', 'netdump.pyc', 'netdump.pyo', 'networking.py', 'networking.pyc', 'networking.pyo', 'nfsserver.py', 'nfsserver.pyc', 'nfsserver.pyo', 'nscd.py', 'nscd.pyc', 'nscd.pyo', 'ntp.py', 'ntp.pyc', 'ntp.pyo', 'oddjob.py', 'oddjob.pyc', 'oddjob.pyo', 'openssl.py', 'openssl.pyc', 'openssl.pyo', 'openswan.py', 'openswan.pyc', 'openswan.pyo', 'pam.py', 'pam.pyc', 'pam.pyo', 'postfix.py', 'postfix.pyc', 'postfix.pyo', 'ppp.py', 'ppp.pyc', 'ppp.pyo', 'printing.py', 'printing.pyc', 'printing.pyo', 'process.py', 'process.pyc', 'process.pyo', 'psacct.py', 'psacct.pyc', 'psacct.pyo', 'pxe.py', 'pxe.pyc', 'pxe.pyo', 'qpidd.py', 'qpidd.pyc', 'qpidd.pyo', 'quagga.py', 'quagga.pyc', 'quagga.pyo', 'radius.py', 'radius.pyc', 'radius.pyo', 'rhn.py', 'rhn.pyc', 'rhn.pyo', 'rpm.py', 'rpm.pyc', 'rpm.pyo', 's390.py', 's390.pyc', 's390.pyo', 'samba.py', 'samba.pyc', 'samba.pyo', 'sanitize.py', 'sanitize.pyc', 'sanitize.pyo', 'sar.py', 'sar.pyc', 'sar.pyo', 'selinux.py', 'selinux.pyc', 'selinux.pyo', 'sendmail.py', 'sendmail.pyc', 'sendmail.pyo', 'smartcard.py', 'smartcard.pyc', 'smartcard.pyo', 'snmp.py', 'snmp.pyc', 'snmp.pyo', 'soundcard.py', 'soundcard.pyc', 'soundcard.pyo', 'squid.py', 'squid.pyc', 'squid.pyo', 'ssh.py', 'ssh.pyc', 'ssh.pyo', 'sssd.py', 'sssd.pyc', 'sssd.pyo', 'startup.py', 'startup.pyc', 'startup.pyo', 'system.py', 'system.pyc', 'system.pyo', 'systemtap.py', 'systemtap.pyc', 'systemtap.pyo', 'tftpserver.py', 'tftpserver.pyc', 'tftpserver.pyo', 'tomcat.py', 'tomcat.pyc', 'tomcat.pyo', 'udev.py', 'udev.pyc', 'udev.pyo', 'veritas.py', 'veritas.pyc', 'veritas.pyo', 'vmware.py', 'vmware.pyc', 'vmware.pyo', 'x11.py', 'x11.pyc', 'x11.pyo', 'xen.py', 'xen.pyc', 'xen.pyo', 'xinetd.py', 'xinetd.pyc', 'xinetd.pyo', 'yum.py', 'yum.pyc', 'yum.pyo']
:parms: deque([{'enabled': False, 'speed': 'slow', 'desc': 'gather yum debugging data'}])
:paths: ['/usr/share/rhn/', '/usr/sbin', '/usr/lib64/python26.zip', '/usr/lib64/python2.6', '/usr/lib64/python2.6/plat-linux2', '/usr/lib64/python2.6/lib-tk', '/usr/lib64/python2.6/lib-old', '/usr/lib64/python2.6/lib-dynload', '/usr/lib64/python2.6/site-packages', '/usr/lib64/python2.6/site-packages/PIL', '/usr/lib64/python2.6/site-packages/gst-0.10', '/usr/lib64/python2.6/site-packages/gtk-2.0', '/usr/lib64/python2.6/site-packages/webkit-1.0', '/usr/lib/python2.6/site-packages', '/usr/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg-info']
:console: <logging.StreamHandler instance at 0x142f248>
:logdir: '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_logs'
:conf_disable_plugins: None
:soslog: <logging.Logger instance at 0x7f620e55b5a8>
:config: <ConfigParser.ConfigParser instance at 0x1430320>
:cmddir: '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_commands'
:config_file: '/etc/sos.conf'
:optname: 'yumdebug'
:pluginpath: '/usr/lib/python2.6/site-packages/sos/plugins'
:alloptions: deque([(<sos.plugins.devicemapper.devicemapper instance at 0x161cb00>, 'devicemapper', 'lvmdump', {'enabled': False, 'speed': 'slow', 'desc': 'collect raw metadata from PVs'}), (<sos.plugins.filesys.filesys instance at 0x1630a70>, 'filesys', 'dumpe2fs', {'enabled': False, 'speed': 'slow', 'desc': 'dump filesystem information'}), (<sos.plugins.general.general instance at 0x163a050>, 'general', 'syslogsize', {'enabled': 15, 'speed': '', 'desc': 'max size (MiB) to collect per syslog file'}), (<sos.plugins.general.general instance at 0x163a050>, 'general', 'all_logs', {'enabled': False, 'speed': '', 'desc': 'collect all log files defined in syslog.conf'}), (<sos.plugins.kernel.kernel instance at 0x163ac68>, 'kernel', 'modinfo', {'enabled': True, 'speed': 'fast', 'desc': 'gathers information on all kernel modules'}), (<sos.plugins.libraries.libraries instance at 0x1645200>, 'libraries', 'ldconfigv', {'enabled': False, 'speed': 'slow', 'desc': 'the name of each directory as it is scanned, and any links that are created.'}), (<sos.plugins.networking.networking instance at 0x1645cb0>, 'networking', 'traceroute', {'enabled': False, 'speed': 'slow', 'desc': 'collects a traceroute to rhn.redhat.com'}), (<sos.plugins.rpm.rpm instance at 0x3c70758>, 'rpm', 'rpmq', {'enabled': True, 'speed': 'fast', 'desc': 'queries for package information via rpm -q'}), (<sos.plugins.rpm.rpm instance at 0x3c70758>, 'rpm', 'rpmva', {'enabled': False, 'speed': 'slow', 'desc': 'runs a verify on all packages'}), (<sos.plugins.startup.startup instance at 0x3c81128>, 'startup', 'servicestatus', {'enabled': False, 'speed': 'slow', 'desc': 'get a status of all running services'}), (<sos.plugins.yum.yum instance at 0x3c81c20>, 'yum', 'yumdebug', {'enabled': False, 'speed': 'slow', 'desc': 'gather yum debugging data'})])
:path: '/usr/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg-info'
:plugbase: 'yum.'
:plug: <sos.plugins.process.process instance at 0x3c62b48>
:plugname: 'process'
:i: (('process', <sos.plugins.process.process instance at 0x3c62b48>),)
:xmlrep: <sos.sosreport.XmlReport instance at 0x142f3f8>
:plugruncount: 13
:commons: {'cmddir': '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_commands', 'xmlreport': <sos.sosreport.XmlReport instance at 0x142f3f8>, 'logdir': '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_logs', 'soslog': <logging.Logger instance at 0x7f620e55b5a8>, 'policy': <sos.policyredhat.SosPolicy instance at 0x14303b0>, 'dstroot': '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169', 'cmdlineopts': <Values at 0x142fd88: {'diagnose': False, 'analyze': False, 'tmp_dir': '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400', 'customerName': None, 'plugopts': deque(['rpm.rpmva=off']), 'verbosity': None, 'ticketNumber': None, 'batch': True, 'upload': False, 'debug': None, 'build': False, 'listPlugins': False, 'usealloptions': False, 'onlyplugins': deque(['anaconda', 'bootloader', 'devicemapper', 'filesys', 'hardware', 'kernel', 'libraries', 'memory', 'networking', 'nfsserver', 'pam', 'process', 'rpm', 'ssh', 'startup', 'yum', 'general', 'x11']), 'report': False, 'config_file': None, 'enableplugins': deque([]), 'profiler': False, 'noplugins': deque([]), 'nocolors': True}>, 'config': <ConfigParser.ConfigParser instance at 0x1430320>, 'verbosity': None, 'rptdir': '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_reports'}
:optparm: {'enabled': False, 'speed': 'slow', 'desc': 'gather yum debugging data'}
:plugin_names: deque(['acpid', 'amd', 'anaconda', 'apache', 'auditd', 'autofs', 'bootloader', 'cluster', 'cobbler', 'corosync', 'crontab', 'cs', 'devicemapper', 'dhcp', 'dovecot', 'ds', 'emc', 'filesys', 'ftp', 'gdm', 'general', 'hardware', 'hts', 'i18n', 'initrd', 'ipa', 'ipsec', 'iscsi', 'kdump', 'kernel', 'kvm', 'ldap', 'libraries', 'libvirt', 'logrotate', 'lsbrelease', 'memory', 'mrggrid', 'mrgmessg', 'mysql', 'named', 'netdump', 'networking', 'nfsserver', 'nscd', 'ntp', 'oddjob', 'openssl', 'openswan', 'pam', 'postfix', 'ppp', 'printing', 'process', 'psacct', 'pxe', 'qpidd', 'quagga', 'radius', 'rhn', 'rpm', 's390', 'samba', 'sanitize', 'sar', 'selinux', 'sendmail', 'smartcard', 'snmp', 'soundcard', 'squid', 'ssh', 'sssd', 'startup', 'system', 'systemtap', 'tftpserver', 'tomcat', 'udev', 'veritas', 'vmware', 'x11', 'xen', 'xinetd', 'yum'])
:conf_disable_plugins_list: deque([])
:flog: <logging.FileHandler instance at 0x142f098>
:rptdir: '/var/spool/abrt/ccpp-2011-08-04-08:59:28-5400/dhcp-25-35-2011080408591312441169/sos_reports'

comment:
:(Not sure this is a problem at all.)
:
:sosreport was gathering info about a system because abrt asked for it, then I send TERM signal to sosreport (needed it to crash as I was testing something else) and found in the logs that abrt catched sosreport's crash.

Comment 1 Michal Nowak 2011-08-04 07:07:52 UTC
Package: sos-2.2-8.el6
Architecture: x86_64
OS Release: Red Hat Enterprise Linux Workstation release 6.1 (Santiago)

Comment
-----
(Not sure this is a problem at all.)

sosreport was gathering info about a system because abrt asked for it, then I send TERM signal to sosreport (needed it to crash as I was testing something else) and found in the logs that abrt catched sosreport's crash.

Comment 3 Bryn M. Reeves 2011-08-09 17:04:34 UTC
Did the directory listed actually exist at this point? It would be good to handle this better if not but I'm not clear on the circumstances of this problem as we're running under abrt

Comment 4 Michal Nowak 2011-08-10 14:44:23 UTC
(In reply to comment #3)
> Did the directory listed actually exist at this point? 

No idea.

> It would be good to
> handle this better if not but I'm not clear on the circumstances of this
> problem as we're running under abrt

Nothing obvious. It's just a corner case.

Comment 5 Bryn M. Reeves 2011-08-10 14:57:08 UTC
Best guess something removed our dstroot while sos was running - otherwise e.g. we'd have hit exceptions on the calls to os.mkdir (right after we set up logdir in sosreport.py).

I'm not sure this is a case we really need to be too concerned about - if some{one,thing} is using alternate dstroot and running sos it should take care of stopping sos before removing any paths from the file system.

Comment 6 Michal Nowak 2011-09-02 09:59:37 UTC
It seems I came across this issue once again.

`sosreport' is being executed by abrt event script with --tmp-dir option set to e.g. /var/spool/abrt/ccpp-2011-09-02-11:49:45-19415. `sosreport's run takes some time and it may happen user executes `abrt rm /var/spool/abrt/ccpp-2011-09-02-11:49:45-19415' before `sosreport' finishes.

Can this scenario lead to the mentioned error?

Comment 7 RHEL Program Management 2011-10-07 16:06:12 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 9 Bryn M. Reeves 2011-12-13 16:34:16 UTC
In reply to comment #6 this scenario is outside sos's control and you would see the same result e.g. if you pass a path in /tmp with --tmp-dir and then remove it while sos is running.

The "abrt rm" command needs to synchronise with the abrt action that's running sosreport and either wait for it to finish or kill it off.

Comment 10 Michal Nowak 2011-12-14 10:32:10 UTC
There might be a problem in ABRT but I think there also might be one in sosreport. Can it be possible to catch the exception in sosreport and exit w/o backtrace?

Comment 11 Michal Nowak 2012-03-06 13:02:58 UTC
Sorry but I need to reopen it.

I still see the problem in 6.3 and essentially every abrt test fails because it finishes before sosreport does and since tests tend to clean after themselves I found sosreport many times attempting to write to non-existing dir or completely hung.

Comment 12 Bryn M. Reeves 2012-03-06 13:25:46 UTC
Why can't abrt kill processes that it started before attempting to clean them up?

Comment 13 Michal Nowak 2012-03-06 14:30:34 UTC
Jiri, could you answer comment #12, please?

What if abrt-cli is modified not to remove $crash_dir when some process is operating on top of it unless '-f/--force' used?

This is what lsof shows when sosreport runs:

sosreport 10872 root  cwd    DIR  253,0     4096 1573722 /var/spool/abrt/ccpp-2012-03-06-09:13:48-10857

Comment 14 Jiri Moskovcak 2012-03-06 15:00:25 UTC
There seems to be a race condition when ABRT runs sos first and then decides to remove the crash directory because the crash is a duplicate (or somehow broken).

Comment 15 Michal Nowak 2012-03-06 15:31:50 UTC
What if user decides to remove the $crash_dir *before* sosreport ends? That happens a lot in our tests.

Comment 16 Jiri Moskovcak 2012-03-06 16:16:38 UTC
(In reply to comment #15)
> What if user decides to remove the $crash_dir *before* sosreport ends? That
> happens a lot in our tests.

- that's the same scenario, it doesn't matter who removes the dir..

Comment 21 Bryn M. Reeves 2012-10-18 13:36:21 UTC
OK, I can reproduce this now (it seems the TERM is really needed to trigger the exception that then causes the exception in the exception handling code.. I'd missed this when testing previously):

1. sosreport --batch
2. <CTRL>-Z
3. rm -rf $DSTROOT; kill -TERM %1; fg

Actual results:
All processes ended, cleaning up.
sosreport --batch
Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 23, in <module>
    sosreport(sys.argv[1:])
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 734, in sosreport
    error_log = open(logdir + "/sosreport-plugin-errors.txt", "a")
IOError: [Errno 2] No such file or directory: '/tmp/rhel6-vm1-2012101814331350567190/sos_logs/sosreport-plugin-errors.txt'

Expected results:
Give up when logging fails and print the real exception to the terminal.

Comment 24 Pierguido Lambri 2012-10-19 13:58:48 UTC
I tried to reproduce the error with these commands:

# sosreport --tmp-dir=/root

I removed the directory were sosreport was writing and send a TERM signal to the sosreport process.

Apart having a broken sosreport tar file, I got this backtrace:

  Completed [39/55] ...      All processes ended, cleaning up.
error copying file /var/log/cups/error_log-20121014
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/sos/plugintools.py", line 200, in doCopyFileOrDir
    tdstpath, abspath = self.__copyFile(srcpath)
  File "/usr/lib/python2.6/site-packages/sos/plugintools.py", line 237, in __copyFile
    shutil.copyfileobj(fsrc, fdst, -1)
  File "/usr/lib64/python2.6/shutil.py", line 31, in copyfileobj
    fdst.write(buf)
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 65, in exittermhandler
    doExitCode()
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 75, in doExitCode
    doExit(1)
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 86, in doExit
    sys.exit(error)
SystemExit: 1
  Completed [55/55] ...

Comment 25 Bryn M. Reeves 2012-11-09 11:22:40 UTC
Thanks Pier - this is the kind of thing I was worried about. The patches seem to make the problem a bit less likely but when you hit it it the result is worse.

I'll look at this again but I'm not sure we can really solve it without a major rewrite of the signal and exception handling that I don't think would be appropriate for sos-2.2 at this stage.

Comment 27 Bryn M. Reeves 2013-06-05 13:44:06 UTC
*** Bug 971016 has been marked as a duplicate of this bug. ***

Comment 31 Bryn M. Reeves 2014-04-02 16:06:59 UTC
This bug is being closed CANTFIX as it describes a rare corner case which requires the user to remove the temporary directory that sos is using while it is running. This is not recommended and may lead to incomplete data collection or unexpected exceptions within sos python code.

Altering the exception handling behaviour at this stage in sos-2.2 carries an unacceptable risk of regressions and due to the low impact and likelihood of this problem occurring in the wild it will not be addressed in an update to Red Hat Enterprise Linux 6.

Comment 32 Red Hat Bugzilla 2023-09-14 01:24:46 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days