Bug 1085042 - sosreport will fail with "no space left" error, but it appears to generate a valid sosreport anyway
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sos
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Bryn M. Reeves
QA Contact: David Kutálek
URL:
Whiteboard:
Depends On:
Blocks: 1089003
Reported: 2014-04-07 16:10 UTC by Wesley Duffee-Braun
Modified: 2018-12-09 17:43 UTC (History)

Fixed In Version: sos-2.2-53.el6
Doc Type: Bug Fix
Doc Text:
Cause: Previous versions of sos failed to correctly handle file system exceptions resulting from out of space conditions. Consequence: Running sosreport with insufficient space could lead to thousands of errors being logged and the creation of an unusable report tarball. Fix: All IO paths in sosreport now correctly handle out-of-space and other fatal file system exceptions. Result: Attempting to run sos with insufficient space now results in an immediate descriptive error and the tool will not attempt to create a report archive.
Clone Of:
Clones: 1089003
Environment:
Last Closed: 2014-10-14 07:23:14 UTC
Target Upstream Version:
Embargoed:


Attachments
Check tar exit status and fail if non-zero (3.29 KB, patch)
2014-04-07 18:41 UTC, Bryn M. Reeves
Check tar exit status and fail if non-zero (2.35 KB, patch)
2014-04-07 18:48 UTC, Bryn M. Reeves
Improve error reporting on mkdir failure (1.68 KB, patch)
2014-04-07 19:43 UTC, Bryn M. Reeves
Abort run on fatal IO exceptions (1.58 KB, patch)
2014-04-07 19:46 UTC, Bryn M. Reeves
Raise fatal IO exceptions in PluginBase (1.16 KB, patch)
2014-04-07 19:47 UTC, Bryn M. Reeves


Links
System: Red Hat Product Errata
ID: RHBA-2014:1528
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: sos bug fix and enhancement update
Last Updated: 2014-10-14 01:22:00 UTC

Description Wesley Duffee-Braun 2014-04-07 16:10:51 UTC
Description of problem:
When the sosreport process attempts to create the compressed archive, but no space is left on the device, the following error is generated:

/usr/bin/xz: (stdout): Write error: No space left on device
/bin/tar: -: Wrote only 8192 of 10240 bytes
/bin/tar: Error is not recoverable: exiting now

However, the sosreport process continues and displays the following:

Your sosreport has been generated and saved in:
/tmp/sosreport-name.01065558-20140407112648-cb2a.tar.xz
The md5sum is: 0867e79613336b249d2abaaeec80cb2a

This gives the impression that the sosreport has completed successfully. It would be helpful to stop the creation process when /bin/tar exits and show the error instead of continuing on and creating an unusable sosreport.

Version-Release number of selected component (if applicable):
sos-2.2-38.el6_4.2.noarch

How reproducible:
Always

Steps to Reproduce:
1. Attempt to create a sosreport with a target file in a location that doesn't have enough space to hold the created file

Actual results:
sosreport keeps running and even reports that the sosreport has been generated and saved, along with an md5sum.

Expected results:
sosreport exits with an error about the space issue.

Additional info:
I recognize that you cannot know beforehand about the space required (per BZ 399311) but once it is clear that the sosreport will not be valid, letting the process exit with an error is preferable to the current output.

Comment 2 Bryn M. Reeves 2014-04-07 16:25:32 UTC
In that situation we should bail with an error message when finalizing the archive fails. This should also leave the temporary tree in place to preserve diagnostic data.

I've filed an upstream issue for this because, although this part of the code has been completely rewritten, I suspect it still fails in unhelpful ways on ENOSPC:

https://github.com/sosreport/sos/issues/266

Comment 3 Bryn M. Reeves 2014-04-07 16:29:04 UTC
Btw, bug 399311 was closed WONTFIX without an explanation by another contributor. Although we're unlikely to change that behaviour in sos-2.2 (RHEL6) there's no problem with our revisiting the problem upstream and considering it for future releases (we have a couple of other situations that could be helped by a two-pass 'mark-and-sweep' style approach to file collection).
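
For illustration only, a two-pass approach of the kind mentioned above could look roughly like the sketch below. It is not sos code: the function names estimate_size and collect are hypothetical, it assumes Python 3.3 or later for shutil.disk_usage (sos-2.2 runs on Python 2.6), and it only totals raw file sizes, so it says nothing about the space the compressed archive needs.

```python
import os
import shutil


def estimate_size(paths):
    """First pass ('mark'): total the size in bytes of the regular files to copy."""
    total = 0
    for path in paths:
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total


def collect(paths, dest_dir):
    """Second pass ('sweep'): copy the files only if the estimate fits in free space."""
    needed = estimate_size(paths)
    free = shutil.disk_usage(dest_dir).free
    if needed > free:
        # Fail before writing anything instead of part-way through collection.
        raise OSError("need %d bytes but only %d free in %s"
                      % (needed, free, dest_dir))
    for path in paths:
        if os.path.isfile(path):
            shutil.copy2(path, dest_dir)
    return needed
```

The estimate is only a lower bound: files can grow between the two passes, and command output collected by plugins has no size until it is run.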

Comment 4 Wesley Duffee-Braun 2014-04-07 16:51:52 UTC
Hi Bryn,

I could see the benefit of reopening 399311, but for this BZ the customer would be happy with the bail+error described in Comment #2. Right now we are wasting cycles as the customer will upload what appears to be a good sosreport, and we don't know it is invalid until I pull it down and see that it errors on opening. 

Thanks!
Wesley


(In reply to Bryn M. Reeves from comment #3)
> Btw, bug 399311 was closed WONTFIX without an explanation by another
> contributor. Although we're unlikely to change that behaviour in sos-2.2
> (RHEL6) there's no problem with our revisiting the problem upstream and
> considering it for future releases (we have a couple of other situations
> that could be helped by a two-pass 'mark-and-sweep' style approach to file
> collection).

Comment 5 Bryn M. Reeves 2014-04-07 17:25:43 UTC
There are several problems here:

* failure to gracefully handle ENOSPC coming out of 
  Plugin.collect()->copyStuff()->doCopyFileOrDir():

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/sos/plugintools.py", line 241, in doCopyFileOrDir
    tdstpath, abspath = self.__copyFile(srcpath)
  File "/usr/lib/python2.6/site-packages/sos/plugintools.py", line 270, in __copyFile
    os.makedirs(new_dir)
  File "/usr/lib64/python2.6/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 28] No space left on device: '/tmp/rhel6-vm1-2014040718191396891192/etc/sysconfig'

* Similar problem in log flush path:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/logging/__init__.py", line 774, in emit
    self.flush()
  File "/usr/lib64/python2.6/logging/__init__.py", line 746, in flush
    self.stream.flush()
IOError: [Errno 28] No space left on device

* The tar error reported in comment #0.

The last is actually the hardest to hit; you have to have just enough space for the temporary archive to fit but not enough space for the temporary archive _and_ the compressed copy to co-exist.

Comment 6 Bryn M. Reeves 2014-04-07 17:29:30 UTC
> I could see the benefit of reopening 399311

Like I said; it's unlikely to be suitable for a RHEL6 update anyway but we could have at least used that bug to track some better error checking in 2.2. That's what I intend to use this bz for.

Comment 7 Bryn M. Reeves 2014-04-07 18:11:36 UTC
Steps to Reproduce:
0. It helps to start with a near-full file system

1. generate a first sosreport, e.g.
   # sosreport -v --batch -n rpm 2>&1 | tee /boot/sos.log

2. extract the resulting tarball:
   # tar xf sosreport-*.tar.xz

3. Remove the tar archive
   # rm -f sosreport-*.tar.xz

4. Fill the file system completely
   # dd if=/dev/zero of=filler bs=128k

5. Remove the sosreport directory tree
   # rm -rf sosreport-*/

6. generate a second sosreport with the same parameters
   # sosreport -v --batch -n rpm 2>&1 | tee /boot/sos.log

This should reliably reproduce the finalizing problem without going too far off into the weeds on the first two.

Comment 8 Bryn M. Reeves 2014-04-07 18:41:19 UTC
Created attachment 883724 [details]
Check tar exit status and fail if non-zero

Comment 9 Bryn M. Reeves 2014-04-07 18:48:35 UTC
Created attachment 883725 [details]
Check tar exit status and fail if non-zero

Remove a stray patch hunk from the previous attachment. This patch fixes the broken finalizing behaviour; sos now exits and leaves the temporary directory tree in place:

Traceback (most recent call last):
  File "/usr/lib64/python2.6/logging/__init__.py", line 774, in emit
    self.flush()
  File "/usr/lib64/python2.6/logging/__init__.py", line 746, in flush
    self.stream.flush()
IOError: [Errno 28] No space left on device
error collection output of 'yum -C repolist', traceback follows:

Creating compressed archive...

Failed to create compressed archive.

  sosreport build tree is located at : /tmp/rhel6-vm1-2014040719351396895750
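
The behaviour shown above (check the archiver's exit status, report the failure, keep the build tree) can be sketched as follows. This is a minimal illustration and not the attached patch: create_archive and finalize are hypothetical names, and the command is passed in by the caller rather than taken from sos.

```python
import subprocess
import sys


def create_archive(tar_cmd):
    """Run the archive command; return True only if it exited with status 0."""
    status = subprocess.call(tar_cmd)
    return status == 0


def finalize(tar_cmd, build_tree):
    """Report failure and leave the build tree alone if archiving fails."""
    if not create_archive(tar_cmd):
        sys.stderr.write("Failed to create compressed archive.\n")
        sys.stderr.write("sosreport build tree is located at : %s\n" % build_tree)
        # Non-zero return so the caller can exit with an error; nothing is removed.
        return 1
    return 0
```

A caller would pass the result to sys.exit() so that scripts wrapping the tool also see the failure.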

Comment 10 Bryn M. Reeves 2014-04-07 19:43:18 UTC
Created attachment 883755 [details]
Improve error reporting on mkdir failure

Comment 11 Bryn M. Reeves 2014-04-07 19:46:30 UTC
Created attachment 883756 [details]
Abort run on fatal IO exceptions

Abort the run if a plugin hits a fatal write IO exception (EROFS, ENOSPC). Do not clean up the temporary tree.
    
Signed-off-by: Bryn M. Reeves <bmr>
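
As a rough sketch of the idea in this patch description (the patch itself is in the attachment), fatal write errors can be separated from recoverable ones by errno and escalated so the run stops at the first one. FatalIOError and make_dir are hypothetical names; only ENOSPC and EROFS come from the description above.

```python
import errno
import os

# Errors after which further writes cannot succeed (named in the patch description).
FATAL_ERRNOS = (errno.ENOSPC, errno.EROFS)


class FatalIOError(Exception):
    """Hypothetical exception used to abort the whole run."""


def make_dir(path):
    """Create a directory, escalating fatal write errors to FatalIOError."""
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno in FATAL_ERRNOS:
            raise FatalIOError("%s while creating %s"
                               % (os.strerror(e.errno), path))
        if e.errno != errno.EEXIST:
            # Other errors are not treated as fatal here; re-raise unchanged.
            raise
```

The top-level loop would catch FatalIOError once, print the message, and exit without cleaning up the temporary tree, rather than logging the same error for every file.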

Comment 12 Bryn M. Reeves 2014-04-07 19:47:21 UTC
Created attachment 883757 [details]
Raise fatal IO exceptions in PluginBase

Comment 20 Bryn M. Reeves 2014-06-17 17:15:13 UTC
commit afdf2edaffc89fab80f8f2ac79280ffe91346cbe
Author: Bryn M. Reeves <bmr>
Date:   Mon Apr 7 20:31:44 2014 +0100

    Raise fatal IO exceptions in PluginBase
    
    Signed-off-by: Bryn M. Reeves <bmr>

commit 4ef8b6883032a540e7744c2c48c973fc7fcd3cf4
Author: Bryn M. Reeves <bmr>
Date:   Mon Apr 7 20:30:05 2014 +0100

    Abort run on fatal IO exceptions
    
    Abort the run if a plugin hits a fatal write IO exception (EROFS,
    ENOSPC). Do not clean up the temporary tree.
    
    Signed-off-by: Bryn M. Reeves <bmr>

commit 5cf0e7a830cce8988e0def4714f21be2426bfdeb
Author: Bryn M. Reeves <bmr>
Date:   Mon Apr 7 20:29:09 2014 +0100

    Improve error reporting on mkdir failure
    
    Signed-off-by: Bryn M. Reeves <bmr>

commit 07f08e997bc794bd717bb87f10b79c65497c3243
Author: Bryn M. Reeves <bmr>
Date:   Mon Apr 7 20:27:01 2014 +0100

    Check tar exit status when creating archive
    
    Check the exit status of the tar process and abort without
    cleaning up if non-zero.
    
    Signed-off-by: Bryn M. Reeves <bmr>

Comment 23 errata-xmlrpc 2014-10-14 07:23:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1528.html

