Bug 1085042
Summary: | sosreport will fail with "no space left" error, but it appears to generate a valid sosreport anyway | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Wesley Duffee-Braun <wduffee> | |
Component: | sos | Assignee: | Bryn M. Reeves <bmr> | |
Status: | CLOSED ERRATA | QA Contact: | David Kutálek <dkutalek> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 6.4 | CC: | agk, bmr, dkutalek, gavin, rlavande, sacpatil, wduffee | |
Target Milestone: | rc | Keywords: | UserExperience | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | sos-2.2-53.el6 | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
Previous versions of sos failed to correctly handle file system exceptions resulting from out of space conditions.
Consequence:
Running sosreport with insufficient space could lead to thousands of errors being logged and the creation of an unusable report tarball.
Fix:
All IO paths in sosreport now correctly handle out-of-space and other fatal file system exceptions.
Result:
Attempting to run sos with insufficient space now results in an immediate descriptive error and the tool will not attempt to create a report archive.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1089003 (view as bug list) | Environment: | ||
Last Closed: | 2014-10-14 07:23:14 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1089003 | |||
Attachments: |
Description
Wesley Duffee-Braun
2014-04-07 16:10:51 UTC
In that situation we should bail with an error message when finalizing the archive fails. This should also leave the temporary tree in place to preserve diagnostic data.

I've filed an upstream issue for this as although this part of the code has been completely rewritten I suspect it still fails in unhelpful ways on ENOSPC:

https://github.com/sosreport/sos/issues/266

Btw, bug 399311 was closed WONTFIX without an explanation by another contributor. Although we're unlikely to change that behaviour in sos-2.2 (RHEL6) there's no problem with our revisiting the problem upstream and considering it for future releases (we have a couple of other situations that could be helped by a two-pass 'mark-and-sweep' style approach to file collection).

Hi Bryn,

I could see the benefit of reopening 399311, but for this BZ the customer would be happy with the bail+error described in Comment #2. Right now we are wasting cycles as the customer will upload what appears to be a good sosreport, and we don't know it is invalid until I pull it down and see that it errors on opening.

Thanks!
Wesley

(In reply to Bryn M. Reeves from comment #3)
> Btw, bug 399311 was closed WONTFIX without an explanation by another
> contributor. Although we're unlikely to change that behaviour in sos-2.2
> (RHEL6) there's no problem with our revisiting the problem upstream and
> considering it for future releases (we have a couple of other situations
> that could be helped by a two-pass 'mark-and-sweep' style approach to file
> collection).
There are several problems here:

* failure to gracefully handle ENOSPC coming out of Plugin.collect()->copyStuff()->doCopyFileOrDir():

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/sos/plugintools.py", line 241, in doCopyFileOrDir
    tdstpath, abspath = self.__copyFile(srcpath)
  File "/usr/lib/python2.6/site-packages/sos/plugintools.py", line 270, in __copyFile
    os.makedirs(new_dir)
  File "/usr/lib64/python2.6/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 28] No space left on device: '/tmp/rhel6-vm1-2014040718191396891192/etc/sysconfig'

* Similar problem in log flush path:

Traceback (most recent call last):
  File "/usr/lib64/python2.6/logging/__init__.py", line 774, in emit
    self.flush()
  File "/usr/lib64/python2.6/logging/__init__.py", line 746, in flush
    self.stream.flush()
IOError: [Errno 28] No space left on device

* The tar error reported in comment #0.

The last is actually the hardest to hit; you have to have just enough space for the temporary archive to fit but not enough space for the temporary archive _and_ the compressed copy to co-exist.

> I could see the benefit of reopening 399311
Like I said; it's unlikely to be suitable for a RHEL6 update anyway but we could have at least used that bug to track some better error checking in 2.2. That's what I intend to use this bz for.
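For illustration, the first failure mode listed above (ENOSPC escaping from os.makedirs() while copying files) could be caught and reclassified roughly as follows. This is a minimal sketch in current Python, not the sos-2.2 code: `FatalIOError`, `FATAL_ERRNOS` and `make_dirs` are hypothetical names, and treating ENOSPC and EROFS as the fatal set is an assumption based on the discussion in this bug.

```python
import errno
import os

# Assumption: these errnos mean no further writes can succeed,
# so the whole run should stop rather than log one error per file.
FATAL_ERRNOS = (errno.ENOSPC, errno.EROFS)


class FatalIOError(Exception):
    """Hypothetical exception: collection cannot usefully continue."""


def make_dirs(path):
    """Create path, converting out-of-space style failures to FatalIOError."""
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno == errno.EEXIST and os.path.isdir(path):
            return  # directory already present: nothing to do
        if e.errno in FATAL_ERRNOS:
            raise FatalIOError("cannot create %s: %s"
                               % (path, os.strerror(e.errno)))
        raise  # any other error is left for the caller to handle
```

A caller higher up can then catch `FatalIOError` once and stop, instead of every copy operation producing its own traceback.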
Steps to Reproduce:

0. It helps to start with a near-full file system
1. generate a first sosreport, e.g.
   # sosreport -v --batch -n rpm 2>&1 | tee /boot/sos.lg
2. extract the resulting tarball:
   # tar xf sosreport-*.tar.xz
3. Remove the tar archive
   # rm -f sosreport-*.tar.xz
4. Fill the file system completely
   # dd if=/dev/zero of=filler bs=128k
5. Remove the sosreport directory tree
   # rm -rf sosreport-*/
6. generate a second sosreport with the same parameters
   # sosreport -v --batch -n rpm 2>&1 | tee /boot/sos.log

This should reliably reproduce the finalizing problem without going too far off into the weeds on the first two.

Created attachment 883724 [details]
Check tar exit status and fail if non-zero
Created attachment 883725 [details]
Check tar exit status and fail if non-zero
Remove a stray patch hunk from the previous attachment.

This patch fixes the broken finalizing behaviour; sos will now exit and leave the temporary directory tree in place:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/logging/__init__.py", line 774, in emit
    self.flush()
  File "/usr/lib64/python2.6/logging/__init__.py", line 746, in flush
    self.stream.flush()
IOError: [Errno 28] No space left on device
error collection output of 'yum -C repolist', traceback follows:
Creating compressed archive...
Failed to create compressed archive.
sosreport build tree is located at : /tmp/rhel6-vm1-2014040719351396895750
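The idea behind the attachment (check the exit status of tar and fail if it is non-zero) can be sketched as below. This is an illustrative reconstruction in current Python, not the attached patch: `ArchiveError`, `run_checked` and `create_archive` are hypothetical names, and the tar options shown are only one plausible invocation.

```python
import subprocess


class ArchiveError(Exception):
    """Hypothetical exception: the archive command did not succeed."""


def run_checked(cmd):
    """Run cmd and raise ArchiveError if its exit status is non-zero."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    if proc.returncode != 0:
        raise ArchiveError("%s exited with status %d: %s" % (
            cmd[0], proc.returncode,
            output.decode("utf-8", "replace").strip()))
    return output


def create_archive(tar_path, parent_dir, tree_name):
    """Archive parent_dir/tree_name into tar_path.

    The source tree is never removed here, so it survives a failure.
    """
    run_checked(["tar", "-cf", tar_path, "-C", parent_dir, tree_name])
    return tar_path
```

A caller that catches `ArchiveError` can print a message and the location of the build tree, as in the output shown above, and exit without cleaning up.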
Created attachment 883755 [details]
Improve error reporting on mkdir failure
Created attachment 883756 [details]
Abort run on fatal IO exceptions
Abort the run if a plugin hits a fatal write IO exception (EROFS, ENOSPC). Do not clean up the temporary tree.
Signed-off-by: Bryn M. Reeves <bmr>
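The behaviour described in this commit message (abort on EROFS or ENOSPC and keep the temporary tree) might look roughly like the following. It is a sketch under stated assumptions rather than the attached patch: plugins are modelled as plain callables, `run_plugins` and `FATAL_ERRNOS` are hypothetical names, and removing the tree on success merely stands in for the normal archive-and-clean-up step.

```python
import errno
import shutil
import sys

# Errnos named in the commit message as fatal write IO exceptions.
FATAL_ERRNOS = (errno.EROFS, errno.ENOSPC)


def run_plugins(plugins, build_dir):
    """Call each plugin with build_dir; return False if the run was aborted.

    On a fatal IO error the run stops at once and build_dir is kept so
    that whatever was collected can still be inspected.
    """
    for plugin in plugins:
        try:
            plugin(build_dir)
        except OSError as e:  # IOError is an alias of OSError on Python 3
            if e.errno in FATAL_ERRNOS:
                sys.stderr.write("fatal IO error: %s\n" % e)
                sys.stderr.write("build tree is located at: %s\n" % build_dir)
                return False  # abort without cleaning up
            # Non-fatal errors are reported and the run continues.
            sys.stderr.write("plugin error: %s\n" % e)
    shutil.rmtree(build_dir)  # stand-in for archive creation and cleanup
    return True
```

Returning a flag instead of exiting keeps the sketch easy to test; a real tool would more likely exit with a non-zero status at that point.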
Created attachment 883757 [details]
Raise fatal IO exceptions in PluginBase
commit afdf2edaffc89fab80f8f2ac79280ffe91346cbe
Author: Bryn M. Reeves <bmr>
Date:   Mon Apr 7 20:31:44 2014 +0100

    Raise fatal IO exceptions in PluginBase

    Signed-off-by: Bryn M. Reeves <bmr>

commit 4ef8b6883032a540e7744c2c48c973fc7fcd3cf4
Author: Bryn M. Reeves <bmr>
Date:   Mon Apr 7 20:30:05 2014 +0100

    Abort run on fatal IO exceptions

    Abort the run if a plugin hits a fatal write IO exception (EROFS, ENOSPC). Do not clean up the temporary tree.

    Signed-off-by: Bryn M. Reeves <bmr>

commit 5cf0e7a830cce8988e0def4714f21be2426bfdeb
Author: Bryn M. Reeves <bmr>
Date:   Mon Apr 7 20:29:09 2014 +0100

    Improve error reporting on mkdir failure

    Signed-off-by: Bryn M. Reeves <bmr>

commit 07f08e997bc794bd717bb87f10b79c65497c3243
Author: Bryn M. Reeves <bmr>
Date:   Mon Apr 7 20:27:01 2014 +0100

    Check tar exit status when creating archive

    Check the exit status of the tar process and abort without cleaning up if non-zero.

    Signed-off-by: Bryn M. Reeves <bmr>

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1528.html