Bug 1597532
| Summary: | "AttributeError: module 'stat' has no attribute 'ISBLK'" seen on ppc64le | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Pavel Moravec <pmoravec> | ||||
| Component: | sos | Assignee: | Pavel Moravec <pmoravec> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Miroslav Hradílek <mhradile> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 7.6 | CC: | agk, bmr, gavin, mhradile, plambri, pmoravec, salmy, sbradley | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | sos-3.6-3.el7 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1614953 (view as bug list) | Environment: | |||||
| Last Closed: | 2018-10-30 10:33:42 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1614953 | ||||||
| Attachments: |
|
||||||
Reproducer of the current failure:
# have rhn as symlink
mv /etc/sysconfig/rhn /etc/sysconfig/rhn.target
ln -s /etc/sysconfig/rhn.target /etc/sysconfig/rhn
# have a symlink pointing "into" the rhn symlink
date > /etc/sysconfig/rhn/rhn.link.repo
ln -s /etc/sysconfig/rhn/rhn.link.repo /etc/yum.repos.d/rhn.link.repo
# call sos for 2 relevant plugins:
sosreport -o yum,system --batch --build
returns in sos_logs/system-plugin-errors.txt :
Traceback (most recent call last):
File "/root/sos-master/sos/sosreport.py", line 1104, in collect_plugin
plug.collect()
File "/root/sos-master/sos/plugins/__init__.py", line 947, in collect
self._collect_copy_specs()
File "/root/sos-master/sos/plugins/__init__.py", line 905, in _collect_copy_specs
self._do_copy_path(path)
File "/root/sos-master/sos/plugins/__init__.py", line 435, in _do_copy_path
self._copy_dir(srcpath)
File "/root/sos-master/sos/plugins/__init__.py", line 381, in _copy_dir
self._do_copy_path(os.path.join(srcpath, afile), dest=None)
File "/root/sos-master/sos/plugins/__init__.py", line 431, in _do_copy_path
self._copy_symlink(srcpath)
File "/root/sos-master/sos/plugins/__init__.py", line 353, in _copy_symlink
self.archive.add_link(reldest, dstpath)
File "/root/sos-master/sos/archive.py", line 315, in add_link
dest = self._check_path(link_name, P_LINK)
File "/root/sos-master/sos/archive.py", line 224, in _check_path
raise ValueError(ve_msg % (dest, "symbolic link"))
ValueError: path '/var/tmp/sos.qQBlL1/sosreport-pmoravec-sat63-2018-08-28-etvyibz/etc/sysconfig/rhn' exists and is not a symbolic link
So we shall have: etc/sysconfig/rhn: symbolic link to `rhn.target' etc/sysconfig/rhn.target: directory while we have etc/sysconfig/rhn a directory. So somewhere at https://github.com/sosreport/sos/blob/master/sos/plugins/__init__.py#L353 we have to detect this (maybe at archive.py#L315 ?) and: - move the dest (etc/sysconfig/rhn) to reldest (etc/sysconfig/rhn.target) - call add_link properly But this requires nontrivial change in archive.py :-/ (In reply to Pavel Moravec from comment #8) > So we shall have: > > etc/sysconfig/rhn: symbolic link to `rhn.target' > etc/sysconfig/rhn.target: directory > > while we have etc/sysconfig/rhn a directory. So somewhere at > > https://github.com/sosreport/sos/blob/master/sos/plugins/__init__.py#L353 > > we have to detect this (maybe at archive.py#L315 ?) and: > > - move the dest (etc/sysconfig/rhn) to reldest (etc/sysconfig/rhn.target) > - call add_link properly > > But this requires nontrivial change in archive.py :-/ .. (or dont fix consequences, but cause? when creating the rhn as directory, check if it isnt symlink, first) (In reply to Pavel Moravec from comment #9) > (In reply to Pavel Moravec from comment #8) > > So we shall have: > > > > etc/sysconfig/rhn: symbolic link to `rhn.target' > > etc/sysconfig/rhn.target: directory > > > > while we have etc/sysconfig/rhn a directory. So somewhere at > > > > https://github.com/sosreport/sos/blob/master/sos/plugins/__init__.py#L353 > > > > we have to detect this (maybe at archive.py#L315 ?) and: > > > > - move the dest (etc/sysconfig/rhn) to reldest (etc/sysconfig/rhn.target) > > - call add_link properly > > > > But this requires nontrivial change in archive.py :-/ > > .. (or dont fix consequences, but cause? when creating the rhn as directory, > check if it isnt symlink, first) That wont work, most probably. Described better on https://github.com/sosreport/sos/issues/1404 There are two things happening here (that are only possible due to the threadpool patches): - Automatic leading path creation is assuming that all non-basename path components are directories and not symlinks - The code that ends up adding /etc/sysconfig from the system plugin is trying to create the /etc/sysconfig/rhn symlink, and notices that it already exists but as a directory Historically this pattern was not a problem since the natural ordering of plugins meant that system always executed before yum: since that's no longer guaranteed with the threadpool changes it opens up a race: if yum runs before system we hit the problem during the tree copy of /etc/sysconfig. I am reluctant to make large changes here at this stage of development: a workaround may be the best we can manage for 7.6. Very simple (non-pep8 compliant) workaround:
diff --git a/sos/archive.py b/sos/archive.py
index 5d99170f..1ce349a4 100644
--- a/sos/archive.py
+++ b/sos/archive.py
@@ -220,7 +220,7 @@ class FileCacheArchive(Archive):
ve_msg = "path '%s' exists and is not a %s"
if path_type == P_FILE and not stat.S_ISREG(st.st_mode):
raise ValueError(ve_msg % (dest, "regular file"))
- if path_type == P_LINK and not stat.S_ISLNK(st.st_mode):
+ if path_type == P_LINK and not stat.S_ISLNK(st.st_mode) and not stat.S_ISDIR(st.st_mode):
raise ValueError(ve_msg % (dest, "symbolic link"))
if path_type == P_NODE and not is_special(st.st_mode):
raise ValueError(ve_msg % (dest, "special file"))
This just acknowledges the bug in the symlink leading path handling, and says that it's OK if the path exists and is a real directory. It allows both plugins to continue and collect data, but it's not a real solution: the main problem is that you can end up with two copies of the data, one that has the symlink "flattened" as a directory, and a second that has the name of the symlink target (rhn vs. rhn.target in Pavel's reproducer):
# ls sosreport-localhost-2018-08-29-gibskns/etc/sysconfig/rhn.target/
rhn.link.repo
# ls sosreport-localhost-2018-08-29-gibskns/etc/sysconfig/rhn
rhn.link.repo
This isn't the worst thing that sos has ever done, and at this stage it may be our best solution. I understand the real problem now, but a fix is going to be considerably more complex than this patch.
I am, as a QA, ok with the workaround. The first problem is that we do not properly honour a directory that is
actually a symbolic link in all cases: if add_copy_spec() is given a
path that is _under_ a symlinked directory, we flatten that out to an
actual directory of the same name:
/etc/rhn -> /etc/rhn.target
vs: /etc/rhn
This only causes problems when we later perform a tree copy into the
same target path: this time, the code looks to see if the object in the
archive matches the object that we are trying to copy in, and as this
time the original is a symlink but the object in the archive is a real
directory, we raise the error seen.
I don't especially like the proposed workaround: it's not an accurate
reflection of the host file system and it introduces ambiguity into the
report (which path shall I use? rhn? rhn.target?).
I have some changes that I'm working on now that replace the original
sos.archive.Archive._makedirs() method (a simple call to the Python
os.makedirs()) with a custom implementation that walks the path to the
destination and ensures that all objects are present and of the type
found in the main file system (iow we will create empty dirs as symlink
targets when needed). I'm not hugely fond of this approach, either, but
it seems to be the smallest set of changes needed to give the correct
behaviour here without the kind of side-effect the workaround causes.
I have this working now for the base case: it still exhibits the same
bug WRT symbolic links but I am hoping to be able to add that later
today or in the morning and to start more thorough testing.
I would prefer this to be the version that we ship in 7.6 because I
think it will be less confusing and more robust but I appreciate we are
running out of time (although that said, I am inclined to think this bug
warrants blocker+).
For upstream I think we need to have a serious look at how we are
managing file copying between the Plugin class and the Archive: right
now the division of labour is not well-defined, and following the logic
is confusing and a source of bugs.
Btw, are we filing a new bug for this? It really has nothing at all to do with bug 1597532.
Created attachment 1479838 [details]
Create leading path components in sos.archive.Archive
Work-in-progress patch to replace Archive._makedirs() with Archive._make_leading_paths() to allow for the option of handling symlinked directories.
For this to fix the case in Pavel's reproducer it needs to be extended to check when the source path is a symbolic link, and to both create the target directory, and referring symlink during the call (and propagate permissions etc. as needed).
(In reply to Bryn M. Reeves from comment #14) > Btw, are we filing a new bug for this? It really has nothing at all to do > with bug 1597532. Separated to https://bugzilla.redhat.com/show_bug.cgi?id=1624043 . Moving this BZ back to ON_QA to deal that finding in the new BZ. (I just can't reset Verified=FailedQA /o\ ) Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:3144 |
Description of problem: On a ppc64le arch, below error was raised: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1088, in collect_plugin plug.collect() File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 947, in collect self._collect_copy_specs() File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 905, in _collect_copy_specs self._do_copy_path(path) File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 454, in _do_copy_path self.archive.add_file(srcpath, dest) File "/usr/lib/python2.7/site-packages/sos/archive.py", line 243, in add_file dest = self._check_path(dest, P_FILE) File "/usr/lib/python2.7/site-packages/sos/archive.py", line 228, in _check_path print "dest=%s path_type=%s (is %s?) special=%s mode=%s" %(dest, path_type, P_NODE, is_special(st.st_mode), st.st_mode) File "/usr/lib/python2.7/site-packages/sos/archive.py", line 208, in is_special stat.ISBLK(mode), AttributeError: 'module' object has no attribute 'ISBLK' by multiple plugins (kernel, logs, networking, powerpc, systemd,..) when running on many threads. That bug is captured in: https://github.com/sosreport/sos/issues/1373 so it is just a matter of backporting the typo. Version-Release number of selected component (if applicable): sos-3.6-2 How reproducible: 100% on ppc64le system I have Steps to Reproduce: 1. sosreport --threads=30 --batch --build (optionally enable all plugins and add "-a" option to collect as much data as possible - whenever collecting same dir or special file by 2 plugins, youshould hit it - no idea why ppc64le exhibits the error every time while x86_64 does not) 2. check output Actual results: errors like: caught exception in plugin method "systemd.collect()" writing traceback to sos_logs/systemd-plugin-errors.txt Expected results: no such errors Additional info: