Red Hat Bugzilla – Bug 1482574
jars plugin makes sos sit indefinitely
Last modified: 2018-06-08 12:59:32 EDT
Description of problem:
If I run sosreport on one box (note to self: sysdsat01 at DBAG), it never completes (the longest I waited was 70 minutes). If I add -vvv I see that it gets to setting up ipmitool, but there is no further output after that. After some fiddling (trying all the profiles until I found one that hung, and then the plugins of that profile), I found that it is the jars plugin that makes it hang.

Version-Release number of selected component (if applicable):
sos-3.4-6.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. sosreport -v

Actual results:
gets to setting up ipmitool and then produces no further output, but a lot of disk reads (as per iotop -o)

Expected results:
sosreport finishes the setup phase and then runs the plugins

Additional info:
sosreport -n jars works just fine
sosreport -o jars hangs
# sosreport -vvv -o jars
set sysroot to '/' (default)

sosreport (version 3.4)

This command will collect diagnostic and configuration information from this Red Hat Enterprise Linux system and installed applications.

An archive containing the collected information will be generated in /var/tmp/sos.sMtUxZ and may be provided to a Red Hat support representative.

Any information provided to Red Hat will be treated in accordance with the published support policies at:

  https://access.redhat.com/support/

The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [REDACTED]:
Please enter the case id that you are generating this report for []:

 Setting up archive ...
[archive:TarFileArchive] initialised empty FileCacheArchive at '/var/tmp/sos.sMtUxZ/sosreport-REDACTED-20170817173054'
[sos.sosreport:setup] executing 'sosreport -vvv -o jars'
 Setting up plugins ...

It will sit at that point, seemingly indefinitely.
The root cause is imho here:

/usr/lib/python2.7/site-packages/sos/plugins/jars.py

    jar_locations = (
        "/usr/share/java",  # common location for JARs
        "/usr/lib/java",  # common location for JARs containing native code
        "/opt",  # location for RHSCL and 3rd party software
        "/usr/local",  # used by sysadmins when installing SW locally
        "/var/lib"  # Java services commonly explode WARs there
    )

    locations = list(Jars.jar_locations)
    ..
    for location in locations:
        for dirpath, _, filenames in os.walk(location):
            <do something here>

If any of the jar_locations contains many files/dirs, os.walk can spend an arbitrarily long time there. You can try playing with it by running:

    find /var/lib | wc -l

(and the same for the other directories in the list). If one of those directories contains a very large number of files/directories, comment it out in jar_locations (just make sure the last remaining item in the list is not followed by a comma) and try re-running sosreport.

Anyway, we will definitely need to remove the "/var/lib" dir from the list, as e.g. /var/lib/pulp can have millions of files there.
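To find out which of the jar_locations directories is the oversized one without waiting for a full find to finish, a capped walk may help. What follows is only a minimal sketch and is not part of sos: the function name count_entries and the default limit of 100000 are hypothetical choices made for illustration, and the directory list is copied from the plugin excerpt above.

```python
import os

# Directories copied from jar_locations in the plugin excerpt quoted above.
JAR_LOCATIONS = (
    "/usr/share/java",
    "/usr/lib/java",
    "/opt",
    "/usr/local",
    "/var/lib",
)


def count_entries(root, limit=100000):
    """Count files and directories below root, giving up once limit is reached.

    Returns (count, truncated). The root directory itself is not counted,
    and a missing or unreadable root simply yields a count of 0.
    Hypothetical helper, not an sos API.
    """
    count = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        count += len(dirnames) + len(filenames)
        if count >= limit:
            # Stop early: this tree is already large enough to be a suspect.
            return count, True
    return count, False


if __name__ == "__main__":
    for location in JAR_LOCATIONS:
        count, truncated = count_entries(location)
        marker = "+" if truncated else ""
        print("%-18s %d%s" % (location, count, marker))
```

A trailing "+" in the output marks a directory where the walk was cut short at the limit, which makes that directory the likely candidate for the hang.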
This needs to go to z-stream. Without that, e.g. Satellite6 having millions of files under /var/lib/pulp would not be able to run sosreport (in a reasonable time). Workaround - disable jars plug-in. (please update the KCS with other products affected)
*** Bug 1483397 has been marked as a duplicate of this bug. ***
You can avoid this for a single run by disabling jars with -n:

    # sosreport -n jars

Or persistently by adding a line to the 'plugins' section of /etc/sos.conf:

    [plugins]
    disable = jars

(use a comma-separated list if you wish to disable multiple plugins).
Upstream PR: https://github.com/sosreport/sos/pull/1077

Steve, could you pls. pm_ack for 7.5? (In fact we would like to get this into 7.4.z as well, for which we need pm_ack now.)
Reproducer steps for QE:
- have a system with millions of files/dirs under /var/lib (or /opt or /usr/local)
  - an example is a Satellite6 with many synchronized repositories, which puts all files (RPMs, repo metadata etc.) under /var/lib/pulp
- run sosreport (with the jars plug-in enabled)

If necessary, I can provide a reproducer machine for verification.
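If no Satellite6 is at hand, a large tree can presumably be generated artificially for the first step. The following is a rough sketch under stated assumptions: populate is a hypothetical helper, the directory and file naming scheme is invented for illustration, and the target directory passed in is left to the tester (it should be a scratch directory under one of the scanned locations, on a test machine only).

```python
import os


def populate(root, n_dirs=1000, files_per_dir=1000):
    """Create n_dirs subdirectories under root, each holding files_per_dir empty files.

    Returns the number of files created or touched. Hypothetical helper for
    building a reproducer tree; the defaults give one million files.
    """
    created = 0
    for d in range(n_dirs):
        subdir = os.path.join(root, "dir%06d" % d)
        if not os.path.isdir(subdir):
            os.makedirs(subdir)
        for f in range(files_per_dir):
            path = os.path.join(subdir, "file%06d" % f)
            # Opening in append mode creates the file if it is missing
            # and leaves any existing file untouched.
            open(path, "a").close()
            created += 1
    return created
```

Note that one million empty files still consume inodes, so the filesystem holding the scratch directory needs enough free inodes before this is run.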
(In reply to Pavel Moravec from comment #6)

This is from a lightly loaded Sat6. Few hosts are attached to it, but a lot of software is synced to it. Give a shout if you want me to check a couple more (I have one prod Sat I can run the find /var/lib on, and I have a staging Sat at this customer).

[...]
> You can try playing with it via running:
>
> find /var/lib | wc -l
>
> (and the same for other directories from the list)

# find /usr/share/java | wc -l
812
# find /usr/lib/java | wc -l
3
# find /opt | wc -l
61069
# find /usr/local | wc -l
38
# time find /var/lib | wc -l
11639625

real    45m53.524s
user    0m31.590s
sys     4m18.416s

So it seems indeed that the roughly 11.6 million files under /var/lib of this Satellite are the problem.
*** Bug 1486377 has been marked as a duplicate of this bug. ***
*** Bug 1495872 has been marked as a duplicate of this bug. ***
Posted to upstream via https://github.com/sosreport/sos/commit/6fc42802b87f95dba1d6bfda49ae158143e7799c
Fixed via sos 3.5 rebase.
*** Bug 1530401 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:0963