Description of problem:

While running sosreport on RHELOSP 8, the exception below is caught:

<snip>
  Running 43/103: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 44/103: lsbrelease...  caught exception in plugin method "lsbrelease.collect()"
writing traceback to sos_logs/lsbrelease-plugin-errors.txt
  Running 103/103: yum...
</snip>

Version-Release number of selected component (if applicable):
sos-3.2-36.el7ost.2

Steps to Reproduce:
1. Configure the rhos-release 8 latest puddle:

   # rhos-release 8 -p 2016-03-10.1
   Installed: /etc/yum.repos.d/rhos-release-rhel-7.2.repo
   Installed: /etc/yum.repos.d/rhos-release-8.repo

2. Run sosreport.

Actual results:

# sosreport

sosreport (version 3.2)

This command will collect diagnostic and configuration information from
this Red Hat Enterprise Linux system and installed applications.

An archive containing the collected information will be generated in
/var/tmp/sos.eXbMTB and may be provided to a Red Hat support
representative.

Any information provided to Red Hat will be treated in accordance with
the published support policies at:

  https://access.redhat.com/support/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [overcloud-controller-0.localdomain]: dry
Please enter the case id that you are generating this report for []: run

Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...

  Running 43/103: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 44/103: lsbrelease...  caught exception in plugin method "lsbrelease.collect()"
writing traceback to sos_logs/lsbrelease-plugin-errors.txt
  Running 103/103: yum...
Creating compressed archive...

Your sosreport has been generated and saved in:
  /var/tmp/sosreport-dry.run-20160323001919.tar.xz

The checksum is: 23dad635a81081aead7ab39535e6a545

Please send this file to your support representative.

Expected results:
No exceptions should be caught; all plugins should complete cleanly.

Additional info:
Created attachment 1139336 [details]
logs-plugin-errors.txt

Attaching logs-plugin-errors.txt.
Poornima, this just appears to be due to the host in question being under considerable memory pressure. Can you confirm how much free memory was available at the time you ran sosreport, the total amount of memory on the host, and what else is running on the node?

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1252, in collect
    plug.collect()
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 716, in collect
    self._collect_cmd_output()
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 695, in _collect_cmd_output
    stderr=stderr, chroot=chroot, runat=runat)
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 633, in get_cmd_output_now
    chroot=chroot, runat=runat)
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 525, in get_command_output
    chroot=root, chdir=runat)
  File "/usr/lib/python2.7/site-packages/sos/utilities.py", line 155, in sos_get_command_output
    'output': stdout.decode('utf-8', 'ignore')
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
MemoryError
We should still try to fail cleanly in this situation.

> File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
>   return codecs.utf_8_decode(input, errors, True)
> MemoryError

This may be caused by a process writing huge amounts of data to stdout rather than by a systemic memory problem.
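For illustration only (this is not the sos code path; the function names below are hypothetical): the traceback shows the command's entire stdout being held in memory and decoded in one call, so any command that emits more output than the host can comfortably hold, such as journalctl over a very large journal, can raise MemoryError. A minimal sketch contrasting that pattern with streaming the output to a file in fixed-size chunks:

# Sketch only -- not the actual sos implementation.
import subprocess

def run_buffered(cmd):
    # Everything the command writes is held in memory, then decoded at once.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = p.communicate()
    return out.decode('utf-8', 'ignore')   # a huge 'out' can raise MemoryError

def run_streamed(cmd, dest_path, chunk_size=1024 * 1024):
    # Copy stdout to a file in 1 MiB chunks; peak memory stays around chunk_size.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    with open(dest_path, 'wb') as dest:
        for chunk in iter(lambda: p.stdout.read(chunk_size), b''):
            dest.write(chunk)
    return p.wait()

# Example: run_streamed(['journalctl', '--no-pager', '-b'], '/var/tmp/journal.out')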
Lee, this issue is reported on both undercloud and overcloud deployments.

Memory available on the nodes:

Undercloud:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:       15977888     4474632     4448168         504     7055088    11191844
Swap:             0           0           0

Overcloud compute node:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           9601        5617         237          54        3746        3598
Swap:             0           0           0
What is being called at the time of the failure? It is outputting more data than can fit into physical memory.
Isolation: This is reproducible in specific scenarios where the journal daemon is running on the system and has been collecting data since boot, resulting in huge logs.
> This is reproducible in specific scenarios

What are they?
Here are a few scenarios (journald inactive vs. active):

- No exception is caught on a RHEL 7 system with sos-3.2-35.el7_2.3.noarch deployed; the exception is observed once the journal line count (wc -l) exceeds 1647020.
- On a fresh director deployment this issue is not reproducible (the journald service is not running).
- The exception is observed on OSP 8 as the amount of logs in the journal grows:

  $ journalctl -b | wc -l
  208219

It is worth checking the journal size and gathering the latest information, so we should ideally consider the above scenarios.
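As a rough way to gauge whether a node is likely to hit this, the size of the current boot's journal output can be measured without buffering it all in memory, along the same lines as the journalctl -b | wc -l check above. A minimal sketch (the helper name is hypothetical and not part of sos):

# Hypothetical helper, for illustration only: estimate how much data
# "journalctl -b" would emit by streaming it in chunks.
import subprocess

def journal_output_bytes(chunk_size=1024 * 1024):
    p = subprocess.Popen(['journalctl', '-b', '--no-pager'],
                         stdout=subprocess.PIPE)
    total = 0
    for chunk in iter(lambda: p.stdout.read(chunk_size), b''):
        total += len(chunk)
    p.wait()
    return total

# Example: print(journal_output_bytes() / (1024.0 * 1024), "MiB of journal output this boot")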
FYI - I'm seeing the issue on my lab systems. Director installed. Virtual... but I am way below the recommended minimum memory stated in the director install guide:

"Memory
A minimum of 32 GB of RAM for each Controller node. For optimal performance, it is recommended to use 64 GB for each Controller node."

So this is information for you.

Regards,
Brad

My v8 (12.0.4) test cloud overcloud controller:

[root@overcloud-controller-0 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:        8010956     5522936      828080       41308     1659940     2108808
Swap:          4092        4092           0

[root@overcloud-controller-0 ~]# sosreport

sosreport (version 3.2)
...
Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...

  Running 44/104: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 104/104: yum...
Creating compressed archive...
...

[root@overcloud-controller-0 ~]# journalctl -b | wc -l
477727
[root@overcloud-controller-0 ~]# ps aux | grep journal
root       378  0.1  0.2  61820 18740 ?        Ss   Aug08  79:58 /usr/lib/systemd/systemd-journald

My v7 (2015.1.4) test cloud controller node:

[root@overcloud-controller-0 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:        8010956     5713676     1392520       57360      904760     1967416
Swap:             0           0           0

[root@overcloud-controller-0 ~]# sosreport

sosreport (version 3.2)
...
Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...

  Running 44/103: logs...
Killed

[root@overcloud-controller-0 ~]# journalctl -b | wc -l
438637
[root@overcloud-controller-0 ~]# ps aux | grep journal
root       368  0.1  0.6 102772 48872 ?        Ss   Aug05  68:13 /usr/lib/systemd/systemd-journald
Thanks Brad, this isn't something that I can see being addressed prior to RHEL 7.3 coming out and the current 7.2 fork of sosreport for OSP hitting EOL. As such I'm going to close this bug out as a duplicate of the RHEL bug 1183244.

*** This bug has been marked as a duplicate of bug 1183244 ***