Bug 1320182 - caught exception in sosreport at plugin method "logs.collect()"
Summary: caught exception in sosreport at plugin method "logs.collect()"
Keywords:
Status: CLOSED DUPLICATE of bug 1183244
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: sos
Version: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assignee: Lee Yarwood
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-22 14:15 UTC by Poornima
Modified: 2016-09-16 10:03 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-16 10:03:44 UTC
Target Upstream Version:
Embargoed:


Attachments
logs-plugin-errors.txt (936 bytes, text/plain), 2016-03-23 05:52 UTC, Poornima

Description Poornima 2016-03-22 14:15:07 UTC
Description of problem:
While running sosreport on RHEL OSP 8, the following exception is caught:

<snip>
 Running 43/103: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 44/103: lsbrelease...        caught exception in plugin method "lsbrelease.collect()"
writing traceback to sos_logs/lsbrelease-plugin-errors.txt
  Running 103/103: yum...                        
</snip>

Version-Release number of selected component (if applicable):

sos-3.2-36.el7ost.2


Steps to Reproduce:
1. Configure rhos-release 8 with the latest puddle:

# rhos-release 8 -p 2016-03-10.1
Installed: /etc/yum.repos.d/rhos-release-rhel-7.2.repo
Installed: /etc/yum.repos.d/rhos-release-8.repo

2. Run sosreport 


Actual results:

# sosreport 

sosreport (version 3.2)

This command will collect diagnostic and configuration information from
this Red Hat Enterprise Linux system and installed applications.

An archive containing the collected information will be generated in
/var/tmp/sos.eXbMTB and may be provided to a Red Hat support
representative.

Any information provided to Red Hat will be treated in accordance with
the published support policies at:

  https://access.redhat.com/support/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [overcloud-controller-0.localdomain]: dry
Please enter the case id that you are generating this report for []: run

 Setting up archive ...
 Setting up plugins ...
 Running plugins. Please wait ...

  Running 43/103: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 44/103: lsbrelease...        caught exception in plugin method "lsbrelease.collect()"
writing traceback to sos_logs/lsbrelease-plugin-errors.txt
  Running 103/103: yum...                        
Creating compressed archive...

Your sosreport has been generated and saved in:
  /var/tmp/sosreport-dry.run-20160323001919.tar.xz

The checksum is: 23dad635a81081aead7ab39535e6a545

Please send this file to your support representative.


Expected results:

No exception should be caught.

Additional info:

Comment 2 Poornima 2016-03-23 05:52:59 UTC
Created attachment 1139336 [details]
logs-plugin-errors.txt

attaching logs-plugin-errors.txt

Comment 3 Lee Yarwood 2016-03-23 10:15:08 UTC
Poornima, this just appears to be due to the host in question being under considerable memory pressure. Can you confirm how much free memory was available at the time you ran sosreport, the total amount of memory on the host, what else is running on the node, etc.?

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1252, in collect
    plug.collect()
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 716, in collect
    self._collect_cmd_output()
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 695, in _collect_cmd_output
    stderr=stderr, chroot=chroot, runat=runat)
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 633, in get_cmd_output_now
    chroot=chroot, runat=runat)
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 525, in get_command_output
    chroot=root, chdir=runat)
  File "/usr/lib/python2.7/site-packages/sos/utilities.py", line 155, in sos_get_command_output
    'output': stdout.decode('utf-8', 'ignore')
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
MemoryError
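
For illustration only, and not taken from the sos code, a minimal sketch of the pattern the traceback points at: the child process's entire stdout is buffered in memory and then decoded, and the decode needs a second allocation of roughly the same size, so a very large "journalctl -b" output can exhaust memory at exactly the frame shown above.

import subprocess

# Stand-in for a command with very large output, e.g. the boot journal
# on a long-running controller node.
cmd = ["journalctl", "-b", "--no-pager"]

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()   # whole output held in memory as bytes

# A second, roughly equal-sized allocation happens here; with multi-GB
# output this decode is where the MemoryError above is raised.
text = stdout.decode('utf-8', 'ignore')
print(len(text))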

Comment 4 Bryn M. Reeves 2016-03-23 13:18:41 UTC
We should still try to fail cleanly in this situation.


>  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
>    return codecs.utf_8_decode(input, errors, True)
> MemoryError

This may be caused by a process writing huge amounts of data to stdio rather than by systemic memory problems.
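
As a sketch only, and assuming the point is simply to keep peak memory bounded rather than to mirror whatever sos actually does, command output could be streamed to a file in fixed-size chunks so no full copy of the output is ever held in memory, and a MemoryError could be caught so the plugin fails cleanly:

import subprocess

# Minimal sketch (not the sos implementation): stream the child's stdout
# to a file chunk by chunk instead of buffering and decoding it all at once.
def run_to_file(cmd, path, chunk_size=1024 * 1024):
    with open(path, 'wb') as out:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        try:
            for chunk in iter(lambda: proc.stdout.read(chunk_size), b''):
                out.write(chunk)   # at most one chunk is held in memory
        except MemoryError:
            proc.kill()            # fail cleanly rather than crash the plugin
            raise
        finally:
            proc.stdout.close()
        return proc.wait()

# Hypothetical usage: capture the boot journal without loading it into memory.
# run_to_file(["journalctl", "-b", "--no-pager"], "/tmp/journal-boot.txt")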

Comment 5 Poornima 2016-04-04 05:44:01 UTC
Lee, this issue is reported on both undercloud and overcloud deployments.

Memory available on the nodes:

Undercloud:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:       15977888     4474632     4448168         504     7055088    11191844
Swap:             0           0           0

Overcloud compute node:

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           9601        5617         237          54        3746        3598
Swap:             0           0           0

Comment 7 Bryn M. Reeves 2016-04-04 10:44:24 UTC
What is being called at the time of the failure? It is outputting more data than can fit into physical memory.

Comment 9 Poornima 2016-04-13 12:27:41 UTC
Isolation:
This is reproducible in specific scenarios where the journal daemon is running on the system and has been accumulating huge logs since the system came up.

Comment 11 Bryn M. Reeves 2016-04-13 12:57:32 UTC
> This is reproducible in specific scenarios

What are they?

Comment 12 Poornima 2016-04-29 10:34:26 UTC
Here are a few scenarios, including one with journald inactive:

- No exception is caught on a RHEL 7 system with sos-3.2-35.el7_2.3.noarch deployed; the exception is observed when the journal line count (wc) exceeds 1647020.

- On a fresh director deployment this issue is not reproducible (the journald service is not running).

- The exception is observed on OSP 8 when the amount of logs in the journal increases:
    $ journalctl -b | wc -l
    208219

It is worth checking the journal size and gathering the latest info, so we may ideally consider the above scenarios.
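
As an illustrative aside (not part of the reporter's procedure), the journal line count quoted above can be obtained without buffering the whole journal in memory, so the size check itself does not hit the same problem:

import subprocess

# Count lines of the current boot's journal the way "journalctl -b | wc -l"
# does, but streaming, so the full output is never held in memory. The
# 1647020-line figure above is the reporter's observation, not a threshold.
proc = subprocess.Popen(["journalctl", "-b", "--no-pager"],
                        stdout=subprocess.PIPE)
lines = sum(1 for _ in proc.stdout)
proc.wait()
print("journal lines this boot: %d" % lines)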

Comment 13 Bradford Nichols 2016-09-15 18:52:12 UTC
FYI - I'm seeing the issue on my lab systems (director installed, virtual), but I am way below the recommended minimum memory stated in the director install guide: "Memory: A minimum of 32 GB of RAM for each Controller node. For optimal performance, it is recommended to use 64 GB for each Controller node."

So this is for your information.

Regards,
Brad

My v8 (12.0.4) test cloud overcloud controller

[root@overcloud-controller-0 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:        8010956     5522936      828080       41308     1659940     2108808
Swap:          4092        4092           0
[root@overcloud-controller-0 ~]# sosreport

sosreport (version 3.2)
...
Setting up archive ...
 Setting up plugins ...
 Running plugins. Please wait ...

  Running 44/104: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 104/104: yum...                        
Creating compressed archive...
...
[root@overcloud-controller-0 ~]# journalctl -b | wc -l
477727
[root@overcloud-controller-0 ~]# ps aux | grep journal
root       378  0.1  0.2  61820 18740 ?        Ss   Aug08  79:58 /usr/lib/systemd/systemd-journald


My v7 (2015.1.4) test cloud controller node
[root@overcloud-controller-0 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:        8010956     5713676     1392520       57360      904760     1967416
Swap:             0           0           0
[root@overcloud-controller-0 ~]# sosreport

sosreport (version 3.2)
...
Setting up archive ...
 Setting up plugins ...
 Running plugins. Please wait ...

  Running 44/103: logs...        Killed       
[root@overcloud-controller-0 ~]# journalctl -b | wc -l
438637
[root@overcloud-controller-0 ~]# ps aux | grep journal
root       368  0.1  0.6 102772 48872 ?        Ss   Aug05  68:13 /usr/lib/systemd/systemd-journald

Comment 14 Lee Yarwood 2016-09-16 10:03:44 UTC
Thanks Brad, this isn't something that I can see being addressed before RHEL 7.3 comes out and the current 7.2 fork of sosreport for OSP hits EOL. As such, I'm going to close this bug out as a duplicate of RHEL bug 1183244.

*** This bug has been marked as a duplicate of bug 1183244 ***

