Description of problem:

While running sosreport on RHELOSP 8, the exception below is caught:

<snip>
  Running 43/103: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 44/103: lsbrelease...  caught exception in plugin method "lsbrelease.collect()"
writing traceback to sos_logs/lsbrelease-plugin-errors.txt
  Running 103/103: yum...
</snip>

Version-Release number of selected component (if applicable):
sos-3.2-36.el7ost.2

Steps to Reproduce:
1. Configure the rhos-release 8 latest puddle:

   # rhos-release 8 -p 2016-03-10.1
   Installed: /etc/yum.repos.d/rhos-release-rhel-7.2.repo
   Installed: /etc/yum.repos.d/rhos-release-8.repo

2. Run sosreport.

Actual results:

# sosreport

sosreport (version 3.2)

This command will collect diagnostic and configuration information from
this Red Hat Enterprise Linux system and installed applications.

An archive containing the collected information will be generated in
/var/tmp/sos.eXbMTB and may be provided to a Red Hat support
representative.

Any information provided to Red Hat will be treated in accordance with
the published support policies at:

  https://access.redhat.com/support/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Please enter your first initial and last name [overcloud-controller-0.localdomain]: dry
Please enter the case id that you are generating this report for []: run

Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...

  Running 43/103: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 44/103: lsbrelease...  caught exception in plugin method "lsbrelease.collect()"
writing traceback to sos_logs/lsbrelease-plugin-errors.txt
  Running 103/103: yum...
Creating compressed archive...

Your sosreport has been generated and saved in:
  /var/tmp/sosreport-dry.run-20160323001919.tar.xz

The checksum is: 23dad635a81081aead7ab39535e6a545

Please send this file to your support representative.

Expected results:
No exceptions should be caught; all plugins should complete cleanly.

Additional info:
Created attachment 1139336 [details]
logs-plugin-errors.txt

Attaching logs-plugin-errors.txt.
Poornima, this just appears to be due to the host in question being under considerable memory pressure. Can you confirm how much free memory was available at the time you ran sosreport, the total amount of memory on the host, and what else is running on the node?

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1252, in collect
    plug.collect()
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 716, in collect
    self._collect_cmd_output()
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 695, in _collect_cmd_output
    stderr=stderr, chroot=chroot, runat=runat)
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 633, in get_cmd_output_now
    chroot=chroot, runat=runat)
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 525, in get_command_output
    chroot=root, chdir=runat)
  File "/usr/lib/python2.7/site-packages/sos/utilities.py", line 155, in sos_get_command_output
    'output': stdout.decode('utf-8', 'ignore')
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
MemoryError
We should still try to fail cleanly in this situation.

> File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
>   return codecs.utf_8_decode(input, errors, True)
> MemoryError

This may be caused by a process writing huge amounts of data to stdout rather than by a systemic memory problem.
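For illustration only (this is not the sos code path; the function names below are hypothetical): the traceback shows the command's entire stdout being held in memory and decoded in one call, so any command that emits more output than the host can comfortably hold, such as journalctl over a very large journal, can raise MemoryError. A minimal sketch contrasting that pattern with streaming the output to a file in fixed-size chunks:

# Sketch only -- not the actual sos implementation.
import subprocess

def run_buffered(cmd):
    # Everything the command writes is held in memory, then decoded at once.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = p.communicate()
    return out.decode('utf-8', 'ignore')   # a huge 'out' can raise MemoryError

def run_streamed(cmd, dest_path, chunk_size=1024 * 1024):
    # Copy stdout to a file in 1 MiB chunks; peak memory stays around chunk_size.
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    with open(dest_path, 'wb') as dest:
        for chunk in iter(lambda: p.stdout.read(chunk_size), b''):
            dest.write(chunk)
    return p.wait()

# Example: run_streamed(['journalctl', '--no-pager', '-b'], '/var/tmp/journal.out')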
Lee, this issue is reported on both undercloud and overcloud deployments.

Memory available on the nodes:

Undercloud:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:       15977888     4474632     4448168         504     7055088    11191844
Swap:             0           0           0

Overcloud compute node:
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           9601        5617         237          54        3746        3598
Swap:             0           0           0
What is being called at the time of the failure? It is outputting more data than can fit into physical memory.
Isolation: This is reproducible in specific scenarios where the journal daemon is running on the system and has been collecting data since boot, resulting in huge logs.
> This is reproducible in specific scenarios

What are they?
Here are a few scenarios (journald inactive vs. active):

- No exception is caught on a RHEL 7 system with sos-3.2-35.el7_2.3.noarch deployed; the exception is observed once the journal line count (wc -l) exceeds 1647020.
- On a fresh director deployment this issue is not reproducible (the journald service is not running).
- The exception is observed on OSP 8 as the amount of logs in the journal grows:

  $ journalctl -b | wc -l
  208219

It is worth checking the journal size and gathering the latest information, so we should ideally consider the above scenarios.
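As a rough way to gauge whether a node is likely to hit this, the size of the current boot's journal output can be measured without buffering it all in memory, along the same lines as the journalctl -b | wc -l check above. A minimal sketch (the helper name is hypothetical and not part of sos):

# Hypothetical helper, for illustration only: estimate how much data
# "journalctl -b" would emit by streaming it in chunks.
import subprocess

def journal_output_bytes(chunk_size=1024 * 1024):
    p = subprocess.Popen(['journalctl', '-b', '--no-pager'],
                         stdout=subprocess.PIPE)
    total = 0
    for chunk in iter(lambda: p.stdout.read(chunk_size), b''):
        total += len(chunk)
    p.wait()
    return total

# Example: print(journal_output_bytes() / (1024.0 * 1024), "MiB of journal output this boot")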
FYI - I'm seeing the issue on my lab systems. Director installed. Virtual... but I am way below the recommended minimum memory stated in the director install guide:

"Memory
A minimum of 32 GB of RAM for each Controller node. For optimal performance, it is recommended to use 64 GB for each Controller node."

So this is information for you.

Regards,
Brad

My v8 (12.0.4) test cloud overcloud controller:

[root@overcloud-controller-0 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:        8010956     5522936      828080       41308     1659940     2108808
Swap:          4092        4092           0

[root@overcloud-controller-0 ~]# sosreport

sosreport (version 3.2)
...
Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...

  Running 44/104: logs...        caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
  Running 104/104: yum...
Creating compressed archive...
...

[root@overcloud-controller-0 ~]# journalctl -b | wc -l
477727
[root@overcloud-controller-0 ~]# ps aux | grep journal
root       378  0.1  0.2  61820 18740 ?        Ss   Aug08  79:58 /usr/lib/systemd/systemd-journald

My v7 (2015.1.4) test cloud controller node:

[root@overcloud-controller-0 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:        8010956     5713676     1392520       57360      904760     1967416
Swap:             0           0           0

[root@overcloud-controller-0 ~]# sosreport

sosreport (version 3.2)
...
Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...

  Running 44/103: logs...
Killed

[root@overcloud-controller-0 ~]# journalctl -b | wc -l
438637
[root@overcloud-controller-0 ~]# ps aux | grep journal
root       368  0.1  0.6 102772 48872 ?        Ss   Aug05  68:13 /usr/lib/systemd/systemd-journald
Thanks Brad, this isn't something that I can see being addressed prior to RHEL 7.3 coming out and the current 7.2 fork of sosreport for OSP hitting EOL. As such I'm going to close this bug out as a duplicate of the RHEL bug 1183244.

*** This bug has been marked as a duplicate of bug 1183244 ***