Bug 1320182
| Summary: | caught exception in sosreport at plugin method "logs.collect()" | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Poornima <pkshiras> |
| Component: | sos | Assignee: | Lee Yarwood <lyarwood> |
| Status: | CLOSED DUPLICATE | QA Contact: | BaseOS QE - Apps <qe-baseos-apps> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.0 (Liberty) | CC: | abehl, agk, bmr, bradnichols, gavin, pkshiras, plambri, sbradley |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | 8.0 (Liberty) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-09-16 10:03:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | logs-plugin-errors.txt (attachment 1139336) | | |
Description
Poornima
2016-03-22 14:15:07 UTC
Created attachment 1139336 [details]
logs-plugin-errors.txt
attaching logs-plugin-errors.txt
Poornima, this just appears to be due to the host in question being under considerable memory pressure. Can you confirm how much free memory was available at the time you ran sosreport, the total amount of memory on the host, and what else is running on the node?
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1252, in collect
plug.collect()
File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 716, in collect
self._collect_cmd_output()
File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 695, in _collect_cmd_output
stderr=stderr, chroot=chroot, runat=runat)
File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 633, in get_cmd_output_now
chroot=chroot, runat=runat)
File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 525, in get_command_output
chroot=root, chdir=runat)
File "/usr/lib/python2.7/site-packages/sos/utilities.py", line 155, in sos_get_command_output
'output': stdout.decode('utf-8', 'ignore')
File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
MemoryError
We still should try to fail cleanly in this situation.
> File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
> return codecs.utf_8_decode(input, errors, True)
> MemoryError
This may be caused by a process writing huge amounts of data to stdout rather than by systemic memory problems.
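To illustrate what "failing cleanly" could look like, here is a minimal sketch: read the command's stdout in bounded chunks and stop at a cap, rather than slurping the whole stream and decoding it in one pass as in the traceback above. This is not sosreport's actual code; the function name and the 50 MiB cap are made up for the example.

```python
import subprocess

# Hypothetical sketch only: cap how much of a command's stdout is kept in
# memory, so a process that emits huge amounts of data (e.g. dumping a very
# large journal) cannot push the collector into a MemoryError.
def run_with_output_cap(cmd, max_bytes=50 * 1024 * 1024):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    chunks, read, truncated = [], 0, False
    while True:
        chunk = proc.stdout.read(64 * 1024)  # read small, fixed-size pieces
        if not chunk:
            break
        if read + len(chunk) > max_bytes:
            truncated = True
            proc.kill()                      # stop the producer at the cap
            break
        chunks.append(chunk)
        read += len(chunk)
    proc.wait()
    output = b"".join(chunks).decode("utf-8", "ignore")
    return {"status": proc.returncode, "output": output, "truncated": truncated}

# Example: collect at most ~50 MiB of the current boot's journal.
# result = run_with_output_cap(["journalctl", "-b"])
```

The code path in the traceback instead decodes the entire captured output in a single call, which is why the failure surfaces inside utf_8_decode.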
Lee, this issue is reported on both undercloud and overcloud deployments. Memory available in the nodes:

Undercloud:
$ free - m
total used free shared buff/cache available
Mem: 15977888 4474632 4448168 504 7055088 11191844
Swap: 0 0 0

Overcloud compute node:
$ free -m
total used free shared buff/cache available
Mem: 9601 5617 237 54 3746 3598
Swap: 0 0 0

What is being called at the time of the failure? It is outputting more data than can fit into physical memory.

Isolation: this is reproducible in specific scenarios where the journal daemon is running on the system and collects data, resulting in huge logs right from system uptime.

> This is reproducible in specific scenarios
What are they?
Here are a few scenarios, with journald inactive:
- No exception caught with a RHEL 7 system with sos-3.2-35.el7_2.3.noarch deployed; it is observed when the wc (journal line count) > 1647020.
- Over a fresh director deployment this issue is not reproducible (the journald service is not running).
- The exception is observed on OSP 8 when the amount of logs in the journal increases:
$ journalctl -b | wc -l
208219
It is worth checking the journal size and gathering the latest info, so we may ideally consider the above scenarios.
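As a quick way to check the journal size and gather that info before running sosreport, here is a small sketch. It only assumes journalctl is available on the node; the helper names are illustrative and not part of sosreport.

```python
import subprocess

def journal_disk_usage():
    # `journalctl --disk-usage` reports how much space the archived and
    # active journals take on disk.
    return subprocess.check_output(["journalctl", "--disk-usage"]).decode().strip()

def journal_line_count():
    # Stream `journalctl -b` line by line (the equivalent of
    # `journalctl -b | wc -l`) so the count is obtained without buffering
    # the full journal output in memory.
    proc = subprocess.Popen(["journalctl", "-b"], stdout=subprocess.PIPE)
    count = sum(1 for _ in proc.stdout)
    proc.wait()
    return count

if __name__ == "__main__":
    print(journal_disk_usage())
    print("journalctl -b lines: %d" % journal_line_count())
```

On the nodes described below, line counts in the hundreds of thousands coincide with the logs plugin failing, which fits the pattern of the journal output being too large to hold in memory.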
FYI - I'm seeing the issue on my lab systems (director installed, virtual), but I am way below the recommended minimum memory stated in the director install guide: "Memory: A minimum of 32 GB of RAM for each Controller node. For optimal performance, it is recommended to use 64 GB for each Controller node."
So this is information for you.
Regards,
Brad
My v8 (12.0.4) test cloud overcloud controller
[root@overcloud-controller-0 ~]# free - m
total used free shared buff/cache available
Mem: 8010956 5522936 828080 41308 1659940 2108808
Swap: 4092 4092 0
[root@overcloud-controller-0 ~]# sosreport
sosreport (version 3.2)
...
Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...
Running 44/104: logs... caught exception in plugin method "logs.collect()"
writing traceback to sos_logs/logs-plugin-errors.txt
Running 104/104: yum...
Creating compressed archive...
...
[root@overcloud-controller-0 ~]# journalctl -b | wc -l
477727
[root@overcloud-controller-0 ~]# ps aux | grep journal
root 378 0.1 0.2 61820 18740 ? Ss Aug08 79:58 /usr/lib/systemd/systemd-journald
My v7 (2015.1.4) test cloud controller node
[root@overcloud-controller-0 ~]# free - m
total used free shared buff/cache available
Mem: 8010956 5713676 1392520 57360 904760 1967416
Swap: 0 0 0
[root@overcloud-controller-0 ~]# sosreport
sosreport (version 3.2)
...
Setting up archive ...
Setting up plugins ...
Running plugins. Please wait ...
Running 44/103: logs... Killed
[root@overcloud-controller-0 ~]# journalctl -b | wc -l
438637
[root@overcloud-controller-0 ~]# ps aux | grep journal
root 368 0.1 0.6 102772 48872 ? Ss Aug05 68:13 /usr/lib/systemd/systemd-journald
Thanks Brad, this isn't something that I can see being addressed prior to RHEL 7.3 coming out and the current 7.2 fork of sosreport for OSP hitting EOL. As such I'm going to close this bug out as a duplicate of the RHEL bug 1183244.

*** This bug has been marked as a duplicate of bug 1183244 ***