Bug 1174155

Summary: [RHEL7][log-collector] Missing some info from host's archive due to sos 3 refactoring
Product: Red Hat Enterprise Virtualization Manager
Reporter: Petr Beňas <pbenas>
Component: ovirt-log-collector
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Gonza <grafuls>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.5.0
CC: bmr, dfediuck, didi, gklein, grafuls, lsurette, nyechiel, pstehlik, rbalakri, Rhev-m-bugs, sbonazzo, stirabos, yeylon, ykaul
Target Milestone: ovirt-3.6.0-rc
Keywords: Regression, ZStream
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, some host information was missing in sosreports. This was caused by sos 3 using a different plug-in schema than in previous sos versions. The rhevm-log-collector has been reconfigured so the log collector can get the required information to create complete sosreports.
Story Points: ---
Clone Of:
: 1175137
Environment:
Last Closed: 2016-03-09 19:59:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1175137    

Description Petr Beňas 2014-12-15 09:50:54 UTC
Description of problem:
Some host information is not collected. This information was collected in previous versions, so this is probably a regression.

Version-Release number of selected component (if applicable):
rhevm-log-collector-3.5.0-4.el6ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. engine-log-collector collect
2. inspect the archive for a host

Actual results:
The snippet of the test log below indicates which files are missing from the host's archive.
0 not in <1:1>; .*/ifconfig
2 not in <3:3>; .*/ls(mod|of|pci)
0 not in <1:1>; .*/vgdisplay
0 not in <1:1>; .*/lib/modules/2[.]6[.].*
0 not in <1:1>; .*/lib/modules/2[.]6[.].*/modules.dep
0 not in <1:1>; .*/proc/cpuinfo
0 not in <1:1>; .*/proc/iomem
0 not in <1:1>; .*/proc/ioports
0 not in <1:1>; .*/proc/mdstat
0 not in <1:1>; .*/proc/partitions
0 not in <1:1>; .*/proc/bus
0 not in <1:1>; .*/proc/scsi
0 not in <1:1>; .*/sbin

I also manually verified that these files are missing from the collected host's archive.
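
A minimal sketch of how such a check could be reproduced by hand. It assumes each log line reads "<matches found> not in <minimum:maximum>; <pattern>" and that patterns are matched against full archive paths; both are readings of the log above, not confirmed details of the automated test. The function names and the sample listing are hypothetical.

```python
import re


def count_matches(pattern, paths):
    """Return how many archive paths fully match the regular expression."""
    rx = re.compile(pattern)
    return sum(1 for p in paths if rx.fullmatch(p))


def check_archive(paths, expectations):
    """Return one log line per pattern whose match count is out of range.

    expectations maps a pattern to an inclusive (minimum, maximum) count.
    """
    failures = []
    for pattern, (low, high) in expectations.items():
        found = count_matches(pattern, paths)
        if not low <= found <= high:
            failures.append("%d not in <%d:%d>; %s" % (found, low, high, pattern))
    return failures


# Hypothetical archive listing and expectations, for illustration only.
paths = [
    "sosreport-host/sos_commands/kernel/lsmod",
    "sosreport-host/sos_commands/process/lsof",
    "sosreport-host/proc/meminfo",
]
expectations = {
    r".*/ifconfig": (1, 1),
    r".*/ls(mod|of|pci)": (3, 3),
    r".*/proc/meminfo": (1, 1),
}
for line in check_archive(paths, expectations):
    print(line)
```

With this sample listing, the ifconfig and ls(mod|of|pci) patterns are reported, while the /proc/meminfo pattern passes silently.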

Expected results:
All the above files (or command outputs) are collected from hosts. 

Additional info:
Discovered by automated test
http://jenkins.qa.lab.tlv.redhat.com:8080/view/RhevmCore/view/3.5-ALL/job/3.5-git-rhevmCore-infra_tools_log_collector_nfs/

Comment 1 Doron Fediuck 2014-12-16 07:22:25 UTC
Which OS was it running on?

I suspect ifconfig may not be relevant for el7, and the same goes for
kernel 2.6.x.

Comment 2 Sandro Bonazzola 2014-12-16 07:33:07 UTC
Also, /lib and /sbin do not exist anymore on EL7.

Comment 4 Petr Beňas 2014-12-16 12:11:14 UTC
(In reply to Doron Fediuck from comment #1)
> Which OS was it running on?
> 
> I suspect ifconfig may not be relevant for el7, and the same goes for
> kernel 2.6.x.

It was el7. Agree about /lib and /sbin. 
How about the missing entries in /proc? And why isn't ifconfig relevant? Could output of `ip a` or equivalent info be found somewhere else in the archive?

Comment 5 Doron Fediuck 2014-12-16 12:42:14 UTC
Using 'ip a' will not require an additional dependency, so this is a better
approach. However, it will change the report format in case someone has a 
script to analyze it. 

Sandro, can you please list here the expected changes (sample output) once
adapted to RHEL7? It should be properly documented.

Comment 6 Simone Tiraboschi 2014-12-16 14:21:06 UTC
Using sos-report on el7, the output of 'ip a' is already collected and present in the ip_addr file. ifconfig has been obsoleted by ip, so I think we simply have to fix any dependent scripts.

Here is a sample:
[root@rl66n1 test]# cat log-collector-data/r70st7.localdomain/sosreport-r70st7.localdomain-20141216143111/ip_addr
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
1: lo    inet6 ::1/128 scope host \       valid_lft forever preferred_lft forever
2: eth0    inet6 fe80::21a:4aff:fe4f:bd22/64 scope link \       valid_lft forever preferred_lft forever
5: rhevm    inet 192.168.1.132/24 brd 192.168.1.255 scope global dynamic rhevm\       valid_lft 172408sec preferred_lft 172408sec
5: rhevm    inet6 fe80::21a:4aff:fe4f:bd22/64 scope link \       valid_lft forever preferred_lft forever
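
For scripts that used to read ifconfig output, a minimal sketch of reading the ip_addr file instead. It assumes the one-record-per-line layout shown in the sample above; the function name and the regular expression are hypothetical and not part of sos or ovirt-log-collector.

```python
import re

# One record per line: "<index>: <ifname>    <family> <address>/<prefix> ..."
_RECORD = re.compile(r"^(\d+):\s+(\S+)\s+(inet6?)\s+([0-9A-Fa-f:.]+)/(\d+)\b")


def parse_ip_addr(text):
    """Return (ifname, family, address, prefix_length) tuples from ip_addr text."""
    records = []
    for line in text.splitlines():
        match = _RECORD.match(line)
        if match:
            _index, ifname, family, address, prefix = match.groups()
            records.append((ifname, family, address, int(prefix)))
    return records


# Two lines copied from the sample above (backslashes escaped for Python).
sample = (
    "1: lo    inet 127.0.0.1/8 scope host lo\\       valid_lft forever preferred_lft forever\n"
    "5: rhevm    inet 192.168.1.132/24 brd 192.168.1.255 scope global dynamic rhevm\\       valid_lft 172408sec preferred_lft 172408sec\n"
)
for record in parse_ip_addr(sample):
    print(record)
```

Lines that do not match the assumed layout are skipped rather than raising an error, so a format change would show up as missing records.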


I'm checking on the missing proc entries.
The problem is also reproducible on my setup.

Comment 7 Simone Tiraboschi 2014-12-16 15:00:27 UTC
On el7 we are missing at least processor, pci, md, block and scsi sos plugins.

Comment 8 Simone Tiraboschi 2014-12-16 16:23:47 UTC
With the proposed patch we get
no: .*/ifconfig
ok: .*/ip_addr
ok: .*/ls(mod|of|pci)
ok: .*/vgdisplay
no: .*/lib/modules/2[.]6[.].*
no: .*/lib/modules/2[.]6[.].*/modules.dep
ok: .*/lib/modules/3[.]10[.].*
ok: .*/lib/modules/3[.]10[.].*/modules.dep
ok: .*/proc/cpuinfo
ok: .*/proc/iomem
ok: .*/proc/ioports
ok: .*/proc/mdstat
ok: .*/proc/partitions
ok: .*/proc/bus
ok: .*/proc/scsi
no: .*/sbin

The test would probably need to be updated too.

Comment 10 Sandro Bonazzola 2014-12-17 07:47:55 UTC
Adding Bryn to this bug, he may help with differences at sos level.

Comment 13 Bryn M. Reeves 2014-12-17 11:55:14 UTC
The ifconfig -> ip change was required since ifconfig cannot report on biosdevname and other modern interface name types (since it's used in sos to detect all interfaces this is a significant problem).

I'm not sure about the other items you mention; some of them are definitely still collected (e.g. mdstat) and some have been removed (/proc/bus - partly moved to pci which now collects /proc/bus/pci but we can add other subdirectories if needed).

It's difficult to say exactly why you're seeing all those differences though; is it possible to see the version of sos and plugins you're using and some sample output?

Comment 14 Petr Beňas 2014-12-17 12:24:58 UTC
(In reply to Simone Tiraboschi from comment #8)
> With the proposed patch we get
> no: .*/ifconfig
> ok: .*/ip_addr
> ok: .*/ls(mod|of|pci)
> ok: .*/vgdisplay
> no: .*/lib/modules/2[.]6[.].*
> no: .*/lib/modules/2[.]6[.].*/modules.dep
> ok: .*/lib/modules/3[.]10[.].*
> ok: .*/lib/modules/3[.]10[.].*/modules.dep
> ok: .*/proc/cpuinfo
> ok: .*/proc/iomem
> ok: .*/proc/ioports
> ok: .*/proc/mdstat
> ok: .*/proc/partitions
> ok: .*/proc/bus
> ok: .*/proc/scsi
> no: .*/sbin

How about /usr/sbin? 

> 
> The test would probably need to be updated too.

Comment 15 Bryn M. Reeves 2014-12-17 13:16:49 UTC
Btw, what is the "proposed patch" mentioned in comment #8?

Comment 16 Simone Tiraboschi 2014-12-17 13:27:47 UTC
We need to handle sos 1.7, 2.2, 3.0 and 3.2 at the same time.
The user can launch engine-log-collector on the engine host; it collects sos reports from the engine host itself plus sos reports from each managed hypervisor host.
The hypervisor hosts could run different OSs, and therefore different sos versions, but the results should be as similar as possible.

I'm proposing this patch: 
http://gerrit.ovirt.org/36219
to also enable the processor, pci, md, block, scsi, multipath, systemd, sanlock and lvm2 plugins on 3.0 and 3.2, to get something similar to what we got on 2.2 with just
libvirt, vdsm, general, networking, hardware, process, yum, filesys, devicemapper, selinux, kernel, memory, rpm.

Do you think we are missing something else on 3.0 and 3.2?
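
A minimal sketch of the version-dependent plugin selection described above, not the code in the proposed patch. The plugin names come from this comment, with "sanlock" assumed to be the intended sos plugin name. The constant and function names are hypothetical, and sos 1.7 is simply given the 2.2 list here because the comment does not say how it is handled.

```python
# Plugins requested on sos 2.2, as listed in this comment.
BASE_PLUGINS = [
    "libvirt", "vdsm", "general", "networking", "hardware", "process",
    "yum", "filesys", "devicemapper", "selinux", "kernel", "memory", "rpm",
]

# Additional plugins proposed for sos 3.0 and 3.2.
SOS3_EXTRA_PLUGINS = [
    "processor", "pci", "md", "block", "scsi", "multipath", "systemd",
    "sanlock", "lvm2",
]


def parse_version(version):
    """Turn a version string such as '3.2' into a comparable tuple (3, 2)."""
    return tuple(int(part) for part in version.split(".")[:2])


def plugins_for(sos_version):
    """Return the plugin list to request for a given sos version."""
    plugins = list(BASE_PLUGINS)
    if parse_version(sos_version) >= (3, 0):
        plugins.extend(SOS3_EXTRA_PLUGINS)
    return plugins


def only_plugins_argument(sos_version):
    """Build the comma-separated value for sosreport's -o/--only-plugins option."""
    return ",".join(plugins_for(sos_version))


print(only_plugins_argument("3.0"))
```

Keeping the two lists separate makes it easy to see which plugins exist only to compensate for the sos 3 plugin split.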

Comment 17 Bryn M. Reeves 2014-12-17 14:20:51 UTC
> We need to handle at the same time 1.7, 2.2, 3.0, 3.2.

We're really supporting RHEL5 hypervisors? Considering that release is now in maintenance we will not be able to make any changes to help on the os side.

> I'm proposing this patch: 
> http://gerrit.ovirt.org/36219
> to enable also processor, pci, md, block, scsi, multipath, systemd, sanlock and
> lvm2 plugins on 3.0 and 3.2 to get something similar to what we got on 2.2
> with just libvirt, vdsm, general, networking, hardware, process, yum, filesys,
> devicemapper, selinux, kernel, memory, rpm.

I'd recommend not using any hard-coded plugin lists; they've broken numerous times in the past and will continue to do so.

Sos is designed to enable the most appropriate plugins for the environment where it runs by default. If there are problems for RHEV with the default set then PLEASE work with us; working around the problem in a way that is hidden to us helps nobody.

E.g. in 3.2 we now have profiles support which seems to be exactly what RHEV wants - it's trivial now to define a "rhev" profile that will include just the plugins you want.

I often hear from GSS engineers that RHEV sosreports are "weird" and missing commonly used information. Looking at the lists in the attached python file, the "report3" list, for example, seems to be missing:

auditd, boot, cgroups, devicemapper, filesys, general, hardware, kernel, memory, process, hardware, yum

Are these added elsewhere? Either way I would still suggest not using plugin lists - we can still make updates to RHEL6 and 7 packages to accommodate RHEV's needs but only if we are aware of them.

Comment 18 Simone Tiraboschi 2014-12-17 14:43:18 UTC
(In reply to Bryn M. Reeves from comment #17)
> I'd recommend not using any hard-coded plugin lists; they've broken numerous
> times in the past and will continue to do so.
> 
> Sos is designed to enable the most appropriate plugins for the environment
> where it runs by default. If there are problems for RHEV with the default
> set then PLEASE work with us; working around the problem in a way that is
> hidden to us helps nobody.

It's just to reduce the final archive size; we have other bugs where people already complain about file size.

> E.g. in 3.2 we now have profiles support which seems to be exactly what RHEV
> wants - it's trivial now to define a "rhev" profile that will include just
> the plugins you want.

That's great, but it's only available since 3.2, and we will still have to work with RHEL 7.0 for a long time.
By the way, I'll open an RFE to use it on 3.2.

> I often hear from GSS engineers that RHEV sosreports are "weird" and missing
> commonly used information - looking at the lists in the attached python file
> e.g. the "report3" list seems to be missing:
> 
> auditd, boot, cgroups, devicemapper, filesys, general, hardware, kernel,
> memory, process, hardware, yum
> 
> Are these added elsewhere?

Yes, the report3 plugins are just appended after the list of plugins we use for sos 2.2.

> Either way I would still suggest not using plugin
> lists - we can still make updates to RHEL6 and 7 packages to accommodate
> RHEV's needs but only if we are aware of them.

I'll open an RFE to use a sos profile on RHEL 7.1 with sos 3.2, but what are you proposing for RHEL6 and RHEL 7.0?

Comment 19 Bryn M. Reeves 2014-12-17 15:11:01 UTC
> It's just to reduce the final archive size, we have other bugs where people
> already complain about file size.

Where are the sos bugs for these problems? And why not use the log_size options? (which are available globally from 3.2).

> Yes, report3 plugins are just appended after the list of plugins we use for sos
> 2.2

OK cool - that should work (assuming no further plugin set changes for now) as there are no plugins in the 2.2 list that were removed in 3.x

> I'll open an RFE to use a sos profile on RHEL 7.1 with sos 3.2; but what are
> you proposing for RHEL6 and RHEL 7.0?

We're planning to backport 3.2 to RHEL6.7 (bug 1144525).

For 7.0, if there are key things we can backport (the profiles patch is actually very simple, and I would not be opposed to backporting it), then we'd be happy to do that to have everything in sync as much as possible.
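
A minimal sketch of how a caller could choose between the two approaches discussed in this thread: a profile plus a log size limit on sos 3.2 and later, and an explicit plugin list on older versions. The "rhev" profile is hypothetical; it is only suggested in this thread and would have to be defined in sos first. The function name is also hypothetical, and the log size value is passed through unchanged.

```python
def sosreport_command(sos_version, plugins, profile="rhev", log_size=None):
    """Return the sosreport argument list for a given sos version.

    profile is a hypothetical profile name; plugins is the fallback list
    used where profiles are not available.
    """
    version = tuple(int(part) for part in sos_version.split(".")[:2])
    command = ["sosreport", "--batch"]
    if version >= (3, 2):
        # Profiles and the global log size option are available from 3.2.
        command += ["--profile", profile]
        if log_size is not None:
            command += ["--log-size", str(log_size)]
    else:
        # Older versions: fall back to an explicit plugin list.
        command += ["-o", ",".join(plugins)]
    return command


print(sosreport_command("3.2", ["vdsm", "libvirt"], log_size=25))
print(sosreport_command("3.0", ["vdsm", "libvirt"]))
```

If the profiles patch were backported to 7.0 as offered above, the version test would need to check for the feature rather than the version number.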

Comment 21 Sandro Bonazzola 2015-02-20 11:08:02 UTC
Automated message: can you please update doctext or set it as not required?

Comment 25 errata-xmlrpc 2016-03-09 19:59:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0392.html