Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1140406

Summary: sos-report fails on rhev-h 3.2
Product: Red Hat Enterprise Virtualization Manager
Reporter: Anitha Udgiri <audgiri>
Component: ovirt-node
Assignee: Douglas Schilling Landgraf <dougsland>
Status: CLOSED DUPLICATE
QA Contact: Pavel Stehlik <pstehlik>
Severity: medium
Priority: high
Version: 3.2.0
CC: audgiri, bazulay, bmr, cshao, danken, dfediuck, didi, dougsland, ecohen, fdeutsch, hadong, huiwa, iheim, leiwang, lpeer, lsurette, lveyde, oourfali, pbandark, pstehlik, rbalakri, Rhev-m-bugs, rmainz, sbonazzo, stirabos, usurse, yaniwang, ycui, yeylon
Keywords: Reopened
Target Release: 3.5.3
Hardware: Unspecified
OS: Unspecified
Whiteboard: node
Doc Type: Bug Fix
Last Closed: 2015-04-21 13:34:05 UTC
Type: Bug
oVirt Team: Node

Description Anitha Udgiri 2014-09-10 21:43:23 UTC
Description of problem:

Collecting a sos-report from a host via rhevm-log-collector produces the following error:

OSError: [Errno 2] No such file or directory: '/sos_commands/gluster/glusterfs-statedumps'
  Running plugins. Please wait ...

  Completed [52/54] ...  

The customer is running the following:

Hardware: HP BL460c G8 blades in an HP C7K chassis.

RHEVM 3.2
RHEV Hypervisor - 6.5 - 20131204.0.3.2.el6_5

This appears to be a duplicate of BZ 1112538: https://bugzilla.redhat.com/show_bug.cgi?id=1112538

Let us know which logs, if any, are needed here.

Comment 1 Bryn M. Reeves 2014-09-11 09:47:38 UTC
What version/release of the sos package is in use?

Comment 2 Anitha Udgiri 2014-09-11 18:01:39 UTC
(In reply to Bryn M. Reeves from comment #1)
> What version/release of the sos package is in use?

sos-2.2-47.el6.noarch. This is the version packaged with the ISO image being used.

Comment 4 Pratik Pravin Bandarkar 2014-09-11 20:44:44 UTC
Even on a working RHEV-H host, running the `sosreport -v` command throws a lot of error messages:

---->p---->o----
Running plugins. Please wait ...

  Completed [3/54] ...      could not run command: /sbin/lilo -q
  Completed [5/54] ...      could not run command: /usr/sbin/rg_test test /etc/cluster/cluster.conf
could not run command: fence_tool ls -n
could not run command: gfs_control ls -n
could not run command: dlm_tool log_plock
could not run command: cman_tool services
could not run command: cman_tool nodes
could not run command: cman_tool status
could not run command: ccs_tool lsnode
could not run command: /sbin/ipvsadm -L
could not run command: cman_tool -a nodes
could not run command: corosync-quorumtool -l
could not run command: corosync-quorumtool -s
could not run command: corosync-cpgtool
could not run command: corosync-objctl
could not run command: group_tool ls -g1
could not run command: gfs_control ls -n
could not run command: gfs_control dump
could not run command: fence_tool dump
could not run command: dlm_tool dump
could not run command: dlm_tool ls -n
could not run command: crm_report -S --dest /tmp/dhcp210-167-2014091203461410493593/sos_commands/cluster/crm_report
  Completed [6/54] ...      could not run command: for i in `ls /home/`;        do echo "User :" $i;/usr/bin/crontab -l -u $i;        echo "---------------";done
  Completed [10/54] ...      could not run command: /usr/sbin/foreman-debug -a -d /tmp/dhcp210-167-2014091203461410493593/sos_commands/foreman/foreman-debug
  Completed [13/54] ...      could not run command: /usr/sbin/gluster peer status
  Completed [14/54] ...      error copying file /sys/bus/scsi/uevent (IOError)
error copying file /sys/bus/scsi/drivers/sr/uevent (IOError)
error copying file /sys/bus/scsi/drivers/sr/unbind (IOError)
error copying file /sys/bus/scsi/drivers/sr/bind (IOError)
error copying file /sys/bus/scsi/drivers_probe (IOError)
could not run command: /usr/bin/cpufreq-info
  Completed [19/54] ...      error copying file /sys/module/ipmi_si/parameters/hotmod (IOError)
error copying file /sys/module/md_mod/parameters/new_array (IOError)
error copying file /sys/module/libfcoe/parameters/disable (IOError)
error copying file /sys/module/libfcoe/parameters/enable (IOError)
error copying file /sys/module/libfcoe/parameters/destroy (IOError)
error copying file /sys/module/libfcoe/parameters/create_vn2vn (IOError)
error copying file /sys/module/libfcoe/parameters/create (IOError)
error copying file /sys/module/fcoe/parameters/disable (IOError)
error copying file /sys/module/fcoe/parameters/enable (IOError)
error copying file /sys/module/fcoe/parameters/destroy (IOError)
error copying file /sys/module/fcoe/parameters/create (IOError)
could not run command: /usr/sbin/dkms status
  Completed [20/54] ...      could not run command: /usr/bin/klist -ket /etc/krb5.keytab
  Completed [30/54] ...      could not run command: /sbin/ethtool ;vdsmdummy;
could not run command: /sbin/ethtool -i ;vdsmdummy;
could not run command: /sbin/ethtool -k ;vdsmdummy;
could not run command: /sbin/ethtool -S ;vdsmdummy;
could not run command: /sbin/ethtool -a ;vdsmdummy;
could not run command: /sbin/ethtool -c ;vdsmdummy;
could not run command: /sbin/ethtool -g ;vdsmdummy;
could not run command: /usr/sbin/brctl showstp ;vdsmdummy;
  Completed [32/54] ...      could not run command: /usr/sbin/ntptrace -n
  Completed [37/54] ...      could not run command: /usr/bin/lpstat -t
could not run command: /usr/bin/lpstat -s
could not run command: /usr/bin/lpstat -d
  Completed [41/54] ...      could not run command: /usr/bin/klist -ket /etc/krb5.keytab
could not run command: /usr/bin/wbinfo --domain='.' -g
could not run command: /usr/bin/wbinfo --domain='.' -u
could not run command: /usr/bin/testparm -s -v
  Completed [47/54] ...      file or directory /etc/rc.d/rc6.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc6.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc5.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc5.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc4.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc4.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc3.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc3.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc2.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc2.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc1.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc1.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc0.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc0.d/K01libvirt-guests does not exist
  Completed [49/54] ...      error copying file /proc/sys/vm/compact_memory (IOError)
error copying file /proc/sys/fs/binfmt_misc/register (IOError)
error copying file /proc/sys/net/ipv4/route/flush (IOError)
error copying file /proc/sys/net/ipv6/route/flush (IOError)
  Completed [50/54] ...      could not run command: /usr/bin/stap -V 2
  Completed [52/54] ...      error copying file /var/run/vdsm/svdsm.sock (IOError)
  Completed [54/54] ...      
Creating compressed archive...

Your sosreport has been generated and saved in:
  /tmp/sosreport-dhcp210-167-20140912034737-cb94.tar.xz

The md5sum is: 15f835c734444a1c55bd9a9f74e5cb94

Please send this file to your support representative.
-----o<-----o<-----

But at the end it still generates the sosreport. So we need to find out why it throws errors in verbose output.

Comment 6 haiyang,dong 2014-09-12 08:42:25 UTC
This issue can be reproduced with the following versions:
rhev-hypervisor6-6.5-20131204.0.3.2.iso
sos-2.2-47.el6.noarch
ovirt-node-2.5.0-17.el6_4.14.noarch

The issue is the same as in BZ 1112538: https://bugzilla.redhat.com/show_bug.cgi?id=1112538

*** This bug has been marked as a duplicate of bug 1112538 ***

Comment 7 Bryn M. Reeves 2014-09-12 08:46:58 UTC
> So, need to find why it throw errors in verbose output.

Those are not 'errors' in the sense that something is broken. Most of them are just 'command not found' messages, i.e. some command we expect to be present does not exist or cannot be run. You might want to find out why basic commands like ethtool are missing. The cluster commands at the start of the log, on the other hand, are not always expected to be present, which is why these messages are only logged in verbose mode.
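The distinction drawn here is between a tool that is simply absent and a command that actually failed. The sketch below shows it in Python. It is a hypothetical helper, not the sos 2.2 implementation, which targets Python 2.6. The function name `run_collection_command` and the verbose-only reporting are assumptions made for illustration.

```python
import shutil
import subprocess
from typing import List, Optional


def run_collection_command(cmd: List[str], verbose: bool = False) -> Optional[str]:
    """Run cmd and return its stdout, or None if the executable is not installed.

    Hypothetical helper, not sos code: a missing tool is treated as an
    expected condition and is only reported when verbose is True.
    """
    if shutil.which(cmd[0]) is None:
        # Absent tool: expected on trimmed images, so stay quiet unless verbose.
        if verbose:
            print("could not run command: " + " ".join(cmd))
        return None
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout
```

Checking with `shutil.which` first means a host without, say, the cluster tools gets a quiet `None` rather than an exception. Only the message differs with `-v`.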

The remaining errors are mostly read I/O errors on files in pseudo file systems (/proc and /sys). These are also expected, since not all files in these locations support reading. There is one more here, from the VDSM control socket, but again this is not an error condition that would prevent sos from running to completion.
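The sketch below illustrates treating an unreadable pseudo-file as expected rather than as a failure. It is a minimal, hypothetical example, not sos's copy routine. The set of "expected" errno values is an assumption: write-only entries such as /proc/sys/vm/compact_memory typically refuse reads with EACCES, but the exact errno can vary by kernel and file.

```python
import errno
from typing import Optional

# Assumed set of errno values meaning "this pseudo-file is not meant to be read".
# Write-only /proc/sys and /sys entries typically fail with EACCES on open for
# reading; other values are included defensively and may differ between kernels.
EXPECTED_READ_ERRNOS = frozenset({errno.EACCES, errno.EPERM, errno.EINVAL, errno.EIO})


def is_expected_read_error(exc: OSError) -> bool:
    """Return True if exc looks like a normal refusal to read a pseudo-file."""
    return exc.errno in EXPECTED_READ_ERRNOS


def read_pseudo_file(path: str) -> Optional[bytes]:
    """Read a /proc or /sys file; return None for expected read refusals."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as exc:
        if is_expected_read_error(exc):
            return None
        raise  # e.g. ENOENT: the file really is missing, so let the caller see it
```

Keeping ENOENT out of the "expected" set preserves the difference between a file that exists but cannot be read and one that is genuinely absent.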

I'm assuming that most of the command-not-found and file-not-found errors are side effects of the way RHEV images are composed, i.e. deleting files from the file system but leaving their records in the RPM database. Since most plugins trigger on the presence of packages, this causes a large number of 'could not run' messages to be logged.
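The mismatch described here is a package still recorded in the RPM database while its files have been removed from the image. A hypothetical enablement check can illustrate it. The function name, its parameters, and the rule of requiring both a package record and an on-disk file are assumptions for illustration; they are not taken from sos.

```python
import os
from typing import Callable, Iterable, Set


def plugin_should_run(
    trigger_packages: Iterable[str],
    trigger_files: Iterable[str],
    installed_packages: Set[str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Hypothetical stricter enablement check (not sos's actual logic).

    A package record alone is not trusted: if the plugin names files it
    depends on, at least one of them must also exist on disk.
    """
    has_package = any(pkg in installed_packages for pkg in trigger_packages)
    files = list(trigger_files)
    if not files:
        # No files to check, so fall back to the package record alone.
        return has_package
    return has_package and any(path_exists(path) for path in files)
```

On an image like the one described, `installed_packages` would still list the package while `path_exists` returns False for its binaries. This check would then skip the plugin instead of logging a run of 'could not run' messages.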

So far the only bug reported here is in the Gluster plugin:

  OSError: [Errno 2] No such file or directory: '/sos_commands/gluster/glusterfs-statedumps'
  Running plugins. Please wait ...

This is a clear bug and should be fixed. Other than that, none of the comments so far describe a problem.
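The traceback is an `OSError` with errno 2 (ENOENT) for a statedump directory that does not exist when the plugin uses it. The real fix is the RHEL gluster plugin patch referenced in comment 19. The sketch below only illustrates one defensive pattern, under assumptions of our own:

- build the directory path under the report root;
- create the directory before anything lists it;
- treat a missing directory as "no dumps".

The helper names are hypothetical, and the code needs Python 3 (`exist_ok` was added in 3.2).

```python
import os
from typing import List


def prepare_statedump_dir(report_root: str) -> str:
    """Create sos_commands/gluster/glusterfs-statedumps under report_root.

    Hypothetical helper: the path is always joined onto the report
    directory, and the directory is created before anything lists it.
    """
    if not report_root:
        # An empty root would silently produce a path relative to the CWD.
        raise ValueError("report_root must be a non-empty path")
    statedump_dir = os.path.join(
        report_root, "sos_commands", "gluster", "glusterfs-statedumps")
    os.makedirs(statedump_dir, exist_ok=True)  # idempotent if it already exists
    return statedump_dir


def list_statedumps(statedump_dir: str) -> List[str]:
    """Return the dump file names, or [] if the directory does not exist."""
    if not os.path.isdir(statedump_dir):
        return []
    return sorted(os.listdir(statedump_dir))
```

Because `list_statedumps` checks for the directory first, a host where no dumps were written yields an empty list instead of aborting the plugin.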

If sos fails to complete and you think it may be waiting on some VDSM component, please collect the output of 'ps ax --forest' while the problem is happening, along with -vv output from sos, so that we can see where it gets stuck.

Comment 8 Bryn M. Reeves 2014-09-12 08:49:14 UTC
haiyang, I don't think that's correct. The gluster error in comment #0 seems coincidental to this bug (it does not cause sos to 'fail', which is what the bug describes). Comments #3 and #5 seem to suggest that sos is blocking while waiting for VDSM resources.

Comment 18 Bryn M. Reeves 2014-12-15 12:12:13 UTC
See comment #7: this is a bug in the gluster plugin. It is not present in the upstream version of the plugin, and the fix is straightforward.

Comment 19 Bryn M. Reeves 2014-12-15 12:14:40 UTC
In fact this was already fixed in RHEL-6.6 via bug 1002619.

See the following patch:

  sos-bz1002619-gluster-update-plugin-and-add-log-file-size-limiting.patch

Comment 20 Anitha Udgiri 2014-12-22 22:51:16 UTC
Bryn,
   Are we saying that this BZ may no longer be relevant and the actual gluster issue causing the problem is already fixed?

Could you please explain?

Comment 21 Bryn M. Reeves 2014-12-23 12:09:54 UTC
I've no idea what you're shipping in RHS / RHEV (RHS had their own fork of the sos RPM at one point), but yes, this is fixed in RHEL.

Comment 27 Aharon Canan 2015-01-11 12:20:24 UTC
rhevm 3.5.1 has not been built yet, so this should be MODIFIED rather than ON_QA.

If this is not rhevm but rhev-h and it should be ON_QA, please set the right product/component/versions and move it back to ON_QA.

Comment 35 Fabian Deutsch 2015-04-21 13:34:05 UTC
In bug 1198482 we see that sosreport is going to be fixed with the next update to RHEV-H on RHEV 3.5.

*** This bug has been marked as a duplicate of bug 1198482 ***