Bug 1140406
| Summary: | sos-report fails on rhev-h 3.2 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Anitha Udgiri <audgiri> |
| Component: | ovirt-node | Assignee: | Douglas Schilling Landgraf <dougsland> |
| Status: | CLOSED DUPLICATE | QA Contact: | Pavel Stehlik <pstehlik> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 3.2.0 | CC: | audgiri, bazulay, bmr, cshao, danken, dfediuck, didi, dougsland, ecohen, fdeutsch, hadong, huiwa, iheim, leiwang, lpeer, lsurette, lveyde, oourfali, pbandark, pstehlik, rbalakri, Rhev-m-bugs, rmainz, sbonazzo, stirabos, usurse, yaniwang, ycui, yeylon |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 3.5.3 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | node | ||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-04-21 13:34:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Anitha Udgiri
2014-09-10 21:43:23 UTC
What version/release of the sos package is in use?
(In reply to Bryn M. Reeves from comment #1)
> What version/release of the sos package is in use?
sos-2.2-47.el6.noarch - This is what I see packaged with the version of the iso image that is being used.
Even on a working RHEV-H, running `sosreport -v` throws lots of error messages:
---->p---->o----
Running plugins. Please wait ...
Completed [3/54] ...
could not run command: /sbin/lilo -q
Completed [5/54] ...
could not run command: /usr/sbin/rg_test test /etc/cluster/cluster.conf
could not run command: fence_tool ls -n
could not run command: gfs_control ls -n
could not run command: dlm_tool log_plock
could not run command: cman_tool services
could not run command: cman_tool nodes
could not run command: cman_tool status
could not run command: ccs_tool lsnode
could not run command: /sbin/ipvsadm -L
could not run command: cman_tool -a nodes
could not run command: corosync-quorumtool -l
could not run command: corosync-quorumtool -s
could not run command: corosync-cpgtool
could not run command: corosync-objctl
could not run command: group_tool ls -g1
could not run command: gfs_control ls -n
could not run command: gfs_control dump
could not run command: fence_tool dump
could not run command: dlm_tool dump
could not run command: dlm_tool ls -n
could not run command: crm_report -S --dest /tmp/dhcp210-167-2014091203461410493593/sos_commands/cluster/crm_report
Completed [6/54] ...
could not run command: for i in `ls /home/`; do echo "User :" $i;/usr/bin/crontab -l -u $i; echo "---------------";done
Completed [10/54] ...
could not run command: /usr/sbin/foreman-debug -a -d /tmp/dhcp210-167-2014091203461410493593/sos_commands/foreman/foreman-debug
Completed [13/54] ...
could not run command: /usr/sbin/gluster peer status
Completed [14/54] ...
error copying file /sys/bus/scsi/uevent (IOError)
error copying file /sys/bus/scsi/drivers/sr/uevent (IOError)
error copying file /sys/bus/scsi/drivers/sr/unbind (IOError)
error copying file /sys/bus/scsi/drivers/sr/bind (IOError)
error copying file /sys/bus/scsi/drivers_probe (IOError)
could not run command: /usr/bin/cpufreq-info
Completed [19/54] ...
error copying file /sys/module/ipmi_si/parameters/hotmod (IOError)
error copying file /sys/module/md_mod/parameters/new_array (IOError)
error copying file /sys/module/libfcoe/parameters/disable (IOError)
error copying file /sys/module/libfcoe/parameters/enable (IOError)
error copying file /sys/module/libfcoe/parameters/destroy (IOError)
error copying file /sys/module/libfcoe/parameters/create_vn2vn (IOError)
error copying file /sys/module/libfcoe/parameters/create (IOError)
error copying file /sys/module/fcoe/parameters/disable (IOError)
error copying file /sys/module/fcoe/parameters/enable (IOError)
error copying file /sys/module/fcoe/parameters/destroy (IOError)
error copying file /sys/module/fcoe/parameters/create (IOError)
could not run command: /usr/sbin/dkms status
Completed [20/54] ...
could not run command: /usr/bin/klist -ket /etc/krb5.keytab
Completed [30/54] ...
could not run command: /sbin/ethtool ;vdsmdummy;
could not run command: /sbin/ethtool -i ;vdsmdummy;
could not run command: /sbin/ethtool -k ;vdsmdummy;
could not run command: /sbin/ethtool -S ;vdsmdummy;
could not run command: /sbin/ethtool -a ;vdsmdummy;
could not run command: /sbin/ethtool -c ;vdsmdummy;
could not run command: /sbin/ethtool -g ;vdsmdummy;
could not run command: /usr/sbin/brctl showstp ;vdsmdummy;
Completed [32/54] ...
could not run command: /usr/sbin/ntptrace -n
Completed [37/54] ...
could not run command: /usr/bin/lpstat -t
could not run command: /usr/bin/lpstat -s
could not run command: /usr/bin/lpstat -d
Completed [41/54] ...
could not run command: /usr/bin/klist -ket /etc/krb5.keytab
could not run command: /usr/bin/wbinfo --domain='.' -g
could not run command: /usr/bin/wbinfo --domain='.' -u
could not run command: /usr/bin/testparm -s -v
Completed [47/54] ...
file or directory /etc/rc.d/rc6.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc6.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc5.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc5.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc4.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc4.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc3.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc3.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc2.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc2.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc1.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc1.d/K01libvirt-guests does not exist
file or directory /etc/rc.d/rc0.d/K03libvirtd does not exist
file or directory /etc/rc.d/rc0.d/K01libvirt-guests does not exist
Completed [49/54] ...
error copying file /proc/sys/vm/compact_memory (IOError)
error copying file /proc/sys/fs/binfmt_misc/register (IOError)
error copying file /proc/sys/net/ipv4/route/flush (IOError)
error copying file /proc/sys/net/ipv6/route/flush (IOError)
Completed [50/54] ...
could not run command: /usr/bin/stap -V 2
Completed [52/54] ...
error copying file /var/run/vdsm/svdsm.sock (IOError)
Completed [54/54] ...
Creating compressed archive...
Your sosreport has been generated and saved in:
/tmp/sosreport-dhcp210-167-20140912034737-cb94.tar.xz
The md5sum is: 15f835c734444a1c55bd9a9f74e5cb94
Please send this file to your support representative.
-----o<-----o<-----
But in the end it generates the sosreport. So, need to find why it throws errors in verbose output.
Can reproduce this issue with the following versions:
rhev-hypervisor6-6.5-20131204.0.3.2.iso
sos-2.2-47.el6.noarch
ovirt-node-2.5.0-17.el6_4.14.noarch
This is the same issue as this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1112538
*** This bug has been marked as a duplicate of bug 1112538 ***
> So, need to find why it throws errors in verbose output.
Those are not 'errors' in the sense that something is broken. Most of them are just 'command not found' messages; i.e. some command we are expecting to be present does not exist or cannot be run. You might want to find out why basic commands like ethtool are missing but e.g. the cluster commands at the start of the log are not always expected to be present (which is why the messages are only logged in verbose mode).
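To illustrate why a missing binary only produces a verbose-mode note rather than a failure, here is a minimal Python 3 sketch. It is not the sos implementation; the function name `run_optional` and the logger name are hypothetical, and only the message text is taken from the log above.

```python
import logging
import shutil
import subprocess

log = logging.getLogger("collector_sketch")  # hypothetical logger name


def run_optional(argv, verbose=False):
    """Run argv if its executable can be found; return its output, else None.

    A missing executable is treated as "nothing to collect", not as a failure.
    """
    exe = shutil.which(argv[0])
    if exe is None:
        if verbose:
            # Only mentioned when the caller asked for verbose output.
            log.warning("could not run command: %s", " ".join(argv))
        return None
    result = subprocess.run(
        [exe] + list(argv[1:]),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

With this shape, a call such as `run_optional(["cman_tool", "status"], verbose=True)` on a host without the cluster tools returns `None` and logs one line, while collection carries on with the next command.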
The remaining errors are mostly read IO errors on files in pseudo file systems (/proc and /sys). These are also expected since not all files in these locations support reading. There's one additional one here from the VDSM control socket but again this is not an error condition that would prevent sos from running to completion.
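The tolerant copy described here can be sketched as follows. This is an illustration in Python 3 under stated assumptions, not the code sos uses; `copy_if_readable` is a hypothetical helper, and in Python 3 `IOError` is an alias of `OSError`, so the exception name printed may differ from the `(IOError)` seen in the log.

```python
import os
import shutil


def copy_if_readable(src, dest_dir):
    """Copy src under dest_dir, preserving its path.

    Returns None on success, or a message string if the file cannot be
    read (for example a write-only entry under /proc or /sys, or a socket).
    """
    dest = os.path.join(dest_dir, src.lstrip(os.sep))
    try:
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    except OSError as exc:  # IOError is the same class in Python 3
        return "error copying file %s (%s)" % (src, type(exc).__name__)
    return None
```

Because the failure is returned as a message instead of being raised, one unreadable file does not stop the remaining files from being collected.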
I'm assuming that most of the command and file not found errors are side effects of the way that RHEV images are composed (i.e. deleting files from the file system but leaving the records in the RPM database). Since most plugins trigger on the presence of packages, this will cause a large number of 'could not run' messages to be logged.
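The mismatch described above (package recorded as installed, binary removed from the image) can be checked with a small sketch like the one below. Everything in it is an assumption made for illustration: the plugin table layout, the package names and the helper `missing_commands` are hypothetical and do not reflect how sos defines its plugins.

```python
import os


def missing_commands(plugins, installed_packages, exists=os.path.exists):
    """Return {plugin: [commands]} for commands whose binary is absent
    even though a triggering package is recorded as installed."""
    report = {}
    installed = set(installed_packages)
    for name, spec in plugins.items():
        if not installed & set(spec["packages"]):
            continue  # plugin would not be triggered at all
        # The first word of each command line is the binary path.
        missing = [cmd for cmd in spec["commands"]
                   if not exists(cmd.split()[0])]
        if missing:
            report[name] = missing
    return report
```

Feeding it the package list reported by the RPM database and the commands from the verbose log would show which plugins are triggered by a package whose files were stripped from the image.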
So far the only bug reported here is in the Gluster plugin:
OSError: [Errno 2] No such file or directory: '/sos_commands/gluster/glusterfs-statedumps'
Running plugins. Please wait ...
This is a clear bug and should be fixed. Other than that there is no description of a problem in any of the comments so far.
If sos fails to complete and you think it may be waiting on some VDSM component please collect the output of 'ps ax --forest' while the problem is happening (along with -vv output from sos so that we can see where it gets stuck).
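A small helper for capturing the requested process tree while sos appears stuck might look like this. It is a sketch that assumes Python 3.5 or later is available where it runs; the function name and output file layout are hypothetical, and only the `ps ax --forest` command comes from the comment above.

```python
import subprocess
import time


def snapshot_process_tree(out_path, argv=("ps", "ax", "--forest")):
    """Append a timestamped copy of the command's output to out_path.

    Returns the command's exit status.
    """
    result = subprocess.run(
        list(argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    with open(out_path, "a") as fh:
        fh.write("=== %s ===\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
        fh.write(result.stdout)
    return result.returncode
```

Calling it a few times, some seconds apart, while sosreport is hung gives a series of snapshots showing which child process it is waiting on.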
haiyang, I don't think that's correct. The gluster error in comment #0 seems coincidental in this bug (it does not cause sos to 'fail', which is the bug description). Comment #3 and comment #5 seem to suggest sos is blocking while waiting for VDSM resources.
See comment #7 - this is a bug in the gluster plugin. It's not present in the upstream version of the plugin and is a straightforward fix.
In fact this was already fixed in RHEL-6.6 via bug 1002619. See the following patch:
sos-bz1002619-gluster-update-plugin-and-add-log-file-size-limiting.patch
Bryn, are we saying that this BZ may no longer be relevant and the actual gluster issue causing the problem is already fixed? Could you please explain?
I've no idea what you're shipping in RHS / RHEV (RHS had their own fork of the sos RPM at one point) but this is fixed in RHEL, yes.
rhevm 3.5.1 is not compiled - so it should be modified instead of on_qa. In case it is not rhevm but rhev-h and should be on_qa, please set the right product/component/versions and move back to on_qa.
In bug 1198482 we see that sosreport is going to be fixed with the next update to RHEV-H on RHEV 3.5.
*** This bug has been marked as a duplicate of bug 1198482 ***