Description of problem:

RHS Host became non-responsive after updating the glusterfs packages to glusterfs-3.4.0qa5-1. Before updating the glusterfs packages, it was working perfectly fine.

Version-Release number of selected component (if applicable): rhsc-2.1-qa18

RHS Host:
# rpm -qa |grep glusterfs
glusterfs-3.4.0qa5-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0qa5-1.el6rhs.x86_64
glusterfs-fuse-3.4.0qa5-1.el6rhs.x86_64
glusterfs-devel-3.4.0qa5-1.el6rhs.x86_64
glusterfs-rdma-3.4.0qa5-1.el6rhs.x86_64
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-server-3.4.0qa5-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0qa5-1.el6rhs.x86_64

# rpm -qa |grep vdsm
vdsm-hook-faqemu-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-hook-qemucmdline-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-4.9.6-32.0.qa3.el6rhs.x86_64
vdsm-reg-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-python-4.9.6-32.0.qa3.el6rhs.x86_64
vdsm-cli-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-bootstrap-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-debug-plugin-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-xmlrpc-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-debuginfo-4.9.6-32.0.qa3.el6rhs.x86_64
vdsm-gluster-4.9.6-32.0.qa3.el6rhs.noarch

While debugging, found that the vdsm service was not running. Tried to restart but failed with the following traceback:

-----------------------------------
# /etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop                                         [  OK  ]
vdsm: not running                                          [FAILED]
vdsm: libvirt already configured for vdsm                  [  OK  ]
Starting iscsid:
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 143, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 140, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/load_needed_modules.py", line 60, in load_needed_modules
    _exec_command([EX_MODPROBE, mod])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/load_needed_modules.py", line 35, in _exec_command
    stderr=subprocess.PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 639, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.6/subprocess.py", line 1228, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Starting up vdsm daemon:
vdsm start                                                 [  OK  ]

# /etc/init.d/vdsmd status
VDS daemon is not running, but its watchdog is
-----------------------------------

Additional info: sosreport attached
The traceback was generated when 'vdsm-tool' tried to run the 'modprobe' command on the node. 'EX_MODPROBE' was defined incorrectly, which caused the error. I have attached a patch to avoid this traceback.

--Humble
Virt SEG
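For reference, the failure mode is easy to reproduce outside vdsm: subprocess.Popen() raises OSError: [Errno 2] as soon as the executable it is handed does not resolve to a real binary, before any module is actually probed. A minimal sketch of that behaviour; the "broken" and "fixed" values below are illustrative placeholders, not vdsm's actual definitions:

-----------------------------------
import subprocess

# Hypothetical values, for illustration only; neither is vdsm's actual definition.
EX_MODPROBE_BROKEN = '/nonexistent/modprobe'   # a path that does not exist
EX_MODPROBE_FIXED = '/sbin/modprobe'           # absolute path to a real binary

def _exec_command(argv):
    # Same pattern as in the traceback: Popen raises OSError before any
    # module is probed if argv[0] cannot be found or executed.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    proc.communicate()
    return proc.returncode

try:
    _exec_command([EX_MODPROBE_BROKEN, 'bonding'])
except OSError as e:
    print('broken definition fails early: %s' % e)  # [Errno 2] No such file or directory

print(_exec_command([EX_MODPROBE_FIXED, 'bonding']))
-----------------------------------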
Created attachment 672323 [details]
avoid traceback generated by wrong EX_MODPROBE definition
(In reply to comment #0)
> Additional info: sosreport attached

I am not able to find the sosreport as an attachment. Could you check on this?
Created attachment 672344 [details]
sosreport of the RHS server
Hi Bala,

I see upstream has defined this variable in the following way via commit 816238891bea4d0e6b8b713a84e6ec8d6d0bb339:

[hchiramm@unused vdsm]$ git show 816238891bea4d0e6b8b713a84e6ec8d6d0bb339
commit 816238891bea4d0e6b8b713a84e6ec8d6d0bb339
Author: Wenyi Gao <wenyi.ibm.com>
Date:   Mon Aug 27 15:22:09 2012 +0800

    Move load_needed_modules to vdsm-tool

    Move load_needed_modules to vdsm-tool from vdsmd.init

    Change-Id: Icdc8f9d07a7150c4a846228811d6c5dcb17fa358
    Signed-off-by: Wenyi Gao <wenyi.ibm.com>
    Reviewed-on: http://gerrit.ovirt.org/7487
    Reviewed-by: Douglas Schilling Landgraf <dougsland>
    Reviewed-by: Yaniv Bronhaim <ybronhei>
    Tested-by: Yaniv Bronhaim <ybronhei>
    Reviewed-by: Dan Kenigsberg <danken>

diff --git a/configure.ac b/configure.ac
index d4b6815..c50f42f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -149,6 +149,7 @@ AC_PATH_PROG([LVM_PATH], [lvm], [/sbin/lvm])
 AC_PATH_PROG([MKFS_PATH], [mkfs], [/sbin/mkfs])
 AC_PATH_PROG([MKFS_MSDOS_PATH], [mkfs.msdos], [/sbin/mkfs.msdos])
 AC_PATH_PROG([MKISOFS_PATH], [mkisofs], [/usr/bin/mkisofs])
+AC_PATH_PROG([MODPROBE_PATH], [modprobe], [/sbin/modprobe])
 AC_PATH_PROG([MOUNT_PATH], [mount], [/bin/mount])
 AC_PATH_PROG([MULTIPATH_PATH], [multipath], [/sbin/multipath])
******************

--Humble
Virt SEG
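For context, AC_PATH_PROG makes configure look up modprobe on the build machine and substitute an absolute path (falling back to /sbin/modprobe) into the generated sources, so the Python side ends up exec'ing a single, valid path. A rough illustration of that substitution step; MODPROBE_PATH and the /sbin/modprobe fallback come from the diff above, everything else is a stand-in for what configure does:

-----------------------------------
# Illustration only: emulate configure's @MODPROBE_PATH@ substitution.
from distutils.spawn import find_executable

template = "EX_MODPROBE = '@MODPROBE_PATH@'"

# AC_PATH_PROG([MODPROBE_PATH], [modprobe], [/sbin/modprobe]) resolves the
# program on PATH and falls back to /sbin/modprobe if it is not found.
modprobe_path = find_executable('modprobe') or '/sbin/modprobe'

print(template.replace('@MODPROBE_PATH@', modprobe_path))
# -> EX_MODPROBE = '/sbin/modprobe'
-----------------------------------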
I tried to reproduce the bug as follows:

0. Installed RHS 2.0+ Beta.
1. Ran 'vdsClient -s localhost getVdsCaps', which was successful.
2. Updated (rpm -Uvh) vdsm, vdsm-cli, vdsm-gluster, vdsm-python with
   * vdsm-4.9.6-32.0.qa3.el6rhs.x86_64.rpm
   * vdsm-gluster-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-reg-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-cli-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-python-4.9.6-32.0.qa3.el6rhs.x86_64.rpm
   * vdsm-xmlrpc-4.9.6-32.0.qa3.el6rhs.noarch.rpm [1]
3. Restarted vdsmd and ran 'vdsClient -s localhost getVdsCaps', which was successful.
4. Uninstalled vdsm* and reinstalled (rpm -ivh)
   * vdsm-4.9.6-32.0.qa3.el6rhs.x86_64.rpm
   * vdsm-gluster-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-reg-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-cli-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-python-4.9.6-32.0.qa3.el6rhs.x86_64.rpm
   * vdsm-xmlrpc-4.9.6-32.0.qa3.el6rhs.noarch.rpm [1]
5. Restarted vdsmd and ran 'vdsClient -s localhost getVdsCaps', which was successful.

I do not see this failure in the update process. My suspicion is that logrotate could have changed the /var/log/vdsm directory permissions.

[1] vdsm-xmlrpc is a new dependency of vdsm
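A quick way to test the logrotate suspicion on an affected host is simply to check who owns /var/log/vdsm and the log file. A minimal sketch; the paths and the expected vdsm:kvm owner/group are assumptions for this check:

-----------------------------------
import os
import pwd
import grp

# Paths and the expected vdsm:kvm ownership are assumptions for this sketch.
PATHS = ['/var/log/vdsm', '/var/log/vdsm/vdsm.log']
EXPECTED = ('vdsm', 'kvm')

for path in PATHS:
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    status = 'OK' if (owner, group) == EXPECTED else 'MISMATCH'
    print('%-30s %s:%s  %s' % (path, owner, group, status))
-----------------------------------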
Hi Bala,

I do suspect 'logrotate': yesterday I poked around 'logrotate' quite a bit, invoking the rotations manually and also running the cron jobs, but unfortunately I could not prove it. We may need to look at it more closely and try to trigger it somehow.

--Humble
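For the record, the manual check boils down to forcing a rotation and comparing the log file's ownership before and after. A minimal sketch of that, assuming the vdsm logrotate configuration lives at /etc/logrotate.d/vdsm and the log at the standard path (both are assumptions):

-----------------------------------
import os
import subprocess

LOG = '/var/log/vdsm/vdsm.log'     # assumed log location
CONF = '/etc/logrotate.d/vdsm'     # assumed logrotate config path

def owner(path):
    st = os.stat(path)
    return st.st_uid, st.st_gid

before = owner(LOG)
# -f forces a rotation even if the size/time criteria are not met.
subprocess.check_call(['/usr/sbin/logrotate', '-f', CONF])
after = owner(LOG)
print('before %s:%s, after %s:%s' % (before + after))
-----------------------------------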
Hi Bala,

Just a thought: are we missing this patch in this build?
http://gerrit.ovirt.org/#/c/7940/

--Humble
The patch is already available in the rhev-3.2 branch of downstream vdsm.
Please check this in vdsm-4.10.2-4.0.qa5.el6rhs
I ran across this on vdsm-4.10.2-4.0.qa5.el6rhs.x86_64. 'service vdsmd status' was reporting:

    VDS daemon is not running, but its watchdog is

The vdsm.log owner was root:root.

Workaround: after changing the ownership back to the default vdsm:kvm, vdsm started successfully.
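The workaround can also be scripted; a minimal sketch of the same two steps (restore the default ownership, then restart vdsmd), with the uid/gid looked up rather than hard-coded:

-----------------------------------
import os
import pwd
import grp
import subprocess

LOG = '/var/log/vdsm/vdsm.log'   # assumed standard log location

# Restore the default vdsm:kvm ownership that vdsmd expects.
uid = pwd.getpwnam('vdsm').pw_uid
gid = grp.getgrnam('kvm').gr_gid
os.chown(LOG, uid, gid)

# Restart the daemon so it comes up with a log file it can write to.
subprocess.check_call(['/sbin/service', 'vdsmd', 'restart'])
-----------------------------------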
This issue was seen with vdsm qa3, and qa5 has the relevant patches in. I would like to know whether the issue is still reproducible in qa5, i.e. whether the vdsm log owner still changes to root:root. I think doing a clean uninstall of the qa3 vdsm and an install of qa5 will help too.
Let's wait and see whether the issue exhibits itself again in qa5.
Created attachment 705364 [details]
Ran sosreport on both the engine and the host machine; attaching the output here.
I have found the same issue with qa6. These are the steps I performed:

Tried to add a host to the cluster using its IP. During the installation process it goes into the reboot state, and after a few seconds it goes into the non-responsive state and stays there.

When I looked through the logs I found that vdsm was not running. While debugging, I found that the vdsm.log file ownership had been changed to root instead of kvm. I changed the ownership of the file back to kvm by executing the following command:

    chown vdsm.kvm vdsm.log

Then I restarted the vdsmd service and tried to install the host again. It got installed successfully.

I am attaching the sosreports from the engine and the host.
Moving back to Assigned based on comment 22.

We need to find the root cause of this issue and have a permanent fix, i.e. how and why the ownership of vdsm.log is changed to root, which in turn made the status of the host "Non-responsive".
(In reply to comment #23)
> We need to find the root cause of this issue and have a permanent fix, i.e.
> how and why the ownership of vdsm.log is changed to root, which in turn made
> the status of the host "Non-responsive".

See bug 912308.
(In reply to comment #24)
> (In reply to comment #23)
> > We need to find the root cause of this issue and have a permanent fix, i.e.
> > how and why the ownership of vdsm.log is changed to root, which in turn made
> > the status of the host "Non-responsive".
>
> See bug 912308.

Thanks for providing the bug. Now I see the root cause of this issue!
Please help verify that the upstream change http://gerrit.ovirt.org/12940, which separates the supervdsm log into its own supervdsm.log file, solves this issue.
Marking as verified in bb1. Reopen this bug or create a new one if the issue reoccurs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html