Description of problem:

RHS Host became non-responsive after updating the glusterfs packages to glusterfs-3.4.0qa5-1. Before updating the glusterfs packages, it was working perfectly fine.

Version-Release number of selected component (if applicable): rhsc-2.1-qa18

RHS Host:
# rpm -qa |grep glusterfs
glusterfs-3.4.0qa5-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0qa5-1.el6rhs.x86_64
glusterfs-fuse-3.4.0qa5-1.el6rhs.x86_64
glusterfs-devel-3.4.0qa5-1.el6rhs.x86_64
glusterfs-rdma-3.4.0qa5-1.el6rhs.x86_64
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-server-3.4.0qa5-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0qa5-1.el6rhs.x86_64

# rpm -qa |grep vdsm
vdsm-hook-faqemu-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-hook-qemucmdline-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-4.9.6-32.0.qa3.el6rhs.x86_64
vdsm-reg-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-python-4.9.6-32.0.qa3.el6rhs.x86_64
vdsm-cli-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-bootstrap-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-debug-plugin-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-xmlrpc-4.9.6-32.0.qa3.el6rhs.noarch
vdsm-debuginfo-4.9.6-32.0.qa3.el6rhs.x86_64
vdsm-gluster-4.9.6-32.0.qa3.el6rhs.noarch

While debugging, found that the vdsm service was not running. Tried to restart but failed with the following traceback:

-----------------------------------
# /etc/init.d/vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop                                         [  OK  ]
vdsm: not running                                          [FAILED]
vdsm: libvirt already configured for vdsm                  [  OK  ]
Starting iscsid:
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 143, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 140, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/load_needed_modules.py", line 60, in load_needed_modules
    _exec_command([EX_MODPROBE, mod])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/load_needed_modules.py", line 35, in _exec_command
    stderr=subprocess.PIPE)
  File "/usr/lib64/python2.6/subprocess.py", line 639, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.6/subprocess.py", line 1228, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Starting up vdsm daemon:
vdsm start                                                 [  OK  ]

# /etc/init.d/vdsmd status
VDS daemon is not running, but its watchdog is
-----------------------------------

Additional info: sosreport attached
The traceback was generated when 'vdsm-tool' tried to run the 'modprobe' command on the node. 'EX_MODPROBE' was defined incorrectly, which caused the error. I have attached a patch to avoid this traceback.

--Humble
Virt SEG
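For reference, the failure mode is easy to reproduce outside vdsm: subprocess.Popen() raises OSError: [Errno 2] as soon as the executable it is handed does not resolve to a real binary, before any module is actually probed. A minimal sketch of that behaviour; the "broken" and "fixed" values below are illustrative placeholders, not vdsm's actual definitions:

-----------------------------------
import subprocess

# Hypothetical values, for illustration only; neither is vdsm's actual definition.
EX_MODPROBE_BROKEN = '/nonexistent/modprobe'   # a path that does not exist
EX_MODPROBE_FIXED = '/sbin/modprobe'           # absolute path to a real binary

def _exec_command(argv):
    # Same pattern as in the traceback: Popen raises OSError before any
    # module is probed if argv[0] cannot be found or executed.
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    proc.communicate()
    return proc.returncode

try:
    _exec_command([EX_MODPROBE_BROKEN, 'bonding'])
except OSError as e:
    print('broken definition fails early: %s' % e)  # [Errno 2] No such file or directory

print(_exec_command([EX_MODPROBE_FIXED, 'bonding']))
-----------------------------------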
Created attachment 672323 [details]
avoid traceback generated by wrong EX_MODPROBE definition
(In reply to comment #0)
> Additional info: sosreport attached

I am not able to find the sosreport as an attachment. Could you check on this?
Created attachment 672344 [details]
sosreport of the RHS server
Hi Bala,

I see upstream has defined this variable in the following way via commit 816238891bea4d0e6b8b713a84e6ec8d6d0bb339:

[hchiramm@unused vdsm]$ git show 816238891bea4d0e6b8b713a84e6ec8d6d0bb339
commit 816238891bea4d0e6b8b713a84e6ec8d6d0bb339
Author: Wenyi Gao <wenyi.ibm.com>
Date:   Mon Aug 27 15:22:09 2012 +0800

    Move load_needed_modules to vdsm-tool

    Move load_needed_modules to vdsm-tool from vdsmd.init

    Change-Id: Icdc8f9d07a7150c4a846228811d6c5dcb17fa358
    Signed-off-by: Wenyi Gao <wenyi.ibm.com>
    Reviewed-on: http://gerrit.ovirt.org/7487
    Reviewed-by: Douglas Schilling Landgraf <dougsland>
    Reviewed-by: Yaniv Bronhaim <ybronhei>
    Tested-by: Yaniv Bronhaim <ybronhei>
    Reviewed-by: Dan Kenigsberg <danken>

diff --git a/configure.ac b/configure.ac
index d4b6815..c50f42f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -149,6 +149,7 @@ AC_PATH_PROG([LVM_PATH], [lvm], [/sbin/lvm])
 AC_PATH_PROG([MKFS_PATH], [mkfs], [/sbin/mkfs])
 AC_PATH_PROG([MKFS_MSDOS_PATH], [mkfs.msdos], [/sbin/mkfs.msdos])
 AC_PATH_PROG([MKISOFS_PATH], [mkisofs], [/usr/bin/mkisofs])
+AC_PATH_PROG([MODPROBE_PATH], [modprobe], [/sbin/modprobe])
 AC_PATH_PROG([MOUNT_PATH], [mount], [/bin/mount])
 AC_PATH_PROG([MULTIPATH_PATH], [multipath], [/sbin/multipath])
******************

--Humble
Virt SEG
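For context, AC_PATH_PROG makes configure look up modprobe on the build machine and substitute an absolute path (falling back to /sbin/modprobe) into the generated sources, so the Python side ends up exec'ing a single, valid path. A rough illustration of that substitution step; MODPROBE_PATH and the /sbin/modprobe fallback come from the diff above, everything else is a stand-in for what configure does:

-----------------------------------
# Illustration only: emulate configure's @MODPROBE_PATH@ substitution.
from distutils.spawn import find_executable

template = "EX_MODPROBE = '@MODPROBE_PATH@'"

# AC_PATH_PROG([MODPROBE_PATH], [modprobe], [/sbin/modprobe]) resolves the
# program on PATH and falls back to /sbin/modprobe if it is not found.
modprobe_path = find_executable('modprobe') or '/sbin/modprobe'

print(template.replace('@MODPROBE_PATH@', modprobe_path))
# -> EX_MODPROBE = '/sbin/modprobe'
-----------------------------------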
I tried to reproduce the bug as follows:

0. Installed RHS 2.0+ Beta.
1. Ran 'vdsClient -s localhost getVdsCaps', which was successful.
2. Updated (rpm -Uvh) vdsm, vdsm-cli, vdsm-gluster, vdsm-python with
   * vdsm-4.9.6-32.0.qa3.el6rhs.x86_64.rpm
   * vdsm-gluster-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-reg-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-cli-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-python-4.9.6-32.0.qa3.el6rhs.x86_64.rpm
   * vdsm-xmlrpc-4.9.6-32.0.qa3.el6rhs.noarch.rpm [1]
3. Restarted vdsmd and ran 'vdsClient -s localhost getVdsCaps', which was successful.
4. Uninstalled vdsm* and reinstalled (rpm -ivh)
   * vdsm-4.9.6-32.0.qa3.el6rhs.x86_64.rpm
   * vdsm-gluster-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-reg-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-cli-4.9.6-32.0.qa3.el6rhs.noarch.rpm
   * vdsm-python-4.9.6-32.0.qa3.el6rhs.x86_64.rpm
   * vdsm-xmlrpc-4.9.6-32.0.qa3.el6rhs.noarch.rpm [1]
5. Restarted vdsmd and ran 'vdsClient -s localhost getVdsCaps', which was successful.

I do not see this failure in the update process. My suspicion is that logrotate could have changed the /var/log/vdsm directory permissions.

[1] vdsm-xmlrpc is a new dependency of vdsm
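A quick way to test the logrotate suspicion on an affected host is simply to check who owns /var/log/vdsm and the log file. A minimal sketch; the paths and the expected vdsm:kvm owner/group are assumptions for this check:

-----------------------------------
import os
import pwd
import grp

# Paths and the expected vdsm:kvm ownership are assumptions for this sketch.
PATHS = ['/var/log/vdsm', '/var/log/vdsm/vdsm.log']
EXPECTED = ('vdsm', 'kvm')

for path in PATHS:
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    status = 'OK' if (owner, group) == EXPECTED else 'MISMATCH'
    print('%-30s %s:%s  %s' % (path, owner, group, status))
-----------------------------------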
Hi Bala,

I do suspect 'logrotate': yesterday I poked around 'logrotate' quite a bit, invoking the rotations manually and also running the cron jobs, but unfortunately I could not prove it. We may need to look at it more closely and try to trigger it somehow.

--Humble
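For the record, the manual check boils down to forcing a rotation and comparing the log file's ownership before and after. A minimal sketch of that, assuming the vdsm logrotate configuration lives at /etc/logrotate.d/vdsm and the log at the standard path (both are assumptions):

-----------------------------------
import os
import subprocess

LOG = '/var/log/vdsm/vdsm.log'     # assumed log location
CONF = '/etc/logrotate.d/vdsm'     # assumed logrotate config path

def owner(path):
    st = os.stat(path)
    return st.st_uid, st.st_gid

before = owner(LOG)
# -f forces a rotation even if the size/time criteria are not met.
subprocess.check_call(['/usr/sbin/logrotate', '-f', CONF])
after = owner(LOG)
print('before %s:%s, after %s:%s' % (before + after))
-----------------------------------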
Hi Bala,

Just a thought: are we missing this patch in this build?
http://gerrit.ovirt.org/#/c/7940/

--Humble
The patch is already available in the rhev-3.2 branch of downstream vdsm.
Please check this in vdsm-4.10.2-4.0.qa5.el6rhs
I ran across this on vdsm-4.10.2-4.0.qa5.el6rhs.x86_64. 'service vdsmd status' was reporting:

    VDS daemon is not running, but its watchdog is

The vdsm.log owner was root:root.

Workaround: after changing the ownership back to the default vdsm:kvm, vdsm started successfully.
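The workaround can also be scripted; a minimal sketch of the same two steps (restore the default ownership, then restart vdsmd), with the uid/gid looked up rather than hard-coded:

-----------------------------------
import os
import pwd
import grp
import subprocess

LOG = '/var/log/vdsm/vdsm.log'   # assumed standard log location

# Restore the default vdsm:kvm ownership that vdsmd expects.
uid = pwd.getpwnam('vdsm').pw_uid
gid = grp.getgrnam('kvm').gr_gid
os.chown(LOG, uid, gid)

# Restart the daemon so it comes up with a log file it can write to.
subprocess.check_call(['/sbin/service', 'vdsmd', 'restart'])
-----------------------------------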
This issue was seen with vdsm qa3, and qa5 has the relevant patches in. I would like to know whether the issue is still reproducible in qa5, i.e. whether the vdsm log owner still changes to root:root. I think doing a clean uninstall of the qa3 vdsm and an install of qa5 will help too.
Let's wait and see whether the issue exhibits itself again in qa5.
Created attachment 705364 [details]
Ran sosreport on both the engine and the host machine; attaching the output here.
I have found the same issue with qa6. These are the steps I performed:

Tried to add a host to the cluster using its IP. During the installation process it goes into the reboot state, and after a few seconds it goes into the non-responsive state and stays there.

When I looked through the logs I found that vdsm was not running. While debugging, I found that the vdsm.log file ownership had been changed to root instead of kvm. I changed the ownership of the file back to kvm by executing the following command:

    chown vdsm.kvm vdsm.log

Then I restarted the vdsmd service and tried to install the host again. It got installed successfully.

I am attaching the sosreports from the engine and the host.
Moving back to Assigned based on comment 22.

We need to find the root cause of this issue and have a permanent fix, i.e. how and why the ownership of vdsm.log is changed to root, which in turn made the status of the host "Non-responsive".
(In reply to comment #23)
> We need to find the root cause of this issue and have a permanent fix, i.e.
> how and why the ownership of vdsm.log is changed to root, which in turn made
> the status of the host "Non-responsive".

See bug 912308.
(In reply to comment #24)
> (In reply to comment #23)
> > We need to find the root cause of this issue and have a permanent fix, i.e.
> > how and why the ownership of vdsm.log is changed to root, which in turn made
> > the status of the host "Non-responsive".
>
> See bug 912308.

Thanks for providing the bug. Now I see the root cause of this issue!
Please help verify that the upstream change http://gerrit.ovirt.org/12940, which separates the supervdsm log into its own supervdsm.log file, solves this issue.
Marking as verified in bb1. Reopen this bug or create a new one if the issue reoccurs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html