Created attachment 854474 [details]
logs

Description of problem:
Host failed to install with the following error:

2014-01-23 17:55:34 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-NTFpH4YDW0/pythonlib/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/tmp/ovirt-NTFpH4YDW0/otopi-plugins/ovirt-host-deploy/vdsm/packages.py", line 219, in _start
    self.services.state('vdsmd', True)
  File "/tmp/ovirt-NTFpH4YDW0/otopi-plugins/otopi/services/rhel.py", line 188, in state
    'start' if state else 'stop'
  File "/tmp/ovirt-NTFpH4YDW0/otopi-plugins/otopi/services/rhel.py", line 96, in _executeServiceCommand
    raiseOnError=raiseOnError
  File "/tmp/ovirt-NTFpH4YDW0/pythonlib/otopi/plugin.py", line 451, in execute
    command=args[0],
RuntimeError: Command '/sbin/service' failed to execute
2014-01-23 17:55:34 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute

Alon checked it and determined the following:
host-deploy executes /usr/bin/vdsm-tool --force -> this succeeds ("Done configuring modules to VDSM") with no error, but "Modules sanlock are not configured" -> vdsm cannot be started.

Version-Release number of selected component (if applicable):
[root@cougar06 ~]# rpm -qa |grep vdsm
vdsm-python-4.14.1-2.el6.x86_64
vdsm-cli-4.14.1-2.el6.noarch
vdsm-python-zombiereaper-4.14.1-2.el6.noarch
vdsm-xmlrpc-4.14.1-2.el6.noarch
vdsm-4.14.1-2.el6.x86_64

[root@vm-18-201 host-deploy]# rpm -qa |grep otopi
otopi-java-1.2.0-0.1.beta.el6.noarch
otopi-1.2.0-0.1.beta.el6.noarch

How reproducible:

Steps to Reproduce:
1. Install vdsm on the host (yum install vdsm).
2. Install the host from the UI.

Actual results:
The host fails to be installed.

Expected results:
If we fail to configure sanlock, we should fail the installation.

Additional info:
Opening with medium severity, since I rebooted the host and re-installed, which solved the issue.
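For context on the error string itself: "Command '/sbin/service' failed to execute" is otopi's generic wrapper error for any command that exits non-zero. A minimal sketch of that path (the names are taken from the traceback above; the body is an assumption, not the actual otopi source):

import subprocess

def execute(args, raiseOnError=True):
    # Sketch of otopi's plugin.execute(): run the command and, per the
    # traceback above, raise when the exit status is non-zero and
    # raiseOnError is set. This reproduces only that behavior.
    rc = subprocess.call(args)
    if rc != 0 and raiseOnError:
        raise RuntimeError("Command '%s' failed to execute" % args[0])
    return rc

# host-deploy effectively runs:
#   execute(('/sbin/service', 'vdsmd', 'start'))
# vdsmd's pre-start check finds sanlock unconfigured, the service
# script exits non-zero, and the failure surfaces as the RuntimeError
# seen in the 'Closing up' stage.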
Setting target release to the current version for consideration and review. Please do not push non-RFE bugs to an undefined target release, so that bugs are reviewed for relevancy, fix, closure, etc.
*** Bug 1059254 has been marked as a duplicate of this bug. ***
We have encountered this problem in https://bugzilla.redhat.com/show_bug.cgi?id=1055153 as well.
Looking at the logs:

2014-01-23 17:55:34 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:446 execute-output: ('/sbin/service', 'vdsmd', 'start') stderr:
initctl: Job is already running: libvirtd
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 145, in <module>
    sys.exit(main())
  File "/usr/bin/vdsm-tool", line 142, in main
    return tool_command[cmd]["command"](*args[1:])
  File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 261, in isconfigured
    raise RuntimeError(msg)
RuntimeError:

I see that initctl is called (I don't think it should be), and when the output is "initctl: Job is already running: libvirtd" the exit code is 1:

# initctl start libvirtd
initctl: Job is already running: libvirtd
# echo $?
1

Are we still using initctl on Fedora as well?

Anyway, is it possible that it's the initctl failure that triggers this issue? If so, sanlock is not involved at all.
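This means any wrapper that treats a non-zero initctl exit as a hard failure will misreport the "already running" case. A hypothetical defensive handling, for illustration only (not the actual otopi/vdsm code):

import subprocess

def start_job(name):
    # initctl exits 1 both on real failures and when the job is
    # already running, so the caller must inspect stderr to tell
    # the two cases apart.
    p = subprocess.Popen(['initctl', 'start', name],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True)
    out, err = p.communicate()
    if p.returncode != 0:
        if 'Job is already running' in err:
            return  # already started: treat as success
        raise RuntimeError('initctl start %s failed: %s' % (name, err))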
(In reply to Federico Simoncelli from comment #4)
> Looking at the logs:
>
> 2014-01-23 17:55:34 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:446 execute-output: ('/sbin/service', 'vdsmd', 'start') stderr:
> initctl: Job is already running: libvirtd
> Traceback (most recent call last):
>   File "/usr/bin/vdsm-tool", line 145, in <module>
>     sys.exit(main())
>   File "/usr/bin/vdsm-tool", line 142, in main
>     return tool_command[cmd]["command"](*args[1:])
>   File "/usr/lib64/python2.6/site-packages/vdsm/tool/configurator.py", line 261, in isconfigured
>     raise RuntimeError(msg)
> RuntimeError:
>
> I see that initctl is called (I don't think it should be), and when
> the output is "initctl: Job is already running: libvirtd" the exit
> code is 1:
>
> # initctl start libvirtd
> initctl: Job is already running: libvirtd
> # echo $?
> 1
>
> Are we still using initctl on Fedora as well?

No, Fedora 19 doesn't contain it. However, you probably got that log from the CentOS user.

> Anyway, is it possible that it's the initctl failure that triggers
> this issue? If so, sanlock is not involved at all.

I don't think so; we are facing this on both CentOS and Fedora. Please see this as well:
https://bugzilla.redhat.com/show_bug.cgi?id=1055153
I suspect that the problem is that when sanlock is down we assume it is configured. In the log [1] you can see the prints "sanlock service is not running" and then "sanlock service is already configured". Those prints come from the "vdsm-tool configure --force" (or "vdsmd reconfigure") output, which checks isconfigured on each module and configures it if not.

Then vdsmd starts the sanlock service as a required service, and isconfigured fails on the group check (part of the pre-start tasks before vdsm starts). This is odd, because the vdsm installation should add all the group info (IIUC). So the problem is somewhere in the spec, where vdsm configures the groups for the sanlock process.

Federico, am I missing something?

[1] ovirt-20140123105534-cougar06.scl.lab.tlv.redhat.com-74d5bd50.log.xz
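To make the suspected check order concrete, a minimal sketch reconstructed from the log messages above (the helper, user, and group names are assumptions, not the actual configurator.py source):

import grp
import subprocess

SANLOCK_USER = 'sanlock'                      # assumption
REQUIRED_GROUPS = frozenset(('qemu', 'kvm'))  # assumption

def _sanlock_running():
    # stand-in for the real service-status check
    return subprocess.call(['service', 'sanlock', 'status']) == 0

def isconfigured():
    if not _sanlock_running():
        # "sanlock service is not running" is logged here, and the
        # module is then reported as "already configured" -- so
        # 'vdsm-tool configure --force' skips configuring it.
        return True
    # once sanlock is up (vdsmd starts it as a required service),
    # the group check actually runs -- and fails on this host
    return REQUIRED_GROUPS.issubset(
        g.gr_name for g in grp.getgrall() if SANLOCK_USER in g.gr_mem)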
OK, now it's all clear. In patch [1] I moved the "/usr/sbin/usermod -a -G %{qemu_group},%{vdsm_group} %{snlk_user}" call from the spec's %pre section to the configurator module. BUT, as I stated in the previous comment, when we check isconfigured we assume configured when sanlock is not running. We then end up with a failed start of vdsm, because the sanlock configuration did not run during the first 'vdsm-tool configure' run.

This bug relates only to vdsm 3.4, which includes this commit and exposes a regression. It happens 100% of the time if sanlock is stopped when host-deploy runs and installs vdsm on the host.

It's either [2] or moving the usermod action back to the spec as it was. Which alternative is preferred?

[1] http://gerrit.ovirt.org/20100
[2] http://gerrit.ovirt.org/24007
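For illustration only, one way to close the gap (whether or not it matches what [2] actually does) is to make both the check and the usermod independent of the sanlock service state; a minimal sketch, with the group names standing in for the %{qemu_group}/%{vdsm_group} macros:

import grp
import subprocess

SANLOCK_USER = 'sanlock'           # %{snlk_user} in the spec
REQUIRED_GROUPS = ('qemu', 'kvm')  # assumed macro expansions

def isconfigured():
    # group membership is a static property of the user database,
    # so there is no need to shortcut on the service state
    groups = [g.gr_name for g in grp.getgrall()
              if SANLOCK_USER in g.gr_mem]
    return set(REQUIRED_GROUPS).issubset(groups)

def configure():
    if not isconfigured():
        # same usermod call that patch [1] moved out of %pre
        subprocess.check_call([
            '/usr/sbin/usermod', '-a', '-G',
            ','.join(REQUIRED_GROUPS), SANLOCK_USER,
        ])

With that shape, the first 'vdsm-tool configure' run adds the groups even when sanlock is down, and vdsmd's pre-start isconfigured check then agrees with it.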
*** Bug 1063886 has been marked as a duplicate of this bug. ***
Reproduced with ovirt3.4.0-0.5beta1, otopi-1.2.0-0.1.beta.el6.noarch, vdsm-4.14.1-2.el6.x86_64.

Verified with ovirt-engine-3.4.0-0.11.beta3.el6.noarch, otopi-1.2.0-0.3.beta2.el6.noarch.rpm, vdsm-4.14.3-0.el6.x86_64.
This is an automated message: moving to Closed CURRENT RELEASE since oVirt 3.4.0 has been released.
Issue persists. Installed per the (rather poorly) documented methods:

* http://www.ovirt.org/Quick_Start_Guide#Install_oVirt_Engine_.28Fedora_.2F_Red_Hat_Enterprise_Linux_.2F_CentOS.29
* http://www.ovirt.org/Feature/AllInOne

# uname -a
Linux $REDACTED 2.6.32-431.20.5.el6.centos.plus.x86_64 #1 SMP Fri Jul 25 07:10:09 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/redhat-release
CentOS release 6.5 (Final)

# rpm -qa |grep -i ovirt
ovirt-release34-1.0.1-1.noarch
ovirt-host-deploy-1.2.2-1.el6.noarch
ovirt-engine-restapi-3.4.3-1.el6.noarch
ovirt-engine-sdk-python-3.4.3.0-1.el6.noarch
ovirt-engine-websocket-proxy-3.4.3-1.el6.noarch
ovirt-engine-cli-3.4.0.5-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.4.3-1.el6.noarch
ovirt-engine-webadmin-portal-3.4.3-1.el6.noarch
ovirt-engine-userportal-3.4.3-1.el6.noarch
ovirt-engine-3.4.3-1.el6.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.4.3-1.el6.noarch
ovirt-host-deploy-java-1.2.2-1.el6.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.4.3-1.el6.noarch
ovirt-engine-backend-3.4.3-1.el6.noarch
ovirt-host-deploy-offline-1.2.2-1.el6.x86_64
ovirt-engine-lib-3.4.3-1.el6.noarch
ovirt-engine-setup-base-3.4.3-1.el6.noarch
ovirt-iso-uploader-3.4.3-1.el6.noarch
ovirt-engine-setup-3.4.3-1.el6.noarch
ovirt-engine-dbscripts-3.4.3-1.el6.noarch
ovirt-image-uploader-3.4.2-1.el6.noarch
ovirt-engine-tools-3.4.3-1.el6.noarch

--== END OF SUMMARY ==--

[ INFO  ] Starting engine service
[ ERROR ] Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute
[ INFO  ] Stage: Clean up
          Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20140727140628-g0z922.log
[ INFO  ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20140727141521-setup.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Execution of setup failed

2014-07-27 13:25:01 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/system/engine.py", line 51, in _closeup
    state=True,
  File "/usr/share/otopi/plugins/otopi/services/rhel.py", line 188, in state
    'start' if state else 'stop'
  File "/usr/share/otopi/plugins/otopi/services/rhel.py", line 96, in _executeServiceCommand
    raiseOnError=raiseOnError
  File "/usr/lib/python2.6/site-packages/otopi/plugin.py", line 871, in execute
    command=args[0],
RuntimeError: Command '/sbin/service' failed to execute
2014-07-27 13:25:01 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute
Tried upgrading to the 3.5 development release; same issue.

2014-07-27 17:40:58 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:866 execute-output: ('/sbin/service', 'ovirt-engine', 'start') stderr:

2014-07-27 17:40:58 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-engine/setup/bin/../plugins/ovirt-engine-setup/ovirt-engine/system/engine.py", line 57, in _closeup
    state=True,
  File "/usr/share/otopi/plugins/otopi/services/rhel.py", line 188, in state
    'start' if state else 'stop'
  File "/usr/share/otopi/plugins/otopi/services/rhel.py", line 96, in _executeServiceCommand
    raiseOnError=raiseOnError
  File "/usr/lib/python2.6/site-packages/otopi/plugin.py", line 871, in execute
    command=args[0],
RuntimeError: Command '/sbin/service' failed to execute
2014-07-27 17:40:58 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Closing up': Command '/sbin/service' failed to execute
2014-07-27 17:40:58 DEBUG otopi.context context.dumpEnvironment:468 ENVIRONMENT DUMP - BEGIN
2014-07-27 17:40:58 DEBUG otopi.context context.dumpEnvironment:478 ENV BASE/error=bool:'True'
2014-07-27 17:40:58 DEBUG otopi.context context.dumpEnvironment:478 ENV BASE/exceptionInfo=list:'[(<type 'exceptions.RuntimeError'>, RuntimeError("Command '/sbin/service' failed to execute",), <traceback object at 0x27fdf80>)]'
2014-07-27 17:40:58 DEBUG otopi.context context.dumpEnvironment:482 ENVIRONMENT DUMP - END
2014-07-27 17:40:58 INFO otopi.context context.runSequence:395 Stage: Clean up
2014-07-27 17:40:58 DEBUG otopi.context context.runSequence:399 STAGE cleanup
2014-07-27 17:40:58 DEBUG otopi.context context._executeMethod:138 Stage cleanup METHOD otopi.plugins.ovirt_engine_common.base.core.misc.Plugin._cleanup
Hey, we need logs to investigate: the host-deploy log, syslog (messages), and vdsm.log should be enough. Please attach them if this still occurs.