Description of problem:

I installed a first hosted engine (oVirt 3.4.2) on my first machine, and it worked well. I wanted to add a second host for HA, but I am unable to do so (this worked in 3.4.0). Every time I launch hosted-engine --deploy, I get this:

[ INFO ] Waiting for VDSM hardware info
[ ERROR ] Failed to execute stage 'Environment setup': [Errno 111] Connection refused

I tried everything, from wiping oVirt to reinstalling the OS; nothing seems to work. Here is the interesting part of the logs:

2014-07-15 17:07:27 INFO otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv vdsmenv._connect:86 Waiting for VDSM hardware info
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:468 ENVIRONMENT DUMP - BEGIN
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:478 ENV OVEHOSTED_VDSM/vdsClient=instance:'<vdsClient.service instance at 0x41c17a0>'
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:482 ENVIRONMENT DUMP - END
2014-07-15 17:07:28 DEBUG otopi.context context._executeMethod:138 Stage late_setup METHOD otopi.plugins.ovirt_hosted_engine_setup.vm.configurevm.Plugin._late_setup
2014-07-15 17:07:28 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm/configurevm.py", line 99, in _late_setup
    response = serv.s.list()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1235, in request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1349, in send_content
    connection.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/httplib.py", line 739, in send
    self.connect()
  File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line 188, in connect
    sock = socket.create_connection((self.host, self.port), self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
2014-07-15 17:07:28 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Environment setup': [Errno 111] Connection refused
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:468 ENVIRONMENT DUMP - BEGIN
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:478 ENV BASE/error=bool:'True'
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:478 ENV BASE/exceptionInfo=list:'[(<class 'socket.error'>, error(111, 'Connection refused'), <traceback object at 0x4295ef0>)]'

Version-Release number of selected component (if applicable): 3.4.2
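For what it's worth, the [Errno 111] in the traceback just means nothing was listening on vdsm's XML-RPC port (54321 by default) when the setup tried to connect. A quick way to check whether the daemon is accepting connections at all is a small probe like this (a hypothetical helper written for this report, not part of vdsm or the setup tooling):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    A refused connection (Errno 111), a timeout, or any other socket
    error is reported as the port being closed.
    """
    try:
        sock = socket.create_connection((host, port), timeout)
    except socket.error:
        return False
    sock.close()
    return True
```

Running something like port_open('localhost', 54321) and getting False would match the "Connection refused" above, pointing at vdsm itself rather than at the setup script.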
Could you attach vdsm.log and supervdsm.log of the failing vdsm? Does it fail to start repeatedly, even when attempted manually?
Created attachment 918341 [details] Hosted engine install log

I can't start vdsmd manually: the start command reports success, but a subsequent restart or status tells me vdsm is not running (the watchdog, however, is running). So here is the install log, since my vdsm.log is empty because vdsm fails to start.
Created attachment 918342 [details] supervdsm log
Would you stop vdsm and try running:

# su - vdsm -s /bin/bash
$ /usr/share/vdsm/vdsm

The output may hold hints on why its startup fails.
You were right:

Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 39, in <module>
    pthreading.monkey_patch()
  File "/usr/lib/python2.6/site-packages/pthreading.py", line 154, in monkey_patch
    raise RuntimeError("You must monkey_patch before importing thread or "
RuntimeError: You must monkey_patch before importing thread or threading modules

It seems the monkey_patch() call happens too late, after the thread/threading modules have already been imported.
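For context on that RuntimeError: pthreading refuses to patch once the stdlib threading machinery is loaded, because locks already created by then would not be replaced. The guard works roughly like the sketch below (simplified for this report; the `modules` parameter is added purely for illustration, the real function checks sys.modules directly and then swaps in pthread-based Lock/Condition classes):

```python
import sys

def monkey_patch(modules=None):
    # Simplified sketch of pthreading's import-order guard, not the real code.
    modules = sys.modules if modules is None else modules
    if 'thread' in modules or 'threading' in modules:
        raise RuntimeError("You must monkey_patch before importing thread or "
                           "threading modules")
    # ... the real function replaces threading primitives here ...
```

So any module that vdsm's entry point imports before calling monkey_patch(), and that pulls in threading transitively, triggers exactly the traceback above.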
I just got an update of python-pthreading that fixes that issue, but now another one appears:

<WorkerThread(Thread-10, started daemon 140458351118080)>
<WorkerThread(Thread-4, started daemon 140458885437184)>
<Thread(Thread-11, started daemon 140458340628224)>
<Thread(storageRefresh, started daemon 140458330138368)>
<WorkerThread(Thread-8, started daemon 140458372097792)>
Shutting down MOM
<MomThread(MOM, started 140457966757632)>
<HostStatsThread(Thread-12, started daemon 140457977247488)>
<WorkerThread(Thread-2, started daemon 140458906416896)>
VM channels listener was stopped.
<Listener(VM Channels Listener, started daemon 140457945777920)>
<WorkerThread(Thread-1, started daemon 140458916906752)>
<WorkerThread(Thread-9, started daemon 140458361607936)>
<WorkerThread(Thread-3, started daemon 140458895927040)>
Operation 'lvm invalidate operation' got the operation mutex
Operation 'lvm invalidate operation' released the operation mutex
Operation 'lvm invalidate operation' got the operation mutex
Operation 'lvm invalidate operation' released the operation mutex
Operation 'lvm invalidate operation' got the operation mutex
Operation 'lvm invalidate operation' released the operation mutex
Returning last result
Operation 'lvm reload operation' got the operation mutex
Exception in thread MOM:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.6/site-packages/mom/__init__.py", line 28, in run
    hypervisor_iface = self.get_hypervisor_interface()
  File "/usr/lib/python2.6/site-packages/mom/__init__.py", line 211, in get_hypervisor_interface
    return module.instance(self.config)
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 233, in instance
    return vdsmInterface()
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 38, in __init__
    self.vdsm_api = API.Global()
  File "/usr/share/vdsm/API.py", line 1091, in __init__
    APIBase.__init__(self)
  File "/usr/share/vdsm/API.py", line 69, in __init__
    self._cif = clientIF.getInstance()
  File "/usr/share/vdsm/clientIF.py", line 141, in getInstance
    raise Exception("Logging facility is required to create "
Exception: Logging facility is required to create the single clientIF instance
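For readers hitting the same exception: it comes from a singleton accessor, where the first caller must supply the logging facility and later callers reuse the existing instance. MOM happened to ask for the instance before it had been created with a logger. The pattern looks roughly like this (names and structure simplified for illustration, not vdsm's actual code):

```python
class ClientIF(object):
    """Simplified sketch of a singleton whose first creation needs a logger."""

    _instance = None

    def __init__(self, log):
        self.log = log

    @classmethod
    def getInstance(cls, log=None):
        # The first caller must pass a logger to create the singleton;
        # subsequent callers may omit it and get the existing instance.
        if cls._instance is None:
            if log is None:
                raise Exception("Logging facility is required to create "
                                "the single clientIF instance")
            cls._instance = cls(log)
        return cls._instance
```

So the exception in the MOM thread suggests MOM's vdsmInterface ran before vdsm's main startup path had created the clientIF instance, which would make this an ordering problem during startup rather than a logging misconfiguration.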
But the installation now seems to progress.
It still blocks at this stage:

[ INFO ] Stage: Transaction commit
[ INFO ] Stage: Closing up
To continue make a selection from the options below:
(1) Continue setup - engine installation is complete
(2) Power off and restart the VM
(3) Abort setup

I don't have any info to connect to the VM, and when I choose 2, I get this:

[ ERROR ] Failed to execute stage 'Closing up': Command '/usr/bin/vdsClient' failed to execute
[ INFO ] Stage: Clean up
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
I suspect that vdsm is still failing to start, or at least to respond to /usr/bin/vdsClient. Could you attach vdsm.log now?
Created attachment 918983 [details] Vdsm log

Here it is; it seems to be related to storage. My setup uses GlusterFS, but it has worked well so far.
It seems that I was wrong: vdsm is running and responding to client requests. Could you copy the ovirt-host-deploy.log of your second attempt (https://bugzilla.redhat.com/show_bug.cgi?id=1119828#c8). Maybe it has more hints regarding the detected error?
We need to get a clearer picture from the integration side (on the hosted-engine deploy stage). Hence moving to integration.
Created attachment 919549 [details] Hosted engine install log Here it is !
From the logs in attachment #919549 [details] I see that this is an additional-host setup. The VM is running on host #1, and a dialog like the following is shown because of another bug, see bug #1106561:

To continue make a selection from the options below:
(1) Continue setup - engine installation is complete
(2) Power off and restart the VM
(3) Abort setup

The reporter chose option 2, so the setup tried to shut down a VM that is not running on this host, and failed. Closing this as a duplicate of bug #1106561.

*** This bug has been marked as a duplicate of bug 1106561 ***
Yes, but there is still a bug: if I select (1), I get a message in red saying the engine does not seem to be running...