Bug 1119828

Summary: Vdsm refuses to start during ovirt-hosted-engine setup
Product: [Retired] oVirt
Reporter: Maël Lavault <mael.lavault>
Component: ovirt-hosted-engine-setup
Assignee: Sandro Bonazzola <sbonazzo>
Status: CLOSED DUPLICATE
QA Contact: meital avital <mavital>
Severity: high
Priority: unspecified
Version: 3.4
CC: acathrow, bazulay, bugs, ecohen, gklein, iheim, mael.lavault, mgoldboi, pstehlik, yeylon
Keywords: Triaged
Target Milestone: ---
Target Release: 3.5.0
Hardware: x86_64
OS: Linux
Whiteboard: integration
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-07-25 13:53:42 UTC
Attachments:
  Hosted engine install log
  supervdsm log
  Vdsm log
  Hosted engine install log

Description Maël Lavault 2014-07-15 15:25:50 UTC
Description of problem:

I installed a first hosted engine (oVirt 3.4.2) on my first machine, and it worked well.

I wanted to add a second host for HA, but I am unable to do so (this worked in 3.4.0).

Every time I launch hosted-engine --deploy, I get this:

[ INFO  ] Waiting for VDSM hardware info
[ ERROR ] Failed to execute stage 'Environment setup': [Errno 111] Connection refused

I tried everything, from wiping oVirt to reinstalling the OS; nothing seems to work.

Here is the interesting part of the logs:


2014-07-15 17:07:27 INFO otopi.plugins.ovirt_hosted_engine_setup.system.vdsmenv vdsmenv._connect:86 Waiting for VDSM hardware info
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:468 ENVIRONMENT DUMP - BEGIN
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:478 ENV OVEHOSTED_VDSM/vdsClient=instance:'<vdsClient.service instance at 0x41c17a0>'
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:482 ENVIRONMENT DUMP - END
2014-07-15 17:07:28 DEBUG otopi.context context._executeMethod:138 Stage late_setup METHOD otopi.plugins.ovirt_hosted_engine_setup.vm.configurevm.Plugin._late_setup
2014-07-15 17:07:28 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/otopi/context.py", line 142, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/vm/configurevm.py", line 99, in _late_setup
    response = serv.s.list()
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1199, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1489, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1235, in request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.6/xmlrpclib.py", line 1349, in send_content
    connection.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/httplib.py", line 739, in send
    self.connect()
  File "/usr/lib64/python2.6/site-packages/vdsm/SecureXMLRPCServer.py", line 188, in connect
    sock = socket.create_connection((self.host, self.port), self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
2014-07-15 17:07:28 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Environment setup': [Errno 111] Connection refused
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:468 ENVIRONMENT DUMP - BEGIN
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:478 ENV BASE/error=bool:'True'
2014-07-15 17:07:28 DEBUG otopi.context context.dumpEnvironment:478 ENV BASE/exceptionInfo=list:'[(<class 'socket.error'>, error(111, 'Connection refused'), <traceback object at 0x4295ef0>)]'


Version-Release number of selected component (if applicable):
3.4.2
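
For context, the failure above means the setup's XML-RPC client could not reach vdsmd at all. A minimal probe, assuming vdsm's default XML-RPC port (54321) on the local host, reproduces the same symptom when nothing is listening:

# check_vdsm_port.py - minimal connectivity probe (assumed port 54321)
import socket

HOST, PORT = "localhost", 54321
try:
    sock = socket.create_connection((HOST, PORT), timeout=5)
    print("something is listening on %s:%d" % (HOST, PORT))
    sock.close()
except socket.error as exc:
    # "[Errno 111] Connection refused" here matches the setup failure:
    # nothing is accepting connections on the vdsm port.
    print("connection failed: %s" % exc)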

Comment 1 Dan Kenigsberg 2014-07-15 16:25:59 UTC
Could you attach vdsm.log and supervdsm.log of the failing vdsm?

Does it fail to start repeatedly, even when attempted manually?

Comment 2 Maël Lavault 2014-07-16 07:44:15 UTC
Created attachment 918341 [details]
Hosted engine install log

I can't start vdsmd manually: the start command seems to succeed (it reports OK), but when I run restart or status I get "vdsm not running" (although the watchdog is running).

So here is the install log (my vdsm.log is empty, because vdsm fails to start).

Comment 3 Maël Lavault 2014-07-16 07:44:45 UTC
Created attachment 918342 [details]
supervdsm log

Comment 4 Dan Kenigsberg 2014-07-16 11:18:31 UTC
Would you stop vdsm and try running:

# su - vdsm -s /bin/bash
$ /usr/share/vdsm/vdsm

The output may hold hints on why its startup fails.

Comment 5 Maël Lavault 2014-07-16 11:21:21 UTC
You were right:

Traceback (most recent call last):
  File "/usr/share/vdsm/vdsm", line 39, in <module>
    pthreading.monkey_patch()
  File "/usr/lib/python2.6/site-packages/pthreading.py", line 154, in monkey_patch
    raise RuntimeError("You must monkey_patch before importing thread or "
RuntimeError: You must monkey_patch before importing thread or threading modules

It seems the monkey_patch function is not called early enough.
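
For reference, a minimal sketch of the constraint (assuming the python-pthreading package is installed): monkey_patch() swaps in pthreading's locking primitives and refuses to run once thread/threading have already been imported, which is exactly the RuntimeError above.

# Correct ordering: patch before any import of thread/threading.
import pthreading
pthreading.monkey_patch()  # raises RuntimeError if thread/threading was already imported

import threading           # safe now; uses the patched primitives
print(threading.current_thread())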

Comment 6 Maël Lavault 2014-07-16 11:29:44 UTC
I just got an update of python-pthreading which fixes that issue, but now another one appears:

<WorkerThread(Thread-10, started daemon 140458351118080)>
<WorkerThread(Thread-4, started daemon 140458885437184)>
<Thread(Thread-11, started daemon 140458340628224)>
<Thread(storageRefresh, started daemon 140458330138368)>
<WorkerThread(Thread-8, started daemon 140458372097792)>
Shutting down MOM
<MomThread(MOM, started 140457966757632)>
<HostStatsThread(Thread-12, started daemon 140457977247488)>
<WorkerThread(Thread-2, started daemon 140458906416896)>
VM channels listener was stopped.
<Listener(VM Channels Listener, started daemon 140457945777920)>
<WorkerThread(Thread-1, started daemon 140458916906752)>
<WorkerThread(Thread-9, started daemon 140458361607936)>
<WorkerThread(Thread-3, started daemon 140458895927040)>
Operation 'lvm invalidate operation' got the operation mutex
Operation 'lvm invalidate operation' released the operation mutex
Operation 'lvm invalidate operation' got the operation mutex
Operation 'lvm invalidate operation' released the operation mutex
Operation 'lvm invalidate operation' got the operation mutex
Operation 'lvm invalidate operation' released the operation mutex
Returning last result
Operation 'lvm reload operation' got the operation mutex
Exception in thread MOM:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.6/site-packages/mom/__init__.py", line 28, in run
    hypervisor_iface = self.get_hypervisor_interface()
  File "/usr/lib/python2.6/site-packages/mom/__init__.py", line 211, in get_hypervisor_interface
    return module.instance(self.config)
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 233, in instance
    return vdsmInterface()
  File "/usr/lib/python2.6/site-packages/mom/HypervisorInterfaces/vdsmInterface.py", line 38, in __init__
    self.vdsm_api = API.Global()
  File "/usr/share/vdsm/API.py", line 1091, in __init__
    APIBase.__init__(self)
  File "/usr/share/vdsm/API.py", line 69, in __init__
    self._cif = clientIF.getInstance()
  File "/usr/share/vdsm/clientIF.py", line 141, in getInstance
    raise Exception("Logging facility is required to create "
Exception: Logging facility is required to create the single clientIF instance
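
For context, clientIF is a process-wide singleton and, judging from the traceback, the first getInstance() call must supply a logging facility; MOM apparently asked for the instance before vdsm had created it. A hypothetical sketch of that pattern (illustration only, not vdsm's actual code):

# Hypothetical singleton requiring a logger on first creation.
class ClientIF(object):
    _instance = None

    def __init__(self, log):
        self.log = log

    @classmethod
    def getInstance(cls, log=None):
        if cls._instance is None:
            if log is None:
                # MOM hits this branch: the instance does not exist yet
                # and no logging facility was provided.
                raise Exception("Logging facility is required to create "
                                "the single clientIF instance")
            cls._instance = cls(log)
        return cls._instance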

Comment 7 Maël Lavault 2014-07-16 11:33:08 UTC
But the installation seems to progress now.

Comment 8 Maël Lavault 2014-07-16 13:37:27 UTC
It still blocks at this stage:

[ INFO  ] Stage: Transaction commit
[ INFO  ] Stage: Closing up
          To continue make a selection from the options below:
          (1) Continue setup - engine installation is complete
          (2) Power off and restart the VM
          (3) Abort setup

I don't have any info to connect to the VM, and when I choose 2, I get this:

[ ERROR ] Failed to execute stage 'Closing up': Command '/usr/bin/vdsClient' failed to execute
[ INFO  ] Stage: Clean up
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination

Comment 9 Dan Kenigsberg 2014-07-17 15:03:35 UTC
I suspect that vdsm is still failing to start, or at least to respond to /usr/bin/vdsClient. Could you attach vdsm.log now?

Comment 10 Maël Lavault 2014-07-18 07:29:39 UTC
Created attachment 918983 [details]
Vdsm log

Here it is; it seems to be related to storage. My setup uses GlusterFS, but it has worked well so far.

Comment 11 Dan Kenigsberg 2014-07-18 11:57:17 UTC
It seems that I was wrong: vdsm is running and responding to client requests. Could you copy the ovirt-host-deploy.log of your second attempt (https://bugzilla.redhat.com/show_bug.cgi?id=1119828#c8)? Maybe it has more hints regarding the detected error.

Comment 12 Barak 2014-07-20 11:32:43 UTC
We need to get a clearer picture from the integration side (on the hosted-engine deploy stage).
Hence moving to integration.

Comment 13 Maël Lavault 2014-07-21 08:01:38 UTC
Created attachment 919549 [details]
Hosted engine install log

Here it is!

Comment 14 Sandro Bonazzola 2014-07-25 13:53:42 UTC
From the logs in attachment #919549 [details] I see that this is an additional host setup.
The VM is running on host id #1, and a dialog like the following is shown because of another bug; see bug #1106561:

To continue make a selection from the options below:
(1) Continue setup - engine installation is complete
(2) Power off and restart the VM
(3) Abort setup

The reporter chose option 2, so the setup tried to shut down a VM that is not running here, and failed.
Closing this as a duplicate of bug #1106561.

*** This bug has been marked as a duplicate of bug 1106561 ***

Comment 15 Maël Lavault 2014-07-25 14:14:11 UTC
Yes, but there is still a bug, because if I select (1) I get a message in red saying the engine does not seem to be running...