Bug 1124372

Summary: vdsm failed when installing hosted engine
Product: [Retired] oVirt Reporter: Jiri Moskovcak <jmoskovc>
Component: vdsmAssignee: Dan Kenigsberg <danken>
Status: CLOSED CURRENTRELEASE QA Contact: Gil Klein <gklein>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.5CC: acathrow, bazulay, bugs, danken, dfediuck, ecohen, gklein, iheim, jmoskovc, mgoldboi, oourfali, pstehlik, ybronhei, yeylon
Target Milestone: ---Keywords: Triaged
Target Release: 3.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-08-27 07:37:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1036731, 1153278    
Attachments:
Description Flags
vdsm log
none
supervdsm log
none
libvirt log
none
messages none

Description Jiri Moskovcak 2014-07-29 11:18:51 UTC
Created attachment 922110 [details]
vdsm log

Description of problem:
I get: [ ERROR ] Failed to execute stage 'Misc configuration': Command '/usr/bin/vdsClient' failed to execute when running hosted-engine --deploy

Version-Release number of selected component (if applicable):
vdsm-4.16.1-1.git13255aa.el6.x86_64
ovirt-hosted-engine-setup-1.2.0-0.1.master.el6.noarch
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140725101535.el6.noarch

How reproducible:
100%

Steps to Reproduce:
1. run hosted-engine --deploy
2. follow the wizard
3. setup fails after the management bridge setup step

Actual results:
in description

Expected results:
no error, working hosted engine

Additional info:
see the attached log

Comment 1 Dan Kenigsberg 2014-07-29 11:47:33 UTC
Whenever

   File "/usr/share/vdsm/supervdsm.py", line 50, in __call__

is seen in vdsm.log, it means that the juicy stuff is in supervdsm.log. Would you attach it, too?

Comment 2 Jiri Moskovcak 2014-07-29 11:50:35 UTC
Created attachment 922127 [details]
supervdsm log

Comment 3 Dan Kenigsberg 2014-07-29 12:01:15 UTC
I see that libvirtd has been stopped since supervdsmd was started:

    allNets = ((net, net.name()) for net in conn.listAllNetworks(0))
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 129, in wrapper
    __connections.get(id(target)).pingLibvirt()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 3389, in getLibVersion
    if ret == -1: raise libvirtError ('virConnectGetLibVersion() failed', conn=self)
libvirtError: internal error client socket is closed

Could you attach the log of ovirt-host-deploy, to understand why?

Comment 4 Yaniv Bronhaim 2014-07-29 14:41:26 UTC
since supervdsmd was started ? no.. supervdsmd restarted because libvirt connection got broken somehow, maybe because libvirtd was restarted.. can we share also libvirt log?

(MainProcess|Thread-16::WARNING::2014-07-29 12:36:13,005::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
MainProcess|Thread-16::CRITICAL::2014-07-29 12:36:13,005::libvirtconnection::137::root::(wrapper) taking calling process down.
MainProcess|Thread-16::DEBUG::2014-07-29 12:36:13,007::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error client socket is closed
MainThread::DEBUG::2014-07-29 12:36:13,007::supervdsmServer::445::SuperVdsm.Server::(main) Terminated normally

^^ this leads to SIGTERM to the called, which is supervdsmd)

can it be related to the recent change http://gerrit.ovirt.org/30659 ? I think not, as in the description you mention only el6 packages..

Comment 5 Jiri Moskovcak 2014-07-29 15:31:53 UTC
Created attachment 922187 [details]
libvirt log

Comment 6 Yaniv Bronhaim 2014-07-31 07:27:29 UTC
please add also the deploy log as dan requested and /var/log/messages for me 

thanks.

Comment 7 Jiri Moskovcak 2014-07-31 09:39:02 UTC
Created attachment 922863 [details]
messages

Comment 8 Jiri Moskovcak 2014-07-31 09:40:02 UTC
(In reply to Yaniv Bronhaim from comment #6)
> please add also the deploy log as dan requested and /var/log/messages for me 
> 
> thanks.

- I'm sorry, but there is no deploy log, at that point the host-deploy wasn't in the game yet...

Comment 9 Dan Kenigsberg 2014-07-31 11:02:40 UTC
libvirtd is crashing:

Jul 29 15:49:21 dev-27 init: libvirtd main process (2151) terminated with status 1
Jul 29 15:49:21 dev-27 init: libvirtd main process ended, respawning
Jul 29 15:49:21 dev-27 init: libvirtd main process (2159) terminated with status 1
Jul 29 15:49:21 dev-27 init: libvirtd main process ended, respawning
Jul 29 15:49:21 dev-27 init: libvirtd main process (2167) terminated with status 1
Jul 29 15:49:21 dev-27 init: libvirtd respawning too fast, stopped


2014-07-29 13:49:21.853+0000: 2159: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable


Can you start libvirt from the command line (as root)?

# libvirtd --listen

Comment 10 Jiri Moskovcak 2014-07-31 12:14:18 UTC
(In reply to Dan Kenigsberg from comment #9)
> libvirtd is crashing:
> 
> Jul 29 15:49:21 dev-27 init: libvirtd main process (2151) terminated with
> status 1
> Jul 29 15:49:21 dev-27 init: libvirtd main process ended, respawning
> Jul 29 15:49:21 dev-27 init: libvirtd main process (2159) terminated with
> status 1
> Jul 29 15:49:21 dev-27 init: libvirtd main process ended, respawning
> Jul 29 15:49:21 dev-27 init: libvirtd main process (2167) terminated with
> status 1
> Jul 29 15:49:21 dev-27 init: libvirtd respawning too fast, stopped
> 
> 
> 2014-07-29 13:49:21.853+0000: 2159: error : virPidFileAcquirePath:410 :
> Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily
> unavailable
> 
> 
> Can you start libvirt from the command line (as root)?
> 
> # libvirtd --listen

I've restarted the machine and it doesn't happen anymore. libvirtd and vdsm seems up'n'running.

Comment 11 Dan Kenigsberg 2014-07-31 13:28:30 UTC
Michal Privoznik explains here https://www.redhat.com/archives/libvirt-users/2012-October/msg00146.html that this condition can happen when there is a stale libvirtd process.

When this reproduces, please run `lsof /var/run/libvirtd.pid` so we can trace which is the other process that locks the pid file. Make sure to extract as much info about that process, in particular `ps -o stime,ppid` so we can guess when it was started and by whom.

Comment 12 Oved Ourfali 2014-08-03 05:27:12 UTC
Dan - anything we can do with that?

Comment 13 Yaniv Bronhaim 2014-08-03 08:07:27 UTC
I follow that up, with dan observation and comments :)
Jiri, let me know if you successfully reproduced it and ran lsof on the pid file?

Comment 14 Yaniv Bronhaim 2014-08-26 14:50:14 UTC
Jiri, can you reply ? with no reproducer we can't proceed

Comment 15 Jiri Moskovcak 2014-08-27 07:37:34 UTC
I can't reproduce it anymore with vdsm-4.16.2-1.gite8cba75.el6.x86_64. Let's close this bug to unblock the depending bz and I'll reopen it if I run into that problem again.