Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1106564

Summary: Vdsm failed to create rhevm bridge (happened during hosted-engine --deploy)
Product: Red Hat Enterprise Virtualization Manager
Reporter: Artyom <alukiano>
Component: vdsm
Assignee: Nobody <nobody>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Aharon Canan <acanan>
Severity: high
Priority: unspecified
Version: 3.4.0
CC: alukiano, bazulay, danken, gklein, iheim, jmoskovc, lpeer, mavital, sbonazzo, ybronhei, yeylon
Hardware: x86_64
OS: Linux
Whiteboard: integration
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-23 07:44:31 UTC
Attachments:
vdsm, supervdsm and libvirt logs (flags: none)
hosted-engine-setup.log (flags: none)

Description Artyom 2014-06-09 15:20:35 UTC
Created attachment 904762
vdsm, supervdsm and libvirt logs

Description of problem:
Vdsm failed to create the rhevm bridge. This happened during `hosted-engine --deploy`, at the 'Configuring the management bridge' stage.

Version-Release number of selected component (if applicable):
vdsm-4.14.7-3.el6ev.x86_64
libvirt-0.10.2-29.el6_5.8.x86_64

How reproducible:
50%

Steps to Reproduce:
1. On a clean host (without any vdsm or libvirt packages), install hosted-engine:
yum install ovirt-hosted-engine-setup.noarch -y
2. Run hosted-engine --deploy and continue until the 'Configuring the management bridge' stage.

Actual results:
Setup failed with an error.

Expected results:
Setup configures the rhevm bridge successfully, without any errors.

Additional info:

Comment 1 Dan Kenigsberg 2014-06-11 08:36:41 UTC
supervdsmd died because its libvirt connection broke, and the connection broke because libvirtd was restarted.

Could it be that `hosted-engine --deploy` calls `vdsm-tool configure` while vdsm is running? If so, it had better stop vdsm, supervdsm, and libvirt first, configure them, and then restart them.


MainProcess|Thread-16::DEBUG::2014-06-09 14:56:37,586::supervdsmServer::96::SuperVdsm.ServerCallback::(wrapper) call addNetwork with ('rhevm', {'force': 'False', 'nics': ['eth0'], 'bootproto': 'dhcp', 'bridged': 'True', 'blockingdhcp': 'true', 'ONBOOT': 'yes'}) {}
MainProcess|Thread-16::DEBUG::2014-06-09 14:56:37,587::utils::642::root::(execCmd) '/sbin/ip route show to 0.0.0.0/0 table all' (cwd None)
MainProcess|Thread-16::DEBUG::2014-06-09 14:56:37,605::utils::662::root::(execCmd) SUCCESS: <err> = ''; <rc> = 0
MainProcess|Thread-16::WARNING::2014-06-09 14:56:37,608::libvirtconnection::116::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
MainProcess|Thread-16::CRITICAL::2014-06-09 14:56:37,608::libvirtconnection::118::root::(wrapper) taking calling process down.
MainThread::DEBUG::2014-06-09 14:56:37,609::supervdsmServer::424::SuperVdsm.Server::(main) Terminated normally
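The WARNING line above reports the failure as raw libvirt enum values. As a reading aid: in libvirt's virterror.h, error code 1 is VIR_ERR_INTERNAL_ERROR and error domain 7 is VIR_FROM_RPC. Together they indicate an internal error in the RPC layer between the client and libvirtd, which is consistent with libvirtd going away underneath supervdsm. The sketch below decodes only those two values. The lookup tables are deliberately partial, and `decode_libvirt_error` is a hypothetical helper that is not part of vdsm or libvirt-python.

```python
import re

# Deliberately partial lookup tables: only the values seen in this log are included.
# Names follow libvirt's virterror.h enums (virErrorNumber / virErrorDomain).
ERROR_CODES = {1: "VIR_ERR_INTERNAL_ERROR"}
ERROR_DOMAINS = {7: "VIR_FROM_RPC"}

# Matches the "ecode: N edom: M" suffix of vdsm's libvirtconnection warning.
PATTERN = re.compile(r"ecode: (\d+) edom: (\d+)")


def decode_libvirt_error(log_line):
    """Hypothetical helper: map the numeric codes in a vdsm log line to enum names."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    ecode, edom = int(match.group(1)), int(match.group(2))
    return (ERROR_CODES.get(ecode, f"unknown code {ecode}"),
            ERROR_DOMAINS.get(edom, f"unknown domain {edom}"))


line = ("MainProcess|Thread-16::WARNING::2014-06-09 14:56:37,608::libvirtconnection::116"
        "::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7")
print(decode_libvirt_error(line))  # ('VIR_ERR_INTERNAL_ERROR', 'VIR_FROM_RPC')
```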

Comment 2 Jiri Moskovcak 2014-06-11 08:47:26 UTC
This is a question for the HE setup developers.

Comment 3 Sandro Bonazzola 2014-06-11 09:56:05 UTC
(In reply to Dan Kenigsberg from comment #1)
> supervdsmd died, since its libvirt connection broke, since libvirtd was
> restarted.
> 
> Could it be that `hosted-engine --deploy` calls `vdsm-tool configure` while
> vdsm is running? If so, it should better stop vdsm, supervdsm, and libvirt
> first, configure them, and re-start them.
> 


It may be, but the services are ensured to be configured and started before setup tries to add the bridge.
However, we can change setup to shut down the services before calling configure. I was also pretty sure that configure took care of shutting them down if it found them running.
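The change discussed in comments 1 and 3 amounts to a fixed ordering: stop the services that hold libvirt connections, run `vdsm-tool configure`, then start everything again in reverse order so that libvirtd is up before its clients. The sketch below prints that ordering as a dry run. It assumes the EL6 SysV service names `vdsmd`, `supervdsmd` and `libvirtd`, and it assumes a plain `vdsm-tool configure` call with no extra flags. Both `build_reconfigure_plan` and `run_plan` are hypothetical helpers, not part of ovirt-hosted-engine-setup.

```python
import subprocess

# Assumed EL6 service names. The stop order runs from vdsmd down to libvirtd,
# which the other two services connect to.
SERVICES = ["vdsmd", "supervdsmd", "libvirtd"]


def build_reconfigure_plan(services=SERVICES):
    """Hypothetical helper: return the commands to stop, configure and restart."""
    stop = [["service", name, "stop"] for name in services]
    configure = [["vdsm-tool", "configure"]]
    # Start in reverse order so libvirtd is running before its clients start.
    start = [["service", name, "start"] for name in reversed(services)]
    return stop + configure + start


def run_plan(plan, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_plan(build_reconfigure_plan())
```

Keeping the plan as data separate from execution makes it possible to inspect the ordering without touching a live host.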

Comment 4 Sandro Bonazzola 2014-06-11 11:19:41 UTC
Artyom, can you reproduce it? Can you also attach the hosted-engine logs?

Comment 5 Dan Kenigsberg 2014-06-11 12:19:50 UTC
(In reply to Sandro Bonazzola from comment #3)
> I also was pretty sure that configure took care of shutting them
> down if found running.

You are right; if this is indeed the case, it should be fixed in configure.

Comment 6 Dan Kenigsberg 2014-06-11 12:29:27 UTC
Note that libvirtd was down for 12 long minutes.

2014-06-09 11:43:30.862+0000: 9947: debug : virConnectCompareCPU:17135 : conn=0x7ff28c05aff0, xmlDesc=<cpu match="minimum"><model>SandyBridge</model><vendor>Intel</vendor></cpu>, flags=0
2014-06-09 11:56:27.278+0000: 9929: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/daemon
2014-06-09 11:56:27.278+0000: 9929: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/qemu
2014-06-09 11:56:27.278+0000: 9929: debug : virHookCheck:119 : No hook script /etc/libvirt/hooks/lxc
2014-06-09 11:56:27.278+0000: 9929: info : virNetlinkEventServiceStopAll:420 : stopping all netlink event services
2014-06-09 11:56:27.450+0000: 20090: info : libvirt version: 0.10.2, package: 29.el6_5.8 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2014-05-14-10:17:57, x86-027.build.eng.bos.redhat.com)
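The gap Dan refers to can be checked directly from the two libvirtd.log timestamps quoted above, both of which are in UTC. The sketch below uses only the standard library. `log_gap` is a hypothetical helper. It measures the silence between the two log entries, which by itself does not prove that the daemon was stopped for that whole interval.

```python
from datetime import datetime

# libvirtd.log timestamp prefix, e.g. "2014-06-09 11:43:30.862+0000".
FMT = "%Y-%m-%d %H:%M:%S.%f%z"


def log_gap(first, second):
    """Hypothetical helper: time elapsed between two libvirtd.log timestamps."""
    return datetime.strptime(second, FMT) - datetime.strptime(first, FMT)


gap = log_gap("2014-06-09 11:43:30.862+0000", "2014-06-09 11:56:27.278+0000")
print(gap)  # 0:12:56.416000
```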

Artyom, supervdsm.log suggests that it has been running since 2014-06-06 14:27:25,199. Are you 100% sure that the host had no vdsm on it before deployment?

Comment 7 Artyom 2014-06-12 15:02:12 UTC
I failed to reproduce exactly the same error. However, when I ran the deployment again on the same host with the same parameters, I hit a different problem: during the rhevm bridge configuration, the host failed to receive an IP address from DHCP for some reason.
I will attach the hosted-engine-setup log. I hope it helps to understand what happened, because the parameters must be the same. If I manage to catch this error again, I will update the setup log.

Comment 8 Artyom 2014-06-12 15:02:54 UTC
Created attachment 908183
hosted-engine-setup.log

Comment 9 Artyom 2014-06-12 15:04:04 UTC
(In reply to Dan Kenigsberg from comment #6)
> Note that libvirtd was down for 12 long minutes.
> 
> 2014-06-09 11:43:30.862+0000: 9947: debug : virConnectCompareCPU:17135 :
> conn=0x7ff28c05aff0, xmlDesc=<cpu
> match="minimum"><model>SandyBridge</model><vendor>Intel</vendor></cpu>,
> flags=0
> 2014-06-09 11:56:27.278+0000: 9929: debug : virHookCheck:119 : No hook
> script /etc/libvirt/hooks/daemon
> 2014-06-09 11:56:27.278+0000: 9929: debug : virHookCheck:119 : No hook
> script /etc/libvirt/hooks/qemu
> 2014-06-09 11:56:27.278+0000: 9929: debug : virHookCheck:119 : No hook
> script /etc/libvirt/hooks/lxc
> 2014-06-09 11:56:27.278+0000: 9929: info : virNetlinkEventServiceStopAll:420
> : stopping all netlink event services
> 2014-06-09 11:56:27.450+0000: 20090: info : libvirt version: 0.10.2,
> package: 29.el6_5.8 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>,
> 2014-05-14-10:17:57, x86-027.build.eng.bos.redhat.com)
> 
> Artyom, supervdsm.log suggests that it was running since 2014-06-06
> 14:27:25,199. Are you 100% sure that the host had no vdsm on in before
> deployment?

Yes, it was a clean host. For some reason, the vdsm log differs from the libvirt log by 3 hours.

Comment 10 Artyom 2014-06-12 15:07:54 UTC
I also tried running a clean deployment on other hosts, and it finished successfully, so maybe the problem is related to this specific host.

Comment 11 Yaniv Bronhaim 2014-06-22 12:42:49 UTC
Sandro, in that case we must call configure with the --force flag, which stops the running vdsm-related services before restarting anything else.

Dan, I doubt that configure would stop libvirt but not supervdsm; both follow the same logic. I think something else happened to libvirt there.

Comment 12 Dan Kenigsberg 2014-06-23 07:44:31 UTC
(In reply to Artyom from comment #9)

> > 2014-06-09 11:56:27.450+0000: 20090: info : libvirt version: 0.10.2,
> > package: 29.el6_5.8 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>,
> > 2014-05-14-10:17:57, x86-027.build.eng.bos.redhat.com)
> > 
> > Artyom, supervdsm.log suggests that it was running since 2014-06-06
> > 14:27:25,199. Are you 100% sure that the host had no vdsm on in before
> > deployment?
> 
> Yes, it was clean host, from some reason vdsm log have difference with
> libvirt log in 3 hours.

The libvirt log is in UTC and your vdsm log is in Israel summer time (GMT+3), but this has nothing to do with the fact that your host was not clean when you started the deployment. According to the logs, supervdsm had already been running there for 3 days.
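The three-hour difference Artyom noticed is the time-zone offset Dan describes. The sketch below converts two supervdsm.log timestamps from this bug to UTC. It assumes the vdsm and supervdsm logs use a fixed UTC+3 offset (Israel summer time, as stated above). After conversion, the addNetwork call falls about ten seconds after the new libvirtd process logged its startup at 11:56:27.450 UTC, and the earliest supervdsm entry falls three days before the deployment. `to_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: vdsm/supervdsm logs use local Israel summer time, a fixed UTC+3 offset.
LOCAL_TZ = timezone(timedelta(hours=3))
# supervdsm.log timestamps use a comma before the milliseconds.
VDSM_FMT = "%Y-%m-%d %H:%M:%S,%f"


def to_utc(vdsm_timestamp):
    """Hypothetical helper: convert a supervdsm.log timestamp to UTC."""
    local = datetime.strptime(vdsm_timestamp, VDSM_FMT).replace(tzinfo=LOCAL_TZ)
    return local.astimezone(timezone.utc)


add_network = to_utc("2014-06-09 14:56:37,586")   # addNetwork call (comment 1)
first_entry = to_utc("2014-06-06 14:27:25,199")   # earliest supervdsm entry (comment 6)
# New libvirtd process (pid 20090) start, already in UTC in libvirtd.log.
libvirt_restart = datetime(2014, 6, 9, 11, 56, 27, 450000, tzinfo=timezone.utc)

print(add_network.isoformat())          # 2014-06-09T11:56:37.586000+00:00
print(first_entry.isoformat())          # 2014-06-06T11:27:25.199000+00:00
print(add_network - libvirt_restart)    # 0:00:10.136000
```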

Please reopen this bug when it reproduces, and include libvirtd.log, as Yaniv suspects that libvirt crashed and was not restarted.