Bug 980493

Summary: vdsm: host install fails as vdsmd is not up until manual restart
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Yaniv Bronhaim <ybronhei>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Meni Yakove <myakove>
Severity: high
Docs Contact:
Priority: high
Version: 3.3.0
CC: acathrow, awinter, bazulay, danken, dron, eedri, gcheresh, hateya, iheim, jkt, lpeer, myakove, obasan, pstehlik, yeylon
Target Milestone: ---
Keywords: Reopened, Triaged
Target Release: 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard: infra
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-19 21:40:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs none

Description Dafna Ron 2013-07-02 14:12:53 UTC
Created attachment 767755
logs

Description of problem:

I tried to install a clean RHEL 6.4 host in RHEV-M, and the host installation fails with the following error:

libvir: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing'

Version-Release number of selected component (if applicable):

vdsm-4.11.0-69.gitd70e3d5.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Install RHEL 6.4 on a host
2. Install the host in RHEV-M

Actual results:

The host fails to install with the following error:
libvir: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing'

In addition, the rhevm bridge is not installed.

Expected results:

The host should install successfully.

Additional info: logs

Comment 1 Dan Kenigsberg 2013-07-06 22:09:39 UTC
I believe that the "no nwfilter" text is no more than log noise that distracts from the real bug here, since DESPITE that message, vdsmd was started successfully.

2013-07-02 16:45:03 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:382 execute-result: ('/sbin/service', 'vdsmd', 'start'), rc=0
2013-07-02 16:45:03 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:440 execute-output: ('/sbin/service', 'vdsmd', 'start') stdout:
vdsm: libvirt already configured for vdsm [  OK  ]
Starting iscsid...
Starting multipathd...
Starting wdmd...
Starting sanlock...
Starting supervdsmd...
Starting up vdsm daemon: 
[  OK  ]
vdsm start[  OK  ]

2013-07-02 16:45:03 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:445 execute-output: ('/sbin/service', 'vdsmd', 'start') stderr:
libvir: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing'
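For anyone triaging similar host-deploy logs, the point above (rc=0 despite the stderr message) can be checked mechanically. The following is only a minimal sketch: it assumes the `execute-result:` line format shown in the excerpt above, and `parse_execute_result` is a hypothetical helper name, not part of otopi or vdsm.

```python
import ast
import re

# Matches the otopi "execute-result" line format seen in the log excerpt above.
# The format is assumed from that excerpt only; other otopi versions may differ.
RESULT_RE = re.compile(r"execute-result: (\(.*\)), rc=(-?\d+)")


def parse_execute_result(line):
    """Return (command_tuple, return_code), or None if the line does not match."""
    match = RESULT_RE.search(line)
    if match is None:
        return None
    # The command is logged as a Python tuple literal, so literal_eval is enough.
    command = ast.literal_eval(match.group(1))
    return command, int(match.group(2))


if __name__ == "__main__":
    sample = (
        "2013-07-02 16:45:03 DEBUG otopi.plugins.otopi.services.rhel "
        "plugin.executeRaw:382 execute-result: "
        "('/sbin/service', 'vdsmd', 'start'), rc=0"
    )
    command, rc = parse_execute_result(sample)
    print(command, "succeeded" if rc == 0 else "failed with rc=%d" % rc)
```

A zero return code here only shows that the init script reported success; it does not prove that the daemon stayed up afterwards.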

Comment 2 Dan Kenigsberg 2013-07-11 08:08:01 UTC
It seems that Engine fails to access Vdsm over xmlrpc after vdsm is installed. However, vdsm.log shows no problem of starting up. Is vdsm manually responsive after installation? Could there be some firewall or routing issue to the host?
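One quick way to separate "vdsm is down" from "a firewall or routing issue" is to test plain TCP reachability of the vdsm port from the Engine machine. This is a rough sketch only: `port_reachable` is a hypothetical helper, the host name is a placeholder, and 54321 is assumed to be the management port (the usual vdsm default; adjust it if the host is configured differently).

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts alike.
        return False


if __name__ == "__main__":
    # Placeholder host name; 54321 is the assumed vdsm management port.
    host, port = "host.example.com", 54321
    state = "reachable" if port_reachable(host, port) else "NOT reachable"
    print("%s:%d is %s" % (host, port, state))
```

A successful connection shows only that something is listening on the port; it does not confirm that vdsm answers xmlrpc calls. A failure does not by itself tell a stopped daemon from a blocked port, so running the same check locally on the host helps narrow it down.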

Comment 3 Dan Kenigsberg 2013-07-11 08:37:30 UTC
Genady, could you try to reproduce this on another system, and answer the questions in comment 2?

Comment 4 Meni Yakove 2013-07-11 14:22:56 UTC
We have another issue: https://bugzilla.redhat.com/show_bug.cgi?id=975759.
Can't reproduce this one now.

Comment 5 Dafna Ron 2013-07-14 08:16:10 UTC
(In reply to Dan Kenigsberg from comment #2)
> It seems that Engine fails to access Vdsm over xmlrpc after vdsm is
> installed. However, vdsm.log shows no problem of starting up. Is vdsm
> manually responsive after installation? Could there be some firewall or
> routing issue to the host?

Dan, vdsm is down. I know this because others have encountered the same issue when reinstalling a host: the host was non-responsive, and a vdsm restart solved the issue (this was on a reinstalled host).
In my case, since it was a clean host install and we had to create the bridge, it appeared that the bridge was not created.

Comment 6 Dan Kenigsberg 2013-07-15 12:48:42 UTC
We need a reproducer, captured before vdsm is manually restarted, so that we can review the logs and find more clues. Dafna's attached logs show no proof of a failure to add the bridge.

Meni, was the re-assigning intentional? It is unhelpful given Toni's PTO.

Comment 8 Yaniv Bronhaim 2013-07-23 12:06:47 UTC
From the logs, this doesn't look like the same bug we saw in the CI tests, although the title is the same. There, we could see the libvirt service being restarted many times during the run. Before the fix for bug 984267, supervdsm didn't handle the broken connection to libvirt properly; that is now solved, but we still need to figure out why libvirt was restarted so often.

Still waiting for a reproducer.

Comment 9 Yaniv Bronhaim 2013-07-23 12:27:59 UTC
*** Bug 984416 has been marked as a duplicate of this bug. ***

Comment 10 Barak 2013-07-29 12:32:20 UTC
Meni, do we have a reproduction scenario?

Comment 11 awinter 2013-07-29 15:41:09 UTC
I have successfully installed the host.
The installation succeeded without manual restart.

Comment 12 Yaniv Bronhaim 2013-07-31 08:56:10 UTC
Dan, I understand that the issue is related to the setupNetwork verb, which leads to a supervdsm crash. Please provide more details and a fix for the crash.
The attached patch fixes the supervdsm recovery issue, not the crash itself.

Comment 13 Yaniv Bronhaim 2013-08-06 11:49:07 UTC
Please provide a new engine.log and vdsm*.log showing the issue; I can't reproduce it.

Comment 14 Yaniv Bronhaim 2013-08-13 08:14:25 UTC
No reproducer was provided. Currently the installation flow works as expected. The attached patch is merged, but it doesn't address a specific failure that could cause the behavior described in this bug. The cause of the bug remains unclear.