Bug 980493 - vdsm: host install fails as vdsmd is not up until manual restart
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: x86_64 Linux
Priority: high Severity: high
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Yaniv Bronhaim
QA Contact: Meni Yakove
Whiteboard: infra
Keywords: Reopened, Triaged
Duplicates: 984416
Depends On:
Blocks:
 
Reported: 2013-07-02 10:12 EDT by Dafna Ron
Modified: 2016-02-10 14:36 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-19 17:40:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs (53.50 KB, application/x-gzip)
2013-07-02 10:12 EDT, Dafna Ron


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 17066 None None None Never

Description Dafna Ron 2013-07-02 10:12:53 EDT
Created attachment 767755
logs

Description of problem:

I tried to install a clean RHEL 6.4 host in RHEV-M, and the host installation fails with the following error:

libvir: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing'

Version-Release number of selected component (if applicable):

vdsm-4.11.0-69.gitd70e3d5.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Install RHEL 6.4 on a host.
2. Install the host in RHEV-M.

Actual results:

The host fails to install with the following error:
libvir: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing'

The rhevm bridge is also not installed.

Expected results:

The host installation should succeed.

Additional info: logs
Comment 1 Dan Kenigsberg 2013-07-06 18:09:39 EDT
I believe that the "no nwfilter" text is no more than log noise that distracts from the real bug here, since DESPITE that message, vdsmd was started successfully.

2013-07-02 16:45:03 DEBUG otopi.plugins.otopi.services.rhel plugin.executeRaw:382 execute-result: ('/sbin/service', 'vdsmd', 'start'), rc=0
2013-07-02 16:45:03 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:440 execute-output: ('/sbin/service', 'vdsmd', 'start') stdout:
vdsm: libvirt already configured for vdsm [  OK  ]
Starting iscsid...
Starting multipathd...
Starting wdmd...
Starting sanlock...
Starting supervdsmd...
Starting up vdsm daemon: 
[  OK  ]
vdsm start[  OK  ]

2013-07-02 16:45:03 DEBUG otopi.plugins.otopi.services.rhel plugin.execute:445 execute-output: ('/sbin/service', 'vdsmd', 'start') stderr:
libvir: Network Filter Driver error : Network filter not found: no nwfilter with matching name 'vdsm-no-mac-spoofing'
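
The reasoning above (judge the service start by its return code, not by what appears on stderr) can be sketched in Python. This is only an illustration: the regular expression is inferred from the `execute-result` lines quoted in this comment, and `failed_commands` is a hypothetical helper, not part of otopi or vdsm.

```python
import re

# Assumed format, inferred from the otopi debug lines quoted above:
#   ... execute-result: ('/sbin/service', 'vdsmd', 'start'), rc=0
EXEC_RESULT = re.compile(r"execute-result: \((?P<cmd>.*)\), rc=(?P<rc>-?\d+)")


def failed_commands(log_text):
    """Return (command, rc) pairs for every logged command with rc != 0."""
    failures = []
    for line in log_text.splitlines():
        match = EXEC_RESULT.search(line)
        if match and int(match.group("rc")) != 0:
            failures.append((match.group("cmd"), int(match.group("rc"))))
    return failures


sample = (
    "2013-07-02 16:45:03 DEBUG otopi.plugins.otopi.services.rhel "
    "plugin.executeRaw:382 execute-result: "
    "('/sbin/service', 'vdsmd', 'start'), rc=0\n"
    "libvir: Network Filter Driver error : Network filter not found: "
    "no nwfilter with matching name 'vdsm-no-mac-spoofing'\n"
)

# The nwfilter message is present, but no command is reported as failed.
print(failed_commands(sample))
```

Running such a filter over the whole host-deploy log would list only the commands that really exited non-zero, which helps separate stderr noise from actual failures.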
Comment 2 Dan Kenigsberg 2013-07-11 04:08:01 EDT
It seems that Engine fails to access Vdsm over xmlrpc after vdsm is installed. However, vdsm.log shows no problem of starting up. Is vdsm manually responsive after installation? Could there be some firewall or routing issue to the host?
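
One way to check the firewall or routing question from the Engine machine is a plain TCP connection test against the vdsm port. The sketch below is a minimal example under stated assumptions: `port_is_reachable` is a hypothetical helper, and 54321 is assumed to be the port vdsm listens on in this setup. A successful connection only shows that the port is open; it does not prove that vdsm answers xmlrpc requests.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Assumption: 54321 is the vdsm port; replace 127.0.0.1 with the host address.
print(port_is_reachable("127.0.0.1", 54321, timeout=1.0))
```

If this returns False while vdsm.log shows a clean startup, a firewall or routing problem between Engine and the host becomes more likely than a vdsm failure.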
Comment 3 Dan Kenigsberg 2013-07-11 04:37:30 EDT
Genady, could you try to reproduce this on another system, and reply to the questions of comment 2?
Comment 4 Meni Yakove 2013-07-11 10:22:56 EDT
We have another issue: https://bugzilla.redhat.com/show_bug.cgi?id=975759.
Can't reproduce this one now.
Comment 5 Dafna Ron 2013-07-14 04:16:10 EDT
(In reply to Dan Kenigsberg from comment #2)
> It seems that Engine fails to access Vdsm over xmlrpc after vdsm is
> installed. However, vdsm.log shows no problem of starting up. Is vdsm
> manually responsive after installation? Could there be some firewall or
> routing issue to the host?

Dan, vdsm is down. I know this because others have hit the same issue when reinstalling a host: the host was non-responsive, and a vdsm restart solved it (this was on a reinstalled host).
In my case, since it was a clean host install and the bridge had to be created, it appeared that the bridge was not created.
Comment 6 Dan Kenigsberg 2013-07-15 08:48:42 EDT
We need a reproducer, with logs captured before vdsm is manually restarted, so we can review them and find more clues. Dafna's attached logs show no proof of a failure to add the bridge.

Meni, was the re-assigning intentional? It is unhelpful given Toni's PTO.
Comment 8 Yaniv Bronhaim 2013-07-23 08:06:47 EDT
From the logs, this doesn't look like the same bug we saw in the CI tests, although the title is the same. There we could see the libvirt service being restarted many times during the run. Before the fix for bug 984267, supervdsm did not handle the broken connection to libvirt properly. That is now solved, but we still need to figure out why libvirt was restarted so often.

Still waiting for a reproducer.
Comment 9 Yaniv Bronhaim 2013-07-23 08:27:59 EDT
*** Bug 984416 has been marked as a duplicate of this bug. ***
Comment 10 Barak 2013-07-29 08:32:20 EDT
Meni, do we have a reproduction scenario?
Comment 11 awinter 2013-07-29 11:41:09 EDT
I have successfully installed the host.
The installation succeeded without manual restart.
Comment 12 Yaniv Bronhaim 2013-07-31 04:56:10 EDT
Dan, I understand that the issue is related to the setupNetwork verb, which leads to a supervdsm crash. Please provide more details and a fix for the crash.
The attached patch fixes the supervdsm recovery issue, not the crash itself.
Comment 13 Yaniv Bronhaim 2013-08-06 07:49:07 EDT
Please provide a new engine.log and vdsm*.log showing the issue; I can't reproduce it.
Comment 14 Yaniv Bronhaim 2013-08-13 04:14:25 EDT
No reproducer was provided. Currently the installation flow works as expected. The attached patch is merged, but it does not point to a specific failure that could cause the behavior in the bug description. The cause of the bug remains unclear.
