Bug 1260892 - vdsmd fails to come up because networking prevents libvirtd from coming up
Summary: vdsmd fails to come up because networking prevents libvirtd from coming up
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-3.5.7
Target Release: 3.5.7
Assignee: Ido Barkan
QA Contact: Meni Yakove
URL:
Whiteboard: network
Depends On:
Blocks:
 
Reported: 2015-09-08 07:29 UTC by Pavel Zhukov
Modified: 2019-09-12 08:56 UTC
CC: 12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-25 10:16:52 UTC
oVirt Team: Network
Target Upstream Version:
Embargoed:



Description Pavel Zhukov 2015-09-08 07:29:26 UTC
Description of problem:
RHEVH iso
After system boot both libvirtd and vdsmd are in a failed state because libvirtd is unable to initialize its network sockets.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150828.0.el6ev)
Red Hat Enterprise Virtualization Hypervisor release 6.6 (20150603.0.el6ev)


How reproducible:
100% on some systems

Actual results:
libvirtd is down
with error message:
Starting libvirtd daemon: libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.
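As the error message itself suggests, running libvirtd in the foreground normally prints why socket initialization failed. A minimal diagnostic sketch (not taken from the original report; the exact output will differ per host):

"""
service libvirtd stop
# run in the foreground so the socket error is printed to the terminal
libvirtd
# in a second shell: does the runtime directory for the unix socket exist?
ls -ld /var/run/libvirt
"""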

Expected results:
vdsmd and libvirtd should be running

Additional info:
I was able to get this behaviour with the following:

"""
service vdsmd stop
service libvirtd stop
rm -rf /var/run/libvirt
service vdsmd start
"""

Libvirtd was running before:
Sep 07 10:53:53 Completed ovirt-cim
libvirtd start/running, process 15706
[  OK  ]
supervdsm start[  OK  ]
supervdsm start[  OK  ]

So it looks like something removed or remounted the run directory before vdsmd started.
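A hedged way to check the removed/remounted run directory theory after a failing boot (commands are illustrative, not taken from the attached logs):

"""
# does the libvirt runtime directory exist by the time vdsmd starts?
ls -ld /var/run/libvirt
# is /var/run on a tmpfs that could be remounted during boot?
mount | grep -w '/var/run'
grep 'var/run' /etc/fstab /etc/rc.d/rc.sysinit 2>/dev/null
"""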

Comment 2 Pavel Zhukov 2015-09-08 07:31:09 UTC
Created attachment 1071216 [details]
libvirt log

Comment 4 Fabian Deutsch 2015-09-08 07:43:49 UTC
Moving this to vdsm, as it looks like the network is not getting restored. From the case: "The network is not configured because vdsm is not started because libvirtd is failed to start."

Comment 5 Pavel Zhukov 2015-09-08 08:01:41 UTC
(In reply to Fabian Deutsch from comment #4)
> Moving this to vdsm, as it looks like the network is not getting restored,
> from the case: "The network is not configured because vdsm is not started
> because libvirtd is failed to start. "

Fabian, 
Sorry, but this is not a networking issue but a unix socket one. I've opened a bug against libvirt (BZ#1260885) to change the error message.

Comment 6 Fabian Deutsch 2015-09-09 09:11:06 UTC
It has been discussed whether libvirt is behaving correctly by refusing to come up when there is no network available.

But independent of this behavior, a problem here is that the networking did not come up as stated in comment 4.

Comment 7 Fabian Deutsch 2015-09-09 09:12:22 UTC
The libvirt behavior is actually nicely explained in bug 1260885 comment 2.

Comment 8 Michael Burman 2015-09-09 09:24:01 UTC
Pavel, please add the exact steps to reproduce this bug.
Was the server installed in rhev-m?
Are networks configured via Setup Networks?
Was the server registered to engine via TUI?
Was there any upgrade involved here?
Your description is not clear.

Thanks.

Comment 9 Dan Kenigsberg 2015-09-09 13:41:24 UTC
Please provide supervdsm.log and the content of /var/lib/vdsm and /etc/sysconfig/network-scripts
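One way to collect everything requested above in a single archive (assuming the default supervdsm.log location under /var/log/vdsm; the archive name is arbitrary):

"""
tar czf /tmp/bz1260892-data.tar.gz \
    /var/log/vdsm/supervdsm.log* \
    /var/lib/vdsm \
    /etc/sysconfig/network-scripts
"""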

Is this an upgrade to 20150828.0.el6ev?
Shouldn't the Version field be set to 3.5.4?

Comment 10 Michael Burman 2015-09-09 14:24:14 UTC
Can't reproduce this report with RHEV Hypervisor - 6.7 - 20150828.0.el6ev, vdsm-4.16.26-1.el6ev

Comment 12 Ido Barkan 2015-10-28 08:57:25 UTC
from supervdsm.log I see an older vdsm version:
# Generated by VDSM version 4.16.13.1-1.el6ev
which is 3.5.1.
And if so, a lot has changed in this area from 3.5.1 to the latest 3.5.4 (where vdsm is tagged 4.16.26).
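A quick way to compare the installed vdsm with the version that generated the persisted ifcfg files (the "# Generated by VDSM version" header quoted above); this is a sketch, not something taken from the report:

"""
rpm -q vdsm
grep -h 'Generated by VDSM version' /etc/sysconfig/network-scripts/ifcfg-* 2>/dev/null | sort -u
"""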

*** This bug has been marked as a duplicate of bug 1203422 ***

Comment 13 Pavel Zhukov 2015-11-01 10:47:40 UTC
(In reply to Ido Barkan from comment #12)
> from supervdsm.log I see an older vdsm version:
> # Generated by VDSM version 4.16.13.1-1.el6ev
> which is 3.5.1.
> And if so, a lot has changed in this area from 3.5.1 to the latest 3.5.4
That's not true. You've pasted an old log record (see the timestamp).

Comment 14 Eyal Edri 2015-11-01 14:16:36 UTC
This bug missed the build date of 3.5.6.
If you believe this is a blocker for the release, please set the blocker flag and get the relevant acks.

Comment 15 Ido Barkan 2015-11-02 07:22:01 UTC
(In reply to Pavel Zhukov from comment #13)
> (In reply to Ido Barkan from comment #12)
> > from supervdsm.log I see an older vdsm version:
> > # Generated by VDSM version 4.16.13.1-1.el6ev
> > which is 3.5.1.
> > And if so, a lot has changed in this area from 3.5.1 to the latest 3.5.4
> That's not true. You've pasted an old log record (see the timestamp).

Okay,
In that case we need more info. Pavel, can you please add the info requested in comment 8 and comment 9?

Comment 18 Pavel Zhukov 2015-11-02 08:07:04 UTC
(In reply to Michael Burman from comment #8)
> Pavel, please add the exact steps to reproduce this bug.
I don't have hardware to reproduce it at home. Not reproducible with a simple network configuration in a nested env.
> Was the server installed in rhev-m?
Can you please elaborate? It was registered in rhevm.
> Are networks configured via Setup Networks?
It's an upgraded hypervisor.
> Was the server registered to engine via TUI?
It's an upgraded hypervisor.
> Was there any upgrade involved here?
For sure it was. They hit BZ#1203422 and tried to upgrade to fix the issue.
> Your description is not clear.
> 
> Thanks.

Comment 19 Ido Barkan 2015-11-02 12:03:14 UTC
Ok, so now I understand the versions. Sorry about that:

An upgrade of rhev-h 20150603.0.el6ev to rhev-h 20150828.0.el6ev
is an upgrade from vdsm 4.16.20-1 to 4.16.26-1
which is an upgrade from rhev 3.5.3 to rhev 3.5.4.

Since all I see in supervdsm.log is a lonely restart message, I can only guess that somehow the restoration process failed to start.

Sadly, until 3.5.4 the ifcfg files were not persisted by rhev-h, so after boot the node would come up without any ifcfg files owned by vdsm, and vdsm would recreate them according to the stored persistence. This was finally fixed in 3dd0baa (which is only part of 3.5.4, v4.16.24).
If, for some reason, during boot vdsm failed to call the restoration script, or failed to load at all (libvirt being down is a possible reason), you are left with no networks at all. In your case only ifcfg-eth0, which was there before vdsm, is present.
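To check whether the stored persistence exists and to retry the restoration manually, something like the following can be run on the node. The persistence path reflects vdsm's usual unified-persistence layout and the vdsm-tool verb is an assumption (its availability depends on the vdsm version), so treat this as a sketch:

"""
# network definitions persisted by vdsm
ls -l /var/lib/vdsm/persistence/netconf/nets/ 2>/dev/null
# ifcfg files currently present
ls -l /etc/sysconfig/network-scripts/ifcfg-*
# retry the restoration by hand (verb name assumed; may differ per version)
vdsm-tool restore-nets
"""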

We can try to investigate further, but what happened between 3.5.1 and 3.5.4 in the network area is bad for many reasons, most of which I hope are already fixed.

Can you please ask the customer to restore his damaged host by hand on 3.5.4, persist the networks, and see if things are lost again when upgrading to the latest 3.5? If all is ok, there is nothing we can really do to help here.
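For the manual restore, a rough sketch of what "restore by hand and persist the networks" could look like on RHEV-H 3.5.4; the interface name, the node persist utility and the vdsClient verb are assumptions, not taken from this case:

"""
# recreate the missing ifcfg file(s), e.g. for the management interface
vi /etc/sysconfig/network-scripts/ifcfg-eth0
service network restart
# make the file survive reboots on RHEV-H (node persist utility, if available)
persist /etc/sysconfig/network-scripts/ifcfg-eth0
# once vdsm is up again, mark the running network config as safe/persistent
vdsClient -s 0 setSafeNetworkConfig
"""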

