Description of problem:
RHEV-H ISO: after system boot, both libvirtd and vdsmd are in a failed state because libvirtd is unable to initialize its socket.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Virtualization Hypervisor release 6.7 (20150828.0.el6ev)
Red Hat Enterprise Virtualization Hypervisor release 6.6 (20150603.0.el6ev)

How reproducible:
100% on some systems

Actual results:
libvirtd is down with the error message:
Starting libvirtd daemon: libvirtd: error: Unable to initialize network sockets. Check /var/log/messages or run without --daemon for more info.

Expected results:
vdsmd and libvirtd should be running

Additional info:
I was able to get this behaviour with the following:
"""
service vdsmd stop
service libvirtd stop
rm -rf /var/run/libvirt
service vdsmd start
"""
libvirtd was running before:
Sep 07 10:53:53 Completed ovirt-cim
libvirtd start/running, process 15706 [ OK ]
supervdsm start [ OK ]
supervdsm start [ OK ]
So it looks like something removed or remounted the run directory before vdsmd started.
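The failure mode can be demonstrated without touching a real hypervisor: libvirtd keeps its unix socket under /var/run/libvirt, so if that directory disappears the daemon cannot initialize its sockets. A minimal sketch of the guard an init path would need follows; the scratch directory name and the 0755 mode are assumptions for the demo (on a real RHEL 6 host the path would be /var/run/libvirt):

```shell
#!/bin/sh
# Sketch: recreate libvirtd's runtime socket directory if it is missing.
# On a real RHEL 6 host this would be /var/run/libvirt; a scratch path
# is used here so the demo is safe to run anywhere.
RUNDIR="${TMPDIR:-/tmp}/demo-libvirt-run"

if [ ! -d "$RUNDIR" ]; then
    mkdir -p "$RUNDIR"    # restore the directory removed by "rm -rf"
    chmod 755 "$RUNDIR"   # mode is an assumption matching common defaults
fi

echo "runtime dir present: $RUNDIR"
```

On the affected host, recreating the runtime directory before `service libvirtd start` should let the daemon bind its unix socket again, assuming nothing remounts or removes /var/run in between.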
Created attachment 1071216 [details] libvirt log
Moving this to vdsm, as it looks like the network is not getting restored, from the case: "The network is not configured because vdsm is not started because libvirtd is failed to start. "
(In reply to Fabian Deutsch from comment #4)
> Moving this to vdsm, as it looks like the network is not getting restored,
> from the case: "The network is not configured because vdsm is not started
> because libvirtd is failed to start. "
Fabian,
Sorry, but this is not a networking issue but a unix socket one. I've opened a bug against libvirt (BZ#1260885) to change the error message.
It has been discussed whether libvirt behaves correctly by not coming up when there is no network available. But independent of that behavior, the problem here is that networking did not come up, as stated in comment 4.
The libvirt behavior is actually nicely explained in bug 1260885 comment 2.
Pavel, please add the exact steps to reproduce this bug. Was the server installed in rhev-m? Are networks configured via Setup Networks? Was the server registered to the engine via TUI? Was there any upgrade involved here? Your description is not clear. Thanks.
Please provide supervdsm.log and the content of /var/lib/vdsm and /etc/sysconfig/network-scripts. Is this an upgrade to 20150828.0.el6ev? Shouldn't the Version field be set to 3.5.4?
Can't reproduce this report with RHEV Hypervisor - 6.7 - 20150828.0.el6ev, vdsm-4.16.26-1.el6ev.
From supervdsm.log I see an older vdsm version:
# Generated by VDSM version 4.16.13.1-1.el6ev
which is 3.5.1. And if that is the case, a lot has changed in this area between 3.5.1 and the latest 3.5.4 (where vdsm is tagged 4.16.26).

*** This bug has been marked as a duplicate of bug 1203422 ***
(In reply to Ido Barkan from comment #12)
> from supervdsm.log I see an older vdsm version:
> # Generated by VDSM version 4.16.13.1-1.el6ev
> which is 3.5.1.
> And if it is, a lot has change in this are since 3.5.1 to latest 3.5.4
That's not true. You've pasted an old log record (see the timestamp).
This bug missed the build date of 3.5.6. If you believe this is a blocker for the release, please set the blocker flag and get the relevant acks.
(In reply to Pavel Zhukov from comment #13)
> (In reply to Ido Barkan from comment #12)
> > from supervdsm.log I see an older vdsm version:
> > # Generated by VDSM version 4.16.13.1-1.el6ev
> > which is 3.5.1.
> > And if it is, a lot has change in this are since 3.5.1 to latest 3.5.4
> It's not true. You've pasted old logs record (see timestamp).
Okay, in that case we need more info. Pavel, can you please add the info requested in comment 8 and comment 9?
(In reply to Michael Burman from comment #8)
> Pavel please add the exact steps to reproduce this bug.
I don't have hardware to reproduce it at home. It is not reproducible with a simple network configuration in a nested env.
> Was the server installed in rhev-m?
Can you please elaborate? It was registered in rhevm.
> Are networks configured via Setup Networks?
It's an upgraded hypervisor.
> Was the server registrated to engine via TUI?
It's an upgraded hypervisor.
> Was there any upgrade involved here?
For sure it was. They hit BZ#1203422 and tried to upgrade to fix the issue.
> Your description is not clear.
>
> Thanks.
Ok, so now I understand the versions, sorry about that: an upgrade of rhev-h 20150603.0.el6ev to rhev-h 20150828.0.el6ev is an upgrade from vdsm 4.16.20-1 to 4.16.26-1, which is an upgrade from rhev 3.5.3 to rhev 3.5.4.

Since all I see in supervdsm.log is a lonely restart message, I can only guess that somehow the restoration process failed to start. Sadly, until 3.5.4 the ifcfg files were not persisted by rhev-h, so after boot the node would come up without any ifcfg files owned by vdsm, and vdsm would recreate them according to the stored persistence. This was finally fixed in 3dd0baa (which is only part of 3.5.4, v4.16.24). If, for some reason, during boot vdsm failed to call the restoration script, or failed to load at all (libvirt being down is a possible reason), you are left with no networks at all. In your case, only ifcfg-eth0, which was there before vdsm, is present.

We can try to investigate further, but what happened between 3.5.1 and 3.5.4 in the network area is bad for many reasons, most of which I hope are already fixed. Can you please ask the customer to restore his damaged host by hand on 3.5.4, persist the networks, and see if things are lost again on upgrade to the latest 3.5? If all is OK, there is nothing we can really do to help here.
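The restoration step described above can be sketched as a copy-back loop: a persisted store of ifcfg files is replayed into network-scripts at boot, so a node that lost its vdsm-owned ifcfg files gets them back. The directory names below are demo stand-ins, not vdsm's actual persistence layout, and the seeded ifcfg content is invented for illustration:

```shell
#!/bin/sh
# Demo of replaying persisted ifcfg files. Paths are stand-ins for
# vdsm's persistence store and /etc/sysconfig/network-scripts.
PERSIST="${TMPDIR:-/tmp}/demo-vdsm-persist"
SCRIPTS="${TMPDIR:-/tmp}/demo-network-scripts"
mkdir -p "$PERSIST" "$SCRIPTS"

# Seed the persisted copy for the demo (a fake logical network).
printf 'DEVICE=rhevm\nONBOOT=yes\n' > "$PERSIST/ifcfg-rhevm"

# Restore any persisted ifcfg file missing from network-scripts,
# mirroring what the boot-time restoration step should achieve.
for f in "$PERSIST"/ifcfg-*; do
    base=$(basename "$f")
    [ -e "$SCRIPTS/$base" ] || cp "$f" "$SCRIPTS/$base"
done

echo "restored files:" "$SCRIPTS"/ifcfg-*
```

If such a replay never runs (vdsm fails to start because libvirtd is down), the host boots with only the pre-existing ifcfg-eth0, which matches the symptom seen here.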