Description of problem:
I've been experiencing peculiar and annoying networking behavior on my oVirt development hosts, and I'm hoping someone familiar with vdsm networking configuration can help me get to the bottom of it. My setup is two mini-Dells acting as virt hosts, with oVirt engine running on my laptop. The Dells get their network config from a cobbler instance running on my laptop, which also provides PXE services. After freshly installing the Dells, I get a nice, stable network connection. After installing vdsm, the connection drops occasionally. I have to visit the machine, log into the console, and execute 'dhclient ovirtmgmt'. This fixes the problem for a while.

Version-Release number of selected component (if applicable):
vdsm-4.16.0-522.git4a3768f.fc20.x86_64

How reproducible:
For me, always (after a non-deterministic period of time elapses)

Steps to Reproduce:
1. Start vdsm
2. Use vdsm for a while

Actual results:
The ovirtmgmt interface loses its IP address

Expected results:
Connectivity is not interrupted

Additional info:
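Because the loss happens after a non-deterministic delay, it may help to record exactly when the ovirtmgmt interface loses its address so the moment can be matched against vdsm.log and supervdsm.log. The following is only a minimal sketch, not part of vdsm: it assumes the iproute2 `ip` command is available on the host, and the names `ipv4_addresses` and `watch` as well as the 5-second interval are made up for illustration.

```python
import re
import subprocess
import time
from datetime import datetime

IFACE = "ovirtmgmt"  # management bridge named in this report

# Matches the "inet A.B.C.D/NN" field in `ip -4 -o addr show` output.
INET_RE = re.compile(r"\binet\s+(\d{1,3}(?:\.\d{1,3}){3})/\d+")


def ipv4_addresses(ip_output):
    """Return the IPv4 addresses found in `ip -4 -o addr show` output."""
    return INET_RE.findall(ip_output)


def watch(iface=IFACE, interval=5.0):
    """Poll the interface and print a timestamped line whenever its
    set of IPv4 addresses changes (including when it becomes empty)."""
    previous = None
    while True:
        result = subprocess.run(
            ["ip", "-4", "-o", "addr", "show", "dev", iface],
            capture_output=True,
            text=True,
        )
        current = ipv4_addresses(result.stdout)
        if current != previous:
            state = ", ".join(current) if current else "no IPv4 address"
            print(f"{datetime.now().isoformat()} {iface}: {state}")
            previous = current
        time.sleep(interval)
```

Running `watch()` from a console session on the host (for example via `python3 -i` on the saved file) and leaving it running until the next drop should give a timestamp to search for in the logs.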
Created attachment 956412 [details] supervdsm.log
Created attachment 956413 [details] vdsm.log
As suggested by Ondřej Svoboda, I tried disabling NetworkManager and that did not seem to resolve the problem.
Adam,

did you lose connection after VDSM and superVDSM restarted (not necessarily)? Could you look in engine logs?

supervdsm.log
MainThread::DEBUG::2014-11-11 14:19:31,399::supervdsmServer::451::SuperVdsm.Server::(main) Terminated normally

vdsm.log
MainThread::DEBUG::2014-11-11 14:19:26,600::vdsm::58::vds::(sigtermHandler) Received signal 15

There are a couple of not really nice warnings in supervdsm.log when VDSM creates the management network (bridge expected too early -- looks harmless; libvirt network not there -- I don't like this) and also further on (sourceroutethread trying to add the same route over and over again).

What puzzles me though is that lines such as the one below indicate some kind of DHCP activity.

sourceRoute::DEBUG::2014-11-11 15:24:28,939::sourceroutethread::38::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1415737468

I CC'd Toni and Ido. Guys, can you see anything in the logs?
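To line up events across vdsm.log and supervdsm.log, the quoted lines can be split into fields and compared by timestamp. This is a rough sketch: the field layout (thread, level, timestamp, module, line number, logger, function, message, separated by `::`) is inferred only from the lines quoted in this comment, not from vdsm's logging configuration, and `LogRecord` and `parse_line` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class LogRecord(NamedTuple):
    thread: str
    level: str
    timestamp: datetime
    module: str
    lineno: int
    logger: str
    function: str
    message: str


def parse_line(line):
    """Split one '::'-separated log line into its fields.

    Assumes the layout seen in the quoted lines; raises ValueError
    on lines that do not have all seven separators' worth of fields.
    """
    thread, level, ts, module, lineno, logger, rest = line.strip().split("::", 6)
    # The last field looks like "(functionName) free-form message".
    function, _, message = rest.partition(") ")
    return LogRecord(
        thread,
        level,
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"),
        module,
        int(lineno),
        logger,
        function.lstrip("("),
        message,
    )


stopped_vdsm = parse_line(
    "MainThread::DEBUG::2014-11-11 14:19:26,600::vdsm::58::vds::"
    "(sigtermHandler) Received signal 15"
)
stopped_supervdsm = parse_line(
    "MainThread::DEBUG::2014-11-11 14:19:31,399::supervdsmServer::451::"
    "SuperVdsm.Server::(main) Terminated normally"
)

# Seconds between vdsm receiving SIGTERM and supervdsm terminating.
delta = stopped_supervdsm.timestamp - stopped_vdsm.timestamp
print(delta.total_seconds())  # 4.799
```

The same function applies to the sourceroutethread line, which makes it straightforward to sort records from both files into one timeline.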
I mean, the last observation seems to contradict the connection loss.
Created attachment 956855 [details]
engine log

I didn't see anything fishy in the engine.log, but here it is for completeness.
(In reply to Ondřej Svoboda from comment #4)
> Adam,
>
> did you lose connection after VDSM and superVDSM restarted (not
> necessarily)? Could you look in engine logs?

Unless I'm missing something, could this be the new soft-fencing in response to storage connection failures?

>
> supervdsm.log
> MainThread::DEBUG::2014-11-11
> 14:19:31,399::supervdsmServer::451::SuperVdsm.Server::(main) Terminated
> normally
>
> vdsm.log
> MainThread::DEBUG::2014-11-11
> 14:19:26,600::vdsm::58::vds::(sigtermHandler) Received signal 15
>
> There are a couple of not really nice warnings in supervdsm.log when VDSM
> creates the management network (bridge expected too early -- looks harmless;
> libvirt network not there -- I don't like this) and also further on
> (sourceroutethread trying to add the same route over and over again).
>
> What puzzles me though is that lines such as the one below indicate some
> kind of DHCP activity.
>
> sourceRoute::DEBUG::2014-11-11
> 15:24:28,939::sourceroutethread::38::root::(process_IN_CLOSE_WRITE_filePath)
> Responding to DHCP response in /var/run/vdsm/sourceRoutes/1415737468

At this point I may have logged into the box and executed "dhclient ovirtmgmt" in order to rescue the connection.
After talking with Adam and asking him to try the patch for https://bugzilla.redhat.com/1142082, the issue has not happened again. Adam, if the host still keeps its address by Monday, please mark this bug as a duplicate of the one above.
Adam, just to be on the safe side: which OS did you run on your mini-Dells?

Bug 1116004 was fixed for RHEL 7.1 and 7.0.z (Bug 1148345).
(In reply to Barak from comment #9)
> Adam - Just to be on the safe side what OS did you run on your mini-dells ?

I tried with CentOS 7 and Fedora 20.

> Bug 1116004 was fixed for RHEL 7.1 and 7.0.z (Bug 1148345)

*** This bug has been marked as a duplicate of bug 1116004 ***