Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1162844

Summary: Network connectivity drops after vdsm runs
Product: [Retired] oVirt
Reporter: Adam Litke <alitke>
Component: vdsm
Assignee: Ido Barkan <ibarkan>
Status: CLOSED DUPLICATE
QA Contact: Gil Klein <gklein>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.5
CC: alitke, asegurap, bazulay, bugs, danken, ecohen, gklein, ibarkan, iheim, lsurette, mgoldboi, osvoboda, rbalakri, s.kieske, yeylon
Target Milestone: ---
Target Release: 3.5.1
Hardware: Unspecified
OS: Unspecified
Whiteboard: network
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-11-18 15:30:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description    Flags
supervdsm.log  none
vdsm.log       none
engine log     none

Description Adam Litke 2014-11-11 20:35:04 UTC
Description of problem: 
I've been experiencing peculiar and annoying networking behavior on my
oVirt development hosts and I'm hoping someone familiar with vdsm
networking configuration can help me get to the bottom of it.

My setup is two mini-Dells acting as virt hosts and ovirt engine
running on my laptop.  The dells get their network config from a
cobbler instance running on my laptop which also provides PXE
services.

After freshly installing the dells, I get a nice, stable network
connection.  After installing vdsm, the connection seems to drop
occasionally.  I have to visit the machine, log into the console, and
execute 'dhclient ovirtmgmt'.  This fixes the problem again for
a while.
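
To catch the moment the address disappears without walking over to the console, a check along the following lines could be run periodically on the host. This is only a sketch: it assumes the iproute2 `ip` command is installed and Python 3.7 or later is available, and the helper names `ipv4_addresses` and `interface_has_ipv4` are made up for illustration. It only reports the state and does not renew the lease itself.

```python
import subprocess

IFACE = "ovirtmgmt"  # bridge name taken from the report above


def ipv4_addresses(ip_output):
    """Extract IPv4 addresses from `ip -4 -o addr show` output (hypothetical helper)."""
    addrs = []
    for line in ip_output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <iface> inet <addr>/<prefix> ..."
        if "inet" in fields:
            idx = fields.index("inet")
            if idx + 1 < len(fields):
                addrs.append(fields[idx + 1])
    return addrs


def interface_has_ipv4(iface=IFACE):
    """Return True if the interface currently holds at least one IPv4 address."""
    result = subprocess.run(
        ["ip", "-4", "-o", "addr", "show", "dev", iface],
        capture_output=True, text=True, check=False,
    )
    return bool(ipv4_addresses(result.stdout))


if __name__ == "__main__":
    if interface_has_ipv4():
        print(f"{IFACE} has an IPv4 address")
    else:
        print(f"{IFACE} has no IPv4 address; run 'dhclient {IFACE}' as root")
```

Logging the time of each failed check would also make it easier to line up the address loss with entries in vdsm.log and supervdsm.log.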

Version-Release number of selected component (if applicable):
vdsm-4.16.0-522.git4a3768f.fc20.x86_64

How reproducible: For me, always (after a non-deterministic period of time elapses)

Steps to Reproduce:
1. Start vdsm
2. Use vdsm for a while


Actual results:
The ovirtmgmt interface loses its IP address

Expected results:
Connectivity is not interrupted

Additional info:

Comment 1 Adam Litke 2014-11-11 20:35:38 UTC
Created attachment 956412 [details]
supervdsm.log

Comment 2 Adam Litke 2014-11-11 20:36:23 UTC
Created attachment 956413 [details]
vdsm.log

Comment 3 Adam Litke 2014-11-11 20:37:12 UTC
As suggested by Ondřej Svoboda, I tried disabling NetworkManager and that did not seem to resolve the problem.

Comment 4 Ondřej Svoboda 2014-11-11 22:16:03 UTC
Adam,

did you lose connection after VDSM and superVDSM restarted (not necessarily)? Could you look in engine logs?

supervdsm.log
  MainThread::DEBUG::2014-11-11 14:19:31,399::supervdsmServer::451::SuperVdsm.Server::(main) Terminated normally

vdsm.log
  MainThread::DEBUG::2014-11-11 14:19:26,600::vdsm::58::vds::(sigtermHandler) Received signal 15

There are a couple of not really nice warnings in supervdsm.log when VDSM creates the management network (bridge expected too early -- looks harmless; libvirt network not there -- I don't like this) and also further on (sourceroutethread trying to add the same route over and over again).

What puzzles me though is that lines such as the one below indicate some kind of DHCP activity.

  sourceRoute::DEBUG::2014-11-11 15:24:28,939::sourceroutethread::38::root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response in /var/run/vdsm/sourceRoutes/1415737468

I CC'd Toni and Ido. Guys, can you see anything in the logs?
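
For anyone sifting through the attached logs, the lines quoted above can be split into fields mechanically. The sketch below assumes every record follows the `thread::LEVEL::timestamp::module::lineno::logger::(function) message` layout seen in those three samples; the `LogRecord` type and the `parse_line` and `dhcp_responses` helpers are hypothetical names for illustration, not part of VDSM.

```python
from collections import namedtuple
from datetime import datetime

LogRecord = namedtuple(
    "LogRecord", "thread level time module lineno logger func message"
)


def parse_line(line):
    """Parse one log line; return None if it does not match the assumed layout."""
    parts = line.strip().split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, stamp, module, lineno, logger, rest = parts
    # The last field looks like "(function_name) free-form message".
    func, sep, message = rest.partition(") ")
    if not sep or not func.startswith("("):
        return None
    try:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        number = int(lineno)
    except ValueError:
        return None
    return LogRecord(thread, level, when, module, number, logger, func[1:], message)


def dhcp_responses(lines):
    """Yield records where sourceroutethread reports handling a DHCP response."""
    for line in lines:
        record = parse_line(line)
        if (record and record.module == "sourceroutethread"
                and "DHCP response" in record.message):
            yield record


if __name__ == "__main__":
    sample = [
        "MainThread::DEBUG::2014-11-11 14:19:31,399::supervdsmServer::451::"
        "SuperVdsm.Server::(main) Terminated normally",
        "sourceRoute::DEBUG::2014-11-11 15:24:28,939::sourceroutethread::38::"
        "root::(process_IN_CLOSE_WRITE_filePath) Responding to DHCP response "
        "in /var/run/vdsm/sourceRoutes/1415737468",
    ]
    for rec in dhcp_responses(sample):
        print(rec.time, rec.message)
```

Listing the timestamps of all such DHCP responses next to the times the address was lost should show whether the lease was being renewed at all before a manual dhclient run.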

Comment 5 Ondřej Svoboda 2014-11-11 22:18:31 UTC
I mean, the last observation seems to contradict the connection loss.

Comment 6 Adam Litke 2014-11-12 19:16:30 UTC
Created attachment 956855 [details]
engine log

I didn't see anything fishy in the engine.log, but here it is for completeness.

Comment 7 Adam Litke 2014-11-12 19:18:37 UTC
(In reply to Ondřej Svoboda from comment #4)
> Adam,
> 
> did you lose connection after VDSM and superVDSM restarted (not
> necessarily)? Could you look in engine logs?

Unless I'm missing something, could this be the new soft-fencing in response to storage connection failures?

> 
> supervdsm.log
>   MainThread::DEBUG::2014-11-11
> 14:19:31,399::supervdsmServer::451::SuperVdsm.Server::(main) Terminated
> normally
> 
> vdsm.log
>   MainThread::DEBUG::2014-11-11
> 14:19:26,600::vdsm::58::vds::(sigtermHandler) Received signal 15
> 
> There are a couple of not really nice warnings in supervdsm.log when VDSM
> creates the management network (bridge expected too early -- looks harmless;
> libvirt network not there -- I don't like this) and also further on
> (sourceroutethread trying to add the same route over and over again).
> 
> What puzzles me though is that lines such as the one below indicate some
> kind of DHCP activity.
> 
>   sourceRoute::DEBUG::2014-11-11
> 15:24:28,939::sourceroutethread::38::root::(process_IN_CLOSE_WRITE_filePath)
> Responding to DHCP response in /var/run/vdsm/sourceRoutes/1415737468

At this point I may have logged into the box and executed "dhclient ovirtmgmt" in order to rescue the connection.

Comment 8 Antoni Segura Puimedon 2014-11-14 23:59:19 UTC
After I talked with Adam and asked him to try the patch for https://bugzilla.redhat.com/1142082, the issue has not happened again. Adam, if the host still keeps its address by Monday, please mark this bug as a duplicate of the one above.

Comment 9 Barak 2014-11-16 13:17:52 UTC
Adam - Just to be on the safe side what OS did you run on your mini-dells ?

Bug 1116004 was fixed for RHEL 7.1 and 7.0.z (Bug 1148345)

Comment 10 Adam Litke 2014-11-18 15:30:11 UTC
(In reply to Barak from comment #9)
> Adam - Just to be on the safe side what OS did you run on your mini-dells ?

I tried with CentOS 7 and Fedora 20.

> Bug 1116004 was fixed for RHEL 7.1 and 7.0.z (Bug 1148345)

*** This bug has been marked as a duplicate of bug 1116004 ***