Bug 961920

Summary: Software Development Workstation installed in VM does not set up networking correctly
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Severity: unspecified
Priority: unspecified
Status: CLOSED DUPLICATE
Reporter: Gary Benson <gbenson>
Assignee: Laine Stump <laine>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, cwei, dyuan, honzhang, jdenemar, jiahu, mzhan, ydu
Target Milestone: rc
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-03-31 14:55:13 UTC

Description Gary Benson 2013-05-10 18:31:26 UTC
Description of problem:
If you install RHEL 6.3 with the Software Development Workstation option in a VM whose virtual network is set to the default 192.168.122.0/24, networking will not work during firstboot.  Further, if you correct that problem in the most obvious way, networking will sometimes still fail on boot.

The root cause is a race between NetworkManager bringing up eth0 and libvirt bringing up virbr0.  If eth0 comes up first then libvirt will see a device using the network it wants and will not start virbr0.  If eth0 is not running when libvirt starts then virbr0 will be started and eth0 will not work properly when it comes up.
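One quick way to see which side "won" the race on a given boot is to check which interface ended up holding a 192.168.122.x address and whether libvirt's network is active, e.g.:

    # Run inside the guest after boot:
    ip addr                 # does eth0 or virbr0 hold the 192.168.122.x address?
    virsh net-list --all    # is libvirt's "default" network marked active?
    brctl show              # is the virbr0 bridge present at all?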

For some reason it seems to fail on every firstboot, so you cannot register the VM with RHN.  If you log in after firstboot and edit the "System eth0" connection in NetworkManager's GUI, you will see that the "Connect automatically" box is unchecked.  Checking that box and rebooting "fixes" the installation, but there will still be times (maybe 1 in 10 in my experience) when libvirt "wins" the race and sets up virbr0.
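The "Connect automatically" checkbox corresponds to the ONBOOT= flag in the interface's ifcfg file, so the same fix can be applied from the command line, e.g. (assuming the connection is backed by the usual ifcfg file; the path may differ):

    # Inside the guest, as root:
    sed -i 's/^ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-eth0
    reboot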

Version-Release number of selected component (if applicable):
Tested with RHEL 6.3 with the Software Development Workstation packageset, but this will likely fail on any RHEL or Fedora host running virtualization, with any guest OS that tries to set up virtual networking on 192.168.122.0.

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL6.3 with the Software Development Workstation packageset.
2. Log in and open Virtual Machine Manager.
3. Install RHEL6.3 with the Software Development Workstation packageset on a guest.
  
Actual results:
Networking in the guest is broken.

Expected results:
Networking in the guest works correctly.

Additional info:
I don't know that this problem has an easy fix.  The neatest fix the user can apply is to change the host's virtual network away from the default, so one option is documentation changes and/or some kind of alert if the user opens VMM with the network still set to the default.  Another option might be some way for NetworkManager to signal that it is still setting up interfaces, so that libvirt could wait and see.  A third option might be for the installer to detect that it is running in a VM and either not install or not start the libvirt service in the guest; users who want libvirt running in the guest could then enable it themselves.  I'm not sure you'd want behaviour to change in that way, though (I kind of think that identical installs on real and virtual hardware would be desirable).

Comment 2 Laine Stump 2013-05-14 14:39:23 UTC
This problem has a very long history, beginning with Bug 235961 (even earlier on IRC and the mailing list, but that's more difficult to search for).

The patch applied in response to Bug 235961 really does about as much as can be done within libvirtd at runtime to reduce the frequency of the problem. There is unfortunately no reasonable way for libvirt to automatically change its behavior at runtime to avoid this problem 100% of the time, though. At the bottom of the issue is the fact that NetworkManager brings up its interfaces asynchronously (which makes sense, because otherwise many startup tasks that don't depend on networking would be stalled waiting for a DHCP response); libvirt checks for a network collision when it starts each of its virtual networks, but the guest's eth0 may or may not have its IP address at the time libvirt starts - if eth0 already has its address, libvirt will fail to start its virtual network; if not, it will succeed.


Note that the "default" network installed as part of libvirt is now in an optional sub-package: libvirt-daemon-config-network.

Simple solution: remove the package libvirt-daemon-config-network (in the *guest*), or run "virsh net-edit default" and change the subnet used by libvirt to something else (e.g. 192.168.123.0).
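The net-edit route looks roughly like this (a sketch; the exact XML contents will vary with the guest's configuration):

    # In the guest, as root:
    virsh net-destroy default   # stop the network if it is currently active
    virsh net-edit default      # change 192.168.122.x to 192.168.123.x in the
                                # <ip address=...> and <dhcp><range> elements
    virsh net-start default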

(I wouldn't recommend eliminating this package from the manifest of a normal libvirt install though - having guest networking "just work" straight out of the box is too much of a usability win to break it for everyone).

Possibly more useful solution: if the installation script for libvirt-daemon-config-network could reliably detect that it is running in a guest (some sort of detection that would also cope with hardware nested-virt would be necessary), it could set up the default network with 192.168.123.0 instead of 192.168.122.0. This would *still* lead to problems in the case where someone had coincidentally also changed the default network config on the host to that address, but the likelihood of that is even lower.
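A rough sketch of what such a scriptlet could look like, assuming virt-what is available and that the default network definition ends up in /etc/libvirt/qemu/networks/default.xml (both are assumptions; the real packaging may differ):

    # Hypothetical %post scriptlet for libvirt-daemon-config-network:
    # virt-what prints the hypervisor type when run inside a guest and
    # nothing on bare metal, so non-empty output means "we are a guest".
    if [ -n "$(virt-what 2>/dev/null)" ]; then
        # Shift the default network one subnet up so it does not collide
        # with the host's 192.168.122.0/24 virbr0.
        sed -i 's/192\.168\.122\./192.168.123./g' \
            /etc/libvirt/qemu/networks/default.xml
    fi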

Any other ideas?

Comment 3 Gary Benson 2013-05-21 10:43:38 UTC
Could Anaconda detect that it is running in a guest and not install the libvirt-daemon-config-network package?  This would mean guest networking worked out of the box for everyone except people doing nested-virt.  They would need to perform an extra step, but here the extra step is required of people who probably know what they are doing (as opposed to people who may not know or care about virtualization to that level).
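In the meantime, anyone doing kickstart installs of guests could approximate that behaviour by excluding the sub-package manually (a stand-in for automatic detection, not an existing Anaconda feature; group names here are only illustrative):

    %packages
    # whatever groups you normally install
    @base
    # skip libvirt's default-network definition in the guest
    -libvirt-daemon-config-network
    %end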

Comment 9 Laine Stump 2013-07-02 04:28:43 UTC
Pushing this out to RHEL6.6 due to lack of capacity and unclear road to a solution.

Comment 12 Jiri Denemark 2014-03-31 14:55:13 UTC

*** This bug has been marked as a duplicate of bug 956891 ***