Bug 677729

Summary: Random failures starting a guest with VEPA
Product: Red Hat Enterprise Linux 6 Reporter: Daniel Berrangé <berrange>
Component: libvirt Assignee: Laine Stump <laine>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.1 CC: dallan, eblake, llim, nzhang, xen-maint, yoyzhang
Target Milestone: rc Keywords: TestOnly
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
The port allocation/de-allocation of the libnl library, which is used by libvirt for macvtap (for example, vepa and vnlink) interfaces, was not threadsafe and the logic was incorrect. This resulted in a failure to initialize libnl, and a subsequent failure of the associated libvirt functionality. In particular, the first guest vepa interface started on a host would work, but all subsequent vepa interfaces would fail. Port allocation/de-allocation and logic is now fixed in libnl, and the failures in libvirt no longer occur.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 13:27:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 677724, 677725    
Bug Blocks:    

Description Daniel Berrangé 2011-02-15 17:20:17 UTC
Description of problem:
From the upstream list: when trying to start or stop a domain with a macvtap device (direct type of interface) having a device description like this one

<interface type='direct'>
<source dev='static' mode='vepa'/>
</interface>

then I see netlink-related errors if a 'virsh edit' session is open at the same time.

http://www.redhat.com/archives/libvir-list/2011-February/msg00466.html

This problem is caused by two bugs in the libnl library: a logic bug and a (possibly) thread-safety bug.
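
To illustrate the class of bug named above, here is a minimal Python sketch of a port allocator whose allocate and release paths are serialized by a lock. This is not libnl's code: the `PortAllocator` class, the `allocate_concurrently` helper, and the default of 32 slots are all assumptions made purely for illustration.

```python
import threading


class PortAllocator:
    """Hypothetical allocator handing out unique slot numbers (not libnl code)."""

    def __init__(self, slots=32):
        self._slots = slots
        self._used = set()
        self._lock = threading.Lock()

    def allocate(self):
        # The lock makes "find a free slot" and "mark it used" one atomic
        # step, so two threads can never be handed the same slot.
        with self._lock:
            for slot in range(self._slots):
                if slot not in self._used:
                    self._used.add(slot)
                    return slot
        raise RuntimeError("no free ports")

    def release(self, slot):
        # Releasing must clear exactly the slot that was allocated;
        # otherwise the slot leaks and a later allocate() can fail.
        with self._lock:
            self._used.discard(slot)


def allocate_concurrently(allocator, workers=8):
    """Allocate one slot from each of `workers` threads and return them all."""
    results = []
    results_lock = threading.Lock()

    def worker():
        slot = allocator.allocate()
        with results_lock:
            results.append(slot)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Without the lock, two threads could both see the same slot as free and both claim it. With a faulty release path, slots would never return to the pool, which is one way a first allocation can succeed while later ones fail.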

Version-Release number of selected component (if applicable):
0.8.7-6.el6

How reproducible:
Sometimes

Steps to Reproduce:
1. virsh edit <macvtap domain>    -> do not terminate the edit session
2. virsh start <macvtap domain>   -> works
3. virsh destroy <macvtap domain> -> leaves a macvtap device due to nl_connect failing
4. virsh start <macvtap domain>   -> does not start anymore
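
Steps 2 to 4 above could be scripted roughly as follows. This is a sketch only: the function names `repro_commands` and `run_repro` are hypothetical, and it assumes that `virsh` is on the PATH and that the `virsh edit` session from step 1 is already open in another terminal (that step is interactive and is not automated here).

```python
import subprocess


def repro_commands(domain):
    """Return the virsh invocations for steps 2-4, in order."""
    return [
        ["virsh", "start", domain],
        ["virsh", "destroy", domain],
        ["virsh", "start", domain],
    ]


def run_repro(domain, runner=subprocess.call):
    """Run steps 2-4 and return the exit status of each command.

    `runner` defaults to subprocess.call, which returns the command's
    exit code; pass a stub to dry-run the sequence without virsh.
    """
    return [runner(cmd) for cmd in repro_commands(domain)]
```

On an affected host, the last exit status would be expected to be non-zero, because the second start fails. Since the failure is intermittent, the sequence may need to be repeated.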


Actual results:


Expected results:


Additional info:

Comment 2 Laine Stump 2011-03-15 19:40:42 UTC
Note that Bug 677724 includes a patch to libnl, and also a comment stating that a libnl built with that patch solves the problem.

Comment 3 RHEL Program Management 2011-04-04 01:47:27 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 4 Dave Allan 2011-04-05 02:31:44 UTC
This BZ is TestOnly, and should have been marked exception, as the fix for 677724 is ON_QA, so this BZ is ready to test as well.

Comment 6 Nan Zhang 2011-04-14 07:02:42 UTC
Tested with libvirt-0.8.7-16.el6.x86_64; the guest starts normally with VEPA. Moving to VERIFIED.

# virsh edit foo

    <interface type='direct'>
      <mac address='52:54:00:00:71:13'/>
      <source dev='macvtap0' mode='vepa'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

# virsh start foo
Domain foo started

# virsh destroy foo
Domain foo destroyed

# virsh start foo
Domain foo started
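
When verifying on another host, it can help to confirm that the domain really has a direct interface in vepa mode before running the start/destroy cycle. The sketch below does this with Python's standard XML parser; the function name `vepa_source_devices` is hypothetical, and the input is assumed to be domain XML such as the output of `virsh dumpxml foo`.

```python
import xml.etree.ElementTree as ET


def vepa_source_devices(domain_xml):
    """Return the source devices of all type='direct' interfaces in vepa mode."""
    root = ET.fromstring(domain_xml)
    devices = []
    # iter() also visits the root, so a bare <interface> snippet works too.
    for iface in root.iter("interface"):
        if iface.get("type") != "direct":
            continue
        source = iface.find("source")
        if source is not None and source.get("mode") == "vepa":
            devices.append(source.get("dev"))
    return devices
```

An empty list means the domain has no vepa interface, so the test above would not exercise the macvtap code path.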

Comment 7 Laine Stump 2011-05-03 15:28:23 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The port allocation/de-allocation of the libnl library, which is used by libvirt for macvtap (eg vepa and vnlink) interfaces, was not threadsafe and the logic was incorrect, which resulted in a failure to initialize libnl, and the subsequent failure of the associated libvirt functionality. In particular, the first guest vepa interface started on a host would work, but all subsequent vepa interfaces would fail. Port allocation/de-allocation and logic is now fixed in libnl, and the failures in libvirt no longer occur.

Comment 10 Laura Bailey 2011-05-04 02:34:18 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1 @@
-The port allocation/de-allocation of the libnl library, which is used by libvirt for macvtap (eg vepa and vnlink) interfaces, was not threadsafe and the logic was incorrect, which resulted in a failure to initialize libnl, and the subsequent failure of the associated libvirt functionality. In particular, the first guest vepa interface started on a host would work, but all subsequent vepa interfaces would fail. Port allocation/de-allocation and logic is now fixed in libnl, and the failures in libvirt no longer occur.
+The port allocation/de-allocation of the libnl library, which is used by libvirt for macvtap (for example, vepa and vnlink) interfaces, was not threadsafe and the logic was incorrect. This resulted in a failure to initialize libnl, and a subsequent failure of the associated libvirt functionality. In particular, the first guest vepa interface started on a host would work, but all subsequent vepa interfaces would fail. Port allocation/de-allocation and logic is now fixed in libnl, and the failures in libvirt no longer occur.

Comment 11 errata-xmlrpc 2011-05-19 13:27:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html