Bug 689001

Summary: macvtap: nlComm function doesn't log certain errors, leading to "unspecified error" logs
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.1
Reporter: Laine Stump <laine>
Assignee: Laine Stump <laine>
QA Contact: Virtualization Bugs <virt-bugs>
CC: eblake, jdenemar, mjenner, yoyzhang
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-0.8.7-14.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 13:29:19 UTC

Description Laine Stump 2011-03-18 18:46:24 UTC
Callers of the function nlComm expect it to log an error message whenever it fails, but on three different error conditions it returns an error without logging anything. If virsh calls something that ends up on one of these paths (e.g. starting a guest with a "type='direct'" interface when creating the required macvtap interface fails), the only thing it can report is "unspecified error".

A patch has already been pushed upstream to remedy this:

commit 12775d9491f0d98de6eb4593be4cacfaff1c4e47
Author: Laine Stump <laine>
Date:   Tue Mar 15 16:22:25 2011 -0400

macvtap: log an error if on failure to connect to netlink socket
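The failure mode described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the actual nlComm code or the upstream patch: `last_error`, `report_error`, `error_for_user` and `nl_comm_sketch` are invented names, libvirt's real code uses its own error-reporting helpers (such as virReportSystemError), and the three real error paths are collapsed into one simulated failure passed in as an errno value.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for libvirt's error slot (not the real API). */
static char last_error[256];

/* Record a message for the caller, in the style of "<step>: <strerror>". */
static void report_error(const char *what, int errnum)
{
    snprintf(last_error, sizeof(last_error), "%s: %s", what, strerror(errnum));
}

static void reset_error(void)
{
    last_error[0] = '\0';
}

/* What a client such as virsh can show after a failed call: the recorded
 * message if one exists, otherwise only a generic fallback. */
static const char *error_for_user(void)
{
    return last_error[0] ? last_error : "unspecified error";
}

/* Hypothetical stand-in for nlComm(). A non-zero fail_errno simulates one
 * of its error paths (e.g. connecting the netlink socket failing with
 * EADDRINUSE); log_errors selects the buggy (0) or fixed (1) behavior. */
static int nl_comm_sketch(int fail_errno, int log_errors)
{
    if (fail_errno != 0) {
        if (log_errors)
            report_error("cannot connect to netlink socket", fail_errno);
        /* Callers assume an error has already been reported at this point. */
        return -1;
    }
    return 0;
}
```

With `log_errors` set to 0 the caller sees -1 but has nothing to show except the generic fallback, which corresponds to the "unspecified error" behavior before the fix; with it set to 1 the message names the failing step, similar to the "cannot connect to netlink socket: Address already in use" output in comment 6 (the exact strerror() text depends on the C library).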

Comment 2 Laine Stump 2011-03-18 19:15:33 UTC
rebased patch sent to rhvirt-patches:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-March/msg00463.html

Comment 4 Laine Stump 2011-03-21 15:29:06 UTC
That's a bit difficult, because the new code is only executed when there is an error opening a macvtap device.

"Fortunately", a current bug in libnl makes it possible (although not predictably) to trigger such an error: modify a guest's domain interface config to set the interface type to "direct", using the physical interface as its source device, start virt-manager, then start and stop that guest several times using virsh.

<interface type='direct'>
  <mac address='.....'/>
  <source dev='eth0' mode='bridge'/>
</interface>

Eventually the guest will fail to start. Prior to this fix, virsh would report "unspecified error". With the fix it will report something like "cannot allocate nlhandle for netlink" (or some other message related to netlink).

Once the libnl bug is fixed, you will no longer have an easy way to verify this bug.

Comment 6 zhanghaiyan 2011-03-23 08:15:11 UTC
Verified this bug as fixed with libvirt-0.8.7-14.el6.x86_64 using a Linux guest:
1. Modify a guest's domain interface config to specify the interface type as "direct", using the physical interface as its direct connection:
<interface type='direct'>
  <mac address='.....'/>
  <source dev='eth0' mode='bridge'/>
</interface>
2. Open virt-manager and connect to the hypervisor
3. # for i in `seq 1 100`; do virsh start rhel61_i386_11; virsh destroy rhel61_i386_11; done
Domain rhel61_i386_11 started
Domain rhel61_i386_11 destroyed
error: Failed to start domain rhel61_i386_11
error: cannot connect to netlink socket: Address already in use


Reproduced this bug with the older package libvirt-0.8.7-13.el6.x86_64 using a Linux guest:
1. Modify a guest's domain interface config to specify the interface type as "direct", using the physical interface as its direct connection:
<interface type='direct'>
  <mac address='.....'/>
  <source dev='eth0' mode='bridge'/>
</interface>
2. Open virt-manager and connect to the hypervisor
3. # for i in `seq 1 1000`; do virsh start rhel61_i386_11; virsh destroy rhel61_i386_11; done
Domain rhel61_i386_11 started
Domain rhel61_i386_11 destroyed
error: Failed to start domain rhel61_i386_11
error: Unknown failure

Comment 7 zhanghaiyan 2011-03-23 08:33:07 UTC
libnl-1.1-13.el6.x86_64 is used for the test in comment 6

Comment 11 errata-xmlrpc 2011-05-19 13:29:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html