1185501 – Sometimes cannot start/migrate vm with macvtap interface

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	6.5
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Laine Stump
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-01-23 23:25 UTC by Marina Kalinin
Modified:	2019-05-20 11:27 UTC
CC List:	18 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-04-01 22:45:07 UTC
Target Upstream Version:
Embargoed:

Attachments
vm fex hook version (6.40 KB, text/x-python) 2015-01-27 00:50 UTC, Marina Kalinin
vm fex hook for migration (3.97 KB, text/x-python) 2015-01-28 22:56 UTC, Marina Kalinin
libvirtd.log from the new case - vm create scenario (13.71 MB, text/plain) 2015-02-25 23:27 UTC, Marina Kalinin
new hook (6.29 KB, text/x-python) 2015-03-19 16:18 UTC, Marina Kalinin
new hook readme file (1.47 KB, text/plain) 2015-03-19 16:21 UTC, Marina Kalinin
libvirt qemu hook (212 bytes, application/x-shellscript) 2015-03-20 22:04 UTC, Marina Kalinin

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1186142	medium	CLOSED	Fail to Migrate with Bridged network, eth + macvtap ,with different interface name on two hosts	2021-02-22 00:41:40 UTC
Red Hat Bugzilla	1203367	medium	CLOSED	Fail to Migrate with Bridged network, eth + macvtap ,with different interface name on two hosts	2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution)	367293	None	None	None	Never
Red Hat Knowledge Base (Solution)	1299243	None	None	None	Never

Internal Links: 1186142 1203367

Description Marina Kalinin 2015-01-23 23:25:52 UTC

Description of problem:
Sometimes cannot start/migrate vm with macvtap interface.
Error in libvirtd.log:
~~~
error : virNetDevMacVLanCreate:180 : error creating macvtap type of interface: Invalid argument
~~~
Error in messages:
~~~
Jan 16 15:26:31  kernel: device eth25 entered promiscuous mode
Jan 16 15:26:31  kernel: ADDRCONF(NETDEV_UP): macvtap3: link is not ready
Jan 16 15:26:31  kernel: device eth25 left promiscuous mode
~~~

Version-Release number of selected component (if applicable):
RHEL 6.5
libvirt-0.10.2-29.el6_5.12
kvm: 0.12.1.2 - 2.415.el6_5.14
kernel: 2.6.32 - 431.29.2.el6.x86_64
vdsm-4.14.11-5.el6ev (all the VMs are part of RHEV environment 3.4.1).


How reproducible:
Not clear.
Sometimes migration / creation succeeds, sometimes it fails.


Additional info:
UCS blades used, with vmfex technology:
BIOS Information
        Vendor: Cisco Systems, Inc.
        Version: B200M3.2.2.2.0.042820141643
        Release Date: 04/28/2014

RHEV provides vmfex functionality via vmfex hook.

Errors in migration flow:
~~~
2015-01-16 20:26:31.120+0000: 8850: debug : virNetDevMacVLanCreateWithVPortProfile:840 : virNetDevMacVLanCreateWithVPortProfile: VM OPERATION: migrate in start
2015-01-16 20:26:31.170+0000: 8850: error : virNetDevMacVLanCreate:180 : error creating macvtap type of interface: Invalid argument
~~~

Errors in creation flow:
~~~
2015-01-16 17:27:44.786+0000: 8677: debug : virNetDevMacVLanCreateWithVPortProfile:840 : virNetDevMacVLanCreateWithVPortProfile: VM OPERATION: create
2015-01-16 17:27:44.805+0000: 8677: error : virNetDevMacVLanCreate:180 : error creating macvtap type of interface: Invalid argument
~~~

P.S. It would be really great to have the guest name in the error, so the logs would be more readable. This bug might be related to that logging issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1052835

Comment 2 Marina Kalinin 2015-01-23 23:39:11 UTC

Additional information:
Those VMs mostly have 2 nics that require macvtap device.
Not sure it is related.

Especially, since sometimes it works.

Comment 4 Marina Kalinin 2015-01-23 23:41:32 UTC

Network definition in libvirt xml:
~~~
    <interface type='direct'>
      <mac address='00:2c:dd:1e:b8:f3'/>
      <source dev='eth25' mode='passthrough'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='PROF1'/>
      </virtualport>
      <target dev='macvtap12'/>
      <model type='virtio'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='direct'>
      <mac address='00:2c:dd:1e:b8:f6'/>
      <source dev='eth26' mode='passthrough'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='PROF2'/>
      </virtualport>
      <target dev='macvtap13'/>
      <model type='virtio'/>
      <link state='up'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
~~~

Comment 5 Martin Kletzander 2015-01-26 13:26:31 UTC

I don't think this is libvirt's fault.  I don't know about anything that we would do that could *sometimes* fail to start the network.  Quick googling suggests network cables, although I don't know whether that's applicable in this case.  Can't hurt to try, though.

Comment 6 Michal Privoznik 2015-01-26 14:53:07 UTC

Marina, looks like libnl bug to me? I mean, if kernel replies with Invalid argument, then somebody may have mangled the message sent to it. Does upgrading libnl and kernel help?

Comment 7 Marina Kalinin 2015-01-26 18:05:15 UTC

Hi guys, 
The customer cannot upgrade right now, for multiple reasons: every upgrade requires a scheduling process, and as far as I can see the customer already has the latest packages available for RHEL 6.5. Also, RHEV is not compatible with RHEL 6.6 yet.

Are there any other ways to troubleshoot this problem or to pinpoint a hardware problem?

Thank you!

Comment 8 Robb Manes 2015-01-26 19:30:14 UTC

These errors stand out to me:

TIME+DATE: error : virNetDevMacVLanCreate:180 : error creating macvtap type of interface: Invalid argument

I'm curious as to which argument it's failing on.  If we look at how virNetDevMacVLanCreate() deals with its arguments and where it prints the error:

int
virNetDevMacVLanCreate(const char *ifname,
                       const char *type,
                       const virMacAddrPtr macaddress,
                       const char *srcdev,
                       uint32_t macvlan_mode,
                       int *retry)
{
- - - - - - - 8< - - - - - - 
    unsigned char *recvbuf = NULL;
- - - - - - - 8< - - - - - - 
    if (virNetlinkCommand(nl_msg, &recvbuf, &recvbuflen, 0, 0,
                          NETLINK_ROUTE, 0) < 0) {
        goto cleanup;
    }

    if (recvbuflen < NLMSG_LENGTH(0) || recvbuf == NULL)
        goto malformed_resp;

    resp = (struct nlmsghdr *)recvbuf;
- - - - - - - 8< - - - - - - 
    switch (resp->nlmsg_type) {
    case NLMSG_ERROR:
        err = (struct nlmsgerr *)NLMSG_DATA(resp);
- - - - - - - 8< - - - - - - 
        default:
            virReportSystemError(-err->error,
                                 _("error creating %s type of interface"),
                                 type);
            goto cleanup;
        }

It's called only by virNetDevMacVLanCreateWithVPortProfile():

int virNetDevMacVLanCreateWithVPortProfile(const char *tgifname,
                                           const virMacAddrPtr macaddress,
                                           const char *linkdev,
                                           enum virNetDevMacVLanMode mode,
                                           bool withTap,
                                           int vnet_hdr,
                                           const unsigned char *vmuuid,
                                           virNetDevVPortProfilePtr virtPortProfile,
                                           char **res_ifname,
                                           enum virNetDevVPortProfileOp vmOp,
                                           char *stateDir,
                                           virNetDevBandwidthPtr bandwidth)
{
    const char *type = withTap ? "macvtap" : "macvlan";
- - - - - - - 8< - - - - - - 
    if (tgifname) {
- - - - - - - 8< - - - - - - 
        rc = virNetDevMacVLanCreate(tgifname, type, macaddress, linkdev,
                                    macvtapMode, &do_retry);
        if (rc < 0)
            return -1;

And the full path is likely:

qemuBuildCommandLine()
	--->qemuPhysIfaceConnect()
		--->virNetDevMacVLanCreateWithVPortProfile()
			--->virNetDevMacVLanCreate()

So at what point can an invalid argument to virNetDevMacVLanCreate() be supplied?  I have little knowledge of the hooks in this case; can they supply us with bad arguments in whatever configuration file we use that eventually are passed to virNetDevMacVLanCreate() via this or similar codepaths?

Short of adding an assert() and going through it in gdb upon crash&core or adding debug messages verbosely throughout, I can't think of a good way to figure out what argument was invalid.  Thoughts?

Comment 9 Laine Stump 2015-01-26 23:21:16 UTC

The EINVAL is the result of netlink sending an error message (nlmsg_type == NLMSG_ERROR) in response to the request message sent by libvirt to create the macvtap device.
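
As a rough illustration of where the "Invalid argument" text comes from, the sketch below decodes a netlink error reply in the same way as the quoted C excerpt: check the length, read the nlmsghdr, test for NLMSG_ERROR, then negate the error field to get an errno. The helper name decode_netlink_error and the hand-built reply are assumptions made for illustration and are not libvirt code; the header layout follows linux/netlink.h and native byte order is assumed.

```python
import errno
import os
import struct

NLMSG_ERROR = 0x2                    # message type from <linux/netlink.h>
# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSGHDR = struct.Struct("=IHHII")


def decode_netlink_error(recvbuf):
    """Return the positive errno carried by an NLMSG_ERROR reply, or None.

    Hypothetical helper that mirrors the checks in the C excerpt above.
    """
    if recvbuf is None or len(recvbuf) < NLMSGHDR.size:
        raise ValueError("malformed netlink response")
    _len, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(recvbuf, 0)
    if msg_type != NLMSG_ERROR:
        return None
    if len(recvbuf) < NLMSGHDR.size + 4:
        raise ValueError("malformed netlink response")
    # struct nlmsgerr starts with a signed int holding a negative errno
    # (0 means the request was acknowledged).
    (error,) = struct.unpack_from("=i", recvbuf, NLMSGHDR.size)
    return -error


# Hand-built reply carrying -EINVAL, for illustration only.
reply = (NLMSGHDR.pack(NLMSGHDR.size + 4, NLMSG_ERROR, 0, 1, 0)
         + struct.pack("=i", -errno.EINVAL))
code = decode_netlink_error(reply)
print(errno.errorcode[code], "-", os.strerror(code))
```

The point is only that the message in libvirtd.log is the kernel's errno passed through unchanged, so the log line by itself cannot say which attribute of the request was rejected.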

There are really only 3 arguments that would change from one invocation of virNetDevMacVLanCreate() to the next:

1) srcdev (the name of the physical device)
2) ifname (the name that we want the macvtap device to have)
3) mac address.

1) I'm guessing that the vmfex hooks must search for a free physical network device (that would be srcdev in the call to virNetDevMacVLanCreate()), otherwise each guest would need to have a specific physdev reserved on every host machine at all times in order to guarantee successful migration. Of course we know that srcdev exists, otherwise the call to virNetDevGetIndex() at the top of virNetDevMacVLanCreate() would fail.

One possibility: Could there be a race condition with these vmfex hooks, where vmfex picks device X for the guest A, but before that guest can actually use it, guest B asks for a device and is given the same device? Who can we ask about these hooks?

2) The name of the macvtap device is determined by iteratively attempting to create a device with the name "macvtap%d" where %d is 0 - 8192, until one is found that both doesn't already exist when checked, and also can be successfully created. During the retries, if any errno other than EEXIST is encountered (e.g. the EINVAL that we are getting here) the retries immediately cease and an error is returned.

3) mac address - this comes directly from the config. I would only expect this to be a problem if it was a bad mac address (e.g. had the multicast bit set) or was already in use by another interface on the same host (and possibly the latter isn't even a problem for the driver).
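
The name search described in 2) can be sketched roughly as follows. This is a simplified model, not the libvirt implementation: exists() and try_create() are hypothetical stand-ins for the ifindex lookup and the netlink create request, the upper bound is taken from the text above, and the error raised when every name is taken is an assumption.

```python
import errno

MAX_ID = 8192  # upper bound quoted above; the exact limit is an assumption


def pick_macvtap_name(exists, try_create):
    """Return the first 'macvtap%d' name that is free and can be created.

    exists(name) -> bool and try_create(name) -> 0 or a negative errno are
    hypothetical callbacks supplied by the caller.
    """
    for idx in range(MAX_ID + 1):
        name = "macvtap%d" % idx
        if exists(name):
            continue                     # name already taken, try the next one
        rc = try_create(name)
        if rc == 0:
            return name
        if rc != -errno.EEXIST:
            # Any other error (e.g. the EINVAL seen here) ends the retries.
            raise OSError(-rc, "error creating macvtap type of interface")
    raise OSError(errno.EEXIST, "no free macvtap name found")


taken = {"macvtap0", "macvtap1"}
print(pick_macvtap_name(lambda n: n in taken, lambda n: 0))
```

With EINVAL coming back on the first create attempt, the loop never gets a chance to try another name, which is why the failure is immediate rather than a long series of retries.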

My guess would be one of the following:

1) See the above query about the possibility of a race in the vmfex hooks that determines which physical device to use.

2) Possibly the host system has reached some limit in the number of macvtap devices that can be created.

3) Maybe the driver is returning EINVAL instead of EEXIST when there is an attempt to create a new macvtap device with the same name as one that already exists.

libvirt could be modified to be more verbose in the case of this error message, logging the physical device name and ifindex as well as the intended macvtap device name and mac address. That wouldn't solve the problem of course, but would hopefully either lead us toward a root cause or at least rule out some things.

Michael, can you provide any more details on what would cause an EINVAL response from a request to create a new macvtap device?

Comment 10 Marina Kalinin 2015-01-27 00:50:06 UTC

Created attachment 984464
vm fex hook version

Comment 11 Marina Kalinin 2015-01-27 01:15:44 UTC

Ok, finally I found the hook code. See attached 50_vmfex.
Basically, the hook modifies the VM's xml at the vdsm level before the xml is passed to libvirt.

How does it do it?
It takes the existing xml, as received by vdsm from RHEV-Manager, and changes the networking definition for those mac addresses requested in the vm custom property. So, if such a mac address is found in the current vm xml, it removes the bridge, puts direct-pool in its place, and adds the vmfex definitions for this mac address. To see an example, check either the code of the hook or comment 4.
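
A minimal sketch of that rewrite, assuming the custom property has already been parsed into a dict of mac address to profile id. This is not the attached 50_vmfex code; the function name to_vmfex and the input shape are made up for illustration.

```python
import xml.etree.ElementTree as ET


def to_vmfex(domain_xml, mac_to_profile, pool="direct-pool"):
    """Rewrite interfaces whose MAC is listed in mac_to_profile.

    mac_to_profile maps MAC address -> port profile id (hypothetical stand-in
    for the VM custom property). Other interfaces are left untouched.
    """
    root = ET.fromstring(domain_xml)
    for iface in list(root.iter("interface")):
        mac = iface.find("mac")
        if mac is None or mac.get("address") not in mac_to_profile:
            continue
        # Replace the bridge connection with a reference to the pool network.
        iface.set("type", "network")
        source = iface.find("source")
        if source is None:
            source = ET.SubElement(iface, "source")
        source.attrib.clear()
        source.set("network", pool)
        # Drop any stale <virtualport> and add the 802.1Qbh profile.
        for old in iface.findall("virtualport"):
            iface.remove(old)
        vport = ET.SubElement(iface, "virtualport", {"type": "802.1Qbh"})
        ET.SubElement(vport, "parameters",
                      {"profileid": mac_to_profile[mac.get("address")]})
    return ET.tostring(root, encoding="unicode")


example = ("<domain><devices><interface type='bridge'>"
           "<mac address='00:1a:4a:9f:01:99'/><source bridge='rhevm'/>"
           "</interface></devices></domain>")
print(to_vmfex(example, {"00:1a:4a:9f:01:99": "VMFEX_RHEV_pol"}))
```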

The hook would also create a 'direct-pool' network in libvirt, with the list of free devices that can be used for vms. Free device is defined by zero mac address: '00:00:00:00:00:00'.
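
To illustrate the pool definition, here is a rough sketch that builds a 'direct-pool' network xml from a map of device name to mac address, keeping only devices whose mac is all zeros. How the hook really collects the devices is not shown in this bug, so the input dict and the function name build_direct_pool are assumptions.

```python
import xml.etree.ElementTree as ET

FREE_MAC = "00:00:00:00:00:00"  # a free device is marked by a zero mac address


def build_direct_pool(dev_macs, name="direct-pool"):
    """Return libvirt network XML listing every device whose MAC is all zeros.

    dev_macs maps device name -> MAC address (assumed input shape).
    """
    net = ET.Element("network")
    ET.SubElement(net, "name").text = name
    forward = ET.SubElement(net, "forward", {"mode": "passthrough"})
    for dev in sorted(dev_macs):
        if dev_macs[dev] == FREE_MAC:
            ET.SubElement(forward, "interface", {"dev": dev})
    return ET.tostring(net, encoding="unicode")


print(build_direct_pool({
    "eth1": "00:25:b5:00:00:1f",   # in use, so it stays out of the pool
    "eth2": FREE_MAC,
    "eth3": FREE_MAC,
}))
```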

After the xml is modified, it is passed to libvirt to create the qemu-kvm process based on the definitions in the xml. From there on it looks like pure libvirt to me. If it is the same xml, then in my eyes the hook works fine and the problem is beyond the hook. And the xml looks exactly the same on both source and destination hosts.

Re: mac address. It is allocated by RHEV-M per VM, once the nic is created. The mac address does not change and it stays with the VM throughout its life. I.e. during live migration, the VM keeps its mac address, and it should be allocated to the next free available device and associated with the profile id defined in the vm custom property for this mac address.

As I understand the RHEV side, there should be no race condition initiated by RHEV or the hook, since once the xml is formed, it is passed to libvirt. However, there might be some race condition due to the fact that those vms have two nics that require the vmfex hook. But I do not understand yet how libvirt handles that.

Also, I am looking for UCS machines in the lab to configure the reproducer.

Comment 12 Marina Kalinin 2015-01-27 01:21:07 UTC

This is ovirt wiki page that describes the hook:
http://wiki.ovirt.org/VDSM-Hooks/vmfex

Comment 13 Laine Stump 2015-01-27 07:53:32 UTC

Okay, now I understand better what is done for vmfex, thanks for digging up that info. It looks like it intends libvirt's network pools to be used to allocate a physdev on each host as the guest is migrated. Since I know that code has fairly simple locking, I no longer suspect a race at that level. However, I'm not convinced the pool is actually being used, since the xml that is logged during the migrate doesn't show the network pool created by the vmfex hook, but instead just shows the actual physdev by name (i.e. "<interface type='direct'> <source dev='eth26' ...").

For a minute I thought I had found a smoking gun, as I believe the patches backported to RHEL6 for Bug 1064831 (adding network hooks to RHEL6), and in particular upstream commit 7d5bf484747979ce842fea9ae3ae673ab09bf935, would cause the domain status XML to be sent during migration rather than the domain config. 

However, that patch was backported to RHEL6 for 6.6, and wasn't present in 6.5, so doesn't apply here. (I did just file Bug 1186142 to assure the proper fix is made for that in RHEL6.6 and beyond though)

Still, I'm puzzled that the xml given to virDomainMigratePrepare3 has what appears to be the <interface> *status* rather than its config (i.e. it has "<interface type='direct'> <source dev='eth25' mode='passthrough'/> ..." rather than "<interface type='network'> <source network='direct-pool'/> ..."). If that really is what is being sent to the migration destination, then the identical physdev would need to be free on the destination as is used on the source (rather than just grabbing any unused physdev from the pool), and in an active cluster that would lead to exactly the symptoms you're describing: migration sometimes works but sometimes fails.

Jiri, is this a red herring, or is it an actual problem? Is the xml shown in virDomainMigratePrepare3 the xml that is being sent from source to destination, or does it have some other use, and we really are sending "<interface type='network'> <source network='direct-pool'/>"?

Comment 14 Vlad Yasevich 2015-01-28 20:55:40 UTC

Adding some comments from the kernel side.  The kernel will return EINVAL if
there is an attempt to configure more than one macvtap on a pass-through device.
Looking at some other causes of EINVAL, they are unlikely in this scenario.
This lends a bit more weight to what Laine describes in Comment 13.  The error
is likely coming from trying to configure a pass-through macvtap on a device
that already has macvtaps configured on it.
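
One way to check a host for this condition is to look for physical devices that already carry a macvtap. The sketch below assumes the "name@lower:" header line format printed by "ip -d link show" and libvirt's "macvtap%d" naming; the sample text is illustrative only and was not captured from an affected host.

```python
import re
from collections import defaultdict

# Matches the header line of a stacked link, e.g. "40: macvtap3@eth25: <...>".
LINK_RE = re.compile(r"^\d+:\s+(?P<name>[^@:\s]+)@(?P<lower>[^:\s]+):")


def macvtaps_by_lower(ip_link_output):
    """Map each lower (physical) device to the macvtap devices stacked on it."""
    stacked = defaultdict(list)
    for line in ip_link_output.splitlines():
        m = LINK_RE.match(line)
        if m and m.group("name").startswith("macvtap"):
            stacked[m.group("lower")].append(m.group("name"))
    return dict(stacked)


def would_conflict(ip_link_output, srcdev):
    """True if srcdev already carries a macvtap, so that another
    pass-through create on it would be expected to fail."""
    return srcdev in macvtaps_by_lower(ip_link_output)


# Illustrative sample, not real output from the customer's host.
SAMPLE = """\
2: eth25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
3: eth26: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
40: macvtap3@eth25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
"""
print(would_conflict(SAMPLE, "eth25"), would_conflict(SAMPLE, "eth26"))
```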

Comment 15 Marina Kalinin 2015-01-28 22:56:50 UTC

Created attachment 985407
vm fex hook for migration

Comment 17 Laine Stump 2015-02-02 22:23:58 UTC

Upstream libvirt already logs a more verbose error message - it tells the srcdev, which is almost surely what is causing the problem (see Vlad's Comment 14). Although the patch was made 2 years ago, it went into libvirt after release 1.0.1, so it's not in RHEL6. We could make a scratch build with that patch applied, but it seems unlikely that anyone would want to run a scratch build in a production system.

In the meantime, it may be useful to see the output of "ip -d link show" at the moment of failure, if that is possible, along with the XML collected (which should be showing the same srcdev as used by virNetDevMacVLanCreate()).

Comment 18 Laine Stump 2015-02-02 22:25:03 UTC

I forgot to mention the commit id of the upstream patch that adds srcdev to the error log message: e11daa2b602e69708d86ada8f7167bf68ee102cd

Comment 19 Marina Kalinin 2015-02-02 23:38:19 UTC

Hi Laine,
Thank you for the update.
I am almost done setting up the reproducer here.
Hopefully, it reproduces here. And then we can continue from there.

Comment 20 Marina Kalinin 2015-02-11 21:50:41 UTC

Hi Laine,

1. Libvirt version on my host is 0.10.2-46.el6_6.3.x86_64 (RHEL 6.5).
Libvirt version on customer site is 0.10.2-29.el6_5.12    (RHEL 6.5).
So, I do not think this is the case, described in the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1186142
But maybe you know better.

2. Trying to reproduce locally, first the basic configuration, I see my VM xml on libvirt (created by vdsm after applying the hook changes) is (from virsh dumpxml):
~~~
    <interface type='bridge'>
      <mac address='00:1a:4a:9f:01:8b'/>
      <source bridge='rhevm'/>
      <bandwidth>
      </bandwidth>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='direct'>
      <mac address='00:1a:4a:9f:01:99'/>
      <source dev='eth2' mode='passthrough'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='VMFEX_RHEV_pol'/>
      </virtualport>
      <target dev='macvtap0'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </interface>
~~~
Which does not seem to match what the ovirt page says about the hook:
http://www.ovirt.org/VDSM-Hooks/vmfex



However, in the vdsm log I see this xml, which vdsm forms and passes to libvirt, and it looks correct.
So, maybe it is something that libvirt does differently these days?
~~~
                <interface type="bridge">
                        <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci"/>
                        <mac address="00:1a:4a:9f:01:8b"/>
                        <model type="virtio"/>
                        <source bridge="rhevm"/>
                        <filterref filter="vdsm-no-mac-spoofing"/>
                        <link state="up"/>
                        <bandwidth/>
                </interface>
                <interface type="network">
                        <address bus="0x00" domain="0x0000" function="0x0" slot="0x04" type="pci"/>
                        <mac address="00:1a:4a:9f:01:99"/>
                        <model type="virtio"/>
                        <source network="direct-pool"/>
                        <filterref filter="vdsm-no-mac-spoofing"/>
                        <link state="up"/>
                <virtualport type="802.1Qbh"><parameters profileid="VMFEX_RHEV_pol"/></virtualport></interface>
~~~

This is network definition in libvirt on the host:
~~~
[root@cisco-b22m3-01 ~]# virsh -r net-list
Name                 State      Autostart     Persistent
--------------------------------------------------
;vdsmdummy;          active     no            no
direct-pool          active     yes           yes
vdsm-rhevm           active     yes           yes
 
[root@cisco-b22m3-01 ~]# virsh -r net-dumpxml direct-pool
<network connections='1'>
  <name>direct-pool</name>
  <uuid>16ae2055-7c3a-af75-b5a7-d9b1a44e2559</uuid>
  <forward dev='eth2' mode='passthrough'>
    <interface dev='eth2' connections='1'/>
    <interface dev='eth3'/>
    <interface dev='eth4'/>
    <interface dev='eth5'/>
    <interface dev='eth6'/>
    <interface dev='eth7'/>
    <interface dev='eth8'/>
    <interface dev='eth9'/>
    <interface dev='eth10'/>
    <interface dev='eth11'/>
  </forward>
</network>
~~~

Comment 23 Marina Kalinin 2015-02-25 23:27:47 UTC

Created attachment 995386
libvirtd.log from the new case - vm create scenario

Comment 26 Jiri Denemark 2015-03-06 10:21:38 UTC

(In reply to Laine Stump from comment #13)

> Jiri, is this a red herring, or is it an actual problem? Is the xml shown in
> virDomainMigratePrepare3 the xml that is being sent from source to
> destination, or does it have some other use, and we really are sending
> "<interface type='network'> <source network='direct-pool'/>"?

Sorry for such a long delay... yes, the XML logged by virDomainMigratePrepare is the one that is used to create the domain on the destination host. The XML is created by qemuMigrationBegin on the source. Do we generate the XML or was it provided as xmlin to virDomainMigrateBegin3?

Comment 27 Dan Kenigsberg 2015-03-07 23:04:00 UTC

Jiri, I'm not sure that I understand your question, but in 3.4, oVirt uses plain migrateToURI2(), with no manipulation of the underlying xml on source or destination, and no hooks.

https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=blob;f=vdsm/vm.py;h=3dccdf8ec884580fa3c71e226f00ee20bec068f9;hb=refs/heads/ovirt-3.4#l425

Comment 28 Jiri Denemark 2015-03-09 08:38:47 UTC

Yeah, that's what I was asking for :-) Thanks, it looks like a bug in libvirt then.

Comment 30 Laine Stump 2015-03-12 21:50:43 UTC

Unfortunately none of the above points at a culprit.

There are now at least three different versions of libvirt discussed here. Just to make sure I understand:

1) there are 2 different customers reporting the problem, and additionally Marina has a test rig set up (but *can't* reproduce the problem). All three are using RHEV with VMFEX and macvtap in passthrough mode. Here are the libvirt versions in use:

Customer 1: libvirt-0.10.2-29.el6_5.12
Marina: libvirt-0.10.2-46.el6_6.3
Customer 2: libvirt-0.10.2-46.el6_6.2

2) Customer 1 can't run a scratch build for testing. Customer 2 *can*.

3) Am I correct that for both customers a failure only occurs if a domain has 2 or more vmfex interfaces? Also - in your test setup it looks like the domain only has a single vmfex interface, while the other is a standard bridge connection (per the interface XML at the top of Comment 20).

Now two questions whose answers might provide useful info:

1) is it possible that this problem doesn't happen until/unless libvirtd is restarted on the destination host? (What comes to my mind is the possibility that libvirt could get confused about which physical devices are in use during a libvirtd restart. That is supposed to work properly, but if it didn't, it would lead to libvirt attempting to re-use a device that was already in use).

2) Once a target host has failed an incoming migration, is it no longer possible to even start a new guest on that host without first either migrating off some other guests, or rebooting the host? On a system that has failed migration, can you run:

virsh net-dumpxml direct-pool; ip -d link show

====
I will say that I wouldn't have expected any el6_6 build to properly migrate a domain that has <interface type='network'>, due to Bug 1186142. The problem is that during migration, rather than the configuration of the interface being sent to the destination (e.g. <interface type='network'> <source network='direct-pool'/>...), it would instead send the connection type that was actually used (in this case, type='direct'). The fact that the migration destination still shows type='network' was a bit confusing to me, but I think I understand why - the vmfex hook script in rhev is actually *throwing away* all of the interface config except for the MAC address! I *think* this means that the issue in Bug 1186142 is irrelevant.

With all that in mind, we *might* be able to catch something useful with a scratch build that outputs the following information whenever there is a failure:

1) the name of the macvtap device that we were trying to create
2) the name of the physical device it was going to be attached to
3) the output of "ip -d link" at exactly that time
4) a list of all active guest interfaces at that time, showing the macvtap and physical device names, the guest name, and whether or not it came from a network pool.
5) a running log of all allocate/release operations on the network pool

From this we should be able to see if there is an ethernet being used that hasn't been registered with libvirt.
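
Short of a scratch build, the same comparison could be approximated by hand from the "virsh net-dumpxml direct-pool" output and a list of physical devices that are seen in use on the host. A rough sketch; the function names are made up, and busy_devs is assumed to have been collected separately (e.g. from "ip -d link").

```python
import xml.etree.ElementTree as ET


def pool_usage(net_xml):
    """Return {dev: connections} from the pool's net-dumpxml output."""
    root = ET.fromstring(net_xml)
    return {
        iface.get("dev"): int(iface.get("connections", "0"))
        for iface in root.findall("./forward/interface")
    }


def unregistered_devices(net_xml, busy_devs):
    """Pool devices that are in use on the host although libvirt's pool
    bookkeeping shows no connection on them."""
    usage = pool_usage(net_xml)
    return sorted(dev for dev in busy_devs
                  if dev in usage and usage[dev] == 0)


# Shortened version of the pool xml shown in Comment 20.
POOL = """
<network connections='1'>
  <name>direct-pool</name>
  <forward dev='eth2' mode='passthrough'>
    <interface dev='eth2' connections='1'/>
    <interface dev='eth3'/>
    <interface dev='eth4'/>
  </forward>
</network>
"""
print(unregistered_devices(POOL, {"eth2", "eth3"}))
```

Any device this returns is one that libvirt could hand out from the pool again even though it is already carrying a macvtap.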

Comment 34 Marina Kalinin 2015-03-16 15:50:31 UTC

Created attachment 1002390
logs with debug package

Comment 48 Marina Kalinin 2015-03-18 21:04:15 UTC

Created attachment 1003450
libvirt logs from destination host after test2 build

Comment 51 Marina Kalinin 2015-03-18 22:03:24 UTC

I see, Laine.
In our test we stopped only 2 guests.
Now this explains the failure.

Comment 52 Laine Stump 2015-03-19 14:30:10 UTC

Bug 1203367 is the 6.6.z BZ based on Bug 1186142, currently believed to be the root cause of this error.

Comment 53 Marina Kalinin 2015-03-19 16:18:44 UTC

Created attachment 1003893
new hook

Comment 54 Marina Kalinin 2015-03-19 16:21:33 UTC

Created attachment 1003896
new hook readme file

Comment 57 Laine Stump 2015-03-20 17:57:45 UTC

A libvirt qemu hook looking for this command:

  /etc/libvirt/hooks/qemu guest_name migrate begin -

should work (although I haven't tested it). Unless *all* of the interfaces in the cluster are vmfex interfaces (or at least none of them intentionally use <interface type='direct'>), it would need to be made intelligent enough to recognize which interfaces should be vmfex (based on MAC address, as far as I understand) and only modify those.

On the assumption that all guest interfaces *are* vmfex (or type='bridge'), and based on the xml that I see all the way back in Comment 4 (i.e. the type has been switched to 'direct' but the <virtualport> element is in place), it *should* be enough for the hook to do the following to the XML that is presented on stdin (with all results going to stdout):


   s/interface type='direct'/interface type='network'/
   s/source dev='eth/source network='direct-pool' dev='eth/

i.e. every line that is:

    <interface type='direct'>

should be changed to:

    <interface type='network'>

and every line that is:

    <source dev='ethXX' mode='passthrough'/>

should be changed to:

    <source network='direct-pool' dev='ethXX' mode='passthrough'/>

(the dev and mode attributes would be ignored, and are only being left in to make the filtering requirements simpler). Something like this in /etc/libvirt/hooks/qemu (and chmod +x I believe is also necessary) should do it:

   #!/bin/sh

   if [ "${2}" = "migrate" -a "${3}" = "begin" ]; then

     sed -e "s/interface type='direct'/interface type='network'/" \
         -e "s/source dev='eth/source network='direct-pool' dev='eth/"
   fi

If there are some guests that are *intentionally* using interface type='direct', then the script would need to be more intelligent.
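
For that more selective case, a possible shape is sketched below in Python. It is untested against a real host, and the selection rule (rewrite only type='direct' interfaces that carry an 802.1Qbh <virtualport>) is an assumption standing in for the MAC-based check mentioned above. The argument order follows the hook invocation shown earlier; a real hook file would additionally end by calling sys.exit(main(sys.argv)).

```python
import sys
import xml.etree.ElementTree as ET


def rewrite_for_migration(domain_xml, pool="direct-pool"):
    """Turn only 802.1Qbh 'direct' interfaces back into pool references."""
    root = ET.fromstring(domain_xml)
    for iface in list(root.iter("interface")):
        vport = iface.find("virtualport")
        if (iface.get("type") != "direct" or vport is None
                or vport.get("type") != "802.1Qbh"):
            continue                      # not a vmfex interface, leave as-is
        iface.set("type", "network")
        source = iface.find("source")
        if source is not None:
            source.attrib.clear()         # drop dev='ethXX' mode='passthrough'
            source.set("network", pool)
    return ET.tostring(root, encoding="unicode")


def main(argv, stdin=sys.stdin, stdout=sys.stdout):
    # Invoked as: /etc/libvirt/hooks/qemu guest_name migrate begin -
    if len(argv) >= 4 and argv[2] == "migrate" and argv[3] == "begin":
        stdout.write(rewrite_for_migration(stdin.read()))
    return 0
```

Parsing the xml instead of using sed avoids touching the <source> elements of unrelated devices, at the cost of re-serializing the whole document.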

Comment 58 Laine Stump 2015-03-20 18:28:22 UTC

...or I suppose you could use this as the sed line just to get rid of the dev='ethXX' mode='passthrough' for the sake of tidiness:

  sed -e "s/interface type='direct'/interface type='network'/" \
      -e "s/source dev='eth[0-9]*'/source network='direct-pool'/" \
      -e "s/ mode='passthrough'//"

(you have to be careful to not be too general with the "source" line, lest you end up modifying the source elements of unrelated devices)

Comment 59 Marina Kalinin 2015-03-20 19:57:33 UTC

And an enlightenment regarding the VMs running on RHEL 6.5, which didn't have bz#1203367 introduced yet:
since the customer has mixed clusters of 6.5 and 6.6, the guests have been migrated between the two and got affected by the 6.6 bug.
--> the libvirt hook workaround should be applicable for all cases then.

Comment 60 Marina Kalinin 2015-03-20 22:04:35 UTC

Created attachment 1004684
libvirt qemu hook

Comment 61 Marina Kalinin 2015-03-20 22:10:10 UTC

Good news: I installed the latest libvirt build and tested the libvirt hook suggested by Laine, and it works.

I can see, that the VMs that didn't have correct network definition, after migration to a host with this hook, got correct network assignment for vmfex nics to a device from direct-pool.

Here are some print-outs on a destination host:
# rpm -q libvirt
libvirt-0.10.2-46.el6_6.4.x86_64
# rpm -q vdsm
vdsm-4.14.18-4.el6ev.x86_64

# virsh -r dumpxml vmfex_2 | grep -A8 'interface type='
    <interface type='direct'>
      <mac address='00:1a:4a:9f:01:8a'/>
      <source dev='eth4' mode='passthrough'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='VMFEX_portprofile_1'/>
      </virtualport>
      <target dev='macvtap2'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
--
    <interface type='direct'>
      <mac address='00:1a:4a:9f:01:8f'/>
      <source dev='eth5' mode='passthrough'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='VMFEX_portprofile_2'/>
      </virtualport>
      <target dev='macvtap3'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
--
    <interface type='bridge'>
      <mac address='00:1a:4a:9f:01:95'/>
      <source bridge='rhevm'/>
      <bandwidth>
      </bandwidth>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>


[root@cisco-b22m3-01 hooks]# virsh -r net-dumpxml direct-pool
<network connections='4'>
  <name>direct-pool</name>
  <uuid>3123ee12-a10d-18a5-ab68-d6f564521445</uuid>
  <forward dev='eth2' mode='passthrough'>
    <interface dev='eth2' connections='1'/>
    <interface dev='eth3' connections='1'/>
    <interface dev='eth4' connections='1'/>
    <interface dev='eth5' connections='1'/>
    <interface dev='eth6'/>
    <interface dev='eth7'/>
    <interface dev='eth8'/>
    <interface dev='eth9'/>
    <interface dev='eth10'/>
    <interface dev='eth11'/>
  </forward>
</network>

Comment 62 Marina Kalinin 2015-03-20 22:13:54 UTC

I am sorry, the correct way to show the VM's definition would be to use the --inactive argument. Here it is:

# virsh -r dumpxml vmfex_2 --inactive | grep -A8 'interface type='
    <interface type='network'>
      <mac address='00:1a:4a:9f:01:8a'/>
      <source network='direct-pool'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='VMFEX_portprofile_1'/>
      </virtualport>
      <target dev='macvtap2'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
--
    <interface type='network'>
      <mac address='00:1a:4a:9f:01:8f'/>
      <source network='direct-pool'/>
      <virtualport type='802.1Qbh'>
        <parameters profileid='VMFEX_portprofile_2'/>
      </virtualport>
      <target dev='macvtap3'/>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
--
    <interface type='bridge'>
      <mac address='00:1a:4a:9f:01:95'/>
      <source bridge='rhevm'/>
      <bandwidth>
      </bandwidth>
      <model type='virtio'/>
      <filterref filter='vdsm-no-mac-spoofing'/>
      <link state='up'/>
      <boot order='3'/>
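
To compare the live and inactive definitions without reading the full XML, something like the following could be used to print one "type source" pair per interface. It is a sketch only: `summarize_interfaces` is a made-up name, and it relies on the one-element-per-line, single-quoted layout that virsh dumpxml prints.

```shell
# Hypothetical helper: reads domain XML on stdin and prints, for each <interface>,
# its type and the value of the first attribute of its <source> element.
# <source> lines outside an <interface> element (disks etc.) are ignored.
summarize_interfaces() {
    awk -F"'" '
        /<interface type=/       { type = $2 }
        /<source / && type != "" { print type, $2 }
        /<\/interface>/          { type = "" }
    '
}
```

Piping `virsh -r dumpxml vmfex_2` and then `virsh -r dumpxml vmfex_2 --inactive` through it should make the direct/eth4 versus network/direct-pool difference shown above visible at a glance.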

Comment 69 Marina Kalinin 2015-03-23 22:28:10 UTC

OK, it is probably a bug in the net-dumpxml output.
It says 21 connections in total, but actually reports 27.
And all the macvtap devices have a NIC assigned from the pool.

$ grep -o 'macvtap.*eth..' host02_ipd.out 
macvtap42@eth5:
macvtap43@eth6:
macvtap14@eth15
macvtap39@eth16
macvtap3@eth25
macvtap4@eth26
macvtap5@eth27
macvtap12@eth28
macvtap21@eth31
macvtap31@eth40
macvtap33@eth41
macvtap34@eth42
macvtap35@eth43
macvtap45@eth48
macvtap46@eth49
macvtap41@eth46
macvtap44@eth47
macvtap8@eth9:
macvtap9@eth10
macvtap51@eth54
macvtap52@eth55
macvtap15@eth22
macvtap16@eth29
macvtap22@eth7:
macvtap23@eth8:
macvtap7@eth11
macvtap10@eth12

total 27.

$ grep connections host02_netdump.out
<network connections='21'>
    <interface dev='eth5' connections='1'/>
    <interface dev='eth6' connections='1'/>
    <interface dev='eth7' connections='1'/>
    <interface dev='eth8' connections='1'/>
    <interface dev='eth9' connections='1'/>
    <interface dev='eth10' connections='1'/>
    <interface dev='eth11' connections='1'/>
    <interface dev='eth12' connections='1'/>
    <interface dev='eth15' connections='1'/>
    <interface dev='eth16' connections='1'/>
    <interface dev='eth22' connections='1'/>
    <interface dev='eth25' connections='1'/>
    <interface dev='eth26' connections='1'/>
    <interface dev='eth27' connections='1'/>
    <interface dev='eth28' connections='1'/>
    <interface dev='eth29' connections='1'/>
    <interface dev='eth31' connections='1'/>
    <interface dev='eth40' connections='1'/>
    <interface dev='eth41' connections='1'/>
    <interface dev='eth42' connections='1'/>
    <interface dev='eth43' connections='1'/>
    <interface dev='eth46' connections='1'/>
    <interface dev='eth47' connections='1'/>
    <interface dev='eth48' connections='1'/>
    <interface dev='eth49' connections='1'/>
    <interface dev='eth54' connections='1'/>
    <interface dev='eth55' connections='1'/>

total 28 lines (the <network> line plus 27 <interface> entries)
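
The comparison above can be scripted so the three numbers come out side by side. This is a sketch under assumptions: the function names are hypothetical, the first file is assumed to list one macvtapN@ethM device per line (as host02_ipd.out appears to), and the second is assumed to be saved net-dumpxml output (as in host02_netdump.out).

```shell
# Hypothetical helpers; each takes the path of a saved output file.

# Number of lines naming a macvtap device attached to an ethN parent.
count_macvtap() {
    grep -c 'macvtap[0-9]*@eth[0-9]' "$1"
}

# Total claimed by the <network connections='N'> line.
network_connections() {
    sed -n "s/.*<network connections='\([0-9]*\)'.*/\1/p" "$1"
}

# Sum of the per-device connections='N' attributes on <interface> lines.
sum_interface_connections() {
    grep '<interface ' "$1" |
        sed -n "s/.*connections='\([0-9]*\)'.*/\1/p" |
        awk '{ sum += $1 } END { print sum + 0 }'
}
```

If the per-interface sum agrees with the macvtap count while the <network> total does not, that points at the total being miscounted rather than at devices missing from the pool.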

Comment 72 Marina Kalinin 2015-04-01 22:45:07 UTC

Since both customer cases attached to the bug are closed and verified, I will close this bug too now.
