Description of problem:
While running VM migrations, a few failed. Running a 60-hour test, 1200 iterations, 2 failures.

Version-Release number of selected component (if applicable):
redhat-release-8.2-1.0.el8.x86_64
rhv-release-4.4.0-31-001.noarch
vdsm-4.40.13-1.el8ev.x86_64

How reproducible:
Running a JMETER script, 1 thread

Steps to Reproduce:
1. Select VM
2. Select Host (randomly)
3. Start Migrate

Actual results:
VM migrate failures

Expected results:
No errors

Additional info:
vdsm.log.61: Log TimeStamp - 2020-05-15 01:46:05,809+0000 (vm_214)
vdsm.log.12: Log TimeStamp - 2020-05-17 02:34:15,550+0000 (vm_219)
MigrationErrors_4.4 - Engine ERRORS
https://drive.google.com/open?id=1Zj0UXSgpug61Dm3Rta_A88lIKJbN01sd
Possibly worth a look, but a clean reproducer may be hard if it only happens 0.16% of the time.
I can see the following errors in the log:

libvirt.libvirtError: operation failed: binding 'vnet54' is already being removed
libvirt.libvirtError: operation failed: binding 'vnet43' already exists

It looks like a race in networking code. The migrations don't seem to be initiated too fast after each other, so it may be a real bug.

David, could you please provide the corresponding logs from the destinations so that we can see more info about the errors? (It's also often helpful to have DEBUG log level enabled.)
Added the vdsm.log.84.xz file (destination host for vm_219):
https://drive.google.com/open?id=1Zj0UXSgpug61Dm3Rta_A88lIKJbN01sd
Thank you, David. I can't see a migration error on the destination. There is a VM.migrationCreate call and then these messages follow:

2020-05-17 02:34:09,968+0000 INFO (dhclient-monitor) [root] Received DHCP response: {'action': 'configure', 'iface': 'vnet54'} (dhclient_monitor:129)
2020-05-17 02:34:09,970+0000 INFO (dhclient-monitor) [root] Interface vnet54 is not a libvirt interface (dhclient_monitor:134)

This matches the message on the source:

2020-05-17 02:34:11,530+0000 ERROR (migsrc/623f6cc1) [virt.vm] (vmId='623f6cc1-d49f-43a4-8170-b5fec01f2897') operation failed: binding 'vnet54' is already being removed (migration:278)

And the VM is destroyed from the source:

2020-05-17 02:34:11,546+0000 INFO (jsonrpc/7) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:SOURCE-IP-ADDRESS,48984, vmId=623f6cc1-d49f-43a4-8170-b5fec01f2897 (api:48)

And the destination reports the VM is already gone:

2020-05-17 02:34:13,422+0000 WARN (vm/623f6cc1) [virt.vm] (vmId='623f6cc1-d49f-43a4-8170-b5fec01f2897') Couldn't destroy incoming VM: Domain not found: no domain with matching uuid '623f6cc1-d49f-43a4-8170-b5fec01f2897' (vm:3673)

The "is already being removed" message is apparently related to device removal in _dhcp_response_handler, which happens after the vnet54 INFO messages above.

Since I can't see any direct migration failure in the destination log, it looks like the migration was stopped by libvirt or QEMU rather than Vdsm. I don't know whether the vnet54 problems started in libvirt, caused the migration failure due to a bug there and made the dhclient-monitor confusion; or whether dhclient-monitor took an action at a wrong time (incoming migration initiation) and thus caused the device error in libvirt and the migration failure.

Aleš, could you please take a look?
(In reply to Milan Zamazal from comment #4)
> Thank you, David. I can't see a migration error on the destination. There is
> a VM.migrationCreate call and then these messages follow:
>
> 2020-05-17 02:34:09,968+0000 INFO (dhclient-monitor) [root] Received DHCP
> response: {'action': 'configure', 'iface': 'vnet54'} (dhclient_monitor:129)
> 2020-05-17 02:34:09,970+0000 INFO (dhclient-monitor) [root] Interface
> vnet54 is not a libvirt interface (dhclient_monitor:134)

This message might be confusing, but it just indicates that the said interface was not configured for DHCP monitoring, that's all. You can see the same message appearing multiple times during successful migrations, e.g.

2020-05-17 02:06:15,461+0000 INFO (dhclient-monitor) [root] Received DHCP response: {'action': 'configure', 'iface': 'vnet128'} (dhclient_monitor:129)
2020-05-17 02:06:15,462+0000 INFO (dhclient-monitor) [root] Interface vnet128 is not a libvirt interface (dhclient_monitor:134)

> This matches the message on the source:
>
> 2020-05-17 02:34:11,530+0000 ERROR (migsrc/623f6cc1) [virt.vm]
> (vmId='623f6cc1-d49f-43a4-8170-b5fec01f2897') operation failed: binding
> 'vnet54' is already being removed (migration:278)
>
> And the VM is destroyed from the source:
>
> 2020-05-17 02:34:11,546+0000 INFO (jsonrpc/7) [api.virt] START
> destroy(gracefulAttempts=1) from=::ffff:SOURCE-IP-ADDRESS,48984,
> vmId=623f6cc1-d49f-43a4-8170-b5fec01f2897 (api:48)
>
> And the destination reports the VM is already gone:
>
> 2020-05-17 02:34:13,422+0000 WARN (vm/623f6cc1) [virt.vm]
> (vmId='623f6cc1-d49f-43a4-8170-b5fec01f2897') Couldn't destroy incoming VM:
> Domain not found: no domain with matching uuid
> '623f6cc1-d49f-43a4-8170-b5fec01f2897' (vm:3673)
>
> The "is already being removed" message is apparently related to device
> removal in _dhcp_response_handler, which happens after the vnet54 INFO
> messages above.

The handler just removes a file that we use for tracking, which in this case does not exist. AFAIK the network code should not touch the libvirt tap device that is connected to the VM.

> Since I can't see any direct migration failure in the destination log, it
> looks like the migration was stopped by libvirt or QEMU rather than Vdsm. I
> don't know whether the vnet54 problems started in libvirt, caused the
> migration failure due to a bug there and made the dhclient-monitor
> confusion; or whether dhclient-monitor took an action at a wrong time
> (incoming migration initiation) and thus caused the device error in libvirt
> and the migration failure. Aleš, could you please take a look?

IMO it seems like a libvirt error, because I can't think of any way we would remove the tap device. Anyway, David, can you please include supervdsm.log around the time of the error? Just to be sure.
Source Host - supervdsm.log.9.xz (F01-H15)
Dest Host - supervdsm.log.8.xz (F02-H19)
Nothing relevant in the supervdsm.log, which further suggests the failure is probably in libvirt. Milane, WDYT?
Thank you, Aleš, for looking. As you describe it, we should report it to libvirt. Since we don't have a reproducer or other proof to file a bug, a question on the libvirt users mailing list may be a better option. Would you like to ask there, since you are familiar with the networking stuff and with what this is all about?

Another option might be to try to reproduce the bug with libvirt debug logging enabled (which we may be asked for anyway). Then we would have info, for us and for the libvirt developers, about what's going on exactly. But I'm not sure it would be feasible and easy enough.
If it can be reproduced, it is IMO a good idea to collect the logs. It might give us an additional hint about what to look for.
David, is there a chance to reproduce the bug with libvirt debug logs enabled?
https://bugzilla.redhat.com/show_bug.cgi?id=1837395#c11
Hm, Aleš, what do you think? Can you ask on the libvirt users list even without additional information? If not, I'm afraid we can't do anything about the bug due to insufficient information.
Aleš asked on the libvirt-users list, which brought me here, so I'm clearing the needinfo you set for him.

> IMO it seems like a libvirt error, because I can't think of any way we would remove the tap device.

Actually libvirt doesn't remove the tap device either; that is done automatically by the kernel when the QEMU process exits. But I believe you are correct that this is a libvirt problem - these libvirt errors:

libvirt.libvirtError: operation failed: binding 'vnet54' is already being removed
libvirt.libvirtError: operation failed: binding 'vnet43' already exists

are produced by libvirt's nwfilter driver. It sounds like another race similar to what's described in this email I wrote last week:

https://www.redhat.com/archives/libvir-list/2020-June/msg00681.html

In the case described by the person starting that thread, the problem was that libvirt was removing a port from an OpenvSwitch switch when a guest was shutting down, but another guest had already been started and was given a tap device with the same name as the just-deleted tap device. (The tap device name is determined by the kernel, which just finds the lowest numbered unused name based on a pattern provided by libvirt; the problem is that the kernel always gives the lowest available number.)

Your problem is an extension of that - the nwfilter rules are also instantiated based on the name of the tap device, and are torn down after the tap device has been removed.

This is even more evidence that we need to follow Dan Berrangé's suggestion (somewhere in that thread) of not relying on the kernel to assign tap device names, but to instead provide the names ourselves and base them on a monotonically increasing counter (with a reasonably large wrap-around to avoid overflowing the netdev name maximum length of 15). This will eliminate the possibility of immediate re-use (except in some extremely dysfunctional circumstance), and thus also eliminate the race condition.

I'm switching this BZ to the RHEL-AV libvirt component and re-assigning it to myself (since I had coincidentally already put this exact task on my personal todo list for the week). If the product needs adjusting to fit what you're using in RHV, feel free to reset it to whatever is appropriate.
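For illustration only, here is a minimal sketch in C of the counter-based naming idea described above. It is not libvirt's actual implementation: generate_tap_name, next_tap_id and MAX_TAP_ID are made-up names, and the wrap-around value is an arbitrary assumption.

#include <stdio.h>

#define TAP_PREFIX "vnet"
#define MAX_TAP_ID 0xFFFFFF   /* assumed wrap point, well below the 15-char netdev name limit */

static unsigned int next_tap_id;   /* hypothetical process-wide counter */

/* Fill 'name' (at least 16 bytes) with the next generated tap device name. */
static int
generate_tap_name(char *name, size_t len)
{
    unsigned int id = next_tap_id;

    next_tap_id = (next_tap_id + 1) % (MAX_TAP_ID + 1);
    if ((size_t)snprintf(name, len, TAP_PREFIX "%u", id) >= len)
        return -1;             /* would exceed the kernel's name length limit */
    return 0;
}

int main(void)
{
    char name[16];

    /* Names are handed out in order and never immediately re-used, so a tap
     * device whose nwfilter binding is still being torn down cannot collide
     * with the device handed to a freshly started guest. */
    for (int i = 0; i < 3; i++) {
        if (generate_tap_name(name, sizeof(name)) == 0)
            printf("%s\n", name);   /* vnet0, vnet1, vnet2 */
    }
    return 0;
}

Compared with letting the kernel pick the lowest free index, the only cost is that names grow over the daemon's lifetime, which is why a large wrap-around rather than re-use of freed indices is sufficient.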
These patches upstream should resolve the issue. They're waiting for libvirt release freeze to be over, and then will be pushed: https://www.redhat.com/archives/libvir-list/2020-August/msg00962.html
These two patches were pushed upstream to fix the problem. They are included in libvirt-6.8.0:

commit d7f38beb2ee072f1f19bb91fbafc9182ce9b069e
Author: Laine Stump <laine>
Date:   Sun Aug 23 14:57:19 2020 -0400

    util: replace macvtap name reservation bitmap with a simple counter

Author: Laine Stump <laine>
Date:   Sun Aug 23 21:20:13 2020 -0400

    util: assign tap device names using a monotonically increasing integer
Accidentally omitted the commit id of the 2nd patch:

commit 95089f481e003d971fe0a082018216c58c1b80e5
Author: Laine Stump <laine>
Date:   Sun Aug 23 21:20:13 2020 -0400

    util: assign tap device names using a monotonically increasing integer
As pointed out by jtomko in Bug 1874304, this patch (fixing an improper initialization I had made in 95089f481e003) is also needed:

commit 2b6cd855042984b87beb7e3c30b67b0f586d89bb
Author: Ján Tomko <jtomko>
CommitDate: 2020-09-14 13:02:56 +0200

    util: virNetDevTapCreate: initialize fd to -1
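For context, a minimal sketch in C of why the fd must start at -1; this is hypothetical code, not the actual virNetDevTapCreate: an error path taken before the device is opened must not accidentally close file descriptor 0.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Close an fd only if it was actually opened; leaves -1 untouched. */
static void
safe_close(int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
    }
}

/* Returns an open tun fd, or -1 on error. */
static int
create_tap_sketch(int fail_early)
{
    int fd = -1;               /* the fix: initialized before any error path */

    if (fail_early)
        goto error;            /* error hit before open(): fd is still -1 */

    fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        goto error;

    /* ... TUNSETIFF ioctl and further device setup would follow here ... */
    return fd;

 error:
    safe_close(&fd);           /* harmless when fd was never opened */
    return -1;
}

int main(void)
{
    printf("early failure returns %d\n", create_tap_sketch(1));   /* -1 */
    return 0;
}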
Hi Laine,

Testing on libvirt-6.6.0-5.module+el8.3.0+8092+f9e72d7e.x86_64, it introduced another issue: the vnet* name is kept somewhere and not flushed. Only a libvirtd restart will clear the record. Is this expected?

# virsh start rhel; ifconfig | grep vnet ; virsh destroy rhel; ifconfig | grep vnet ; virsh start rhel; ifconfig | grep vnet ; virsh destroy rhel; ifconfig | grep vnet ; virsh start rhel; ifconfig | grep vnet ;
Domain rhel started

vnet0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
Domain rhel destroyed

Domain rhel started

***vnet1***: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
Domain rhel destroyed

Domain rhel started

***vnet2***: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500

Before this bug fix, it should be 'vnet0'.
I don't understand your question. What do you mean by "not flushed"? Where do you continue to see vnet0?

What should happen is that each time a new guest is started (this includes the case of re-starting a guest that had previously been started and then shut down), a new and different tap device name is used, and that name will never be re-used as long as libvirtd is running.

If libvirtd is restarted, it will note the tap device names in use by all guests when it reconnects to the guests at startup, and will continue allocating new tap devices at the index of "highest numbered device + 1". So if you restart libvirtd when many guests have started and stopped, and there are some devices in use, with the highest numbered device currently in use being, e.g., "vnet2753", then the next new tap device will be named "vnet2754". If you happen to shut down *all* running guests, and then restart libvirtd, since it won't find any tap devices in use, the counter will again start at 0.

Are you seeing behavior different from this?

(Looking back after writing this - are you concerned by the fact that when you shut down your guest that had been using vnet0, and then start it again, it now uses a differently-named tap device? If so, then that is the desired behavior.)
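A hypothetical sketch in C of the restart behavior described above (reseed_tap_counter and next_tap_id are illustrative names, not libvirt's code): on daemon startup the counter is re-seeded from the highest numbered tap device still in use, or from 0 when none are found.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static unsigned int next_tap_id;

/* Continue numbering at "highest in-use index + 1", or 0 if none are in use. */
static void
reseed_tap_counter(const char *const *active_names, size_t count)
{
    unsigned int next = 0;

    for (size_t i = 0; i < count; i++) {
        const char *name = active_names[i];

        if (strncmp(name, "vnet", 4) == 0) {
            unsigned long id = strtoul(name + 4, NULL, 10);

            if (id + 1 > next)
                next = id + 1;
        }
    }
    next_tap_id = next;
}

int main(void)
{
    /* e.g. guests reconnected at daemon startup still hold these devices */
    const char *in_use[] = { "vnet12", "vnet2753" };

    reseed_tap_counter(in_use, 2);
    printf("next tap id: %u\n", next_tap_id);   /* 2754 */

    reseed_tap_counter(NULL, 0);                /* all guests shut down */
    printf("next tap id: %u\n", next_tap_id);   /* 0 */
    return 0;
}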
Hi Laine,

Thank you for the explanation, it's very clear. I should have read the comment before asking questions.

I can reproduce the error on libvirt-6.6.0-4.module+el8.3.0+7883+3d717aa8.x86_64 with the steps below:

1. Prepare 2 guests with 2 interfaces each: one NAT interface with an nwfilter; another interface connected to an ovs-bridge.
2. On terminal one, start the first guest rh and wait for a while; then execute "# virsh destroy rh" while at the same time starting the 2nd guest in another terminal; it will fail as:

terminal 1
# virsh destroy rh
Domain rh destroyed

terminal 2
# time virsh start rhel
error: Failed to start domain rhel
error: operation failed: binding 'vnet1' is already being removed

real	0m10.518s
user	0m0.014s
sys	0m0.011s

# grep -i error /var/log/libvirt/libvirtd.log | grep -v info
2020-09-17 05:36:43.824+0000: 106837: error : virNWFilterBindingObjListAddLocked:179 : operation failed: binding 'vnet1' is already being removed
2020-09-17 05:36:53.947+0000: 106837: error : virNetDevVPortProfileOpCommon:1002 : internal error: port-profile setlink timed out
Are you saying that the first guest is using vnet1, and the 2nd guest attempts to also use vnet1? And this is with a build that supposedly has the fix? That should not be possible. Can you verify you're running the newer build?
(In reply to Laine Stump from comment #25)
> Are you saying that the first guest is using vnet1, and the 2nd guest
> attempts to also use vnet1? And this is with a build that supposedly has the
> fix? That should not be possible. Can you verify you're running the newer
> build?

Comment 24 reproduces the bug on libvirt-6.6.0-4.module+el8.3.0+7883+3d717aa8.x86_64, without the fix; the result is as expected.

I have tested on libvirt-6.6.0-5.el8 with the same steps; the issue is fixed and there are no error messages any more. Setting the bug to verified.
Adding other test scenarios:

1. Create tap and macvtap devices with the ip command, then start a VM with nat or macvtap interfaces, or hotplug interfaces to the guest - PASS
2. Restart libvirtd and check that the tap/macvtap device names begin again from vnet0 and macvtap0 - PASS

Details:

# systemctl restart libvirtd

# virsh dumpxml rhel | grep /interface -B9
    <interface type='network'>
      <mac address='52:54:00:f3:41:15'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:3c:4b:df'/>
      <source network='direct'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
    </interface>
    <interface type='direct'>
      <mac address='52:54:00:f2:d7:18'/>
      <source dev='eno1' mode='vepa'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
    </interface>

# ip tuntap add dev vnet0 mod tap
# ip tuntap add dev vnet2 mod tap
# ip link add link eno1 macvtap0 type macvtap
# ip link add link eno1 macvtap2 type macvtap

# virsh start rhel
Domain rhel started

# virsh domiflist rhel
 Interface   Type      Source    Model    MAC
-------------------------------------------------------------
 vnet1       network   default   virtio   52:54:00:f3:41:15
 macvtap1    direct    direct    virtio   52:54:00:3c:4b:df
 macvtap3    direct    eno1      virtio   52:54:00:f2:d7:18

# virsh attach-interface rhel network default
Interface attached successfully

# virsh attach-interface rhel direct eno1
Interface attached successfully

# virsh domiflist rhel
 Interface   Type      Source    Model     MAC
--------------------------------------------------------------
 vnet1       network   default   virtio    52:54:00:f3:41:15
 macvtap1    direct    direct    virtio    52:54:00:3c:4b:df
 macvtap3    direct    eno1      virtio    52:54:00:f2:d7:18
 vnet3       network   default   rtl8139   52:54:00:3a:6f:95
 macvtap4    direct    eno1      rtl8139   52:54:00:45:23:ec

# virsh destroy rhel
Domain rhel destroyed

# virsh start rhel
Domain rhel started

# virsh domiflist rhel
 Interface   Type      Source    Model    MAC
-------------------------------------------------------------
 vnet4       network   default   virtio   52:54:00:f3:41:15
 macvtap5    direct    direct    virtio   52:54:00:3c:4b:df
 macvtap6    direct    eno1      virtio   52:54:00:f2:d7:18

# virsh destroy rhel
# systemctl restart libvirtd

Delete all the pre-created vnet* and macvtap* devices.

# virsh start rhel
# virsh domiflist rhel
 Interface   Type      Source    Model    MAC
-------------------------------------------------------------
 vnet0       network   default   virtio   52:54:00:f3:41:15
 macvtap0    direct    direct    virtio   52:54:00:3c:4b:df
 macvtap1    direct    eno1      virtio   52:54:00:f2:d7:18

The result is as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137