Description
Deadlock in nwfilter when running concurrent jobs: when an nwfilter is defined/undefined while that nwfilter is hot-attached to/detached from a domain interface, libvirtd deadlocks easily.

Version:
libvirt-1.2.8-2.el7.x86_64
qemu-kvm-rhev-2.1.0-3.el7.x86_64
kernel-3.10.0-123.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare the following interface and nwfilter XML files:
[root@ibm-x3850x5-06 ~]# cat nic.xml
<interface type='network'>
  <mac address='02:54:00:36:c6:d0'/>
  <source network='default'/>
  <target dev='jianguo'/>
  <model type='virtio'/>
  <filterref filter='clean-traffic'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
[root@ibm-x3850x5-06 ~]# cat nic1.xml
<interface type='network'>
  <mac address='02:54:00:36:c6:d0'/>
  <source network='default'/>
  <target dev='jianguo'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
[root@ibm-x3850x5-06 ~]# cat clean-traffic.xml
<filter name='clean-traffic' chain='root'>
  <uuid>f3d9b618-9097-4b37-86a7-e804066e7fbe</uuid>
  <filterref filter='no-mac-spoofing'/>
  <filterref filter='no-ip-spoofing'/>
  <rule action='accept' direction='out' priority='-650'>
    <mac protocolid='ipv4'/>
  </rule>
  <filterref filter='allow-incoming-ipv4'/>
  <filterref filter='no-arp-spoofing'/>
  <rule action='accept' direction='inout' priority='-500'>
    <mac protocolid='arp'/>
  </rule>
  <filterref filter='no-other-l2-traffic'/>
  <filterref filter='qemu-announce-self'/>
</filter>

2. Run the following two scripts in two terminals at the same time:
[root@ibm-x3850x5-06 ~]# cat update-nwfilter.sh
#! /bin/sh -
while true
do
    virsh update-device r7 nic1.xml
    virsh update-device r7 nic.xml
done
[root@ibm-x3850x5-06 ~]# cat define_undefine_nwfilter.sh
#! /bin/sh -
while true
do
    virsh nwfilter-undefine clean-traffic
    virsh nwfilter-define clean-traffic.xml
done

3. Check the status of virsh commands:
[root@ibm-x3850x5-06 ~]# time virsh list --all
^C

real	2m32.586s
user	0m0.018s
sys	0m0.013s
[root@ibm-x3850x5-06 ~]# time virsh nwfilter-list
^C

real	5m43.071s
user	0m0.020s
sys	0m0.020s

Actual results:
As shown in the steps above, nwfilter- and domain-related commands block and never respond (both had to be interrupted with ^C after several minutes). Other virsh commands still get a response from libvirtd:
[root@ibm-x3850x5-06 ~]# time virsh pool-list --all
 Name                 State      Autostart
-------------------------------------------
 default              active     yes

real	0m0.025s
user	0m0.011s
sys	0m0.007s
[root@ibm-x3850x5-06 ~]# time virsh net-list --all
 Name                 State      Autostart     Persistent
----------------------------------------------------------
 default              active     yes           yes
 host-bridge          active     no            yes

real	0m0.027s
user	0m0.009s
sys	0m0.007s

Expected results:
libvirtd should respond quickly and correctly.

Additional info:
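Step 3 can be automated so that a hang is reported instead of waiting for ^C. The following is a minimal sketch, assuming GNU coreutils `timeout` (which exits with status 124 when its time limit expires); the helper name `check_responsive` and the time limits are hypothetical choices, not part of the original report.

```shell
#! /bin/sh -
# Hypothetical helper (not part of libvirt): run a command with a time
# limit and report whether it returned before the limit expired.
# Assumes GNU coreutils `timeout`, which exits with 124 on expiry.
check_responsive() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    if [ $? -eq 124 ]; then
        echo "HUNG: $*"
        return 1
    fi
    # Any other exit status means the command returned in time,
    # even if it reported an error.
    echo "RESPONDED: $*"
    return 0
}
```

For example, while both loops from step 2 are running, `check_responsive 10 virsh list --all` and `check_responsive 10 virsh nwfilter-list` should report HUNG on an affected build, while `check_responsive 10 virsh pool-list --all` should still report RESPONDED.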
Upstream patch posted: https://www.redhat.com/archives/libvir-list/2014-November/msg00108.html
Upstream commit:

commit 41127244fb90f08cf5032a5d7553f5f0390d925e
Author: Pavel Hrdina <phrdina@redhat.com>
Date:   Wed Nov 5 14:28:57 2014 +0100

    nwfilter: fix deadlock caused updating network device and nwfilter

    Commit 6e5c79a1 tried to fix deadlock between nwfilter{Define,Undefine}
    and starting of guest, but this same deadlock exists for
    updating/attaching network device to domain.

    The deadlock was introduced by removing global QEMU driver lock because
    nwfilter was counting on this lock and ensure that all driver locks are
    locked inside of nwfilter{Define,Undefine}.

    This patch extends usage of virNWFilterReadLockFilterUpdates to prevent
    the deadlock for all possible paths in QEMU driver. LXC and UML drivers
    still have global lock.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1143780

    Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
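The lock-ordering idea behind this commit can be illustrated with a loose shell analogy. This is not libvirt's C code: it assumes util-linux `flock(1)`, uses two hypothetical lock files to stand in for the nwfilter "filter updates" read-write lock and a per-domain lock, and simplifies both code paths. Every path takes the filter-updates lock first (shared for device updates, exclusive for define/undefine) and only then the domain lock, so no path can hold the domain lock while waiting for the filter lock, which is the kind of circular wait that produces a deadlock like this one.

```shell
#! /bin/sh -
# Loose analogy only: lock files stand in for libvirt's in-process locks.
# Both file names are hypothetical.
FILTER_LOCK=${TMPDIR:-/tmp}/demo-nwfilter-updates.lock
DOMAIN_LOCK=${TMPDIR:-/tmp}/demo-domain.lock

# Device-update path (e.g. update-device): shared filter-updates lock
# first, in the spirit of virNWFilterReadLockFilterUpdates(), then the
# per-domain lock.
update_device() {
    (
        flock -s 8 || exit 1
        (
            flock -x 9 || exit 1
            : # instantiate the interface's filter rules here
        ) 9>"$DOMAIN_LOCK"
    ) 8>"$FILTER_LOCK"
}

# Define/undefine path: exclusive filter-updates lock first, then the
# per-domain lock while filters of running domains are rebuilt.
define_filter() {
    (
        flock -x 8 || exit 1
        (
            flock -x 9 || exit 1
            : # replace the filter definition and rebuild rules here
        ) 9>"$DOMAIN_LOCK"
    ) 8>"$FILTER_LOCK"
}
```

Because the acquisition order is the same on both paths, running many `update_device` and `define_filter` calls concurrently always completes; reversing the order on one path would reintroduce the possibility of a cycle.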
I can still reproduce it with libvirt-1.2.8-7.el7.x86_64:
[root@ibm-x3850x5-06 ~]# rpm -q libvirt qemu-kvm-rhev
libvirt-1.2.8-7.el7.x86_64
qemu-kvm-rhev-2.1.2-8.el7.x86_64

In the first terminal:
[root@ibm-x3850x5-06 ~]# sh update-nwfilter.sh
Device updated successfully
Device updated successfully
Device updated successfully
Device updated successfully
Device updated successfully
error: Failed to update device from nic.xml
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
Device updated successfully
Device updated successfully
...
Device updated successfully
Device updated successfully
^C

In the second terminal:
[root@ibm-x3850x5-06 ~]# sh define_undefine_nwfilter.sh
Network filter clean-traffic undefined
error: Failed to define network filter from clean-traffic.xml
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
Network filter clean-traffic defined from clean-traffic.xml
error: Failed to undefine network filter clean-traffic
error: Requested operation is not valid: nwfilter is in use
Network filter clean-traffic defined from clean-traffic.xml
error: Failed to undefine network filter clean-traffic
error: Requested operation is not valid: nwfilter is in use
Network filter clean-traffic defined from clean-traffic.xml
error: Failed to undefine network filter clean-traffic
error: Requested operation is not valid: nwfilter is in use
^C
[root@ibm-x3850x5-06 ~]# time virsh list --all
^C

real	2m19.530s
user	0m0.031s
sys	0m0.012s
[root@ibm-x3850x5-06 ~]#
Created attachment 958656: libvirt_deadlock_libvirt-1.2.8-7.el7.x86_64

Please check the log output.
Ouch, I completely forgot about this issue. It's a different bug: the problem occurs when removing the nwfilter from a network interface, and libvirtd crashes with a segfault. I'll create a bug for rhel-7.1 and fix the issue both upstream and downstream.
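The terminal output from the 1.2.8-7 test fits a crash rather than a hang: virsh fails fast with "End of file while reading data" and then "Connection refused" instead of blocking. A check based only on a timeout would count such a fast failure as a response, so a sketch that separates the two symptoms can also look at virsh's error output. It assumes GNU coreutils `timeout` and matches the "Connection refused" text seen in that log; the helper name `classify_libvirtd` is hypothetical.

```shell
#! /bin/sh -
# Hypothetical helper (not part of libvirt): run one command with a time
# limit and print HUNG, DOWN or RESPONDED.
#   HUNG      - no return before the limit (the deadlock symptom)
#   DOWN      - libvirtd refused the connection (e.g. after a crash)
#   RESPONDED - anything else
# Assumes GNU coreutils `timeout`, which exits with 124 on expiry.
classify_libvirtd() {
    limit=$1
    shift
    errfile=$(mktemp) || return 2
    timeout "$limit" "$@" >/dev/null 2>"$errfile"
    status=$?
    if [ "$status" -eq 124 ]; then
        result=HUNG
    elif grep -q 'Connection refused' "$errfile"; then
        result=DOWN
    else
        result=RESPONDED
    fi
    rm -f "$errfile"
    echo "$result"
}
```

For example, `classify_libvirtd 10 virsh list --all` would print HUNG for the original deadlock and DOWN while libvirtd is restarting after the segfault.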
OK, thanks for your reply. The issue you pointed out blocks this bug, so I won't verify it yet; I'll wait for your new patch.
I ran those two scripts concurrently for about 2 hours and can no longer reproduce the problem.

The first terminal:
[root@ibm-x3850x5-06 ~]# sh update-nwfilter.sh
...
error: Failed to update device from nic.xml
error: operation failed: failed to add new filter rules to 'vnet0' - attempting to restore old rules
Device updated successfully
...

The second terminal:
[root@ibm-x3850x5-06 ~]# sh define_undefine_nwfilter.sh
...
Network filter clean-traffic defined from clean-traffic.xml
Network filter clean-traffic undefined
Network filter clean-traffic defined from clean-traffic.xml
...

After 2 hours, check the output of the virsh commands:
[root@ibm-x3850x5-06 ~]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 37    r7                             running

real	0m0.037s
user	0m0.025s
sys	0m0.010s
[root@ibm-x3850x5-06 ~]# time virsh nwfilter-list
 UUID                                  Name
------------------------------------------------------------------
 c09829de-5380-4608-a827-f6a10d300784  allow-arp
 d339ae40-c114-446b-a6e4-d89e17e8d0a0  allow-dhcp
 773b6909-d01f-4223-9b62-c75910a6e0ab  allow-dhcp-server
 e112e697-2b01-4253-b8d7-d88273ad6419  allow-incoming-ipv4
 b1d49b06-7fdd-4bd4-8438-ac4cc125d09d  allow-ipv4
 f3d9b618-9097-4b37-86a7-e804066e7fbe  clean-traffic
...

real	0m0.033s
user	0m0.028s
sys	0m0.004s

Moving to VERIFIED.
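A soak run like this one can be bounded with `timeout` instead of stopping the loops by hand. A minimal sketch, assuming GNU coreutils `timeout` and the two scripts from the description; the helper name `run_for` is hypothetical. It returns success only if every loop was still running when the time limit expired, so a loop that exits early is flagged.

```shell
#! /bin/sh -
# Hypothetical helper: run each given script under `timeout` for the same
# duration, all in parallel, then wait for them.
# Returns 0 only if every script was still running when the limit expired
# (GNU coreutils `timeout` exits with 124 in that case).
run_for() {
    duration=$1
    shift
    pids=
    for script in "$@"; do
        timeout "$duration" sh "$script" >/dev/null 2>&1 &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        wait "$pid"
        [ $? -eq 124 ] || rc=1
    done
    return $rc
}
```

For example, `run_for 2h update-nwfilter.sh define_undefine_nwfilter.sh && time virsh list --all` would run both loops for two hours and then perform the same responsiveness check as above.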
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0323.html