Bug 1143780
| Summary: | Deadlock on nwfilter when taking same concurrent jobs | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Hu Jianwei <jiahu> |
| Component: | libvirt | Assignee: | Pavel Hrdina <phrdina> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | 7.1 | CC: | dyuan, honzhang, mzhan, phrdina, rbalakri |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | libvirt-1.2.8-7.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-03-05 07:44:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1169409, 1202703 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Description
Hu Jianwei
2014-09-18 01:43:40 UTC
Upstream patch posted:
https://www.redhat.com/archives/libvir-list/2014-November/msg00108.html

Upstream commit:

commit 41127244fb90f08cf5032a5d7553f5f0390d925e
Author: Pavel Hrdina <phrdina>
Date:   Wed Nov 5 14:28:57 2014 +0100

    nwfilter: fix deadlock caused updating network device and nwfilter

    Commit 6e5c79a1 tried to fix deadlock between nwfilter{Define,Undefine}
    and starting of guest, but this same deadlock exists for
    updating/attaching network device to domain.

    The deadlock was introduced by removing global QEMU driver lock because
    nwfilter was counting on this lock and ensure that all driver locks are
    locked inside of nwfilter{Define,Undefine}.

    This patch extends usage of virNWFilterReadLockFilterUpdates to prevent
    the deadlock for all possible paths in QEMU driver. LXC and UML drivers
    still have global lock.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1143780

    Signed-off-by: Pavel Hrdina <phrdina>

I still can reproduce it using libvirt-1.2.8-7.el7.x86_64

[root@ibm-x3850x5-06 ~]# rpm -q libvirt qemu-kvm-rhev
libvirt-1.2.8-7.el7.x86_64
qemu-kvm-rhev-2.1.2-8.el7.x86_64

In the first terminal:

[root@ibm-x3850x5-06 ~]# sh update-nwfilter.sh
Device updated successfully
Device updated successfully
Device updated successfully
Device updated successfully
Device updated successfully
error: Failed to update device from nic.xml
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
Device updated successfully
Device updated successfully
...
Device updated successfully
Device updated successfully
^C

In the second terminal:

[root@ibm-x3850x5-06 ~]# sh define_undefine_nwfilter.sh
Network filter clean-traffic undefined
error: Failed to define network filter from clean-traffic.xml
error: End of file while reading data: Input/output error
error: Failed to reconnect to the hypervisor
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
Network filter clean-traffic defined from clean-traffic.xml
error: Failed to undefine network filter clean-traffic
error: Requested operation is not valid: nwfilter is in use
Network filter clean-traffic defined from clean-traffic.xml
error: Failed to undefine network filter clean-traffic
error: Requested operation is not valid: nwfilter is in use
Network filter clean-traffic defined from clean-traffic.xml
error: Failed to undefine network filter clean-traffic
error: Requested operation is not valid: nwfilter is in use
^C
[root@ibm-x3850x5-06 ~]# time virsh list --all
^C

real 2m19.530s
user 0m0.031s
sys 0m0.012s
[root@ibm-x3850x5-06 ~]#

Created attachment 958656 [details]
libvirt_deadlock_libvirt-1.2.8-7.el7.x86_64
Please check the log output.
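
The lock-ordering problem described in the upstream commit message above can be illustrated with a small model. This is a minimal sketch in Python, not libvirt code: `RWLock`, `filter_updates`, `domain_lock`, `define_filter` and `update_device` are hypothetical stand-ins chosen for illustration. It assumes the define/undefine path takes the filter-update lock for writing and then locks the domain, and that the fix makes the device-update path take the same lock for reading before it locks the domain, so both paths acquire locks in the same order.

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Minimal readers-writer lock (illustrative, not libvirt's implementation)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


filter_updates = RWLock()       # hypothetical stand-in for the nwfilter update lock
domain_lock = threading.Lock()  # hypothetical stand-in for one domain object's lock
counts = {"define": 0, "update": 0}


def define_filter():
    # Writer path: exclude all filter users first, then visit the domain.
    with filter_updates.write():
        with domain_lock:
            counts["define"] += 1


def update_device():
    # Reader path with the consistent order: filter-update read lock BEFORE
    # the domain lock. Taking domain_lock first would invert the order used
    # by define_filter() and could deadlock against it.
    with filter_updates.read():
        with domain_lock:
            counts["update"] += 1


def run(iterations=2000):
    """Run both paths concurrently; return True if both threads finished."""
    workers = [
        threading.Thread(target=lambda: [define_filter() for _ in range(iterations)],
                         daemon=True),
        threading.Thread(target=lambda: [update_device() for _ in range(iterations)],
                         daemon=True),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join(timeout=30)
    return all(not w.is_alive() for w in workers)


if __name__ == "__main__":
    print("finished without deadlock:", run())
```

Because a reader can only hold `domain_lock` while it also holds the read side of `filter_updates`, a writer that has obtained the write side never finds the domain locked by a thread that is waiting on it. Swapping the two `with` statements in `update_device` recreates the inversion in this model.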
Ouch, I've completely forgotten about this issue. It's a different bug: the issue is with removing the nwfilter from a network interface, which makes libvirt crash with a segfault. I'll create a bug for rhel-7.1, and I'll also fix the issue upstream and downstream.

OK, thanks for your reply. The issue you pointed out will block this bug, so I won't verify it yet; I'll wait for your new patch.

I ran those two scripts concurrently for about 2 hours and cannot reproduce the problem any more.

The first terminal:

[root@ibm-x3850x5-06 ~]# sh update-nwfilter.sh
...
error: Failed to update device from nic.xml
error: operation failed: failed to add new filter rules to 'vnet0' - attempting to restore old rules
Device updated successfully
...

The second terminal:

[root@ibm-x3850x5-06 ~]# sh define_undefine_nwfilter.sh
...
Network filter clean-traffic defined from clean-traffic.xml
Network filter clean-traffic undefined
Network filter clean-traffic defined from clean-traffic.xml
...

After 2 hours, check the output of the virsh commands.

[root@ibm-x3850x5-06 ~]# time virsh list --all
 Id    Name                           State
----------------------------------------------------
 37    r7                             running

real 0m0.037s
user 0m0.025s
sys 0m0.010s

[root@ibm-x3850x5-06 ~]# time virsh nwfilter-list
 UUID                                  Name
------------------------------------------------------------------
 c09829de-5380-4608-a827-f6a10d300784  allow-arp
 d339ae40-c114-446b-a6e4-d89e17e8d0a0  allow-dhcp
 773b6909-d01f-4223-9b62-c75910a6e0ab  allow-dhcp-server
 e112e697-2b01-4253-b8d7-d88273ad6419  allow-incoming-ipv4
 b1d49b06-7fdd-4bd4-8438-ac4cc125d09d  allow-ipv4
 f3d9b618-9097-4b37-86a7-e804066e7fbe  clean-traffic
 ...

real 0m0.033s
user 0m0.028s
sys 0m0.004s

Move to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html