Bug 1441687 - Yet another deadlock in nwfilter
Summary: Yet another deadlock in nwfilter
Keywords:
Status: CLOSED DUPLICATE of bug 1478636
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-04-12 13:22 UTC by Sergey
Modified: 2017-08-05 11:47 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-05 11:47:50 UTC
Embargoed:


Attachments
Core-dump libvirt 2.0.0 make in CentOS 7 (8.07 MB, application/x-gzip)
2017-06-22 09:36 UTC, Sergey
Core-dump libvirt 2.0.0 make in CentOS 7 (5.97 MB, application/x-gzip)
2017-06-22 09:38 UTC, Sergey
All thread info – 2nd archive (27.47 KB, text/plain)
2017-06-23 07:44 UTC, Sergey
All thread info – 1st archive (72.39 KB, text/plain)
2017-06-23 08:06 UTC, Sergey
2nd archive tar.gz (5.97 MB, application/x-gzip)
2017-06-23 08:12 UTC, Sergey

Description Sergey 2017-04-12 13:22:47 UTC
I have several virtual machines that are deleted from different threads, and before deleting each one I need to remove its IPv4 and IPv6 addresses. After the addresses were removed, libvirt hung when I called virConnectNumOfNWFilters to get the number of network filters. gdb shows this backtrace for the blocked thread:

Thread 12 (Thread 0x7f88bbfff700 (LWP 7143)):
#0  0x00007f88d0bc41bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f88d0bbfd1d in _L_lock_840 () from /lib64/libpthread.so.0
#2  0x00007f88d0bbfc3a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f88b17f7e50 in nwfilterConnectNumOfNWFilters () from /usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so
#4  0x00007f88d365da23 in virConnectNumOfNWFilters () from /lib64/libvirt.so.0
#5  0x00007f88d429d424 in remoteDispatchConnectNumOfNWFiltersHelper ()
#6  0x00007f88d36ae002 in virNetServerProgramDispatch () from /lib64/libvirt.so.0
#7  0x00007f88d42bdc6d in virNetServerHandleJob ()
#8  0x00007f88d359ad41 in virThreadPoolWorker () from /lib64/libvirt.so.0
#9  0x00007f88d359a0c8 in virThreadHelper () from /lib64/libvirt.so.0
#10 0x00007f88d0bbddc5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f88d08ec73d in clone () from /lib64/libc.so.6

CentOS Linux release 7.3.1611

Compiled against library: libvirt 2.0.0
Using library: libvirt 2.0.0
Using API: QEMU 2.0.0
Running hypervisor: QEMU 1.5.3

Comment 1 Sergey 2017-06-22 09:36:18 UTC
Created attachment 1290586 [details]
Core-dump libvirt 2.0.0 make in CentOS 7

Core file generated with debuginfo glibc and libvirt

Comment 2 Sergey 2017-06-22 09:38:38 UTC
Created attachment 1290587 [details]
Core-dump libvirt 2.0.0 make in CentOS 7

Core file generated with debuginfo glibc and libvirt

Comment 3 Laine Stump 2017-06-22 15:18:40 UTC
gdb doesn't like the first corefile, and archive manager doesn't like the 2nd. Really all that would be useful from a corefile would be the output of "thread apply all bt". Can you just run that command after attaching to the deadlocked libvirtd and add that output to the report (as an unzipped text attachment)?

Also, is there any possibility you could try the same test with a current upstream libvirt? It's easier to work with a problem that's reproducible with current code.
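
The "thread apply all bt" output requested above can be captured non-interactively. A minimal sketch, assuming gdb is installed, the hung daemon is the libvirtd process, and the command is run as root; the function name dump_threads and the output file name are illustrative only:

```shell
# Hypothetical helper: attach gdb to a running process in batch mode and
# save the backtrace of every thread to a plain-text file.
dump_threads() {
    pid="$1"    # PID of the hung process, e.g. from: pidof libvirtd
    out="$2"    # plain-text file to attach to the report
    gdb -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}

# Usage (not run here): dump_threads "$(pidof libvirtd)" libvirtd-threads.txt
```

gdb detaches when the batch finishes, so the daemon is left running. Installing the libvirt and glibc debuginfo packages beforehand should make the frames more readable.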

Comment 4 Laine Stump 2017-06-22 15:22:12 UTC
Oh, and what do you mean by "remove their IPv4 and IPv6 addresses"? Did you modify the filter definitions being used by the guests? Or did you tell the guests to release their IP addresses from DHCP? Or something else? What is the <interface> config of the guests, and what is in the nwfilter rules they are using? (do you have custom rules, or are you using the pre-made rules that are installed with libvirt?)
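
The guest <interface> config and the nwfilter rules asked about here can be gathered with virsh. A rough sketch, assuming libvirtd is responsive (on a deadlocked daemon these calls may hang as well) and that virsh prints each filterref element on one line with single-quoted attributes; the function name and output file layout are made up for illustration:

```shell
# Hypothetical helper: save a guest's XML and the XML of every nwfilter
# it references into a directory.
collect_nwfilter_info() {
    domain="$1"    # guest name as shown by: virsh list --all
    outdir="$2"    # directory that receives the collected files
    mkdir -p "$outdir" || return 1

    # Full guest definition, including its <interface> elements.
    virsh dumpxml "$domain" > "$outdir/$domain.xml" || return 1

    # Names and UUIDs of all defined network filters.
    virsh nwfilter-list > "$outdir/nwfilter-list.txt" || return 1

    # Dump each filter referenced by a <filterref filter='...'> element.
    sed -n "s/.*<filterref filter='\([^']*\)'.*/\1/p" "$outdir/$domain.xml" |
        sort -u |
        while read -r filter; do
            virsh nwfilter-dumpxml "$filter" > "$outdir/nwfilter-$filter.xml"
        done
}

# Usage (not run here): collect_nwfilter_info guest1 ./nwfilter-report
```

Filters that reference other filters are not followed; those would need another nwfilter-dumpxml pass.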

Comment 5 Sergey 2017-06-23 07:44:26 UTC
Created attachment 1290913 [details]
All thread info – 2nd archive

Comment 6 Sergey 2017-06-23 08:06:36 UTC
Created attachment 1290924 [details]
All thread info – 1st archive

Comment 7 Sergey 2017-06-23 08:12:07 UTC
Created attachment 1290927 [details]
2nd archive tar.gz

Comment 8 Sergey 2017-07-03 13:34:32 UTC
This issue does not reproduce on the test server, but it always reproduces on the production server. I cannot determine the cause, so I created a core dump to allow the problem to be analyzed from the inside.

Why doesn't gdb like the first corefile? Should I re-upload the core-dump files?
I am running CentOS 7 with gdb installed from the standard repo:
# yum info gdb
Loaded plugins: auto-update-debuginfo, fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos-mirror.rbc.ru
 * epel-debuginfo: fedora-mirror01.rbc.ru
 * extras: centos-mirror.rbc.ru
 * updates: centos-mirror.rbc.ru
Installed Packages
Name        : gdb
Arch        : x86_64
Version     : 7.6.1
Release     : 94.el7
Size        : 7.0 M
Repo        : installed
From repo   : base
Summary     : A GNU source-level debugger for C, C++, Fortran, Go and other languages
URL         : http://gnu.org/software/gdb/
License     : GPLv3+ and GPLv3+ with exceptions and GPLv2+ and GPLv2+ with exceptions and GPL+ and LGPLv2+ and BSD and Public Domain
Description : GDB, the GNU debugger, allows you to debug programs written in C, C++,
            : Java, and other languages, by executing them in a controlled fashion
            : and printing their data.
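
One way to check that a core file is usable before uploading it is to load it locally and look for thread headers in the backtrace. A small sketch, assuming the core was produced by the binary passed as the first argument (on CentOS 7 that is probably /usr/sbin/libvirtd, but that path is an assumption); check_core is a made-up name:

```shell
# Hypothetical helper: load a core file in gdb batch mode, save the
# backtrace of every thread, and succeed only if at least one
# "Thread N" header appears in the output.
check_core() {
    binary="$1"    # executable that produced the core
    core="$2"      # core file to verify
    out="$3"       # plain-text file for the backtraces
    gdb -batch -ex "thread apply all bt" "$binary" "$core" > "$out" 2>&1
    grep -q '^Thread [0-9]' "$out"
}

# Usage (not run here): check_core /usr/sbin/libvirtd core.7143 core-threads.txt
```

If the check succeeds, the resulting text file can be attached directly instead of the multi-megabyte archive.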

Comment 9 Sergey 2017-07-03 13:38:24 UTC
I installed libvirt version 3.3, but the problem persists.

Comment 10 Sergey 2017-08-05 11:47:50 UTC

*** This bug has been marked as a duplicate of bug 1478636 ***

