Cloning to RHEL because the same problem exists with use of libnl.so

+++ This bug was initially created as a clone of Bug #886454 +++

Description of problem:
I was running the libguestfs test suite, and it died here:

/home/rjones/d/libguestfs/run --test ./test-max-disks.pl
libvir: XML-RPC error : Cannot recv data: Connection reset by peer
could not connect to libvirt (URI = NULL): Cannot recv data: Connection reset by peer [code=38 domain=7] at /home/rjones/d/libguestfs/tests/disks/test-max-disks.pl line 46.
max_disks is 255
/home/rjones/d/libguestfs/run: command failed with exit code 104
FAIL: test-max-disks.pl

Version-Release number of selected component:
libvirt-daemon-0.10.2.1-3.fc18

Additional info:
backtrace_rating: 4
cmdline: /usr/sbin/libvirtd --timeout=30
crash_function: nl_object_put
executable: /usr/sbin/libvirtd
kernel: 3.6.9-4.fc18.x86_64
remote_result: NOTFOUND
uid: 1000

Truncated backtrace:
Thread no. 1 (10 frames)
 #4 nl_object_put at object.c:197
 #5 nl_object_free at object.c:158
 #6 nl_cache_remove at cache.c:484
 #7 nl_cache_clear at cache.c:347
 #8 nl_cache_free at cache.c:364
 #9 netlink_close at dutil_linux.c:864
 #10 drv_close at drv_redhat.c:384
 #11 ncf_close at netcf.c:101
 #12 interfaceCloseInterface at interface/interface_backend_netcf.c:170
 #13 virConnectDispose at datatypes.c:134

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:29:35 GMT ---

Created attachment 662242 [details]
File: backtrace

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:29:37 GMT ---

Created attachment 662243 [details]
File: cgroup

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:29:39 GMT ---

Created attachment 662244 [details]
File: core_backtrace

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:29:41 GMT ---

Created attachment 662245 [details]
File: dso_list

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:29:43 GMT ---

Created attachment 662246 [details]
File: environ

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:29:45 GMT ---

Created attachment 662247 [details]
File: limits

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:30:01 GMT ---

Created attachment 662248 [details]
File: maps

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:30:03 GMT ---

Created attachment 662249 [details]
File: open_fds

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:30:05 GMT ---

Created attachment 662251 [details]
File: proc_pid_status

--- Additional comment from Richard W.M. Jones on 2012-12-12 10:30:08 GMT ---

Created attachment 662252 [details]
File: var_log_messages

--- Additional comment from Daniel Berrange on 2012-12-12 10:34:26 GMT ---

Superficially the stack trace points towards libnl as being at fault. Can you tell me what 'libnl3' and 'netcf' library versions are installed?

--- Additional comment from Daniel Berrange on 2012-12-13 11:30:33 GMT ---

The libnl3 code that's causing the crash is this

    if (obj->ce_refcnt < 0)
        BUG();

so libnl's ref counting seems to have a bug somewhere :-(

--- Additional comment from Daniel Berrange on 2012-12-13 12:03:14 GMT ---

It appears that ncf_init/ncf_close are not thread-safe :-(

$ cat nc.c
#include <netcf.h>
#include <pthread.h>
#include <stdlib.h>

void *worker(void *data)
{
    for (;;) {
        struct netcf *netcf;
        if (ncf_init(&netcf, NULL) != 0)
            abort();
        ncf_close(netcf);
    }
}

int main (int argc, char **argv)
{
    int nthreads = 20;
    pthread_t threads[nthreads];
    size_t i;

    for (i = 0 ; i < nthreads ; i++) {
        pthread_create(&threads[i], NULL, worker, NULL);
    }
    for (i = 0 ; i < nthreads ; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}

$ ./nc
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
Segmentation fault
[berrange@mustard ~]$ ./nc
Relax-NG types library 'http://www.w3.org/2001/XMLSchema-datatypes' already registered
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
Relax-NG types library failed to register 'http://www.w3.org/2001/XMLSchema-datatypes'
Relax-NG types library 'http://www.w3.org/2001/XMLSchema-datatypes' already registered
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
BUG: object.c:197
nc: object.c:197: nl_object_put: Assertion `0' failed.
Aborted
[berrange@mustard ~]$ ./nc
BUG: object.c:197
nc: object.c:197: nl_object_put: Assertion `0' failed.
Aborted
[berrange@mustard ~]$
[berrange@mustard ~]$ ^C
[berrange@mustard ~]$ ./nc
Relax-NG types library 'http://www.w3.org/2001/XMLSchema-datatypes' already registered
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
Relax-NG types library failed to register 'http://www.w3.org/2001/XMLSchema-datatypes'
Relax-NG types library failed to register 'http://www.w3.org/2001/XMLSchema-datatypes'
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
Relax-NG types library 'http://www.w3.org/2001/XMLSchema-datatypes' already registered
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
BUG: object.c:197
nc: object.c:197: nl_object_put: Assertion `0' failed.
Aborted

There appear to be two problems here. One appears to be libxml related - the RNG schema warnings. The second is libnl3 related.
The second problem can be isolated with the following test

#include <netlink/netlink.h>
#include <pthread.h>
#include <stdlib.h>
#include <netlink/route/addr.h>
#include <netlink/route/link.h>

void *worker(void *data)
{
    for (;;) {
        struct nl_sock *nl_sock;
        struct nl_cache *link_cache;
        struct nl_cache *addr_cache;

        if (!(nl_sock = nl_socket_alloc())) {
            perror("nl_sock_alloc");
            abort();
        }
        if (nl_connect(nl_sock, NETLINK_ROUTE) < 0) {
            perror("nl_connect");
            abort();
        }
        if (rtnl_link_alloc_cache(nl_sock, AF_UNSPEC, &link_cache) < 0) {
            perror("nl_link_alloc_cache");
            abort();
        }
        nl_cache_mngt_provide(link_cache);
        if (rtnl_addr_alloc_cache(nl_sock, &addr_cache) < 0) {
            perror("nl_addr_alloc_cache");
            abort();
        }
        nl_cache_mngt_provide(addr_cache);

        nl_cache_free(addr_cache);
        nl_cache_free(link_cache);
        nl_close(nl_sock);
        nl_socket_free(nl_sock);
    }
}

int main (int argc, char **argv)
{
    int nthreads = 20;
    pthread_t threads[nthreads];
    size_t i;

    for (i = 0 ; i < nthreads ; i++) {
        pthread_create(&threads[i], NULL, worker, NULL);
    }
    for (i = 0 ; i < nthreads ; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}

This test program will crash. If you comment out the two nl_cache_mngt_provide calls then the crashes go away. Looking at the libnl3 code this is not surprising

/**
 * Provide a cache for global use
 * @arg cache cache to provide
 *
 * Offers the specified cache to be used by other modules.
 * Only one cache per type may be shared at a time,
 * a previsouly provided caches will be overwritten.
 */
void nl_cache_mngt_provide(struct nl_cache *cache)
{
    struct nl_cache_ops *ops;

    ops = cache_ops_lookup_for_obj(cache->c_ops->co_obj_ops);
    if (!ops)
        BUG();
    else
        ops->co_major_cache = cache;
}

Note the comment that only a single cache can be used at a time - this is a process wide global cache, held in a static global variable

static struct nl_cache_ops *cache_ops;

This is really awful design from libnl3. The caches really need to be scoped to the nl_sock.
It is not sufficient for netcf to simply do a one-time init of the caches itself, because other parts of libvirt also use libnl, so netcf can't assume it is the only owner of the caches. AFAICT, the only viable option is to *not* register the caches at all. I'm not sure what that will do to performance though
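The point about scoping caches to the socket rather than to the process can be sketched as follows. This is only an illustration under stated assumptions: `struct cache`, `struct handle`, `handle_open`, `handle_close` and `run_workers` are hypothetical names, not libnl or netcf API, and the sketch uses plain pthreads so it builds without libnl. Each thread owns its handle and the cache inside it, so no reference count is ever touched by two threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are NOT libnl types. */
struct cache {
    int refcnt;                 /* plays the role ce_refcnt has in the report */
};

struct handle {
    struct cache *link_cache;   /* owned by this handle, never shared */
};

static struct handle *handle_open(void)
{
    struct handle *h = calloc(1, sizeof(*h));
    if (!h)
        return NULL;
    h->link_cache = calloc(1, sizeof(*h->link_cache));
    if (!h->link_cache) {
        free(h);
        return NULL;
    }
    h->link_cache->refcnt = 1;
    return h;
}

static void handle_close(struct handle *h)
{
    /* No other thread can reach this cache, so no lock is needed. */
    h->link_cache->refcnt--;
    assert(h->link_cache->refcnt == 0);
    free(h->link_cache);
    free(h);
}

#define ITERATIONS 1000
#define MAX_THREADS 64

static void *worker(void *arg)
{
    long *cycles = arg;         /* per-thread counter, not shared */
    for (int i = 0; i < ITERATIONS; i++) {
        struct handle *h = handle_open();
        if (!h)
            abort();
        handle_close(h);
        (*cycles)++;
    }
    return NULL;
}

/* Runs nthreads workers and returns the total number of open/close cycles,
 * or -1 if nthreads is out of range. */
long run_workers(int nthreads)
{
    pthread_t threads[MAX_THREADS];
    long cycles[MAX_THREADS] = { 0 };
    long total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&threads[i], NULL, worker, &cycles[i]) != 0)
            abort();
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], NULL);
        total += cycles[i];
    }
    return total;
}
```

Calling `run_workers(20)` from a `main` mirrors the 20-thread loop of the test programs above. The trade-off is the one raised in this comment: without a shared cache, every handle has to populate its own.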
I fixed this in the git tree (released as 3.2.16) by protecting the cache operations with a mutex. Therefore cache provisioning is now thread safe. I can no longer reproduce the SIGSEGV with the latest git tree. When this functionality was initially implemented the use case in mind was one similar to iproute2 where there would never be multiple threads. A backport to libnl1.1 should not be hard. I would be happy to maintain a stable branch if really required.
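For readers who want to see the general idea of guarding a process-wide cache slot with a mutex, here is a minimal sketch. It is not the actual libnl patch: `struct shared_cache`, `cache_provide`, `cache_release` and `run_locked_workers` are hypothetical names, and the assumption that the slot holds its own reference is made only for this illustration. Every reference-count change happens while the lock is held, so a cache is freed exactly once even when another thread overwrites the slot.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted cache; NOT a libnl type. */
struct shared_cache {
    int refcnt;
};

/* One process-wide slot, analogous to co_major_cache in the quoted code. */
static struct shared_cache *major_cache;
static long freed_count;
static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller must hold slot_lock. Frees the cache when the last reference goes. */
static void put_locked(struct shared_cache *c)
{
    if (--c->refcnt == 0) {
        free(c);
        freed_count++;
    }
}

static struct shared_cache *cache_alloc(void)
{
    struct shared_cache *c = malloc(sizeof(*c));
    if (!c)
        abort();
    c->refcnt = 1;                  /* the caller's reference */
    return c;
}

/* Publish c in the global slot; the slot holds its own reference. */
static void cache_provide(struct shared_cache *c)
{
    pthread_mutex_lock(&slot_lock);
    c->refcnt++;
    if (major_cache)
        put_locked(major_cache);    /* drop the slot's reference to the old cache */
    major_cache = c;
    pthread_mutex_unlock(&slot_lock);
}

/* Drop the caller's reference, withdrawing c from the slot if still there. */
static void cache_release(struct shared_cache *c)
{
    pthread_mutex_lock(&slot_lock);
    if (major_cache == c) {
        major_cache = NULL;
        put_locked(c);              /* the slot's reference */
    }
    put_locked(c);                  /* the caller's reference */
    pthread_mutex_unlock(&slot_lock);
}

#define LOCKED_ITERATIONS 1000
#define LOCKED_MAX_THREADS 64

static void *locked_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < LOCKED_ITERATIONS; i++) {
        struct shared_cache *c = cache_alloc();
        cache_provide(c);
        cache_release(c);
    }
    return NULL;
}

/* Returns the number of caches freed, or -1 on bad input or a non-empty slot. */
long run_locked_workers(int nthreads)
{
    pthread_t threads[LOCKED_MAX_THREADS];

    if (nthreads < 1 || nthreads > LOCKED_MAX_THREADS)
        return -1;
    freed_count = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&threads[i], NULL, locked_worker, NULL) != 0)
            abort();
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return major_cache == NULL ? freed_count : -1;
}
```

With 20 threads, `run_locked_workers(20)` should report one free per allocation. Removing the lock calls reintroduces the kind of unsynchronized reference counting that the backtrace in this report points at.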
See also: https://bugzilla.redhat.com/show_bug.cgi?id=886454#c15

(In reply to comment #2)
> I fixed this in the git tree (released as 3.2.16) by protecting the cache
> operations with a mutex. Therefore cache provisioning is now thread safe. I
> can no longer reproduce the SIGSEGV with the latest git tree.

Can you push this into Fedora 18 updates-testing, so that it gets more testing with libguestfs?
CCing Dan Williams as he is maintaining the libnl packages. @Dan, could you push 3.2.16 into F18?
Hi Daniel,

I'm using the C file you gave to test on RHEL6.4, and with some probability it dumps core.

$ cat nc.c
#include <netcf.h>
#include <pthread.h>
#include <stdlib.h>

void *worker(void *data)
{
    for (;;) {
        struct netcf *netcf;
        if (ncf_init(&netcf, NULL) != 0)
            abort();
        ncf_close(netcf);
    }
}

int main (int argc, char **argv)
{
    int nthreads = 20;
    pthread_t threads[nthreads];
    size_t i;

    for (i = 0 ; i < nthreads ; i++) {
        pthread_create(&threads[i], NULL, worker, NULL);
    }
    for (i = 0 ; i < nthreads ; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}

$ ./nc
Relax-NG types library failed to register 'http://www.w3.org/2001/XMLSchema-datatypes'
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
Aborted (core dumped)

Do you think this is enough to reproduce the bug?

BTW, compiling the second C file fails:

$ gcc -o crash_test -lpthread -lnl crash_test.c
crash_test.c: In function ‘worker’:
crash_test.c:14: warning: assignment makes pointer from integer without a cast
crash_test.c:19: warning: passing argument 1 of ‘nl_connect’ from incompatible pointer type
/usr/include/netlink/netlink.h:40: note: expected ‘struct nl_handle *’ but argument is of type ‘struct nl_sock *’
crash_test.c:24: warning: passing argument 1 of ‘rtnl_link_alloc_cache’ from incompatible pointer type
/usr/include/netlink/route/link.h:66: note: expected ‘struct nl_handle *’ but argument is of type ‘struct nl_sock *’
crash_test.c:24: error: too many arguments to function ‘rtnl_link_alloc_cache’
crash_test.c:30: warning: passing argument 1 of ‘rtnl_addr_alloc_cache’ from incompatible pointer type
/usr/include/netlink/route/addr.h:31: note: expected ‘struct nl_handle *’ but argument is of type ‘struct nl_sock *’
crash_test.c:30: error: too many arguments to function ‘rtnl_addr_alloc_cache’
crash_test.c:39: warning: passing argument 1 of ‘nl_close’ from incompatible pointer type
/usr/include/netlink/netlink.h:41: note: expected ‘struct nl_handle *’ but argument is of type ‘struct nl_sock *’

RHEL6 only uses libnl-1, not libnl-3, so do you know when the fix will be backported to RHEL6?

Thanks.
(In reply to comment #6)
> And on RHEL6, it only use libnl-1 not libnl-3, so do u know when the fix
> will backport to RHEL6?

I'm creating a stable branch for libnl1 and will do a new release today that can be used for RHEL6.
Yes, the first program is enough to demonstrate the crash. The second program was merely to isolate the problem further.
A stable branch is being maintained here now: https://github.com/tgraf/libnl-1.1-stable I have backported all relevant fixes.
Created attachment 668424 [details]
run nc.c in a loop

Install the new libnl from https://github.com/tgraf/libnl-1.1-stable
and update netcf to netcf-0.1.9-3.el6.x86_64.

# virsh iface-list --all
Name                 State      MAC Address
--------------------------------------------
eth0                 active     00:23:ae:8f:f2:b3
lo                   active     00:00:00:00:00:00

# virsh iface-dumpxml eth0
<interface type='ethernet' name='eth0'>
  <mac address='00:23:ae:8f:f2:b3'/>
  <protocol family='ipv4'>
    <ip address='10.66.82.251' prefix='23'/>
  </protocol>
  <protocol family='ipv6'>
    <ip address='fe80::223:aeff:fe8f:f2b3' prefix='64'/>
  </protocol>
</interface>

But when the nc program is run in a loop, in a very few cases (about 1 in 100) it produces the following error:

#for i in {1..770}; do ./nc & (sleep 1; kill -9 `pidof ./nc`); sleep 1; done &> nc.log
......
Relax-NG types library 'http://relaxng.org/ns/structure/1.0' already registered
error : not a socket
error : not a socket
error : not a socket
/usr/share/netcf/xml/interface.rng:1: parser error : /usr/share/netcf/xml/interface.rng:1: parser error : Document is empty
^
/usr/share/netcf/xml/interface.rng:1: parser error : Start tag expected, '<' not found
^
error : not a socket
/usr/share/netcf/xml/interface.rng:1: parser error : Document is empty
^
/usr/share/netcf/xml/interface.rng:1: parser error : Start tag expected, '<' not found
^
Segmentation fault (core dumped)
......

In /var/log/messages there are a few entries like the following:
......
Dec 24 16:45:00 intel-q9400-4-2 kernel: nc[16206]: segfault at 378d ip 000000339a87b6ec sp 00007f2361e3fcc8 error 4 in libc-2.12.so[339a800000+18a000]
......
Dec 24 17:22:14 intel-q9400-4-2 kernel: nc[31134]: segfault at 378d ip 000000339a87b6ec sp 00007fed5ebfccc8 error 4 in libc-2.12.so[339a800000+18a000]
Dec 24 17:22:14 intel-q9400-4-2 kernel: nc[31133]: segfault at 378d ip 000000339a87b6ec sp 00007fed5f5fdcc8 error 4 in libc-2.12.so[339a800000+18a000]
......
(In reply to comment #13)
> Segmentation fault (core dumped)

Can you find the core dump and get a stack trace from it?
(gdb) run The program being debugged has been started already. Start it from the beginning? (y or n) y Starting program: /home/ydu/nc [Thread debugging using libthread_db enabled] [New Thread 0x7ffff7fe7700 (LWP 8250)] [New Thread 0x7ffff75e6700 (LWP 8251)] [New Thread 0x7ffff6be5700 (LWP 8252)] error : not a socket error : not a socket /usr/share/netcf/xml/interface.rng:1: [New Thread 0x7ffff61e4700 (LWP 8253)] parser error : Document is empty ^ /usr/share/netcf/xml/interface.rng:1: /usr/share/netcf/xml/interface.rng:1: [New Thread 0x7ffff57e3700 (LWP 8254)] parser error : Start tag expected, '<' not found ^ parser error : Document is empty ^ /usr/share/netcf/xml/interface.rng:1: parser error : Start tag expected, '<' not found ^ *** glibc detected *** /home/ydu/nc: free(): invalid pointer: 0x00007ffff7fe6cc4 *** *** glibc detected *** /home/ydu/nc: free(): invalid pointer: 0x00007ffff75e5cc4[New Thread 0x7ffff4de2700 (LWP 8255)] error : Bad file descriptor error : Bad file descriptor /usr/share/netcf/xml/interface.rng:1: parser error : Document is empty ^ /usr/share/netcf/xml/interface.rng:1: parser error : Start tag expected, '<' not found ^ /usr/share/netcf/xml/interface.rng:1: *** glibc detected *** /home/ydu/ncparser error : Document is empty ^ [New Thread 0x7fffdffff700 (LWP 8256)] error : not a socket /usr/share/netcf/xml/interface.rng:1: parser error : Start tag expected, '<' not found ^ *** glibc detected *** /home/ydu/nc: free(): invalid pointer: 0x00007ffff61e3cc4 *** /usr/share/netcf/xml/interface.rng:1: parser error : Document is empty ^ /usr/share/netcf/xml/interface.rng:1: parser error : Start tag expected, '<' not found ^ *** glibc detected *** /home/ydu/nc: free(): invalid pointer: 0x00007ffff57e2cc4 *** [New Thread 0x7fffdf5fe700 (LWP 8257)] error : not a socket /usr/share/netcf/xml/interface.rng:1: parser error : Document is empty ^ error : /usr/share/netcf/xml/interface.rng:1: not a socket parser ======= Backtrace: ========= error : Start tag 
expected, '<' not found ^ /lib64/libc.so.6/usr/share/netcf/xml/interface.rng:1: parser error : Document is empty ^ /usr/share/netcf/xml/interface.rng:1: parser error : Start tag expected, '<' not found ^ *** glibc detected *** /home/ydu/nc: free(): invalid pointer: 0x00007fffdfffecc4 *** ======= Backtrace: ========= /usr/lib64/libxml2.so.2(xmlNanoFTPFreeCtxt+0x3c)[0x33a409b00c] /lib64/libc.so.6*** glibc detected *** /home/ydu/nc: free(): invalid pointer: 0x00007ffff4de1cc4 *** /usr/lib64/libxml2.so.2(xmlNanoFTPFreeCtxt+0x3c)[0x======= Backtrace: ========= /usr/lib64/libxml2.so.2(xmlNanoFTPClose+0x56)[0x33a409b1d6] /usr/lib64/libxml2.so.2(xmlNanoFTPClose+0x56)[0x33a409b1d6] /usr/lib64/libxml2.so.2(xmlFreeParserInputBuffer+0x3b)[0x33a405e55b] /usr/lib64/libxml2.so.2(xmlFreeInputStream+0x66)[0x33a4033c66] /usr/lib64/libxml2.so.2(xmlFreeParserCtxt+0x20)[0x33a4033cb0] /usr/lib64/libxml2.so.2[0x33a404df21] /usr/lib64/libxml2.so.2(xmlRelaxNGParse+0x31)[0x33a40ef651] /usr/lib64/libnetcf.so.1[0x3e4fa07c4d] /usr/lib64/libnetcf.so.1(ncf_init+0xb6)[0x3e4fa049a6] /home/ydu/nc[0x400761] /lib64/libpthread.so.0[0x339ac07851] /lib64/libc.so.6(clone+0x6d)[0x339a8e890d] ======= Memory map: ======== 00400000-00401000 r-xp 00000000 08:01 393567 /home/ydu/nc 00600000-00601000 rw-p 00000000 08:01 393567 /home/ydu/nc 00601000-00622000 rw-p 00000000 00:00 0 [heap] 339a000000-339a020000 r-xp 00000000 08:01 931139 /lib64/ld-2.12.so 339a21f000-339a220000 r--p 0001f000 08:01 931139 /lib64/ld-2.12.so 339a220000-339a221000 rw-p 00020000 08:01 931139 /lib64/ld-2.12.so 339a221000-339a222000 rw-p 00000000 00:00 0 339a400000-339a402000 r-xp 00000000 08:01 931149 /lib64/libdl-2.12.so 339a402000-339a602000 ---p 00002000 08:01 931149 /lib64/libdl-2.12.so 339a602000-339a603000 r--p 00002000 08:01 931149 /lib64/libdl-2.12.so 339a603000-339a604000 rw-p 00003000 08:01 931149 /lib64/libdl-2.12.so 339a800000-339a98a000 r-xp 00000000 08:01 931141 /lib64/libc-2.12.so 339a98a000-339ab89000 ---p 0018a000 08:01 
931141 /lib64/libc-2.12.so 339ab89000-339ab8d000 r--p 00189000 08:01 931141 /lib64/libc-2.12.so 339ab8d000-339ab8e000 rw-p 0018d000 08:01 931141 /lib64/libc-2.12.so 339ab8e000-339ab93000 rw-p 00000000 00:00 0 339ac00000-339ac17000 r-xp 00000000 08:01 931147 /lib64/libpthread-2.12.so 339ac17000-339ae17000 ---p 00017000 08:01 931147 /lib64/libpthread-2.12.so 339ae17000-339ae18000 r--p 00017000 08:01 931147 /lib64/libpthread-2.12.so 339ae18000-339ae19000 rw-p 00018000 08:01 931147 /lib64/libpthread-2.12.so 339ae19000-339ae1d000 rw-p 00000000 00:00 0 339b000000-339b083000 r-xp 00000000 08:01 931143 /lib64/libm-2.12.so 339b083000-339b282000 ---p 00083000 08:01 931143 /lib64/libm-2.12.so 339b282000-339b283000 r--p 00082000 08:01 931143 /lib64/libm-2.12.so 339b283000-339b284000 rw-p 00083000 08:01 931143 /lib64/libm-2.12.so 339bc00000-339bc15000 r-xp 00000000 08:01 931144 /lib64/libz.so.1.2.3 339bc15000-339be14000 ---p 00015000 08:01 931144 /lib64/libz.so.1.2.3 339be14000-339be15000 r--p 00014000 08:01 931144 /lib64/libz.so.1.2.3 339be15000-339be16000 rw-p 00015000 08:01 931144 /lib64/libz.so.1.2.3 339c000000-339c01d000 r-xp 00000000 08:01 931157 /lib64/libselinux.so.1 339c01d000-339c21c000 ---p 0001d000 08:01 931157 /lib64/libselinux.so.1 339c21c000-339c21d000 r--p 0001c000 08:01 931157 /lib64/libselinux.so.1 339c21d000-339c21e000 rw-p 0001d000 08:01 931157 /lib64/libselinux.so.1 339c21e000-339c21f000 rw-p 00000000 00:00 0 339dc00000-339dc41000 r-xp 00000000 08:01 659576 /usr/lib64/libaugeas.so.0.14.0 339dc41000-339de41000 ---p 00041000 08:01 659576 /usr/lib64/libaugeas.so.0.14.0 339de41000-339de43000 rw-p 00041000 08:01 659576 /usr/lib64/libaugeas.so.0.14.0 339e000000-339e011000 r-xp 00000000 08:01 661903 /usr/lib64/libfa.so.1.3.4 339e011000-339e210000 ---p 00011000 08:01 661903 /usr/lib64/libfa.so.1.3.4 339e210000-339e211000 rw-p 00010000 08:01 661903 /usr/lib64/libfa.so.1.3.4 33a4000000-33a4148000 r-xp 00000000 08:01 662483 /usr/lib64/libxml2.so.2.7.6 
33a4148000-33a4347000 ---p 00148000 08:01 662483 /usr/lib64/libxml2.so.2.7.6 33a4347000-33a4351000 rw-p 00147000 08:01 662483 /usr/lib64/libxml2.so.2.7.6 33a4351000-33a4352000 rw-p 00000000 00:00 0 33a6c00000-33a6c16000 r-xp 00000000 08:01 931146 /lib64/libgcc_s-4.4.7-20120601.so.1 33a6c16000-33a6e15000 ---p 00016000 08:01 931146 /lib64/libgcc_s-4.4.7-20120601.so.1 33a6e15000-33a6e16000 rw-p 00015000 08:01 931146 /lib64/libgcc_s-4.4.7-20120601.so.1 33ab400000-33ab403000 r-xp 00000000 08:01 931171 /lib64/libgpg-error.so.0.5.0 33ab403000-33ab602000 ---p 00003000 08:01 931171 /lib64/libgpg-error.so.0.5.0 33ab602000-33ab603000 r--p 00002000 08:01 931171 /lib64/libgpg-error.so.0.5.0 33ab603000-33ab604000 rw-p 00003000 08:01 931171 /lib64/libgpg-error.so.0.5.0 33ac400000-33ac472000 r-xp 00000000 08:01 931172 /lib64/libgcrypt.so.11.5.3 33ac472000-33ac671000 ---p 00072000 08:01 931172 /lib64/libgcrypt.so.11.5.3 33ac671000-33ac672000 r--p 00071000 08:01 931172 /lib64/libgcrypt.so.11.5.3 33ac672000-33ac675000 rw-p 00072000 08:01 931172 /lib64/libgcrypt.so.11.5.3 33d8600000-33d8613000 r-xp 00000000 08:01 677463 /usr/lib64/libexslt.so.0.8.15 33d8613000-33d8813000 ---p 00013000 08:01 677463 /usr/lib64/libexslt.so.0.8.15 33d8813000-33d8814000 rw-p 00013000 08:01 677463 /usr/lib64/libexslt.so.0.8.15 33d9200000-33d923b000 r-xp 00000000 08:01 660217 /usr/lib64/libxslt.so.1.1.26 33d923b000-33d943b000 ---p 0003b000 08:01 660217 /usr/lib64/libxslt.so.1.1.26 33d943b000-33d943d000 rw-p 0003b000 08:01 660217 /usr/lib64/libxslt.so.1.1.26 3e4f600000-3e4f64c000 r-xp 00000000 08:01 930343 /lib64/libnl.so.1.1.1 3e4f64c000-3e4f84c000 ---p 0004c000 08:01 930343 /lib64/libnl.so.1.1.1 3e4f84c000-3e4f851000 rw-p 0004c000 08:01 930343 /lib64/libnl.so.1.1.1 3e4fa00000-3e4fa0e000 r-xp 00000000 08:01 676243 /usr/lib64/libnetcf.so.1.4.0 3e4fa0e000-3e4fc0d000 ---p 0000e000 08:01 676243 /usr/lib64/libnetcf.so.1.4.0 3e4fc0d000-3e4fc0e000 rw-p 0000d000 08:01 676243 
/usr/lib64/libnetcf.so.1.4.0/usr/lib64/libxml2.so.2(xmlFreeInputStream+0x66)[0x33a4033c66] /usr/lib64/libxml2.so.2(xmlNanoFTPClose+0x56)[0x33a409b1d6] /lib64/libc.so.6[0x339a8760e6] /usr/lib64/libxml2.so.2(xmlFreeParserCtxt+0x20)[0x33a4033cb0] /usr/lib64/libxml2.so.2[0x33a404df21] /usr/lib64/libxml2.so.2(xmlRelaxNGParse+0x31) Program received signal SIGABRT, Aborted. [Switching to Thread 0x7ffff57e3700 (LWP 8254)] 0x000000339a8328a5 in raise () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install libgcc-4.4.7-3.el6.x86_64 (gdb) thread apply all bt
Created attachment 668718 [details] thread apply all bt
I would say this looks like a different bug in libxml2's nano-FTP library. Since this is RHEL 6, it may also be a bug that has since been fixed upstream, although I seem to recall that danpb saw something similar in Fedora.
Hi Daniel,

Regarding the problem mentioned in comment 15 and comment 17, what should I do with this bug? Should I close this one as VERIFIED and open a new one against another component?

Please help, thanks!

Berrange
(In reply to comment #13)
> Install the new libnl from https://github.com/tgraf/libnl-1.1-stable
> and update netcf to netcf-0.1.9-3.el6.x86_64.

You *should not* install a new libnl from some external source to test this bz. That version of libnl will not be included in RHEL6.4 (because libnl isn't in the ACL). We were able to get netcf added to the ACL, and it has been patched to avoid the offending code in libnl. So the proper test is to install the new official netcf build *only*, then run the nc program in a loop.

(In reply to comment #18)
> As the problem mentioned in comment 15 & comment 17, what should i do with
> this bug, should i close this one as VERIFIED and open a new one against
> another component?

Yes, that is a different bug, and looks to be a problem in libxml2, as Rich suggested. You should file a different BZ and put the component as libxml2 (for now at least).

In the meantime, if you no longer encounter the old crash when you run nc in a loop with the updated netcf installed *but without the unofficial libnl update*, then you should mark this bug as VERIFIED.
(In reply to comment #19)
> (In reply to comment #13)
> > Install the new libnl from https://github.com/tgraf/libnl-1.1-stable
> > and update netcf to netcf-0.1.9-3.el6.x86_64.
>
> You *should not* install a new libnl from some external source to test this
> bz. That version of libnl will not be included in RHEL6.4 (because libnl
> isn't in the ACL). We were able to get netcf added to the ACL, and it has
> been patched to avoid the offending code in libnl. So the proper test is to
> install the new official netcf build *only*, then run the nc program in a
> loop.
>

I see, thanks for the detailed explanation. I installed only the latest netcf package (netcf-0.1.9-3.el6.x86_64) and ran the nc program in a loop:

#for i in {1..770}; do ./nc & (sleep 1; kill -9 `pidof ./nc`); sleep 1; done &> nc.log

After the loop finished, there were 3 'core dumped' messages.

# tail -20 /var/log/messages
Jan 9 10:57:03 intel-8400-8-2 abrtd: Directory 'ccpp-2013-01-09-10:57:02-26195' creation detected
Jan 9 10:57:03 intel-8400-8-2 abrt[26218]: Saved core dump of pid 26195 (/root/nc) to /var/spool/abrt/ccpp-2013-01-09-10:57:02-26195 (213925888 bytes)
Jan 9 10:57:03 intel-8400-8-2 abrtd: Executable '/root/nc' doesn't belong to any package
Jan 9 10:57:03 intel-8400-8-2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-01-09-10:57:02-26195' exited with 1
Jan 9 10:57:03 intel-8400-8-2 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2013-01-09-10:57:02-26195, deleting
Jan 9 11:08:37 intel-8400-8-2 abrtd: Directory 'ccpp-2013-01-09-11:08:37-2293' creation detected
Jan 9 11:08:37 intel-8400-8-2 abrt[2316]: Saved core dump of pid 2293 (/root/nc) to /var/spool/abrt/ccpp-2013-01-09-11:08:37-2293 (214310912 bytes)
Jan 9 11:08:37 intel-8400-8-2 abrtd: Executable '/root/nc' doesn't belong to any package
Jan 9 11:08:37 intel-8400-8-2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-01-09-11:08:37-2293' exited with 1
Jan 9 11:08:37 intel-8400-8-2 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2013-01-09-11:08:37-2293, deleting
Jan 9 11:13:13 intel-8400-8-2 dhclient[1251]: DHCPREQUEST on switch to 10.66.78.111 port 67 (xid=0x12d18fac)
Jan 9 11:13:13 intel-8400-8-2 dhclient[1251]: DHCPACK from 10.66.78.111 (xid=0x12d18fac)
Jan 9 11:13:14 intel-8400-8-2 dhclient[1251]: bound to 10.66.83.216 -- renewal in 6471 seconds.
Jan 9 11:21:07 intel-8400-8-2 abrt[11594]: Saved core dump of pid 11571 (/root/nc) to /var/spool/abrt/ccpp-2013-01-09-11:21:06-11571 (214011904 bytes)
Jan 9 11:21:07 intel-8400-8-2 abrtd: Directory 'ccpp-2013-01-09-11:21:06-11571' creation detected
Jan 9 11:21:07 intel-8400-8-2 abrtd: Executable '/root/nc' doesn't belong to any package
Jan 9 11:21:07 intel-8400-8-2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2013-01-09-11:21:06-11571' exited with 1
Jan 9 11:21:07 intel-8400-8-2 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2013-01-09-11:21:06-11571, deleting
Jan 9 11:23:38 intel-8400-8-2 dnsmasq-dhcp[18747]: DHCPREQUEST(virbr0) 192.168.122.93 52:54:00:99:72:50
Jan 9 11:23:38 intel-8400-8-2 dnsmasq-dhcp[18747]: DHCPACK(virbr0) 192.168.122.93 52:54:00:99:72:50

> (In reply to comment #18)
> > As the problem mentioned in comment 15 & comment 17, what should i do with
> > this bug, should i close this one as VERIFIED and open a new one against
> > another component?
>
> Yes, that is a different bug, and looks to be a problem in libxml2, as Rich
> suggested. You should file a different BZ and put the component as libxml2
> (for now at least).
>
> In the meantime, if you no longer encounter the old crash when you run nc in
> a loop with the updated netcf installed *but without the unofficial libnl
> update*, then you should mark this bug as VERIFIED.

Yes, I no longer encounter the crash, only the 3 'core dumped' cases. Do you think it's OK to mark this bug as VERIFIED?
Created attachment 675236 [details] run nc.c in a loop
Yes. Mark this bug as VERIFIED and open a new bug against libxml2 for the new crashes. If you can collect the coredump, run gdb on it, and attach the output of "thread apply all bt" to that new bug, I'm sure it would be very helpful.
(In reply to comment #22)
> Yes. Mark this bug as VERIFIED and open a new bug against libxml2 for the
> new crashes. If you can collect the coredump, run gdb on it, and attach the
> output of "thread apply all bt" to that new bug, I'm sure it would be very
> helpful.

Thanks.
I am moving this one to VERIFIED, and I will file a new bug against libxml2.
(In reply to comment #23)
> (In reply to comment #22)
> > Yes. Mark this bug as VERIFIED and open a new bug against libxml2 for the
> > new crashes. If you can collect the coredump, run gdb on it, and attach the
> > output of "thread apply all bt" to that new bug, I'm sure it would be very
> > helpful.
>
> Thanks.
> Move this one to VERIFIED, and i will file a new bug against libxml2.

I can NOT reproduce the crash anymore, so I will not file a new bug until I encounter it again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-0494.html