Created attachment 513519 [details]
core file, libvirtd and vdsm logs

Environment:
RHEVM 3.0 on dev env, last commit 12b1f476f4f1a01bb86cb5687c19f4ef0f0784c6
libvirt-0.9.3-5.el6.x86_64, vdsm-4.9-81.el6.x86_64

Scenario:
Running an automation test that performs various VM operations.

from dmesg:
libvirtd[6685]: segfault at 0 ip (null) sp 00007fffc2f5eee8 error 14 in libvirtd[400000+107000]

from gdb:
Reading symbols from /lib64/libnss_dns-2.12.so...Reading symbols from /usr/lib/debug/lib64/libnss_dns-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libnss_dns-2.12.so
Core was generated by `libvirtd --daemon --listen'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000000000 in ?? ()
Missing separate debuginfos, use: debuginfo-install krb5-libs-1.9-9.el6.x86_64 libcurl-7.19.7-26.el6.x86_64 libgcrypt-1.4.5-5.el6.x86_64 openssl-1.0.0-10.el6.x86_64

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000003031a34150 in _gnutls_string_resize (dest=0x7ff500077788, new_size=<value optimized out>) at gnutls_str.c:192
#2  0x0000003031a1a614 in _gnutls_io_read_buffered (session=0x7ff500076ae0, iptr=0x7fffc2f5efe8, sizeOfPtr=5, recv_type=<value optimized out>) at gnutls_buffers.c:515
#3  0x0000003031a16031 in _gnutls_recv_int (session=0x7ff500076ae0, type=GNUTLS_APPLICATION_DATA, htype=4294967295, data=0x7ff50002f748 "", sizeofdata=4) at gnutls_record.c:904
#4  0x00007ff516a4145d in virNetTLSSessionRead (sess=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at rpc/virnettlscontext.c:812
#5  0x00007ff516a3d05d in virNetSocketReadWire (sock=0x7ff500012fe0, buf=0x7ff50002f748 "", len=4) at rpc/virnetsocket.c:801
#6  0x00007ff516a3d2d0 in virNetSocketRead (sock=0x7ff500012fe0, buf=0x7ff50002f748 "", len=4) at rpc/virnetsocket.c:981
#7  0x00007ff516a395ed in virNetClientIOReadMessage (client=0x7ff50002f6f0) at rpc/virnetclient.c:717
#8  virNetClientIOHandleInput (client=0x7ff50002f6f0) at rpc/virnetclient.c:736
#9  0x00007ff516a3add0 in virNetClientIncomingEvent (sock=0x7ff500012fe0, events=<value optimized out>, opaque=0x7ff50002f6f0) at rpc/virnetclient.c:1127
#10 0x00007ff5169986b2 in virEventPollDispatchHandles () at util/event_poll.c:469
#11 virEventPollRunOnce () at util/event_poll.c:610
#12 0x00007ff516997567 in virEventRunDefaultImpl () at util/event.c:247
#13 0x000000000043c9dd in virNetServerRun (srv=0x7aaae0) at rpc/virnetserver.c:662
#14 0x000000000041d828 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:1552
(gdb)
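The "segfault at 0 ip (null)" together with the unresolvable frame #0 suggests the daemon called through a function pointer that no longer pointed at valid code while gnutls was mid-read. As a rough illustration of how that class of failure can arise (a minimal sketch, not libvirt's actual code; demo_sock and demo_pull are made-up names), code of this kind wires a TLS session to its socket through the gnutls transport callbacks:

    /* Minimal sketch (not libvirt's actual code) of wiring a TLS session
     * to a socket via gnutls transport callbacks.  If the object behind
     * the transport pointer is freed while gnutls is still mid-read, the
     * next pull dereferences freed memory, which is consistent with the
     * "ip (null)" fault above. */
    #include <gnutls/gnutls.h>
    #include <unistd.h>

    struct demo_sock { int fd; };          /* hypothetical stand-in */

    static ssize_t demo_pull(gnutls_transport_ptr_t opaque, void *buf, size_t len)
    {
        struct demo_sock *s = opaque;      /* dangling if already freed */
        return read(s->fd, buf, len);
    }

    static void demo_wire(gnutls_session_t session, struct demo_sock *s)
    {
        gnutls_transport_set_ptr(session, s);
        gnutls_transport_set_pull_function(session, demo_pull);
    }

Once the state behind the callbacks has been freed, the very next gnutls read can dispatch into garbage, so a crash deep inside gnutls does not necessarily mean a gnutls bug.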
I managed to reproduce this issue too. It happens when migrating the same domain back and forth between two hosts; libvirtd may hang or crash. An easy way to reproduce it is running the RUTH automated test migrateVmTests.MigrateVmTest.migrationSize:

    def migrationSize(self):
        initSize = self.agent1.getLVSize(self.sdUUID, self.volUUID)
        self.log.debug("initSize: %s", initSize)
        self.vm.migrate(self.client2)
        midSize = self.agent1.getLVSize(self.sdUUID, self.volUUID)
        self.log.debug("midSize: %s", midSize)
        self.assertEqual(initSize, midSize)
        self.vm.migrate(self.client)
        endSize = self.agent1.getLVSize(self.sdUUID, self.volUUID)
        self.log.debug("endSize: %s", endSize)
        self.assertEqual(initSize, endSize)

bt full of the core dump:

#1  0x0000003b4ac34150 in _gnutls_string_resize (dest=0x7f9f240cb0e8, new_size=<value optimized out>) at gnutls_str.c:192
        unused = 0
        alloc_len = 512
#2  0x0000003b4ac1a614 in _gnutls_io_read_buffered (session=0x7f9f240ca440, iptr=0x7fff66112478, sizeOfPtr=5, recv_type=<value optimized out>) at gnutls_buffers.c:515
        ret = 0
        ret2 = 0
        min = <value optimized out>
        buf_pos = <value optimized out>
        buf = <value optimized out>
        recvlowat = 0
        recvdata = <value optimized out>
#3  0x0000003b4ac16031 in _gnutls_recv_int (session=0x7f9f240ca440, type=GNUTLS_APPLICATION_DATA, htype=4294967295, data=0x7f9f240cfdd8 "", sizeofdata=4) at gnutls_record.c:904
        tmp = <value optimized out>
        decrypted_length = <value optimized out>
        version = <value optimized out>
        headers = 0x0
        recv_type = <value optimized out>
        length = <value optimized out>
        recv_data = 0x6f00000006 <Address 0x6f00000006 out of bounds>
        ret = <value optimized out>
        ret2 = <value optimized out>
        header_size = 5
        empty_packet = <value optimized out>
#4  0x00007f9f3f72045d in virNetTLSSessionRead (sess=<value optimized out>, buf=<value optimized out>, len=<value optimized out>) at rpc/virnettlscontext.c:812
        ret = <value optimized out>
#5  0x00007f9f3f71c05d in virNetSocketReadWire (sock=0x7f9f240c6300, buf=0x7f9f240cfdd8 "", len=4) at rpc/virnetsocket.c:801
        errout = 0x0
        ret = <value optimized out>
        __FUNCTION__ = "virNetSocketReadWire"
#6  0x00007f9f3f71c2d0 in virNetSocketRead (sock=0x7f9f240c6300, buf=0x7f9f240cfdd8 "", len=4) at rpc/virnetsocket.c:981
No locals.
#7  0x00007f9f3f7185ed in virNetClientIOReadMessage (client=0x7f9f240cfd80) at rpc/virnetclient.c:717
        wantData = <value optimized out>
        ret = <value optimized out>
#8  virNetClientIOHandleInput (client=0x7f9f240cfd80) at rpc/virnetclient.c:736
        ret = <value optimized out>
#9  0x00007f9f3f719dd0 in virNetClientIncomingEvent (sock=0x7f9f240c6300, events=<value optimized out>, opaque=0x7f9f240cfd80) at rpc/virnetclient.c:1127
        client = 0x7f9f240cfd80
        __func__ = "virNetClientIncomingEvent"
        __FUNCTION__ = "virNetClientIncomingEvent"
#10 0x00007f9f3f6776b2 in virEventPollDispatchHandles () at util/event_poll.c:469
        cb = 0x7f9f3f71bb80 <virNetSocketEventHandle>
        watch = 10
        opaque = 0x7f9f240c6300
        hEvents = 1
        i = 8
        n = <value optimized out>
#11 virEventPollRunOnce () at util/event_poll.c:610
        fds = 0x1d3b020
        ret = <value optimized out>
        timeout = <value optimized out>
        nfds = 9
        __func__ = "virEventPollRunOnce"
        __FUNCTION__ = "virEventPollRunOnce"
#12 0x00007f9f3f676567 in virEventRunDefaultImpl () at util/event.c:247
        __func__ = "virEventRunDefaultImpl"
#13 0x000000000043c9dd in virNetServerRun (srv=0x1ccaae0) at rpc/virnetserver.c:662
        timerid = -1
        timerActive = 0
        i = <value optimized out>
        __FUNCTION__ = "virNetServerRun"
        __func__ = "virNetServerRun"
#14 0x000000000041d828 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:1552
        srv = 0x1ccaae0
        remote_config_file = 0x1cca1f0 "/etc/libvirt/libvirtd.conf"
        statuswrite = -1
        ret = 1
        pid_file = 0x1cd65f0 "/var/run/libvirtd.pid"
        sock_file = 0x1cd67e0 "/var/run/libvirt/libvirt-sock"
        sock_file_ro = 0x1cd67b0 "/var/run/libvirt/libvirt-sock-ro"
        timeout = -1
        verbose = 0
        godaemon = 1
        ipsock = 1
        config = 0x1cca890
        privileged = true
        opts = {{name = 0x4c6fca "verbose", has_arg = 0, flag = 0x7fff661128f4, val = 1}, {name = 0x4c6fd2 "daemon", has_arg = 0, flag = 0x7fff661128f0, val = 1}, {name = 0x4db21e "listen", has_arg = 0, flag = 0x7fff661128ec, val = 1}, {name = 0x4d242a "config", has_arg = 1, flag = 0x0, val = 102}, {name = 0x4e47fc "timeout", has_arg = 1, flag = 0x0, val = 116}, {name = 0x4ea520 "pid-file", has_arg = 1, flag = 0x0, val = 112}, {name = 0x4d5725 "version", has_arg = 0, flag = 0x0, val = 129}, {name = 0x4d5912 "help", has_arg = 0, flag = 0x0, val = 63}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
        __func__ = "main"
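Stripped of the RUTH plumbing, the reproducer is simply a live migration of one domain to a peer and straight back. A hedged C equivalent using the public libvirt API (the URIs and guest name are placeholders) would look like:

    /* Sketch of the back-and-forth migration from the RUTH test above,
     * via the public libvirt API.  Host URIs and the guest name are
     * placeholders; error reporting is reduced to bare exit codes. */
    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr src = virConnectOpen("qemu+tls://host1/system");
        virConnectPtr dst = virConnectOpen("qemu+tls://host2/system");
        if (!src || !dst)
            return 1;

        virDomainPtr dom = virDomainLookupByName(src, "guest_name");
        if (!dom)
            return 1;

        /* Migrate to host2 and immediately back, the pattern that
         * triggered the libvirtd hang/crash. */
        virDomainPtr dom2 = virDomainMigrate(dom, dst, VIR_MIGRATE_LIVE,
                                             NULL, NULL, 0);
        if (!dom2)
            return 1;
        virDomainFree(dom);

        dom = virDomainMigrate(dom2, src, VIR_MIGRATE_LIVE, NULL, NULL, 0);
        if (!dom)
            return 1;

        virDomainFree(dom2);
        virDomainFree(dom);
        virConnectClose(dst);
        virConnectClose(src);
        return 0;
    }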
I knew why it sounded familiar: Wen Congyang reported this issue on the list:

https://www.redhat.com/archives/libvir-list/2011-July/msg00547.html

There is a problem with calling virNetSocketFree while the structures are still in use. This could very well be the cause of bug 722748 too.

Daniel
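In other words, this is the classic event-loop use-after-free: the socket is freed while the event loop still holds a pointer to it, so the next dispatch calls through dangling memory. A generic sketch of the broken pattern (illustrative only, not libvirt's actual code):

    /* Generic sketch of the race described above, not libvirt's code:
     * an I/O callback can fire after the object it captured was freed,
     * so the dispatcher calls through dangling pointers. */
    #include <stdlib.h>

    typedef void (*io_cb)(void *opaque);

    struct sock {
        int fd;
        io_cb cb;      /* invoked by the event loop when fd is readable */
        void *opaque;
    };

    static void sock_free(struct sock *s)
    {
        /* BUG pattern: freeing while the event loop may still hold s.
         * The next dispatch reads s->cb out of freed memory. */
        free(s);
    }

    static void dispatch(struct sock *s)
    {
        s->cb(s->opaque);   /* may jump to garbage (often address 0) */
    }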
Hmm, there is a problem with the libvirtd.log. When libvirtd crashes, it dumps the debug data generated by the most recent operations (see for example the libvirtd.log from bug 722748), but here there is none, so this is not the libvirtd log of a crashed libvirt. Could you reproduce again, let the daemon crash outside of gdb, and then collect the log?

Thanks,
Daniel
This series should fix the problem:

https://www.redhat.com/archives/libvir-list/2011-July/msg01179.html
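The usual cure for this lifecycle problem is reference counting, so that the event loop holds its own reference and the object is freed only once every user has dropped theirs. A minimal sketch of that pattern (illustrative only; see the linked series for the actual fix, which also reworks locking):

    /* Sketch of a reference-counting scheme for the socket object,
     * illustrative only.  Real code would guard refs with a mutex. */
    #include <stdlib.h>

    struct sock {
        int refs;
        /* ... fd, callback, opaque ... */
    };

    static struct sock *sock_ref(struct sock *s)
    {
        s->refs++;
        return s;
    }

    static void sock_unref(struct sock *s)
    {
        if (--s->refs == 0)
            free(s);        /* freed only when no user remains */
    }

With this scheme the event loop takes a reference when a handle is registered and drops it when the handle is removed, so a free initiated from the client side can no longer pull memory out from under a pending callback.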
We should also add this series to improve error reporting for migration:

https://www.redhat.com/archives/libvir-list/2011-July/msg01201.html
Verification passed with:

kernel-2.6.32-166.el6.x86_64
libvirt-0.9.3-6.el6.x86_64
qemu-kvm-0.12.1.2-2.169.el6.x86_64

Reproduce steps:

1. Build a TLS environment from the source host to the target host:
   server: target
   client: source

2. Build a TLS environment from the source host to itself, with the same cacert.pem:
   server: source
   client: source

3. On the source host, test the TLS environment:

# virsh -c qemu+tls://{target ip}/system
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # exit

# virsh -c qemu+tls://{source ip}/system
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # exit

4. Start a guest on the target host whose image is on shared NFS mounted on both sides:

# setsebool virt_use_nfs 1
# iptables -F

5. Do the migration on the source host (see the C-API equivalent sketched below):

# virsh -c qemu+tls://{target ip}/system migrate --p2p guest_name qemu+tls://{source ip}/system

Result: there is no libvirtd segmentation fault on the target host.
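For completeness, the virsh command in step 5 can also be driven through the C API. A minimal sketch, assuming placeholder URIs and the guest name from step 4; --p2p corresponds to the VIR_MIGRATE_PEER2PEER flag, under which the source libvirtd opens the connection to the destination itself:

    /* C-API equivalent of the virsh command in step 5 (sketch; URIs and
     * the guest name are placeholders). */
    #include <stdio.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen("qemu+tls://TARGET_IP/system");
        if (!conn)
            return 1;

        virDomainPtr dom = virDomainLookupByName(conn, "guest_name");
        if (!dom)
            return 1;

        /* Peer-to-peer live migration back to the source host. */
        if (virDomainMigrateToURI(dom, "qemu+tls://SOURCE_IP/system",
                                  VIR_MIGRATE_PEER2PEER | VIR_MIGRATE_LIVE,
                                  NULL, 0) < 0)
            fprintf(stderr, "migration failed\n");

        virDomainFree(dom);
        virConnectClose(conn);
        return 0;
    }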
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1513.html