Description of problem:

libvirtd appears to close a connection while the client (VDSM) is still reading from it during a migration.

In the libvirtd logs we see the following closure of fd 22 that appears to be used here :

libvirtd.log.1

124068 2013-04-11 12:17:20.214+0000: 3874: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f223805d3d0
124069 2013-04-11 12:17:20.214+0000: 3874: debug : virFileClose:72 : Closed fd 22

On the client (VDSM) we see a virNetSocketReadWire error :

libvirt.log

2198 2013-04-11 12:17:20.214+0000: 10889: error : virNetSocketReadWire:1176 : Cannot recv data: Connection reset by peer

This then causes the vdsmd service to restart itself :

vdsm.log

7268 Thread-3491::DEBUG::2013-04-11 14:16:48,961::libvirtvm::454::vm.Vm::(_startUnderlyingMigration) vmId=`b7b3e8a8-0426-4d93-91dc-0b900040df06`::starting migration to qemu+tls://10.99.50.15/system
[..]
7450 Thread-3491::ERROR::2013-04-11 14:17:20,214::libvirtconnection::93::vds::(wrapper) connection to libvirt broken. taking vdsm down.
[..]
7752 MainThread::INFO::2013-04-11 14:17:22,035::vdsm::94::vds::(run) VDSM main thread ended. Waiting for 24 other threads..
[..]
7797 MainThread::INFO::2013-04-11 14:17:22,297::vdsm::88::vds::(run) I am the actual vdsm 4.10-1.8 hyp06.rhev.antagonist.nl (2.6.32-358.2.1.el6.x86_64)

libvirtd.log, libvirt.log (collected by VDSM) and vdsm.log will be attached.

Version-Release number of selected component (if applicable):

libvirt-0.10.2-18.el6.x86_64
vdsm-4.10.2-1.6.el6.x86_64 (client)

How reproducible:

Unclear at this time, appears to be a single hit in this env.

Steps to Reproduce:

Unclear.

Actual results:

Client connection closed early by libvirtd?

Expected results:

Client connection not closed early.

Additional info:
Moving this BZ to the vdsm component.

Jiri highlighted over IRC that libvirtd is closing the connection because of keepalives:

124047 2013-04-11 12:17:20.213+0000: 3874: warning : virKeepAliveTimerInternal:156 : No response from client 0xf27000 after 5 keepalive messages in 30 seconds
[..]

Keepalives are once again enabled by default in 6.4 (BZ#832081), and in this case the onus is on vdsmd to recover, reconnect etc.

Do we need to disable keepalives? Would BZ#949192 help here?

Lee
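For reference, the 30 seconds in the warning above is consistent with the libvirtd.conf defaults (keepalive_interval = 5, keepalive_count = 5), under which the connection is closed after roughly keepalive_interval * (keepalive_count + 1) seconds without a response from the client. A minimal sketch of that arithmetic in Python follows. The function name is made up for illustration and is not part of libvirt or vdsm, and treating a non-positive interval as "disabled" is a simplification of this sketch rather than libvirt's exact semantics.

```python
# Illustrative only: models the keepalive timeout arithmetic described in
# libvirtd.conf. This is not libvirt code.

def keepalive_close_after(interval_s, count):
    """Approximate seconds of silence before the peer is declared dead.

    interval_s: seconds between keepalive messages (keepalive_interval)
    count:      unanswered keepalives tolerated (keepalive_count)
    """
    if interval_s <= 0:
        # Assumption of this sketch: a non-positive interval means
        # keepalives are not sent, so no timeout applies.
        return None
    return interval_s * (count + 1)


# libvirtd.conf defaults: 5 second interval, 5 keepalive messages.
print(keepalive_close_after(5, 5))  # 30
```

So a client that is blocked for about 30 seconds and cannot answer keepalives would be dropped with these defaults.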
*** This bug has been marked as a duplicate of bug 951576 ***
Yaniv, please check the question in comment #5.
This appears to occur only because keepalives are enabled and do not work reliably in all libvirt versions (notably libvirt-0.10.2-18.el6.x86_64, mentioned in the initial report). Either the subsequent libvirt fixes or disabling vdsm's use of libvirt connection keepalives (bug 834041) would be sufficient to prevent this problem; marking as duplicate.

*** This bug has been marked as a duplicate of bug 834041 ***