Description of problem:
libvirtd appears to close a connection while the client (VDSM) is still reading from it during a migration.
In the libvirtd logs we see fd 22, which appears to be the connection in use here, being closed:
libvirtd.log.1
124068 2013-04-11 12:17:20.214+0000: 3874: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f223805d3d0
124069 2013-04-11 12:17:20.214+0000: 3874: debug : virFileClose:72 : Closed fd 22
On the client (VDSM) we see a virNetSocketReadWire error:
libvirt.log
2198 2013-04-11 12:17:20.214+0000: 10889: error : virNetSocketReadWire:1176 : Cannot recv data: Connection reset by peer
This in turn causes the vdsmd service to restart itself:
vdsm.log
7268 Thread-3491::DEBUG::2013-04-11 14:16:48,961::libvirtvm::454::vm.Vm::(_startUnderlyingMigration) vmId=`b7b3e8a8-0426-4d93-91dc-0b900040df06`::starting migration to qemu+tls://10.99.50.15/system
[..]
7450 Thread-3491::ERROR::2013-04-11 14:17:20,214::libvirtconnection::93::vds::(wrapper) connection to libvirt broken. taking vdsm down.
[..]
7752 MainThread::INFO::2013-04-11 14:17:22,035::vdsm::94::vds::(run) VDSM main thread ended. Waiting for 24 other threads..
[..]
7797 MainThread::INFO::2013-04-11 14:17:22,297::vdsm::88::vds::(run) I am the actual vdsm 4.10-1.8 hyp06.rhev.antagonist.nl (2.6.32-358.2.1.el6.x86_64)
libvirtd.log, libvirt.log (collected by VDSM) and vdsm.log will be attached.
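When lining up the excerpts above, note that the libvirt timestamps carry an explicit +0000 offset while the vdsm.log timestamps appear to be local time two hours ahead. The two-hour offset is inferred from the matching milliseconds (12:17:20.214 vs 14:17:20,214) and is not stated in the logs. A minimal Python sketch for comparing the two formats, with hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

def parse_libvirt_ts(text):
    """Parse a libvirt log timestamp such as '2013-04-11 12:17:20.214+0000'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f%z")

def parse_vdsm_ts(text, utc_offset_hours):
    """Parse a vdsm.log timestamp such as '2013-04-11 14:17:20,214'.

    vdsm.log carries no timezone, so the caller supplies the offset
    (an assumption: +2 hours for this environment).
    """
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))

libvirt_err = parse_libvirt_ts("2013-04-11 12:17:20.214+0000")
vdsm_err = parse_vdsm_ts("2013-04-11 14:17:20,214", 2)
# True if both lines describe the same instant under the assumed offset
print(libvirt_err == vdsm_err)
```

If the assumed offset is right, the "Connection reset by peer" error and the "connection to libvirt broken" line fall on the same millisecond.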
Version-Release number of selected component (if applicable):
libvirt-0.10.2-18.el6.x86_64
vdsm-4.10.2-1.6.el6.x86_64 (client)
How reproducible:
Unclear at this time; this appears to be a single occurrence in this environment.
Steps to Reproduce:
Unclear.
Actual results:
Client connection closed early by libvirtd?
Expected results:
Client connection not closed early.
Additional info:
Moving this BZ to the vdsm component. Jiri highlighted over IRC that libvirtd closed the connection because its keepalive messages went unanswered:
124047 2013-04-11 12:17:20.213+0000: 3874: warning : virKeepAliveTimerInternal:156 : No response from client 0xf27000 after 5 keepalive messages in 30 seconds
[..]
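The "5 keepalive messages in 30 seconds" in the warning is consistent with a keepalive interval of 5 seconds and a count of 5, which I believe are the libvirtd.conf defaults (keepalive_interval, keepalive_count); the actual configuration on this host is not shown in the logs, so treat those values as an assumption. A minimal Python sketch of the arithmetic, with a hypothetical function name:

```python
def keepalive_timeout(interval_s, count):
    """Approximate seconds of client silence before the connection is closed.

    One interval elapses before the first keepalive is sent, then `count`
    keepalives go out one interval apart, giving interval * (count + 1).
    """
    return interval_s * (count + 1)

# Assumed values: keepalive_interval = 5, keepalive_count = 5
print(keepalive_timeout(5, 5))
```

With the assumed values this gives 30 seconds, matching the warning.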
Keepalives are once again enabled by default in 6.4 (BZ#832081), and in this case the onus is on vdsmd to recover, reconnect, etc.
Do we need to disable keepalives, or would BZ#949192 help here?
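To illustrate what "recover, reconnect" could look like on the client side, here is a minimal Python sketch. It is not vdsm's actual code: ConnectionBroken, call_with_reconnect, connect and operation are all hypothetical stand-ins, and no real libvirt calls are used.

```python
import time

class ConnectionBroken(Exception):
    """Hypothetical stand-in for a 'connection to libvirt broken' error."""

def call_with_reconnect(connect, operation, retries=3, delay_s=0.0):
    """Run operation(conn); on ConnectionBroken, open a new connection and retry.

    `connect` is a caller-supplied factory returning a fresh connection.
    The last failure is re-raised once `retries` reconnects are used up.
    """
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return operation(conn)
        except ConnectionBroken:
            if attempt == retries:
                raise
            time.sleep(delay_s)  # optional pause before reconnecting
            conn = connect()
```

A blind retry like this is only reasonable for idempotent calls; an in-flight migration would need its state checked after reconnecting rather than simply being restarted.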
Lee
This appears to occur only because keepalives are enabled and do not work reliably in all libvirt versions (notably libvirt-0.10.2-18.el6.x86_64, mentioned in the initial report).
Either the subsequent libvirt fixes or disabling vdsm's use of libvirt connection keepalives (bug 834041) would be sufficient to prevent this problem; marking as duplicate.
*** This bug has been marked as a duplicate of bug 834041 ***