Bug 951571 - libvirtd closes a connection while the client is still reading during migration.
Summary: libvirtd closes a connection while the client is still reading during migration.
Keywords:
Status: CLOSED DUPLICATE of bug 834041
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.1.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.2.0
Assignee: Yaniv Bronhaim
QA Contact: Elad
URL:
Whiteboard: infra
Depends On:
Blocks:
Reported: 2013-04-12 13:47 UTC by Lee Yarwood
Modified: 2018-12-01 15:40 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-04-22 14:02:51 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:



Description Lee Yarwood 2013-04-12 13:47:14 UTC
Description of problem:

libvirtd appears to close a connection while the client (VDSM) is still reading from it during a migration. 

The libvirtd log shows fd 22, which appears to be the connection in question, being closed:

libvirtd.log.1

124068 2013-04-11 12:17:20.214+0000: 3874: debug : virObjectUnref:137 : OBJECT_DISPOSE: obj=0x7f223805d3d0
124069 2013-04-11 12:17:20.214+0000: 3874: debug : virFileClose:72 : Closed fd 22

On the client (VDSM) side, virNetSocketReadWire reports an error at the same timestamp:

libvirt.log 

2198 2013-04-11 12:17:20.214+0000: 10889: error : virNetSocketReadWire:1176 : Cannot recv data: Connection reset by peer 

This causes the vdsmd service to restart itself:

vdsm.log 

7268 Thread-3491::DEBUG::2013-04-11 14:16:48,961::libvirtvm::454::vm.Vm::(_startUnderlyingMigration) vmId=`b7b3e8a8-0426-4d93-91dc-0b900040df06`::starting migration to qemu+tls://10.99.50.15/system
[..]
7450 Thread-3491::ERROR::2013-04-11 14:17:20,214::libvirtconnection::93::vds::(wrapper) connection to libvirt broken. taking vdsm down.
[..]
7752 MainThread::INFO::2013-04-11 14:17:22,035::vdsm::94::vds::(run) VDSM main thread ended. Waiting for 24 other threads..
[..]
7797 MainThread::INFO::2013-04-11 14:17:22,297::vdsm::88::vds::(run) I am the actual vdsm 4.10-1.8 hyp06.rhev.antagonist.nl (2.6.32-358.2.1.el6.x86_64)

libvirtd.log, libvirt.log (collected by VDSM) and vdsm.log will be attached.
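
The correlation between the server-side close and the client-side error can be checked mechanically. The following is a minimal sketch, not part of VDSM or libvirt, that parses the two libvirt-format lines quoted above and compares their timestamps. The regular expression and the parse() helper are illustrative assumptions based only on the line layout visible in these excerpts.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpts above:
# "<date> <time>.<ms>+0000: <thread id>: <level> : <message>"
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\+0000: "
    r"(?P<tid>\d+): (?P<level>\w+) : (?P<msg>.*)"
)


def parse(line):
    """Return (timestamp, level, message) for a libvirt-style log line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group("level"), m.group("msg").strip()


# Lines copied from libvirtd.log.1 (server) and libvirt.log (client).
server = parse(
    "124069 2013-04-11 12:17:20.214+0000: 3874: debug : "
    "virFileClose:72 : Closed fd 22"
)
client = parse(
    "2198 2013-04-11 12:17:20.214+0000: 10889: error : "
    "virNetSocketReadWire:1176 : Cannot recv data: Connection reset by peer "
)

# Difference between the two events in milliseconds.
delta_ms = abs((client[0] - server[0]).total_seconds()) * 1000
print("server:", server[2])
print("client:", client[2])
print("delta (ms):", delta_ms)
```

Both lines carry the same millisecond timestamp, which supports reading the client error as a direct consequence of the server-side close. The vdsm.log timestamps are in local time (two hours ahead of the +0000 stamps here), so they would need an offset before being compared this way.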

Version-Release number of selected component (if applicable):
libvirt-0.10.2-18.el6.x86_64
vdsm-4.10.2-1.6.el6.x86_64 (client)


How reproducible:
Unclear at this time; this appears to be a single occurrence in this environment.

Steps to Reproduce:
Unclear.
  
Actual results:
Client connection closed early by libvirtd?

Expected results:
Client connection not closed early.

Additional info:

Comment 4 Lee Yarwood 2013-04-16 10:20:15 UTC
Moving this BZ to the vdsm component. Jiri pointed out over IRC that libvirtd closes the connection here because of keepalives:

124047 2013-04-11 12:17:20.213+0000: 3874: warning : virKeepAliveTimerInternal:156 : No response from client 0xf27000 after 5 keepalive messages in 30 seconds
[..]

Keepalives are once again enabled by default in RHEL 6.4 (BZ#832081), and in this case the onus is on vdsmd to recover, reconnect, etc.

Do we need to disable keepalives? Would BZ#949192 help here?

Lee
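
For reference, the "5 keepalive messages in 30 seconds" in the warning above is consistent with a keepalive interval of 5 seconds and a count of 5, where the peer is dropped after interval * (count + 1) seconds of silence. Those two values are my understanding of the libvirtd.conf defaults (keepalive_interval, keepalive_count) and are not stated in this report, so treat them as an assumption. A minimal sketch of the arithmetic, with keepalive_timeout() as a hypothetical helper:

```python
def keepalive_timeout(interval_s, count):
    """Seconds of silence after which the peer is considered dead.

    Assumed model: one keepalive request is sent every `interval_s`
    seconds of inactivity; after `count` unanswered requests and one
    more interval, the connection is closed. A non-positive interval
    is treated as "keepalive disabled".
    """
    if interval_s <= 0:
        return None
    return interval_s * (count + 1)


# Assumed defaults: keepalive_interval = 5, keepalive_count = 5.
timeout = keepalive_timeout(5, 5)
print(timeout)  # 30, matching the "30 seconds" in the warning

# Disabled keepalive (assumed convention: interval of -1).
print(keepalive_timeout(-1, 5))  # None
```

Under this model, a client thread that cannot answer keepalive requests for 30 seconds (for example because it is blocked during a migration) is enough to have the connection closed.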

Comment 6 Barak 2013-04-18 08:07:38 UTC

*** This bug has been marked as a duplicate of bug 951576 ***

Comment 7 Barak 2013-04-18 18:56:19 UTC
Yaniv,

Please check the question in comment #5

Comment 9 Greg Padgett 2013-04-22 14:02:51 UTC
This appears to occur only because keepalives are enabled and do not work reliably in every libvirt version (notably libvirt-0.10.2-18.el6.x86_64, mentioned in the initial report).

Either the subsequent libvirt fixes or disabling vdsm's use of libvirt connection keepalives (bug 834041) would be sufficient to prevent this problem; marking as duplicate.

*** This bug has been marked as a duplicate of bug 834041 ***
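
As an illustration of the alternative raised in comment 4 (vdsmd recovering and reconnecting instead of taking itself down), here is a minimal sketch of a reconnecting wrapper. It is not how VDSM's libvirtconnection module works and not what bug 834041 changes. ConnectionBroken, ReconnectingClient, and FakeConn are hypothetical names used only for this example, and the fake connection stands in for a real libvirt connection.

```python
import time


class ConnectionBroken(Exception):
    """Hypothetical stand-in for a 'connection reset by peer' error."""


class ReconnectingClient:
    """Retry a call on a fresh connection instead of giving up."""

    def __init__(self, connect, retries=3, delay_s=0.0):
        self._connect = connect    # hypothetical factory returning a new connection
        self._retries = retries
        self._delay_s = delay_s
        self._conn = connect()

    def call(self, method, *args):
        for attempt in range(self._retries + 1):
            try:
                return getattr(self._conn, method)(*args)
            except ConnectionBroken:
                if attempt == self._retries:
                    raise              # out of retries: propagate to the caller
                time.sleep(self._delay_s)
                self._conn = self._connect()  # replace the dead connection


class FakeConn:
    """Test double: the first instance behaves like a dropped connection."""

    instances = 0

    def __init__(self):
        FakeConn.instances += 1
        self._broken = FakeConn.instances == 1

    def getHostname(self):
        if self._broken:
            raise ConnectionBroken("Connection reset by peer")
        return "hyp06"


client = ReconnectingClient(FakeConn)
result = client.call("getHostname")
print(result, FakeConn.instances)
```

The first call fails on the broken connection, the wrapper opens a second one, and the retry succeeds. Blind retries are only reasonable for idempotent queries; a long-running operation such as a migration would need its state checked after reconnecting rather than being re-issued.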

