Bug 721411

Summary: [libvirt] When migrating a domain, libvirt starts logging the error "virNetSocketReadWire:826 : Cannot recv data: Input/output error" ~8000 times a second
Product: Red Hat Enterprise Linux 6
Reporter: David Naori <dnaori>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: urgent
Version: 6.1
CC: berrange, dallan, danken, dnaori, dyuan, gren, hateya, mgoldboi, mzhan, nzhang, rwu, veillard, weizhan, ykaul
Target Milestone: rc
Keywords: Regression
Hardware: Unspecified
OS: Linux
Fixed In Version: libvirt-0.9.3-5.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 11:16:33 UTC
Attachments:
Description                               Flags
source-log                                none
libvirtd log on the destination machine   none

Description David Naori 2011-07-14 14:15:18 UTC
Created attachment 512908
source-log

Description of problem:
When migrating a domain, the source machine starts logging the errors below repeatedly and never stops.

Every migration adds ~8000 lines a second. When several VMs are migrated, the logs are flooded and use up the disk space.

16:37:48.141: 14332: debug : qemuMonitorEmitStop:897 : mon=0x7f8a80166100
16:37:48.141: 14332: debug : qemuProcessHandleStop:471 : Transitioned guest TOEXPORT-06 to paused state due to unknown event
16:37:48.141: 14332: debug : qemuProcessHandleStop:481 : Preserving lock state '(null)'
16:37:48.143: 14332: debug : virDomainFree:2092 : dom=0x1f76360, (VM: name=TOEXPORT-06, uuid=cb206c72-7d64-473d-b784-25e63b7fd055), 
16:37:52.439: 14332: debug : qemuMonitorFree:209 : mon=0x7f8a80166100
16:37:54.356: 14332: debug : virDomainFree:2092 : dom=0x1f748e0, (VM: name=TOEXPORT-06, uuid=cb206c72-7d64-473d-b784-25e63b7fd055), 

16:37:54.364: 14332: error : virNetSocketReadWire:826 : Cannot recv data: Input/output error
16:37:54.364: 14332: error : virNetSocketReadWire:826 : Cannot recv data: Input/output error


Version-Release number of selected component (if applicable):
libvirt-0.9.3-3.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Migrate a domain and watch the logs.
  
Actual results:
No disk space is left on the / filesystem.

Expected results:


Additional info:
Attached the relevant parts of the logs from the source and destination hosts.

Comment 1 David Naori 2011-07-14 14:19:46 UTC
Created attachment 512909
libvirtd log on the destination machine

Comment 4 Daniel Berrangé 2011-07-14 16:58:09 UTC
I can reproduce the problem. It only occurs when performing peer-to-peer migration using a TLS-enabled URI for the migration function.

Comment 5 Daniel Berrangé 2011-07-14 17:17:39 UTC
We need two fixes, one on the server side and one on the client side:

commit 3cfdc57b8553cae95b8849bbcb7a4b227085cec1
Author: Daniel P. Berrange <berrange>
Date:   Fri Jul 8 12:54:29 2011 +0100

    Fix sending of reply to final RPC message
    
    The dispatch for the CLOSE RPC call was invoking the method
    virNetServerClientClose(). This caused the client connection
    to be immediately terminated. This meant the reply to the
    final RPC message was never sent. Prior to the RPC rewrite
    we merely flagged the connection for closing, and actually
    closed it when the next RPC call dispatch had completed.
    
    * daemon/remote.c: Flag connection for a delayed close
    * daemon/stream.c: Update to use new API for closing
      failed connection
    * src/rpc/virnetserverclient.c, src/rpc/virnetserverclient.h:
      Add support for a delayed connection close. Rename the
      virNetServerClientMarkClose method to virNetServerClientImmediateClose
      to clarify its semantics
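
The delayed close described in the commit message above can be illustrated with a minimal sketch. This is not libvirt's code: the struct and the functions `client_delayed_close`, `client_immediate_close`, `client_queue_reply` and `client_flush` are hypothetical names chosen for illustration. The assumption is that replies sit in a transmit queue and the connection is only torn down once a flagged close finds that queue empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified client state; not libvirt's real structure. */
struct client {
    bool open;          /* connection is still usable */
    bool want_close;    /* close requested, waiting for replies to drain */
    size_t pending_tx;  /* queued reply messages not yet sent */
    size_t sent;        /* replies actually delivered */
};

/* Immediate close: anything still queued is dropped. */
static void client_immediate_close(struct client *c)
{
    c->pending_tx = 0;
    c->open = false;
}

/* Delayed close: only record the request; the flush step acts on it. */
static void client_delayed_close(struct client *c)
{
    c->want_close = true;
}

/* Queue one reply for transmission. */
static void client_queue_reply(struct client *c)
{
    assert(c->open);
    c->pending_tx++;
}

/* Called when the socket is writable: send one queued reply, then
 * honour a pending close request once the queue is empty. */
static void client_flush(struct client *c)
{
    if (!c->open)
        return;
    if (c->pending_tx > 0) {
        c->pending_tx--;
        c->sent++;
    }
    if (c->want_close && c->pending_tx == 0)
        c->open = false;
}
```

With the delayed variant, a handler for the final call can flag the close, queue its reply, and still have that reply delivered by the next flush. With the immediate variant the queued reply is lost.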


commit afe8839f011c8c54c429f33ca0e6515fceb4e0fd
Author: Daniel P. Berrange <berrange>
Date:   Fri Jul 8 12:41:06 2011 +0100

    Fix leak of remote driver if final 'CLOSE' RPC call fails
    
    When closing a remote connection we issue a (fairly pointless)
    'CLOSE' RPC call to the daemon. If this fails we skip all the
    cleanup of private data, but the virConnectPtr object still
    gets released as normal. This causes a memory leak. Since the
    CLOSE RPC call is pretty pointless, just carry on freeing the
    remote driver if it fails.
    
    * src/remote/remote_driver.c: Ignore failure to issue CLOSE
      RPC call
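
The client-side leak described in the commit message above follows a common pattern: an early return on error skips the cleanup. The sketch below shows the leaky and the fixed shape side by side. All names (`struct conn`, `send_close_rpc`, `conn_close_leaky`, `conn_close_fixed`) are hypothetical stand-ins, not the remote driver's real API, and the RPC is simulated by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical private driver data and connection object. */
struct remote_priv { char *buf; };
struct conn { struct remote_priv *priv; };

/* Allocate the private data; returns 0 on success, -1 on failure. */
static int conn_init(struct conn *c)
{
    c->priv = malloc(sizeof *c->priv);
    if (!c->priv)
        return -1;
    c->priv->buf = malloc(16);
    if (!c->priv->buf) {
        free(c->priv);
        c->priv = NULL;
        return -1;
    }
    return 0;
}

/* Release the private data; safe to call more than once. */
static void conn_free_priv(struct conn *c)
{
    if (!c->priv)
        return;
    free(c->priv->buf);
    free(c->priv);
    c->priv = NULL;
}

/* Stand-in for the final CLOSE RPC: 0 on success, -1 on failure. */
static int send_close_rpc(struct conn *c, bool simulate_failure)
{
    (void)c;
    return simulate_failure ? -1 : 0;
}

/* Leaky shape: a failed CLOSE returns early and skips the cleanup. */
static int conn_close_leaky(struct conn *c, bool simulate_failure)
{
    if (send_close_rpc(c, simulate_failure) < 0)
        return -1;
    conn_free_priv(c);
    return 0;
}

/* Fixed shape: the CLOSE result is ignored and cleanup always runs. */
static int conn_close_fixed(struct conn *c, bool simulate_failure)
{
    assert(c != NULL);
    (void)send_close_rpc(c, simulate_failure);
    conn_free_priv(c);
    return 0;
}
```

In the leaky variant the private data is still allocated after a failed CLOSE, even though the caller goes on to release the connection object itself.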

Comment 6 Daniel Berrangé 2011-07-15 10:26:37 UTC
This additional patch would prevent the infinite loop from ever recurring in the event of similar bugs:

http://www.redhat.com/archives/libvir-list/2011-July/msg00946.html
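
The linked patch is not reproduced here. As a generic illustration only, one way to stop an error from being logged forever is to deactivate the socket watch on the first read error, so the event loop never invokes the callback for that socket again. The names below (`struct watch`, `on_readable_spinning`, `on_readable_guarded`, `run_loop`) are hypothetical, and the loop is a bounded simulation rather than a real event loop.

```c
#include <assert.h>
#include <stdbool.h>

enum read_result { READ_OK, READ_ERROR };

/* Hypothetical watch on a socket inside an event loop. */
struct watch {
    bool active;        /* loop keeps invoking the callback while true */
    int errors_logged;  /* number of error lines that would be logged */
};

typedef void (*read_cb)(struct watch *w, enum read_result r);

/* Unguarded callback: logs the error but leaves the watch active,
 * so a persistent error is reported on every loop iteration. */
static void on_readable_spinning(struct watch *w, enum read_result r)
{
    if (r == READ_ERROR)
        w->errors_logged++;
}

/* Guarded callback: logs once and deactivates the watch. */
static void on_readable_guarded(struct watch *w, enum read_result r)
{
    if (r == READ_ERROR) {
        w->errors_logged++;
        w->active = false;
    }
}

/* Simulate a socket that fails on every read, for at most
 * max_iterations loop turns; returns the number of logged errors. */
static int run_loop(struct watch *w, read_cb cb, int max_iterations)
{
    assert(max_iterations >= 0);
    for (int i = 0; i < max_iterations && w->active; i++)
        cb(w, READ_ERROR);
    return w->errors_logged;
}
```

Running 8000 simulated iterations logs 8000 errors with the unguarded callback and exactly one with the guarded callback.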

Comment 8 weizhang 2011-07-18 07:59:47 UTC
Verification passed with:
kernel-2.6.32-166.el6.x86_64
qemu-kvm-0.12.1.2-2.169.el6.x86_64
libvirt-0.9.3-5.el6.x86_64

Steps:
1. Prepare a TLS-enabled URI environment.
2. Set up an NFS server and mount the NFS share on both hosts.
3. Do a p2p migration using the TLS-enabled URI:
virsh migrate --live --p2p domain qemu+tls://{target ip}/system
4. Check /var/log/libvirt/libvirtd.log.
There is no Input/output error and the migration succeeds.
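
The log check in step 4 can be automated with a small helper that counts the lines of a log stream containing the error text. This is a minimal sketch: `count_matching_lines` is a hypothetical helper name, and it assumes the error string does not straddle a 4095-byte chunk boundary within a single very long line.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Count the lines in fp that contain needle. Lines longer than the
 * buffer are read in chunks and counted at most once; a match split
 * across two chunks of such a line would be missed. */
static long count_matching_lines(FILE *fp, const char *needle)
{
    char buf[4096];
    long count = 0;
    bool counted = false;  /* current line has already been counted */

    while (fgets(buf, sizeof buf, fp)) {
        if (!counted && strstr(buf, needle)) {
            count++;
            counted = true;
        }
        if (strchr(buf, '\n'))
            counted = false;  /* line ended, reset for the next one */
    }
    return count;
}
```

A caller would open /var/log/libvirt/libvirtd.log with `fopen` and pass "Cannot recv data: Input/output error" as the needle. A result of 0 after the migration matches the verified behaviour.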

The bug can be reproduced on libvirt-0.9.3-3.el6.x86_64.

Comment 9 errata-xmlrpc 2011-12-06 11:16:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html