Bug 721411 - [libvirt] When migrating domain libvirt start logging the error " virNetSocketReadWire:826 : Cannot recv data: Input/output error" ~8000 times a sec
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.1
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-07-14 14:15 UTC by David Naori
Modified: 2011-12-06 11:16 UTC

Fixed In Version: libvirt-0.9.3-5.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 11:16:33 UTC
Target Upstream Version:


Attachments
source-log (66.23 KB, application/x-xz)
2011-07-14 14:15 UTC, David Naori
libvirtd log on the destination machine (15.24 KB, application/x-xz)
2011-07-14 14:19 UTC, David Naori


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1513 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2011-12-06 01:23:30 UTC

Description David Naori 2011-07-14 14:15:18 UTC
Created attachment 512908
source-log

Description of problem:
When migrating a domain, the source machine starts logging the errors below repeatedly and never stops.

Each migration adds ~8000 lines per second. When several VMs are migrated, the logs are flooded and use up the disk space.

16:37:48.141: 14332: debug : qemuMonitorEmitStop:897 : mon=0x7f8a80166100
16:37:48.141: 14332: debug : qemuProcessHandleStop:471 : Transitioned guest TOEXPORT-06 to paused state due to unknown event
16:37:48.141: 14332: debug : qemuProcessHandleStop:481 : Preserving lock state '(null)'
16:37:48.143: 14332: debug : virDomainFree:2092 : dom=0x1f76360, (VM: name=TOEXPORT-06, uuid=cb206c72-7d64-473d-b784-25e63b7fd055), 
16:37:52.439: 14332: debug : qemuMonitorFree:209 : mon=0x7f8a80166100
16:37:54.356: 14332: debug : virDomainFree:2092 : dom=0x1f748e0, (VM: name=TOEXPORT-06, uuid=cb206c72-7d64-473d-b784-25e63b7fd055), 

16:37:54.364: 14332: error : virNetSocketReadWire:826 : Cannot recv data: Input/output error
16:37:54.364: 14332: error : virNetSocketReadWire:826 : Cannot recv data: Input/output error


Version-Release number of selected component (if applicable):
libvirt-0.9.3-3.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Migrate a domain and watch the logs.

Actual results:
No disk space is left on the / directory.

Expected results:


Additional info:
Attached the relevant parts of the logs from the source and destination hosts.

Comment 1 David Naori 2011-07-14 14:19:46 UTC
Created attachment 512909
libvirtd log on the destination machine

Comment 4 Daniel Berrangé 2011-07-14 16:58:09 UTC
I can reproduce the problem. It only occurs when performing peer-to-peer migration using a TLS-enabled URI for the migration function.

Comment 5 Daniel Berrangé 2011-07-14 17:17:39 UTC
We need two fixes, one on the server side and one on the client side:

commit 3cfdc57b8553cae95b8849bbcb7a4b227085cec1
Author: Daniel P. Berrange <berrange>
Date:   Fri Jul 8 12:54:29 2011 +0100

    Fix sending of reply to final RPC message
    
    The dispatch for the CLOSE RPC call was invoking the method
    virNetServerClientClose(). This caused the client connection
    to be immediately terminated. This meant the reply to the
    final RPC message was never sent. Prior to the RPC rewrite
    we merely flagged the connection for closing, and actually
    closed it when the next RPC call dispatch had completed.
    
    * daemon/remote.c: Flag connection for a delayed close
    * daemon/stream.c: Update to use new API for closing
      failed connection
    * src/rpc/virnetserverclient.c, src/rpc/virnetserverclient.h:
      Add support for a delayed connection close. Rename the
      virNetServerClientMarkClose method to virNetServerClientImmediateClose
      to clarify its semantics
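
The difference between an immediate and a delayed close described in this commit message can be illustrated with a toy model. This is only a sketch in Python, not libvirt's C code: the class ToyClient, its methods, and the reply string "CLOSE-OK" are hypothetical names made up for the illustration.

```python
class ToyClient:
    """Toy stand-in for a server-side client connection (not libvirt's API)."""

    def __init__(self):
        self.sent = []              # replies actually written to the peer
        self.open = True
        self.delayed_close = False

    def immediate_close(self):
        # Tear the connection down right away; unsent replies are lost.
        self.open = False

    def mark_delayed_close(self):
        # Only record the intent; the connection stays usable for the reply.
        self.delayed_close = True

    def send_reply(self, reply):
        if not self.open:
            return False            # the reply is dropped
        self.sent.append(reply)
        if self.delayed_close:
            self.open = False       # close once the final reply is out
        return True


def dispatch_close(client, delayed):
    """Handle a CLOSE call, then try to send its reply."""
    if delayed:
        client.mark_delayed_close()
    else:
        client.immediate_close()
    return client.send_reply("CLOSE-OK")
```

With delayed=False the reply to CLOSE is never delivered, which matches the behaviour the commit describes as broken. With delayed=True the reply goes out first and the connection is closed afterwards.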


commit afe8839f011c8c54c429f33ca0e6515fceb4e0fd
Author: Daniel P. Berrange <berrange>
Date:   Fri Jul 8 12:41:06 2011 +0100

    Fix leak of remote driver if final 'CLOSE' RPC call fails
    
    When closing a remote connection we issue a (fairly pointless)
    'CLOSE' RPC call to the daemon. If this fails we skip all the
    cleanup of private data, but the virConnectPtr object still
    gets released as normal. This causes a memory leak. Since the
    CLOSE RPC call is pretty pointless, just carry on freeing the
    remote driver if it fails.
    
    * src/remote/remote_driver.c: Ignore failure to issue CLOSE
      RPC call
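
The client-side leak can be sketched the same way. Again this is a hypothetical Python model rather than the code in src/remote/remote_driver.c: ToyRemoteDriver, close_leaky, and close_fixed are invented names, and an OSError stands in for a failed CLOSE RPC call.

```python
class ToyRemoteDriver:
    """Toy stand-in for the remote driver's private data (not libvirt's API)."""

    def __init__(self, close_call_fails):
        self.close_call_fails = close_call_fails
        self.private_data = {"socket": "open"}

    def call_close(self):
        # Models the final 'CLOSE' RPC call to the daemon.
        if self.close_call_fails:
            raise OSError("CLOSE RPC call failed")


def close_leaky(driver):
    """Old behaviour: a failed CLOSE skips the cleanup of private data."""
    try:
        driver.call_close()
    except OSError:
        return False                # early exit, private data is never freed
    driver.private_data = None
    return True


def close_fixed(driver):
    """Fixed behaviour: ignore a failed CLOSE and free the data anyway."""
    try:
        driver.call_close()
    except OSError:
        pass                        # the CLOSE call is not essential
    driver.private_data = None
    return True
```

In the leaky variant the private data survives a failed CLOSE even though the caller releases the connection object. The fixed variant frees it on both paths.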

Comment 6 Daniel Berrangé 2011-07-15 10:26:37 UTC
This additional patch would prevent the infinite loop from ever recurring in the event of similar bugs:

http://www.redhat.com/archives/libvir-list/2011-July/msg00946.html

Comment 8 weizhang 2011-07-18 07:59:47 UTC
Verification passed on:
kernel-2.6.32-166.el6.x86_64
qemu-kvm-0.12.1.2-2.169.el6.x86_64
libvirt-0.9.3-5.el6.x86_64

Steps:
1. Prepare an environment with a TLS-enabled URI.
2. Prepare an NFS server and mount the NFS share on both hosts.
3. Do a p2p migration using the TLS-enabled URI:
virsh migrate --live --p2p domain qemu+tls://{target ip}/system
4. Check /var/log/libvirt/libvirtd.log.
There is no Input/output error and the migration succeeds.
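
The log check in step 4 could be automated along these lines. This is a minimal sketch, assuming the error text appears verbatim as in the description above. The function names are hypothetical, and the sample lines are copied from this report.

```python
ERROR_TEXT = "Cannot recv data: Input/output error"


def count_recv_errors(lines):
    """Count log lines that contain the error message from this report."""
    return sum(1 for line in lines if ERROR_TEXT in line)


def count_recv_errors_in_file(path):
    """Same check, applied to a log file read line by line."""
    with open(path, errors="replace") as log:
        return count_recv_errors(log)


# Sample lines taken from the description of this bug.
sample = [
    "16:37:52.439: 14332: debug : qemuMonitorFree:209 : mon=0x7f8a80166100",
    "16:37:54.364: 14332: error : virNetSocketReadWire:826 : "
    "Cannot recv data: Input/output error",
    "16:37:54.364: 14332: error : virNetSocketReadWire:826 : "
    "Cannot recv data: Input/output error",
]
```

On a fixed build, count_recv_errors_in_file("/var/log/libvirt/libvirtd.log") would be expected to return 0 after a migration, while the sample above contains two matching lines.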

The bug can be reproduced on libvirt-0.9.3-3.el6.x86_64.

Comment 9 errata-xmlrpc 2011-12-06 11:16:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html

