Bug 634069

Summary: Concurrent migrate multiple guests got libvirtd errors
Product: Red Hat Enterprise Linux 6
Version: 6.1
Component: libvirt
Reporter: Wayne Sun <gsun>
Assignee: Eric Blake <eblake>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dallan, eblake, gren, jialiu, llim, veillard, xen-maint, yoyzhang
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-06-17 18:29:51 UTC

Attachments:
/var/log/messages of source host
/var/log/messages of target host

Description Wayne Sun 2010-09-15 06:44:40 UTC
Created attachment 447395 [details]
/var/log/messages of source host

Description of problem:
When migrating multiple guests concurrently, libvirtd often reports errors like these:

libvirtd: 12:58:01.714: error : qemuMonitorJSONCommandWithFd:242 : cannot send monitor command '{"execute":"qmp_capabilities"}': Broken pipe

libvirtd: 11:14:41.085: error : qemuMonitorOpenUnix:279 : monitor socket did not show up.: Connection refused
libvirtd: 11:14:41.089: error : qemudWaitForMonitor:2550 : internal error process exited while connecting to monitor: char device redirected to /dev/pts/30#012inet_listen_opts: bind(ipv4,127.0.0.1,5926): Address already in use#012inet_listen_opts: FAILED#012

When this error occurs, the guest fails to migrate and is left broken; it has to be restarted.

In my test I migrated 40 guests at the same time: 36 succeeded and 4 failed. When migrating the 36 guests back, 29 succeeded and 7 failed, so the problem is more severe on the return migration.
I also tried migrating 30 and 20 guests and hit the same problem.

I'm using two large boxes, each with 48 CPUs and 500 GB of memory. The guests are minimal RHEL 6 guests.

Version-Release number of selected component (if applicable):
RC1 build: 20100826.1
# rpm -q libvirt qemu-kvm kernel
libvirt-0.8.1-27.el6.x86_64
qemu-kvm-0.12.1.2-2.113.el6.x86_64
kernel-2.6.32-71.el6.x86_64

How reproducible:
Often

Steps to Reproduce:
1. Concurrently run "virsh migrate --live guestname qemu+ssh://address/system" for each guest (see the sketch under Additional info below).
  
Actual results:
Concurrent migration of multiple guests fails with the errors above.

Expected results:
Concurrent migration of multiple guests completes without errors.

Additional info:
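A rough reproduction sketch of the concurrent migration, assuming placeholder guest names and a placeholder destination host (the test above used 40 guests):

for guest in guest01 guest02 guest03; do    # placeholder guest names
    virsh migrate --live "$guest" qemu+ssh://target.example.com/system &
done
wait    # collect the background migrations; failed ones report their errors here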

Comment 1 Wayne Sun 2010-09-15 06:46:50 UTC
Created attachment 447396 [details]
/var/log/messages of target host

Comment 3 Wayne Sun 2010-09-15 07:51:12 UTC
For bi-directional concurrent migration of multiple guests, I tried migrating 20 guests in each direction between the two boxes. A few guests again failed to migrate: 2 on one box and 6 on the other. The errors are the same, and there is also:
error: cannot send monitor command '{"execute":"qmp_capabilities"}': Connection reset by peer

Comment 4 Daniel Veillard 2011-01-12 07:18:36 UTC
Pasting here the explanation I gave in the IRC channel:

[15:14] <DV> gsun: the reason is in libvirt source: daemon/libvirtd.c
[15:14] <DV> static int min_workers = 5;
[15:14] <DV> static int max_workers = 20;
[15:14] <DV> static int max_clients = 20;
[15:15] <DV> in practice we allow only 20 simultaneous connections to a given libvirt daemon
[15:15] <DV> when doing a migration I think we open connections both ways
[15:16] <DV> add 2 connections for virtmanager and you know why only 18 migrations succeeded
[15:16] <DV> and 2 failed with no connections.

So this is not fixable without increasing that value and rebuilding libvirt.
Maybe we should do this ...
Retargeting for 6.1; maybe we can increase the number of connections without
harm.

Daniel

Comment 5 Daniel Veillard 2011-01-12 07:30:11 UTC
Actually, we can raise the number of connections just from
  /etc/libvirt/libvirtd.conf

and that's sufficient for the test:

[15:18] <gsun> DV, oh, I see. But modifying libvirtd.conf can change the max clients, right?
[15:18] <DV> hum
[15:19] <gsun> DV, last time I did modify it and pushed the migration to 40 guests, and 36 succeeded
[15:19] <DV> ah yes

Daniel
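
For reference, a sketch of the kind of change discussed above in /etc/libvirt/libvirtd.conf; the values are examples only, and should be set higher than the number of concurrent migrations plus any extra connections (e.g. virt-manager), on both hosts:

# /etc/libvirt/libvirtd.conf -- example values, not a recommendation
max_clients = 50
max_workers = 50
min_workers = 5

followed by restarting the daemon (service libvirtd restart) on both hosts so the new limits take effect.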