Created attachment 447395 [details]
/var/log/messages of source host
Description of problem:
When migrating multiple guests concurrently, these errors often appear:
libvirtd: 12:58:01.714: error : qemuMonitorJSONCommandWithFd:242 : cannot send monitor command '{"execute":"qmp_capabilities"}': Broken pipe
libvirtd: 11:14:41.085: error : qemuMonitorOpenUnix:279 : monitor socket did not show up.: Connection refused
libvirtd: 11:14:41.089: error : qemudWaitForMonitor:2550 : internal error process exited while connecting to monitor: char device redirected to /dev/pts/30#012inet_listen_opts: bind(ipv4,127.0.0.1,5926): Address already in use#012inet_listen_opts: FAILED#012
These errors leave the guests unmigrated and broken; the affected guests have to be restarted.
In my test I migrated 40 guests at the same time: 36 succeeded and 4 failed. When migrating the 36 guests back, 29 succeeded and 7 failed, so the problem is more severe on the reverse migration.
I also hit the problem when migrating 30 and 20 guests.
I'm using two big boxes, each with 48 CPUs and 500G of memory. The guests are minimal RHEL 6 guests.
Version-Release number of selected component (if applicable):
RC1 build: 20100826.1
# rpm -q libvirt qemu-kvm kernel
libvirt-0.8.1-27.el6.x86_64
qemu-kvm-0.12.1.2-2.113.el6.x86_64
kernel-2.6.32-71.el6.x86_64
How reproducible:
Often
Steps to Reproduce:
1. Concurrently run "virsh migrate --live guestname qemu+ssh://address/system" for many guests (see the sketch below).
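A minimal sketch of how the concurrent migrations can be driven from a shell (the guest names rhel6-01..rhel6-40 and the "destination" address are placeholders for the actual inventory):

  for i in $(seq -w 1 40); do
      # start each live migration in the background so they all run concurrently
      virsh migrate --live rhel6-$i qemu+ssh://destination/system &
  done
  wait   # reap all background migrations before checking results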
Actual results:
Concurrent migration of multiple guests fails with the errors above.
Expected results:
Concurrent migration of multiple guests completes without errors.
Additional info:
For bi-directional concurrent migration, I tried migrating 20 guests in each direction between the 2 boxes. Again a few guests failed to migrate: 2 on one box and 6 on the other. The errors are the same, plus this one:
error: cannot send monitor command '{"execute":"qmp_capabilities"}': Connection reset by peer
Pasting here the explanation I gave in the IRC channel:
[15:14] <DV> gsun: the reason is in libvirt source: daemon/libvirtd.c
[15:14] <DV> static int min_workers = 5;
[15:14] <DV> static int max_workers = 20;
[15:14] <DV> static int max_clients = 20;
[15:15] <DV> in practice we allow only 20 simultaneous connections to a given libvirt daemon
[15:15] <DV> when doing a migration I think we open connections both ways
[15:16] <DV> add 2 connections for virtmanager and you know why only 18 migrations succeeded
[15:16] <DV> and 2 failed with no connections.
So that's not fixable without increasing that value and rebuilding libvirt.
Maybe we should do this ...
Retargeting for 6.1; maybe we can increase the number of connections without
harm
Daniel
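To illustrate the budget DV describes: with max_clients = 20 and 2 connections already held by virt-manager, only 18 slots remain, which matches the 18-success/2-failure split above. A rough way to watch the client count on a host during the test (the socket path is the stock default; the count also includes the listening sockets themselves):

  # netstat -xa | grep -c libvirt-sock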
Actually, we can raise the number of connections just from
/etc/libvirt/libvirtd.conf
and that's sufficient for the test:
[15:18] <gsun> DV, oh, i see. But modifying libvirtd.conf can change the max clients, right?
[15:18] <DV> hum
[15:19] <gsun> DV, last time i modified it, pushed the migration to 40 guests, and 36 succeeded
[15:19] <DV> ah yes
Daniel
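For the record, the relevant knobs in /etc/libvirt/libvirtd.conf; the parameter names are the stock ones shipped with this build, while the values below are only an illustration sized for a 40-guest test plus a couple of management connections:

  max_clients = 50
  min_workers = 10
  max_workers = 50

libvirtd must be restarted (service libvirtd restart) for the new limits to take effect.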