Bug 698138

Summary: concurrent migration with tcp connection will lose guests on target
Product: Red Hat Enterprise Linux 6
Reporter: weizhang <weizhan>
Component: libvirt
Assignee: Osier Yang <jyang>
Status: CLOSED WORKSFORME
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.1
CC: dallan, dyuan, eblake, gren, juzhang, jyang, llim, mzhan, yoyzhang
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-12-12 15:27:03 UTC
Attachments:
Description                                  Flags
result guest show                            none
libvirtd.log and the log of missing guest    none

Description weizhang 2011-04-20 09:56:12 UTC
Created attachment 493417
result guest show

Description of problem:
I am migrating guests over a TCP connection on a machine with 4 cores and 4 GB of memory. I start 20 guests, each with 1 vCPU and 256 MB of memory, and migrate them with the script mig.sh:

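#!/bin/sh
# Start all 20 live migrations at once, one background job per guest.
# seq is used instead of {1..20}, which a strict POSIX sh does not expand.
for i in $(seq 1 20)
do
      virsh migrate --live "mig$i" qemu+tcp://10.66.82.249/system &
done
# Keep the script alive until every background migration has exited.
wait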

After migration, fewer than 20 guests are present on the target. The source reports no errors, and all of its guests are in the shut off state.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-18.el6.x86_64
kernel-2.6.32-131.0.1.el6.x86_64
qemu-kvm-0.12.1.2-2.158.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Edit /etc/sysconfig/libvirtd
       LIBVIRTD_ARGS="--listen"

2. Edit /etc/libvirt/libvirtd.conf
       listen_tls = 0
       listen_tcp = 1
       auth_tcp = "none"
3. Run # service libvirtd restart
4. Mount the NFS storage on both hosts.
5. Run # setsebool -P virt_use_nfs 1
   on both hosts.

6. Define and start 20 guests named mig1 through mig20.
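7. Run # sh mig.sh, where mig.sh is:
#!/bin/sh
# Start all 20 live migrations at once, one background job per guest.
# seq is used instead of {1..20}, which a strict POSIX sh does not expand.
for i in $(seq 1 20)
do
      virsh migrate --live "mig$i" qemu+tcp://10.66.82.249/system &
done
# Keep the script alive until every background migration has exited.
wait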
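Because mig.sh backgrounds every virsh job and never collects its exit status, a failed handoff is easy to miss on the source. A minimal sketch of a variant that keeps each job's PID and reports any non-zero status follows; the function name migrate_all is hypothetical, and the destination URI is the one from this report. Since the source reported no errors in this bug, every job may well succeed here, in which case the guest count has to be checked on the target directly.

```shell
#!/bin/sh
# Hypothetical variant of mig.sh: start the migrations concurrently, remember
# each background job's PID, then collect every job's exit status.

# migrate_all COUNT DEST_URI: migrates mig1..migCOUNT, prints the number of
# failed virsh invocations, and returns non-zero if any failed.
migrate_all() {
    count=$1
    dest=$2
    pids=""
    for i in $(seq 1 "$count"); do
        virsh migrate --live "mig$i" "$dest" &
        pids="$pids $!"
    done

    failed=0
    i=1
    for pid in $pids; do
        # "wait PID" returns the exit status of that particular job
        if ! wait "$pid"; then
            echo "migration of mig$i failed" >&2
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
    [ "$failed" -eq 0 ]
}
```

Usage: `migrate_all 20 qemu+tcp://10.66.82.249/system`. PIDs are stored in launch order, so the counter i in the second loop matches the guest each PID belongs to.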
  
Actual results:
Fewer than 20 guests are present on the target. The source reports no errors, and all of its guests are in the shut off state.

Expected results:
All guests migrate to the target host successfully.
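One way to check the expected result is to count the mig* guests that are active on the target once the script finishes. The following is a minimal sketch; count_mig_guests and check_target are hypothetical helpers, and the awk match assumes the default `virsh list` table layout (Id, Name and State columns).

```shell
#!/bin/sh
# count_mig_guests: read `virsh list` output on stdin and count rows whose
# Name column looks like mig<N>. Header and separator rows never match.
count_mig_guests() {
    awk '$2 ~ /^mig[0-9]+$/ { n++ } END { print n + 0 }'
}

# check_target EXPECTED URI: without --all, `virsh list` shows only active
# domains, so a guest that crashed after migration is not counted.
check_target() {
    expected=$1
    actual=$(virsh -c "$2" list | count_mig_guests)
    echo "${actual} of ${expected} guests active on target"
    [ "$actual" -eq "$expected" ]
}
```

Usage: `check_target 20 qemu+tcp://10.66.82.249/system`, run after mig.sh has exited.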

Additional info:

Comment 1 Osier Yang 2011-04-20 11:47:01 UTC
Kyla, could you check whether there is an error log on the destination host?

Comment 2 weizhang 2011-04-20 12:45:23 UTC
Created attachment 493473
libvirtd.log and the log of missing guest

Comment 3 weizhang 2011-04-20 12:46:10 UTC
(In reply to comment #2)
> Created attachment 493473
> libvirtd.log and the log of missing guest

The log is from the target host.

Comment 4 Osier Yang 2011-04-26 10:50:23 UTC
There is nothing useful in mig5.log, but the libvirtd log shows that domain mig5 crashed:


20:23:12.403: 28473: debug : qemuMonitorIO:601 : Triggering EOF callback error? 0
20:23:12.403: 28476: debug : virDomainObjRef:971 : obj=0x186d210 refs=3
20:23:12.403: 28473: debug : qemuHandleMonitorEOF:741 : Received EOF on 0x18618e0 'mig5'
20:23:12.403: 28476: debug : virGetDomain:381 : New hash entry 0x7f319815ffd0
20:23:12.403: 28476: debug : qemuMonitorStartCPUs:954 : mon=0x7f318c09dfe0
20:23:12.403: 28476: debug : virJSONValueToString:1042 : object=0x7f319809cf60
20:23:12.403: 28473: debug : qemuHandleMonitorEOF:756 : Monitor connection to 'mig5' closed without SHUTDOWN event; assuming the domain crashed
20:23:12.403: 28476: debug : virJSONValueToStringOne:976 : object=0x7f319809cf60 type=0 gen=0x7f319818ac70
20:23:12.403: 28473: debug : qemudShutdownVMDaemon:3460 : Shutting down VM 'mig5' pid=28891 migrated=0
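To see whether every missing guest hit the same path as mig5, the target's libvirtd debug log can be searched for the same message. A minimal sketch follows; crashed_domains is a hypothetical helper, the message text is copied from the log excerpt above (other libvirt versions may word it differently), and the log path in the usage line is an assumption, since it depends on the log_outputs setting in libvirtd.conf.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every domain whose QEMU monitor
# closed without a SHUTDOWN event, i.e. the message libvirtd logged for mig5.
# Reads the files given as arguments, or stdin when none are given.
crashed_domains() {
    sed -n "s/.*Monitor connection to '\([^']*\)' closed without SHUTDOWN event.*/\1/p" "$@" |
        sort -u
}
```

Usage (path assumed): `crashed_domains /var/log/libvirt/libvirtd.log`. Comparing this list with the guests missing from the target shows whether all of them crashed on arrival.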

Comment 5 Dave Allan 2011-06-08 20:12:00 UTC
(In reply to comment #4)
> There is nothing useful in mig5.log, but the libvirtd log shows that domain
> mig5 crashed.

Given that, doesn't that mean this isn't a bug?

Comment 7 Osier Yang 2011-06-21 02:43:56 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > There is nothing useful in mig5.log, but the libvirtd log shows that domain
> > mig5 crashed.
> 
> Given that, doesn't that mean this isn't a bug?

This shouldn't be a libvirt bug, but it may be a qemu bug: qemu fails while loading the domain on the destination host, and the domain is then shut down silently. Because the test used live migration, the guest disappeared from the destination host after migration.

Comment 9 Dave Allan 2011-11-23 02:39:40 UTC
Do we know why the guests crashed on the dst host?

Comment 10 Dave Allan 2011-12-06 14:25:16 UTC
Is this simply a memory allocation failure? 20 guests at 256 MB each come to 5 GB of RAM, but according to the description the box has only 4 GB. Am I missing something?
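The arithmetic can be checked against a host's actual memory. What follows is a rough sketch with hypothetical helper names. QEMU's per-process overhead is ignored, so the guest total is only a lower bound, and MemTotal in /proc/meminfo reads slightly below installed RAM. Exceeding it does not necessarily stop guests from starting, since guest memory is generally backed only as the guest touches it, which would fit the empty-guest observation in comment 12.

```shell
#!/bin/sh
# Rough check of the arithmetic above: total guest RAM vs host RAM.
# Hypothetical helpers; QEMU overhead per guest is not included.

# guest_mem_mb COUNT MB_PER_GUEST -> total configured guest memory in MB
guest_mem_mb() {
    echo $(($1 * $2))
}

# host_mem_mb [MEMINFO_FILE] -> MemTotal in MB (defaults to /proc/meminfo)
host_mem_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# fits_in_ram COUNT MB_PER_GUEST [MEMINFO_FILE]: succeeds only if the
# guests' configured memory fits in host RAM
fits_in_ram() {
    need=$(guest_mem_mb "$1" "$2")
    have=$(host_mem_mb "$3")
    echo "guests: ${need} MB, host: ${have} MB"
    [ "$need" -le "$have" ]
}
```

Usage: `fits_in_ram 20 256` on the destination host; for this report's numbers the guests need 5120 MB.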

Comment 11 Dave Allan 2011-12-06 14:26:51 UTC
I'm a little suspicious of this BZ because the oVirt guys do this kind of test all the time and they're not reporting failures, and they're pretty vocal about that kind of thing.

Comment 12 weizhang 2011-12-12 09:06:41 UTC
I have tested with:
kernel-2.6.32-220.el6.x86_64
libvirt-0.9.8-1.el6.x86_64
qemu-kvm-0.12.1.2-2.209.el6.x86_64

The problem does not seem to reproduce.

I tested on a machine with 3 cores and 4 GB of memory. Because the guests are empty, they all start successfully.

Comment 13 Dave Allan 2011-12-12 15:27:03 UTC
(In reply to comment #12)
> The problem does not seem to reproduce.
> 
> I tested on a machine with 3 cores and 4 GB of memory. Because the guests are
> empty, they all start successfully.

OK, I'm going to close this as WORKSFORME for now.