Bug 982544

Summary: VM Migration failed on rhevm env
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.5
Reporter: Xuesong Zhang <xuzhang>
Assignee: Libvirt Maintainers <libvirt-maint>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, bili, dyuan, honzhang, jdenemar, whuang, xuzhang, zpeng
Status: CLOSED DUPLICATE
Severity: high
Priority: high
Keywords: TestBlocker
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-07-10 12:50:35 UTC
Attachments:
- libvirtd.log on source host
- libvirtd.log on target host
- vdsm log on source host
- vdsm log on target host
- The libvirtd debug log on target host
- source libvirtd.log
- target libvirtd.log
- backtrace on TargetHost
- MessageLogOnSourceHost
- GDB VDSM on SourceHost

Description Xuesong Zhang 2013-07-09 09:24:37 UTC
Description of problem:
VM migration on rhevm fails.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-19.el6
qemu-kvm-rhev-0.12.1.2-2.377.el6 
spice-server-0.12.3-1.el6
vdsm-4.10.2-23.0.el6ev
kernel-2.6.32-396.el6

How reproducible:
100%

Steps:
1. Prepare a rhevm environment and register 2 hosts on rhevm.
2. Prepare a healthy VM on host1.
3. Migrate the VM to host2 via rhevm; the migration fails and the VM stays on host1.


Actual results:
The migration fails.

Expected results:
The migration should succeed.

Additional info:
1. The following related tests are successful:
   1.1 After downgrading libvirt to the previous version (libvirt-0.10.2-18.el6_4.9.x86_64.rpm) with all other package versions unchanged, VM migration succeeds on rhevm.
   1.2 In a pure libvirt-0.10.2-19.el6 environment with no VDSM package, migration succeeds over both qemu+ssh and qemu+tls.
2. I attached the VDSM and libvirtd logs from the source and target hosts.
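The pure-libvirt migration check described in 1.2 can be reproduced with plain virsh. This is a sketch only; the VM name and target host name are placeholders, not taken from this report:

```shell
# Live migration over an SSH tunnel (placeholder names).
virsh migrate --live --verbose VM2 qemu+ssh://target-host/system

# Live migration over TLS (assumes the usual libvirt CA, server,
# and client certificates are already deployed on both hosts).
virsh migrate --live --verbose VM2 qemu+tls://target-host/system
```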

Comment 1 Xuesong Zhang 2013-07-09 09:30:56 UTC
Created attachment 770874 [details]
libvirtd.log on source host

Comment 2 Xuesong Zhang 2013-07-09 09:32:02 UTC
Created attachment 770876 [details]
libvirtd.log on target host

Comment 3 Xuesong Zhang 2013-07-09 09:32:42 UTC
Created attachment 770877 [details]
vdsm log on source host

Comment 4 Xuesong Zhang 2013-07-09 09:33:34 UTC
Created attachment 770878 [details]
vdsm log on target host

Comment 6 Xuesong Zhang 2013-07-09 09:44:05 UTC
2013-Jul-09, 17:13  Migration failed (VM: VM2_xuzhang, Source: Host2_xuzhang, Destination: Host1_xuzhang1).
2013-Jul-09, 17:13  Migration failed. Trying to migrate to another Host (VM: VM2_xuzhang, Source: Host2_xuzhang, Destination: Host1_xuzhang1).
2013-Jul-09, 17:12  Detected new Host Host2_xuzhang. Host state was set to Up.
2013-Jul-09, 17:12  Migration started (VM: VM2_xuzhang, Source: Host2_xuzhang, Destination: Host1_xuzhang1, User: admin@internal).

Comment 7 Xuesong Zhang 2013-07-09 09:47:03 UTC
I pasted the error messages from the rhevm Events page in comment 6.

Comment 9 Jiri Denemark 2013-07-09 10:11:05 UTC
This bug report is pretty useless so far. I hope rhev is able to provide a better error message than just "Migration failed". Libvirtd logs contain a lot of unrelated stuff and only errors and warnings. Please, turn on debug logs for libvirtd on both hosts, empty the logs, and try again. Also, is the new libvirt installed on both hosts or just one of them? When you downgrade libvirt, do you do that on both hosts or just one of them?

Comment 10 Xuesong Zhang 2013-07-09 10:20:16 UTC
hi, Jiri,

Here are the answers to your last 2 questions:
Q: Is the new libvirt installed on both hosts or just one of them?
A: Yes, the new libvirt is installed on both hosts (the source and target).

Q: When you downgrade libvirt, do you do that on both hosts or just one of them?
A: Yes, we downgraded libvirt on both hosts (the source and target).

BTW, I will upload the filtered libvirtd.log later.

Comment 11 Xuesong Zhang 2013-07-10 05:51:09 UTC
Created attachment 771413 [details]
The libvirtd debug log on target host

Comment 12 Xuesong Zhang 2013-07-10 06:12:34 UTC
On the source host, there isn't any libvirtd debug log generated. So, I only uploaded the libvirtd debug log from the target host. Please reference the log in comment 11.

Comment 13 Jiri Denemark 2013-07-10 06:30:16 UTC
(In reply to Zhang Xuesong from comment #12)
> On the source host, there isn't any libvirtd debug log generated.

You likely need to fix your configuration on source host then.

> So, only upload the libvirtd debug log which is on target host. Please
> reference the log in Commnet 11.

There's no sign of any migration attempt in that log.

Please provide debug logs from both hosts generated during the failed migration.

Comment 14 Xuesong Zhang 2013-07-10 07:42:16 UTC
I'm sure I changed the libvirtd.conf configuration file as follows on both hosts (source and target):
log_level = 1
log_filters="1:libvirt 3:event 3:json 1:util 1:qemu"
log_outputs="1:file:/var/log/libvirtd_debug.log"

Here are the steps I performed last time:
1. restart the libvirtd and vdsmd services
2. delete the previously generated libvirtd.log
3. start the migration
4. after the migration failed, collect libvirtd.log from the source and target hosts (I can't see any libvirtd.log on the source at this step)


This time, I didn't change the settings in libvirtd.conf, leaving the settings that vdsm applies automatically:
log_outputs="1:file:/var/log/libvirtd.log"
log_filters="3:virobject 3:virfile 2:virnetlink 3:cgroup 3:event 3:json 1:libvirt 1:util 1:qemu"

Here are the steps this time, after which I can get the libvirtd.log:
1. delete the original libvirtd.log
2. restart the libvirtd and vdsmd services
3. start the migration
4. after the migration failed, collect libvirtd.log from the source and target hosts.
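The successful collection sequence above can be sketched in shell. This is illustrative only: the log path comes from the log_outputs setting vdsm writes, the host name is a placeholder, and RHEL 6 SysV service commands are assumed:

```shell
# Run on each host before reproducing the failure.
rm -f /var/log/libvirtd.log      # 1. drop the old log
service libvirtd restart         # 2. restart the daemons
service vdsmd restart

# 3. trigger the migration from the rhevm admin portal.

# 4. after the migration fails, copy the log off each host, e.g.:
#    scp root@<host>:/var/log/libvirtd.log <host>-libvirtd.log
```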



If you still need more information, I can provide my env for your investigation.


(In reply to Jiri Denemark from comment #13)
> (In reply to Zhang Xuesong from comment #12)
> > On the source host, there isn't any libvirtd debug log generated.
> 
> You likely need to fix your configuration on source host then.
> 
> > So, only upload the libvirtd debug log which is on target host. Please
> > reference the log in Commnet 11.
> 
> There's no sign of any migration attempt in that log.
> 
> Please, provide from both hosts generated during the failed migration.

Comment 15 Xuesong Zhang 2013-07-10 07:43:12 UTC
Created attachment 771426 [details]
source libvirtd.log

Comment 16 Xuesong Zhang 2013-07-10 07:43:56 UTC
Created attachment 771427 [details]
target libvirtd.log

Comment 17 Jiri Denemark 2013-07-10 09:19:41 UTC
Good, according to the log from the destination, libvirtd crashed there. Could you provide backtrace of all threads of the crashed daemon?
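One way to capture such a backtrace is sketched below. This assumes gdb and the libvirt debuginfo packages are installed and that a core file was produced; the core path is a placeholder, not from this report:

```shell
# For readable symbols on RHEL 6 (needs yum-utils):
#   yum install gdb && debuginfo-install libvirt

# If libvirtd is still running (e.g. after a respawn), attach and
# dump every thread's stack non-interactively:
gdb -batch -p "$(pidof libvirtd)" -ex 'thread apply all bt' > libvirtd-threads.txt

# For the crashed instance, load the core file instead:
gdb -batch /usr/sbin/libvirtd /path/to/core -ex 'thread apply all bt full' > libvirtd-crash-bt.txt
```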

Comment 18 Xuesong Zhang 2013-07-10 09:27:07 UTC
OK, I will prepare the env and upload this info to you later.

(In reply to Jiri Denemark from comment #17)
> Good, according to the log from the destination, libvirtd crashed there.
> Could you provide backtrace of all threads of the crashed daemon?

Comment 19 Xuesong Zhang 2013-07-10 12:06:57 UTC
Created attachment 771580 [details]
backtrace on TargetHost

Comment 20 Xuesong Zhang 2013-07-10 12:07:45 UTC
Created attachment 771581 [details]
MessageLogOnSourceHost

Comment 21 Xuesong Zhang 2013-07-10 12:08:21 UTC
Created attachment 771582 [details]
GDB VDSM on SourceHost

Comment 22 Xuesong Zhang 2013-07-10 12:13:22 UTC
Hi, Jiri,

Comment 19 is the backtrace on the target host for your reference.

We found that the VDSM process died on the source host, but it appears to exist normally when we attach gdb to VDSM.
The message log and gdb VDSM info are attached in comments 20 and 21 for your reference.

The following shows VDSM dying on the source host during migration:
vdsm vds ERROR connection to libvirt broken. taking vdsm down. ecode: 38 edom: 7
......
respawn: slave '/usr/share/vdsm/vdsm' died, respawning slave
......


(In reply to Jiri Denemark from comment #17)
> Good, according to the log from the destination, libvirtd crashed there.
> Could you provide backtrace of all threads of the crashed daemon?

Comment 23 Jiri Denemark 2013-07-10 12:50:35 UTC
The backtrace shows libvirtd crashed as a result of a bug introduced by a patch for 977961. This regression is fixed by a follow-up patch mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=977961#c8

*** This bug has been marked as a duplicate of bug 977961 ***