Description of problem:

On the Nehalem host, both the destination and source guests sometimes become unresponsive after performing migration. So far the issue cannot be reproduced on any other CPU with the same scenario. Judging from the source guest, something seems to go wrong during/after (?) the last ~10M of RAM copy: after the notification message is printed, nothing can be entered at the monitor, not even Ctrl+C.

CLI:
root 19373 18512 0 13:33 pts/1 00:00:25 /usr/libexec/qemu-kvm -drive file=/mnt/rhel53-64-ser-virtio.qcow2,if=virtio,cache=off,index=0,boot=on -net nic,macaddr=20:20:20:00:18:59,model=virtio -net tap,script=/etc/qemu-ifup -rtc-td-hack -no-hpet -usbdevice tablet -cpu qemu64,+sse2 -smp 2 -m 4096 -vnc :12 -monitor stdio -boot c -incoming tcp:0:5800

The guest is a freshly installed RHEL 5u3 x86_64 server (default installation).

(qemu) migrate -d tcp:xx.xx.xx.xx:5800
(qemu) migration_notify_done
(qemu) info migrate
Migration status: active
transferred ram: 44340 kbytes
remaining ram: 4170456 kbytes
total ram: 4214796 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 64312 kbytes
remaining ram: 4150484 kbytes
total ram: 4214796 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 4093728 kbytes
remaining ram: 121068 kbytes
total ram: 4214796 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 4107992 kbytes
remaining ram: 106804 kbytes
total ram: 4214796 kbytes
(qemu) migration_notify_done

top on src host:
Tasks: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12291060k total, 6143348k used, 6147712k free, 17480k buffers
Swap: 10342392k total, 20436k used, 10321956k free, 1728108k cached

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
19373 root  25  0 4357m 4.1g 3900 S  0.0 34.7 0:18.18 qemu-kvm
19380 root  15  0 4357m 4.1g 3900 S  0.0 34.7 0:00.02 qemu-kvm
19381 root  15  0 4357m 4.1g 3900 S  0.0 34.7 0:03.39 qemu-kvm
19382 root  15  0 4357m 4.1g 3900 S  0.0 34.7 0:03.27 qemu-kvm
19384 root  15  0 4357m 4.1g 3900 S  0.0 34.7 0:00.00 qemu-kvm

strace on src:
# strace -p 19373 -p 19380 -p 19381 -p 19382 -p 19384
Process 19373 attached - interrupt to quit
Process 19380 attached - interrupt to quit
Process 19381 attached - interrupt to quit
Process 19382 attached - interrupt to quit
Process 19384 attached - interrupt to quit
[pid 19382] futex(0xb63040, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 19384] rt_sigtimedwait([ALRM IO], <unfinished ...>
[pid 19381] futex(0xb63040, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 19380] rt_sigtimedwait([USR2], <unfinished ...>
[pid 19373] write(18, "\1\0\0\0\0\0\0\0", 8

top on dst:
Tasks: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12291044k total, 5838552k used, 6452492k free, 99180k buffers
Swap: 10342392k total, 0k used, 10342392k free, 1344168k cached

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
30111 root  15  0 4314m 4.0g 3824 S  0.0 34.5 0:01.44 qemu-kvm
30118 root  18  0 4314m 4.0g 3824 S  0.0 34.5 0:00.00 qemu-kvm
30119 root  18  0 4314m 4.0g 3824 S  0.0 34.5 0:00.00 qemu-kvm
30120 root  15  0 4314m 4.0g 3824 S  0.0 34.5 0:00.00 qemu-kvm
30122 root  18  0 4314m 4.0g 3824 S  0.0 34.5 0:00.00 qemu-kvm

strace on dst:
# strace -p 30111 -p 30118 -p 30119 -p 30120 -p 30122
Process 30111 attached - interrupt to quit
Process 30118 attached - interrupt to quit
Process 30119 attached - interrupt to quit
Process 30120 attached - interrupt to quit
Process 30122 attached - interrupt to quit
[pid 30120] futex(0xb63040, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30122] rt_sigtimedwait([ALRM IO], <unfinished ...>
[pid 30119] futex(0xb63040, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30118] rt_sigtimedwait([USR2], <unfinished ...>
[pid 30111] recvfrom(21,

Version-Release number of selected component (if applicable):
[root@dhcp-66-70-6 ~]# cat /etc/redhat-release
Red Hat Enterprise Virtualization Hypervisor release 5.4-2.0.99 (7.1)
[root@dhcp-66-70-6 ~]# rpm -q kvm
kvm-83-77.el5
[root@dhcp-66-70-6 ~]# rpm -q kernel
kernel-2.6.18-154.el5

How reproducible:
2/10

Steps to Reproduce:
1. Install a new RHEL 5u3 x86_64 server guest.
2. Run the guest on the source host and start listen mode on the destination host.
3. Perform migration.

Actual results:

Expected results:

Additional info:

Host information

CPU:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 4
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr popcnt lahf_lm
bogomips : 5319.97
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management: [8]

# free -m
             total       used       free     shared    buffers     cached
Mem:         12002       5701       6301          0         96       1312
-/+ buffers/cache:        4291       7711
Swap:        10099          0      10099
Not that I see how it may be related, but perhaps it is worth testing with EPT enabled and then without VPID?
Can reproduce (it happened on the second attempt).

# rmmod kvm_intel
# modprobe kvm_intel enable_ept=1 enable_vpid=0
# dmesg
kvm: virtualization flags detected on this hardware: vmx tpr_shadow vnmi flexpriority ept vpid
loaded kvm module (kvm-83-77.el5)

^^ please correct me if I am wrong
Does the strace output advance or stay stuck? Also get 'pstack PID' output. Anything in dmesg?

Can you reduce the guest memory to 2G and recheck? What happens if you disable ept completely?
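The diagnostics requested above can be gathered with a short helper script; a minimal sketch (the PID argument and output file names are illustrative, not from the original report):

```shell
#!/bin/sh
# Sketch: collect diagnostics for a hung qemu-kvm process.
# Usage: ./collect.sh <qemu-kvm PID>   (e.g. 19373 from the ps output above)
pid="$1"

pstack "$pid" > "pstack-$pid.txt"        # user-space stacks of every thread
dmesg | tail -n 100 > dmesg-tail.txt     # recent kernel messages

# Trace syscalls of all threads for 30 seconds to see whether the
# output advances or the process stays stuck on the same calls.
strace -f -p "$pid" -o "strace-$pid.txt" &
sleep 30
kill $! 2>/dev/null
```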
(In reply to comment #3)

> Does the strace output advances or stay stuck? Also get 'pstack PID' output.

I let the migration run for more than 8 hours; that is the whole strace output.

> Anything on dmesg?

Only logs like *cpu0 unimplemented perfctr wrmsr:*

> Can you reduce the guest memory to 2G and recheck?
> What happens if you disable ept completely?

I am retesting, and I disable ept by:

# rmmod kvm_intel
# modprobe kvm_intel enable_ept=0

Is this enough to *disable ept completely*?
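One way to confirm the module parameter actually took effect after the reload is to read it back from sysfs; a sketch (the parameter file path is an assumption for this kvm-83 module build, where the parameter is named enable_ept):

```shell
#!/bin/sh
# Reload kvm_intel with EPT disabled. All guests must be shut down
# first, since the module cannot be removed while VMs are running.
rmmod kvm_intel
modprobe kvm_intel enable_ept=0

# Read the parameter back; 0 would indicate EPT is really off.
cat /sys/module/kvm_intel/parameters/enable_ept
```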
Reproduced with 2G RAM. Note: EPT was enabled; I am running the test without EPT now.

[root@dhcp-66-70-6 ~]# pstack 2156
Thread 5 (Thread 0x4149c940 (LWP 2163)):
#0 0x00002af01b78f744 in do_sigwaitinfo () from /lib64/libc.so.6
#1 0x00002af01b78f7fd in sigwaitinfo () from /lib64/libc.so.6
#2 0x0000000000419ff1 in fprintf ()
#3 0x00002af01a5a14a7 in start_thread () from /lib64/libpthread.so.0
#4 0x00002af01b831b7d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x41ece940 (LWP 2164)):
#0 0x00002af01a5a8174 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00002af01a5a3aca in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x00002af01a5a398c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000516544 in fprintf ()
#4 0x0000000000516b64 in fprintf ()
#5 0x00002af01a5a14a7 in start_thread () from /lib64/libpthread.so.0
#6 0x00002af01b831b7d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x428cf940 (LWP 2165)):
#0 0x00002af01a5a8174 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00002af01a5a3aca in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x00002af01a5a398c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000516544 in fprintf ()
#4 0x0000000000516b64 in fprintf ()
#5 0x00002af01a5a14a7 in start_thread () from /lib64/libpthread.so.0
#6 0x00002af01b831b7d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x43cd1940 (LWP 2167)):
#0 0x00002af01b78f744 in do_sigwaitinfo () from /lib64/libc.so.6
#1 0x00002af01b78f7fd in sigwaitinfo () from /lib64/libc.so.6
#2 0x0000000000419ff1 in fprintf ()
#3 0x00002af01a5a14a7 in start_thread () from /lib64/libpthread.so.0
#4 0x00002af01b831b7d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2af01f4be080 (LWP 2156)):
#0 0x00002af01a5a854b in write () from /lib64/libpthread.so.0
#1 0x00000000005163f3 in fprintf ()
#2 0x0000000000409cb3 in fprintf ()
#3 0x000000000047f895 in fprintf ()
#4 0x00000000004b3276 in fprintf ()
#5 0x00000000004b3336 in fprintf ()
#6 0x0000000000486587 in fprintf ()
#7 0x000000000047fa23 in fprintf ()
#8 0x00000000004808a9 in fprintf ()
#9 0x000000000040712c in fprintf ()
#10 0x00000000004099b7 in fprintf ()
#11 0x00000000005167fa in fprintf ()
#12 0x000000000040e415 in fprintf ()
#13 0x00002af01b77b994 in __libc_start_main () from /lib64/libc.so.6
#14 0x0000000000406be9 in fprintf ()
#15 0x00007fff0f319ff8 in ?? ()
#16 0x0000000000000000 in ?? ()
[root@dhcp-66-70-6 ~]#
> > What happens if you disable ept completely?
>
> I am retesting. and I disable ept by:
> # rmmod kvm_intel
> # modprobe kvm_intel enable_ept=0
>
> is it enough for *disable ept completely* ?

Can reproduce when EPT is disabled.

How reproducible: 1/9 (the ninth attempt failed)

On another host:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz

Ran the same steps x20 ==> cannot reproduce.
Please note that the test result in comment #7 is not from an i7 host, though it is more difficult to reproduce there. We can also reproduce the issue on: Red Hat Enterprise Virtualization Hypervisor release 5.4-2.0.99 (9.1)
(In reply to comment #5)

> [root@dhcp-66-70-6 ~]# pstack 2156
..
> Thread 1 (Thread 0x2af01f4be080 (LWP 2156)):
> #0 0x00002af01a5a854b in write () from /lib64/libpthread.so.0
> #1 0x00000000005163f3 in fprintf ()
> #2 0x0000000000409cb3 in fprintf ()
> #3 0x000000000047f895 in fprintf ()

Can you install the kvm-debuginfo package so we can see the symbols?

> #4 0x00000000004b3276 in fprintf ()
> #5 0x00000000004b3336 in fprintf ()
> #6 0x0000000000486587 in fprintf ()
> #7 0x000000000047fa23 in fprintf ()
> #8 0x00000000004808a9 in fprintf ()
> #9 0x000000000040712c in fprintf ()
(In reply to comment #10)

> Can you install the kvm-debuginfo package so we can see the symbols?

kvm-debuginfo is installed by default.

[root@dhcp-66-70-31 /]# pstack 29034
Thread 5 (Thread 0x41af1940 (LWP 29041)):
#0 0x00002b7fb1fed744 in do_sigwaitinfo () from /lib64/libc.so.6
#1 0x00002b7fb1fed7fd in sigwaitinfo () from /lib64/libc.so.6
#2 0x000000000041a3f1 in fprintf ()
#3 0x00002b7fb0dff4a7 in start_thread () from /lib64/libpthread.so.0
#4 0x00002b7fb208fb7d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x424f2940 (LWP 29042)):
#0 0x00002b7fb0e06174 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00002b7fb0e01aca in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x00002b7fb0e0198c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000517064 in fprintf ()
#4 0x0000000000517684 in fprintf ()
#5 0x00002b7fb0dff4a7 in start_thread () from /lib64/libpthread.so.0
#6 0x00002b7fb208fb7d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x42ef3940 (LWP 29043)):
#0 0x00002b7fb0e06174 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00002b7fb0e01aca in _L_lock_1034 () from /lib64/libpthread.so.0
#2 0x00002b7fb0e0198c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000517064 in fprintf ()
#4 0x0000000000517684 in fprintf ()
#5 0x00002b7fb0dff4a7 in start_thread () from /lib64/libpthread.so.0
#6 0x00002b7fb208fb7d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x442f5940 (LWP 29045)):
#0 0x00002b7fb1fed744 in do_sigwaitinfo () from /lib64/libc.so.6
#1 0x00002b7fb1fed7fd in sigwaitinfo () from /lib64/libc.so.6
#2 0x000000000041a3f1 in fprintf ()
#3 0x00002b7fb0dff4a7 in start_thread () from /lib64/libpthread.so.0
#4 0x00002b7fb208fb7d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b7fb5d1c080 (LWP 29034)):
#0 0x00002b7fb0e0654b in write () from /lib64/libpthread.so.0
#1 0x0000000000516f13 in fprintf ()
#2 0x0000000000409cb3 in fprintf ()
#3 0x000000000047fce5 in fprintf ()
#4 0x00000000004b3d96 in fprintf ()
#5 0x00000000004b3e56 in fprintf ()
#6 0x00000000004869d7 in fprintf ()
#7 0x000000000047fe73 in fprintf ()
#8 0x0000000000480cf9 in fprintf ()
#9 0x000000000040712c in fprintf ()
#10 0x00000000004099b7 in fprintf ()
#11 0x000000000051731a in fprintf ()
#12 0x000000000040e8d5 in fprintf ()
#13 0x00002b7fb1fd9994 in __libc_start_main () from /lib64/libc.so.6
#14 0x0000000000406be9 in fprintf ()
#15 0x00007fff38aa7fe8 in ?? ()
#16 0x0000000000000000 in ?? ()

[root@dhcp-66-70-31 /]# rpm -qa | grep kvm
etherboot-zroms-kvm-5.4.4-10.el5
kvm-debuginfo-83-83.el5
kvm-qemu-img-83-83.el5
kmod-kvm-83-83.el5
kvm-tools-83-83.el5
kvm-83-83.el5
For some reason the stack trace is not complete. Anyway, from the log I can see that the source blocks on socket writes and the destination on reads. It's weird; I'll ask them for additional info.

Can you keep a tcpdump trace of the migration, especially of the last phase? (It should be quite big.)
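One way to capture the requested trace; a sketch only (the interface name eth0 is an assumption, and port 5800 is taken from the -incoming tcp:0:5800 command line above):

```shell
#!/bin/sh
# Sketch: capture the migration TCP stream on a host.
# -s 0 records full packets; the filter limits the capture to the
# migration port so the file stays as small as possible.
tcpdump -i eth0 -s 0 -w "/tmp/migration-$(hostname -s).pcap" tcp port 5800 &
TCPDUMP_PID=$!

# ... start the migration and wait for it to hang or finish,
# then stop the capture:
# kill "$TCPDUMP_PID"
```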
Please provide the information requested in Comment #12
tcpdump will be available in 40 min at:
http://focus.bne.redhat.com/~lihuang/tcpdump/
-> src: 32.pcap, dst: 6.pcap

(qemu) info migrate
Migration status: active
transferred ram: 3129992 kbytes
remaining ram: 1193684 kbytes
total ram: 4214796 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 3136840 kbytes
remaining ram: 1194916 kbytes
total ram: 4214796 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 3142420 kbytes
remaining ram: 1189772 kbytes
total ram: 4214796 kbytes
(qemu) migration_notify_done
I checked the source file. It looks like the connection stopped due to a spanning-tree recalculation:

20:09:44.585092 IP dhcp-66-70-31.nay.redhat.com.39232 > dhcp-66-70-6.nay.redhat.com.5800: . 329592089:329593537(1448) ack 1 win 46 <nop,nop,timestamp 3279940 3314180>
20:09:44.585278 IP dhcp-66-70-31.nay.redhat.com.39232 > dhcp-66-70-6.nay.redhat.com.5800: P 329593537:329594985(1448) ack 1 win 46 <nop,nop,timestamp 3279940 3314180>
20:09:44.585321 IP dhcp-66-70-6.nay.redhat.com.5800 > dhcp-66-70-31.nay.redhat.com.39232: . ack 329594985 win 24576 <nop,nop,timestamp 3314208 3279940>

Here the communication stopped, maybe because a tap was added/published on the destination. It seems the spanning tree is what triggers it; you can see it is active below.

20:09:45.113629 arp who-has dhcp-66-70-31.nay.redhat.com tell dhcp-66-70-6.nay.redhat.com
20:09:45.113763 arp reply dhcp-66-70-31.nay.redhat.com is-at 00:21:9b:ff:b5:76 (oui Unknown)
20:09:45.206610 arp who-has 10.66.70.195 tell dhcp-66-70-6.nay.redhat.com
20:09:45.206756 arp reply 10.66.70.195 is-at 00:21:9b:58:5f:d3 (oui Unknown)

The TCP connection is probably paused - maybe there is no data to send.

20:09:46.368998 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id 8098.00:1d:45:20:d5:c0.80a2, length 43
20:09:48.378657 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id 8098.00:1d:45:20:d5:c0.80a2, length 43
20:09:50.388659 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id 8098.00:1d:45:20:d5:c0.80a2, length 43
20:09:50.899022 IP 10.66.70.195.60169 > dhcp-66-70-6.nay.redhat.com.ssh: P 1027881100:1027881148(48) ack 346347990 win 136 <nop,nop,timestamp 74515412 2556650>
20:09:50.904599 IP dhcp-66-70-6.nay.redhat.com.ssh > 10.66.70.195.60169: P 1:49(48) ack 48 win 90 <nop,nop,timestamp 3320528 74515412>
20:09:50.904711 IP 10.66.70.195.60169 > dhcp-66-70-6.nay.redhat.com.ssh: . ack 49 win 136 <nop,nop,timestamp 74515414 3320528>
20:09:51.074070 IP 10.66.70.195.60169 > dhcp-66-70-6.nay.redhat.com.ssh: P 48:96(48) ack 49 win 136 <nop,nop,timestamp 74515456 3320528>

Please re-try with spanning tree off on the bridge. Can you post the output of the brctl show command?

This is also alarming:

20:09:44.585092 IP dhcp-66-70-31.nay.redhat.com.39232 > dhcp-66-70-6.nay.redhat.com.5800: . 329592089:329593537(1448) ack 1 win 46 <nop,nop,timestamp 3279940 3314180>
20:09:44.585278 IP dhcp-66-70-31.nay.redhat.com.39232 > dhcp-66-70-6.nay.redhat.com.5800: P 329593537:329594985(1448) ack 1 win 46 <nop,nop,timestamp 3279940 3314180>
20:09:44.585321 IP dhcp-66-70-6.nay.redhat.com.5800 > dhcp-66-70-31.nay.redhat.com.39232: . ack 329594985 win 24576 <nop,nop,timestamp 3314208 3279940>
20:09:45.113629 arp who-has dhcp-66-70-31.nay.redhat.com tell dhcp-66-70-6.nay.redhat.com
20:09:45.113763 arp reply dhcp-66-70-31.nay.redhat.com is-at 00:21:9b:ff:b5:76 (oui Unknown)
20:09:45.206610 arp who-has 10.66.70.195 tell dhcp-66-70-6.nay.redhat.com
20:09:45.206756 arp reply 10.66.70.195 is-at 00:21:9b:58:5f:d3 (oui Unknown)
20:09:46.368998 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id 8098.00:1d:45:20:d5:c0.80a2, length 43
20:09:48.378657 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id 8098.00:1d:45:20:d5:c0.80a2, length 43
20:09:50.388659 STP 802.1w, Rapid STP, Flags [Learn, Forward], bridge-id 8098.00:1d:45:20:d5:c0.80a2, length 43
20:09:50.899022 IP 10.66.70.195.60169 > dhcp-66-70-6.nay.redhat.com.ssh: P 1027881100:1027881148(48) ack 346347990 win 136 <nop,nop,timestamp 74515412 2556650>
20:09:50.904599 IP dhcp-66-70-6.nay.redhat.com.ssh > 10.66.70.195.60169: P 1:49(48) ack 48 win 90 <nop,nop,timestamp 3320528 74515412>
20:09:50.904711 IP 10.66.70.195.60169 > dhcp-66-70-6.nay.redhat.com.ssh: . ack 49 win 136 <nop,nop,timestamp 74515414 3320528>
20:09:51.074070 IP 10.66.70.195.60169 > dhcp-66-70-6.nay.redhat.com.ssh: P 48:96(48) ack 49 win 136 <nop,nop,timestamp 74515456 3320528>
> Please re-try with spanning tree of on the bridge.
> Can you post brctl show command?

Yes, STP is not enabled by default. Retrying...
(In reply to comment #16)

> > Please re-try with spanning tree of on the bridge.
> > Can you post brctl show command?
>
> Yes. stp is not enable by default. retrying ..

It is good that STP is not enabled. The question is why we get a networking outage at all: there are STP recalculations, and the connection between the source and destination is lost (that is why you see the ARP request). Please check how this host differs from the other hosts.
After I set stp to on/yes, the host lost its connection... I am going to test within a private network.

(BTW, if enabling STP is a workaround for this issue, why don't we set it as the default?)
(In reply to comment #18)

> After i set stp on/yes. connection of host lost ....
> I am going to test with in a private network.
> (BTW. if enable stp is a workaround for this issue, why we didn't set it as
> default ? )

It is not a workaround. STP should be disabled (the default). You might have some setup issues.
Created attachment 351336 [details]
sosreport

Can reproduce with kvm-83-87.el5, and I haven't found how this host differs from the other hosts. The sosreport is attached; please check.
From the above I can see that you do have spanning tree running:

"Jul 11 11:08:30 dhcp-66-70-6 kernel: breth0: topology change detected, propagating
Jul 11 11:08:30 dhcp-66-70-6 kernel: breth0: port 3(tap0) entering forwarding state
Jul 11 11:12:40 dhcp-66-70-6 kernel: breth0: port 3(tap0) entering disabled state
Jul 11 11:12:40 dhcp-66-70-6 kernel: device tap0 left promiscuous mode
Jul 11 11:12:40 dhcp-66-70-6 kernel: breth0: port 3(tap0) entering disabled state
Jul 11 11:13:01 dhcp-66-70-6 kernel: device tap0 entered promiscuous mode
Jul 11 11:13:01 dhcp-66-70-6 kernel: breth0: topology change detected, propagating
Jul 11 11:13:01 dhcp-66-70-6 kernel: breth0: port 3(tap0) entering forwarding state"

Please run "brctl stp breth0 off" and "brctl setfd breth0 0" and retest. Also please add the kvm sosreport plugin next time.
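The two settings suggested above can be applied and checked together; a sketch (bridge name breth0 taken from the kernel log above):

```shell
#!/bin/sh
# Turn spanning tree off on the bridge and zero its forwarding delay,
# so a newly attached tap port starts forwarding immediately instead
# of sitting in the listening/learning states after a topology change.
brctl stp breth0 off
brctl setfd breth0 0
brctl show           # confirm: the STP column for breth0 shows "no"
```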
Can you try using this rpm (private build): https://brewweb.devel.redhat.com/taskinfo?taskID=1888898
Hi Dor,

Tested 10 times; could not reproduce the original issue so far, but got a qemu monitor error:

remaining ram: 105200 kbytes
total ram: 4214796 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 4117176 kbytes
remaining ram: 97824 kbytes
total ram: 4214796 kbytes
(qemu) migration_notify_done
failed to notify io thread
failed to notify io thread
failed to notify io thread
[the line above repeated 30+ times]
info migrate
Migration status: completed
(qemu)
(In reply to comment #26)

> Hi Dor
>
> test 10 times . not reproduce the original issue so far.
> but got a qemu monitor error:
>
> remaining ram: 105200 kbytes
> total ram: 4214796 kbytes
> (qemu) info migrate
> Migration status: active
> transferred ram: 4117176 kbytes
> remaining ram: 97824 kbytes
> total ram: 4214796 kbytes
> (qemu) migration_notify_done
> failed to notify io thread
> failed to notify io thread

Superb! This message shows exactly that the solution is good. I'll remove it from the code.
Tested on kvm-83-94.el5.

Commands used:

source:
/usr/libexec/qemu-kvm -drive file=/mnt/RHEL-4.8-64.qcow2 -net nic,macaddr=20:20:20:00:18:59,model=virtio -net tap,script=/etc/qemu-ifup -rtc-td-hack -no-hpet -usbdevice tablet -cpu qemu64,+sse2 -smp 2 -m 4096 -vnc :12 -monitor stdio -boot c

dst:
/usr/libexec/qemu-kvm -drive file=/mnt/RHEL-4.8-64.qcow2 -net nic,macaddr=20:20:20:00:18:59,model=virtio -net tap,script=/etc/qemu-ifup -rtc-td-hack -no-hpet -usbdevice tablet -cpu qemu64,+sse2 -smp 2 -m 4096 -vnc :12 -monitor stdio -boot c -incoming tcp:0:6000

Tried several times with the following situations during migration:
1. Run stress testing during migration on rhel5.4-32: stress -i 4 -m 4 -c 4 -d 4
2. Download a file from a website during migration on rhel4.8-32.
3. Play flash during migration on rhel5.4-64.
4. Reboot the VM during migration on rhel4.8-64.
5. Migrate rhel3.9-32 right after the VM is started.

Could not reproduce this issue in any of these situations. Changing the status to *VERIFIED*; please reopen it if you can reproduce.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1272.html