Bug 513765
| Summary: | Large guest (256G RAM + 16 vcpu) hang during live migration | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | lihuang <lihuang> |
| Component: | kvm | Assignee: | Juan Quintela <quintela> |
| Status: | CLOSED ERRATA | QA Contact: | ovirt-maint <ovirt-maint> |
| Severity: | medium | Priority: | low |
| Version: | 5.4 | Target Milestone: | rc |
| Hardware: | All | OS: | Linux |
| Fixed In Version: | kvm-83-213.el5 | Doc Type: | Bug Fix |
| Clones: | 658823 | Last Closed: | 2011-01-13 23:11:18 UTC |
| CC: | bcao, dyasny, ehabkost, gyang, juzhang, llim, michen, tao, tburke, virt-maint, ykaul | | |
| Bug Blocks: | 545233, 565939, 568128, 580949, 643970, 645188, 658823 | | |
Description (lihuang, 2009-07-25 14:11:46 UTC)

Can you please retest with the latest kvm? Post migration, are the mouse/keyboard back? Can they not move at all, or only very slowly? Glauber, why do you say it's a dirty bit issue?

Agreed with Dor, the first thing is to test it with the latest software. I've come across many migration issues in the past that happened because we calculated the dirty bit mask incorrectly, so it is quite possible that this bug is a dupe that we forgot to close. If that is not the case, I'll be glad to dig in again.
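As an aside on why a dirty bit mask miscalculation would surface only on huge guests: any arithmetic that truncates a guest physical address to 32 bits is harmless below 4G and silently aliases pages above it. The toy sketch below is purely illustrative, not the kvm-83 code; it models a pre-copy dirty bitmap and shows how a hypothetical 32-bit truncation misses a page dirtied at 5 GiB.

```python
# Toy model of a pre-copy dirty bitmap (one bit per 4K page).
# The "bug" here is a hypothetical 32-bit truncation, shown only
# because it is invisible on guests smaller than 4G.

PAGE_SIZE = 4096

def set_dirty(bitmap, addr):
    """Mark the page containing guest-physical address addr as dirty."""
    page = addr // PAGE_SIZE
    bitmap[page // 8] |= 1 << (page % 8)

def is_dirty(bitmap, addr):
    """Correct lookup: full-width address arithmetic."""
    page = addr // PAGE_SIZE
    return (bitmap[page // 8] >> (page % 8)) & 1

def is_dirty_buggy(bitmap, addr):
    """Same lookup, but the address is (accidentally) kept in 32 bits,
    as could happen with an 'unsigned int' in C: addresses above 4G
    alias back onto the first 4G of guest RAM."""
    addr32 = addr & 0xFFFFFFFF
    page = addr32 // PAGE_SIZE
    return (bitmap[page // 8] >> (page % 8)) & 1

# 8 MiB bitmap covering a 256G guest.
bitmap = bytearray((256 * 2**30) // PAGE_SIZE // 8)

addr = 5 * 2**30  # a page at 5 GiB, above the 4G boundary
set_dirty(bitmap, addr)
print(is_dirty(bitmap, addr))        # 1: correct lookup sees the dirty page
print(is_dirty_buggy(bitmap, addr))  # 0: truncated lookup misses it
```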
Re-tested on kvm-83-164.el5:

1. During migration (before hitting the downtime phase), the mouse/kbd are not available.
2. After migration, the mouse/kbd sometimes do not come back. "Mar 15 05:24:28 x86 kernel: psmouse.c: Explorer Mouse at isa0060/serio1/input0 lost synchronization, throwing 1 bytes away" can be found in dmesg.
3. After quitting the dst VM (before migration is done):

```
(qemu) info migrate
Migration status: active
transferred ram: 3341932 kbytes
remaining ram: 265114932 kbytes
total ram: 268455948 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 3341932 kbytes
remaining ram: 265116864 kbytes
total ram: 268455948 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 3341932 kbytes
remaining ram: 265116880 kbytes
total ram: 268455948 kbytes
```

The issue still exists.
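The stuck counters above can be watched for from outside the guest by polling the monitor. Below is a minimal watchdog sketch in Python; it assumes (hypothetically) that the VM was started with `-monitor tcp:127.0.0.1:4444,server,nowait` instead of `-monitor stdio`, and the address, port, and timing values are all illustrative.

```python
import re
import socket
import time

MON_ADDR = ("127.0.0.1", 4444)  # assumes -monitor tcp:127.0.0.1:4444,server,nowait

def info_migrate(sock):
    """Send 'info migrate' to the human monitor and return the raw reply."""
    sock.sendall(b"info migrate\n")
    time.sleep(0.5)  # crude: give the monitor time to answer
    return sock.recv(65536).decode(errors="replace")

def transferred_kbytes(reply):
    """Extract the 'transferred ram' counter, or None if absent."""
    m = re.search(r"transferred ram:\s*(\d+)\s*kbytes", reply)
    return int(m.group(1)) if m else None

def watch(poll_secs=10):
    sock = socket.create_connection(MON_ADDR)
    time.sleep(0.5)
    sock.recv(65536)  # drain the monitor banner and prompt
    last = None
    while True:
        cur = transferred_kbytes(info_migrate(sock))
        if cur is not None and cur == last:
            print("warning: transferred ram stuck at %d kbytes" % cur)
        last = cur
        time.sleep(poll_secs)

if __name__ == "__main__":
    watch()
```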
- Does it respond to pings during the migration?
- Can you retest 256G but with fewer vcpus, say -smp 1 or -smp 2? I'm trying to see where the problem is. Similarly, testing a 1G guest with -smp 16 would be helpful too.

================
-smp 2 -m 256G:
================

```
[root@intel-XE7450-512-1 ~]# ping 10.66.82.192
PING 10.66.82.192 (10.66.82.192) 56(84) bytes of data.
From 10.66.83.79 icmp_seq=32 Destination Host Unreachable
From 10.66.83.79 icmp_seq=33 Destination Host Unreachable
From 10.66.83.79 icmp_seq=34 Destination Host Unreachable
From 10.66.83.79 icmp_seq=36 Destination Host Unreachable
From 10.66.83.79 icmp_seq=37 Destination Host Unreachable
```

No response even after migration. After restarting the network inside the guest, it comes back (similar to bug 524651, but the guest uses an e1000 NIC in this test). A soft lockup was found in dmesg: http://pastebin.test.redhat.com/21126 (harmless, as in bug 512656?).

================
-smp 16 -m 2G:
================

At first it responds to ping; after a while (~1 min) there is no response. Pinging again returns "Destination Host Unreachable". The network does not come back until it is restarted after migration.

Will try a smaller configuration (-smp 8 / -m 64) tomorrow.

Thanks, Lijun Huang.

================
-smp 2 -m 2G:
================

1. Mouse/kbd/network (ping) work well during and after migration.

================
-smp 2 -m 64G:
================

1. Mouse/kbd hang during migration but come back after migration.
2. Inside the guest, ping to the host is continuous (though latency is high during migration).
3. From the host, ping to the guest:

```
64 bytes from 10.66.82.192: icmp_seq=54 ttl=64 time=303 ms
64 bytes from 10.66.82.192: icmp_seq=55 ttl=64 time=343 ms
64 bytes from 10.66.82.192: icmp_seq=56 ttl=64 time=31.9 ms
64 bytes from 10.66.82.192: icmp_seq=57 ttl=64 time=3.00 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available

--- 10.66.82.192 ping statistics ---
73 packets transmitted, 57 received, 21% packet loss, time 91029ms
rtt min/avg/max/mdev = 0.000/320.319/5304.193/1017.810 ms, pipe 6

[root@intel-XE7450-512-1 ~]# ping 10.66.82.192
PING 10.66.82.192 (10.66.82.192) 56(84) bytes of data.
From 10.66.83.79 icmp_seq=9 Destination Host Unreachable
From 10.66.83.79 icmp_seq=10 Destination Host Unreachable
From 10.66.83.79 icmp_seq=11 Destination Host Unreachable
```

After migration, it comes back:

```
[root@intel-XE7450-512-1 ~]# ping 10.66.82.192
PING 10.66.82.192 (10.66.82.192) 56(84) bytes of data.
64 bytes from 10.66.82.192: icmp_seq=1 ttl=64 time=2.07 ms
64 bytes from 10.66.82.192: icmp_seq=2 ttl=64 time=12.5 ms
64 bytes from 10.66.82.192: icmp_seq=3 ttl=64 time=0.391 ms
64 bytes from 10.66.82.192: icmp_seq=4 ttl=64 time=0.663 ms
```

Hi, I need to fix this bug. How can I reproduce it? I have a machine here with just 100G of RAM. Any chance it triggers at lower levels of memory, say a 50G guest? Does it always happen? Can I somehow get access to a machine with big memory? Is it still happening for you?

Thanks for the info! I just sent a patch to rhvirt-patches to fix it. Thanks.

Reproduced, and patches have been posted for it.

*** Bug 601045 has been marked as a duplicate of this bug. ***

For the issue "Large guest (256G RAM + 16 vcpu) hang during live migration":
Reproduced on kvm-83-164.el5, verified on kvm-83-217.el5.

Steps:

1. Start the VM on the src host:

   ```
   /usr/libexec/qemu-kvm -m 128G -smp 16 -name VM1 -uuid 438915f2-c0fc-8d6b-bb06-b8ddd28046fa -no-kvm-pit-reinjection -boot c -drive file=/home/tt.img,if=virtio,index=0,boot=on,cache=none -net nic,macaddr=54:52:00:3a:d4:4d,vlan=0,model=virtio -net tap,script=/etc/qemu-ifup,vlan=0 -serial pty -parallel none -usbdevice tablet -k en-us -vnc :2 -monitor stdio
   ```

2. Start the listening port on the dst host.
3. Do the live migration.

Actual results:
- On kvm-83-164.el5, the guest hangs during migration.
- On kvm-83-217.el5, the guest works well.

Referring to comment#0, another phenomenon: if the dst VM is quit while migration is still running, 'info migrate' on the src VM keeps showing the migration as active; the values of _transferred ram_ and _total ram_ are stuck, while _remaining ram_ increases (in this test, only 128G RAM was used). Re-tested on kvm-83-217.el5: this issue still exists; _transferred ram_ and _total ram_ are stuck, and _remaining ram_ increases a little, then gets stuck as well.

Expected results: '(qemu) info migrate' should show the migration as failed.

Based on the above, re-assigning this issue.

Eventually the state of the migration will be shown as failed; it needs to detect that the other side has died. This is not a regression; it is something that has always been there. Opening another bugzilla for this is OK.
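The point about detecting that "the other side has died" can be illustrated at the socket level: a silently dead receiver does not error out a blocking sender for a long time, so the migration state stays active. The sketch below shows the general mechanism (TCP keepalive plus a send timeout turns peer death into an error the sender can map to a failed state); it is an illustration of the technique, not the actual kvm-83 fix, and the function names are made up.

```python
import socket

def open_migration_channel(host, port):
    """Open the sender-side channel so a dead peer eventually surfaces
    as an error rather than an indefinite hang."""
    s = socket.create_connection((host, port), timeout=30)
    s.settimeout(30)  # any single send blocked > 30 s counts as failure
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Linux-specific keepalive tuning: probe after 60 s idle, every 10 s,
    # and give up after 5 missed probes.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
    return s

def send_ram_page(sock, payload):
    """Return True on success, False if the peer is gone; the caller can
    then flip the migration state to 'failed' instead of leaving it
    'active' with stuck counters."""
    try:
        sock.sendall(payload)
        return True
    except OSError:  # covers timeouts, resets, and broken pipes
        return False
```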
Moving back to ON_QA, based on comment #22.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0028.html