Bug 632557
| Summary: | Migration with STRESS caused guest hang | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Keqin Hong <khong> |
| Component: | qemu-kvm | Assignee: | Juan Quintela <quintela> |
| Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 6.0 | CC: | bcao, llim, michen, mkenneth, plyons, rwu, tburke, virt-maint |
| Target Milestone: | rc | Keywords: | RHELNAK |
| Target Release: | 6.1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-02-04 12:47:23 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 580951 | | |
| Attachments: | kvm_stat log (attachment 446491) | | |
Thank you for your bug report. This issue was evaluated for inclusion in the current release of Red Hat Enterprise Linux. Unfortunately, we are unable to address this request in the current release. Because we are in the final stage of Red Hat Enterprise Linux 6 development, only significant, release-blocking issues involving serious regressions and data corruption can be considered. If you believe this issue meets the release blocking criteria as defined and communicated to you by your Red Hat Support representative, please ask your representative to file this issue as a blocker for the current release. Otherwise, ask that it be evaluated for inclusion in the next minor release of Red Hat Enterprise Linux.

Created attachment 446491 [details]: kvm_stat log

What is the output of 5 consecutive "info migrate" commands at the qemu monitor console while the migration is stalled?

```
(qemu) info migrate
Migration status: active
transferred ram: 10860216 kbytes
remaining ram: 3079744 kbytes
total ram: 8405440 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 11212244 kbytes
remaining ram: 3058088 kbytes
total ram: 8405440 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 11686008 kbytes
remaining ram: 2987196 kbytes
total ram: 8405440 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 12123908 kbytes
remaining ram: 3125364 kbytes
total ram: 8405440 kbytes
(qemu) info migrate
Migration status: active
transferred ram: 12189708 kbytes
remaining ram: 3125872 kbytes
total ram: 8405440 kbytes
```

Ok, so on the migration side it really does seem that the reason is that we're dirtying pages faster than we transfer them, and not some other mystical reason. I still don't have a theory on why it hangs after migration is finished.

It would be good to rule out the dirty pages as a driver for this. Can you try migrating again, but issue this right before migration:

```
(qemu) migrate_set_speed 4G
```

This should transfer the pages regardless of the memory pressure we're seeing...

I tried with `(qemu) migrate_set_speed 4G` right before migration. Migration first succeeded from A to B with no problem, but triggered a guest hang after migration from B to A.

```
(qemu) migrate_set_speed 4G
(qemu) migrate -d tcp:10.66.86.26:5831
(qemu) info migrate
Migration status: active
transferred ram: 3670644 kbytes
remaining ram: 4734932 kbytes
total ram: 8405440 kbytes
(qemu) info migrate
Migration status: completed
```

Ok, let me get it straight:

You do set_speed from A -> B, and it works.
You *DO NOT* do set_speed from B -> A, and then it hangs.

Is that correct?

(In reply to comment #8)
> Ok, let me get it straight:
>
> You do set_speed from A -> B, and it works.
> You *DO NOT* do set_speed from B -> A, and then it hangs.
>
> Is that correct?
No, I did set_speed on both sides before migration. From A -> B, migration finished and the guest continued to work. From B -> A, migration also completed, but the guest hung. It might just show that under high memory stress, migration can still cause the guest to hang even with set_speed; it's just not 100% reproducible. Thanks.

This is a very good test case for live migration!

When the guest hangs, is there a message? Can you still see the screen with VNC? Without live migration, will a guest running stress ever hang?

(In reply to comment #10)
> When the guest hangs, is there a message?

No message that I observed.

> Can you still see the screen with VNC?

Yes, I can. But the guest hung: no network, and no mouse/keyboard input accepted.

> Without live migration, will a guest running stress ever hang?

No, it won't.

*** This bug has been marked as a duplicate of bug 643970 ***
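The "dirtying pages faster than we transfer" diagnosis can be checked with quick arithmetic on two of the consecutive `info migrate` samples quoted above: between the third and fourth sample, `remaining ram` went up even while data was being sent, so the guest dirtied at least as much memory as was transferred. A minimal sketch of that calculation (values in kbytes, copied from the log; the sampling interval is unknown, so this estimates volume, not rate):

```shell
#!/bin/sh
# Estimate how much memory the guest dirtied between two consecutive
# "info migrate" samples (all values in kbytes, taken from the log above).
t1=11686008 r1=2987196   # 3rd sample: transferred ram / remaining ram
t2=12123908 r2=3125364   # 4th sample: transferred ram / remaining ram

sent=$(( t2 - t1 ))               # data pushed to the target host
dirtied=$(( (r2 - r1) + sent ))   # remaining ram grew, so the guest
                                  # redirtied more than was sent
echo "sent=${sent} kbytes dirtied>=${dirtied} kbytes"
# prints: sent=437900 kbytes dirtied>=576068 kbytes
```

Since the guest dirties at least as much as is sent over the same window, `remaining ram` never drains and the migration cannot converge at the default bandwidth cap, which is consistent with migration only finishing after stress is killed or the cap is raised with `migrate_set_speed`.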
Description of problem:
Migration locally with STRESS couldn't finish. After terminating the ./stress process, migration completed; however, the guest sometimes hung.

Version-Release number of selected component (if applicable):
host: qemu-kvm-0.12.1.2-2.113.el6.x86_64, kernel-2.6.32-71.el6.x86_64
guest: RHEL5.5.z-64, kernel-2.6.18-194.11.3.el5

How reproducible:
3/20

Steps to Reproduce:
1. Start the source VM with 4 vcpus and 8G of memory:

```
/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 8G \
  -smp 4,sockets=4,cores=1,threads=1 -name rhel5-64 \
  -uuid d1a201e7-7109-507d-cb9a-b010becc6c6b -nodefconfig -nodefaults \
  -monitor stdio -rtc base=utc -boot c \
  -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
  -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
  -drive file=/root/rhel5-64.img,if=none,id=drive-ide0-0-0,boot=on,format=raw \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -netdev tap,id=hostnet0 \
  -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:94:3f:29,bus=pci.0,addr=0x3 \
  -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb \
  -vnc 10.66.85.229:0 -vga cirrus \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
```

2. In the guest, run `./stress --cpu 4 -vm 16 --vm-bytes 256M -verbose`
3. Start the destination VM in listening mode: `... --incoming tcp:0:5800`
4. Migrate from source to destination.
5. ^C to terminate ./stress.
6. Wait for migration to complete.

Actual results:
Guest hung (desktop and console); couldn't ping its Ethernet interface.

Expected results:
No hang.

Additional info:
Tested on both a 64-cpu Intel box and an AMD box (virtlab: amd-2471-32-1); both had the problem. It seems to happen on rhel5.5-z-64 (kernel-2.6.18-194.11.3.el5) only, as I didn't reproduce it on a 32-bit guest.
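The stress invocation in step 2 also explains why the migration stalls: 16 vm workers each continuously re-dirty a 256 MiB allocation, so roughly half of the guest's 8 GiB is redirtied in a tight loop. A quick sketch of the arithmetic:

```shell
#!/bin/sh
# Memory kept dirty by the reproduction's stress command:
# 16 vm workers, each allocating and continuously touching 256 MiB.
workers=16
mib_per_worker=256
guest_mib=8192            # -m 8G from the qemu-kvm command line

dirty_mib=$(( workers * mib_per_worker ))
echo "${dirty_mib} MiB of ${guest_mib} MiB guest RAM is redirtied continuously"
# prints: 4096 MiB of 8192 MiB guest RAM is redirtied continuously
```

Against qemu's low default migration bandwidth cap (on the order of tens of MiB/s in qemu-kvm of this vintage), 4 GiB of constantly-redirtied pages can never drain, which matches the observation that migration only completes after stress is terminated.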
top:

```
  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
60919 root  20  0 8713m 4.0g 3128 R 11.6  3.7 1:03.07 qemu-kvm
60907 root  20  0 8713m 4.0g 3128 S  9.6  3.7 1:07.83 qemu-kvm
60916 root  20  0 8713m 4.0g 3128 S  0.0  3.7 0:04.34 qemu-kvm
60917 root  20  0 8713m 4.0g 3128 S  0.0  3.7 0:02.52 qemu-kvm
60918 root  20  0 8713m 4.0g 3128 S  0.0  3.7 0:02.28 qemu-kvm
```

# gdb attach 60916

```
(gdb) bt
#0  0x0000003581ad95f7 in ioctl () from /lib64/libc.so.6
#1  0x000000000042a57f in kvm_run (env=0x16ffd10) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:928
#2  0x000000000042aa09 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1658
#3  0x000000000042b62f in kvm_main_loop_cpu (_env=0x16ffd10) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1900
#4  ap_main_loop (_env=0x16ffd10) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1950
#5  0x00000035822077e1 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003581ae153d in clone () from /lib64/libc.so.6
```

# strace -p 60916

```
Process 60916 attached - interrupt to quit
rt_sigtimedwait([BUS RT_6], 0x7ffae001fb70, {0, 0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigpending([ALRM]) = 0
```

# strace -p 60919

```
Process 60919 attached - interrupt to quit
rt_sigtimedwait([BUS RT_6], 0x7ffade219b70, {0, 0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigpending([]) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(5, 0xffffffffc008ae67, 0x7ffade219bb0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(5, 0xffffffffc008ae67, 0x7ffade219bb0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(5, 0xffffffffc008ae67, 0x7ffade219bb0) = 0
ioctl(14, 0xae80, 0) = 0
ioctl(5, 0xffffffffc008ae67, 0x7ffade219bb0) = 0
```
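The backtrace shows vcpu thread 60916 parked inside `kvm_run`, and the busy thread 60919 looping on two KVM ioctls. The request numbers from the strace output can be decoded with the Linux `_IO()` encoding from `<asm/ioctl.h>` (KVMIO is 0xAE): `0xae80` is `KVM_RUN`, and `0xffffffffc008ae67` is `0xc008ae67` sign-extended to 64 bits, which matches `KVM_IRQ_LINE_STATUS` (an `_IOWR` on the 8-byte `struct kvm_irq_level`) in the kvm headers of this era. A sketch of the decoding, assuming that struct size:

```shell
#!/bin/sh
# Decode the ioctl request numbers seen in the strace output above.
# Linux ioctl encoding, high to low bits: dir(2) | size(14) | type(8) | nr(8).
# KVMIO is 0xAE (linux/kvm.h).
KVMIO=0xAE

# _IO(KVMIO, 0x80) -> KVM_RUN (no direction, no size)
kvm_run=$(printf '0x%x' $(( (KVMIO << 8) | 0x80 )))

# _IOWR(KVMIO, 0x67, struct kvm_irq_level /* 8 bytes */)
#   -> KVM_IRQ_LINE_STATUS; dir = read|write = 3
kvm_irq=$(printf '0x%x' $(( (3 << 30) | (8 << 16) | (KVMIO << 8) | 0x67 )))

echo "KVM_RUN=$kvm_run KVM_IRQ_LINE_STATUS=$kvm_irq"
# prints: KVM_RUN=0xae80 KVM_IRQ_LINE_STATUS=0xc008ae67
```

So thread 60919 is still entering the guest and delivering interrupt-line updates, i.e. qemu-kvm itself is not stuck in a syscall; the hang is on the guest side.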