| Summary: | The rhel7 guest OS hung after doing 1024 rounds migration. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Hu Jianwei <jiahu> | ||||||
| Component: | qemu-kvm | Assignee: | Juan Quintela <quintela> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 7.0 | CC: | dyuan, hhuang, jiahu, jishao, juzhang, mzhan, qzhang, rbalakri, virt-maint, zpeng | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2017-05-12 19:56:08 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
|
||||||||
Hi Jianwei,
Could you please post the qemu command line as shown by "ps ax | grep qemu"? And how many guests are running on your host?
Thanks.

Hi Qunfang,
Only one guest was running on each machine (source and destination) during this test. The qemu-kvm command line is:
[root@ibm-x3650m3-07 216378]# ps aux | grep qemu-kvm| grep -v grep
qemu 8337 3.8 0.9 1646332 312568 ? Sl 15:41 0:01 /usr/libexec/qemu-kvm -name r7_mig -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 3c84a580-7582-a249-a685-8903cdfa3fe3 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7_mig.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/jiahu/images/r7_mig.img,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=31,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:41:7c:87,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5901,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming fd:26 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
[root@ibm-x3650m3-07 216378]#
Thanks.

Created attachment 830470 [details]
When the guest hung, only a flashing cursor was displayed.
Created attachment 831392 [details]
Another similar hang.
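The command line posted earlier attaches the boot disk as `ide-hd`, and the disk model matters for the rest of this thread. As a minimal sketch for reading it off a long qemu command line, the helper below prints only the disk device models. The function name `disk_devices` and the list of models it matches are assumptions made for illustration, not part of any tool.

```shell
#!/bin/bash
# Sketch: list the disk device models found on a qemu command line.
# disk_devices is a hypothetical helper; the set of models it recognises
# (ide-hd, ide-cd, scsi-hd, scsi-cd, virtio-blk-pci) is an assumption and
# is not exhaustive.
disk_devices() {
    # One argument per line, keep only "-device" values that name a disk
    # model, then strip the properties after the first comma.
    tr ' ' '\n' |
        grep -E '^(ide-hd|ide-cd|scsi-hd|scsi-cd|virtio-blk-pci),' |
        cut -d, -f1
}
```

It reads the command line on standard input, for example `ps -o args= -p 8337 | disk_devices`, where 8337 is the qemu-kvm PID from the `ps` output.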
Could you try to reproduce with virtio for the disk instead of IDE? Thanks.

Hi zpeng, could you please reply to comment 7? Thanks.

I can't reproduce this with a virtio disk using these builds:
libvirt-1.2.17-7.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64
After 1024 rounds, the guest still worked well.
qemu cmd:
/usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Westmere -m 500 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=524288000 -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -uuid 28321759-1302-4a7a-b97d-1b32ef73b052 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-rhel7/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/migrate/kvm-rhel7.1-x86_64-qcow2.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f8:c8:dd,bus=pci.0,addr=0x3 -netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:f8:c8:d1,bus=pci.0,addr=0x8 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/guest.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

I think the problem is in the migration script: it does not check that each migration succeeded. And they can't reproduce it with virtio. I vote for WONTFIX.

I cannot reproduce the bug with either the virtio disk or the IDE disk using qemu-kvm-rhev-2.6.0-11.el7.x86_64 and libvirt-1.3.5-1.el7.x86_64.
(1) For the virtio disk:
# virsh dumpxml r7.1 | grep disk -aA8
<suspend-to-disk enabled='no'/>
</pm>
<devices>
<emulator>/usr/libexec/qemu-kvm</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none'/>
<source file='/nfs/r7.1.img'>
<seclabel model='selinux' labelskip='yes'/>
</source>
<backingStore/>
<target dev='vda' bus='virtio'/>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
</disk>
<disk type='block' device='cdrom'>
<driver name='qemu' type='raw'/>
<backingStore/>
<target dev='hda' bus='ide'/>
<readonly/>
<alias name='ide0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
After 1024 migration loops:
.......
Loop 1024: Migrating r7.1 from 10.66.4.192 to 10.66.70.107
COMMAND: virsh -c qemu+tcp://root@10.66.4.192/system migrate --live --p2p r7.1 qemu+tcp://root@10.66.70.107/system --berbose
Migration: [100 %]
real 0m32.232s
user 0m0.012s
sys 0m0.015s
Loop 1024: Migrating r7.1 back from 10.66.70.107 to 10.66.4.192
COMMAND: virsh -c qemu+tcp://root@10.66.70.107/system migrate --live --p2p r7.1 qemu+tcp://root@10.66.4.192/system --verbose
Migration: [100 %]
real 0m34.887s
user 0m0.010s
sys 0m0.017s
# virsh list
Id Name State
----------------------------------------------------
1034 r7.1 running
# ps -ef | grep qemu
qemu 3935 1 0 Jul12 ? 00:01:20 /usr/libexec/qemu-kvm -name guest=r7.1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1034-r7.1/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid fe958396-c684-42ba-a435-90da12db62aa -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1034-r7.1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/nfs/r7.1.img,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charchannel0 -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.linux-kvm.port.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device ich9-intel-hda,id=sound0,bus=pci.0,addr=0xa -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
(2) For the IDE disk:
# virsh dumpxml ide2 | grep disk -A9
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
<source file='/nfs2/r7.2.img'>
<seclabel model='selinux' relabel='yes'/>
</source>
<backingStore/>
<target dev='hda' bus='ide'/>
<serial>eca38821-c430-48a1-a932-a4814198f24d</serial>
<alias name='ide0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
After 1024 migration loops:
.......
Loop 1024: Migrating ide2 back from 10.66.70.107 to 10.66.4.192
COMMAND: virsh -c qemu+tcp://root@10.66.70.107/system migrate --live --p2p ide2 qemu+tcp://root@10.66.4.192/system --verbose
Migration: [100 %]
real 1m37.319s
user 0m0.034s
sys 0m0.049s
Loop 1024: Migrating ide2 from 10.66.4.192 to 10.66.70.107
COMMAND: virsh -c qemu+tcp://root@10.66.4.192/system migrate --live --p2p ide2 qemu+tcp://root@10.66.70.107/system --berbose
Migration: [100 %]
real 1m36.344s
user 0m0.033s
sys 0m0.058s
# virsh list
Id Name State
----------------------------------------------------
1737 ide2 running
# ps -ef | grep qemu
qemu 24693 1 10 15:08 ? 00:00:14 /usr/libexec/qemu-kvm -name guest=ide2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1737-ide2/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 549bb113-4721-4145-949d-2305832117f6 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1737-ide2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/nfs2/r7.2.img,format=qcow2,if=none,id=drive-ide0-0-0,serial=eca38821-c430-48a1-a932-a4814198f24d,cache=none,werror=stop,rerror=stop,aio=threads -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:90:e4:b5,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-1737-ide2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x4 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x9 -watchdog-action reset -device usb-host,id=hostdev0 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
I can reproduce this issue with the IDE disk and the versions listed in the Description:
libvirt-1.1.1-13.el7.x86_64
qemu-kvm-1.5.3-19.el7.x86_64
kernel-3.10.0-54.el7.x86_64
Loop 1024: Migrating vm1 back from 10.66.4.223 to 10.66.5.45
COMMAND: virsh -c qemu+tcp://root@10.66.4.223/system migrate --live --p2p vm1 qemu+tcp://root@10.66.5.45/system --verbose
Migration: [100 %]
real 0m3.735s
user 0m0.014s
sys 0m0.006s
PING 10.66.5.6 (10.66.5.6) 56(84) bytes of data.
From 10.66.5.45 icmp_seq=1 Destination Host Unreachable
From 10.66.5.45 icmp_seq=2 Destination Host Unreachable
From 10.66.5.45 icmp_seq=3 Destination Host Unreachable
From 10.66.5.45 icmp_seq=4 Destination Host Unreachable
From 10.66.5.45 icmp_seq=5 Destination Host Unreachable
# virsh list
Id Name State
----------------------------------------------------
2122 vm1 running
When I accessed vm1, the OS was hung.

Are you checking at which iteration the guest stopped working? And what is the error message at that point? Could you post the script that you use for the 1024 rounds of ping-pong testing?
Thanks, Juan. |
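To answer the question of which iteration the guest stops at, the guest can be probed after every migration rather than only after loop 1024. The following is a minimal sketch under stated assumptions: `check_guest` and the `PROBE` override are hypothetical names introduced here, and the default probe is a plain `ping` to the guest address.

```shell
#!/bin/bash
# Sketch: report whether the guest still answers after a migration loop.
# PROBE is a hypothetical hook so the check can be swapped or dry-run;
# by default it sends three ICMP echo requests with a 2 second timeout.
PROBE=${PROBE:-ping -c 3 -W 2}

# check_guest LOOP GUEST_IP
# Prints one status line and returns non-zero when the probe fails.
check_guest() {
    loop="$1"
    guest_ip="$2"
    if $PROBE "$guest_ip" >/dev/null 2>&1; then
        echo "Loop ${loop}: guest ${guest_ip} responds"
    else
        echo "Loop ${loop}: guest ${guest_ip} NOT responding"
        return 1
    fi
}
```

Called as `check_guest "$i" "$GUEST_IP" || break` after each `sleep 30` in the migration loop, it would stop the run at the first loop where the guest no longer replies; `GUEST_IP` is a placeholder for the guest's address.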
Description of problem:
The guest OS hung after 1024 rounds of migration.

Version-Release number of selected component (if applicable):
libvirt-1.1.1-13.el7.x86_64
qemu-kvm-1.5.3-19.el7.x86_64
kernel-3.10.0-54.el7.x86_64

How reproducible:
100% (4/4)

Steps to Reproduce:
1. Define one guest whose disk is on NFS
[root@ibm-x3850x5-06 216380]# virsh list --all
Id Name State
----------------------------------------------------
1228 r7_mig shut off
[root@ibm-x3850x5-06 216380]# virsh dumpxml r7_mig| grep disk -aA8
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source file='/mnt/jiahu/images/r7_tls.img'>
<seclabel model='selinux' relabel='yes'/>
</source>
<target dev='hda' bus='ide'/>
<alias name='ide0-0-0'/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
...
[root@ibm-x3850x5-06 216380]# mount | grep 121
10.66.90.121:/vol/S3/libvirtmanual on /mnt type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.90.121,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.66.90.121)
2. Run the shell script below
[root@ibm-x3650m3-07 216378]# cat migration.sh
#!/bin/bash
# Migrate a guest back and forth between two hosts, printing progress as it goes
GUEST=$1
HOST1=$2
HOST2=$3
OPTIONS="--live --p2p"
TRANSPORT="tcp"
#TRANSPORT="tls"
#TRANSPORT="ssh"
date
for i in `seq 1 1024`; do
    echo "Loop ${i}: Migrating ${GUEST} from ${HOST1} to ${HOST2}"
    echo "COMMAND: virsh -c qemu+${TRANSPORT}://root@${HOST1}/system migrate ${OPTIONS} ${GUEST} qemu+${TRANSPORT}://root@${HOST2}/system --berbose"
    time virsh -c qemu+${TRANSPORT}://root@${HOST1}/system migrate ${OPTIONS} ${GUEST} qemu+${TRANSPORT}://root@${HOST2}/system --verbose
    sleep 30
    echo "Loop ${i}: Migrating ${GUEST} back from ${HOST2} to ${HOST1}"
    echo "COMMAND: virsh -c qemu+${TRANSPORT}://root@${HOST2}/system migrate ${OPTIONS} ${GUEST} qemu+${TRANSPORT}://root@${HOST1}/system --verbose"
    time virsh -c qemu+${TRANSPORT}://root@${HOST2}/system migrate ${OPTIONS} ${GUEST} qemu+${TRANSPORT}://root@${HOST1}/system --verbose
    sleep 30
done
date
[root@ibm-x3850x5-06 216380]# sh migration.sh source_ip dest_ip

Actual results:
The guest OS hung after 1024 rounds of migration: the guest display was frozen and it accepted no input. You can use my script to reproduce it. I could not capture any error logs on the libvirt side.

Expected results:
The guest OS keeps working after many rounds of migration.
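The migration.sh loop above discards the exit status of `virsh migrate`, so a failed migration goes unnoticed and the next iteration runs anyway. As a hedged sketch of how the loop could stop at the first failure, the variant below keeps the same `virsh` invocation and options but wraps it in functions. The names `migrate_once` and `ping_pong` and the `VIRSH` override are assumptions introduced here for illustration; they are not part of the original script.

```shell
#!/bin/bash
# Sketch: ping-pong migration that stops at the first failed migration.
# VIRSH can be overridden for a dry run without libvirt (an assumption
# added for illustration; the original script calls virsh directly).
VIRSH=${VIRSH:-virsh}
OPTIONS="--live --p2p"
TRANSPORT="tcp"

# migrate_once GUEST SRC DST: run one migration, return virsh's exit status.
migrate_once() {
    local guest="$1" src="$2" dst="$3"
    # OPTIONS is left unquoted on purpose so it splits into two flags.
    $VIRSH -c "qemu+${TRANSPORT}://root@${src}/system" migrate ${OPTIONS} \
        "${guest}" "qemu+${TRANSPORT}://root@${dst}/system" --verbose
}

# ping_pong GUEST HOST1 HOST2 ROUNDS [PAUSE_SECONDS]
# Migrates there and back ROUNDS times; reports the loop that failed.
ping_pong() {
    local guest="$1" host1="$2" host2="$3" rounds="$4" pause="${5:-30}"
    local i
    for i in $(seq 1 "${rounds}"); do
        if ! migrate_once "${guest}" "${host1}" "${host2}"; then
            echo "Loop ${i}: migration ${host1} -> ${host2} FAILED" >&2
            return 1
        fi
        sleep "${pause}"
        if ! migrate_once "${guest}" "${host2}" "${host1}"; then
            echo "Loop ${i}: migration ${host2} -> ${host1} FAILED" >&2
            return 1
        fi
        sleep "${pause}"
    done
    echo "Completed ${rounds} loops"
}
```

A run equivalent to the original would be `ping_pong r7_mig source_ip dest_ip 1024`. Stopping on the first non-zero status makes the failing iteration visible in the log instead of letting later loops operate on a guest that never moved.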