Bug 1035595

Summary: The rhel7 guest OS hung after doing 1024 rounds migration.
Product: Red Hat Enterprise Linux 7 Reporter: Hu Jianwei <jiahu>
Component: qemu-kvm Assignee: Juan Quintela <quintela>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: dyuan, hhuang, jiahu, jishao, juzhang, mzhan, qzhang, rbalakri, virt-maint, zpeng
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-05-12 19:56:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
 Description                                 Flags
----------------------------------------------------
 If hung, only displayed a flashing cursor   none
 Another similar hung.                       none

Description Hu Jianwei 2013-11-28 07:27:36 UTC
Description of problem:
The guest OS hung after 1024 rounds of migration.

Version-Release number of selected component (if applicable):
libvirt-1.1.1-13.el7.x86_64
qemu-kvm-1.5.3-19.el7.x86_64
kernel-3.10.0-54.el7.x86_64

How reproducible:
100%(4/4)

Steps to Reproduce:
1. Define a guest whose disk image is on NFS
[root@ibm-x3850x5-06 216380]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 1228  r7_mig                         shut off

[root@ibm-x3850x5-06 216380]# virsh dumpxml r7_mig| grep disk -aA8
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/mnt/jiahu/images/r7_tls.img'>
        <seclabel model='selinux' relabel='yes'/>
      </source>
      <target dev='hda' bus='ide'/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
...
[root@ibm-x3850x5-06 216380]# mount | grep 121
10.66.90.121:/vol/S3/libvirtmanual on /mnt type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.90.121,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.66.90.121)

2. Run the shell script below
[root@ibm-x3650m3-07 216378]# cat migration.sh 
#!/bin/bash

# Migrate a guest back and forth between two hosts, printing progress as it goes
GUEST=$1
HOST1=$2
HOST2=$3
OPTIONS="--live  --p2p"
TRANSPORT="tcp"
#TRANSPORT="tls"
#TRANSPORT="ssh"

date
for i in `seq 1 1024`;
do
    echo "Loop ${i}: Migrating ${GUEST} from ${HOST1} to ${HOST2}"
    echo "COMMAND: virsh -c qemu+${TRANSPORT}://root@${HOST1}/system migrate ${OPTIONS} ${GUEST} qemu+${TRANSPORT}://root@${HOST2}/system --verbose"
    time virsh -c qemu+${TRANSPORT}://root@${HOST1}/system migrate ${OPTIONS} ${GUEST} qemu+${TRANSPORT}://root@${HOST2}/system --verbose
    sleep 30
    echo "Loop ${i}: Migrating ${GUEST} back from ${HOST2} to ${HOST1}"
    echo "COMMAND: virsh -c qemu+${TRANSPORT}://root@${HOST2}/system migrate ${OPTIONS} ${GUEST} qemu+${TRANSPORT}://root@${HOST1}/system --verbose"
    time virsh -c qemu+${TRANSPORT}://root@${HOST2}/system migrate ${OPTIONS} ${GUEST} qemu+${TRANSPORT}://root@${HOST1}/system --verbose
    sleep 30
done
date
[root@ibm-x3850x5-06 216380]# sh migration.sh source_ip dest_ip

Actual results:
The guest OS hung after 1024 rounds of migration: its display was frozen and it accepted no input. You can use my script to reproduce it. I couldn't capture any error logs on the libvirt side.

Expected results:
The guest OS keeps working after many rounds of migration.

Comment 1 Qunfang Zhang 2013-11-28 07:40:13 UTC
Hi, Jianwei

Could you post the qemu command line as shown by "ps ax | grep qemu"?  And how many guests are running on your host?

Thanks.

Comment 2 Hu Jianwei 2013-11-28 07:45:11 UTC
Hi Qunfang,

Only one guest was on each machine (source/destination) while this task was running.

Qemu-kvm command line is:
[root@ibm-x3650m3-07 216378]# ps aux | grep qemu-kvm| grep -v grep
qemu      8337  3.8  0.9 1646332 312568 ?      Sl   15:41   0:01 /usr/libexec/qemu-kvm -name r7_mig -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 3c84a580-7582-a249-a685-8903cdfa3fe3 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r7_mig.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/jiahu/images/r7_mig.img,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=31,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:41:7c:87,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -spice port=5901,addr=127.0.0.1,disable-ticketing,seamless-migration=on -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=67108864 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming fd:26 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
[root@ibm-x3650m3-07 216378]# 

Thanks.

Comment 4 Hu Jianwei 2013-11-29 04:02:08 UTC
Created attachment 830470 [details]
If hung, only displayed a flashing cursor

Comment 5 Hu Jianwei 2013-12-02 03:00:27 UTC
Created attachment 831392 [details]
Another similar hung.

Comment 7 Juan Quintela 2014-08-13 11:37:54 UTC
Could you try to reproduce with virtio for the disk instead of IDE?  thanks
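Before rerunning the test, it can help to confirm which bus each guest disk is actually attached to. The following is a minimal sketch, not part of the original report: the helper name `disk_targets` is hypothetical, and it assumes libvirt prints the `dev` attribute before `bus` in each `<target>` element, as it does in the dumps in this bug.

```shell
#!/bin/sh
# disk_targets: read domain XML on stdin and print "dev bus" for every
# <target dev='...' bus='...'/> element (hypothetical helper name).
# Assumes the attribute order dev, bus, as in the dumpxml output in this bug.
disk_targets() {
    sed -n "s/.*<target dev='\([^']*\)' bus='\([^']*\)'.*/\1 \2/p"
}
```

A possible use is `virsh dumpxml r7_mig | disk_targets`, which for the guest in the Description would be expected to list `hda` on the `ide` bus.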

Comment 10 dyuan 2015-09-10 02:12:20 UTC
Hi zpeng, please help reply to comment 7, thanks.

Comment 11 zhe peng 2015-09-11 02:22:19 UTC
I can't reproduce this with virtio for the disk,
using these builds:
libvirt-1.2.17-7.el7.x86_64
qemu-kvm-rhev-2.3.0-22.el7.x86_64

After 1024 rounds, the guest still worked well.

qemu cmd:
/usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Westmere -m 500 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu,share=yes,size=524288000 -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -uuid 28321759-1302-4a7a-b97d-1b32ef73b052 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-rhel7/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/libvirt/migrate/kvm-rhel7.1-x86_64-qcow2.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f8:c8:dd,bus=pci.0,addr=0x3 -netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:f8:c8:d1,bus=pci.0,addr=0x8 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/guest.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming tcp:[::]:49152 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on

Comment 13 Juan Quintela 2016-06-15 11:09:41 UTC
I think the problem is in the migration script: it does not check that each migration succeeded.

And they can't reproduce it with virtio.  I will vote for WONTFIX.
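
The concern about unchecked migrations can be illustrated with a minimal sketch. It is not the reporter's script: the names `migrate_once`, `run_rounds` and the `VIRSH` override variable are hypothetical, it assumes `virsh migrate` exits with a non-zero status when a migration fails, and it leaves out the 30 second sleeps for brevity.

```shell
#!/bin/sh
# Sketch: ping-pong migration loop that stops at the first failed migration.
# VIRSH may be overridden (for example with a stub) for dry runs.
VIRSH=${VIRSH:-virsh}
TRANSPORT=${TRANSPORT:-tcp}

# migrate_once GUEST SRC DST: one live p2p migration; returns virsh's status.
migrate_once() {
    $VIRSH -c "qemu+${TRANSPORT}://root@$2/system" migrate --live --p2p \
        "$1" "qemu+${TRANSPORT}://root@$3/system" --verbose
}

# run_rounds GUEST HOST1 HOST2 ROUNDS: report the failing round and direction.
run_rounds() {
    guest=$1; host1=$2; host2=$3; rounds=$4
    i=1
    while [ "$i" -le "$rounds" ]; do
        if ! migrate_once "$guest" "$host1" "$host2"; then
            echo "Loop $i: migration $host1 -> $host2 FAILED"
            return 1
        fi
        if ! migrate_once "$guest" "$host2" "$host1"; then
            echo "Loop $i: migration $host2 -> $host1 FAILED"
            return 1
        fi
        i=$((i + 1))
    done
    echo "All $rounds rounds completed"
}

# Run only when called with GUEST HOST1 HOST2 [ROUNDS].
if [ "$#" -ge 3 ]; then
    run_rounds "$1" "$2" "$3" "${4:-1024}"
fi
```

Stopping at the first failure keeps the failing round number in the last line of output, so a later hang is not confused with a migration that never completed.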

Comment 14 Jingjing Shao 2016-07-15 07:12:39 UTC
I cannot reproduce the bug with either the virtio disk or the IDE disk, using
qemu-kvm-rhev-2.6.0-11.el7.x86_64 and libvirt-1.3.5-1.el7.x86_64.


(1) For the virtio disk:

# virsh dumpxml r7.1 | grep disk -aA8
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/nfs/r7.1.img'>
        <seclabel model='selinux' labelskip='yes'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <backingStore/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>


After 1024 migration loops:
.......
Loop 1024: Migrating r7.1 from 10.66.4.192 to 10.66.70.107
COMMAND: virsh -c qemu+tcp://root@10.66.4.192/system migrate --live  --p2p r7.1 qemu+tcp://root@10.66.70.107/system --berbose
Migration: [100 %]

real	0m32.232s
user	0m0.012s
sys	0m0.015s
Loop 1024: Migrating r7.1 back from 10.66.70.107 to 10.66.4.192
COMMAND: virsh -c qemu+tcp://root@10.66.70.107/system migrate --live  --p2p r7.1 qemu+tcp://root@10.66.4.192/system --verbose
Migration: [100 %]

real	0m34.887s
user	0m0.010s
sys	0m0.017s


# virsh list
 Id    Name                           State
----------------------------------------------------
 1034  r7.1                           running


# ps -ef | grep qemu
qemu      3935     1  0 Jul12 ?        00:01:20 /usr/libexec/qemu-kvm -name guest=r7.1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1034-r7.1/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid fe958396-c684-42ba-a435-90da12db62aa -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1034-r7.1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/nfs/r7.1.img,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charchannel0 -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.linux-kvm.port.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device ich9-intel-hda,id=sound0,bus=pci.0,addr=0xa -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on


(2) For the IDE disk:

# virsh dumpxml ide2 | grep  disk -A9
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='threads'/>
      <source file='/nfs2/r7.2.img'>
        <seclabel model='selinux' relabel='yes'/>
      </source>
      <backingStore/>
      <target dev='hda' bus='ide'/>
      <serial>eca38821-c430-48a1-a932-a4814198f24d</serial>
      <alias name='ide0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

After 1024 migration loops:
.......
Loop 1024: Migrating ide2 back from 10.66.70.107 to 10.66.4.192
COMMAND: virsh -c qemu+tcp://root@10.66.70.107/system migrate --live  --p2p ide2 qemu+tcp://root@10.66.4.192/system --verbose
Migration: [100 %]

real	1m37.319s
user	0m0.034s
sys	0m0.049s
Loop 1024: Migrating ide2 from 10.66.4.192 to 10.66.70.107
COMMAND: virsh -c qemu+tcp://root@10.66.4.192/system migrate --live  --p2p ide2 qemu+tcp://root@10.66.70.107/system --berbose
Migration: [100 %]

real	1m36.344s
user	0m0.033s
sys	0m0.058s


# virsh list
 Id    Name                           State
----------------------------------------------------
 1737  ide2                           running


# ps -ef | grep qemu
qemu     24693     1 10 15:08 ?        00:00:14 /usr/libexec/qemu-kvm -name guest=ide2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1737-ide2/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 549bb113-4721-4145-949d-2305832117f6 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1737-ide2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/nfs2/r7.2.img,format=qcow2,if=none,id=drive-ide0-0-0,serial=eca38821-c430-48a1-a932-a4814198f24d,cache=none,werror=stop,rerror=stop,aio=threads -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:90:e4:b5,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-1737-ide2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,bus=pci.0,addr=0x2 -device AC97,id=sound0,bus=pci.0,addr=0x4 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x9 -watchdog-action reset -device 
usb-host,id=hostdev0 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

Comment 15 Jingjing Shao 2016-07-25 02:42:55 UTC
This issue can be reproduced with the IDE disk and the versions given in the Description:

libvirt-1.1.1-13.el7.x86_64
qemu-kvm-1.5.3-19.el7.x86_64
kernel-3.10.0-54.el7.x86_64

Loop 1024: Migrating vm1 back from 10.66.4.223 to 10.66.5.45
COMMAND: virsh -c qemu+tcp://root@10.66.4.223/system migrate --live  --p2p vm1 qemu+tcp://root@10.66.5.45/system --verbose
Migration: [100 %]

real	0m3.735s
user	0m0.014s
sys	0m0.006s
PING 10.66.5.6 (10.66.5.6) 56(84) bytes of data.
From 10.66.5.45 icmp_seq=1 Destination Host Unreachable
From 10.66.5.45 icmp_seq=2 Destination Host Unreachable
From 10.66.5.45 icmp_seq=3 Destination Host Unreachable
From 10.66.5.45 icmp_seq=4 Destination Host Unreachable
From 10.66.5.45 icmp_seq=5 Destination Host Unreachable



# virsh list
 Id    Name                           State
----------------------------------------------------
 2122  vm1                            running


When I accessed vm1, the OS was hung.

Comment 16 Juan Quintela 2017-04-25 12:22:41 UTC
Have you checked at which iteration the guest stopped working?  And what is the error message there?

Could you post the script that you use for ping pong testing 1024 times?

Thanks, Juan.
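
One way to find the iteration at which the guest stops responding is to probe it after every migration, in the spirit of the ping shown in comment 15. The sketch below is only an illustration: `guest_alive`, `check_round` and the `PING` override variable are hypothetical names, and a successful ping shows only that the guest's network stack answers, not that the whole OS is healthy.

```shell
#!/bin/sh
# Sketch: probe the guest after each migration and log the round number.
# PING may be overridden (for example with a stub) for dry runs.
PING=${PING:-ping}

# guest_alive IP: succeeds if the guest answers at least one of 3 echo requests.
guest_alive() {
    $PING -c 3 "$1" >/dev/null 2>&1
}

# check_round ROUND IP: print a status line; returns non-zero if unreachable.
check_round() {
    if guest_alive "$2"; then
        echo "round $1: guest $2 reachable"
    else
        echo "round $1: guest $2 UNREACHABLE"
        return 1
    fi
}
```

Calling `check_round "$i" "$GUEST_IP" || break` after each migration in the loop would leave the first unreachable round as the last line of the log, where `GUEST_IP` is an assumed variable holding the guest's address.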