Bug 1164759 - Handle multipage ranges in invalidate_and_set_dirty()
Summary: Handle multipage ranges in invalidate_and_set_dirty()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-11-17 11:17 UTC by Dr. David Alan Gilbert
Modified: 2015-03-05 09:58 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.1.2-11.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-05 09:58:28 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0624 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2015-03-05 14:37:36 UTC

Description Dr. David Alan Gilbert 2014-11-17 11:17:57 UTC
Description of problem:
(Spotted upstream)
Peter Maydell's message to qemu-devel 2014-11-17 / 1416167061-13203-1-git-send-email-peter.maydell@linaro.org:
'The code in invalidate_and_set_dirty() needs to handle addr/length
combinations which cross guest physical page boundaries. This can happen,
for example, when disk I/O reads large blocks into guest RAM which previously
held code that we have cached translations for. Unfortunately we were only
checking the clean/dirty status of the first page in the range, and then
were calling a tb_invalidate function which only handles ranges that don't
cross page boundaries. Fix the function to deal with multipage ranges.

The symptoms of this bug were that guest code would misbehave (eg segfault),
in particular after a guest reboot but potentially any time the guest
reused a page of its physical RAM for new code.'
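For illustration, the behaviour described in the quoted message can be modelled as below. This is a minimal Python sketch, not the QEMU C code: the 4 KiB page size, the set-based dirty/cache bookkeeping, and the names split_by_page, range_has_clean_page and invalidate_and_set_dirty_model are all assumptions made for the example.

```python
# Hypothetical model of the multipage handling described above.
# Assumption: 4 KiB guest pages; QEMU's real data structures differ.
PAGE_BITS = 12
PAGE_SIZE = 1 << PAGE_BITS


def split_by_page(addr, length):
    """Yield (start, size) chunks of [addr, addr + length) that never
    cross a page boundary."""
    end = addr + length
    while addr < end:
        next_page = addr - (addr % PAGE_SIZE) + PAGE_SIZE
        chunk_end = min(end, next_page)
        yield addr, chunk_end - addr
        addr = chunk_end


def range_has_clean_page(dirty_pages, addr, length):
    """Return True if any page touched by the range is not yet dirty.
    The bug described above corresponds to checking only the first page."""
    if length <= 0:
        return False
    first = addr >> PAGE_BITS
    last = (addr + length - 1) >> PAGE_BITS
    return any(page not in dirty_pages for page in range(first, last + 1))


def invalidate_and_set_dirty_model(dirty_pages, cached_code_pages, addr, length):
    """Drop cached translations and mark dirty for every page in the range."""
    if range_has_clean_page(dirty_pages, addr, length):
        for start, _size in split_by_page(addr, length):
            page = start >> PAGE_BITS
            cached_code_pages.discard(page)  # stand-in for TB invalidation
            dirty_pages.add(page)


# A write starting near the end of page 0 and running into page 3.
dirty = {0}        # only the first page is already dirty
cached = {1, 2}    # pages that held previously translated guest code
invalidate_and_set_dirty_model(dirty, cached, 0x0FF0, 0x2020)
print(sorted(dirty), sorted(cached))
```

In this model a check limited to the first page would see page 0 as already dirty and skip the range entirely, leaving stale cached code for pages 1 and 2, which matches the symptom the message describes.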

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Dr. David Alan Gilbert 2014-11-19 10:11:41 UTC
I'm going to roll this together with 'hw/ide/core.c: Prevent SIGSEGV' which is another fix that's just gone in 2.2 that we should backport, but we don't have a failure case for.

Comment 4 Dr. David Alan Gilbert 2014-11-19 10:44:42 UTC
Actually, scrap that, the ide fix doesn't backport cleanly, so it needs to go separately.

Comment 6 Miroslav Rezanina 2014-11-21 09:46:21 UTC
Fix included in qemu-kvm-rhev-2.1.2-11.el7

Comment 8 huiqingding 2014-12-02 04:49:34 UTC
Hi, David,

Would you please tell us how to reproduce this bug?

Thanks a lot.

Huiqing

Comment 9 Dr. David Alan Gilbert 2014-12-02 09:54:00 UTC
(In reply to huiqingding from comment #8)
> Hi, David,
> 
> Would you please tell us how to reproduce this bug?

Hi Huiqing,
  We haven't got a test for it; the patch is a fix for a theoretical problem.
As long as migration still works correctly we're OK; a good test would be to do lots of IO in the guest during migration.

Dave

Comment 10 huiqingding 2014-12-09 03:07:30 UTC
Tested this issue using the following versions:
RHEL7.1 src host:
kernel-3.10.0-211.el7.x86_64
qemu-kvm-rhev-2.1.2-16.el7.x86_64

RHEL7.1 dst host:
kernel-3.10.0-211.el7.x86_64
qemu-kvm-rhev-2.1.2-16.el7.x86_64

RHEL7.1 guest
kernel-3.10.0-211.el7.x86_64

Steps to Test:
1. Boot a RHEL7.1 guest on the src host using a full command line:
#/usr/libexec/qemu-kvm -cpu SandyBridge,enforce \
-enable-kvm  -m 4096 -realtime mlock=off -smp 4,sockets=2,cores=2,threads=1,maxcpus=160 -numa node,cpus=0 \
-numa node,cpus=1 -numa node,cpus=2 -numa node,cpus=3 \
-nodefconfig -nodefaults \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-global ide-drive.physical_block_size=4096 \
-global ide-drive.logical_block_size=4096 \
-global virtio-blk-pci.physical_block_size=512 \
-global virtio-blk-pci.logical_block_size=512 \
-boot order=cdn,once=n,menu=on,strict=on,reboot-timeout=60000 -k en-us \
-soundhw ac97 \
-device virtio-balloon-pci,id=ballooning,bus=pci.0,addr=0x5,indirect_desc=on,event_idx=on,multifunction=on,rombar=100 \
-monitor stdio \
-name test-all-qemu-kvm-option -uuid `uuidgen` \
-drive file=/mnt/virtio-blk-disk,if=none,id=drive-virtio-disk,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop,media=disk,snapshot=off,bus=1,unit=1 \
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk,id=virtio-disk,bus=pci.0,addr=0x7,physical_block_size=512,logical_block_size=512,multifunction=on,scsi=on,event_idx=on,indirect_desc=on,vectors=16,x-data-plane=off,ioeventfd=on,serial=fuxc,discard_granularity=1,min_io_size=4096,opt_io_size=4096 \
-usbdevice tablet -usbdevice mouse  \
-netdev tap,id=hostnet0,vhost=on,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=c2:9a:2f:9c:de:10,bus=pci.0,addr=0x9,multifunction=on,status=on,gso=on,ioeventfd=on,vectors=8,indirect_desc=off,event_idx=off,guest_tso4=off,guest_tso6=on,guest_ecn=off,guest_ufo=on,host_tso4=off,host_tso6=on,host_ecn=on,mrg_rxbuf=off,ctrl_vq=on,host_ufo=on,mrg_rxbuf=on,ctrl_rx=on,ctrl_vlan=on,ctrl_rx_extra=on,ctrl_mac_addr=on \
-netdev tap,id=hostnet1,vhost=off,script=/etc/qemu-ifup  \
-device e1000,netdev=hostnet1,id=virtio-net-pci1,mac=1a:d9:71:4a:35:a9,bus=pci.0,addr=0xa,multifunction=off \
-netdev tap,id=hostnet2,vhost=off,script=/etc/qemu-ifup \
-device rtl8139,netdev=hostnet2,id=virtio-net-pci2,mac=22:6f:4e:8f:62:21,bus=pci.0,addr=0xb,multifunction=off \
-serial unix:/tmp/monitor2,server,nowait \
-rtc base=utc -no-shutdown \
-drive file=/mnt/ide-disk,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop,copy-on-read=off,serial=fux-ide,media=disk \
-device ide-drive,drive=drive-data-disk,id=system-disk,logical_block_size=512,physical_block_size=512,min_io_size=32,opt_io_size=64,discard_granularity=512,ver=fuxc-ver,bus=ide.0,unit=0  \
-chardev tty,id=serial1,path=/dev/ttyS0 \
-device isa-serial,chardev=serial1 \
-chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait  \
-chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait \
-device virtio-serial-pci,id=virtio-serial0,max_ports=16,vectors=0 \
-chardev file,id=channel3,path=/mnt/helloworld1.txt \
-device virtserialport,chardev=channel3,name=com.redhat.rhevm.vdsm1,bus=virtio-serial0.0,id=port1,nr=1 \
-chardev socket,id=isa-serial-1,path=/tmp/isa-serial-1,server,nowait \
-device isa-serial,chardev=isa-serial-1 -global pvpanic.ioport=0x0505 \
-machine pc-i440fx-rhel7.1.0,dump-guest-core=off \
-drive file=/mnt/en_windows_7_ultimate_with_sp1_x86_dvd_u_677460.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
-device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-0,id=ide0-1-0,logical_block_size=512,physical_block_size=512,min_io_size=32,opt_io_size=64,discard_granularity=512,unit=1,ver=fuxc-ver-cdrom,bus=ide.0,unit=1 \
-drive file=/mnt/rhel7_1_1113.qcow2,if=none,id=drive-scsi-disk,format=qcow2,cache=writethrough,werror=stop,rerror=stop \
-device virtio-scsi-pci,id=scsi0,addr=0x13,vectors=16,indirect_desc=on,event_idx=off,hotplug=on,param_change=on,num_queues=1,max_sectors=512,cmd_per_lun=16,multifunction=on,rombar=64 \
-device scsi-hd,drive=drive-scsi-disk,bus=scsi0.0,id=data-disk2,bootindex=1 \
-device sga -spice port=5901,password=redhat-vga,disable-ticketing -vga qxl -global qxl-vga.vram_size=33554432 \
-device intel-hda,id=sound0,bus=pci.0 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
-chardev socket,path=/tmp/foo,server,nowait,id=foo \
-drive file=/usr/share/virtio-win/virtio-win_amd64.vfd,if=none,id=drive-fdc0-0-0,readonly=on,format=raw \
-global isa-fdc.driveA=drive-fdc0-0-0 \
-device ich9-usb-uhci6,id=uhci6 \
-device usb-kbd,id=kdb0,bus=uhci6.0 \
-device ich9-usb-uhci5,id=uhci5 \
-device usb-mouse,id=mouse0,bus=uhci5.0 \
-device ich9-usb-uhci4,id=uhci4 \
-device usb-tablet,id=tablet0,bus=uhci4.0 \
-device nec-usb-xhci,id=xhci \
-device usb-bot,id=bot1,bus=xhci.0 \
-drive file=/mnt/driver.iso,if=none,id=usb-cdrom1,format=raw \
-device scsi-cd,bus=bot1.0,scsi-id=0,lun=1,drive=usb-cdrom1,id=usb-cdrom1 \
-drive file=/mnt/bot-disk1,id=usb-disk1,if=none,format=qcow2 \
-device scsi-hd,bus=bot1.0,scsi-id=0,lun=0,drive=usb-disk1,id=usb-disk1 \
-device usb-ehci,id=ehci \
-device usb-bot,id=bot2,bus=ehci.0 \
-drive file=/mnt/driver.iso,if=none,id=usb-cdrom2,format=raw \
-device scsi-cd,bus=bot2.0,scsi-id=0,lun=1,drive=usb-cdrom2,id=usb-cdrom2 \
-drive file=/mnt/bot-disk2,id=usb-disk2,if=none,format=qcow2 \
-device scsi-hd,bus=bot2.0,scsi-id=0,lun=0,drive=usb-disk2,id=usb-disk2 \
-device piix3-usb-uhci,id=usb,bus=pci.0 \
-device usb-tablet \
-device usb-bot,id=bot3,bus=usb.0 \
-drive file=/mnt/driver.iso,if=none,id=usb-cdrom3,format=raw \
-device scsi-cd,bus=bot3.0,scsi-id=0,lun=1,drive=usb-cdrom3,id=usb-cdrom3 \
-drive file=/mnt/bot-disk3,id=usb-disk3,if=none,format=qcow2 \
-device scsi-hd,bus=bot3.0,scsi-id=0,lun=0,drive=usb-disk3,id=usb-disk3 \
-device ich9-usb-uhci3,id=uhci \
-device usb-storage,drive=drive-usb-0,id=usb-0,removable=on,bus=uhci.0,port=1 \
-drive file=/mnt/usb-uhci,if=none,id=drive-usb-0,media=disk,format=qcow2 \
-device ich9-usb-ehci1,id=ehci1 \
-device usb-storage,drive=drive-usb-1,id=usb-1,removable=on,bus=ehci1.0,port=1 \
-drive file=/mnt/usb-ehci,if=none,id=drive-usb-1,media=disk,format=qcow2 \
-device nec-usb-xhci,id=xhci1 \
-device usb-storage,drive=drive-usb-2,id=usb-2,removable=on,bus=xhci1.0,port=1 \
-drive file=/mnt/usb-xhci,if=none,id=drive-usb-2,media=disk,format=qcow2 \
-cdrom /mnt/driver.iso \
-watchdog ib700 -watchdog-action reset \
-device virtio-serial-pci,id=virtio-serial1,bus=pci.0,addr=0x1a \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial1.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \

2. Boot the guest with "-incoming tcp:0:5800" on the dst host.

3. Inside the guest, generate heavy I/O:
for i in `seq 10`
do
    dd if=/dev/urandom of=file$i bs=1M count=4096 &
done

4. Do ping-pong migration 5 times:
(qemu) migrate -d tcp:10.66.9.152:5800

Results:
After step 4, each migration finishes normally, and dmesg shows no error messages.

Comment 11 huiqingding 2014-12-09 03:19:42 UTC
QE also ran functional tests for migration:

Test run for qemu-kvm-rhev-2.1.2-13.el7 - migration&rdma&auto coverage - win8.1-64
https://tcms.engineering.redhat.com/run/202424/

No regression was found; only one bug was found, bz1167197:
Bug 1167197 - qemu-kvm can not cancel migration in src host when network of dst host failed

Comment 14 errata-xmlrpc 2015-03-05 09:58:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html

