Bug 1653542

Summary: [RFE] Implement a "live-migration friendly" disk cache mode [rhel8-fast]
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: CongLi <coli>
Component: qemu-kvm
Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA
QA Contact: CongLi <coli>
Severity: medium
Priority: high
Version: 8.0
Target Milestone: rc
Target Release: 8.0
Keywords: FutureFeature, TestOnly
Hardware: Unspecified
OS: Unspecified
Type: Feature Request
Doc Type: Enhancement
Fixed In Version: qemu-kvm-4.0.0-3.module+el8.1.0+3265+26c4ed71
CC: areis, chayang, coli, ddepaula, dgilbert, hpopal, jinzhao, juzhang, knoel, mrezanin, mtessun, qzhang, rbalakri, sfroemer, slopezpa, stefanha, virt-maint, xianwang, xiaohli, xuwei, yhong, yuhuang
Clone Of: 1568285
Cloned To: 1660575 (view as bug list)
Bug Blocks: 1623566, 1660575
Last Closed: 2019-11-06 07:12:13 UTC

Comment 6 CongLi 2019-01-09 07:00:29 UTC
Hi Stefan,

QE ran a test on qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64 and hit 2 different core dumps. Could you please confirm whether the usage is correct and whether all relevant patches have been merged into qemu 3.1 (the feature was fixed upstream in qemu 3.0)?

1. raw image:

Step 1: boot up src and dst guest (in the same host)
src:    
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,format=raw,file=/home/kvm_autotest_root/images/win2019-64-virtio.raw,file.x-check-cache-dropped=on,cache=writeback \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pcie.0-root-port-6,addr=0x0 \
dst:
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,format=raw,file=/home/kvm_autotest_root/images/win2019-64-virtio.raw,file.x-check-cache-dropped=on,cache=writeback \
    -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pcie.0-root-port-6,addr=0x0 \
    -incoming tcp:0:5888 \    

Step 2: migrate src to dst ({"execute": "migrate","arguments":{"uri": "tcp:0:5888"}})
Step 3: resume dst guest when migration finishes; qemu core dumps.

(qemu) qemu-kvm: block/io.c:1619: bdrv_co_write_req_prepare: Assertion `child->perm & BLK_PERM_WRITE' failed.
drive1.sh: line 23: 31700 Aborted                 (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -machine q35 -nodefaults -device VGA,bus=pcie.0,addr=0x1 -drive id=drive_image1,if=none,snapshot=off,aio=threads,format=raw,file=/home/kvm_autotest_root/images/win2019-64-virtio.raw,file.x-check-cache-dropped=on,cache=writeback -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pcie.0-root-port-6,addr=0x0 -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 -device virtio-net-pci,mac=9a:13:14:15:16:17,id=id3oiTUl,vectors=4,netdev=idA1dqov,bus=pcie.0-root-port-8,addr=0x0 -netdev tap,id=idA1dqov,vhost=on -m 15360 -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 -cpu 'Opteron_G5',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time -vnc :1 -rtc base=localtime,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -monitor stdio -qmp tcp:localhost:6666,server,nowait -incoming tcp:0:5888

#0  0x00007f7d8d35793f in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f7d8d341c95 in __GI_abort () at abort.c:79
#2  0x00007f7d8d341b69 in __assert_fail_base
    (fmt=0x7f7d8d4a8d70 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56252cb61103 "child->perm & BLK_PERM_WRITE", file=0x56252cb610ed "block/io.c", line=1619, function=<optimized out>) at assert.c:92
#3  0x00007f7d8d34fdf6 in __GI___assert_fail
    (assertion=assertion@entry=0x56252cb61103 "child->perm & BLK_PERM_WRITE", file=file@entry=0x56252cb610ed "block/io.c", line=line@entry=1619, function=function@entry=0x56252cb61a50 <__PRETTY_FUNCTION__.26609> "bdrv_co_write_req_prepare") at assert.c:101
#4  0x000056252c96dd15 in bdrv_co_write_req_prepare
    (child=<optimized out>, child=<optimized out>, flags=0, req=0x7f7d505ffeb0, bytes=4096, offset=608976896) at block/io.c:1619
#5  0x000056252c9713de in bdrv_co_write_req_prepare
    (child=0x56252db11fa0, child=0x56252db11fa0, flags=0, req=0x7f7d505ffeb0, bytes=4096, offset=608976896) at block/io.c:1619
#6  0x000056252c9713de in bdrv_aligned_pwritev
    (child=child@entry=0x56252db11fa0, req=req@entry=0x7f7d505ffeb0, offset=offset@entry=608976896, bytes=bytes@entry=4096, align=align@entry=1, qiov=qiov@entry=0x56252dd85370, flags=0) at block/io.c:1699
#7  0x000056252c9723c9 in bdrv_co_pwritev
    (child=0x56252db11fa0, offset=offset@entry=608976896, bytes=bytes@entry=4096, qiov=qiov@entry=0x56252dd85370, flags=flags@entry=0) at block/io.c:1961
#8  0x000056252c9603b1 in blk_co_pwritev
    (blk=0x56252db8d1a0, offset=608976896, bytes=4096, qiov=0x56252dd85370, flags=0)
    at block/block-backend.c:1203
#9  0x000056252c96044e in blk_aio_write_entry (opaque=0x56252e83b810)
    at block/block-backend.c:1409
#10 0x000056252ca00803 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at util/coroutine-ucontext.c:116
#11 0x00007f7d8d36d600 in __start_context ()
    at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
#12 0x00007ffd4062cd10 in  ()
#13 0x0000000000000000 in  ()


2. qcow2 image:

Same steps as for the raw image.

(qemu) qemu-kvm: util/error.c:57: error_setv: Assertion `*errp == NULL' failed.
drive1.sh: line 23: 31450 Aborted                 (core dumped) MALLOC_PERTURB_=1 /usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -machine q35 -nodefaults -device VGA,bus=pcie.0,addr=0x1 -drive id=drive_image1,if=none,snapshot=off,aio=threads,format=qcow2,file=/home/kvm_autotest_root/images/win2019-64-virtio.qcow2,file.x-check-cache-dropped=on,cache=writeback -device pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pcie.0-root-port-6,addr=0x0 -device pcie-root-port,id=pcie.0-root-port-8,slot=8,chassis=8,addr=0x8,bus=pcie.0 -device virtio-net-pci,mac=9a:13:14:15:16:17,id=id3oiTUl,vectors=4,netdev=idA1dqov,bus=pcie.0-root-port-8,addr=0x0 -netdev tap,id=idA1dqov,vhost=on -m 15360 -smp 12,maxcpus=12,cores=6,threads=1,sockets=2 -cpu 'Opteron_G5',+kvm_pv_unhalt,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time -vnc :1 -rtc base=localtime,clock=host,driftfix=slew -boot order=cdn,once=c,menu=off,strict=off -enable-kvm -monitor stdio -qmp tcp:localhost:6666,server,nowait -incoming tcp:0:5888

#0  0x00007f71e06b793f in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f71e06a1c95 in __GI_abort () at abort.c:79
#2  0x00007f71e06a1b69 in __assert_fail_base
    (fmt=0x7f71e0808d70 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x558b816054bd "*errp == NULL", file=0x558b816054b0 "util/error.c", line=57, function=<optimized out>) at assert.c:92
#3  0x00007f71e06afdf6 in __GI___assert_fail
    (assertion=assertion@entry=0x558b816054bd "*errp == NULL", file=file@entry=0x558b816054b0 "util/error.c", line=line@entry=57, function=function@entry=0x558b81605578 <__PRETTY_FUNCTION__.15573> "error_setv") at assert.c:101
#4  0x0000558b81480395 in error_setv
    (errp=0x7f71ad7faf30, src=0x558b815eee0e "block/file-posix.c", line=2554, func=0x558b815ef810 <__func__.27556> "check_cache_dropped", err_class=ERROR_CLASS_GENERIC_ERROR, fmt=0x558b815eef87 "page cache still in use!", ap=0x7f71ad7fade0, suffix=0x0)
    at util/error.c:57
#5  0x0000558b814804e4 in error_setg_internal
    (errp=errp@entry=0x7f71ad7faf30, src=src@entry=0x558b815eee0e "block/file-posix.c", line=line@entry=2554, func=func@entry=0x558b815ef810 <__func__.27556> "check_cache_dropped", fmt=fmt@entry=0x558b815eef87 "page cache still in use!") at util/error.c:95
#6  0x0000558b813f5a7e in check_cache_dropped (errp=0x7f71ad7faf30, bs=<optimized out>)
    at block/file-posix.c:2554
#7  0x0000558b813f5a7e in raw_co_invalidate_cache
    (bs=<optimized out>, errp=0x7f71ad7faf30) at block/file-posix.c:2603
#8  0x0000558b813ba4d5 in bdrv_co_invalidate_cache
    (bs=0x558b82851b10, errp=errp@entry=0x7f71ad7faf70) at block.c:4531
#9  0x0000558b813ba44a in bdrv_co_invalidate_cache
    (bs=0x558b8284b450, errp=0x7fff9837f8a8) at block.c:4500
#10 0x0000558b813ba664 in bdrv_invalidate_cache_co_entry (opaque=0x7fff9837f860)
    at block.c:4572
#11 0x0000558b8148f803 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at util/coroutine-ucontext.c:116
#12 0x00007f71e06cd600 in __start_context ()
    at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
#13 0x00007fff9837f090 in  ()
#14 0x0000000000000000 in  ()


Thanks.

Comment 7 CongLi 2019-01-09 07:22:30 UTC
(In reply to CongLi from comment #6) 
> Step 1: boot up src and dst guest (in the same host)
> src:    
>     -drive
> id=drive_image1,if=none,snapshot=off,aio=threads,format=raw,file=/home/
> kvm_autotest_root/images/win2019-64-virtio.raw,file.x-check-cache-dropped=on,
> cache=writeback \
>     -device
> pcie-root-port,id=pcie.0-root-port-6,slot=6,chassis=6,addr=0x6,bus=pcie.0 \
>     -device
> virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pcie.0-root-port-
> 6,addr=0x0 \

Correction:
file.x-check-cache-dropped=on was not actually used on the src side.

Comment 9 Stefan Hajnoczi 2019-01-09 13:46:15 UTC
(In reply to CongLi from comment #6)
> Hi Stefan,
> 
> QE have a test on qemu-kvm-3.1.0-3.module+el8+2614+d714d2bb.x86_64, met 2
> different core dumps, could you please help confirm if the usage if correct
> or if all relevant patches have been merged into qemu 3.1 (fixed in version
> qemu-3.0)?
> 
> 1. raw image:
> 
> Step 1: boot up src and dst guest (in the same host)

x-check-cache-dropped=on doesn't work well if you migrate on the same host, and the consistency problems that this patch solves only happen when migrating between two different hosts.  Therefore testing on a single host is not effective.

Please test migration between two different hosts using shared storage (e.g. an image file on NFS or a SAN LUN).
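
For example (hypothetical NFS server and export names), a minimal shared-storage setup might look like:

  (src)# mount -t nfs nfs-server:/export/images /mnt
  (dst)# mount -t nfs nfs-server:/export/images /mnt

with both QEMU instances then pointing at the same image file under /mnt.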

Comment 11 xianwang 2019-01-10 05:38:41 UTC
Hi, Dave,
a) Referring to comment 8, do I understand correctly that we only need to test this function on the fast train?

b) In our current migration testing we only use "cache=none" on both the source and destination sides, so for this RFE we need to add a Polarion case covering the following scope: 
(qemu cli:)
source <----------------> destination 
"cache=writeback" <-----> "cache=writeback,file.x-check-cache-dropped=on"
"cache=writethrough" <--> "cache=writethrough,file.x-check-cache-dropped=on"
Does that sound right?

c) Since cache=none and cache=directsync still work, could you give their priorities? 
I assume the priority of cache=none is still P1. All the cache modes are as follows:
"cache=none" <----------> "cache=none"
"cache=directsync" <----> "cache=directsync"
"cache=writeback" <-----> "cache=writeback,file.x-check-cache-dropped=on"
"cache=writethrough" <--> "cache=writethrough,file.x-check-cache-dropped=on"

If anything is wrong, please point it out. Thanks.

Comment 12 xianwang 2019-01-10 09:21:32 UTC
(In reply to xianwang from comment #11)
> Hi, Dave,
> a) Referring to comment 8, do I understand correctly that we only need
> to test this function on the fast train?
> 
> b) In our current migration testing we only use "cache=none" on both
> the source and destination sides, so for this RFE we need to add a
> Polarion case covering the following scope: 
> (qemu cli:)
> source <----------------> destination 
> "cache=writeback" <-----> "cache=writeback,file.x-check-cache-dropped=on"
> "cache=writethrough" <--> "cache=writethrough,file.x-check-cache-dropped=on"
> Does that sound right?
> 
Furthermore, does "forward and backward migration" need to cover this function?
E.g. rhel7.6 <--> rhel8.0: if so, is it right that we can only test migration from rhel7.6 to rhel8.0 but not from rhel8.0 to rhel7.6, because "file.x-check-cache-dropped" is not supported on rhel7.6 and it must be specified on the destination side?

> c) Since cache=none and cache=directsync still work, could you give
> their priorities? 
> I assume the priority of cache=none is still P1. All the cache modes
> are as follows:
> "cache=none" <----------> "cache=none"
> "cache=directsync" <----> "cache=directsync"
> "cache=writeback" <-----> "cache=writeback,file.x-check-cache-dropped=on"
> "cache=writethrough" <--> "cache=writethrough,file.x-check-cache-dropped=on"
> 
> If anything is wrong, please point it out. Thanks.

Comment 13 Dr. David Alan Gilbert 2019-01-10 10:39:39 UTC
(In reply to xianwang from comment #11)
> Hi, Dave,

It's probably best to ask Stefan since these are his changes.

> a) Referring to comment 8, do I understand correctly that we only need
> to test this function on the fast train?

Check with Stefan as to which version implements this feature; I think it is
currently after 2.12, so just fast train; I don't see why comment 8 is relevant.
 
> b) In our current migration testing we only use "cache=none" on both
> the source and destination sides, so for this RFE we need to add a
> Polarion case covering the following scope: 
> (qemu cli:)
> source <----------------> destination 
> "cache=writeback" <-----> "cache=writeback,file.x-check-cache-dropped=on"
> "cache=writethrough" <--> "cache=writethrough,file.x-check-cache-dropped=on"
> Does that sound right?

Again, check with Stefan.
 
> c) Since cache=none and cache=directsync still work, could you give
> their priorities? 
> I assume the priority of cache=none is still P1. All the cache modes
> are as follows:
> "cache=none" <----------> "cache=none"
> "cache=directsync" <----> "cache=directsync"
> "cache=writeback" <-----> "cache=writeback,file.x-check-cache-dropped=on"
> "cache=writethrough" <--> "cache=writethrough,file.x-check-cache-dropped=on"
> 
> If anything is wrong, please point it out. Thanks.

Note that the 'x-check-cache-dropped' is just a test feature to help us
check that it's safe.  You should also be testing with cache=writeback(?) without
the x-check-cache-dropped to just check that migration works reliably in those modes.

Also, you need to do a check with postcopy, and a test with a migration cancel.

Best to ask Stefan as to which cache= mode is now recommended.
Adding Stefan for needinfo.

Comment 14 CongLi 2019-01-11 08:50:59 UTC
Migration completes successfully between multiple hosts with shared storage.


Hi Stefan,

Could you please give some guidance on triggering a verification failure?

"""    
    mincore(2) checks whether pages are resident.  Use it to verify that
    page cache has been dropped.
    
    You can trigger a verification failure by mmapping the image file from
    another process that loads a byte from a page, forcing it to become
    resident.  bdrv_co_invalidate_cache() will fail while that process is
    alive.
"""

Thanks.

Comment 15 Stefan Hajnoczi 2019-01-11 09:32:47 UTC
(In reply to CongLi from comment #14)
> Could you please give some guidance on triggering a verification failure?
> 
> """    
>     mincore(2) checks whether pages are resident.  Use it to verify that
>     page cache has been dropped.
>     
>     You can trigger a verification failure by mmapping the image file from
>     another process that loads a byte from a page, forcing it to become
>     resident.  bdrv_co_invalidate_cache() will fail while that process is
>     alive.
> """

Run this script on the destination host and leave it running during migration:

  $ cat mmap-image.py 
  #!/usr/bin/python2
  import sys
  import mmap

  with open(sys.argv[1], 'rb') as f:
      m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
      print 'First byte:', m[0].encode('hex')
      raw_input('Waiting... (Press Ctrl+C to interrupt)')
  $ python2 mmap-image.py path/to/vm.img

When migration completes the x-check-cache-dropped= check will fail because the first page of the image file is in RAM.

Comment 16 Stefan Hajnoczi 2019-01-11 09:57:05 UTC
(In reply to xianwang from comment #11)
> a) Referring to comment 8, do I understand correctly that we only need
> to test this function on the fast train?

It's hard to tell whether cache=writeback migration completed safely without x-check-cache-dropped=on, so I think it should be tested in both the fast train and rhel streams (assuming they both ship this feature).
 
> b) In our current migration testing we only use "cache=none" on both
> the source and destination sides, so for this RFE we need to add a
> Polarion case covering the following scope: 
> (qemu cli:)
> source <----------------> destination 
> "cache=writeback" <-----> "cache=writeback,file.x-check-cache-dropped=on"
> "cache=writethrough" <--> "cache=writethrough,file.x-check-cache-dropped=on"
> Does that sound right?

Yes, please.

I wouldn't worry about cross-version migration for cache=writeback (e.g. 7.6 -> 8.0).

> c) Since cache=none and cache=directsync still work, could you give
> their priorities? 
> I assume the priority of cache=none is still P1. All the cache modes
> are as follows:
> "cache=none" <----------> "cache=none"
> "cache=directsync" <----> "cache=directsync"
> "cache=writeback" <-----> "cache=writeback,file.x-check-cache-dropped=on"
> "cache=writethrough" <--> "cache=writethrough,file.x-check-cache-dropped=on"

From my perspective cache=writeback migration is a low-priority feature.  We don't expect many users to rely on it because cache=none is and will remain the recommended setting.

Comment 17 CongLi 2019-01-15 08:44:43 UTC
(In reply to Stefan Hajnoczi from comment #15)
> (In reply to CongLi from comment #14)
> > Could you please give some guidance on triggering a verification failure?
> > 
> > """    
> >     mincore(2) checks whether pages are resident.  Use it to verify that
> >     page cache has been dropped.
> >     
> >     You can trigger a verification failure by mmapping the image file from
> >     another process that loads a byte from a page, forcing it to become
> >     resident.  bdrv_co_invalidate_cache() will fail while that process is
> >     alive.
> > """
> 
> Run this script on the destination host and leave it running during
> migration:
> 
>   $ cat mmap-image.py 
>   #!/usr/bin/python2
>   import sys
>   import mmap
> 
>   with open(sys.argv[1], 'rb') as f:
>       m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
>       print 'First byte:', m[0].encode('hex')
>       raw_input('Waiting... (Press Ctrl+C to interrupt)')
>   $ python2 mmap-image.py path/to/vm.img
> 
> When migration completes the x-check-cache-dropped= check will fail because
> the first page of the image file is in RAM.


Hi Stefan,

Could you please specify what the failure would be?

I could not trigger the failure with this script. I've tried many times, but
migration always completes successfully, the guest works well, there are no
errors from QEMU, and the script also runs fine. Could you please help confirm it?

1. Run the script on the dst host.

2. Migrate from src to dst with cache=writeback; the dst side uses file.x-check-cache-dropped=on.

3. When migration completes, no error occurs.

# python3 mmap-image.py /mnt/rhel80-64-virtio.qcow2 
First byte: 0x51
Waiting... (Press Ctrl+C to interrupt)
#

mmap-image.py with python3:
# cat mmap-image.py 
#!/usr/bin/python3
import sys
import mmap

with open(sys.argv[1], 'rb') as f:
    m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
    print('First byte: %s' % hex(m[0]))
    input('Waiting... (Press Ctrl+C to interrupt)')


Thanks.

Comment 18 Stefan Hajnoczi 2019-01-15 16:13:47 UTC
I'm not 100% sure, but it could be because you are using a qcow2 image file and x-check-cache-dropped= requires a different command-line in that case.  Previously in this BZ we discussed command-lines for a raw image file.

On my machine I can trigger the warning like this:

  (src)# qemu-system-x86_64 -M accel=kvm -m 1G -drive if=virtio,file=test.img,format=raw,cache=writeback
  (dst)# qemu-system-x86_64 -M accel=kvm -m 1G -drive if=virtio,file=test.img,format=raw,cache=writeback,file.x-check-cache-dropped=on -incoming tcp::1234
  (dst)# python3 mmap-image.py test.img
  First byte: 0xeb
  Waiting... (Press Ctrl+C to interrupt)

  (src-qemu) migrate tcp:...:1234

Now the destination QEMU prints the following warning:

  qemu-system-x86_64: page cache still in use!

If I skip the python script then the warning is not printed.

Comment 19 CongLi 2019-01-16 06:14:48 UTC
(In reply to Stefan Hajnoczi from comment #18)
> I'm not 100% sure but it could be because you are using a qcow2 image file
> and using x-check-cache-dropped= requires a different command-line in that
> case.  Previously in this BZ we discussed command-lines for a raw image file.
> 
> On my machine I can trigger the warning like this:

Hi Stefan,

Could you please provide the qemu version ?

QE used the latest downstream version, qemu-kvm-3.1.0-4.module+el8+2681+819ab34d.x86_64,
but still failed to trigger the failure.

I used the command line you provided:

(src) # /usr/libexec/qemu-kvm -M accel=kvm -m 1G -drive if=virtio,
file=/mnt/rhel80-64-virtio.raw,format=raw,cache=writeback -monitor stdio
(dst) # /usr/libexec/qemu-kvm -M accel=kvm -m 1G -drive if=virtio,
file=/mnt/rhel80-64-virtio.raw,format=raw,cache=writeback,file.x-check-cache-dropped=on 
-incoming tcp:0:1234 -monitor stdio
(dst) # python3 mmap-image.py /mnt/rhel80-64-virtio.raw
First byte: 0xeb
Waiting... (Press Ctrl+C to interrupt) 

(src) (qemu) migrate tcp:10.73.130.201:1234
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off 
Migration status: completed
total time: 49026 milliseconds
downtime: 49 milliseconds
setup: 15 milliseconds
transferred ram: 916941 kbytes
throughput: 153.33 mbps
remaining ram: 0 kbytes
total ram: 1065800 kbytes
duplicate: 61581 pages
skipped: 0 pages
normal: 228653 pages
normal bytes: 914612 kbytes
dirty sync count: 5
page size: 4 kbytes
multifd bytes: 0 kbytes
(qemu) 
(qemu) info status
VM status: paused (postmigrate)


The dst QEMU prints nothing:
(qemu)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off 
Migration status: completed
total time: 0 milliseconds
(qemu) info status
VM status: running
(qemu)

> 
>   (src)# qemu-system-x86_64 -M accel=kvm -m 1G -drive
> if=virtio,file=test.img,format=raw,cache=writeback
>   (dst)# qemu-system-x86_64 -M accel=kvm -m 1G -drive
> if=virtio,file=test.img,format=raw,cache=writeback,file.x-check-cache-
> dropped=on -incoming tcp::1234
>   (dst)# python3 mmap-image.py test.img
>   First byte: 0xeb
>   Waiting... (Press Ctrl+C to interrupt)
> 
>   (src-qemu) migrate tcp:...:1234
> 
> Now the destination QEMU prints the following warning:
> 
>   qemu-system-x86_64: page cache still in use!
> 
> If I skip the python script then the warning is not printed.

Comment 21 Stefan Hajnoczi 2019-01-17 13:56:20 UTC
(In reply to CongLi from comment #19)
> (In reply to Stefan Hajnoczi from comment #18)
> > I'm not 100% sure but it could be because you are using a qcow2 image file
> > and using x-check-cache-dropped= requires a different command-line in that
> > case.  Previously in this BZ we discussed command-lines for a raw image file.
> > 
> > On my machine I can trigger the warning like this:
> 
> Hi Stefan,
> 
> Could you please provide the qemu version ?
> 
> QE used the latest downstream version,
> qemu-kvm-3.1.0-4.module+el8+2681+819ab34d.x86_64,
> but still failed to trigger the failure.
> 
> I used the command line you provided:
> 
> (src) # /usr/libexec/qemu-kvm -M accel=kvm -m 1G -drive if=virtio,
> file=/mnt/rhel80-64-virtio.raw,format=raw,cache=writeback -monitor stdio
> (dst) # /usr/libexec/qemu-kvm -M accel=kvm -m 1G -drive if=virtio,
> file=/mnt/rhel80-64-virtio.raw,format=raw,cache=writeback,file.x-check-cache-
> dropped=on 
> -incoming tcp:0:1234 -monitor stdio
> (dst) # python3 mmap-image.py /mnt/rhel80-64-virtio.raw
> First byte: 0xeb
> Waiting... (Press Ctrl+C to interrupt) 
> 
> (src) (qemu) migrate tcp:10.73.130.201:1234

I reproduced your results.  It seems all cached pages in the file are dropped on flush (could be NFS-specific).  It's still possible to make the test fail on purpose by modifying the script:

  #!/usr/bin/python3
  import sys
  import mmap

  with open(sys.argv[1], 'rb') as f:
      m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
      while True:
          x = m[0]

There is now a race condition between this script accessing m[0] (which fetches the page) and QEMU checking if pages are resident in memory.  If you run it a few times you'll see the "qemu-system-x86_64: page cache still in use!" warning printed by the destination QEMU.

Hope this helps you demonstrate that the check can fail.

In any case, what you've found is good news - it means that with NFS shared storage live migration the risk of inconsistencies is low because cached pages are dropped on the destination.  It takes work to make it fail :).

Comment 22 Stefan Hajnoczi 2019-01-17 13:59:19 UTC
(In reply to Stefan Hajnoczi from comment #21)
> (In reply to CongLi from comment #19)
> > (In reply to Stefan Hajnoczi from comment #18)
> In any case, what you've found is good news - it means that with NFS shared
> storage live migration the risk of inconsistencies is low because cached
> pages are dropped on the destination.  It takes work to make it fail :).

I should clarify that the QEMU patches from this BZ are still necessary.  The pages are probably dropped thanks to the flush that was added on live migration handover.

Comment 24 CongLi 2019-01-18 06:00:18 UTC
(In reply to Stefan Hajnoczi from comment #21)
> I reproduced your results.  It seems all cached pages in the file are
> dropped on flush (could be NFS-specific).  It's still possible to make the
> test fail on purpose by modifying the script:
> 
>   #!/usr/bin/python3
>   import sys
>   import mmap
> 
>   with open(sys.argv[1], 'rb') as f:
>       m = mmap.mmap(f.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
>       while True:
>           x = m[0]
> 
> There is now a race condition between this script accessing m[0] (which
> fetches the page) and QEMU checking if pages are resident in memory.  If you
> run it a few times you'll see the "qemu-system-x86_64: page cache still in
> use!" warning printed by the destination QEMU.
> 
> Hope this helps you demonstrate that the check can fail.
> 
> In any case, what you've found is good news - it means that with NFS shared
> storage live migration the risk of inconsistencies is low because cached
> pages are dropped on the destination.  It takes work to make it fail :).

Thanks Stefan, it works now.
I can now trigger the failure when pages are resident in memory:
dst: '(qemu) qemu-kvm: page cache still in use!'

1. Could you please share the command line for a qcow2 image?

2. And could you please confirm how the 'fincore' tool can be used to check that
the pages are resident?


Thanks.

Comment 25 Stefan Hajnoczi 2019-01-21 11:30:58 UTC
(In reply to CongLi from comment #24)
> (In reply to Stefan Hajnoczi from comment #21)
> 1. Could you please share the command line for a qcow2 image?

I've tested that the raw command-line also works for qcow2:

  -drive if=virtio,file=test.qcow2,format=raw,cache=writeback,file.x-check-cache-dropped=on

Sorry, I thought it would be necessary to use a different syntax, but I was wrong.

> 2. And could you please confirm how the 'fincore' tool can be used to check
> that the pages are resident?

It is not possible to accurately use fincore(1) since the check must be performed at the moment of live migration handover.  I'm not aware of a way to pause the destination QEMU at the right point in time when fincore(1) should be run.  Immediately afterwards the destination QEMU may begin accessing the image file again and the page cache will become populated.
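
A minimal, illustrative Python sketch of such a residency check, using mmap(2) and mincore(2) via ctypes (this mirrors the idea behind QEMU's check, not QEMU's actual code, and it only samples residency at the instant it runs):

  #!/usr/bin/python3
  # Illustrative sketch: report how many pages of a file are currently
  # resident in the page cache, via mmap(2) + mincore(2).
  import ctypes
  import os
  import sys

  PROT_READ = 0x1    # Linux <sys/mman.h> values
  MAP_SHARED = 0x01

  libc = ctypes.CDLL('libc.so.6', use_errno=True)
  libc.mmap.restype = ctypes.c_void_p
  libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                        ctypes.c_int, ctypes.c_int, ctypes.c_long]

  fd = os.open(sys.argv[1], os.O_RDONLY)
  size = os.fstat(fd).st_size
  pagesize = os.sysconf('SC_PAGESIZE')
  npages = (size + pagesize - 1) // pagesize

  addr = libc.mmap(None, size, PROT_READ, MAP_SHARED, fd, 0)
  if addr in (None, ctypes.c_void_p(-1).value):
      raise OSError(ctypes.get_errno(), 'mmap failed')

  vec = (ctypes.c_ubyte * npages)()  # one residency byte per page
  if libc.mincore(ctypes.c_void_p(addr), ctypes.c_size_t(size), vec) != 0:
      raise OSError(ctypes.get_errno(), 'mincore failed')

  print('%d of %d pages resident' % (sum(b & 1 for b in vec), npages))

Even run in a loop, such a tool can only observe residency before or after the handover, never at it, which is why the built-in check is the reliable way to verify.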

Comment 26 CongLi 2019-02-20 10:54:23 UTC
(In reply to Stefan Hajnoczi from comment #25)
> (In reply to CongLi from comment #24)
> > (In reply to Stefan Hajnoczi from comment #21)
> > 1. Could you please share the command line for a qcow2 image?
> 
> I've tested that the raw command-line also works for qcow2:
> 
>   -drive
> if=virtio,file=test.qcow2,format=raw,cache=writeback,file.x-check-cache-
> dropped=on
> 
> Sorry, I thought it would be necessary to use a different syntax, but I was
> wrong.
> 
> > 2. And could you please confirm how the 'fincore' tool can be used to check
> > that the pages are resident?
> 
> It is not possible to accurately use fincore(1) since the check must be
> performed at the moment of live migration handover.  I'm not aware of a way
> to pause the destination QEMU at the right point in time when fincore(1)
> should be run.  Immediately afterwards the destination QEMU may begin
> accessing the image file again and the page cache will become populated.


Thanks Stefan for the explanation.

Tested on qemu-kvm-3.1.0-15.module+el8+2792+e33e01a0.x86_64:

Qcow2 works well too with x-check-cache-dropped=on.
QEMU triggers the failure when pages are resident in memory:
dst: '(qemu) qemu-kvm: page cache still in use!'


Thanks.

Comment 27 Stefan Hajnoczi 2019-03-21 10:28:15 UTC
Note I've changed Fixed In to QEMU 4.0.0 because we need commit f357fcd890a8d6ced6d261338b859a41414561e9 ("file-posix: add drop-cache=on|off option") so that libvirt can detect this feature.

Comment 29 CongLi 2019-05-14 11:34:42 UTC
(In reply to Stefan Hajnoczi from comment #27)
> Note I've changed Fixed In to QEMU 4.0.0 because we need commit
> f357fcd890a8d6ced6d261338b859a41414561e9 ("file-posix: add drop-cache=on|off
> option") so that libvirt can detect this feature.

Tested on qemu-kvm-4.0.0-0.module+el8.1.0+3169+3c501422.x86_64;
it seems the drop-cache option has not been merged downstream.

The result is the same for raw.
qemu-kvm: -drive if=none,file=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2,format=qcow2,cache=writeback,file.x-check-cache-dropped=on,drop-cache=on,id=drive_image1: Block format 'qcow2' does not support the option 'drop-cache'

Comment 30 Stefan Hajnoczi 2019-05-17 14:02:43 UTC
(In reply to CongLi from comment #29)
> (In reply to Stefan Hajnoczi from comment #27)
> > Note I've changed Fixed In to QEMU 4.0.0 because we need commit
> > f357fcd890a8d6ced6d261338b859a41414561e9 ("file-posix: add drop-cache=on|off
> > option") so that libvirt can detect this feature.
> 
> Tested on qemu-kvm-4.0.0-0.module+el8.1.0+3169+3c501422.x86_64;
> it seems the drop-cache option has not been merged downstream.

I asked Mirek where this build came from and it's a virt-preview build.  Please try the regular RHEL-AV 8.1.0 qemu-kvm builds that are out now.  I have verified that RHEL-AV 8.1.0 qemu-kvm has the required patch.

Comment 31 CongLi 2019-05-23 10:39:50 UTC
(In reply to Stefan Hajnoczi from comment #30)
> (In reply to CongLi from comment #29)
> > (In reply to Stefan Hajnoczi from comment #27)
> > > Note I've changed Fixed In to QEMU 4.0.0 because we need commit
> > > f357fcd890a8d6ced6d261338b859a41414561e9 ("file-posix: add drop-cache=on|off
> > > option") so that libvirt can detect this feature.
> > 
> > Tested on qemu-kvm-4.0.0-0.module+el8.1.0+3169+3c501422.x86_64;
> > it seems the drop-cache option has not been merged downstream.
> 
> I asked Mirek where this build came from and it's a virt-preview build. 
> Please try the regular RHEL-AV 8.1.0 qemu-kvm builds that are out now.  I
> have verified that RHEL-AV 8.1.0 qemu-kvm has the required patch.

I've tested the latest downstream qemu, qemu-kvm-4.0.0-1.module+el8.1.0+3225+a8268fde.x86_64,
and still hit this issue.
# /usr/libexec/qemu-kvm -drive if=none,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2,format=qcow2,cache=writeback,file.x-check-cache-dropped=on,drop-cache=on,id=drive_image1

qemu-kvm: -drive if=none,file=/home/kvm_autotest_root/images/win2019-64-virtio-scsi.qcow2,format=qcow2,cache=writeback,file.x-check-cache-dropped=on,drop-cache=on,id=drive_image1: Block format 'qcow2' does not support the option 'drop-cache'


Hi Miroslav,

Could you please help confirm this?

Thanks.

Comment 32 Danilo de Paula 2019-06-04 23:17:46 UTC
Moved to ON_QA for retesting.

Comment 33 Miroslav Rezanina 2019-06-05 09:47:48 UTC
The commit from comment #27 is present in the tree. If this issue is still reproducible, we need an additional fix.

Comment 34 CongLi 2019-06-18 11:34:44 UTC
Hi Stefan,

The issue still exists on qemu-kvm-4.0.0-4.module+el8.1.0+3356+cda7f1ee.x86_64.

qemu-kvm: -drive if=none,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2,format=qcow2,cache=writeback,file.x-check-cache-dropped=on,drop-cache=on,id=drive_image1: Block format 'qcow2' does not support the option 'drop-cache'

Could you please help confirm the status of this bug?

Thanks.

Comment 35 Stefan Hajnoczi 2019-07-23 11:49:43 UTC
(In reply to CongLi from comment #34)
> Hi Stefan,
> 
> The issue still exists on
> qemu-kvm-4.0.0-4.module+el8.1.0+3356+cda7f1ee.x86_64.
> 
> qemu-kvm: -drive
> if=none,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2,
> format=qcow2,cache=writeback,file.x-check-cache-dropped=on,drop-cache=on,
> id=drive_image1: Block format 'qcow2' does not support the option
> 'drop-cache'
> 
> Could you please help confirm the status of this bug?

Please use file.drop-cache=on instead of drop-cache=on.
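
For example, adapting the failing command line from comment 34, the drive option would be spelled:

  -drive if=none,file=/home/kvm_autotest_root/images/rhel810-64-virtio.qcow2,format=qcow2,cache=writeback,file.x-check-cache-dropped=on,file.drop-cache=on,id=drive_image1

The drop-cache option belongs to the file-posix (protocol) driver, so it needs the file. prefix when the drive has a format layer such as qcow2 on top, which matches the "Block format 'qcow2' does not support the option 'drop-cache'" error above.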

Comment 36 Ademar Reis 2019-07-30 18:39:05 UTC
(In reply to Miroslav Rezanina from comment #33)
> Commit from comment #27 is present in the tree. In case this issue is still
> reproducible, we need additional fix.

We just needed an adjustment to the test-case (see comment #35). Moving it back to ON_QA.

Comment 40 errata-xmlrpc 2019-11-06 07:12:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723