Bug 1547940

Summary: Sometimes QEMU crashes with "Bad ram offset 23ffae002" in NFV live migration (both PVP and OVS) testing when vIOMMU support is enabled
Product: Red Hat Enterprise Linux 7
Reporter: Pei Zhang <pezhang>
Component: qemu-kvm-rhev
Assignee: Maxime Coquelin <maxime.coquelin>
Status: CLOSED CURRENTRELEASE
QA Contact: Pei Zhang <pezhang>
Severity: high
Docs Contact:
Priority: high
Version: 7.5
CC: chayang, fjin, juzhang, knoel, maxime.coquelin, pbonzini, peterx, pezhang, siliu, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-12 02:00:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1597482, 1686278
Attachments: XML of VM (flags: none)

Description Pei Zhang 2018-02-22 10:24:13 UTC
Created attachment 1399295 [details]
XML of VM

Description of problem:

After migrating the VM from the src to the des host, QEMU crashes after the VM has been running on the des host for several seconds.


Version-Release number of selected component (if applicable):
3.10.0-855.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7.x86_64
libvirt-3.9.0-13.el7.x86_64
openvswitch-2.9.0-3.el7fdp.x86_64
dpdk-17.11-7.el7.x86_64


How reproducible:
4/8


Steps to Reproduce:
1. On the src and des hosts, boot testpmd; see [1].

2. On the src host, boot the VM; refer to the attachment.

3. In the guest, load vfio and start testpmd.

4. On the third host, start MoonGen:
./build/MoonGen examples/l2-load-latency.lua 0 1 64

5. Do live migration from the src to the des host; the migration succeeds.
# /bin/virsh migrate --verbose --persistent --live rhel7.5_nonrt qemu+ssh://10.73.72.154/system

6. The VM is now running on the des host; however, after several seconds, QEMU crashes.

# cat /var/log/libvirt/qemu/rhel7.5_nonrt.log
...
Bad ram offset 23ffb3002
2018-02-22 10:06:21.175+0000: shutting down, reason=crashed


# abrt-cli list
id 4d5030c8b806c4010b0ef08753d4219a358c875b
reason:         qemu-kvm killed by SIGABRT
time:           Thu 22 Feb 2018 05:06:18 AM EST
cmdline:        /usr/libexec/qemu-kvm -name guest=rhel7.5_nonrt,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-rhel7.5_nonrt/master-key.aes -machine pc-q35-rhel7.5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=split -cpu host,tsc-deadline=on,pmu=off -m 8192 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/13-rhel7.5_nonrt,share=yes,size=8589934592,host-nodes=0,policy=bind -numa node,nodeid=0,cpus=0-3,memdev=ram-node0 -uuid b3bd8e42-17b7-11e8-a05f-a0369fc7bbea -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-13-rhel7.5_nonrt/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x3 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x4 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x5 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x6 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x7 -drive file=/mnt/nfv/rhel7.5_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.1,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=18:66:da:5f:dd:01,bus=pci.2,addr=0x0 -chardev socket,id=charnet1,path=/tmp/vhost-user1 -netdev vhost-user,chardev=charnet1,id=hostnet1 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,bus=pci.3,addr=0x0,iommu_platform=on,ats=on -chardev socket,id=charnet2,path=/tmp/vhost-user2 -netdev vhost-user,chardev=charnet2,id=hostnet2 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=18:66:da:5f:dd:03,bus=pci.4,addr=0x0,iommu_platform=on,ats=on -chardev socket,id=charnet3,path=/tmp/vhost-user3 -netdev vhost-user,chardev=charnet3,id=hostnet3 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=18:66:da:5f:dd:04,bus=pci.5,addr=0x0,iommu_platform=on,ats=on -spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on -device cirrus-vga,id=video0,bus=pcie.0,addr=0x1 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 -msg timestamp=on
package:        qemu-kvm-rhev-2.10.0-21.el7
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/ccpp-2018-02-22-05:06:18-19945
Run 'abrt-cli report /var/spool/abrt/ccpp-2018-02-22-05:06:18-19945' for creating a case in Red Hat Customer Portal



Actual results:
QEMU crashes.


Expected results:
QEMU should not crash.


Additional info:
1. I can only reproduce this issue with automation; it has not been reproduced manually yet.


Reference:
[1]
/usr/bin/testpmd \
-l 2,4,6,8,10,12,14 \
--socket-mem 1024 1024 -n 4 \
--vdev net_vhost0,iface=/tmp/vhost-user1,client=0,iommu-support=1 \
--vdev net_vhost1,iface=/tmp/vhost-user2,client=0,iommu-support=1 \
--vdev net_vhost2,iface=/tmp/vhost-user3,client=0,iommu-support=1 \
-- \
--portmask=3f \
--disable-hw-vlan -i \
--rxq=1 --txq=1 \
--nb-cores=6 \
--forward-mode=io

Comment 2 Pei Zhang 2018-02-22 10:27:18 UTC
Here is the abrt core dump file link:

http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/pezhang/bug1547940/abrt/

Comment 3 Maxime Coquelin 2018-02-27 15:19:47 UTC
Hi Pei,

(In reply to Pei Zhang from comment #2)
> Here is the abrt core dump file link:
> 
> http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/
> pezhang/bug1547940/abrt/

It seems the link is broken; I get a 404 error.

Comment 5 Maxime Coquelin 2018-03-09 09:37:51 UTC
The backtrace confirms the used ring index address isn't present in any of the RAM blocks (0x23ffb3002):

#0  0x00007f05b1314207 in ?? () from /lib64/libc.so.6
#1  0x00007f05b13158f8 in __GI_abort () at abort.c:99
#2  0x00005631234b1eba in qemu_get_ram_block (addr=<optimized out>) at /usr/src/debug/qemu-2.10.0/exec.c:1061
#3  0x00005631234b5aa5 in qemu_map_ram_ptr (ram_block=<optimized out>, addr=9663361026) at /usr/src/debug/qemu-2.10.0/exec.c:2193
#4  0x00005631234bc39b in address_space_lduw_internal_cached (cache=<optimized out>, cache=<optimized out>, endian=DEVICE_LITTLE_ENDIAN, result=0x0, attrs=..., addr=2) at /usr/src/debug/qemu-2.10.0/memory_ldst.inc.c:281
#5  address_space_lduw_le_cached (result=0x0, attrs=..., addr=2, cache=<optimized out>) at /usr/src/debug/qemu-2.10.0/memory_ldst.inc.c:315
#6  lduw_le_phys_cached (cache=<optimized out>, addr=addr@entry=2) at /usr/src/debug/qemu-2.10.0/memory_ldst.inc.c:334
#7  0x000056312353a707 in virtio_lduw_phys_cached (vdev=<optimized out>, pa=2, cache=<optimized out>) at /usr/src/debug/qemu-2.10.0/include/hw/virtio/virtio-access.h:166
#8  vring_used_idx (vq=vq@entry=0x563127de8000) at /usr/src/debug/qemu-2.10.0/hw/virtio/virtio.c:263
#9  0x000056312353cb59 in virtio_queue_update_used_idx (vdev=vdev@entry=0x563127ddc170, n=n@entry=0) at /usr/src/debug/qemu-2.10.0/hw/virtio/virtio.c:2346
#10 0x00005631235418d9 in vhost_virtqueue_stop (dev=dev@entry=0x563125d1ea80, vdev=vdev@entry=0x563127ddc170, vq=0x563125d1eca8, idx=0) at /usr/src/debug/qemu-2.10.0/hw/virtio/vhost.c:1148
#11 0x0000563123543574 in vhost_dev_stop (hdev=hdev@entry=0x563125d1ea80, vdev=vdev@entry=0x563127ddc170) at /usr/src/debug/qemu-2.10.0/hw/virtio/vhost.c:1598
#12 0x000056312352bb78 in vhost_net_stop_one (net=0x563125d1ea80, dev=0x563127ddc170) at /usr/src/debug/qemu-2.10.0/hw/net/vhost_net.c:289
#13 0x000056312352c06b in vhost_net_stop (dev=dev@entry=0x563127ddc170, ncs=<optimized out>, total_queues=total_queues@entry=1) at /usr/src/debug/qemu-2.10.0/hw/net/vhost_net.c:368
#14 0x0000563123529306 in virtio_net_vhost_status (status=<optimized out>, n=0x563127ddc170) at /usr/src/debug/qemu-2.10.0/hw/net/virtio-net.c:180
#15 virtio_net_set_status (vdev=0x563127ddc170, status=11 '\v') at /usr/src/debug/qemu-2.10.0/hw/net/virtio-net.c:254
#16 0x000056312353b65c in virtio_set_status (vdev=vdev@entry=0x563127ddc170, val=<optimized out>) at /usr/src/debug/qemu-2.10.0/hw/virtio/virtio.c:1147
#17 0x00005631236a82a0 in virtio_write_config (pci_dev=0x563127dd4000, address=<optimized out>, val=<optimized out>, len=<optimized out>) at hw/virtio/virtio-pci.c:664
#18 0x00005631234fa073 in memory_region_write_accessor (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, shift=<optimized out>, mask=<optimized out>, attrs=...)
    at /usr/src/debug/qemu-2.10.0/memory.c:560
#19 0x00005631234f7d89 in access_with_adjusted_size (addr=addr@entry=0, value=value@entry=0x7f05a84907b8, size=size@entry=2, access_size_min=<optimized out>, access_size_max=<optimized out>, 
    access=access@entry=0x5631234fa030 <memory_region_write_accessor>, mr=mr@entry=0x563126524400, attrs=attrs@entry=...) at /usr/src/debug/qemu-2.10.0/memory.c:626
#20 0x00005631234fbd05 in memory_region_dispatch_write (mr=mr@entry=0x563126524400, addr=addr@entry=0, data=1283, size=size@entry=2, attrs=attrs@entry=...) at /usr/src/debug/qemu-2.10.0/memory.c:1502
#21 0x00005631234b33f2 in flatview_write_continue (mr=0x563126524400, l=2, addr1=0, len=2, buf=0x7f05bba7b000 <Address 0x7f05bba7b000 out of bounds>, attrs=..., addr=3324, fv=0x56312cbec640) at /usr/src/debug/qemu-2.10.0/exec.c:2906
#22 flatview_write (fv=0x56312cbec640, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at /usr/src/debug/qemu-2.10.0/exec.c:2957
#23 0x00005631234b6a65 in flatview_rw (is_write=true, len=2, buf=0x7f05bba7b000 <Address 0x7f05bba7b000 out of bounds>, attrs=..., addr=3324, fv=<optimized out>) at /usr/src/debug/qemu-2.10.0/exec.c:3066
#24 address_space_rw (as=<optimized out>, addr=addr@entry=3324, attrs=attrs@entry=..., buf=buf@entry=0x7f05bba7b000 <Address 0x7f05bba7b000 out of bounds>, len=len@entry=2, is_write=is_write@entry=true)
    at /usr/src/debug/qemu-2.10.0/exec.c:3076
#25 0x0000563123509b4a in kvm_handle_io (count=1, size=2, direction=<optimized out>, data=<optimized out>, attrs=..., port=3324) at /usr/src/debug/qemu-2.10.0/accel/kvm/kvm-all.c:1807
#26 kvm_cpu_exec (cpu=cpu@entry=0x56312628e000) at /usr/src/debug/qemu-2.10.0/accel/kvm/kvm-all.c:2047
#27 0x00005631234e8cf2 in qemu_kvm_cpu_thread_fn (arg=0x56312628e000) at /usr/src/debug/qemu-2.10.0/cpus.c:1128
#28 0x00007f05b16b2dd5 in start_thread (arg=0x7f05a8491700) at pthread_create.c:308
#29 0x00007f05b13dcb3d in __GI_epoll_pwait (epfd=31206, events=0x7a19, maxevents=6, timeout=-1, set=0x7f05a8491700) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:41
#30 0x0000000000000000 in ?? ()


I went manually through the list of RAM blocks, and I confirm there is no RAM block for this address.
We still have to understand how that can happen.

Comment 6 Maxime Coquelin 2018-03-13 06:16:47 UTC
Looking at the backtrace, something unexpected is happening on the guest side.
The device gets stopped because the guest clears the "bus master" bit in the PCI
command register of the virtio device.
This can happen, for example, when the guest is shut down, or, in the case of
DPDK, when the virtio PMD is removed.
It is not expected in this test scenario at that point, as the guest should
be forwarding packets.

The coredump doesn't contain the guest memory; it would be good to have it to
try to understand what the guest is doing. The other alternative would be to
reproduce the problem manually.

Comment 7 Pei Zhang 2018-03-19 09:30:11 UTC
Update:

1. Live migration over OVS+DPDK+vhost-user also hits this same issue. The bug title will be updated accordingly.

2. This crash only happens when testing with vIOMMU support enabled, regardless of whether it is PVP live migration or migration over OVS.

Comment 12 Peter Xu 2018-03-28 07:23:40 UTC
(In reply to Maxime Coquelin from comment #5)
> The backtrace confirms the used ring index address isn't present in any of
> the RAM blocks (0x23ffb3002):
> 
> #0  0x00007f05b1314207 in ?? () from /lib64/libc.so.6
> #1  0x00007f05b13158f8 in __GI_abort () at abort.c:99
> #2  0x00005631234b1eba in qemu_get_ram_block (addr=<optimized out>) at
> /usr/src/debug/qemu-2.10.0/exec.c:1061
> #3  0x00005631234b5aa5 in qemu_map_ram_ptr (ram_block=<optimized out>,
> addr=9663361026) at /usr/src/debug/qemu-2.10.0/exec.c:2193
> #4  0x00005631234bc39b in address_space_lduw_internal_cached
> (cache=<optimized out>, cache=<optimized out>, endian=DEVICE_LITTLE_ENDIAN,
> result=0x0, attrs=..., addr=2) at
> /usr/src/debug/qemu-2.10.0/memory_ldst.inc.c:281

Besides knowing why a reset happened inside the guest, I'm also curious whether the address space translation could have failed somehow.  The translation happens at memory_ldst.inc.c:264:

    mr = TRANSLATE(addr, &addr1, &l, false);

The stack above shows that mr->ramblock must be NULL, while addr1 is very possibly a correct GPA (9663361026 = 0x23ffb3002, which falls into a valid memory range of an 8G guest).  To verify that, maybe we can assert after the above line with:

    assert(mr != &io_mem_unassigned);

If the assertion can be triggered, that proves the translation failed.

I'm still not sure whether this is even worth a try; I'm just throwing the question out in case it helps.  If the translation failed, it's possible the page permission bits are wrong somewhere (see the single failure label translate_fail in flatview_do_translate()).

(Add Paolo in the loop too)

Peter

Comment 13 Maxime Coquelin 2018-03-28 08:10:46 UTC
(In reply to Peter Xu from comment #12)
> (In reply to Maxime Coquelin from comment #5)
> > The backtrace confirms the used ring index address isn't present in any of
> > the RAM blocks (0x23ffb3002):
> > 
> > #0  0x00007f05b1314207 in ?? () from /lib64/libc.so.6
> > #1  0x00007f05b13158f8 in __GI_abort () at abort.c:99
> > #2  0x00005631234b1eba in qemu_get_ram_block (addr=<optimized out>) at
> > /usr/src/debug/qemu-2.10.0/exec.c:1061
> > #3  0x00005631234b5aa5 in qemu_map_ram_ptr (ram_block=<optimized out>,
> > addr=9663361026) at /usr/src/debug/qemu-2.10.0/exec.c:2193
> > #4  0x00005631234bc39b in address_space_lduw_internal_cached
> > (cache=<optimized out>, cache=<optimized out>, endian=DEVICE_LITTLE_ENDIAN,
> > result=0x0, attrs=..., addr=2) at
> > /usr/src/debug/qemu-2.10.0/memory_ldst.inc.c:281
> 
> Besides on knowing why a reset has happened inside guest, here I'm also
> curious whether the address space translation could have failed somehow. 
> The translation happened on memory_ldst.inc.c:264:
> 
>     mr = TRANSLATE(addr, &addr1, &l, false);
> 
> Since above stack shows that mr->ramblock must be NULL and addr1 is very
> possible a correct GPA (9663361026=0x23ffb3002, which falls into a valid mem
> range of a 8G guest).  To know that, maybe we can assert after above line
> with:
> 
>     assert(mr != &io_mem_unassigned);
> 
> If the assertion can be triggered, then it proves.

Yes, that's a good idea.
I can provide a brew build adding this assertion.

Thanks,
Maxime

Comment 14 Peter Xu 2018-04-20 07:37:16 UTC
Hi,

While debugging the nested device assignment breakage recently, I found that it might be closely related to (or even a duplicate of) this bug.  I have already posted two fixes, one for QEMU and one for the kernel:

http://lists.gnu.org/archive/html/qemu-devel/2018-04/msg02609.html
https://lkml.org/lkml/2018/4/18/5

Basically they are all about incorrectly mapped pages, or missing PSIs.  For this bug, if our assumption above stands, DPDK requested a page translation but it was missing (without the correct permission flags set in the PTEs).  That sounds quite like the nested problem.

I'll update this bug again after everything is clear and settled upstream; then we can see whether those fixes resolve this bug too.

Peter

Comment 15 Pei Zhang 2018-06-04 02:47:09 UTC
Update:

Tested with the kernel [1] with the patches Peter mentioned above applied, plus upstream qemu-kvm [2]; the crash issue is still hit.


[1] git clone  git://git.app.eng.bos.redhat.com/rhel7.git
[2] git clone git://git.qemu.org/qemu.git

Comment 16 Maxime Coquelin 2018-08-10 09:59:11 UTC
*** Bug 1552978 has been marked as a duplicate of this bug. ***

Comment 17 Maxime Coquelin 2019-05-15 08:38:39 UTC
Hi Sitong,

Please ping me when you have time so that we can see whether this bug reproduces with RHEL 7.7.

Thanks,
Maxime

Comment 18 Sitong Liu 2019-05-16 01:31:29 UTC
(In reply to Maxime Coquelin from comment #17)
> Hi Sitong,
> 
> Please ping me when you have time so that we see if we can reproduce this
> bug with RHEL 7.7.
> 
> Thanks,
> Maxime

Hi Maxime,

Thanks for the reminder; I will ping you as soon as the test environment is available.

Thanks,
Sitong

Comment 19 Pei Zhang 2019-06-06 02:11:53 UTC
Hi Maxime,

This issue cannot be reproduced with the latest RHEL 7.7 testing. We ran NFV live migration with vIOMMU 80 times (60 with OVS and 20 with PVP), and QEMU keeps working well.


Versions:
qemu-kvm-rhev-2.12.0-31.el7.x86_64
3.10.0-1053.el7.x86_64
tuned-2.11.0-4.el7.noarch
openvswitch-2.9.0-106.el7fdp.x86_64
dpdk-18.11-4.el7_6.x86_64


Beaker job:
https://beaker.engineering.redhat.com/recipes/6963760#task94252573


Best regards,
Pei

Comment 20 Pei Zhang 2019-06-12 02:00:08 UTC
In our latest several runs, QEMU keeps working well during live migration testing with vIOMMU, so this bug can no longer be reproduced in RHEL 7.7.

I would prefer to close it as 'CURRENTRELEASE'.

Maxime, feel free to let me know if you have any concerns. Thanks.