Bug 1775251

Field | Value
---|---
Summary | qemu-kvm crashes when Windows VM is migrated with multiqueue
Product | Red Hat Enterprise Linux 7
Component | qemu-kvm-rhev
Version | 7.7
Status | CLOSED ERRATA
Severity | high
Priority | high
Reporter | Radek Duda <rduda>
Assignee | Dr. David Alan Gilbert <dgilbert>
QA Contact | Li Xiaohui <xiaohli>
CC | hhuang, jinzhao, juzhang, mrezanin, mtessun, rh-spice-bugs, virt-maint
Target Milestone | rc
Target Release | ---
Keywords | Regression, TestBlocker, ZStream
Hardware | Unspecified
OS | Unspecified
Fixed In Version | qemu-kvm-rhev-2.12.0-40.el7
Clones (view as bug list) | 1776808, 1779100
Bug Blocks | 1776808, 1779100
Type | Bug
Last Closed | 2020-03-31 14:34:56 UTC
Attachments | qemu.log, vdsm.log, qemu-kvm full backtrace, qemu-kvm_debug.log
Created attachment 1638509 [details]
vdsm.log
Hi Radek,

Any chance you can get a full backtrace from the destination? This smells like a spice bug somewhere; the

    failed to allocate 562949953421312 bytes

is something I've not seen before - that's 2^49! Did this happen previously - is it something new with perhaps a new guest driver, a new qemu version, or new spice?

Created attachment 1639445 [details]
qemu-kvm full backtrace
Huh, that's fun; this is on the saving side in vhost (not spice related):

#4  0x00007fb24fde171c in g_malloc0 (n_bytes=n_bytes@entry=562949953421312) at gmem.c:134
        mem = <optimized out>
#5  0x00005650c5ef9ac8 in vhost_log_alloc (share=false, size=70368744177664) at /usr/src/debug/qemu-2.12.0/hw/virtio/vhost.c:196
        err = 0x0
        log = 0x5650c951da80
        logsize = 562949953421312
        fd = -1
#6  vhost_log_get (size=size@entry=70368744177664, share=<optimized out>) at /usr/src/debug/qemu-2.12.0/hw/virtio/vhost.c:211
        log = <optimized out>
#7  0x00005650c5efa8e5 in vhost_dev_log_resize (size=70368744177664, dev=0x5650c8593500) at /usr/src/debug/qemu-2.12.0/hw/virtio/vhost.c:263
        log = <optimized out>
        log_base = <optimized out>
        r = <optimized out>
#8  vhost_migration_log (enable=1, listener=0x5650c8593508) at /usr/src/debug/qemu-2.12.0/hw/virtio/vhost.c:791
        dev = 0x5650c8593500
        r = <optimized out>
#9  vhost_log_global_start (listener=0x5650c8593508) at /usr/src/debug/qemu-2.12.0/hw/virtio/vhost.c:805
        No locals.
#10 0x00005650c5eaabae in memory_global_dirty_log_start () at /usr/src/debug/qemu-2.12.0/memory.c:2541
        _listener = 0x5650c8593508
#11 0x00005650c5eb17b6 in ram_init_bitmaps (rs=<optimized out>) at /usr/src/debug/qemu-2.12.0/migration/ram.c:2221
        No locals.
#12 ram_init_all (rsp=0x5650c6a0bdc0 <ram_state>) at /usr/src/debug/qemu-2.12.0/migration/ram.c:2240
        No locals.
#13 ram_save_setup (f=0x5650c8b1a000, opaque=0x5650c6a0bdc0 <ram_state>) at /usr/src/debug/qemu-2.12.0/migration/ram.c:2271
        rsp = 0x5650c6a0bdc0 <ram_state>
        block = <optimized out>
#14 0x00005650c6079de5 in qemu_savevm_state_setup (f=0x5650c8b1a000) at migration/savevm.c:1004
        se = 0x5650c853f760
        ret = <optimized out>
#15 0x00005650c6075bdf in migration_thread (opaque=0x5650c853a280) at migration/migration.c:2407
        s = 0x5650c853a280
        setup_start = 1574678886573
#16 0x00007fb237285ea5 in start_thread (arg=0x7fb1559ce700) at pthread_create.c:307
        __res = <optimized out>
        pd = 0x7fb1559ce700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140399622285056, -382482930179187553, 0, 8392704, 3, 140399622285056, 419836370933308575, 420325280375037087}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>

My guess here is that something somewhere has a size of 2^64-1 and it's getting divided down by the log page size (4k = 12 bits, 64-12 = 52) and then divided down by 8, possibly via vhost_log_bits, going from 52 down to 49 - hence the 2^49 we're seeing.

I'll build a debug build for you to try.
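To sanity-check that guess, here is a minimal standalone sketch of the arithmetic - an illustration using the constants from qemu-2.12's hw/virtio/vhost.c, not the actual QEMU code - showing how a last byte of 2^64-1 divides down to exactly the sizes in the backtrace:

#include <stdint.h>
#include <stdio.h>

/* Constants as defined in qemu-2.12 hw/virtio/vhost.c. */
#define VHOST_LOG_PAGE  0x1000ULL                          /* one dirty bit covers a 4 KiB page */
#define VHOST_LOG_BITS  (8 * sizeof(uint64_t))             /* 64 dirty bits per log chunk       */
#define VHOST_LOG_CHUNK (VHOST_LOG_PAGE * VHOST_LOG_BITS)  /* 2^18 guest bytes per chunk        */

int main(void)
{
    /* The guess: some region or ring reports a last byte of 2^64 - 1. */
    uint64_t last = UINT64_MAX;

    /* vhost_get_log_size()-style rounding: chunks needed to cover 0..last. */
    uint64_t log_size = last / VHOST_LOG_CHUNK + 1;         /* 2^46 = 70368744177664 */

    /* vhost_log_alloc() then asks g_malloc0() for 8 bytes per chunk. */
    uint64_t bytes = log_size * sizeof(uint64_t);           /* 2^49 = 562949953421312 */

    printf("log_size=%llu bytes=%llu\n",
           (unsigned long long)log_size, (unsigned long long)bytes);
    return 0;
}

(Note the division here is by VHOST_LOG_CHUNK = 2^18 in one step rather than the two-step page-size/8 split described above, but it lands on the same 2^49.)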
> Did this happen previously - is it something new with perhaps a new guest driver, or new qemu version or new spice?

I do not know - the last time I successfully migrated a Windows VM was 26.9.2019. I have not been migrating VMs between that date and now.

Hi Radek,

Can you please try:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=24946820

It should give some debug output like:

vhost_get_log_size: virtio-net
vhost_get_log_size: region 0 last=655359 gpa=0 pre-log_size=0
vhost_get_log_size: region 1 last=3221225471 gpa=c0000 pre-log_size=3
vhost_get_log_size: region 2 last=9663676415 gpa=100000000 pre-log_size=12288
vhost_get_log_size: vq 0 used_phys=9417298496 used_size=2054 last=9417300549 pre-log_size=36864
vhost_get_log_size: vq 1 used_phys=9416602176 used_size=2054 last=9416604229 pre-log_size=36864
vhost_dev_log_resize: virtio-net: size=36864
vhost_log_get: share=0 size=36864
vhost_log_alloc: size=36864 logsize=294912

Please give me a full copy of that (probably from the source side).

Dave
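A note on reading this output (my interpretation of the debug patch, which may be off in detail): pre-log_size appears to be the running log size in 64-bit chunks before each region or vq is folded in, and each entry contributes last / VHOST_LOG_CHUNK + 1 chunks, with VHOST_LOG_CHUNK = 4096 * 64 = 262144. For example, region 1's last=3221225471 gives 3221225471 / 262144 + 1 = 12288, which is exactly region 2's pre-log_size.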
Created attachment 1639764 [details]
qemu-kvm_debug.log
I tried your build. Here is the output.
vhost_get_log_size: virtio-net
vhost_get_log_size: region 0 last=655359 gpa=0 pre-log_size=0
vhost_get_log_size: region 1 last=3145727999 gpa=c0000 pre-log_size=3
vhost_get_log_size: region 2 last=4093640703 gpa=f0000000 pre-log_size=12000
vhost_get_log_size: region 3 last=4160749567 gpa=f4000000 pre-log_size=15616
vhost_get_log_size: region 4 last=4227858431 gpa=f8000000 pre-log_size=15872
vhost_get_log_size: region 5 last=4236247039 gpa=fc000000 pre-log_size=16128
vhost_get_log_size: region 6 last=4244635647 gpa=fc800000 pre-log_size=16160
vhost_get_log_size: region 7 last=4253024255 gpa=fd000000 pre-log_size=16192
vhost_get_log_size: region 8 last=4261412863 gpa=fd800000 pre-log_size=16224
vhost_get_log_size: region 9 last=4261765119 gpa=fe054000 pre-log_size=16256
vhost_get_log_size: region 10 last=4261773311 gpa=fe056000 pre-log_size=16258
vhost_get_log_size: region 11 last=4261781503 gpa=fe058000 pre-log_size=16258
vhost_get_log_size: region 12 last=4261789695 gpa=fe05a000 pre-log_size=16258
vhost_get_log_size: vq 0 used_phys=449200704 used_size=2052 last=449202755 pre-log_size=16258
vhost_get_log_size: vq 1 used_phys=446005824 used_size=2052 last=446007875 pre-log_size=16258
vhost_dev_log_resize: virtio-net: size=16258
vhost_log_get: share=0 size=16258
vhost_log_alloc: size=16258 logsize=130064

vhost_get_log_size: virtio-net
vhost_get_log_size: region 0 last=655359 gpa=0 pre-log_size=0
vhost_get_log_size: region 1 last=3145727999 gpa=c0000 pre-log_size=3
vhost_get_log_size: region 2 last=4093640703 gpa=f0000000 pre-log_size=12000
vhost_get_log_size: region 3 last=4160749567 gpa=f4000000 pre-log_size=15616
vhost_get_log_size: region 4 last=4227858431 gpa=f8000000 pre-log_size=15872
vhost_get_log_size: region 5 last=4236247039 gpa=fc000000 pre-log_size=16128
vhost_get_log_size: region 6 last=4244635647 gpa=fc800000 pre-log_size=16160
vhost_get_log_size: region 7 last=4253024255 gpa=fd000000 pre-log_size=16192
vhost_get_log_size: region 8 last=4261412863 gpa=fd800000 pre-log_size=16224
vhost_get_log_size: region 9 last=4261765119 gpa=fe054000 pre-log_size=16256
vhost_get_log_size: region 10 last=4261773311 gpa=fe056000 pre-log_size=16258
vhost_get_log_size: region 11 last=4261781503 gpa=fe058000 pre-log_size=16258
vhost_get_log_size: region 12 last=4261789695 gpa=fe05a000 pre-log_size=16258
vhost_get_log_size: vq 0 used_phys=448209472 used_size=2052 last=448211523 pre-log_size=16258
vhost_get_log_size: vq 1 used_phys=538899008 used_size=2052 last=538901059 pre-log_size=16258
vhost_dev_log_resize: virtio-net: size=16258
vhost_log_get: share=0 size=16258

vhost_get_log_size: virtio-net
vhost_get_log_size: region 0 last=655359 gpa=0 pre-log_size=0
vhost_get_log_size: region 1 last=3145727999 gpa=c0000 pre-log_size=3
vhost_get_log_size: region 2 last=4093640703 gpa=f0000000 pre-log_size=12000
vhost_get_log_size: region 3 last=4160749567 gpa=f4000000 pre-log_size=15616
vhost_get_log_size: region 4 last=4227858431 gpa=f8000000 pre-log_size=15872
vhost_get_log_size: region 5 last=4236247039 gpa=fc000000 pre-log_size=16128
vhost_get_log_size: region 6 last=4244635647 gpa=fc800000 pre-log_size=16160
vhost_get_log_size: region 7 last=4253024255 gpa=fd000000 pre-log_size=16192
vhost_get_log_size: region 8 last=4261412863 gpa=fd800000 pre-log_size=16224
vhost_get_log_size: region 9 last=4261765119 gpa=fe054000 pre-log_size=16256
vhost_get_log_size: region 10 last=4261773311 gpa=fe056000 pre-log_size=16258
vhost_get_log_size: region 11 last=4261781503 gpa=fe058000 pre-log_size=16258
vhost_get_log_size: region 12 last=4261789695 gpa=fe05a000 pre-log_size=16258
vhost_get_log_size: vq 0 used_phys=0 used_size=0 last=-1 pre-log_size=16258   <<<<<<<<<<<
vhost_get_log_size: vq 1 used_phys=0 used_size=0 last=-1 pre-log_size=70368744177664
vhost_dev_log_resize: virtio-net: size=70368744177664
vhost_log_get: share=0 size=70368744177664
vhost_log_alloc: size=70368744177664 logsize=562949953421312

This looks like it should be fixed (mostly) by upstream: 240e647 'vhost: fix vhost_log size overflow during migration'.
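The flagged vq lines are the smoking gun: used_phys=0 and used_size=0, so used_phys + used_size - 1 underflows to 2^64-1 (printed as last=-1). These are presumably virtqueues the Windows guest never set up - the extra queue pairs of the multiqueue device - which would explain why only the multiqueue configuration hits this. Below is a self-contained sketch of the per-virtqueue part of vhost_get_log_size() with the kind of guard the upstream fix adds; it is paraphrased from the commit subject, not the verbatim QEMU diff, so consult commit 240e647 for the real change:

#include <stdint.h>

#define VHOST_LOG_PAGE  0x1000ULL
#define VHOST_LOG_CHUNK (VHOST_LOG_PAGE * 8 * sizeof(uint64_t))
#define MAX(a, b)       ((a) > (b) ? (a) : (b))

/* Stand-in for the fields of struct vhost_virtqueue that matter here. */
struct vq_state {
    uint64_t used_phys;   /* guest-physical address of the used ring */
    uint64_t used_size;   /* size of the used ring in bytes          */
};

static uint64_t vq_log_size(const struct vq_state *vqs, int nvqs)
{
    uint64_t log_size = 0;

    for (int i = 0; i < nvqs; ++i) {
        /* Skip virtqueues the guest never set up, e.g. unused extra
         * queue pairs of a multiqueue virtio-net device. Without this
         * check, used_phys + used_size - 1 underflows to 2^64 - 1,
         * exactly the last=-1 seen in the debug output above. */
        if (!vqs[i].used_phys && !vqs[i].used_size) {
            continue;
        }

        uint64_t last = vqs[i].used_phys + vqs[i].used_size - 1;
        log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
    }
    return log_size;
}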
Reproduced the bz on hosts running kernel-3.10.0-1118.el7.x86_64 and qemu-img-rhev-2.12.0-39.el7.x86_64, and verified the bz on the same hosts with qemu-img-rhev-2.12.0-40.el7.x86_64: migration finished successfully, and the guest on the destination host runs, reboots, and shuts down OK.

Adding some details to Comment 21: when verifying this bz I tested with win7/win10/win2019/rhel8.2 guests; they all migrated successfully and reboot/shutdown OK. Marking this bz verified per Comment 21 and Comment 22.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1216
Created attachment 1638503 [details]
qemu.log

Description of problem:
If I migrate a Windows VM, qemu-kvm crashes with this error:

(process:63218): GLib-ERROR **: 15:21:00.329: gmem.c:135: failed to allocate 562949953421312 bytes
2019-11-21 14:21:10.152+0000: shutting down, reason=crashed

Version-Release number of selected component (if applicable):
hosts rhel7.7:
qemu-kvm-rhev-2.12.0-33.el7_7.6.x86_64
vdsm-4.30.38-1.el7ev.x86_64
spice-server-0.14.0-7.el7.x86_64
guest: Win10, rhv-guest-tools 4.3-10.el7 installed
Used RHV-4.3.7.2-0.1.el7 to migrate.

How reproducible:
always

Steps to Reproduce:
1. Make a Windows VM in RHV
2. Migrate it to the other host (press migrate)

Actual results:
qemu-kvm crashes with this error in qemu.log:

(process:63218): Spice-DEBUG: 15:21:00.329: reds.c:4308:spice_server_migrate_start: trace
tcmalloc: large alloc 562949953421312 bytes == (nil) @ 0x7f5ab473a35f 0x7f5ab475a1e7 0x7f5acc23e6e6 0x563dafbcbac8 0x563dafbcc8e5 0x563dafb7cbae 0x563dafb837b6 0x563dafd4bde5 0x563dafd47bdf 0x7f5ab36e2ea5 0x7f5ab340b8cd
(process:63218): GLib-ERROR **: 15:21:00.329: gmem.c:135: failed to allocate 562949953421312 bytes
2019-11-21 14:21:10.152+0000: shutting down, reason=crashed

Expected results:
No crash; successful migration.

Additional info:
I did not reproduce this with RHEL VMs.
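For readers reproducing this outside RHV: "with multiqueue" (per the bug summary) refers to a vhost-backed multiqueue virtio-net device. The reporter drove this through RHV, which generates the qemu command line, so the flags below are an illustrative assumption rather than anything taken from this report - a multiqueue virtio-net device would be configured along these lines:

# Illustrative only; 4 queue pairs, vectors = 2*queues + 2.
qemu-kvm ... \
    -netdev tap,id=hn0,queues=4,vhost=on \
    -device virtio-net-pci,netdev=hn0,mq=on,vectors=10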