Bug 1691701
Summary: | rhel6 guest reboot after migrate from rhel8 host to destination host | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | jingzhao <jinzhao> |
Component: | spice | Assignee: | Uri Lublin <uril> |
Status: | CLOSED ERRATA | QA Contact: | SPICE QE bug list <spice-qe-bugs> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.0 | CC: | cfergeau, chayang, dblechte, dgilbert, ehabkost, juzhang, mdeng, ngu, pezhang, qzhang, rbalakri, tpelka, virt-maint, xiaohli, xiawang, yuhuang |
Target Milestone: | rc | ||
Target Release: | 8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | spice-0.14.2-1.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-11-05 20:59:07 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1691721, 1692212 |
Description
jingzhao
2019-03-22 09:50:57 UTC
I didn't reproduce the issue when I used a RHEL 7.6 guest in the above scenario.

Migrate RHEL6.10 guest from RHEL8.0 src host -> RHEL8.0 des host also hit this issue:
- PC: fail
- Q35: fail

Versions: qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8.x86_64

(In reply to Pei Zhang from comment #2)
> Migrate RHEL6.10 guest from RHEL8.0 src host -> RHEL8.0 des host also hit
> this issue:
> - PC: fail
> - Q35: fail
>
> Versions: qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8.x86_64

== Full cmdline with Q35:

    /usr/libexec/qemu-kvm -name rhel6.10 \
    -M q35,kernel-irqchip=split \
    -cpu SandyBridge -m 4G \
    -device intel-iommu,intremap=true,caching-mode=true \
    -smp 4,sockets=1,cores=4,threads=1 \
    -device pcie-root-port,id=root.1,chassis=1 \
    -device pcie-root-port,id=root.2,chassis=2 \
    -device virtio-scsi-pci-transitional,id=virtio_scsi_pci0,bus=root.1 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/mnt/rhel610-64-virtio-scsi.qcow2 \
    -device scsi-hd,drive=drive_image1,id=image1,bus=virtio_scsi_pci0.0 \
    -vnc :2 \
    -vga qxl \
    -monitor stdio \
    -netdev tap,id=hostnet0,vhost=on \
    -device virtio-net-pci-transitional,netdev=hostnet0,id=net0,mac=18:66:da:5f:d1:02,bus=root.2 \

== Full cmdline with PC:

    /usr/libexec/qemu-kvm -name rhel6.10 \
    -M pc \
    -cpu SandyBridge -m 4G \
    -smp 4,sockets=1,cores=4,threads=1 \
    -device virtio-scsi-pci-transitional,id=virtio_scsi_pci0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/mnt/rhel610-64-virtio-scsi.qcow2 \
    -device scsi-hd,drive=drive_image1,id=image1,bus=virtio_scsi_pci0.0 \
    -vnc :2 \
    -vga qxl \
    -monitor stdio \
    -netdev tap,id=hostnet0,vhost=on \
    -device virtio-net-pci-transitional,netdev=hostnet0,id=net0,mac=18:66:da:5f:d1:02 \

I can reproduce it; for me the guest isn't actually rebooting, it's X in the guest that's crashing, so you see the GUI come back after the X server restarts, but if you check the uptime it's not rebooted. The X crash is always a segfault in the QXL driver.
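The uptime check mentioned above can be made mechanical. The following is a minimal sketch, not part of the original report: it assumes you sample the guest's wall clock and the text of /proc/uptime once before and once after the migration, and that the guest clock does not jump by more than the tolerance. The function names and the sample values are hypothetical.

```python
def boot_time(wall_clock: float, proc_uptime: str) -> float:
    """Boot timestamp implied by a wall-clock reading and /proc/uptime text."""
    # /proc/uptime holds two numbers; the first is seconds since boot.
    uptime_seconds = float(proc_uptime.split()[0])
    return wall_clock - uptime_seconds


def guest_rebooted(before, after, tolerance=5.0) -> bool:
    """before/after are (wall_clock, proc_uptime_text) samples from the guest.

    A real reboot moves the implied boot time forward; an X server restart
    leaves it unchanged (up to clock jitter, covered by `tolerance`).
    """
    return boot_time(*after) - boot_time(*before) > tolerance


# Hypothetical samples: 60 seconds elapse across the migration.
before = (1_000_000.0, "3600.00 14000.00")
x_restart_only = (1_000_060.0, "3660.00 14230.00")  # uptime kept counting
real_reboot = (1_000_060.0, "20.00 70.00")          # uptime started over

print(guest_rebooted(before, x_restart_only))  # False
print(guest_rebooted(before, real_reboot))     # True
```

If the first case is what you observe, the symptom matches the X server crash described in this comment rather than a guest reboot.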
I've also seen qemu crash on the destination in spice, with a trace like:

    (qemu) red_qxl_loadvm_commands:
    id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0
    id 1, group 1, virt start 7fab5fc00000, virt end 7fab63bfe000, generation 0, delta 7fab5fc00000
    id 2, group 1, virt start 7fab5ba00000, virt end 7fab5fa00000, generation 0, delta 7fab5ba00000

    (process:3896): Spice-CRITICAL **: 12:00:05.506: memslot.c:122:memslot_get_virt: address generation is not valid, group_id 1, slot_id 0, gen 6, slot_gen 0
    Thread 7 (Thread 0x7fac6ac1b700 (LWP 3905)):
    #0  0x00007fac8a8a0b44 in read () at /lib64/libpthread.so.0
    #1  0x00007fac8ba57c89 in spice_backtrace_gstack () at /lib64/libspice-server.so.1
    #2  0x00007fac8ba5f270 in spice_log () at /lib64/libspice-server.so.1
    #3  0x00007fac8ba24351 in memslot_get_virt () at /lib64/libspice-server.so.1
    #4  0x00007fac8ba2cbc8 in red_get_data_chunks_ptr () at /lib64/libspice-server.so.1
    #5  0x00007fac8ba2f2e4 in red_get_cursor_cmd () at /lib64/libspice-server.so.1
    #6  0x00007fac8ba3fc5f in red_process_cursor_cmd () at /lib64/libspice-server.so.1
    #7  0x00007fac8ba3fe0b in handle_dev_loadvm_commands () at /lib64/libspice-server.so.1
    #8  0x00007fac8ba0d7a8 in dispatcher_handle_recv_read () at /lib64/libspice-server.so.1
    #9  0x00007fac8ba141bf in watch_func () at /lib64/libspice-server.so.1
    #10 0x00007fac8eba089d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
    #11 0x00007fac8eba0c68 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
    #12 0x00007fac8eba0f92 in g_main_loop_run () at /lib64/libglib-2.0.so.0
    #13 0x00007fac8ba40efe in red_worker_main () at /lib64/libspice-server.so.1
    #14 0x00007fac8a8972de in start_thread () at /lib64/libpthread.so.0
    #15 0x00007fac8a5c7a63 in clone () at /lib64/libc.so.6

In the cases where X is crashing I've seen messages like:

    (process:4437): Spice-WARNING **: 12:08:46.153: cursor-channel.c:277:cursor_channel_process_cmd: invalid cursor command 255

and

    (process:23690): Spice-WARNING **: 12:17:35.665: cursor-channel.c:277:cursor_channel_process_cmd: invalid cursor command 151

so I bet this is one of the fixes for the bad cursor state. Note, if X isn't running in the guest it's fine.

Trying upstream, 34.0.0-rc, 3.0.0, 2.12.0 all crashed in the same way when built on rhel 8.
2.11.0 apparently works (needed a small hack to memfd to build).
4.0.0 rc works fine if built on rhel7.

Hmm, lots of differences.

    spice-server on rhel8: spice-server-devel-0.14.0-7.el8.x86_64
    spice-server on rhel7: spice-server-devel-0.14.0-6.el7.x86_64

rhel7 on 4.0.0rc is still fine even with spice-server 0.14.0-7.el7.

Hmm, except the bug has stopped failing for me at the moment, so I'm not actually sure which cases work.

Note similarities to bz 1540919.

The segfaults in the guest X server all seem to be in qxl_garbage_collect_internal, or in the qxl_mem.c code it calls:

    12:08:53-2142 backtrace in rhel6 guest:
      qxl_mem.c:501 qxl_bo_map _bo=0xffffff00fffffc
      qxl_garbage_collect_internal
      qxl_garbage_collect
      qxl_allocnf
      qxl_bo_alloc_internal

    16:33:05-2142
      qxl_mem.c:501 _bo=0x5aaeaeae00fffffc
      qxl_garbage_collect_internal
      qxl_garbage_collect
      qxl_allocnf
      qxl_bo_alloc_internal

    11:46:12
      qxl_garbage_collect_internal cmd=0x1609 and segfaults in cmd->type (info_bo = 0x55614fe3e1a0)

Christophe points out that rhel8's spice-server is missing some of the fixes that are in the rhel7 spice-server, so let's bounce it over to them (specifically the ones from https://bugzilla.redhat.com/show_bug.cgi?id=1567944 ).

Please retest with the latest spice-server packages installed on the destination (0.14.2-1 or newer). If it's fixed then please also retest 1692212, and if that's also fixed mark it as a dupe of this bz.

*** Bug 1692212 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3392
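When retesting, it can help to scan the destination qemu output for the two Spice messages quoted in this report. The sketch below is only an illustration: the regular expressions are written against the exact log lines pasted above, and the `classify` helper and its return labels are hypothetical, not part of spice-server or qemu.

```python
import re

# Patterns modelled on the log lines quoted in this bug report.
GENERATION_RE = re.compile(
    r"memslot_get_virt: address generation is not valid, "
    r"group_id (?P<group_id>\d+), slot_id (?P<slot_id>\d+), "
    r"gen (?P<gen>\d+), slot_gen (?P<slot_gen>\d+)"
)
CURSOR_RE = re.compile(
    r"cursor_channel_process_cmd: invalid cursor command (?P<cmd>\d+)"
)


def classify(line):
    """Return (label, fields) for a known symptom line, or None otherwise."""
    match = GENERATION_RE.search(line)
    if match:
        return ("generation_mismatch",
                {key: int(value) for key, value in match.groupdict().items()})
    match = CURSOR_RE.search(line)
    if match:
        return ("invalid_cursor_command", {"cmd": int(match.group("cmd"))})
    return None


samples = [
    "(process:3896): Spice-CRITICAL **: 12:00:05.506: memslot.c:122:"
    "memslot_get_virt: address generation is not valid, "
    "group_id 1, slot_id 0, gen 6, slot_gen 0",
    "(process:4437): Spice-WARNING **: 12:08:46.153: cursor-channel.c:277:"
    "cursor_channel_process_cmd: invalid cursor command 255",
    "(qemu) red_qxl_loadvm_commands:",
]

for line in samples:
    print(classify(line))
```

A run that produces no matches after updating spice-server on the destination is consistent with the fix, though it does not prove it; the guest X server log should be checked as well.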