Bug 1451631
Summary: | Keyboard does not work after migration | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | xianwang <xianwang> |
Component: | qemu-kvm-rhev | Assignee: | Laurent Vivier <lvivier> |
Status: | CLOSED ERRATA | QA Contact: | xianwang <xianwang> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.4 | CC: | aliang, chayang, coli, dgibson, dgilbert, hachen, juzhang, knoel, kraxel, lmiksik, lvivier, mdeng, michen, mrezanin, qzhang, virt-maint, xianwang, xuma |
Target Milestone: | rc | Keywords: | Regression |
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-rhev-2.9.0-10.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-08-02 04:38:29 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1376765 |
Description
xianwang
2017-05-17 08:13:46 UTC
This bug is only for powerpc. I have tested this scenario on x86_64, and the bug does not exist there.

Host:
3.10.0-666.el7.x86_64
qemu-kvm-rhev-2.9.0-4.el7.x86_64
Guest:
3.10.0-666.el7.x86_64

1) Whether the screen is locked or not, the keyboard does not work.
2) This bug is a regression; it does not exist on the following versions:
3.10.0-666.el7.ppc64le
qemu-kvm-rhev-2.6.0-27.el7.ppc64le
SLOF-20170303-3.git66d250e.el7.noarch

Hi Xianwang, can you add the following info:

* What guest image was in use?
* In particular what was the guest kernel version?
* Re comment 2: could you check and see if it is the host kernel, qemu or guest kernel change which causes the regression?

Sorry, I misread your posts, I see that only the qemu version has changed between working and non-working versions.

Did the guest image change between working and non-working runs?

(In reply to David Gibson from comment #4)
> Hi Xianwang, can you add the following info:
>
> * What guest image was in use?
> * In particular what was the guest kernel version?
> * Re comment 2: could you check and see if it is the host kernel, qemu or
> guest kernel change which causes the regression?

I am sorry, I forgot to describe the guest kernel version. The guest version is 3.10.0-666.el7.ppc64le for both the working and non-working versions, and the guest image did not change between working and non-working runs.

What's more, I tried to find the first bad commit with "git bisect". It seems that I found a first bad commit, but the failure is not the same as in this bug. The result is as follows.

The host and guest versions and the test steps are all the same as in this bug.

# git bisect good 29ba0cdc1fd1300f910d150c03a0f74236083bf7
# git bisect bad ddc371e5a0a569b9c02522bc6ec26ce16f6e126c
Bisecting: 912 revisions left to test after this (roughly 10 steps)
[dd3dd4ba7b949662d2c67a4c041549b3d79c4b0e] virtio: check for vring setup in virtio_queue_empty

compile and test.
test result:
after migration:
src end:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off
Migration status: completed
dst end:
[root@ibm-p8-rhevm-05 ~]# sh boot_d.sh
QEMU 2.8.50 monitor - type 'help' for more information
(qemu) info status
VM status: paused (inmigrate)
(qemu) 2017-05-18T02:12:36.451619Z qemu-system-ppc64: VQ 0 size 0x80 < last_avail_idx 0x1bf0 - used_idx 0x0
2017-05-18T02:12:36.451679Z qemu-system-ppc64: Failed to load virtio-blk:virtio
2017-05-18T02:12:36.451684Z qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@800000020000000:05.0/virtio-blk'
2017-05-18T02:12:36.451976Z qemu-system-ppc64: load of migration failed: Operation not permitted

# git branch -r --contains dd3dd4ba7b949662d2c67a4c041549b3d79c4b0e
origin/preview/2.9.0-rc4
origin/rhv7/master-2.9.0

so, the first commit is:
dd3dd4ba7b949662d2c67a4c041549b3d79c4b0e

(In reply to xianwang from comment #6)
> (In reply to David Gibson from comment #4)
> > Hi Xianwang, can you add the following info:
> >
> > * What guest image was in use?
> > * In particular what was the guest kernel version?
> > * Re comment 2: could you check and see if it is the host kernel, qemu or
> > guest kernel change which causes the regression?
>
> I am sorry I forgot to describe the guest kernel version, the guest version
> is:
> 3.10.0-666.el7.ppc64le, both for working and non-working versions, and the
> guest img do not change between working and non-working runs.
>
> what's more, I tried to find the first bad commit by "bisect", it seems that
> I find out the first bad commit, but the result is not same with this bug,
> the result is as following:
>
> the version of host and guest ,and the test steps are all same as bugs.
> # git bisect good 29ba0cdc1fd1300f910d150c03a0f74236083bf7
> # git bisect bad ddc371e5a0a569b9c02522bc6ec26ce16f6e126c
> Bisecting: 912 revisions left to test after this (roughly 10 steps)
> [dd3dd4ba7b949662d2c67a4c041549b3d79c4b0e] virtio: check for vring setup in
> virtio_queue_empty
>
> compile and test.
>
> test result:
> after migration:
> src end:
> (qemu) info migrate
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks:
> off compress: off events: off postcopy-ram: off x-colo: off release-ram: off
> Migration status: completed
> dst end:
> [root@ibm-p8-rhevm-05 ~]# sh boot_d.sh
> QEMU 2.8.50 monitor - type 'help' for more information
> (qemu) info status
> VM status: paused (inmigrate)
> (qemu) 2017-05-18T02:12:36.451619Z qemu-system-ppc64: VQ 0 size 0x80 <
> last_avail_idx 0x1bf0 - used_idx 0x0
> 2017-05-18T02:12:36.451679Z qemu-system-ppc64: Failed to load
> virtio-blk:virtio
> 2017-05-18T02:12:36.451684Z qemu-system-ppc64: error while loading state for
> instance 0x0 of device 'pci@800000020000000:05.0/virtio-blk'
> 2017-05-18T02:12:36.451976Z qemu-system-ppc64: load of migration failed:
> Operation not permitted
>
> # git branch -r --contains dd3dd4ba7b949662d2c67a4c041549b3d79c4b0e
> origin/preview/2.9.0-rc4
> origin/rhv7/master-2.9.0
>
> so, the first commit is:
> dd3dd4ba7b949662d2c67a4c041549b3d79c4b0e

For this test, the qemu cli is as below:

/root/qemu-kvm/ppc64-softmmu/qemu-system-ppc64 \
-name 'avocado-vt-vm1' \
-sandbox off \
-nodefaults \
-machine pseries \
-vga std \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
-device nec-usb-xhci,id=usb1,bus=pci.0,addr=04 \
-drive file=/root/RHEL74_1.qcow2,format=qcow2,if=none,id=drive_blk1,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive_blk1,id=blk-disk1,bootindex=1,bus=pci.0,addr=05 \
-device virtio-net-pci,mac=9a:7b:7c:7d:7e:72,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=06 \
-netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 4G \
-smp 2,maxcpus=4,cores=2,threads=2,sockets=1 \
-cpu host \
-device usb-kbd \
-device usb-mouse \
-qmp tcp:0:8881,server,nowait \
-vnc :1 \
-msg timestamp=on \
-rtc base=localtime,clock=vm,driftfix=slew \
-monitor stdio \
-boot order=cdn,once=c,menu=on,strict=off \
-enable-kvm

1) On the same host as this bug report:
I re-tested this scenario with another guest image, installed by avocado on the same host. After migration, the keyboard works well. The qemu cli and the guest kernel version are the same; only the image is different. The image that I installed manually is the bad one.

2) On another host (host2), different from the one in this bug report:
The test result is the same as in 1): the image that I installed manually reproduces this bug, while the image installed by avocado does not.

Hi, David, do you think there is something wrong with my manually installed image?

It does seem like there's something wrong with the image, although I can't quite think what it could be.

I was also unable to reproduce the problem with my own image (so far, anyway).

Using the ssh connection, with the broken image, can you show me the output of "lsusb" on the guest while in the broken state. The output from before the migration would also be useful for comparison.

Re: comment 6. Thanks for attempting a bisect, however unless I'm misunderstanding the comment it looks like you didn't complete the bisect, just did the first step. Completing a bisect generally requires testing a number of different versions.

(In reply to David Gibson from comment #9)
> It does seem like there's something wrong with the image, although I can't
> quite think what it could be.
>
> I was also unable to reproduce the problem with my own image (so far,
> anyway).
>
> Using the ssh connection, with the broken image, can you show me the output
> of "lsusb" on the guest while in the broken state.
> The output from before
> the migration would also be useful for comparison.
>
>
> Re: comment 6. Thanks for attempting a bisect, however unless I'm
> misunderstanding the comment it looks like you didn't complete the bisect,
> just did the first step. Completing a bisect generally requires testing a
> number of different versions.

The output of "lsusb" on the guest while in the broken state is as follows, and it is the same as before migration:

[root@dhcp70-148 ~]# lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 003: ID 0627:0001 Adomax Technology Co., Ltd
Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

dst end:
(qemu) info usb
Device 0.1, Port 1, Speed 480 Mb/s, Product QEMU USB Keyboard
Device 0.2, Port 2, Speed 480 Mb/s, Product QEMU USB Mouse

I also tried "dmesg | grep usb"; it displays no error information, but the keyboard does not work while everything else works well.

Ok, that lsusb looks as expected, so it doesn't appear the device is disappearing entirely across the migration.

Unless we find a reproducible way of making an image where this doesn't work, I'm inclined to close this as WORKSFORME.

Given the updates on bug 1448810, I'm less inclined to drop this. I think there may be a real bug, even if the triggering circumstances are unclear.

Xianwang, are you able to put your disk image which shows this problem somewhere I can access?

Unfortunately, even with the image from xianwang, I haven't been able to reproduce this.
I do have a few differences from the setup described from xianwang, although none of them seem likely to cause this problem:

* I'm using 'user' network instead of 'tap'
* I'm using a slightly different hypervisor console and qemu monitor configuration
* I'm doing a local migration, which requires that the source and destination have different vnc and qmp ports

I also have a slightly newer qemu package:

qemu-kvm-rhev-2.9.0-6.el7.ppc64le

xianwang, could you see if you're able to reproduce this with the newer qemu?

xianwang:
Two other suggestions (after you've tried David Gibson's):

a) Does the qemu sendkey command work after migration - e.g.
sendkey x
or
sendkey ctrl-alt-f4

b) If the guest is at a text-console (e.g. ctrl-alt-f4 before migration) does it work?

c) After the migrate can you do the qemu command:
info mice

I'm suspicious the usb-mouse isn't really working and it's going a different way.

Dave

(In reply to David Gibson from comment #15)
> Unfortunately, even with the image from xianwang, I haven't been able to
> reproduce this. I do have a few differences from the setup described from
> xianwang, although none of them seem likely to cause this problem:
>
> * I'm using 'user' network instead of 'tap'
> * I'm using a slightly different hypervisor console and qemu monitor
> configuration
> * I'm doing a local migration, which requires that the source and
> destination have different vnc and qmp ports
>
> I also have a slightly newer qemu package:
>
> qemu-kvm-rhev-2.9.0-6.el7.ppc64le
>
> xianwang, could you see if you're able to reproduce this with the newer qemu?

I re-tested it with the following versions:

Host:
3.10.0-663.el7.ppc64le
qemu-kvm-rhev-2.9.0-6.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
Guest:
3.10.0-666.el7.ppc64le

The test result is as follows:

a) Yes, with the above host version qemu-kvm-rhev-2.9.0-6.el7.ppc64le, this bug exists.
b) For Dave's advice, after migration, I tried:

(qemu) sendkey x*********************not work
(qemu) sendkey ctrl-alt-f4***********not work
(qemu) sendkey x*********************not work
(qemu) sendkey ctrl-alt-f4***********not work
(qemu) sendkey ctrl-alt-f2***********not work
(qemu) sendkey ctrl-alt-f3***********not work
(qemu) info mice
* Mouse #2: QEMU HID Mouse

in guest:
# lsusb -v | grep HID
iConfiguration 6 HID Mouse
HID Device Descriptor:
bcdHID 0.01
iConfiguration 8 HID Keyboard
HID Device Descriptor:
bcdHID 1.11

Xianwang,

could you check the following build fixes the problem:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13356214

It reverts the commit found by BZ1448810

Thanks

(In reply to Laurent Vivier from comment #18)
> Xianwang,
>
> could you check the following build fixes the problem:
>
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13356214
>
> It reverts the commit found by BZ1448810
>
> Thanks

Yes. With the same problematic guest image (3.10.0-666.el7.ppc64le) as in #c8, I re-tested this scenario with qemu-kvm-rhev-2.9.0-8.el7.lvivier201706061743.ppc64le, the build specified by Laurent, and this bug does not exist.

On the same host, with the same host kernel version and image as before, this bug exists on qemu-kvm-rhev-2.9.0-7.el7.ppc64le.

Host:
3.10.0-675.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
Guest:
3.10.0-666.el7.ppc64le

*** Bug 1448810 has been marked as a duplicate of this bug. ***

Involved commit found in BZ1448810:

Bisected to:

commit 243afe858b95765b98d16a1f0dd50dca262858ad
Author: Gerd Hoffmann <kraxel>
Date: Fri Mar 31 12:25:21 2017 +0200

xhci: flush dequeue pointer to endpoint context

When done processing a endpoint ring we must update the dequeue pointer in the endpoint context in guest memory. This is needed to make sure the guest has a correct view of things and also to make live migration work properly, because xhci post_load restores alot of the state from xhci data structures in guest memory.

Add xhci_set_ep_state() call to do that.

The recursive calls stopped by commit ddb603ab6c981c1d67cb42266fc700c33e5b2d8f had the (unintentional) side effect to hiding this bug. xhci_set_ep_state() was called before processing, to set the state to running, which updated the dequeue pointer too.

Reported-by: Dr. David Alan Gilbert <dgilbert>
Signed-off-by: Gerd Hoffmann <kraxel>
Tested-by: Dr. David Alan Gilbert <dgilbert>
Message-id: 20170331102521.29253-1-kraxel

and that commit was put in to fix https://bugzilla.redhat.com/show_bug.cgi?id=1436616

After migration, in xhci_kick_epctx(), xhci_kick_epctx() always returns 0 and stops the processing loop.

xhci_kick_epctx() returns 0 because the XHCITRB structures retrieved with pci_dma_read() are totally cleared.

It seems the value of the pointer to the ring is not a good one: it changes between the source and the destination.

Gerd has fixed the bug with a commit from:

https://www.kraxel.org/cgit/qemu/log/?h=work/xhci-hid-migration

I am preparing a build for QE.

*** Bug 1454580 has been marked as a duplicate of this bug. ***

Fix included in qemu-kvm-rhev-2.9.0-10.el7

This bug is reproduced on qemu-kvm-rhev-2.9.0-5.el7.ppc64le and verified as fixed on qemu-kvm-rhev-2.9.0-10.el7.ppc64le.

Bug reproduction:

Host:
3.10.0-679.el7.ppc64le
qemu-kvm-rhev-2.9.0-5.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
Guest:
3.10.0-666.el7.ppc64le

steps:
1. Boot a guest with the following qemu cli:

/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-nodefaults \
-machine pseries \
-vga std \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=04 \
-drive file=/root/RHEL74_1_bug1451631.qcow2,format=qcow2,if=none,id=drive_blk1,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive_blk1,id=blk-disk1,bootindex=1,bus=pci.0,addr=05 \
-device virtio-net-pci,mac=9a:7b:7c:7d:7e:72,id=id9HRc5V,vectors=4,netdev=idjlQN53,bus=pci.0,addr=06 \
-netdev tap,id=idjlQN53,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 4G \
-smp 2,maxcpus=4,cores=2,threads=2,sockets=1 \
-cpu host \
-device usb-hub,id=hub1,port=1 \
-device usb-mouse,id=usbmouse,port=1.1 \
-device usb-kbd,id=usbkbd,port=1.2 \
-device usb-tablet,id=usbtablet,port=1.3 \
-device usb-storage,id=storage1,port=1.4,drive=drive1 \
-drive file=/root/data1.qcow2,id=drive1,if=none \
-qmp tcp:0:8881,server,nowait \
-vnc :1 \
-msg timestamp=on \
-rtc base=localtime,clock=vm,driftfix=slew \
-monitor stdio \
-boot order=cdn,once=c,menu=on,strict=off \
-enable-kvm

2. Launch a guest in listening mode on the same host by appending:
-incoming tcp:0:58001

3. Do a local migration:
(qemu) migrate -d tcp:127.0.0.1:5802

4. Check the function of the mouse, keyboard and usb-storage.

Result:
The mouse and keyboard do not work, but the usb-storage works well (#fdisk -l, the /dev/sda can be shown).

Bug verify:

Host:
3.10.0-679.el7.ppc64le
qemu-kvm-rhev-2.9.0-10.el7.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
Guest:
3.10.0-666.el7.ppc64le

The steps are the same as for bug reproduction.
result:
The mouse, keyboard and the usb-storage all work well.

So, this bug is verified as fixed.

I have also verified this bug for x86_64 on qemu-kvm-rhev-2.9.0-10.el7.x86_64; the result is pass.

Host:
3.10.0-679.el7.x86_64
qemu-kvm-rhev-2.9.0-10.el7.x86_64
guest:
3.10.0-679.el7.x86_64

The qemu cli and steps are the same as in #C32.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
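
The same-host migration used in the reproduction steps (start a second instance with -incoming, then issue migrate -d on the source) could be sketched with a small shell helper such as the one below. This is only an illustration under stated assumptions: the function names dest_args and migrate_cmd and the variable MIG_PORT are hypothetical and are not part of QEMU or of the test setup in this report. The report lists port 58001 for -incoming and 5802 for the migrate target; the sketch assumes both are meant to be one and the same port and derives them from a single variable. The destination VNC display (:2) and QMP port (8882) are likewise assumed values, chosen only because a local migration needs them to differ from the source's :1 and 8881.

```shell
#!/bin/sh
# Illustrative sketch only: dest_args and migrate_cmd are hypothetical helpers,
# not QEMU commands. They only print strings; no guest is launched.

# Print the extra options for the destination instance of a same-host migration.
#   $1 = migration port, $2 = VNC display, $3 = QMP port
dest_args() {
    printf '%s\n' "-incoming tcp:0:$1 -vnc :$2 -qmp tcp:0:$3,server,nowait"
}

# Print the monitor command to type on the source instance.
#   $1 = migration port (assumed to match the destination's -incoming port)
migrate_cmd() {
    printf 'migrate -d tcp:127.0.0.1:%s\n' "$1"
}

MIG_PORT=58001   # assumption: one port shared by both sides

# Destination-side options; display 2 and port 8882 are assumed values that
# avoid clashing with the source's "-vnc :1" and "-qmp tcp:0:8881".
dest_args "$MIG_PORT" 2 8882

# Command for the source monitor.
migrate_cmd "$MIG_PORT"
```

Deriving both strings from one variable keeps the listening port and the migration target from drifting apart when the command lines are edited by hand.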