Bug 1594135
| Summary: | system_reset many times linux guests cause qemu process Aborted | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | FuXiangChun <xfu> | ||||||||
| Component: | qemu-kvm-rhev | Assignee: | Gerd Hoffmann <kraxel> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | FuXiangChun <xfu> | ||||||||
| Severity: | high | Docs Contact: | |||||||||
| Priority: | high | ||||||||||
| Version: | 7.6 | CC: | chayang, jinzhao, juzhang, kraxel, lersek, mrezanin, virt-maint, xfu | ||||||||
| Target Milestone: | rc | ||||||||||
| Target Release: | --- | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | qemu-kvm-rhev-2.12.0-8.el7 | Doc Type: | If docs needed, set a value | ||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2018-11-01 11:10:36 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
|
||||||||||
Created attachment 1453668 [details]
qemu-cli
Hi Laszlo, I reproduced twice with /usr/share/OVMF/OVMF_VARS.fd via script in comment0. For the first time: (qemu) #####102##### for second time: (qemu) #####137##### But I can not reproduce it with OVMF_VARS.secboot.fd. qemu process still works well when rebooting 600 times. (qemu) #####600##### In addition, I am not sure it is OVMF's bug. I will test seabios later. and update test result to bz. I doubt this is an ovmf issue; the random resets might leave the device model in an unclean state if the xhci reset handler misses someting. In some cases QEMU reacts to invalid guest behavior with an explicit abort() preceded by an error message, but generally not with a failed assert(). Guest behavior is usually not assert()ed, but checked explicitly.
1954 for (i = 0; i < length; i++) {
1955 TRBType type;
1956 type = xhci_ring_fetch(xhci, ring, &xfer->trbs[i], NULL);
1957 assert(type);
1958 }
I believe xhci_ring_fetch() reads from guest memory.
FuXiangChun, can you attach an OVMF debug log that terminates when QEMU aborts? Seeing the guest behavior more closely might help narrow down the QEMU code path.
Gerd, what do you think? Thank you.
(In reply to Laszlo Ersek from comment #4) > I doubt this is an ovmf issue; It's not that clear ... > the random resets might leave the device > model in an unclean state if the xhci reset handler misses someting. In some > cases QEMU reacts to invalid guest behavior with an explicit abort() > preceded by an error message, but generally not with a failed assert(). > Guest behavior is usually not assert()ed, but checked explicitly. > > 1954 for (i = 0; i < length; i++) { > 1955 TRBType type; > 1956 type = xhci_ring_fetch(xhci, ring, &xfer->trbs[i], NULL); > 1957 assert(type); > 1958 } > > I believe xhci_ring_fetch() reads from guest memory. Yes. And there is a "length = xhci_ring_chain_length(xhci, ring);" call before that. It could be a qemu bug (inconsistency between xhci_ring_chain_length and xhci_ring_fetch). It could be a guest bug (OVMF modifying ring entries owned by xhci). qemu indeed should not assert() in that case though. > FuXiangChun, can you attach an OVMF debug log that terminates when QEMU > aborts? Seeing the guest behavior more closely might help narrow down the > QEMU code path. Yep, that would be interesting, especially whenever the assert happens before or after ovmf initializing xhci. Right! I guess: - QEMU could turn the assert() into an error message and an abort(), - the XHCI driver in edk2 likely needs a patch that clears the ring area before exposing the ring to the device (or to the "device model", in the VM case); the issue is likely that the driver expects the ring area to be zero-filled, but that may not hold after system_reset. ... In "MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c", we have functions like CreateTransferRing() and CreateEventRing(). They both call UsbHcAllocateMem(), which seems to zero the memory internally. After UsbHcAllocateMem() returns, both callers seem to zero the memory again. It's not obvious to me where the corruption comes from. Would it be possible to just pause the VM (rather than abort) when the issue is encountered (i.e., with a debug patch)? Then we could attach gdb to QEMU's gdbserver, and poke around in the firmware memory. Well, what xhci should be doing is something like this (untested):
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -1954,7 +1954,12 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
for (i = 0; i < length; i++) {
TRBType type;
type = xhci_ring_fetch(xhci, ring, &xfer->trbs[i], NULL);
- assert(type);
+ if (!type) {
+ xhci_die(xhci);
+ xhci_ep_free_xfer(xfer);
+ epctx->kick_active--;
+ return;
+ }
}
xfer->streamid = streamid;
Just add "vm_stop(RUN_STATE_PAUSED)" before return ...
Laszlo, Before ovmf's log is removed. I am trying to reproduce it. will update log to bz asap. I tried to reproduce it locally, using various intervals for "sleep"; I failed. According to comment0. I tried to reproduce it. I hit OVMF(Tianocore) hang. please check ovmf log. I will continue to reproduce this bug. Created attachment 1454550 [details]
ovmf-log
Created attachment 1454597 [details]
qemu process Aborted ovmf log
(In reply to FuXiangChun from comment #10) > According to comment0. I tried to reproduce it. I hit OVMF(Tianocore) hang. > > please check ovmf log. I will continue to reproduce this bug. --------- Loading SMM driver at 0x0007FEF3000 EntryPoint=0x0007FF2C98E VariableSmm.efi mSmmMemLibInternalMaximumSupportAddress = 0xFFFFFFFFF VarCheckLibRegisterSetVariableCheckHandler - 0x7FF2FB43 Success Variable Store header is corrupted ASSERT_EFI_ERROR (Status = Volume Corrupt) ASSERT /builddir/build/BUILD/ovmf-ee3198e672e2/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableSmm.c(936): !EFI_ERROR (Status) --------- This is unrelated to the XHCI issue. The variable store file got corrupted due to a platform reset that was likely initiated in the middle of a flash chip write. If you remove the variable store file, libvirt will recreate it for you (or you can recreate it yourself), from the variable store template at "/usr/share/OVMF/OVMF_VARS.fd" (or "/usr/share/OVMF/OVMF_VARS.secboot.fd"). Ok, crash happens after ovmf initialized xhci, so xhci reset not resetting everything is unlikely I think. What springs to attention is this endless stream of XhcCheckUrbResult: short packet happens messages. Also: there are two xhci controllers added to the guest. Do you need both to trigger this? Or does it also happen with only one xhci controller? (In reply to FuXiangChun from comment #13) > Created attachment 1454597 [details] > qemu process Aborted ovmf log I see two interesting things in this log: (1) There are a *lot* of "XhcCheckUrbResult: short packet happens!" messages. I've split the cumulative log file (collected over the reboot loop) into separate OVMF boot logs, with the "csplit" utility. There are 47 boot logs. In some of them, only a few such messages can be seen (let's say, fewer than 10, per boot) --; that seems normal. The message is logged from "MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c", function XhcCheckUrbResult(): > case TRB_COMPLETION_SHORT_PACKET: > case TRB_COMPLETION_SUCCESS: > if (EvtTrb->Completecode == TRB_COMPLETION_SHORT_PACKET) { > DEBUG ((EFI_D_VERBOSE, "XhcCheckUrbResult: short packet happens!\n")); > } > > TRBType = (UINT8) (TRBPtr->Type); > if ((TRBType == TRB_TYPE_DATA_STAGE) || > (TRBType == TRB_TYPE_NORMAL) || > (TRBType == TRB_TYPE_ISOCH)) { > CheckedUrb->Completed += (((TRANSFER_TRB_NORMAL*)TRBPtr)->Length - EvtTrb->Length); > } > > break; However, in some files, there are extreme numbers of such messages: xx01:2 xx11:2841 xx21:2743 xx31:2759 xx41:5 xx02:5 xx12:2798 xx22:2905 xx32:2927 xx42:2861 xx03:2613 xx13:2862 xx23:5 xx33:2926 xx43:2809 xx04:2877 xx14:2769 xx24:2801 xx34:2919 xx44:5 xx05:2734 xx15:2880 xx25:2773 xx35:2907 xx45:2854 xx06:5 xx16:2754 xx26:2847 xx36:2857 xx46:2773 xx07:2883 xx17:2891 xx27:2889 xx37:5 xx47:2928 xx08:2873 xx18:2832 xx28:2711 xx38:5 xx09:2855 xx19:2898 xx29:2881 xx39:2974 xx10:5 xx20:5 xx30:2895 xx40:2967 (2) In some log files, it looks like we have a "NumLock storm", intermixed with the above "short packet" messages. For example, in the (split out) log file "xx03", we have 8504 XhcCheckUrbResult: short packet happens! 8505 Current toggle state is 0xC2 8506 XhcCheckUrbResult: short packet happens! 8507 Current toggle state is 0xC0 8508 XhcCheckUrbResult: short packet happens! 8509 Current toggle state is 0xC2 8510 UsbBusStart: usb bus starXhcCheckUrbResult: short packet happens! 8511 XhcCheckUrbResult: short packet happens! 8512 Current toggle state is 0xC0 The "Current toggle state" message is printed by "MdeModulePkg/Universal/Console/ConSplitterDxe/ConSplitter.c", function ToggleStateSyncKeyNotify(). The numeric value logged is a bitmask, from "MdePkg/Include/Protocol/SimpleTextInEx.h": > #define EFI_TOGGLE_STATE_VALID 0x80 > #define EFI_KEY_STATE_EXPOSED 0x40 > #define EFI_SCROLL_LOCK_ACTIVE 0x01 > #define EFI_NUM_LOCK_ACTIVE 0x02 > #define EFI_CAPS_LOCK_ACTIVE 0x04 Thus, when the log alternates between 0xC2 and 0XC0, that means we constantly have (EFI_TOGGLE_STATE_VALID | EFI_KEY_STATE_EXPOSED), and EFI_NUM_LOCK_ACTIVE alternates. Now, let's see the # of occurrences of "Current toggle state" in the log files: xx01:0 xx11:1416 xx21:1368 xx31:1374 xx41:1 xx02:1 xx12:1398 xx22:1451 xx32:1462 xx42:1427 xx03:1305 xx13:1426 xx23:1 xx33:1462 xx43:1395 xx04:1436 xx14:1380 xx24:1397 xx34:1456 xx44:1 xx05:1363 xx15:1436 xx25:1385 xx35:1445 xx45:1423 xx06:1 xx16:1374 xx26:1419 xx36:1421 xx46:1383 xx07:1438 xx17:1443 xx27:1442 xx37:1 xx47:1462 xx08:1435 xx18:1413 xx28:1353 xx38:1 xx09:1422 xx19:1446 xx29:1434 xx39:1484 xx10:1 xx20:1 xx30:1443 xx40:1479 This is why I'm calling it this a "NumLock storm" -- it seems very unlikely that someone intentionally presses NumLock thousands of times in any given boot. Notice how the number of "Current toggle state" messages is about half of the "short packet happens" message count, for each log file. ... Strangely, in my testing, if I press NumLock a few times in the OVMF setup TUI, then something gets stuck, and a continuous stream of the following messages is produced, without any further keypresses: > XhcCheckUrbResult: short packet happens! > XhcCheckUrbResult: short packet happens! > Current toggle state is 0xC2 > XhcCheckUrbResult: short packet happens! > XhcCheckUrbResult: short packet happens! > Current toggle state is 0xC0 In fact when I try to move around the menus in this state, some of my keypresses are lost. And, the NumLock state is inverted for real all the time: using KP_4 on the keypad sometimes inserts the digit 4, and sometimes moves the cursor left. More info: - The storm starts when I first *de-activate* NumLock. At the very first NumLock press (i.e., activation), nothing questionable seems to happen. - The storm reproduces with a sole XHCI controller as well. - CapsLock behaves similarly to NumLock, with value 0xC4 replacing 0xC2. (In reply to Laszlo Ersek from comment #17) > More info: > - The storm starts when I first *de-activate* NumLock. At the very first > NumLock > press (i.e., activation), nothing questionable seems to happen. > - The storm reproduces with a sole XHCI controller as well. Closing spice client stops it for me. Hmm. (In reply to Gerd Hoffmann from comment #19) > (In reply to Laszlo Ersek from comment #17) > > More info: > > - The storm starts when I first *de-activate* NumLock. At the very first > > NumLock > > press (i.e., activation), nothing questionable seems to happen. > > - The storm reproduces with a sole XHCI controller as well. > > Closing spice client stops it for me. > Hmm. Spice client actually sends a constant stream of numlock up/down messages. Whoops. Looks like a spice bug. Explains the stream of short reads. Doesn't explain the abort though. I missed a detail: - from comment 0: the guest OS is RHEL-7.5 (also confirmed by comment 2: the disk image is called "/home/rhel75-64-secure-boot.qcow2"), and - from comment 0: we have *two* sleep commands, and their cumulative effect is a 121 second wait between resets. This implies that the RHEL-7.5 guest has enough time to boot up fully, and the system_reset is issued every time after the Linux kernel has taken ownership of the XHCI device. FuXiangChun, in comment 3 you mention that you plan to test with SeaBIOS as well. Can you please do that as next step please? Perhaps we can remove the guest firmware from the picture. Laszlo, I can not reproduce it with (seabios +q35) after system_reset guest 120 times. Script still is running. Tomorrow morning, I will update the latest result to bz. I've rebuilt upstream QEMU (@ 00928a421d47) with Gerd's debug patch from comment 7, and am running a similar reboot loop script. OVMF is also upstream (@ 14923c1a6bf9). I use a 100s sleep time. I'll report back if I can reproduce the issue. (In reply to Laszlo Ersek from comment #24) > I've rebuilt upstream QEMU (@ 00928a421d47) with Gerd's debug patch from > comment 7, and am running a similar reboot loop script. OVMF is also > upstream (@ 14923c1a6bf9). I use a 100s sleep time. I'll report back if I > can reproduce the issue. I couldn't reproduce the issue in 200 system_reset iterations. To be exact, I used the following patch: > diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c > index 721beb5486e5..862c755200bd 100644 > --- a/hw/usb/hcd-xhci.c > +++ b/hw/usb/hcd-xhci.c > @@ -28,6 +28,7 @@ > #include "hw/pci/msix.h" > #include "trace.h" > #include "qapi/error.h" > +#include "sysemu/sysemu.h" > > #include "hcd-xhci.h" > > @@ -1954,7 +1955,13 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, > unsigned int streamid) > for (i = 0; i < length; i++) { > TRBType type; > type = xhci_ring_fetch(xhci, ring, &xfer->trbs[i], NULL); > - assert(type); > + if (!type) { > + xhci_die(xhci); > + xhci_ep_free_xfer(xfer); > + epctx->kick_active--; > + vm_stop(RUN_STATE_PAUSED); > + return; > + } > } > xfer->streamid = streamid; > And, I used the following shell script: > X=0 > while true; do > if ! virsh dominfo ovmf.rhel7.q35 | grep State | grep running; then > echo domain is not running > break > fi > echo $((X++)) > virsh qemu-monitor-command ovmf.rhel7.q35 --hmp system_reset > sleep 100 > done The idea being, if the debug patch pauses the domain, then the script should not send system_reset at the next iteration. And, I verified that the script behaved correctly with a manually paused domain. In addition, I verified that STOP events were logged correctly in my libvirtd log: # grep QEMU_MONITOR_RECV_EVENT /var/log/libvirt/libvirtd.log | grep STOP and none were reported during those 200 iterations. (vm_stop() calls do_vm_stop() calls qapi_event_send_stop().) (In reply to Laszlo Ersek from comment #25) > (In reply to Laszlo Ersek from comment #24) > > I've rebuilt upstream QEMU (@ 00928a421d47) with Gerd's debug patch from > > comment 7, and am running a similar reboot loop script. OVMF is also > > upstream (@ 14923c1a6bf9). I use a 100s sleep time. I'll report back if I > > can reproduce the issue. > > I couldn't reproduce the issue in 200 system_reset iterations. Laszlo, So far,OVMF and seabiso can reproduce this bug. This is test summary. s1). OVMF+qemu-kvm-rhev-2.12.0-4.el7.x86_64--> reproduced (qemu) system_reset (qemu) #####137##### s2).seabios-bin-1.11.0-2.el7.noarch+qemu-kvm-rhev-2.12.0-4.el7.x86_64->reproduced (qemu) system_reset (qemu) #####640##### I will re-test it with the latest qemu-kvm-rhev + kernel + seabios later and update result to this next Monday. In addition, This is seabios's result as below. qemu-kvm: hw/usb/hcd-xhci.c:1957: xhci_kick_epctx: Assertion `type' failed. Program received signal SIGABRT, Aborted. .... (gdb) bt #0 0x00007fffed497207 in raise () at /lib64/libc.so.6 #1 0x00007fffed4988f8 in abort () at /lib64/libc.so.6 #2 0x00007fffed490026 in __assert_fail_base () at /lib64/libc.so.6 #3 0x00007fffed4900d2 in () at /lib64/libc.so.6 #4 0x0000555555a44d1d in xhci_kick_epctx (epctx=0x5555588ccea0, streamid=0) at hw/usb/hcd-xhci.c:1957 #5 0x0000555555b90481 in timerlist_run_timers (timer_list=0x5555570e16c0) at util/qemu-timer.c:536 #6 0x0000555555b90766 in qemu_clock_run_all_timers (type=<optimized out>) at util/qemu-timer.c:547 #7 0x0000555555b90766 in qemu_clock_run_all_timers () at util/qemu-timer.c:674 #8 0x0000555555b90c99 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:528 #9 0x000055555583bbb7 in main () at vl.c:1963 #10 0x000055555583bbb7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4768 Hello FuXiangChun, thank you very much for your work; I think this result is significant. I'm moving the BZ to qemu-kvm-rhev and retitling it too. can be reproduced with the latest qemu-kvm-rhev + kernel + seabios. qemu-kvm-rhev-2.12.0-6.el7.x86_64 seabios-bin-1.11.0-2.el7.noarch 3.10.0-916.el7.x86_64 upstream commit 8f36ec708834dfad58af6feb0b69bb0be6077f0f Fix included in qemu-kvm-rhev-2.12.0-8.el7 Verified bug with qemu-kvm-rhev-2.12.0-9.el7.x86_64. system_reset guest 210 times. guest works well. (qemu) system_reset (qemu) #####210##### so, this bug is fixed. set bug as verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3443 |
Description of problem: Boot RHEL7.5 guest with OVMF, then many times reboot guest via system_reset. sometimes qemu process will be aborted. Version-Release number of selected component (if applicable): host kernel: 3.10.0-906.el7.x86_64 qemu-kvm-rhev-2.12.0-4.el7.x86_64 OVMF-20180508-2.gitee3198e672e2.el7.noarch How reproducible: sometimes 1/102 1/137 Steps to Reproduce: 1.qemu command line in attachment. 2.reboot guest via this small script # cat reboot.sh i=1 while [ $i -lt 1000000 ] do sleep 1 echo "system_reset"|nc -U /tmp/monitor2 i=$(($i+1)) echo "#####$i#####" sleep 120 done 3. Actual results: qemu process Aborted qemu-kvm: hw/usb/hcd-xhci.c:1957: xhci_kick_epctx: Assertion `type' failed. Program received signal SIGABRT, Aborted. (gdb) bt #0 0x00007fffed497207 in raise () from /lib64/libc.so.6 #1 0x00007fffed4988f8 in abort () from /lib64/libc.so.6 #2 0x00007fffed490026 in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007fffed4900d2 in __assert_fail () from /lib64/libc.so.6 #4 0x0000555555a44d1d in xhci_kick_epctx (epctx=0x55555950d290, streamid=0) at hw/usb/hcd-xhci.c:1957 #5 0x00005555559fd7e1 in hid_keyboard_event (dev=0x5555589246b8, src=<optimized out>, evt=<optimized out>) at hw/input/hid.c:245 #6 0x0000555555a9940d in qemu_input_event_send_impl (src=0x0, evt=0x555559945a10) at ui/input.c:346 #7 0x0000555555a99913 in qemu_input_event_send_key (src=0x0, key=0x55555877be80, down=<optimized out>) at ui/input.c:419 #8 0x00007fffeec9ded5 in inputs_channel_handle_message () from /lib64/libspice-server.so.1 #9 0x00007fffeeca8a40 in red_channel_client_receive () from /lib64/libspice-server.so.1 #10 0x00007fffeeca9428 in red_channel_client_event () from /lib64/libspice-server.so.1 #11 0x0000555555b91118 in aio_dispatch_handlers (ctx=ctx@entry=0x555557061680) at util/aio-posix.c:406 #12 0x0000555555b919b8 in aio_dispatch (ctx=0x555557061680) at util/aio-posix.c:437 #13 0x0000555555b8e76e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261 ---Type <return> to continue, or q <return> to quit--- #14 0x00007ffff76c4049 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #15 0x0000555555b90c76 in glib_pollfds_poll () at util/main-loop.c:215 #16 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:263 #17 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:522 #18 0x000055555583bbb7 in main_loop () at vl.c:1963 #19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4768 Expected results: works Additional info: