Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1594135

Summary:

system_reset many times linux guests cause qemu process Aborted

Product:

Red Hat Enterprise Linux 7

Reporter:

FuXiangChun <xfu>

Component:

qemu-kvm-rhev

Assignee:

Gerd Hoffmann <kraxel>

Status:

CLOSED ERRATA

QA Contact:

FuXiangChun <xfu>

Severity:

high

Docs Contact:

Priority:

high

Version:

7.6

CC:

chayang, jinzhao, juzhang, kraxel, lersek, mrezanin, virt-maint, xfu

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Linux

Whiteboard:

Fixed In Version:

qemu-kvm-rhev-2.12.0-8.el7

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2018-11-01 11:10:36 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
qemu-cli	none
ovmf-log	none
qemu process Aborted ovmf log	none

Description FuXiangChun 2018-06-22 08:21:20 UTC

Description of problem:
Boot RHEL7.5 guest with OVMF, then many times reboot guest via system_reset. sometimes qemu process will be aborted.  

Version-Release number of selected component (if applicable):
host kernel:
3.10.0-906.el7.x86_64
qemu-kvm-rhev-2.12.0-4.el7.x86_64
OVMF-20180508-2.gitee3198e672e2.el7.noarch

How reproducible:
sometimes
1/102
1/137

Steps to Reproduce:
1.qemu command line in attachment.

2.reboot guest via this small script
# cat reboot.sh 
i=1
while [ $i -lt 1000000 ]
do
sleep 1
echo "system_reset"|nc -U /tmp/monitor2
i=$(($i+1))
echo "#####$i#####"
sleep 120
done

3. 

Actual results:
qemu process Aborted

qemu-kvm: hw/usb/hcd-xhci.c:1957: xhci_kick_epctx: Assertion `type' failed.

Program received signal SIGABRT, Aborted.

(gdb) bt
#0  0x00007fffed497207 in raise () from /lib64/libc.so.6
#1  0x00007fffed4988f8 in abort () from /lib64/libc.so.6
#2  0x00007fffed490026 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007fffed4900d2 in __assert_fail () from /lib64/libc.so.6
#4  0x0000555555a44d1d in xhci_kick_epctx (epctx=0x55555950d290, streamid=0)
    at hw/usb/hcd-xhci.c:1957
#5  0x00005555559fd7e1 in hid_keyboard_event (dev=0x5555589246b8, 
    src=<optimized out>, evt=<optimized out>) at hw/input/hid.c:245
#6  0x0000555555a9940d in qemu_input_event_send_impl (src=0x0, 
    evt=0x555559945a10) at ui/input.c:346
#7  0x0000555555a99913 in qemu_input_event_send_key (src=0x0, 
    key=0x55555877be80, down=<optimized out>) at ui/input.c:419
#8  0x00007fffeec9ded5 in inputs_channel_handle_message ()
   from /lib64/libspice-server.so.1
#9  0x00007fffeeca8a40 in red_channel_client_receive ()
   from /lib64/libspice-server.so.1
#10 0x00007fffeeca9428 in red_channel_client_event ()
   from /lib64/libspice-server.so.1
#11 0x0000555555b91118 in aio_dispatch_handlers (ctx=ctx@entry=0x555557061680)
    at util/aio-posix.c:406
#12 0x0000555555b919b8 in aio_dispatch (ctx=0x555557061680)
    at util/aio-posix.c:437
#13 0x0000555555b8e76e in aio_ctx_dispatch (source=<optimized out>, 
    callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
---Type <return> to continue, or q <return> to quit---
#14 0x00007ffff76c4049 in g_main_context_dispatch ()
   from /lib64/libglib-2.0.so.0
#15 0x0000555555b90c76 in glib_pollfds_poll () at util/main-loop.c:215
#16 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:263
#17 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:522
#18 0x000055555583bbb7 in main_loop () at vl.c:1963
#19 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4768


Expected results:
works

Additional info:

Comment 2 FuXiangChun 2018-06-22 08:22:22 UTC

Created attachment 1453668 [details]
qemu-cli

Comment 3 FuXiangChun 2018-06-22 08:30:29 UTC

Hi Laszlo,

I reproduced twice with /usr/share/OVMF/OVMF_VARS.fd via script in comment0.

For the first time:
(qemu) #####102#####

for second time:
(qemu) #####137#####

But I can not reproduce it with OVMF_VARS.secboot.fd.

qemu process still works well when rebooting 600 times. 
(qemu) #####600#####


In addition, I am not sure it is OVMF's bug.  I will test seabios later. and update test result to bz.

Comment 4 Laszlo Ersek 2018-06-23 19:14:57 UTC

I doubt this is an ovmf issue; the random resets might leave the device model in an unclean state if the xhci reset handler misses someting. In some cases QEMU reacts to invalid guest behavior with an explicit abort() preceded by an error message, but generally not with a failed assert(). Guest behavior is usually not assert()ed, but checked explicitly.

  1954          for (i = 0; i < length; i++) {
  1955              TRBType type;
  1956              type = xhci_ring_fetch(xhci, ring, &xfer->trbs[i], NULL);
  1957              assert(type);
  1958          }

I believe xhci_ring_fetch() reads from guest memory.

FuXiangChun, can you attach an OVMF debug log that terminates when QEMU aborts? Seeing the guest behavior more closely might help narrow down the QEMU code path.

Gerd, what do you think? Thank you.

Comment 5 Gerd Hoffmann 2018-06-25 07:35:50 UTC

(In reply to Laszlo Ersek from comment #4)
> I doubt this is an ovmf issue;

It's not that clear ...

> the random resets might leave the device
> model in an unclean state if the xhci reset handler misses someting. In some
> cases QEMU reacts to invalid guest behavior with an explicit abort()
> preceded by an error message, but generally not with a failed assert().
> Guest behavior is usually not assert()ed, but checked explicitly.
> 
>   1954          for (i = 0; i < length; i++) {
>   1955              TRBType type;
>   1956              type = xhci_ring_fetch(xhci, ring, &xfer->trbs[i], NULL);
>   1957              assert(type);
>   1958          }
> 
> I believe xhci_ring_fetch() reads from guest memory.

Yes.  And there is a "length = xhci_ring_chain_length(xhci, ring);" call before that.

It could be a qemu bug (inconsistency between xhci_ring_chain_length and xhci_ring_fetch).

It could be a guest bug (OVMF modifying ring entries owned by xhci).
qemu indeed should not assert() in that case though.

> FuXiangChun, can you attach an OVMF debug log that terminates when QEMU
> aborts? Seeing the guest behavior more closely might help narrow down the
> QEMU code path.

Yep, that would be interesting, especially whenever the assert happens before or after ovmf initializing xhci.

Comment 6 Laszlo Ersek 2018-06-25 07:54:01 UTC

Right! I guess:

- QEMU could turn the assert() into an error message and an abort(),

- the XHCI driver in edk2 likely needs a patch that clears the ring area
  before exposing the ring to the device (or to the "device model", in the VM
  case); the issue is likely that the driver expects the ring area to be
  zero-filled, but that may not hold after system_reset.

... In "MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c", we have functions like CreateTransferRing() and CreateEventRing(). They both call UsbHcAllocateMem(), which seems to zero the memory internally. After UsbHcAllocateMem() returns, both callers seem to zero the memory again. It's not obvious to me where the corruption comes from.

Would it be possible to just pause the VM (rather than abort) when the issue is encountered (i.e., with a debug patch)? Then we could attach gdb to QEMU's gdbserver, and poke around in the firmware memory.

Comment 7 Gerd Hoffmann 2018-06-25 08:15:24 UTC

Well, what xhci should be doing is something like this (untested):

--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -1954,7 +1954,12 @@ static void xhci_kick_epctx(XHCIEPContext *epctx, unsigned int streamid)
         for (i = 0; i < length; i++) {
             TRBType type;
             type = xhci_ring_fetch(xhci, ring, &xfer->trbs[i], NULL);
-            assert(type);
+            if (!type) {
+                xhci_die(xhci);
+                xhci_ep_free_xfer(xfer);
+                epctx->kick_active--;
+                return;
+            }
         }
         xfer->streamid = streamid;
 

Just add "vm_stop(RUN_STATE_PAUSED)" before return ...

Comment 8 FuXiangChun 2018-06-25 10:35:32 UTC

Laszlo,

Before ovmf's log is removed. I am trying to reproduce it.  will update log to bz asap.

Comment 9 Laszlo Ersek 2018-06-25 16:14:57 UTC

I tried to reproduce it locally, using various intervals for "sleep"; I failed.

Comment 10 FuXiangChun 2018-06-26 05:41:26 UTC

According to comment0. I tried to reproduce it. I hit OVMF(Tianocore) hang.

please check ovmf log.  I will continue to reproduce this bug.

Comment 11 FuXiangChun 2018-06-26 05:42:29 UTC

Created attachment 1454550 [details]
ovmf-log

Comment 13 FuXiangChun 2018-06-26 08:57:02 UTC

Created attachment 1454597 [details]
qemu process Aborted ovmf log

Comment 14 Laszlo Ersek 2018-06-26 11:01:38 UTC

(In reply to FuXiangChun from comment #10)
> According to comment0. I tried to reproduce it. I hit OVMF(Tianocore) hang.
> 
> please check ovmf log.  I will continue to reproduce this bug.

---------
Loading SMM driver at 0x0007FEF3000 EntryPoint=0x0007FF2C98E VariableSmm.efi
mSmmMemLibInternalMaximumSupportAddress = 0xFFFFFFFFF
VarCheckLibRegisterSetVariableCheckHandler - 0x7FF2FB43 Success
Variable Store header is corrupted

ASSERT_EFI_ERROR (Status = Volume Corrupt)
ASSERT /builddir/build/BUILD/ovmf-ee3198e672e2/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableSmm.c(936): !EFI_ERROR (Status)
---------

This is unrelated to the XHCI issue. The variable store file got corrupted due to a platform reset that was likely initiated in the middle of a flash chip write. If you remove the variable store file, libvirt will recreate it for you (or you can recreate it yourself), from the variable store template at "/usr/share/OVMF/OVMF_VARS.fd" (or "/usr/share/OVMF/OVMF_VARS.secboot.fd").

Comment 15 Gerd Hoffmann 2018-06-26 11:02:13 UTC

Ok, crash happens after ovmf initialized xhci, so xhci reset not resetting everything is unlikely I think.

What springs to attention is this endless stream of 

   XhcCheckUrbResult: short packet happens

messages.

Also: there are two xhci controllers added to the guest.  Do you need both to trigger this?  Or does it also happen with only one xhci controller?

Comment 16 Laszlo Ersek 2018-06-26 11:51:20 UTC

(In reply to FuXiangChun from comment #13)
> Created attachment 1454597 [details]
> qemu process Aborted ovmf log

I see two interesting things in this log:

(1) There are a *lot* of "XhcCheckUrbResult: short packet happens!"
messages.

I've split the cumulative log file (collected over the reboot loop) into
separate OVMF boot logs, with the "csplit" utility. There are 47 boot logs.
In some of them, only a few such messages can be seen (let's say, fewer than
10, per boot) --; that seems normal. The message is logged from
"MdeModulePkg/Bus/Pci/XhciDxe/XhciSched.c", function XhcCheckUrbResult():

>       case TRB_COMPLETION_SHORT_PACKET:
>       case TRB_COMPLETION_SUCCESS:
>         if (EvtTrb->Completecode == TRB_COMPLETION_SHORT_PACKET) {
>           DEBUG ((EFI_D_VERBOSE, "XhcCheckUrbResult: short packet happens!\n"));
>         }
>
>         TRBType = (UINT8) (TRBPtr->Type);
>         if ((TRBType == TRB_TYPE_DATA_STAGE) ||
>             (TRBType == TRB_TYPE_NORMAL) ||
>             (TRBType == TRB_TYPE_ISOCH)) {
>           CheckedUrb->Completed += (((TRANSFER_TRB_NORMAL*)TRBPtr)->Length - EvtTrb->Length);
>         }
>
>         break;

However, in some files, there are extreme numbers of such messages:

  xx01:2          xx11:2841       xx21:2743       xx31:2759       xx41:5
  xx02:5          xx12:2798       xx22:2905       xx32:2927       xx42:2861
  xx03:2613       xx13:2862       xx23:5          xx33:2926       xx43:2809
  xx04:2877       xx14:2769       xx24:2801       xx34:2919       xx44:5
  xx05:2734       xx15:2880       xx25:2773       xx35:2907       xx45:2854
  xx06:5          xx16:2754       xx26:2847       xx36:2857       xx46:2773
  xx07:2883       xx17:2891       xx27:2889       xx37:5          xx47:2928
  xx08:2873       xx18:2832       xx28:2711       xx38:5
  xx09:2855       xx19:2898       xx29:2881       xx39:2974
  xx10:5          xx20:5          xx30:2895       xx40:2967

(2) In some log files, it looks like we have a "NumLock storm", intermixed
with the above "short packet" messages. For example, in the (split out) log
file "xx03", we have

  8504 XhcCheckUrbResult: short packet happens!
  8505 Current toggle state is 0xC2
  8506 XhcCheckUrbResult: short packet happens!
  8507 Current toggle state is 0xC0
  8508 XhcCheckUrbResult: short packet happens!
  8509 Current toggle state is 0xC2
  8510 UsbBusStart: usb bus starXhcCheckUrbResult: short packet happens!
  8511 XhcCheckUrbResult: short packet happens!
  8512 Current toggle state is 0xC0

The "Current toggle state" message is printed by
"MdeModulePkg/Universal/Console/ConSplitterDxe/ConSplitter.c", function
ToggleStateSyncKeyNotify(). The numeric value logged is a bitmask, from
"MdePkg/Include/Protocol/SimpleTextInEx.h":

> #define EFI_TOGGLE_STATE_VALID    0x80
> #define EFI_KEY_STATE_EXPOSED     0x40
> #define EFI_SCROLL_LOCK_ACTIVE    0x01
> #define EFI_NUM_LOCK_ACTIVE       0x02
> #define EFI_CAPS_LOCK_ACTIVE      0x04

Thus, when the log alternates between 0xC2 and 0XC0, that means we
constantly have (EFI_TOGGLE_STATE_VALID | EFI_KEY_STATE_EXPOSED), and
EFI_NUM_LOCK_ACTIVE alternates.

Now, let's see the # of occurrences of "Current toggle state" in the log
files:

  xx01:0          xx11:1416       xx21:1368       xx31:1374       xx41:1
  xx02:1          xx12:1398       xx22:1451       xx32:1462       xx42:1427
  xx03:1305       xx13:1426       xx23:1          xx33:1462       xx43:1395
  xx04:1436       xx14:1380       xx24:1397       xx34:1456       xx44:1
  xx05:1363       xx15:1436       xx25:1385       xx35:1445       xx45:1423
  xx06:1          xx16:1374       xx26:1419       xx36:1421       xx46:1383
  xx07:1438       xx17:1443       xx27:1442       xx37:1          xx47:1462
  xx08:1435       xx18:1413       xx28:1353       xx38:1
  xx09:1422       xx19:1446       xx29:1434       xx39:1484
  xx10:1          xx20:1          xx30:1443       xx40:1479

This is why I'm calling it this a "NumLock storm" -- it seems very unlikely
that someone intentionally presses NumLock thousands of times in any given
boot.

Notice how the number of "Current toggle state" messages is about half of
the "short packet happens" message count, for each log file.


... Strangely, in my testing, if I press NumLock a few times in the OVMF
setup TUI, then something gets stuck, and a continuous stream of the
following messages is produced, without any further keypresses:

> XhcCheckUrbResult: short packet happens!
> XhcCheckUrbResult: short packet happens!
> Current toggle state is 0xC2
> XhcCheckUrbResult: short packet happens!
> XhcCheckUrbResult: short packet happens!
> Current toggle state is 0xC0

In fact when I try to move around the menus in this state, some of my
keypresses are lost. And, the NumLock state is inverted for real all the
time: using KP_4 on the keypad sometimes inserts the digit 4, and sometimes
moves the cursor left.

Comment 17 Laszlo Ersek 2018-06-26 11:59:49 UTC

More info:
- The storm starts when I first *de-activate* NumLock. At the very first NumLock
  press (i.e., activation), nothing questionable seems to happen.
- The storm reproduces with a sole XHCI controller as well.

Comment 18 Laszlo Ersek 2018-06-26 12:01:36 UTC

- CapsLock behaves similarly to NumLock, with value 0xC4 replacing 0xC2.

Comment 19 Gerd Hoffmann 2018-06-26 12:55:28 UTC

(In reply to Laszlo Ersek from comment #17)
> More info:
> - The storm starts when I first *de-activate* NumLock. At the very first
> NumLock
>   press (i.e., activation), nothing questionable seems to happen.
> - The storm reproduces with a sole XHCI controller as well.

Closing spice client stops it for me.
Hmm.

Comment 20 Gerd Hoffmann 2018-06-26 13:04:03 UTC

(In reply to Gerd Hoffmann from comment #19)
> (In reply to Laszlo Ersek from comment #17)
> > More info:
> > - The storm starts when I first *de-activate* NumLock. At the very first
> > NumLock
> >   press (i.e., activation), nothing questionable seems to happen.
> > - The storm reproduces with a sole XHCI controller as well.
> 
> Closing spice client stops it for me.
> Hmm.

Spice client actually sends a constant stream of numlock up/down messages.
Whoops.  Looks like a spice bug.  Explains the stream of short reads.

Doesn't explain the abort though.

Comment 21 Laszlo Ersek 2018-06-26 19:45:32 UTC

I missed a detail:
- from comment 0: the guest OS is RHEL-7.5 (also confirmed by comment 2: the
  disk image is called "/home/rhel75-64-secure-boot.qcow2"), and
- from comment 0: we have *two* sleep commands, and their cumulative effect is
  a 121 second wait between resets.

This implies that the RHEL-7.5 guest has enough time to boot up fully, and the system_reset is issued every time after the Linux kernel has taken ownership of the XHCI device.

Comment 22 Laszlo Ersek 2018-06-26 19:47:00 UTC

FuXiangChun, in comment 3 you mention that you plan to test with SeaBIOS as well.  Can you please do that as next step please? Perhaps we can remove the guest firmware from the picture.

Comment 23 FuXiangChun 2018-06-27 10:04:49 UTC

Laszlo,

I can not reproduce it with (seabios +q35) after system_reset guest 120 times. Script still is running.  Tomorrow morning, I will update the latest result to bz.

Comment 24 Laszlo Ersek 2018-06-27 18:13:16 UTC

I've rebuilt upstream QEMU (@ 00928a421d47) with Gerd's debug patch from comment 7, and am running a similar reboot loop script. OVMF is also upstream (@ 14923c1a6bf9). I use a 100s sleep time. I'll report back if I can reproduce the issue.

Comment 25 Laszlo Ersek 2018-06-28 00:04:29 UTC

(In reply to Laszlo Ersek from comment #24)
> I've rebuilt upstream QEMU (@ 00928a421d47) with Gerd's debug patch from
> comment 7, and am running a similar reboot loop script. OVMF is also
> upstream (@ 14923c1a6bf9). I use a 100s sleep time. I'll report back if I
> can reproduce the issue.

I couldn't reproduce the issue in 200 system_reset iterations.

Comment 26 Laszlo Ersek 2018-06-28 13:37:14 UTC

To be exact, I used the following patch:

> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
> index 721beb5486e5..862c755200bd 100644
> --- a/hw/usb/hcd-xhci.c
> +++ b/hw/usb/hcd-xhci.c
> @@ -28,6 +28,7 @@
>  #include "hw/pci/msix.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "sysemu/sysemu.h"
>
>  #include "hcd-xhci.h"
>
> @@ -1954,7 +1955,13 @@ static void xhci_kick_epctx(XHCIEPContext *epctx,
> unsigned int streamid)
>          for (i = 0; i < length; i++) {
>              TRBType type;
>              type = xhci_ring_fetch(xhci, ring, &xfer->trbs[i], NULL);
> -            assert(type);
> +            if (!type) {
> +                xhci_die(xhci);
> +                xhci_ep_free_xfer(xfer);
> +                epctx->kick_active--;
> +                vm_stop(RUN_STATE_PAUSED);
> +                return;
> +            }
>          }
>          xfer->streamid = streamid;
>

And, I used the following shell script:

> X=0
> while true; do
>   if ! virsh dominfo ovmf.rhel7.q35 | grep State | grep running; then
>     echo domain is not running
>     break
>   fi
>   echo $((X++))
>   virsh qemu-monitor-command ovmf.rhel7.q35 --hmp system_reset
>   sleep 100
> done

The idea being, if the debug patch pauses the domain, then the script should
not send system_reset at the next iteration. And, I verified that the script
behaved correctly with a manually paused domain. In addition, I verified
that STOP events were logged correctly in my libvirtd log:

# grep QEMU_MONITOR_RECV_EVENT /var/log/libvirt/libvirtd.log | grep STOP

and none were reported during those 200 iterations. (vm_stop() calls
do_vm_stop() calls qapi_event_send_stop().)

Comment 27 FuXiangChun 2018-06-29 02:07:26 UTC

(In reply to Laszlo Ersek from comment #25)
> (In reply to Laszlo Ersek from comment #24)
> > I've rebuilt upstream QEMU (@ 00928a421d47) with Gerd's debug patch from
> > comment 7, and am running a similar reboot loop script. OVMF is also
> > upstream (@ 14923c1a6bf9). I use a 100s sleep time. I'll report back if I
> > can reproduce the issue.
> 
> I couldn't reproduce the issue in 200 system_reset iterations.


Laszlo,

So far,OVMF and seabiso can reproduce this bug. This is test summary. 

s1). OVMF+qemu-kvm-rhev-2.12.0-4.el7.x86_64--> reproduced

(qemu) system_reset
(qemu) #####137#####

s2).seabios-bin-1.11.0-2.el7.noarch+qemu-kvm-rhev-2.12.0-4.el7.x86_64->reproduced

(qemu) system_reset
(qemu) #####640#####


I will re-test it with the latest qemu-kvm-rhev + kernel + seabios later and update result to this next Monday.


In addition, This is seabios's result as below.

qemu-kvm: hw/usb/hcd-xhci.c:1957: xhci_kick_epctx: Assertion `type' failed.
Program received signal SIGABRT, Aborted.
....
(gdb) bt
#0  0x00007fffed497207 in raise () at /lib64/libc.so.6
#1  0x00007fffed4988f8 in abort () at /lib64/libc.so.6
#2  0x00007fffed490026 in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007fffed4900d2 in  () at /lib64/libc.so.6
#4  0x0000555555a44d1d in xhci_kick_epctx (epctx=0x5555588ccea0, streamid=0) at hw/usb/hcd-xhci.c:1957
#5  0x0000555555b90481 in timerlist_run_timers (timer_list=0x5555570e16c0) at util/qemu-timer.c:536
#6  0x0000555555b90766 in qemu_clock_run_all_timers (type=<optimized out>) at util/qemu-timer.c:547
#7  0x0000555555b90766 in qemu_clock_run_all_timers () at util/qemu-timer.c:674
#8  0x0000555555b90c99 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:528
#9  0x000055555583bbb7 in main () at vl.c:1963
#10 0x000055555583bbb7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4768

Comment 28 Laszlo Ersek 2018-06-29 10:24:28 UTC

Hello FuXiangChun,

thank you very much for your work; I think this result is significant. I'm moving the BZ to qemu-kvm-rhev and retitling it too.

Comment 29 FuXiangChun 2018-07-02 01:42:46 UTC

can be reproduced with the latest qemu-kvm-rhev + kernel + seabios.

qemu-kvm-rhev-2.12.0-6.el7.x86_64
seabios-bin-1.11.0-2.el7.noarch
3.10.0-916.el7.x86_64

Comment 30 Gerd Hoffmann 2018-07-02 16:54:15 UTC

https://patchwork.ozlabs.org/patch/938073/

Comment 31 Gerd Hoffmann 2018-07-04 07:24:05 UTC

upstream commit 8f36ec708834dfad58af6feb0b69bb0be6077f0f

Comment 33 Miroslav Rezanina 2018-07-24 14:33:53 UTC

Fix included in qemu-kvm-rhev-2.12.0-8.el7

Comment 35 FuXiangChun 2018-08-02 10:42:32 UTC

Verified bug with qemu-kvm-rhev-2.12.0-9.el7.x86_64.

system_reset guest 210 times.  guest works well.
(qemu) system_reset
(qemu) #####210#####


so, this bug is fixed.  set bug as verified.

Comment 36 errata-xmlrpc 2018-11-01 11:10:36 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3443