Bug 1708459
Summary: qemu-kvm core dumped when repeat "system_reset" multiple times during guest boot

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 8 | Reporter | Xujun Ma <xuma> |
| Component | qemu-kvm | Assignee | Philippe Mathieu-Daudé <philmd> |
| qemu-kvm sub component | General | QA Contact | Yiqian Wei <yiwei> |
| Status | CLOSED ERRATA | Docs Contact | |
| Severity | high | | |
| Priority | medium | CC | aadam, chayang, coli, ddepaula, jasowang, jinzhao, juzhang, knoel, maxime.coquelin, mdeng, micai, ngu, philmd, qzhang, rbalakri, rkhan, virt-maint, wchadwic, xianwang, xuma, yfu, yihyu, yiwei, ymankad, yuhuang, zhenyzha |
| Version | --- | Keywords | Regression |
| Target Milestone | rc | | |
| Target Release | --- | | |
| Hardware | Unspecified | | |
| OS | Linux | | |
| Whiteboard | | | |
| Fixed In Version | qemu-kvm-2.12.0-89.module+el8.2.0+4436+f3a2188d | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | 1692658 | | |
| | 1717321 (view as bug list) | Environment | |
| Last Closed | 2020-04-28 15:32:15 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | 1692658 | | |
| Bug Blocks | 1717321 | | |
Comment 1
Qunfang Zhang
2019-05-10 02:36:29 UTC
Test env:
- qemu-kvm-4.0.0-0.module+el8.1.0+3169+3c501422.ppc64le
- 4.18.0-83.el8.ppc64le
- virtio_blk only

How reproducible: 5/10

I met a similar issue when testing the same scenario with the same version on x86, but instead of a qemu core dump there is a KVM internal error. Hi David, could you please help confirm whether they have the same root cause?

Error info:

```
00:44:23 WARNI| Error occur when update VM address cache: KVM internal error. Suberror: 1
emulation failure
EAX=00000654 EBX=00000000 ECX=00000000 EDX=000300ae
ESI=0000fffe EDI=00000001 EBP=0000069a ESP=0000fff2
EIP=00000058 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =b800 000b8000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
```

Host info (dell-per430-01.lab.eng.bos.redhat.com):

```
Vendor      GenuineIntel
Model Name  Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz
Family      6
Model       79
Stepping    1
Speed       3199.93
Processors  8
Cores       4
Sockets     1
Hyper       True
Flags       lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local cpufreq
Arch(s)     x86_64
```

---

So, the original bug was not POWER specific. I can't tell from the limited information whether this new symptom has the same root cause. It might be the same bug, a different all-platforms bug, or a different x86-specific bug. Whatever it is, it's not a POWER-specific bug, so reassigning.

---

Hit the same issue as comment 3 on x86 when running a memory test loop with a win2012r2 guest, but it did not reproduce when running the single failed case 100 times.

qemu version: qemu-kvm-3.1.0-27.module+el8.0.1+3253+c5371cb3

Failed auto cases:
- hotplug_memory.before.vm_system_reset.hotplug.backend_ram.policy_default.two
- hotplug_memory.before.stress.hotplug.backend_file.policy_default.two

---

Hi Cong and Yumei,

I do not know which guest Cong is using for comment 3. But for the win2012r2 guest, I think the KVM internal error is another bug, bz 1493501.

Best regards
Yanan Fu

---

(In reply to Yanan Fu from comment #7)
> Hi Cong and Yumei,
>
> I do not know which guest Cong is using for comment 3.
> But for the win2012r2 guest, I think the KVM internal error is another bug, bz 1493501.

Yeah, it seems to be that one. I met it on rhel8; maybe we should clone it to rhel8?

> Best regards
> Yanan Fu

---

Hi Jason,

Could you please help confirm whether the core dump in comment 0 and the KVM internal error in comment 3 are the same issue? If not, could you please also help confirm whether it is the same as bz1493501 mentioned in comment 7? Thanks in advance.

---

Tested this case 100 times with upstream qemu-v3.0.0 and all passed. Confirmed this issue does not exist on upstream qemu-v3.0.0.

---

Upstream qemu-v3.1.0     pass
Upstream qemu-v3.1.1     pass
Upstream qemu-v4.0.0     fail
Upstream qemu-v4.0.0-rc0 fail

So the problem is introduced by a patch between qemu-v3.1.1 and qemu-v4.0.0-rc0.

Upstream qemu bug: https://bugs.launchpad.net/qemu/+bug/1839428

---

(In reply to Xujun Ma from comment #16)
> Upstream qemu-v3.1.0 pass
> Upstream qemu-v3.1.1 pass
> Upstream qemu-v4.0.0 fail
> Upstream qemu-v4.0.0-rc0 fail

Does it fail on 4.1-rc3? If yes, please do a git bisection[1] between 3.1.1 and 4.0.0 to find the commit that introduces the issue. If not, please do a reverse bisection (swap good and bad) between 4.0.0 and 4.1-rc3 to find the commit that fixes the problem.

> So the problem occurs due to patch between qemu-v3.1.1 to qemu-v4.0.0-rc0.

[1] https://git-scm.com/docs/git-bisect

Thanks
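For reference, the bisection being asked for here could be driven roughly as follows. This is only a sketch run from inside a QEMU source checkout; `run-reset-test.sh` is a hypothetical wrapper script (not attached to this bug) that rebuilds QEMU, runs the repeated-system_reset reproducer, and exits non-zero on a crash.

```sh
# Sketch of the requested bisection between the known-good and known-bad tags.
git bisect start
git bisect bad v4.0.0               # first release known to fail
git bisect good v3.1.1              # last release known to pass
git bisect run ./run-reset-test.sh  # hypothetical test wrapper; non-zero exit = bad
git bisect reset                    # return to the original checkout when done

# For the "reverse" bisection mentioned above (finding the commit that fixes
# the problem between v4.0.0 and v4.1.0-rc3), swap the good/bad labels.
```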
This issue still exists on the latest qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le:

```
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-4.1.0/hw/virtio/virtio.c:225: vring_get_region_caches: Assertion `caches != NULL' failed.
smp_4.m_4096.virtio_scsi.virtio_net.monitor.serial_stdio.sh: line 12: 33587 Aborted (core dumped) /usr/libexec/qemu-kvm -smp 4 -m 4096 -nodefaults -chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off -device spapr-vty,id=serial111,chardev=serial_id_serial0 -mon chardev=serial_id_serial0,mode=readline -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,addr=0x7 -drive file=rhel810-ppc64le-virtio.qcow2,if=none,id=drive_image1,format=qcow2,cache=none -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on
```

---

Reproduced this issue on the x86 platform with the latest qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64:

```
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-4.1.0/hw/virtio/virtio.c:225: vring_get_region_caches: Assertion `caches != NULL' failed.
test.sh: line 9: 16913 Aborted (core dumped) /usr/libexec/qemu-kvm -m 4096 -smp 8 -boot menu=on -device virtio-blk-pci,id=image1,drive=drivec
```

Host info:

```
[root@hp-dl385g10-05 ~]#
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7251 8-Core Processor
Stepping:            2
CPU MHz:             2513.410
CPU max MHz:         2100.0000
CPU min MHz:         1200.0000
BogoMIPS:            4191.50
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            4096K
NUMA node0 CPU(s):   0,1,16,17
NUMA node1 CPU(s):   2,3,18,19
NUMA node2 CPU(s):   4,5,20,21
NUMA node3 CPU(s):   6,7,22,23
NUMA node4 CPU(s):   8,9,24,25
NUMA node5 CPU(s):   10,11,26,27
NUMA node6 CPU(s):   12,13,28,29
NUMA node7 CPU(s):   14,15,30,31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gba

[root@hp-dl385g10-05 ~]#
qemu-kvm-common-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-core-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-rbd-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-iscsi-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-gluster-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-curl-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-ssh-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
```

---

QA_ACK, please?

---

Reproduced the bug with:

Host version:
- qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64
- kernel-4.18.0-147.el8.x86_64

Guest: win2019

Test steps:
1. Boot a win2019 guest.
2. Repeat "system_reset" multiple times via QMP: {'execute': 'system_reset'} (a scripted sketch of this loop follows after this comment).

Test results:

```
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-2.12.0/hw/virtio/virtio.c:211: vring_get_region_caches: Assertion `caches != NULL' failed.
bz.sh: line 21: 5933 Aborted (core dumped) /usr/libexec/qemu-kvm -M pc -S -cpu EPYC-IBPB,enforce -nodefaults -rtc base=utc -m 4G -smp 4,sockets=2,cores=1,threads=2 -enable-kvm -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 -k en-us -qmp tcp:0:6667,server,nowait -vga qxl -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/win2019-64-virtio.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0 -device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev tap,id=tap10,vhost=on -monitor stdio -vnc :1 -monitor unix:/tmp/monitor2,server,nowait
```
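For reference, a minimal sketch of how the "repeat system_reset multiple times" step can be scripted. It assumes QEMU was started with `-qmp tcp:0:6667,server,nowait` as in the command line above; the iteration count and sleep interval are arbitrary choices.

```sh
#!/bin/bash
# Repeatedly send system_reset over QMP while the guest is still booting.
# Assumption: QEMU is listening on a QMP TCP socket (-qmp tcp:0:6667,server,nowait).
QMP_HOST=127.0.0.1
QMP_PORT=6667

for i in $(seq 1 20); do
    # Open a fresh QMP connection; capabilities must be negotiated
    # before any other command is accepted.
    exec 3<>/dev/tcp/"$QMP_HOST"/"$QMP_PORT"
    echo '{"execute": "qmp_capabilities"}' >&3
    echo '{"execute": "system_reset"}' >&3
    sleep 1          # give the reset time to be processed
    exec 3<&- 3>&-   # close the connection
done
```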
Verified the bug with qemu-kvm-2.12.0-89.module+el8.2.0+4436+f3a2188d.x86_64 using the same test steps.

Test results: qemu no longer core dumps, and the guest works well after repeating "system_reset" multiple times.

---

QEMU has recently been split into sub-components and, as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1587