Bug 1708459
Summary: qemu-kvm core dumped when repeat "system_reset" multiple times during guest boot

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat Enterprise Linux 8 | Reporter | Xujun Ma <xuma> |
| Component | qemu-kvm | Assignee | Philippe Mathieu-Daudé <philmd> |
| qemu-kvm sub component | General | QA Contact | Yiqian Wei <yiwei> |
| Status | CLOSED ERRATA | Docs Contact | |
| Severity | high | | |
| Priority | medium | CC | aadam, chayang, coli, ddepaula, jasowang, jinzhao, juzhang, knoel, maxime.coquelin, mdeng, micai, ngu, philmd, qzhang, rbalakri, rkhan, virt-maint, wchadwic, xianwang, xuma, yfu, yihyu, yiwei, ymankad, yuhuang, zhenyzha |
| Version | --- | Keywords | Regression |
| Target Milestone | rc | | |
| Target Release | --- | | |
| Hardware | Unspecified | | |
| OS | Linux | | |
| Whiteboard | | | |
| Fixed In Version | qemu-kvm-2.12.0-89.module+el8.2.0+4436+f3a2188d | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | 1692658 | | |
| | 1717321 (view as bug list) | Environment | |
| Last Closed | 2020-04-28 15:32:15 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | 1692658 | | |
| Bug Blocks | 1717321 | | |
Comment 1
Qunfang Zhang
2019-05-10 02:36:29 UTC
Test env:
- qemu-kvm-4.0.0-0.module+el8.1.0+3169+3c501422.ppc64le
- 4.18.0-83.el8.ppc64le
- virtio_blk only

How reproducible: 5/10

I met a similar issue when testing the same scenario with the same version on x86, but instead of a qemu core dump there is a KVM internal error. Hi David, could you please help confirm whether they have the same root cause?

Error info:

```
00:44:23 WARNI| Error occur when update VM address cache: KVM internal error. Suberror: 1
emulation failure
EAX=00000654 EBX=00000000 ECX=00000000 EDX=000300ae
ESI=0000fffe EDI=00000001 EBP=0000069a ESP=0000fff2
EIP=00000058 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =b800 000b8000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
```

Host info (dell-per430-01.lab.eng.bos.redhat.com):

```
Vendor      GenuineIntel
Model Name  Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz
Family      6
Model       79
Stepping    1
Speed       3199.93
Processors  8
Cores       4
Sockets     1
Hyper       True
Flags       lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local cpufreq
Arch(s)     x86_64
```

---

So, the original bug was not POWER specific. I can't tell from the limited information whether this new symptom has the same root cause. It might be the same bug, a different all-platforms bug, or a different x86-specific bug. Whatever it is, it's not a POWER-specific bug, so reassigning.

---

Hit the same issue as comment 3 on x86 when running a memory test loop with a win2012r2 guest, but it did not reproduce when running the single failed case 100 times.

qemu version: qemu-kvm-3.1.0-27.module+el8.0.1+3253+c5371cb3

Failed auto cases:
- hotplug_memory.before.vm_system_reset.hotplug.backend_ram.policy_default.two
- hotplug_memory.before.stress.hotplug.backend_file.policy_default.two

---

Hi Cong and Yumei,

I do not know which guest Cong is using for comment 3. But for the win2012r2 guest, I think the KVM internal error is another bug, bz 1493501.

Best regards
Yanan Fu

---

(In reply to Yanan Fu from comment #7)
> Hi Cong and Yumei,
>
> I do not know which guest Cong is using for comment 3.
> But for the win2012r2 guest, I think the KVM internal error is another bug, bz 1493501.

Yeah, it seems to be that one. I met it on rhel8; maybe we should clone it to rhel8?

> Best regards
> Yanan Fu

---

Hi Jason,

Could you please help confirm whether the core dump in comment 0 and the KVM internal error in comment 3 are the same issue? If not, could you please also help confirm whether it is the same as bz1493501 mentioned in comment 7? Thanks in advance.

---

Tested this case 100 times with upstream qemu-v3.0.0 and all passed. Confirmed this issue does not exist on upstream qemu-v3.0.0.

---

Upstream qemu-v3.1.0     pass
Upstream qemu-v3.1.1     pass
Upstream qemu-v4.0.0     fail
Upstream qemu-v4.0.0-rc0 fail

So the problem is introduced by a patch between qemu-v3.1.1 and qemu-v4.0.0-rc0.

Upstream qemu bug: https://bugs.launchpad.net/qemu/+bug/1839428

---

(In reply to Xujun Ma from comment #16)
> Upstream qemu-v3.1.0 pass
> Upstream qemu-v3.1.1 pass
> Upstream qemu-v4.0.0 fail
> Upstream qemu-v4.0.0-rc0 fail

Does it fail on 4.1-rc3? If yes, please do a git bisection[1] between 3.1.1 and 4.0.0 to find the commit that introduces the issue. If not, please do a reverse bisection (swap good and bad) between 4.0.0 and 4.1-rc3 to find the commit that fixes the problem.

> So the problem occurs due to patch between qemu-v3.1.1 to qemu-v4.0.0-rc0.

[1] https://git-scm.com/docs/git-bisect

Thanks
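For reference, the bisection being asked for here could be driven roughly as follows. This is only a sketch run from inside a QEMU source checkout; `run-reset-test.sh` is a hypothetical wrapper script (not attached to this bug) that rebuilds QEMU, runs the repeated-system_reset reproducer, and exits non-zero on a crash.

```sh
# Sketch of the requested bisection between the known-good and known-bad tags.
git bisect start
git bisect bad v4.0.0               # first release known to fail
git bisect good v3.1.1              # last release known to pass
git bisect run ./run-reset-test.sh  # hypothetical test wrapper; non-zero exit = bad
git bisect reset                    # return to the original checkout when done

# For the "reverse" bisection mentioned above (finding the commit that fixes
# the problem between v4.0.0 and v4.1.0-rc3), swap the good/bad labels.
```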
This issue still exists on the latest qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.ppc64le:

```
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-4.1.0/hw/virtio/virtio.c:225: vring_get_region_caches: Assertion `caches != NULL' failed.
smp_4.m_4096.virtio_scsi.virtio_net.monitor.serial_stdio.sh: line 12: 33587 Aborted (core dumped) /usr/libexec/qemu-kvm -smp 4 -m 4096 -nodefaults -chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off -device spapr-vty,id=serial111,chardev=serial_id_serial0 -mon chardev=serial_id_serial0,mode=readline -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=1,bus=pci.0,addr=0x7 -drive file=rhel810-ppc64le-virtio.qcow2,if=none,id=drive_image1,format=qcow2,cache=none -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on
```

---

Reproduced this issue on the x86 platform with the latest qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64:

```
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-4.1.0/hw/virtio/virtio.c:225: vring_get_region_caches: Assertion `caches != NULL' failed.
test.sh: line 9: 16913 Aborted (core dumped) /usr/libexec/qemu-kvm -m 4096 -smp 8 -boot menu=on -device virtio-blk-pci,id=image1,drive=drivec
```

Host info:

```
[root@hp-dl385g10-05 ~]#
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7251 8-Core Processor
Stepping:            2
CPU MHz:             2513.410
CPU max MHz:         2100.0000
CPU min MHz:         1200.0000
BogoMIPS:            4191.50
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            4096K
NUMA node0 CPU(s):   0,1,16,17
NUMA node1 CPU(s):   2,3,18,19
NUMA node2 CPU(s):   4,5,20,21
NUMA node3 CPU(s):   6,7,22,23
NUMA node4 CPU(s):   8,9,24,25
NUMA node5 CPU(s):   10,11,26,27
NUMA node6 CPU(s):   12,13,28,29
NUMA node7 CPU(s):   14,15,30,31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gba

[root@hp-dl385g10-05 ~]#
qemu-kvm-common-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-core-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-rbd-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-iscsi-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-gluster-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-curl-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
qemu-kvm-block-ssh-4.1.0-4.module+el8.1.0+4020+16089f93.x86_64
```

---

QA_ACK, please?

---

Reproduced the bug with:

Host version:
- qemu-kvm-2.12.0-88.module+el8.1.0+4233+bc44be3f.x86_64
- kernel-4.18.0-147.el8.x86_64

Guest: win2019

Test steps:
1. Boot a win2019 guest.
2. Repeat "system_reset" multiple times via QMP: {'execute': 'system_reset'} (a scripted sketch of this loop follows after this comment).

Test results:

```
(qemu) qemu-kvm: /builddir/build/BUILD/qemu-2.12.0/hw/virtio/virtio.c:211: vring_get_region_caches: Assertion `caches != NULL' failed.
bz.sh: line 21: 5933 Aborted (core dumped) /usr/libexec/qemu-kvm -M pc -S -cpu EPYC-IBPB,enforce -nodefaults -rtc base=utc -m 4G -smp 4,sockets=2,cores=1,threads=2 -enable-kvm -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 -k en-us -qmp tcp:0:6667,server,nowait -vga qxl -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/win2019-64-virtio.qcow2 -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0 -device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e -netdev tap,id=tap10,vhost=on -monitor stdio -vnc :1 -monitor unix:/tmp/monitor2,server,nowait
```
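For reference, a minimal sketch of how the "repeat system_reset multiple times" step can be scripted. It assumes QEMU was started with `-qmp tcp:0:6667,server,nowait` as in the command line above; the iteration count and sleep interval are arbitrary choices.

```sh
#!/bin/bash
# Repeatedly send system_reset over QMP while the guest is still booting.
# Assumption: QEMU is listening on a QMP TCP socket (-qmp tcp:0:6667,server,nowait).
QMP_HOST=127.0.0.1
QMP_PORT=6667

for i in $(seq 1 20); do
    # Open a fresh QMP connection; capabilities must be negotiated
    # before any other command is accepted.
    exec 3<>/dev/tcp/"$QMP_HOST"/"$QMP_PORT"
    echo '{"execute": "qmp_capabilities"}' >&3
    echo '{"execute": "system_reset"}' >&3
    sleep 1          # give the reset time to be processed
    exec 3<&- 3>&-   # close the connection
done
```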
Verified the bug with qemu-kvm-2.12.0-89.module+el8.2.0+4436+f3a2188d.x86_64 using the same test steps.

Test results: qemu no longer core dumps, and the guest works well after repeating "system_reset" multiple times.

---

QEMU has recently been split into sub-components and, as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1587