Bug 1849483
| Summary: | Failed to boot up guest when hotplugging vcpus on bios stage | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Xujun Ma <xuma> | |
| Component: | qemu-kvm | Assignee: | Laurent Vivier <lvivier> | |
| qemu-kvm sub component: | General | QA Contact: | Xujun Ma <xuma> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | high | CC: | bugproxy, dgibson, hannsj_uhl, jinzhao, juzhang, lvivier, qzhang, virt-maint | |
| Version: | 8.3 | Keywords: | Patch, Triaged | |
| Target Milestone: | rc | |||
| Target Release: | 8.3 | |||
| Hardware: | ppc64le | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | qemu-kvm-5.1.0-7.module+el8.3.0+8099+dba2fe3e | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1895948 | Environment: | ||
| Last Closed: | 2020-11-17 17:49:16 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1776265, 1854692, 1895948 | |||
Hi Xujun,

Could you check if this is a regression?

(In reply to Laurent Vivier from comment #1)
> Hi Xujun,
>
> Could you check if this is a regression?

Not a regression.

Laurent, this may be a SLOF bug, so I hope you can look at this when you return.

This sounds like it might be related to the issue fixed by the patches for https://bugzilla.redhat.com/show_bug.cgi?id=1854692

In that case only SVM guests seem to trigger it, but maybe SLOF can trigger it in some cases even without SVM in play.

I wasn't able to reproduce the crash, but using the latest rhel-av-8.3.0 tree I can reproduce a permanent hang inside SLOF in about 2 out of 5 tries using the script below:
#!/bin/bash
(sleep .1 && echo "device_add host-spapr-cpu-core,core-id=2,id=core2" | nc -U monitor) &
/usr/libexec/qemu-kvm \
-m 4096 \
-smp 2,maxcpus=4,cores=1,threads=2,sockets=1 \
-nodefaults \
-chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off \
-device spapr-vty,id=serial111,chardev=serial_id_serial0 \
-mon chardev=serial_id_serial0,mode=readline \
-device virtio-scsi-pci,bus=pci.0 \
-device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
-drive file=rhel8-guest.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,cache=none,snapshot=on \
-device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
-mon chardev=monitor,mode=readline -chardev socket,path=monitor,id=monitor,server,nowait,signal=off -nographic
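The script above hard-codes core-id=2. As a rough sketch of where that value comes from, the helper below lists the core-id values left for hotplug given the -smp settings. It assumes that on pseries guests a core-id is the index of the core's first thread (so a multiple of threads), which matches the core-id=2 used with threads=2 in this report. The function names hotplug_core_ids and hotplug_cmd are made up for illustration.

```shell
#!/bin/bash
# Sketch only: enumerate core-id values that remain available for hotplug.
# Assumption: core-id is the index of the first thread of a core, so valid
# values are multiples of the threads-per-core count.

# hotplug_core_ids BOOT_CPUS MAX_CPUS THREADS
# Prints one candidate core-id per line.
hotplug_core_ids() {
    local boot=$1 max=$2 threads=$3 id
    for (( id = boot; id < max; id += threads )); do
        echo "$id"
    done
}

# hotplug_cmd CORE_ID
# Prints the monitor command in the same form as the script above.
hotplug_cmd() {
    printf 'device_add host-spapr-cpu-core,core-id=%d,id=core%d\n' "$1" "$1"
}

# Values taken from "-smp 2,maxcpus=4,cores=1,threads=2,sockets=1".
for id in $(hotplug_core_ids 2 4 2); do
    hotplug_cmd "$id"
done
```

With the values from this report the loop prints the single device_add line the reproducer sends to the monitor socket.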
With the following patches for bz1854692 applied I am no longer able to reproduce the issue:
https://lists.nongnu.org/archive/html/qemu-arm/2020-08/msg00705.html
I've made a brew build with these patches applied, please test:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=31009866
Xujun, can you test this bug with the build provided by Michael in comment 5? Thanks.

(In reply to Qunfang Zhang from comment #6)
> Xujun, can you test this bug with the build provided by Michael in comment
> 5? Thanks.

I tested this build and didn't hit this bug again.

Patches are in David's PR from today:
https://patchew.org/QEMU/20200904034719.673626-1-david@gibson.dropbear.id.au/

bb5d765a8d33 ("target/arm: Move start-powered-off property to generic CPUState")
a79d25aab2c8 ("target/arm: Move setting of CPU halted state to generic code")
695d615e4ac9 ("ppc/spapr: Use start-powered-off CPUState property")

Merged upstream:

554c2169e925 ppc/spapr: Use start-powered-off CPUState property
6ad1da667c8e target/arm: Move setting of CPU halted state to generic code
c1b701587e59 target/arm: Move start-powered-off property to generic CPUState

I have tested it with build qemu-kvm-5.1.0-8.module+el8.3.0+8141+3cd9cd43.ppc64le and didn't hit this bug again. So the bug has been fixed in this build. Setting it to verified.

Reset bug priority to high according to the test result and bug criteria for evaluation.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137
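To check whether a given host already carries the fix, the installed qemu-kvm version-release can be compared with the "Fixed In Version" value from this report. The sketch below uses GNU `sort -V`, which only approximates RPM's own version comparison and ignores the epoch. The function name version_at_least is made up, and the "installed" value is hard-coded to the verified build so the example runs anywhere.

```shell
#!/bin/bash
# Sketch only: approximate "is the installed build at least the fixed build?"
# Assumption: GNU sort -V ordering is close enough to RPM ordering here.

# version_at_least INSTALLED FIXED
# Succeeds if INSTALLED sorts at or after FIXED.
version_at_least() {
    local installed=$1 fixed=$2
    [ "$(printf '%s\n%s\n' "$fixed" "$installed" | sort -V | head -n 1)" = "$fixed" ]
}

# "Fixed In Version" from this report, without the package name.
FIXED="5.1.0-7.module+el8.3.0+8099+dba2fe3e"

# On a real host this could instead come from:
#   rpm -q --qf '%{VERSION}-%{RELEASE}\n' qemu-kvm
installed="5.1.0-8.module+el8.3.0+8141+3cd9cd43"

if version_at_least "$installed" "$FIXED"; then
    echo "contains fix"
else
    echo "predates fix"
fi
```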
Description of problem:
Failed to boot up guest when hotplugging vcpus on bios stage

Version-Release number of selected component (if applicable):
qemu-kvm-5.0.0-0.scrmod+el8.3.0+7066+61d99e35.wrb200617.ppc64le
SLOF-20200327-1.git8e012d6f.scrmod+el8.3.0+7066+61d99e35.noarch

How reproducible:
100%

Steps to Reproduce:
1. Boot up the guest with the command
/usr/libexec/qemu-kvm \
-smp 2,maxcpus=4,cores=1,threads=2,sockets=1 \
-m 4096 \
-nodefaults \
-device virtio-scsi-pci,bus=pci.0 \
-device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
-drive file=rhel830-ppc64le-virtio-scsi.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,cache=none \
-device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
-chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off \
-device spapr-vty,id=serial111,chardev=serial_id_serial0 \
-mon chardev=serial_id_serial0,mode=readline
2. Hotplug vcpus while the SLOF BIOS is running
(qemu) device_add host-spapr-cpu-core,core-id=2,id=core2

Actual results:
Guest stops booting with the following error:

Trying to load: from: /pci@800000020000000/scsi@0/disk@100000000000000 ...
( 700 ) (qemu) Program Exception [ 7dc6cfff ]

    R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31
000000007dbf0308   000000007dc63650   0000000000000053   0000000000008000
000000007e67eff0   000000007dc63658   000000007e747c07   000000000000f003
000000007dc23100   000000007e47b010   000000007e72c44e   0000000000000006
000000007dc65000   000000007dc63100   000000007e72c44e   000000007dc19800
0000000000000000   0000000000000000   000000007e47b010   000000007dc1e040
000000007dc6cfff   0000000000000000   000000007e748218   0000000000000003
000000007dc1fcf0   0000000000000000   000000007e747a60   000000000000f001
fffffffffffffff8   0000000000000000   000000007dc1e210   ffffffffffffffff

    CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
        80000004   000000007dbf37b0   000000007e748218   0000000000000000
0000000020040000   000000007e748218   0000000000081000           00000000

cb >

Expected results:
Guest boots smoothly.

Additional info:
The problem does not occur with threads=1.
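The "( 700 )" in the dump above is the Program interrupt vector, and on Power the SRR1 register records why that interrupt was taken. The sketch below decodes the reason bits, assuming the flattened dump is read row by row so that SRR1 is 0000000000081000. The masks are the usual Program interrupt reason bits (the same values Linux uses for its SRR1_PROG* constants). The function name decode_program_srr1 is made up for illustration.

```shell
#!/bin/bash
# Sketch only: decode the Program interrupt reason bits held in SRR1.
# Assumption: SRR1 in the dump above is 0x0000000000081000.

# decode_program_srr1 SRR1
# Prints one line per reason bit that is set.
decode_program_srr1() {
    local srr1=$(( $1 ))
    (( srr1 & 0x00100000 )) && echo "floating-point enabled exception"
    (( srr1 & 0x00080000 )) && echo "illegal instruction"
    (( srr1 & 0x00040000 )) && echo "privileged instruction"
    (( srr1 & 0x00020000 )) && echo "trap"
    return 0
}

decode_program_srr1 0x0000000000081000
```

Under that reading the only reason bit set is the illegal-instruction one; the remaining 0x1000 bit is MSR[ME] carried over from the interrupted context. In the same dump SRR0 equals CTR (000000007e748218).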