Bug 1895948
| Summary: | Failed to boot up guest when hotplugging vcpus on bios stage | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Xujun Ma <xuma> |
| Component: | qemu-kvm | Assignee: | Daniel Henrique Barboza (IBM) <dbarboza> |
| qemu-kvm sub component: | General | QA Contact: | Xujun Ma <xuma> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | high | ||
| Priority: | high | CC: | bugproxy, dbarboza, dgibson, hannsj_uhl, jinzhao, juzhang, lvivier, qzhang, virt-maint |
| Version: | 8.4 | Keywords: | Patch, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.4 | ||
| Hardware: | ppc64le | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1849483 | Environment: | |
| Last Closed: | 2021-01-03 23:53:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1849483 | ||
| Bug Blocks: | 1796871, 1854692 | ||
|
Comment 2
Xujun Ma
2020-11-09 14:26:13 UTC
Reset bug priority to high according to the test result and bug criteria for evaluation.

Hi David,
Could you help take a look at this bug? It was cloned from RHEL-AV 8.3. I think it needs to be fixed in slow train because it makes the BIOS crash and affects the guest's boot process. I don't think that's acceptable for customers.

Xujun, that's a reasonable point. I've asked Daniel to take a look at this, though if the backport is difficult we might have to reconsider.

Just did the backport. Xujun, can you please test before I send it downstream?
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=33622919

(In reply to Daniel Henrique Barboza from comment #8)
> Just did the backport. Xujun, can you please test before I send it
> downstream?
>
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=33622919

Hi Daniel,
Sorry, I missed it. Could you provide a new one?

(In reply to Xujun Ma from comment #9)
> Hi Daniel,
> Sorry, I missed it. Could you provide a new one?

No worries. Here's another one:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=33733785

Hi Daniel,
I still see this problem with the build you provided.

Just took it for a test now. Indeed, I can reproduce the bug almost 50% of the time with the backport. I can't reproduce the bug in RHEL-AV 8.3 and upstream QEMU.

This makes me believe that the backport from the 1849483 fix is not enough to fix the issue in RHEL 8.3 codebase. We would need more investigation to understand what we're missing from RHEL 8.3 AV code that fixes the bug there.

David, it turns out that this isn't a straight backport and fix, and more work is needed to fix the bug in slow train. I'm not sure if this bug is worth the extra effort but I don't have a strong opinion against doing it either.
Whatever you decide to do (go for it and fix for slow train or leave it alone) is fine by me.

(In reply to Daniel Henrique Barboza from comment #12)
> Just took it for a test now. Indeed, I can reproduce the bug almost 50% of
> the time with the backport. I can't reproduce the bug in RHEL-AV 8.3 and
> upstream QEMU.
>
> This makes me believe that the backport from the 1849483 fix is not enough to
> fix the issue in RHEL 8.3 codebase. We would need more investigation to
> understand what we're missing from RHEL 8.3 AV code that fixes the bug there.

Did you try patched RHEL-8.3.0 qemu with RHEL-AV-8.3.0 SLOF to see if the bug is fixed by a change in SLOF?

(In reply to Laurent Vivier from comment #13)
> Did you try patched RHEL-8.3.0 qemu with RHEL-AV-8.3.0 SLOF to see if the
> bug is fixed by a change in SLOF?

Yes, I forgot to mention that I was running SLOF from RHEL 8.3-AV. To be sure, I just tried with an even more recent SLOF version (SLOF-20200717-1.gite18ddad8.scrmod+el8.4.0+8960+f63fed48). I can still reproduce the "Exception #700" error quite often when doing a vcpu hotplug right at the guest start.

Taking a quick look at the differences between the two code bases, in the hw/ppc/ files, nothing caught my immediate attention. There is a lot of CAS-related work on hotplug/unplug from Greg, but CAS isn't relevant at this stage of the boot.

I concur with Daniel's reasoning in comment 12.
Given that this is not a straightforward backport, I don't think it's worth fixing in slow train. The expected use case for slow train qemu is manual virtualization, so users can manually avoid this problem by not hotplugging early. The problem was more important in AV where the hotplugs might occur from automated management actions, which makes guest reboots harder to avoid in practice.