Bug 1895948

Summary: Failed to boot up guest when hotplugging vcpus on bios stage
Product: Red Hat Enterprise Linux 8
Reporter: Xujun Ma <xuma>
Component: qemu-kvm
Assignee: Daniel Henrique Barboza (IBM) <dbarboza>
qemu-kvm sub component: General
QA Contact: Xujun Ma <xuma>
Status: CLOSED WONTFIX
Docs Contact:
Severity: high
Priority: high
CC: bugproxy, dbarboza, dgibson, hannsj_uhl, jinzhao, juzhang, lvivier, qzhang, virt-maint
Version: 8.4
Keywords: Patch, Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.4
Hardware: ppc64le
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1849483
Environment:
Last Closed: 2021-01-03 23:53:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1849483
Bug Blocks: 1796871, 1854692

Comment 2 Xujun Ma 2020-11-09 14:26:13 UTC
Test environment:
host:
kernel-4.18.0-240.10.el8.ppc64le
qemu-kvm-4.2.0-35.module+el8.4.0+8453+f5da6c50.ppc64le

Comment 5 Xujun Ma 2020-11-11 07:33:07 UTC
Reset bug priority to high according to the test result and bug criteria for evaluation.

Comment 6 Xujun Ma 2020-11-25 09:44:41 UTC
Hi David

Could you help have a look at this bug? It was cloned from RHEL-AV 8.3.
I think it needs to be fixed in slow train because it makes the BIOS crash and affects the guest's boot process.
I don't think that's acceptable for customers.

Comment 7 David Gibson 2020-12-04 03:06:18 UTC
Xujun, that's a reasonable point.  I've asked Daniel to take a look at this, though if the backport is difficult we might have to reconsider.

Comment 8 Daniel Henrique Barboza (IBM) 2020-12-10 09:17:29 UTC
Just did the backport. Xujun, can you please test before I send it downstream?


https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=33622919

Comment 9 Xujun Ma 2020-12-14 10:23:01 UTC
(In reply to Daniel Henrique Barboza from comment #8)
> Just did the backport. Xujun, can you please test before I send it
> downstream?
> 
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=33622919

Hi Daniel
Sorry, I missed it. Could you provide a new one?

Comment 10 Daniel Henrique Barboza (IBM) 2020-12-14 19:01:55 UTC
(In reply to Xujun Ma from comment #9)
> (In reply to Daniel Henrique Barboza from comment #8)
> > Just did the backport. Xujun, can you please test before I send it
> > downstream?
> > 
> > 
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=33622919
> 
> Hi Daniel
> Sorry, I missed it. Could you provide a new one?

No worries. Here's another one:

http://brewweb.devel.redhat.com/brew/taskinfo?taskID=33733785

Comment 11 Xujun Ma 2020-12-15 06:02:07 UTC
Hi Daniel

I still hit this problem with the build you provided.

Comment 12 Daniel Henrique Barboza (IBM) 2020-12-15 09:27:32 UTC
Just took it for a test now. Indeed, I can reproduce the bug almost 50% of the
time with the backport. I can't reproduce the bug in RHEL-AV 8.3 or in
upstream QEMU.

This makes me believe that the backport of the 1849483 fix is not enough to
fix the issue in the RHEL 8.3 codebase. We would need more investigation to understand
what we're missing from the RHEL-AV 8.3 code that fixes the bug there.


David, it turns out that this isn't a straight backport and fix, and more work
is needed to fix the bug in slow train. I'm not sure if this bug is worth
the extra effort but I don't have a strong opinion against doing it either.
Whatever you decide to do (go for it and fix for slow train or leave it alone)
is fine by me.

Comment 13 Laurent Vivier 2020-12-15 09:40:44 UTC
(In reply to Daniel Henrique Barboza from comment #12)
> Just took it for a test now. Indeed, I can reproduce the bug almost 50% of
> the time with the backport. I can't reproduce the bug in RHEL-AV 8.3 or in
> upstream QEMU.
> 
> This makes me believe that the backport of the 1849483 fix is not enough to
> fix the issue in the RHEL 8.3 codebase. We would need more investigation to
> understand what we're missing from the RHEL-AV 8.3 code that fixes the bug
> there.


Did you try patched RHEL-8.3.0 qemu with RHEL-AV-8.3.0 SLOF to see if the bug is fixed by a change in SLOF?

Comment 14 Daniel Henrique Barboza (IBM) 2020-12-15 11:02:43 UTC
(In reply to Laurent Vivier from comment #13)
> (In reply to Daniel Henrique Barboza from comment #12)
> > Just took it for a test now. Indeed, I can reproduce the bug almost 50% of
> > the time with the backport. I can't reproduce the bug in RHEL-AV 8.3 or in
> > upstream QEMU.
> > 
> > This makes me believe that the backport of the 1849483 fix is not enough to
> > fix the issue in the RHEL 8.3 codebase. We would need more investigation to
> > understand what we're missing from the RHEL-AV 8.3 code that fixes the bug
> > there.
> 
> 
> Did you try patched RHEL-8.3.0 qemu with RHEL-AV-8.3.0 SLOF to see if the
> bug is fixed by a change in SLOF?

Yes, I forgot to mention that I was running SLOF from RHEL 8.3-AV.

To be sure, I just tried with an even more recent SLOF version
(SLOF-20200717-1.gite18ddad8.scrmod+el8.4.0+8960+f63fed48). I can still reproduce
the "Exception #700" error quite often when doing a vcpu hotplug right at the
guest start.

Taking a quick look at the differences between the two code bases in the hw/ppc/ files,
nothing caught my immediate attention. There is a lot of CAS-related
work on hotplug/unplug from Greg, but CAS isn't relevant at this stage of the boot.
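
For readers unfamiliar with the operation under test, a vcpu hotplug on a pseries guest is typically issued over QMP. The following is a minimal sketch, not the reproducer used in this bug: the helper names, the device id, and the core id are hypothetical. The `host-spapr-cpu-core` driver assumes a KVM guest started with `-cpu host`, and valid `core-id` values depend on the `-smp` topology (they can be listed with `query-hotpluggable-cpus`).

```python
import json


def qmp_command(execute, arguments=None):
    """Build one QMP command as a newline-terminated JSON string."""
    msg = {"execute": execute}
    if arguments is not None:
        msg["arguments"] = arguments
    return json.dumps(msg) + "\n"


def vcpu_hotplug_commands(core_id, driver="host-spapr-cpu-core"):
    """Return the QMP messages for hotplugging one CPU core.

    core_id and the generated device id are illustrative assumptions;
    the driver name must match the CPU model the guest was started with.
    """
    return [
        # QMP requires capability negotiation before any other command.
        qmp_command("qmp_capabilities"),
        # Optional: ask QEMU which core slots are available for hotplug.
        qmp_command("query-hotpluggable-cpus"),
        # The hotplug itself: add one spapr CPU core device.
        qmp_command(
            "device_add",
            {"driver": driver, "id": f"core{core_id}", "core-id": core_id},
        ),
    ]


if __name__ == "__main__":
    for line in vcpu_hotplug_commands(1):
        print(line, end="")
```

Each string would be written to the guest's QMP socket in order, after reading the server greeting. Sending the `device_add` while the guest is still in SLOF is the situation discussed in this bug.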

Comment 15 David Gibson 2021-01-03 23:53:38 UTC
I concur with Daniel's reasoning in comment 12.  Given that this is not a straightforward backport, I don't think it's worth fixing in slow train.

The expected use case for slow train qemu is manual virtualization, so users can manually avoid this problem by not hotplugging early.  The problem was more important in AV where the hotplugs might occur from automated management actions, which makes guest reboots harder to avoid in practice.
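
The "don't hotplug early" workaround can also be applied in a script by holding back the hotplug until a grace period after guest start has elapsed. This is only a sketch under stated assumptions: the function name and the grace period are hypothetical, and nothing in this bug specifies how long the firmware stage lasts, so the value has to be chosen per setup.

```python
import time


def wait_until_safe(boot_started_at, grace_seconds,
                    now=time.monotonic, sleep=time.sleep):
    """Block until grace_seconds have passed since boot_started_at.

    boot_started_at must come from the same clock as now().
    grace_seconds is an assumed, user-chosen margin for the guest to
    leave the firmware stage; it is not a value taken from this bug.
    Returns the number of seconds actually waited.
    """
    remaining = boot_started_at + grace_seconds - now()
    if remaining > 0:
        sleep(remaining)
        return remaining
    return 0.0


if __name__ == "__main__":
    started = time.monotonic()   # record this right after launching the guest
    waited = wait_until_safe(started, grace_seconds=2.0)
    print(f"waited {waited:.1f}s before hotplugging")
```

A fixed delay is the crudest possible gate. A management layer would more likely wait for a positive signal from the guest, which is outside the scope of this sketch.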