Description of problem:
Hotunplug of a vcpu fails even when it is no longer the last online one.

Version-Release number of selected component (if applicable):
host:
kernel-4.18.0-266.el8.ppc64le
qemu-kvm-5.2.0-2.module+el8.4.0+9186+ec44380f.ppc64le
guest:
kernel-4.18.0-259.el8.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Boot up the guest with the command:
/usr/libexec/qemu-kvm \
-smp 1,maxcpus=2,cores=2,threads=1,sockets=1 \
-m 4096 \
-nodefaults \
-device virtio-scsi-pci,bus=pci.0 \
-device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
-drive file=rhel840-ppc64le-virtio-scsi.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,cache=none \
-device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
-chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off \
-device spapr-vty,id=serial111,chardev=serial_id_serial0 \
-mon chardev=serial_id_serial0,mode=readline \

2. Hotplug one vcpu:
(qemu) device_add host-spapr-cpu-core,core-id=1,id=core1

3. Disable vcpu0 in the guest:
chcpu -d 0

4. Hotunplug vcpu1:
(qemu) device_del core1

5. Re-enable vcpu0 in the guest:
chcpu -e 0

6. Hotunplug vcpu1 again:
(qemu) device_del core1

Actual results:
Hotunplug of vcpu1 fails.

Expected results:
vcpu1 can be hotunplugged, because it is no longer the last online vcpu.

Additional info:
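A suggested verification aid (not part of the original report, just standard Linux CPU hotplug tooling) is to check the guest's view of the online vcpus around steps 3 and 5:

# in the guest, after step 3 (chcpu -d 0): only the hotplugged vcpu should be listed
cat /sys/devices/system/cpu/online
# in the guest, after step 5 (chcpu -e 0): vcpu0 should be listed again
cat /sys/devices/system/cpu/online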
Xujun, can you confirm the “Hardware” field?
Xujun, can we get the actual error message that qemu produces when you try to unplug the core?
(In reply to Qunfang Zhang from comment #1)
> Xujun, can you confirm the “Hardware” field?

ppc only. x86 does not have this problem.
(In reply to David Gibson from comment #2)
> Xujun, can we get the actual error message that qemu produces when you try
> to unplug the core?

Step 4:
(qemu) device_del core1
(qemu) [ 169.162520] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16

Step 6:
No message at all.
Daniel, can you take a look at this one please.
I am able to reproduce the bug using Power 8 and Power 9 servers. I'll investigate.
Xujun, is this a regression?
(In reply to David Gibson from comment #7)
> Xujun, is this a regression?

I'm not sure, I need to try. I will give feedback after testing.
(In reply to David Gibson from comment #7)
> Xujun, is this a regression?

Not a regression; I hit this problem with both the slow 8.4 train and the 8.3 fast train.
(In reply to Xujun Ma from comment #9)
> (In reply to David Gibson from comment #7)
> > Xujun, is this a regression?
>
> Not a regression; I hit this problem with both the slow 8.4 train and the
> 8.3 fast train.

In fact this also happens upstream. This case was never handled.

The reason it happens is that we are attempting to hotunplug the last online vcpu of the guest, and during the process we do things such as detaching the core DRC and so on. The guest refuses because it is the last online vcpu, so the 'unplug success' callback on the QEMU side is never called.

My solution is to check whether the CPU core is the last one online in the guest before attempting the hotunplug. The patches were posted upstream for review:

https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03349.html
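As a side note, the condition those patches check can also be verified by hand from the guest before issuing the unplug. This is only an illustration of the idea using standard tooling, not the actual patch:

# in the guest: see which vcpus are currently online
cat /sys/devices/system/cpu/online
# if the core being removed holds the only online vcpu, online another one
# first (vcpu0 in the reproducer), otherwise the guest will refuse the unplug
chcpu -e 0
# then issue the unplug from the QEMU monitor
(qemu) device_del core1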
Hi Daniel,

Could you help estimate and set the right ITM for this bug?
(In reply to Xujun Ma from comment #11)
> Hi Daniel,
>
> Could you help estimate and set the right ITM for this bug?

What is ITM? If it's a Bugzilla flag, I don't appear to have access to it.

Regarding the bug, the fix is unfortunately not as trivial as I first suggested in comment #10. It turns out that the solution proposed in comment #10 is also flawed, because we have no guarantee that the guest will not offline a CPU in the middle of the unplug process, making our pre-unplug assumptions obsolete and prone to the same error.

Mailing list discussions led me to try another approach, in which I opened up the hotplug IRQ queue so that it fires on every 'device_del' attempt to remove the CPU core, regardless of whether a previous unplug request is pending. This has been disputed because it opens the possibility of an IRQ event flood in the guest kernel (although I wasn't able to make the guest misbehave by flooding it), but none of the alternatives for fixing this problem at the QEMU level are clear winners.

The discussions are still ongoing. Let's wait a bit to see where we're going with this.
Sorry Daniel, ITM is an internal scheduling thing; I'll look after it.

Honestly, we're getting a bit late to get a bugfix into AV-8.4, and given that this is not a regression, I'm not sure there's a compelling reason to push for it. So I'm going to punt this one back to 8.5.
There has been a lot of action on this bug, and not a lot of updates in this Bugzilla from my end.

Picking up from what I mentioned in comment #12, we went all the way to implementing a 'CPU hotunplug timeout' mechanism. That logic almost made it into 6.0.0. Further discussions on the mailing list, while evaluating a new QAPI event to report the timeout, led us to believe that the timeout mechanism isn't a good idea after all. Telling Libvirt that "a timeout happened, and perhaps something went wrong in the guest" doesn't help much: Libvirt would need to inspect the guest anyway to see whether the hotunplug succeeded or not. This code got reverted and we went with the logic I mentioned in comment #12, where we allow multiple CPU hotunplug requests for the same CPU.

All that said, up to that point we were operating under the assumption that the kernel does not provide a callback mechanism for hotunplug errors. This is about to change in kernel v5.13. I've proposed a way to use one of the existing RTAS calls to signal device removal errors from the kernel, starting with CPUs. I'm using a hypercall that is used in device configuration (RTAS set-indicator) to signal the platform/hypervisor that the kernel found an error while doing the device removal. This use of the hcall is a no-op in QEMU, and I checked with the partition firmware folks at IBM that it is also a no-op for phyp (PowerVM), so it's a viable way of doing it without breaking existing hypervisors.

The kernel patches were queued to powerpc-next [1]. I've also patched QEMU to handle this new kernel behavior, and David accepted it into his ppc-6.1 tree. This means we'll finally have some form of reliable hotunplug error callback mechanism in pSeries.

Going back to this bug, we can either go with the approach that will be available in QEMU 6.0.0 (allowing multiple hotunplug requests) or with the new mechanism I've implemented that requires code from kernel v5.13. The former will get the bug fixed faster via rebase and will not require kernel-side changes, so perhaps that approach is preferred here.
(In reply to Daniel Henrique Barboza from comment #14)
> The kernel patches were queued to powerpc-next [1]. I've also patched QEMU
> to handle this new kernel behavior, and David accepted it into his ppc-6.1
> tree. This means we'll finally have some form of reliable hotunplug error
> callback mechanism in pSeries.

I forgot the link:

[1] https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=239586&state=*
Xujun,

While Daniel and I have some more work to polish this in general, I think the fixes already committed to qemu-6.0 will already fix the original problem reported for this BZ. Specifically, the first device_del will still do nothing (without an explicit error), but the second one should now retry and succeed.

Can you please retest with the rebased qemu-6.0 based package?
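A retest sequence along the lines of the original reproducer should be enough; this is only a suggested sketch reusing the steps from the description, with the second device_del being the point to verify:

(qemu) device_add host-spapr-cpu-core,core-id=1,id=core1
# in the guest: take vcpu0 offline so core1 holds the last online vcpu
chcpu -d 0
# first unplug attempt: still expected to be silently refused by the guest
(qemu) device_del core1
# in the guest: bring vcpu0 back online
chcpu -e 0
# second unplug attempt: expected to retry and succeed with qemu-6.0
(qemu) device_del core1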
(In reply to David Gibson from comment #16)
> Xujun,
>
> While Daniel and I have some more work to polish this in general, I think
> the fixes already committed to qemu-6.0 will already fix the original
> problem reported for this BZ. Specifically, the first device_del will still
> do nothing (without an explicit error), but the second one should now retry
> and succeed.
>
> Can you please retest with the rebased qemu-6.0 based package?

The problem no longer occurs with qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.ppc64le; the bug has been fixed.
Thanks, Xujun, for the confirmation.

David, can we close this bug as CURRENTRELEASE? Thanks.
Yes, closing as CURRENTRELEASE. Thanks for verifying this.