Description of problem:
Failed to hotunplug a vcpu when it is not the last one
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot up the guest with the following command:
-smp 1,maxcpus=2,cores=2,threads=1,sockets=1 \
-m 4096 \
-device virtio-scsi-pci,bus=pci.0 \
-device scsi-hd,id=scsi-hd0,drive=scsi-hd0-dr0,bootindex=0 \
-drive file=rhel840-ppc64le-virtio-scsi.qcow2,if=none,id=scsi-hd0-dr0,format=qcow2,cache=none \
-device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c4:e7:84 \
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
-chardev stdio,mux=on,id=serial_id_serial0,server,nowait,signal=off \
-device spapr-vty,id=serial111,chardev=serial_id_serial0 \
-mon chardev=serial_id_serial0,mode=readline \
2. Hotplug one vcpu core
3. Disable vcpu0 in the guest:
chcpu -d 0
4. Enable vcpu0 in the guest:
chcpu -e 0
5. Hotunplug vcpu1 again
Actual results:
Failed to hotunplug vcpu1.
Expected results:
vcpu1 can be hotunplugged, because it is no longer the last online vcpu.
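For reference, the hotplug in step 2 can be done from the QEMU monitor; the exact device type and core-id shown below are assumptions based on the -smp layout above (maxcpus=2, cores=2), not taken from the original report:

(qemu) device_add host-spapr-cpu-core,id=core1,core-id=1

The chcpu commands then run inside the guest, and the unplug attempt is the device_del shown in the monitor log later in this bug.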
Xujun, can you confirm the "Hardware" field?
Xujun, can we get the actual error message that qemu produces when you try to unplug the core?
(In reply to Qunfang Zhang from comment #1)
> Xujun, can you confirm the "Hardware" field?
ppc only. x86 does not have this problem.
(In reply to David Gibson from comment #2)
> Xujun, can we get the actual error message that qemu produces when you try
> to unplug the core?
(qemu) device_del core1
(qemu) [ 169.162520] pseries-hotplug-cpu: Failed to offline CPU <NULL>, rc: -16
No error message from QEMU itself; only the guest kernel log above.
Daniel, can you take a look at this one please.
I am able to reproduce the bug using Power 8 and Power 9 servers. I'll investigate.
Xujun, is this a regression?
(In reply to David Gibson from comment #7)
> Xujun, is this a regression?
I'm not sure; I need to try. I will give feedback after testing.
(In reply to David Gibson from comment #7)
> Xujun, is this a regression?
Not a regression; I hit this problem with both the slow 8.4 train and the 8.3 fast train.
(In reply to Xujun Ma from comment #9)
> (In reply to David Gibson from comment #7)
> > Xujun, is this a regression?
> Not a regression,hit this problem with slow 8.4 train and 8.3 fast train.
In fact this is also happening upstream. This was never handled.
The reason this happens is that we're attempting to hotunplug the last online vcpu
of the guest, and during the process we're doing things such as detaching the
Core DRC and so on. The guest refuses to do it because it's the last online vcpu,
and the 'unplug success' callback on the QEMU side is never triggered.
My solution is to check whether the CPU core is the last one online in the
guest before attempting the hotunplug. The patches were posted upstream.
Could you help estimate and set a right ITM for this bug?
(In reply to Xujun Ma from comment #11)
> Hi Daniel
> Could you help estimate and set a right ITM for this bug?
What is ITM? If it's a Bugzilla flag I don't appear to have access to it.
Regarding the bug, the fix is not as trivial as I first suggested in comment #10,
unfortunately. Turns out that the solution proposed in comment #10 is also flawed,
because we don't have any guarantees that the guest will not offline a CPU in the
middle of the unplug process, making our assumptions pre-unplug obsolete and prone
to the same error.
Mailing list discussions led me to try another approach, where I opened up the hotplug
IRQ queue to be fired on every 'device_del' attempt to remove the CPU core, regardless
of whether a previous unplug request is pending. This has been disputed because it opens
up the possibility of an IRQ event flood in the guest kernel (although I wasn't able to
make the guest misbehave by flooding it), but none of the alternatives to fix this
problem at the QEMU level are clear winners.
The discussions are still happening. Let's wait a bit to see where we're going with this.
Sorry Daniel, ITM's an internal scheduling thing, I'll look after it.
Honestly, we're getting a bit late to get a bugfix into AV-8.4, and given that this is not a regression, I'm not sure there's a compelling reason to push for it. So I'm going to punt this one back to 8.5.
There has been a lot of action in this bug, and not a lot of updates in this
Bugzilla from my end.
From what I've mentioned in comment #12, we went all the way into implementing
a 'CPU hotunplug timeout' mechanism. This logic almost got into 6.0.0. Further
discussions in the mailing list, when evaluating a new QAPI event to report
the timeout, led us to believe that the timeout mechanism isn't a good idea
after all. Telling Libvirt that "a timeout happened, and perhaps something
wrong happened in the guest" doesn't do much. Libvirt would need to inspect the
guest anyway to see if the hotunplug succeeded or not.
This code got reverted and we went for the logic I mentioned in comment #12,
where we'll allow multiple CPU hotunplug requests for the same CPU.
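With that logic, the expected monitor interaction for this bug's scenario would look roughly like the following sketch (the comments are my interpretation of the behavior described above, not an actual captured log):

(qemu) device_del core1    # guest refuses; no error reported, unplug silently stalls
(qemu) device_del core1    # second attempt re-sends the unplug request and succeeds

The point of the change is that the second device_del is no longer a no-op just because an unplug request for the same core is already pending.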
All that said, up to that point we were operating under the assumption that
the kernel does not provide a callback mechanism for hotunplug errors. This
is about to change in kernel v5.13. I've proposed a way to use one of the
existing RTAS calls to signal device removal errors in the kernel, starting
with CPUs. I'm using a hypercall that is used in the device configuration
(RTAS set-indicator) to signal the platform/hypervisor that the kernel found
an error when doing the device removal. This use of the hcall is a no-op in QEMU,
and I checked with the partition firmware folks in IBM and it's also a no-op
for phyp (PowerVM), so it's a viable way of doing it without breaking existing
implementations.
The kernel patches were queued to powerpc-next. I've also patched QEMU to
handle this new kernel behavior and David accepted it to his ppc-6.1 tree.
This means that we'll finally have some form of reliable hotunplug error
callback mechanism in pSeries.
Going back to this bug, we can either go for the approach that will be available
in QEMU 6.0.0 (allowing multiple hotunplug requests) or this new mechanism I've
implemented that requires code from kernel v5.13. The former will get the bug
fixed faster via rebase and will not require kernel-side changes, so perhaps
this approach is preferred here.
(In reply to Daniel Henrique Barboza from comment #14)
> The kernel patches were queued to powerpc-next . I've also patched QEMU to
> handle this new kernel behavior and David accepted it to his ppc-6.1 tree.
> This means that we'll finally have some form of reliable hotunplug error
> callback mechanism in pSeries.
I forgot the link:
While Daniel and I have some more work to do to polish this in general, I think the fixes already committed to qemu-6.0 will already fix the original problem reported for this BZ. Specifically, the first device_del will still do nothing (without an explicit error), but the second one should now retry and succeed.
Can you please retest with the rebased qemu-6.0 based package?
(In reply to David Gibson from comment #16)
> While Daniel and I have some more work to polish this in general, I think
> the fixes already committed to qemu-6.0 will already fix the original
> problem reported for this BZ. Specifically, the first device_del will still
> do nothing (without an explicit error) , but the second one should now retry
> and succeed.
> Can you please retest with the rebased qemu-6.0 based package?
The problem no longer occurs with qemu-kvm-6.0.0-16.module+el8.5.0+10848+2dccc46d.ppc64le; the bug has been fixed.
Thanks Xujun for confirmation. David, can we close this bug as CURRENTRELEASE? Thanks.
Yes, closing as CURRENTRELEASE. Thanks for verifying this.