Bug 808033 - kvm guest doesn't see all hotplugged vcpus with 'virsh setvcpus 64 --live', or hot-plugged devices when they are added fast enough
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: seabios
6.3
x86_64 Linux
medium Severity high
: rc
: ---
Assigned To: Igor Mammedov
Virtualization Bugs
:
Depends On:
Blocks:
Reported: 2012-03-29 08:00 EDT by Igor Mammedov
Modified: 2012-06-20 08:55 EDT (History)
9 users

See Also:
Fixed In Version: seabios-0.6.1.2-17.el6
Doc Type: Bug Fix
Doc Text:
Cause: Before the fix, seabios used level-triggered handling of GPE events (_Lxx methods): 1. read the event bit from the GPE0.sts register; 2. mask the event in the GPE0.en register; 3. execute the _Lxx method from the BIOS (which could take a long time); 4. clear the event in the GPE0.sts register; 5. unmask the event in the GPE0.en register. Consequence: The race window was large enough to lose a hot-plug event. If a new device was hot-plugged while the guest was handling a previous hot-plug event in steps 1-4, then in step 5 the guest would clear the new event's status bit and therefore lose it. Fix: This fix switches from level-triggered (_Lxx) to edge-triggered (_Exx) methods for handling the hot-plug GPE. Result: With edge-triggered GPE handling, the guest reads and clears the GPE0.sts register first and only then executes the event method. A hot-add event can no longer be lost: a new hot-plug event either sets GPE0.sts again after it has been cleared, or the guest handles several hot-plugged devices while executing the current event method.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 08:55:23 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
[RHEL 6.3 qemu-kvm PATCH] Do not loose cpu-hotplug event when guest handles PRSC method (2.04 KB, patch)
2012-04-02 09:16 EDT, Igor Mammedov
no flags Details | Diff
Upstream: [PATCH] Replace level gpe event with edge gpe event for hot-plug handlers (2.02 KB, patch)
2012-04-04 07:14 EDT, Igor Mammedov
no flags Details | Diff
[RHEL6.3 seabios PATCH] Replace level gpe event with edge gpe event for hot-plug handlers (2.24 KB, patch)
2012-04-05 08:49 EDT, Igor Mammedov
no flags Details | Diff

Description Igor Mammedov 2012-03-29 08:00:42 EDT
When hot-plugging multiple vcpus in one attempt,
the guest may not see all of the hot-plugged vcpus,
even though all of them are visible via the qemu monitor.

Version-Release number of selected component (if applicable):
seabios-0.6.1.2-15.el6.x86_64
qemu-img-0.12.1.2-2.265.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. boot rhel6.3 guest with -smp 1,maxcpus=64
2. virsh setvcpus rhel6x64kvm 64 --live

  
Actual results:
the guest only sees ~4-20 of the hot-plugged vcpus

Expected results:
guest should see all 64 vcpus

Additional info:
If a delay is added between adding each vcpu, the guest sees them all.
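On an unfixed seabios, that delay can be scripted. A minimal sketch (the domain name rhel6x64kvm is taken from the reproduction steps; the 1-second pause and the dry-run guard are assumptions for illustration):

```shell
#!/bin/sh
# Hot-plug vcpus one at a time with a pause between each, instead of
# jumping from 1 to 64 in one shot. DRY_RUN=1 (the default) only prints
# the commands; set DRY_RUN=0 to actually invoke virsh.
DRY_RUN=${DRY_RUN:-1}
for n in $(seq 2 64); do
    if [ "$DRY_RUN" = "1" ]; then
        echo "virsh setvcpus rhel6x64kvm $n --live"
    else
        virsh setvcpus rhel6x64kvm "$n" --live
        sleep 1   # give the guest time to handle each hot-plug GPE event
    fi
done
```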
Comment 2 Igor Mammedov 2012-04-02 08:27:39 EDT
When hot-plugging multiple cpus fast enough, the guest, while executing the PRSC AML method, sees only the first (and sometimes the second) byte of the cpu bitmap populated.

The reason is that the cpu bitmap changes under the guest's feet, so the guest hot-plugs only the cpus it had seen when PRSC was executed. After handling the hot-plug event,
the guest resets the cpu-hotplug bit in gpe.sts, and as a result it won't see cpus added
between PRSC and the resetting of gpe.sts.
Comment 3 Igor Mammedov 2012-04-02 09:16:22 EDT
Created attachment 574501 [details]
[RHEL 6.3 qemu-kvm PATCH] Do not loose cpu-hotplug event when guest handles PRSC method
Comment 4 Igor Mammedov 2012-04-04 07:11:42 EDT
It could be fixed with a smaller patch in seabios, so moving it to the seabios component.
posted upstream: http://www.seabios.org/pipermail/seabios/2012-April/003549.html
waiting for commit id before re-posting.
Comment 5 Igor Mammedov 2012-04-04 07:14:25 EDT
Created attachment 575088 [details]
Upstream: [PATCH] Replace level gpe event with edge gpe event for hot-plug  handlers
Comment 7 Igor Mammedov 2012-04-04 09:24:55 EDT
To trigger the race for pci-hotadd I've used the following command:
 ./QMP/qmp device_add --driver=e1000 && sleep 0.X && ./QMP/qmp device_add --driver=e1000
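The "0.X" delay above is left unspecified; a sweep over a few sub-second gaps can locate the race window. A sketch (the delay list is an assumption; ./QMP/qmp is the helper script from the comment above; this prints the attempts rather than executing them):

```shell
#!/bin/sh
# Probe the race window: hot-add two NICs with a decreasing gap between
# them. Each line is one attempt; pipe a line to a shell to actually run it.
for d in 0.5 0.2 0.1 0.05; do
    echo "./QMP/qmp device_add --driver=e1000 && sleep $d && ./QMP/qmp device_add --driver=e1000"
done
```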
Comment 8 Igor Mammedov 2012-04-05 08:47:11 EDT
upstream commit 9c6635bd48d39a1d17d0a73df6e577ef6bd0037c
Comment 9 Igor Mammedov 2012-04-05 08:49:19 EDT
Created attachment 575398 [details]
[RHEL6.3 seabios PATCH] Replace level gpe event with edge gpe event for hot-plug handlers
Comment 14 Igor Mammedov 2012-04-19 12:27:56 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Before the fix, seabios used level-triggered handling of GPE events (_Lxx methods):
 1. read the event bit from the GPE0.sts register
 2. mask the event in the GPE0.en register
 3. execute the _Lxx method from the BIOS (this could take a long time)
 4. clear the event in the GPE0.sts register
 5. unmask the event in the GPE0.en register

Consequence:
The race window was large enough to lose a hot-plug event. If a new device was hot-plugged while the guest was handling a previous hot-plug event in steps 1-4, then in step 5 the guest would clear the new event's status bit and therefore lose it.

Fix:
This fix switches from level-triggered (_Lxx) to edge-triggered (_Exx) methods for handling the hot-plug GPE.

Result:
With edge-triggered GPE handling, the guest reads and clears the GPE0.sts register first and only then executes the event method. A hot-add event can no longer be lost: a new hot-plug event either sets GPE0.sts again after it has been cleared, or the guest handles several hot-plugged devices while executing the current event method.
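In DSDT terms, the change amounts to renaming the hot-plug GPE handler. An illustrative sketch only, not the literal patch hunk (the GPE number and surrounding scope are simplified; PRSC is the CPU re-scan method named in the comments above):

```
Scope (\_GPE) {
    // Before the fix: Method(_L02) - level-triggered, so the status
    // bit was cleared only after the handler returned (race window).
    // After the fix: edge-triggered, so the status bit is cleared
    // before the handler runs, and a new event simply re-sets it.
    Method(_E02) {
        \_SB.PRSC()   // re-scan the CPU present bitmap
    }
}
```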
Comment 15 FuXiangChun 2012-05-10 23:47:56 EDT
1. Reproduce this issue with seabios-0.6.1.2-15.el6.x86_64

steps to reproduce 
1.1 boot rhel6.3 guest
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu host --enable-kvm -m 2048M -smp 1,maxcpus=161 -name rhel6.3 -uuid ddcbfb49-3411-1701-3c36-6bdbc00bedbb -rtc base=utc,clock=host,driftfix=slew -drive file=/home/rhel6.3-64.qcow2,if=none,id=ide,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=ide,id=drive-ide0-0-0,bootindex=1 -netdev tap,id=hostnet1 -device e1000,netdev=hostnet1,id=net1,mac=86:12:50:a4:35:74 -spice port=5913,disable-ticketing -vga qxl -device sga -chardev socket,id=serial0,path=/var/test3,server,nowait -device isa-serial,chardev=serial0 -balloon virtio -monitor unix:/tmp/monitor3,server,nowait -monitor stdio

1.2 hot-plug vcpus via a script, without delay
i=1
while [ $i -lt 65 ]
do
echo "cpu_set $i online"|nc -U /tmp/monitor3
i=$(($i+1))
done

1.3 check guest vcpu number
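One way to do the check, run inside the guest (an assumption; any equivalent count of online cpus works):

```shell
#!/bin/sh
# Inside the guest: count the vcpus the kernel has actually brought online.
grep -c '^processor' /proc/cpuinfo
```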

testing result:
the guest only gets 46 vcpus; 'info cpus' reports 64 via the monitor

2. Verify this issue with seabios-0.6.1.2-19.el6.x86_64

steps to verify
same as above steps

testing result:
the guest gets all 64 vcpus.

So, this bug is fixed.
Comment 18 errata-xmlrpc 2012-06-20 08:55:23 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0802.html
