Bug 808033 - kvm guest doesn't see all hotplugged vcpus when 'virsh setvcpus 64 --live ' or hot-plugged devices when they added fast enough
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: seabios
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Assignee: Igor Mammedov
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-03-29 12:00 UTC by Igor Mammedov
Modified: 2012-06-20 12:55 UTC
9 users

Fixed In Version: seabios-0.6.1.2-17.el6
Doc Type: Bug Fix
Doc Text:
Cause: Before the fix, seabios used level handling of GPE events (_Lxx methods): 1. read the event bit from the GPE0.sts register; 2. mask the event in the GPE0.en register; 3. execute the _Lxx method from the BIOS (which could take a long time); 4. clear the event in the GPE0.sts register; 5. unmask the event in the GPE0.en register. Consequence: This left a race window large enough to lose a hot-plug event. If a new device was hot-plugged while the guest was executing a previous hot-plug event (steps 1-4), then in step 5 the guest would clear the new event and therefore lose it. Fix: This fix switches from level (_Lxx) to edge (_Exx) methods for handling the hot-plug GPE. Result: With edge GPE handling, the guest reads and clears the GPE0.sts register first and only then executes the event method. A hot-add event can no longer be lost: either a new hot-plug event is set in GPE0.sts after it has been cleared, or the guest handles several hot-plugged devices while executing the current event method.
Clone Of:
Environment:
Last Closed: 2012-06-20 12:55:23 UTC
Target Upstream Version:


Attachments
[RHEL 6.3 qemu-kvm PATCH] Do not loose cpu-hotplug event when guest handles PRSC method (2.04 KB, patch)
2012-04-02 13:16 UTC, Igor Mammedov
no flags Details | Diff
Upstream: [PATCH] Replace level gpe event with edge gpe event for hot-plug handlers (2.02 KB, patch)
2012-04-04 11:14 UTC, Igor Mammedov
no flags Details | Diff
[RHEL6.3 seabios PATCH] Replace level gpe event with edge gpe event for hot-plug handlers (2.24 KB, patch)
2012-04-05 12:49 UTC, Igor Mammedov
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2012:0802 0 normal SHIPPED_LIVE seabios bug fix and enhancement update 2012-06-19 19:51:36 UTC

Description Igor Mammedov 2012-03-29 12:00:42 UTC
When hot-plugging multiple vcpus in one attempt,
the guest may not see all hot-plugged vcpus, while
via the qemu monitor all hot-plugged vcpus are visible.

Version-Release number of selected component (if applicable):
seabios-0.6.1.2-15.el6.x86_64
qemu-img-0.12.1.2-2.265.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. boot rhel6.3 guest with -smp 1,maxcpus=64
2. virsh setvcpus rhel6x64kvm 64 --live

  
Actual results:
guest only sees ~4-20 of the hot-plugged vcpus

Expected results:
guest should see all 64 vcpus

Additional info:
If a delay is added between adding each vcpu, the guest sees them all.

Comment 2 Igor Mammedov 2012-04-02 12:27:39 UTC
When hot-plugging multiple cpus fast enough, the guest, while executing the PRSC AML method, sees only the first (and sometimes the second) byte of the cpu bitmap populated.

The reason is that the cpu bitmap is changing under its feet, so the guest onlines only the cpus it saw when PRSC was executed. After handling the hot-plug event,
the guest resets the cpu-hotplug bit in gpe.sts, and as a result it won't see cpus added
between the PRSC execution and the resetting of gpe.sts.
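The race can be sketched as a toy model (hypothetical Python for illustration only; the real logic is ACPI ASL in seabios, and the names `prsc_scan`/`level_handling` are invented):

```python
def prsc_scan(bitmap):
    """Toy PRSC: online every cpu whose bit is set in the snapshot seen at entry."""
    return {i for i in range(len(bitmap)) if bitmap[i]}

def level_handling(bitmap, late_cpus):
    # Guest scans the bitmap as it looks when PRSC starts ...
    onlined = prsc_scan(bitmap)
    # ... meanwhile more cpus are hot-plugged (their bits get set) ...
    for cpu in late_cpus:
        bitmap[cpu] = 1
    # ... then the guest clears the cpu-hotplug bit in gpe.sts, so no
    # new event fires for the late cpus: they stay offline.
    gpe_sts = 0
    return onlined, gpe_sts

bitmap = [0] * 64
bitmap[0] = bitmap[1] = 1   # only two cpus visible when PRSC starts
onlined, gpe_sts = level_handling(bitmap, late_cpus=range(2, 64))
# onlined == {0, 1}: the 62 cpus added during PRSC are never onlined,
# because gpe.sts was cleared after their bits were set.
```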

Comment 3 Igor Mammedov 2012-04-02 13:16:22 UTC
Created attachment 574501 [details]
[RHEL 6.3 qemu-kvm PATCH] Do not loose cpu-hotplug event when guest handles PRSC method

Comment 4 Igor Mammedov 2012-04-04 11:11:42 UTC
It can be fixed with a smaller patch in seabios, so moving to the seabios component.
posted upstream: http://www.seabios.org/pipermail/seabios/2012-April/003549.html
waiting for commit id before re-posting.

Comment 5 Igor Mammedov 2012-04-04 11:14:25 UTC
Created attachment 575088 [details]
Upstream: [PATCH] Replace level gpe event with edge gpe event for hot-plug  handlers

Comment 7 Igor Mammedov 2012-04-04 13:24:55 UTC
To trigger the race for PCI hot-add I used the following command:
 ./QMP/qmp device_add --driver=e1000 && sleep 0.X && ./QMP/qmp device_add --driver=e1000

Comment 8 Igor Mammedov 2012-04-05 12:47:11 UTC
upstream commit 9c6635bd48d39a1d17d0a73df6e577ef6bd0037c

Comment 9 Igor Mammedov 2012-04-05 12:49:19 UTC
Created attachment 575398 [details]
[RHEL6.3 seabios PATCH] Replace level gpe event with edge gpe event for hot-plug handlers

Comment 14 Igor Mammedov 2012-04-19 16:27:56 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Before the fix, seabios used level handling of GPE events (_Lxx methods):
 1. read the event bit from the GPE0.sts register
 2. mask the event in the GPE0.en register
 3. execute the _Lxx method from the BIOS (which could take a long time)
 4. clear the event in the GPE0.sts register
 5. unmask the event in the GPE0.en register

Consequence:
This left a race window large enough to lose a hot-plug event. If a new device was hot-plugged while the guest was executing a previous hot-plug event (steps 1-4), then in step 5 the guest would clear the new event and therefore lose it.

Fix:
This fix switches from level (_Lxx) to edge (_Exx) methods for handling the hot-plug GPE.

Result:
With edge GPE handling, the guest reads and clears the GPE0.sts register first and only then executes the event method. A hot-add event can no longer be lost: either a new hot-plug event is set in GPE0.sts after it has been cleared, or the guest handles several hot-plugged devices while executing the current event method.
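The difference between the two orderings can be sketched as a toy model (hypothetical Python names; the real handlers are _Lxx/_Exx ACPI methods in the seabios DSDT, not Python functions):

```python
def level_gpe(sts, event_during_method):
    """_Lxx-style (buggy ordering): run the method first, clear sts last."""
    handled = sts["hotplug"]       # 1. read event bit
    # 2. mask event in GPE0.en (omitted in this sketch)
    # 3. execute the _Lxx method; a new hot-plug may arrive meanwhile:
    if event_during_method:
        sts["hotplug"] = 1
    sts["hotplug"] = 0             # 4. clear sts -> wipes the new event too
    # 5. unmask event in GPE0.en (omitted)
    return handled

def edge_gpe(sts, event_during_method):
    """_Exx-style (fixed ordering): clear sts first, then run the method."""
    handled = sts["hotplug"]
    sts["hotplug"] = 0             # clear before executing the method
    if event_during_method:
        sts["hotplug"] = 1         # new event survives in sts ...
    return handled                 # ... and triggers another GPE pass
```

With level handling a second hot-plug arriving mid-method leaves sts at 0 (event lost); with edge handling it leaves sts at 1, so the guest gets another pass.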

Comment 15 FuXiangChun 2012-05-11 03:47:56 UTC
1. Reproduce this issue with seabios-0.6.1.2-15.el6.x86_64

Steps to reproduce:
1.1 boot rhel6.3 guest
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu host --enable-kvm -m 2048M -smp 1,maxcpus=161 -name rhel6.3 -uuid ddcbfb49-3411-1701-3c36-6bdbc00bedbb -rtc base=utc,clock=host,driftfix=slew -drive file=/home/rhel6.3-64.qcow2,if=none,id=ide,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=ide,id=drive-ide0-0-0,bootindex=1 -netdev tap,id=hostnet1 -device e1000,netdev=hostnet1,id=net1,mac=86:12:50:a4:35:74 -spice port=5913,disable-ticketing -vga qxl -device sga -chardev socket,id=serial0,path=/var/test3,server,nowait -device isa-serial,chardev=serial0 -balloon virtio -monitor unix:/tmp/monitor3,server,nowait -monitor stdio

1.2 hotplug vcpu via script without delay
i=1
while [ $i -lt 65 ]
do
    echo "cpu_set $i online" | nc -U /tmp/monitor3
    i=$(($i+1))
done

1.3 check guest vcpu number

Testing result:
guest only gets 46 vcpus, while 'info cpus' in the monitor shows 64

2. Verify this issue with seabios-0.6.1.2-19.el6.x86_64

Steps to verify:
same as the steps above

Testing result:
guest gets all 64 vcpus.

So this bug is fixed.

Comment 18 errata-xmlrpc 2012-06-20 12:55:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0802.html

