Description of problem:
When hot-plugging multiple vcpus in one attempt, the guest may not see all hot-plugged vcpus, while via the qemu monitor all hot-plugged vcpus are visible.

Version-Release number of selected component (if applicable):
seabios-0.6.1.2-15.el6.x86_64
qemu-img-0.12.1.2-2.265.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. boot a rhel6.3 guest with -smp 1,maxcpus=64
2. virsh setvcpus rhel6x64kvm 64 --live

Actual results:
guest only sees ~4-20 of the hot-plugged vcpus

Expected results:
guest should see all 64 vcpus

Additional info:
if a delay is added between adding each vcpu, the guest sees them all (see the sketch below).
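A minimal sketch of the delay workaround noted in the additional info, adding vcpus one at a time with a pause between each add; the one-second delay is an assumption, the domain name is taken from the steps above:

# Workaround sketch: hot-plug vcpus one at a time with a delay between adds
# (the 1-second delay is an assumption; any sufficiently long pause avoids the race)
for i in $(seq 2 64); do
    virsh setvcpus rhel6x64kvm $i --live
    sleep 1
done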
When hot-plugging multiple cpus fast enough, the guest, while executing the PRSC AML method, sees only the first (and sometimes the second) byte of the cpu bitmap populated. The reason for this is that the cpu bitmap is changing under its feet, so the guest hot-plugs only the cpus it has seen when PRSC was executed. After handling the hot-plug event, the guest resets the cpu-hotplug bit in gpe.sts, and as a result it won't see cpus added between executing PRSC and resetting gpe.sts.
Created attachment 574501 [details] [RHEL 6.3 qemu-kvm PATCH] Do not loose cpu-hotplug event when guest handles PRSC method
It could be fixed with a smaller patch in seabios, so moving it to the seabios component. Posted upstream: http://www.seabios.org/pipermail/seabios/2012-April/003549.html Waiting for the commit id before re-posting.
Created attachment 575088 [details] Upstream: [PATCH] Replace level gpe event with edge gpe event for hot-plug handlers
To trigger the race for pci-hotadd I've used the following command: ./QMP/qmp device_add --driver=e1000 && sleep 0.X && ./QMP/qmp device_add --driver=e1000
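A sketch that probes the race window by sweeping the delay between the two hot-adds; the specific delay values are assumptions, the qmp helper invocation is the same as in the command above:

# Probe the race window with different delays between the two device_add calls
# (the delay values below are assumptions)
for x in 0.1 0.2 0.3 0.5 1; do
    echo "delay $x"
    ./QMP/qmp device_add --driver=e1000 && sleep $x && ./QMP/qmp device_add --driver=e1000
done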
upstream commit 9c6635bd48d39a1d17d0a73df6e577ef6bd0037c
Created attachment 575398 [details] [RHEL6.3 seabios PATCH] Replace level gpe event with edge gpe event for hot-plug handlers
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: Before the fix, seabios used level handling of GPE events (_Lxx methods):
1. read the event bit from the GPE0.sts register
2. mask the event in the GPE0.en register
3. execute the _Lxx method from the bios (which could take a long time)
4. clear the event in the GPE0.sts register
5. unmask the event in the GPE0.en register

Consequence: This left a race window large enough to lose a hot-plug event. If a new device was hot-plugged while the guest was handling a previous hot-plug event in steps 1-4, then in step 5 the guest would clear the new hot-plug event and therefore lose it.

Fix: The fix switches from level (_Lxx) to edge (_Exx) methods for handling the hot-plug GPE.

Result: With edge GPE handling, the guest reads and clears the GPE0.sts register first and only then executes the event method. As a result, a hot-plug event cannot be lost: either a new hot-plug event is set in GPE0.sts after it has been cleared, or the guest handles the additional hot-plugged devices while executing the current event method.
1. Reproduce this issue with seabios-0.6.1.2-15.el6.x86_64

Steps to reproduce:
1.1 boot a rhel6.3 guest:
/usr/libexec/qemu-kvm -M rhel6.3.0 -cpu host --enable-kvm -m 2048M -smp 1,maxcpus=161 -name rhel6.3 -uuid ddcbfb49-3411-1701-3c36-6bdbc00bedbb -rtc base=utc,clock=host,driftfix=slew -drive file=/home/rhel6.3-64.qcow2,if=none,id=ide,format=qcow2,cache=none,werror=stop,rerror=stop -device ide-drive,drive=ide,id=drive-ide0-0-0,bootindex=1 -netdev tap,id=hostnet1 -device e1000,netdev=hostnet1,id=net1,mac=86:12:50:a4:35:74 -spice port=5913,disable-ticketing -vga qxl -device sga -chardev socket,id=serial0,path=/var/test3,server,nowait -device isa-serial,chardev=serial0 -balloon virtio -monitor unix:/tmp/monitor3,server,nowait -monitor stdio

1.2 hot-plug vcpus via a script without delay:
i=1
while [ $i -lt 65 ]
do
    echo "cpu_set $i online" | nc -U /tmp/monitor3
    i=$(($i+1))
done

1.3 check the guest vcpu number (see the sketch below)

Testing result: the guest only gets 46 vcpus, while 'info cpus' shows 64 via the monitor.

2. Verify this issue with seabios-0.6.1.2-19.el6.x86_64

Steps to verify: same as the steps above.

Testing result: the guest gets all 64 vcpus, so this bug is fixed.
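A minimal sketch for step 1.3, counting online cpus from inside the guest; checking via /proc/cpuinfo is an assumption about how the count was taken:

# Count online vcpus from inside the guest (checking /proc/cpuinfo is an assumption)
grep -c ^processor /proc/cpuinfo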
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0802.html