When a KVM guest uses kvmclock, it may hang on vcpu hot-plug. The hang is caused by an overflow in pvclock_get_nsec_offset:

    u64 delta = tsc - shadow->tsc_timestamp;

which in turn is caused by undefined values read from the percpu hv_clock, which has not been initialized yet.

The uninitialized clock on the CPU being booted is accessed from

    start_secondary
    -> smp_callin
    -> smp_store_cpu_info
    -> identify_secondary_cpu
    -> mtrr_ap_init
    -> mtrr_restore
    -> stop_machine_from_inactive_cpu
    -> queue_stop_cpus_work
    ...
    -> sched_clock
    -> kvm_clock_read

which is well before the x86_cpuinit.setup_percpu_clockev call in start_secondary, where the percpu clock is initialized.

Upstream fix: http://www.spinics.net/lists/kvm/

How reproducible: 50-100%

Steps to Reproduce:
1. hot-plug vcpu 1 in qemu
2. in guest: echo 1 > /sys/devices/system/cpu/cpuX/online

Actual results:
guest usually hangs after printing:
[ 947.071059] Booting Node 0 Processor 1 APIC 0x1
[ 947.072148] smpboot cpu 1: start_ip = 9a000

Expected results:
guest should not hang when vcpu is onlined.
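The overflow described above can be illustrated with a small user-space sketch. This is not the kernel code: struct shadow_time, tsc_delta() and delta_wrapped() are hypothetical names used only for illustration, and the sketch assumes the uninitialized timestamp happens to be larger than the current TSC value.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-CPU shadow time info. */
struct shadow_time {
    uint64_t tsc_timestamp;   /* TSC value recorded at the last clock update */
};

/*
 * Same shape as the subtraction quoted in the report. The operands are
 * unsigned, so if tsc_timestamp is garbage and larger than tsc, the
 * result wraps around to a huge value instead of going negative.
 */
static uint64_t tsc_delta(uint64_t tsc, const struct shadow_time *shadow)
{
    return tsc - shadow->tsc_timestamp;
}

/* Illustrative check: a set top bit means the delta is implausibly large. */
static int delta_wrapped(uint64_t delta)
{
    return (delta >> 63) != 0;
}
```

With tsc = 1500 and tsc_timestamp = 1000 the delta is 500. With tsc_timestamp = 2000 the same subtraction yields 2^64 - 500, which is then scaled into a nanosecond offset far in the future.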
Correct link to upstream fix: http://www.spinics.net/lists/kvm/msg68054.html
Created attachment 560267: introduce x86_cpuinit.early_percpu_clock_init hook
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Cannot test this yet because it is blocked by bug 562886; the set_cpus command is needed to run cpu hotplug. Acking it first.
I realized this is a public BZ, so I'm inlining (and condensing) the target of the previously added link:

Before the patch:

start_secondary [arch/x86/kernel/smpboot.c]
  smp_callin
    smp_store_cpu_info
      identify_secondary_cpu [arch/x86/kernel/cpu/common.c]
        mtrr_ap_init [arch/x86/kernel/cpu/mtrr/main.c]
          set_mtrr_from_inactive_cpu
            stop_machine_from_inactive_cpu [kernel/stop_machine.c]
              queue_stop_cpus_work
                cpu_stop_queue_work
                  wake_up_process [kernel/sched.c]
                    try_to_wake_up
                      activate_task
                        enqueue_task
                          update_rq_clock
                            sched_clock_cpu [kernel/sched_clock.c]
                              sched_clock_local
                                sched_clock [arch/x86/kernel/tsc.c]
                                  paravirt_sched_clock [arch/x86/include/asm/paravirt.h]
                                    kvm_clock_read [arch/x86/kernel/kvmclock.c] (1)
                                      pvclock_clocksource_read [arch/x86/kernel/pvclock.c]
                                        pvclock_get_nsec_offset() <-- access to uninited clock sets "last_value" to huge value
  kvm_setup_secondary_clock() [arch/x86/kernel/kvmclock.c] (2)
    kvm_register_clock()
    setup_secondary_APIC_clock [arch/x86/kernel/apic/apic.c]
      setup_APIC_timer()

(1) via "pv_time_ops.sched_clock", set by kvmclock_init() in [arch/x86/kernel/kvmclock.c]
(2) via "x86_cpuinit.setup_percpu_clockev", set by kvmclock_init() in [arch/x86/kernel/kvmclock.c]

Patch:
- Adds new "early_percpu_clock_init" hook member to "x86_cpuinit_ops" struct type.
- New "x86_cpuinit.early_percpu_clock_init" defaults to x86_init_noop().
- kvmclock_init() overrides the new hook to kvm_setup_secondary_clock(), *leaves* old hook ("setup_percpu_clockev") at the default setup_secondary_APIC_clock().
- The patch removes the setup_secondary_APIC_clock() invocation from kvm_setup_secondary_clock().
- start_secondary() calls the new hook (x86_cpuinit.early_percpu_clock_init) before smp_callin().

New call tree on the bare metal:
- the new hook defaults to no-op.
- the patch doesn't change how the pre-existent hook is set up on the bare-metal.
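The remark in the pre-patch call tree that an access to the uninitialized clock sets "last_value" to a huge value can be illustrated with a minimal single-threaded model. This is only a sketch under assumptions: model_clock_read() and the plain global below are hypothetical stand-ins, and the kernel's monotonicity guard uses an atomic compare-and-exchange rather than a plain store.

```c
#include <stdint.h>

/* Model of the global guard that keeps clock readings monotonic. */
static uint64_t last_value;

/*
 * Return raw_ns unless it would move the clock backwards; in that case
 * return the largest value handed out so far. One bogus, huge reading
 * therefore pins every later reading to that huge value.
 */
static uint64_t model_clock_read(uint64_t raw_ns)
{
    if (raw_ns < last_value)
        return last_value;
    last_value = raw_ns;
    return raw_ns;
}
```

In this model, once a single read returns a value near the top of the 64-bit range, all subsequent sane readings are replaced by it, so the clock appears frozen. That would be consistent with the guest hanging rather than recovering.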
New call tree in KVM guest:

start_secondary [arch/x86/kernel/smpboot.c]
  kvm_setup_secondary_clock [arch/x86/kernel/kvmclock.c] (1)
    kvm_register_clock
  smp_callin [arch/x86/kernel/smpboot.c]
    smp_store_cpu_info
      identify_secondary_cpu [arch/x86/kernel/cpu/common.c]
        mtrr_ap_init [arch/x86/kernel/cpu/mtrr/main.c]
          set_mtrr_from_inactive_cpu
            stop_machine_from_inactive_cpu [kernel/stop_machine.c]
              queue_stop_cpus_work
                cpu_stop_queue_work
                  wake_up_process [kernel/sched.c]
                    try_to_wake_up
                      activate_task
                        enqueue_task
                          update_rq_clock
                            sched_clock_cpu [kernel/sched_clock.c]
                              sched_clock_local
                                sched_clock [arch/x86/kernel/tsc.c]
                                  paravirt_sched_clock [arch/x86/include/asm/paravirt.h]
                                    kvm_clock_read [arch/x86/kernel/kvmclock.c] (1)
                                      pvclock_clocksource_read [arch/x86/kernel/pvclock.c]
                                        pvclock_get_nsec_offset() <-- clock already inited
  setup_secondary_APIC_clock [arch/x86/kernel/apic/apic.c] (2)
    setup_APIC_timer()

(1) via new early hook
(2) via preexistent hook which now has a different (= default) value.
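The effect of the reordering can be sketched as a small user-space model. Only the two hook member names are taken from the patch description; struct cpuinit_ops and every model_* function are hypothetical, and the deep call chain under smp_callin() is collapsed into a single clock read.

```c
#include <stdint.h>

/* Simplified model of the hook table; not the kernel's x86_cpuinit_ops. */
struct cpuinit_ops {
    void (*early_percpu_clock_init)(void);  /* new hook, runs before smp_callin */
    void (*setup_percpu_clockev)(void);     /* pre-existent hook, runs later */
};

static int clock_registered;   /* per-CPU clock area has been registered */
static int read_before_init;   /* clock reads that saw an unregistered clock */
static int apic_timer_set;     /* APIC timer has been set up */

static void model_noop(void) { }
static void model_register_clock(void) { clock_registered = 1; }
static void model_setup_apic_clock(void) { apic_timer_set = 1; }

/* Pre-patch secondary clock setup: registers the clock AND sets up the APIC timer. */
static void model_old_secondary_clock(void)
{
    model_register_clock();
    model_setup_apic_clock();
}

/* Stands in for smp_callin(): deep inside, sched_clock reads the clock. */
static void model_smp_callin(void)
{
    if (!clock_registered)
        read_before_init++;
}

static void model_start_secondary(const struct cpuinit_ops *ops)
{
    ops->early_percpu_clock_init();   /* added by the patch */
    model_smp_callin();
    ops->setup_percpu_clockev();
}

/* Pre-patch behaviour: there was no early hook, modelled here as a no-op. */
static const struct cpuinit_ops before_patch = {
    .early_percpu_clock_init = model_noop,
    .setup_percpu_clockev    = model_old_secondary_clock,
};

/* Post-patch KVM guest: early hook registers the clock, old hook is the default. */
static const struct cpuinit_ops after_patch = {
    .early_percpu_clock_init = model_register_clock,
    .setup_percpu_clockev    = model_setup_apic_clock,
};

/* Boot one modelled CPU and report how many reads hit an uninitialized clock. */
static int model_boot(const struct cpuinit_ops *ops)
{
    clock_registered = 0;
    read_before_init = 0;
    apic_timer_set = 0;
    model_start_secondary(ops);
    return read_before_init;
}
```

In this model, model_boot(&before_patch) reports one read of the uninitialized clock, while model_boot(&after_patch) reports none and still ends with the APIC timer set up via the pre-existent hook.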
Patch(es) available on kernel-2.6.32-235.el6
Reproduced this issue with kernel 2.6.32-220.el6.x86_64.

Steps to reproduce:
1. Boot a guest:
/usr/libexec/qemu-kvm -M rhel6.2.0 -m 2048 -smp 1,sockets=1,cores=1,threads=1,maxcpus=6 -enable-kvm -uuid 4541c99e-efbe-4624-beb0-13ca5193fc79 -k en-us -drive file=/home/RHEL-Server-6.2-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=koTUXQrb,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -net none -monitor stdio -vnc :1 -serial unix:/home/unix.socket,server,nowait
2. Hot-plug a vcpu via the monitor:
(qemu) cpu_set 1 online
3. Check the guest.

Actual result: the guest hangs.
(qemu) info status
VM status: paused

Verified this issue with kernel 2.6.32-270.el6.x86_64.

Repeated steps 1, 2 and 3.

Actual result: the guest works well, and the vcpu hot-plug succeeds.

So this bug is fixed.
Moving to VERIFIED as per Comment #10
*** Bug 799180 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0862.html
*** Bug 832946 has been marked as a duplicate of this bug. ***
*** Bug 831899 has been marked as a duplicate of this bug. ***
*** Bug 970968 has been marked as a duplicate of this bug. ***