Bug 788562

Summary: kvm guest hangs when hot-plugged vcpu is onlined due to uninitialized hv_clock
Product: Red Hat Enterprise Linux 6 Reporter: Igor Mammedov <imammedo>
Component: kernel Assignee: Igor Mammedov <imammedo>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.3 CC: chayang, drjones, dyuan, juzhang, kzhang, lersek, mzhan, shuang, sluo, uobergfe, xfu, ydu, yunzheng, yupzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-2.6.32-235.el6 Doc Type: Bug Fix
Last Closed: 2012-06-20 08:23:35 UTC Type: ---
Attachments: introduce x86_cpuinit.early_percpu_clock_init hook (flags: none)

Description Igor Mammedov 2012-02-08 13:38:35 UTC
When a kvm guest uses kvmclock, it may hang on vcpu hot-plug.
This is caused by an overflow in pvclock_get_nsec_offset():

    u64 delta = tsc - shadow->tsc_timestamp;

which in turn is caused by undefined values in the per-cpu
hv_clock, which has not been initialized yet.
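The overflow can be illustrated in userspace. This is a simplified model, not the kernel code: the struct is a stand-in that only borrows the field names tsc_timestamp, tsc_to_nsec_mul and tsc_shift, the scaling assumes the usual 32.32 fixed-point form ((delta shifted by tsc_shift) * tsc_to_nsec_mul >> 32), and the TSC values used below are made up. If the garbage tsc_timestamp happens to be ahead of the current TSC, the unsigned subtraction wraps to nearly 2^64 and the resulting offset is on the order of centuries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-cpu shadow time used by
 * pvclock_get_nsec_offset(); only the field names mirror the kernel. */
struct shadow_time {
    uint64_t tsc_timestamp;   /* TSC value at the last host update */
    uint32_t tsc_to_nsec_mul; /* 32.32 fixed-point ns-per-tick multiplier */
    int8_t   tsc_shift;       /* pre-scale shift applied to the delta */
};

/* ((delta << shift) * mul) >> 32, truncated to 64 bits. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    /* Split delta into 32-bit halves so no 128-bit type is needed:
     * (hi * 2^32 + lo) * mul >> 32 == hi * mul + ((lo * mul) >> 32). */
    return (delta >> 32) * mul + (((delta & 0xffffffffu) * mul) >> 32);
}

static uint64_t nsec_offset(const struct shadow_time *s, uint64_t tsc)
{
    /* Same expression as in the report: wraps around if the
     * (uninitialized) tsc_timestamp is larger than tsc. */
    uint64_t delta = tsc - s->tsc_timestamp;
    return scale_delta(delta, s->tsc_to_nsec_mul, s->tsc_shift);
}
```

For example, with tsc = 1000000000 and a multiplier of 0x80000000 (0.5 ns per tick), a timestamp of 1000000 yields 499500000 ns, while a bogus timestamp of 1000001000 yields 9223372036854775308 ns (2^63 - 500).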
The uninitialized clock of the cpu being booted is accessed via
   start_secondary
    -> smp_callin
      ->  smp_store_cpu_info
        -> identify_secondary_cpu
          -> mtrr_ap_init
            -> mtrr_restore
              -> stop_machine_from_inactive_cpu
                -> queue_stop_cpus_work
                  ...
                    -> sched_clock
                      -> kvm_clock_read
which happens well before the x86_cpuinit.setup_percpu_clockev call in
start_secondary, where the per-cpu clock is initialized.

Upstream fix: http://www.spinics.net/lists/kvm/

How reproducible:
50-100%

Steps to Reproduce:
1. hot-plug vcpu 1 in qemu
2. in the guest:
  echo 1 > /sys/devices/system/cpu/cpuX/online
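Step 2 can also be driven from a program. The sketch below is a minimal C equivalent of the echo command; the helper names (cpu_online_path, write_cpu_online, online_cpu) are hypothetical, and writing to sysfs requires root inside the guest.

```c
#include <assert.h>
#include <stdio.h>

/* Build "/sys/devices/system/cpu/cpu<N>/online" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int cpu_online_path(char *buf, size_t len, int cpu)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%d/online", cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1\n" (online) or "0\n" (offline), as `echo 1 > path` does.
 * Returns 0 on success, -1 on failure. */
static int write_cpu_online(const char *path, int online)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = (fputs(online ? "1\n" : "0\n", f) == EOF) ? -1 : 0;
    /* stdio buffers the data, so a write error reported by the kernel
     * only surfaces when fclose() flushes the buffer. */
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Online CPU `cpu` after it has been hot-plugged on the host side. */
static int online_cpu(int cpu)
{
    char path[64];
    if (cpu_online_path(path, sizeof path, cpu) != 0)
        return -1;
    return write_cpu_online(path, 1);
}
```

After hot-plugging vcpu 1 in qemu, online_cpu(1) in the guest performs the same write as `echo 1 > /sys/devices/system/cpu/cpu1/online`.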

  
Actual results:

the guest usually hangs after printing:

   [  947.071059] Booting Node 0 Processor 1 APIC 0x1
   [  947.072148] smpboot cpu 1: start_ip = 9a000


Expected results:

The guest should not hang when the vcpu is onlined.

Comment 1 Igor Mammedov 2012-02-08 13:53:24 UTC
Correct link to upstream fix: http://www.spinics.net/lists/kvm/msg68054.html

Comment 2 Igor Mammedov 2012-02-08 14:00:24 UTC
Created attachment 560267 [details]
introduce x86_cpuinit.early_percpu_clock_init hook

Comment 3 RHEL Program Management 2012-02-08 18:14:46 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 4 Suqin Huang 2012-02-13 08:39:48 UTC
Cannot test it because it is blocked by bug 562886; the set_cpus command is needed to run cpu hotplug.

Acking it first.

Comment 6 Laszlo Ersek 2012-02-13 16:27:25 UTC
I realized this is a public BZ, so I'm inlining (and condensing) the target of the
previously added link:

Before the patch:

start_secondary [arch/x86/kernel/smpboot.c]
 smp_callin
  smp_store_cpu_info
   identify_secondary_cpu [arch/x86/kernel/cpu/common.c]
    mtrr_ap_init [arch/x86/kernel/cpu/mtrr/main.c]
     set_mtrr_from_inactive_cpu
      stop_machine_from_inactive_cpu [kernel/stop_machine.c]
       queue_stop_cpus_work
        cpu_stop_queue_work
         wake_up_process [kernel/sched.c]
          try_to_wake_up
           activate_task
            enqueue_task
             update_rq_clock
              sched_clock_cpu [kernel/sched_clock.c]
               sched_clock_local
                sched_clock [arch/x86/kernel/tsc.c]
                 paravirt_sched_clock [arch/x86/include/asm/paravirt.h]
                  kvm_clock_read [arch/x86/kernel/kvmclock.c] (1)
                   pvclock_clocksource_read [arch/x86/kernel/pvclock.c]
                    pvclock_get_nsec_offset() <-- access to uninited clock
                    sets "last_value" to a huge value
 kvm_setup_secondary_clock() [arch/x86/kernel/kvmclock.c] (2)
  kvm_register_clock()
  setup_secondary_APIC_clock [arch/x86/kernel/apic/apic.c]
   setup_APIC_timer()

(1) via "pv_time_ops.sched_clock", set by kvmclock_init() in
    [arch/x86/kernel/kvmclock.c]
(2) via "x86_cpuinit.setup_percpu_clockev", set by kvmclock_init() in
    [arch/x86/kernel/kvmclock.c]
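The note that the bad read "sets last_value to a huge value" explains why one bad read is enough. The sketch below is a simplified model, not the pvclock source: it assumes each reading is clamped against a global last_value so the clock never appears to go backwards, and implements that clamp with C11 atomics. Once a bogus maximum is published, every later valid reading is below it, so the clock appears frozen; this is one plausible path from a single bad read to a guest that seems hung.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Highest value ever returned by the clock (models pvclock's last_value). */
static _Atomic uint64_t last_value;

/* Return max(raw, last_value) and publish it, so readers on different
 * CPUs never observe time going backwards. */
static uint64_t monotonic_read(uint64_t raw)
{
    uint64_t last = atomic_load(&last_value);
    do {
        if (raw < last)
            return last; /* a later time was already returned: clamp */
        /* On failure, `last` is reloaded with the current value and the
         * comparison above is retried. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, raw));
    return raw;
}
```

Feeding it 100 and 150 returns them unchanged; after one bogus reading such as UINT64_MAX / 2, subsequent readings of 200 or 300 all return UINT64_MAX / 2.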

Patch:

- Adds new "early_percpu_clock_init" hook member to "x86_cpuinit_ops"
  struct type.

- New "x86_cpuinit.early_percpu_clock_init" defaults to x86_init_noop().

- kvmclock_init() overrides the new hook to kvm_setup_secondary_clock(),
  *leaves* the old hook ("setup_percpu_clockev") at the default
  setup_secondary_APIC_clock().

- The patch removes the setup_secondary_APIC_clock() invocation from
  kvm_setup_secondary_clock().

- start_secondary() calls the new hook
  (x86_cpuinit.early_percpu_clock_init) before smp_callin().
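The ordering change described in the patch bullets can be modelled in a few lines. Only the hook names and the function names listed above come from the patch; the function bodies, the boolean flags standing in for per-cpu state, and the simplified smp_callin() (which only records whether it would have read kvmclock before registration) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced version of the ops struct with the two hooks. */
struct x86_cpuinit_ops {
    void (*early_percpu_clock_init)(void);
    void (*setup_percpu_clockev)(void);
};

static bool kvmclock_in_use;   /* sched_clock() goes through kvmclock */
static bool clock_registered;  /* this cpu's hv_clock has been set up */
static bool read_before_init;  /* kvmclock was read while uninitialized */
static bool apic_timer_set;

static void x86_init_noop(void) { }
static void setup_secondary_APIC_clock(void) { apic_timer_set = true; }
/* After the patch this no longer calls setup_secondary_APIC_clock(). */
static void kvm_setup_secondary_clock(void) { clock_registered = true; }

static struct x86_cpuinit_ops x86_cpuinit = {
    .early_percpu_clock_init = x86_init_noop,            /* new default */
    .setup_percpu_clockev    = setup_secondary_APIC_clock,
};

static void kvmclock_init(void)
{
    kvmclock_in_use = true;
    /* Only the new hook is overridden; the old one keeps its default. */
    x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock;
}

/* Stands in for smp_callin() -> ... -> sched_clock() -> kvm_clock_read(). */
static void smp_callin(void)
{
    if (kvmclock_in_use && !clock_registered)
        read_before_init = true;
}

static void start_secondary(void)
{
    x86_cpuinit.early_percpu_clock_init(); /* new: runs before smp_callin() */
    smp_callin();
    x86_cpuinit.setup_percpu_clockev();
}
```

Calling start_secondary() without kvmclock_init() models bare metal: the early hook is a no-op and the APIC timer is set up as before. After kvmclock_init(), the clock is registered before smp_callin() runs, so the early sched_clock() read no longer sees an uninitialized clock.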

New call tree on bare metal:

- the new hook defaults to no-op.

- the patch doesn't change how the pre-existing hook is set up on
  bare metal.

New call tree in KVM guest:

start_secondary [arch/x86/kernel/smpboot.c]
 kvm_setup_secondary_clock [arch/x86/kernel/kvmclock.c] (1)
  kvm_register_clock
 smp_callin [arch/x86/kernel/smpboot.c]
  smp_store_cpu_info
   identify_secondary_cpu [arch/x86/kernel/cpu/common.c]
    mtrr_ap_init [arch/x86/kernel/cpu/mtrr/main.c]
     set_mtrr_from_inactive_cpu
      stop_machine_from_inactive_cpu [kernel/stop_machine.c]
       queue_stop_cpus_work
        cpu_stop_queue_work
         wake_up_process [kernel/sched.c]
          try_to_wake_up
           activate_task
            enqueue_task
             update_rq_clock
              sched_clock_cpu [kernel/sched_clock.c]
               sched_clock_local
                sched_clock [arch/x86/kernel/tsc.c]
                 paravirt_sched_clock [arch/x86/include/asm/paravirt.h]
                  kvm_clock_read [arch/x86/kernel/kvmclock.c]
                   pvclock_clocksource_read [arch/x86/kernel/pvclock.c]
                    pvclock_get_nsec_offset() <-- clock already inited
 setup_secondary_APIC_clock [arch/x86/kernel/apic/apic.c] (2)
  setup_APIC_timer()

(1) via new early hook
(2) via the pre-existing hook, which now has a different (= default) value.

Comment 7 Aristeu Rozanski 2012-02-21 19:49:22 UTC
Patch(es) available on kernel-2.6.32-235.el6

Comment 10 FuXiangChun 2012-05-18 02:01:16 UTC
Reproduced this issue with kernel 2.6.32-220.el6.x86_64.

Steps to reproduce:
1. Boot a guest:
 /usr/libexec/qemu-kvm -M rhel6.2.0 -m 2048 -smp 1,sockets=1,cores=1,threads=1,maxcpus=6 -enable-kvm -uuid 4541c99e-efbe-4624-beb0-13ca5193fc79 -k en-us -drive file=/home/RHEL-Server-6.2-64.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,serial=koTUXQrb,cache=none,werror=stop,rerror=stop,aio=native -device ide-drive,bus=ide.0,unit=1,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -net none -monitor stdio -vnc :1 -serial unix:/home/unix.socket,server,nowait

2. Hot-plug a vcpu via the monitor:
(qemu) cpu_set 1 online

3. Check the guest.

Actual result:
 the guest hangs:
(qemu) info status
VM status: paused


Verified the fix with kernel 2.6.32-270.el6.x86_64.

Repeated steps 1, 2 and 3.

Actual result:
the guest works well, and vcpu hotplug succeeds.

So this bug is fixed.

Comment 11 Chao Yang 2012-05-18 02:29:54 UTC
Moving to VERIFIED as per Comment #10

Comment 12 Igor Mammedov 2012-05-23 21:31:12 UTC
*** Bug 799180 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2012-06-20 08:23:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html

Comment 15 Igor Mammedov 2012-06-22 09:08:23 UTC
*** Bug 832946 has been marked as a duplicate of this bug. ***

Comment 16 Igor Mammedov 2012-06-25 09:06:43 UTC
*** Bug 831899 has been marked as a duplicate of this bug. ***

Comment 17 Igor Mammedov 2013-06-12 11:31:06 UTC
*** Bug 970968 has been marked as a duplicate of this bug. ***