Red Hat Bugzilla – Attachment 870935 Details for Bug 968147
enable online multiple hot-added CPUs cause RHEL7.0 guest hang(soft lockup)
[patch] [RHEL 7.0 PATCH] abort secondary CPU bring-up gracefully if do_boot_cpu timed out on cpu_callin_mask
0001-abort-secondary-CPU-bring-up-gracefully-if-do_boot_c.patch (text/plain), 7.16 KB, created by Igor Mammedov on 2014-03-05 12:34:09 UTC
From 04ecaf460b5f40783c93cd83bae68c7a4b44d0c8 Mon Sep 17 00:00:00 2001
From: Igor Mammedov <imammedo@redhat.com>
Date: Thu, 27 Feb 2014 17:12:48 +0100
Subject: [RHEL 7.0 PATCH] abort secondary CPU bring-up gracefully if do_boot_cpu timed out on cpu_callin_mask

Brew: https://brewweb.devel.redhat.com/taskinfo?taskID=7122768
Bugzilla: 968147
Test status:
  boot tested and passed overnight cpu online/offline test on
  overcommitted host
Upstream: n/a, RHEL only path
Forward-port of RHEL6 only path: ccda2747245997d79a3e110cfc72de388fb9361c

Master CPU may timeout before cpu_callin_mask is set and cancel
booting CPU, but being onlined CPU still continues to boot, sets
cpu_active_mask (CPU_STARTING notifiers) and spins in
check_tsc_sync_target() for master cpu to arrive. Following attempt
to online another cpu hangs in stop_machine, initiated from here:

smp_callin ->
  smp_store_cpu_info ->
    identify_secondary_cpu ->
      mtrr_ap_init -> set_mtrr_from_inactive_cpu

stop_machine waits on completion of stop_work on all CPUs from
cpu_active_mask including a failed CPU that spins in check_tsc_sync_target().

Issue is fixed if being onlined CPU continues to boot and calls
notify_cpu_starting(cpuid) only when master CPU waits for it to
come online. If master CPU times out on cpu_callin_mask and goes via
cancel path, the being onlined CPU should gracefully shutdown itself.

Patch introduces cpu_may_complete_boot_mask to notify a being onlined
CPU that it may call notify_cpu_starting(cpuid) and continue to boot
when master CPU goes via normal boot path and going to wait till the
being onlined CPU completes its initialization.

- normal boot sequence will look like:
  master CPU1                          being onlined CPU2

  * wait for CPU2 in cpu_callin_mask
---------------------------------------------------------------------
                                       * set CPU2 in cpu_callin_mask
                                       * wait till CPU1 set CPU2 bit
                                         in cpu_may_complete_boot_mask
---------------------------------------------------------------------
  * set CPU2 bit in
    cpu_may_complete_boot_mask
  * return from do_boot_cpu() and
    wait in
      - check_tsc_sync_source() or
      - while (!cpu_online(CPU2))
---------------------------------------------------------------------
                                       * call notify_cpu_starting()
                                         and continue CPU2 initialization
                                       * mark itself as ONLINE
---------------------------------------------------------------------
  * return to _cpu_up and call
    cpu_notify(CPU_ONLINE, ...)

- cancel/error path will look like:
  master CPU1                          being onlined CPU2

  * time out on cpu_callin_mask
---------------------------------------------------------------------
                                       * set CPU2 in cpu_callin_mask
                                       * wait till CPU2 is set in
                                         cpu_may_complete_boot_mask or
                                         cleared in cpu_callout_mask
---------------------------------------------------------------------
  * clear CPU2 in cpu_callout_mask
    and return with error
---------------------------------------------------------------------
                                       * do cleanups and play_dead()

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 arch/x86/include/asm/cpumask.h |  1 +
 arch/x86/kernel/cpu/common.c   |  2 ++
 arch/x86/kernel/smpboot.c      | 37 +++++++++++++++++++++++++++++++++++--
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpumask.h b/arch/x86/include/asm/cpumask.h
index 61c852f..eacd269 100644
--- a/arch/x86/include/asm/cpumask.h
+++ b/arch/x86/include/asm/cpumask.h
@@ -7,6 +7,7 @@ extern cpumask_var_t cpu_callin_mask;
 extern cpumask_var_t cpu_callout_mask;
 extern cpumask_var_t cpu_initialized_mask;
 extern cpumask_var_t cpu_sibling_setup_mask;
+extern cpumask_var_t cpu_may_complete_boot_mask;
 
 extern void setup_cpu_local_masks(void);
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d9c8c09..ef31dbe 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -50,6 +50,7 @@
 cpumask_var_t cpu_initialized_mask;
 cpumask_var_t cpu_callout_mask;
 cpumask_var_t cpu_callin_mask;
+cpumask_var_t cpu_may_complete_boot_mask;
 
 /* representing cpus for which sibling maps can be computed */
 cpumask_var_t cpu_sibling_setup_mask;
@@ -61,6 +62,7 @@ void __init setup_cpu_local_masks(void)
 	alloc_bootmem_cpumask_var(&cpu_callin_mask);
 	alloc_bootmem_cpumask_var(&cpu_callout_mask);
 	alloc_bootmem_cpumask_var(&cpu_sibling_setup_mask);
+	alloc_bootmem_cpumask_var(&cpu_may_complete_boot_mask);
 }
 
 static void __cpuinit default_init(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2053607..eafd369 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -128,6 +128,8 @@ EXPORT_PER_CPU_SYMBOL(rh_cpu_info);
 
 atomic_t init_deasserted;
 
+static void remove_siblinginfo(int cpu);
+
 /*
  * Report back to the Boot Processor during boot time or to the caller processor
  * during CPU online.
@@ -226,12 +228,38 @@ static void __cpuinit smp_callin(void)
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
 
-	notify_cpu_starting(cpuid);
-
 	/*
 	 * Allow the master to continue.
 	 */
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
+
+	/*
+	 * Wait for signal from master CPU to continue or abort.
+	 */
+	while (!cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask) &&
+	       cpumask_test_cpu(cpuid, cpu_callout_mask)) {
+		cpu_relax();
+	}
+
+	/* die if master cancelled cpu_up */
+	if (!cpumask_test_cpu(cpuid, cpu_may_complete_boot_mask))
+		goto die;
+
+	notify_cpu_starting(cpuid);
+	return;
+
+die:
+#ifdef CONFIG_HOTPLUG_CPU
+	/* was set by smp_store_cpu_info->...->numa_add_cpu */
+	numa_remove_cpu(cpuid);
+	remove_siblinginfo(cpuid);
+	clear_local_APIC();
+	/* was set by cpu_init() */
+	cpumask_clear_cpu(cpuid, cpu_initialized_mask);
+	cpumask_clear_cpu(cpuid, cpu_callin_mask);
+	play_dead();
+#endif
+	panic("%s: Failed to online CPU%d!\n", __func__, cpuid);
 }
 
 static int cpu0_logical_apicid;
@@ -826,6 +854,8 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	}
 
 	if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
+		/* Signal AP that it may continue to boot */
+		cpumask_set_cpu(cpu, cpu_may_complete_boot_mask);
 		print_cpu_msr(&cpu_data(cpu));
 		pr_debug("CPU%d: has booted.\n", cpu);
 	} else {
@@ -1084,6 +1114,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	smp_store_boot_cpu_info(); /* Final full version of the data */
 	cpumask_copy(cpu_callin_mask, cpumask_of(0));
+	cpumask_copy(cpu_may_complete_boot_mask, cpumask_of(0));
 	mb();
 
 	current_thread_info()->cpu = 0; /* needed? */
@@ -1293,6 +1324,8 @@ static void __ref remove_cpu_from_maps(int cpu)
 	cpumask_clear_cpu(cpu, cpu_callin_mask);
 	/* was set by cpu_init() */
 	cpumask_clear_cpu(cpu, cpu_initialized_mask);
+	/* set by do_boot_cpu() */
+	cpumask_clear_cpu(cpu, cpu_may_complete_boot_mask);
 	numa_remove_cpu(cpu);
 }
 
-- 
1.7.1
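The handshake that the patch adds between do_boot_cpu() on the master and smp_callin() on the being onlined CPU can be modelled in user space. What follows is a minimal sketch under stated assumptions, not kernel code. Two POSIX threads stand in for the master CPU and the AP. One atomic_bool each stands in for the CPU2 bit of cpu_callout_mask, cpu_callin_mask and cpu_may_complete_boot_mask. The names struct handshake, struct cpu_up_result, ap_cpu(), master_cpu(), simulate_cpu_up() and sleep_ms() are hypothetical, as are the delay and timeout parameters. TSC synchronization, the CPU notifiers and the real cleanup done before play_dead() are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the CPU2 bit of each kernel cpumask. */
struct handshake {
    atomic_bool callout;      /* cpu_callout_mask */
    atomic_bool callin;       /* cpu_callin_mask */
    atomic_bool may_complete; /* cpu_may_complete_boot_mask */
    int ap_delay_ms;          /* how late the AP reaches smp_callin() (simulated) */
    bool ap_online;           /* AP got as far as notify_cpu_starting() */
};

struct cpu_up_result {
    bool master_ok; /* do_boot_cpu() took the success branch */
    bool ap_online; /* AP continued booting instead of dying */
};

static void sleep_ms(int ms)
{
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Models the patched tail of smp_callin() on the being onlined CPU. */
static void *ap_cpu(void *arg)
{
    struct handshake *h = arg;

    sleep_ms(h->ap_delay_ms);       /* e.g. vCPU not scheduled on an overcommitted host */
    atomic_store(&h->callin, true); /* "Allow the master to continue." */

    /* Wait until the master either lets us finish or cancels us. */
    while (!atomic_load(&h->may_complete) && atomic_load(&h->callout))
        sched_yield();              /* cpu_relax() in the kernel */

    if (!atomic_load(&h->may_complete)) {
        atomic_store(&h->callin, false); /* cleanup before play_dead() */
        h->ap_online = false;
        return NULL;
    }
    h->ap_online = true;            /* notify_cpu_starting() would run here */
    return NULL;
}

/* Models do_boot_cpu() on the master; the timeout is approximate (1 ms polls). */
static bool master_cpu(struct handshake *h, int timeout_ms)
{
    for (int waited = 0; !atomic_load(&h->callin) && waited < timeout_ms; waited++)
        sleep_ms(1);

    if (atomic_load(&h->callin)) {
        atomic_store(&h->may_complete, true); /* "Signal AP that it may continue to boot" */
        return true;
    }
    atomic_store(&h->callout, false);         /* cancel path: return with error */
    return false;
}

/* Runs one simulated cpu_up() and reports what each side concluded. */
struct cpu_up_result simulate_cpu_up(int ap_delay_ms, int master_timeout_ms)
{
    struct handshake h;
    struct cpu_up_result r;
    pthread_t ap;

    atomic_init(&h.callout, true);  /* master has already called out to CPU2 */
    atomic_init(&h.callin, false);
    atomic_init(&h.may_complete, false);
    h.ap_delay_ms = ap_delay_ms;
    h.ap_online = false;

    int rc = pthread_create(&ap, NULL, ap_cpu, &h);
    assert(rc == 0);
    r.master_ok = master_cpu(&h, master_timeout_ms);
    rc = pthread_join(ap, NULL);
    assert(rc == 0);
    r.ap_online = h.ap_online;
    return r;
}
```

Calling simulate_cpu_up(0, 2000) should report success on both sides. simulate_cpu_up(500, 10) models a vCPU that reaches smp_callin() only after the master has given up, and it should report failure on both sides. The property the sketch is meant to show is that master_ok and ap_online agree whichever way the timing falls, because the AP no longer commits to booting until the master has decided. Before the patch, notify_cpu_starting() ran before cpu_callin_mask was set, so a late AP had already entered cpu_active_mask by the time the master cancelled it.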
Attachments on bug 968147: 754159 | 870935 | 936529 | 938465 | 938466 | 938468 | 938469