Bug 455436 - x86: fix cpu hotplug crash
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assigned To: Prarit Bhargava
QA Contact: Martin Jenner
Depends On:
Blocks: 455409
Reported: 2008-07-15 10:21 EDT by Prarit Bhargava
Modified: 2008-10-28 08:58 EDT

Doc Type: Bug Fix
Last Closed: 2008-10-28 08:58:05 EDT
Attachments
Upstream fix for this issue (4.91 KB, patch)
2008-07-15 10:21 EDT, Prarit Bhargava

Description Prarit Bhargava 2008-07-15 10:21:10 EDT
Backport of the following upstream commit:

Gitweb:    
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fcb43042ef55d2f46b0efa5d7746967cef38f056
Commit:     fcb43042ef55d2f46b0efa5d7746967cef38f056
Parent:     0b1faeef5f9243bb5fc5713a34bbf1ceab0de562
Author:     Zhang, Yanmin <yanmin_zhang@linux.intel.com>
AuthorDate: Tue Jun 24 16:06:23 2008 +0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon Jun 30 13:15:43 2008 +0200

   x86: fix cpu hotplug crash
    
    Vegard Nossum reported crashes during cpu hotplug tests:
    
      http://marc.info/?l=linux-kernel&m=121413950227884&w=4
    
    In function _cpu_up, the panic happens on the second call to
    __raw_notifier_call_chain. The kernel doesn't panic on the first
    call. So attributing the crash simply to nr_cpu_ids is not right.
    
    Checking the source code, I found that the function do_boot_cpu is the
    culprit. Consider the following call chain:
     _cpu_up => __cpu_up => smp_ops.cpu_up => native_cpu_up => do_boot_cpu.
    
    So do_boot_cpu is called at the end. In do_boot_cpu, if boot_error is
    true, cpu_clear(cpu, cpu_possible_map) is executed. Later, when _cpu_up
    calls __raw_notifier_call_chain the second time to report
    CPU_UP_CANCELED, get_cpu_sysdev returns NULL because this cpu has
    already been cleared from cpu_possible_map.
    
    Many resources are related to cpu_possible_map, so it's better not to
    change it.
    
    The patch below, against 2.6.26-rc7, fixes this by no longer clearing
    the bit in cpu_possible_map.
    
    Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
    Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
    Acked-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
Comment 1 Prarit Bhargava 2008-07-15 10:21:10 EDT
Created attachment 311839
Upstream fix for this issue
Comment 2 Prarit Bhargava 2008-07-23 07:29:42 EDT
I tried virtual cpu removal and addition, and it seems to work.  However, if you
remove cpu A, then B, then re-add A, then B, you get a nasty panic very
similar to the one described in the link above.

That is, I can reproduce this issue in RHEL5.

The patch above appears to fix it.

P.
Comment 3 RHEL Product and Program Management 2008-07-25 13:01:53 EDT
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 4 Ludek Smid 2008-07-25 17:53:51 EDT
Unfortunately the previous automated notification about the
non-inclusion of this request in Red Hat Enterprise Linux 5.3 used
the wrong text template. It should have read: this request has been
reviewed by Product Management and is not planned for inclusion
in the current minor release of Red Hat Enterprise Linux.

If you would like this request to be reviewed for the next minor
release, ask your support representative to set the next rhel-x.y
flag to "?" or raise an exception.
