Bug 455436

Summary: x86: fix cpu hotplug crash
Product: Red Hat Enterprise Linux 5
Reporter: Prarit Bhargava <prarit>
Component: kernel
Assignee: Prarit Bhargava <prarit>
Status: CLOSED WONTFIX
QA Contact: Martin Jenner <mjenner>
Severity: low
Priority: low
Version: 5.3
CC: dzickus
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-10-28 12:58:05 UTC
Bug Blocks: 455409
Attachments:
  Upstream fix for this issue (flags: none)

Description Prarit Bhargava 2008-07-15 14:21:10 UTC
Backport the following upstream commit:

Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fcb43042ef55d2f46b0efa5d7746967cef38f056
Commit:     fcb43042ef55d2f46b0efa5d7746967cef38f056
Parent:     0b1faeef5f9243bb5fc5713a34bbf1ceab0de562
Author:     Zhang, Yanmin <yanmin_zhang.com>
AuthorDate: Tue Jun 24 16:06:23 2008 +0800
Committer:  Ingo Molnar <mingo>
CommitDate: Mon Jun 30 13:15:43 2008 +0200

    x86: fix cpu hotplug crash
    
    Vegard Nossum reported crashes during cpu hotplug tests:
    
      http://marc.info/?l=linux-kernel&m=121413950227884&w=4
    
    In function _cpu_up, the panic happens on the second call to
    __raw_notifier_call_chain; the kernel does not panic on the first
    call. Attributing the crash solely to nr_cpu_ids is therefore not
    right.
    
    Reading the source code, I found that function do_boot_cpu is the culprit.
    Consider the call chain below:
     _cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.
    
    So do_boot_cpu is called last. In do_boot_cpu, if boot_error is
    true, cpu_clear(cpu, cpu_possible_map) is executed. Later, when
    _cpu_up calls __raw_notifier_call_chain a second time to report
    CPU_UP_CANCELED, get_cpu_sysdev returns NULL because this cpu has
    already been cleared from cpu_possible_map.
    
    Many resources are related to cpu_possible_map, so it's better not to
    change it.
    
    The patch below, against 2.6.26-rc7, fixes it by removing the bit
    clearing in cpu_possible_map.
    
    Signed-off-by: Zhang Yanmin <yanmin_zhang.com>
    Tested-by: Vegard Nossum <vegard.nossum>
    Acked-by: Rusty Russell <rusty.au>
    Signed-off-by: Ingo Molnar <mingo>
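
The failure sequence described in the commit message can be modelled in user space. The sketch below is not kernel code and is not the attached patch: the model_* functions are hypothetical names, a plain bitmask stands in for cpu_possible_map, NR_CPUS is fixed at 8, and the boot is assumed to fail every time so that the CPU_UP_CANCELED path is always taken. The clear_possible_on_error flag selects between the old behaviour (clear the bit) and the behaviour after the fix (leave the map alone).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 8 /* assumption: small fixed CPU count for the model */

/* Toy stand-in for the kernel's per-cpu sysdev object. */
struct sys_device { int id; };

static unsigned long cpu_possible_map;             /* one bit per possible CPU */
static struct sys_device cpu_sys_devices[NR_CPUS]; /* one device per CPU */

/* Mark every CPU as possible and give each one a device. */
void model_init(void)
{
    cpu_possible_map = (1UL << NR_CPUS) - 1;
    for (int i = 0; i < NR_CPUS; i++)
        cpu_sys_devices[i].id = i;
}

/* Mirrors the behaviour described above: NULL for a CPU not marked possible. */
struct sys_device *model_get_cpu_sysdev(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    if (!(cpu_possible_map & (1UL << cpu)))
        return NULL;
    return &cpu_sys_devices[cpu];
}

/* Models a failed boot. The old code cleared the possible bit on error. */
int model_do_boot_cpu(unsigned int cpu, bool clear_possible_on_error)
{
    bool boot_error = true; /* assumption: the boot always fails here */

    if (boot_error && clear_possible_on_error)
        cpu_possible_map &= ~(1UL << cpu);
    return boot_error ? -1 : 0;
}

/*
 * Models the tail of _cpu_up: after a failed boot, the second notifier
 * call (CPU_UP_CANCELED) looks up the sysdev. Returns true when that
 * lookup yields a valid device, false when it would yield NULL.
 */
bool model_cpu_up_cancel_is_safe(unsigned int cpu, bool clear_possible_on_error)
{
    model_init();
    if (model_do_boot_cpu(cpu, clear_possible_on_error) == 0)
        return true;
    return model_get_cpu_sysdev(cpu) != NULL;
}
```

Calling model_cpu_up_cancel_is_safe(3, true) models the old code path, where the cancel notifier is handed a NULL device; passing false models the behaviour once the bit clearing is removed.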

Comment 1 Prarit Bhargava 2008-07-15 14:21:10 UTC
Created attachment 311839
Upstream fix for this issue

Comment 2 Prarit Bhargava 2008-07-23 11:29:42 UTC
I tried virtual cpu removal and addition and it seems to work.  However, if you
remove cpu A, then B, then re-add A, then B, you get a nasty panic very
similar to the one described in the link above.

i.e., I can reproduce this issue in RHEL5.

The patch above seems to make everything happy.

P.
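
The comment does not say how the CPUs were removed and re-added. As a rough sketch only, the sequence "remove A, remove B, re-add A, re-add B" could be driven through the standard sysfs files /sys/devices/system/cpu/cpuN/online, assuming the guest exposes them and the caller is root. The struct and function names below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One hotplug action: take a CPU offline (0) or bring it online (1). */
struct hotplug_step {
    unsigned int cpu;
    int online;
};

/* Fill steps[0..3] with: offline A, offline B, online A, online B. */
void build_hotplug_sequence(unsigned int a, unsigned int b,
                            struct hotplug_step steps[4])
{
    assert(a != b);
    steps[0] = (struct hotplug_step){ a, 0 };
    steps[1] = (struct hotplug_step){ b, 0 };
    steps[2] = (struct hotplug_step){ a, 1 };
    steps[3] = (struct hotplug_step){ b, 1 };
}

/* Format the sysfs control path; returns 0 on success, -1 if it does not fit. */
int cpu_online_path(unsigned int cpu, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Apply one step by writing "0" or "1" to the control file; 0 on success. */
int apply_hotplug_step(const struct hotplug_step *step)
{
    char path[64];
    FILE *f;
    int ok;

    if (cpu_online_path(step->cpu, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(step->online ? "1" : "0", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would build the four steps for two CPU numbers of its choice and pass each one to apply_hotplug_step in order, stopping at the first failure. On an unpatched kernel this may panic the machine, so it belongs on a test system only.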

Comment 3 RHEL Program Management 2008-07-25 17:01:53 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 4 Ludek Smid 2008-07-25 21:53:51 UTC
Unfortunately the previous automated notification about the
non-inclusion of this request in Red Hat Enterprise Linux 5.3 used
the wrong text template. It should have read: this request has been
reviewed by Product Management and is not planned for inclusion
in the current minor release of Red Hat Enterprise Linux.

If you would like this request to be reviewed for the next minor
release, ask your support representative to set the next rhel-x.y
flag to "?" or raise an exception.