Backport

Gitweb: http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fcb43042ef55d2f46b0efa5d7746967cef38f056
Commit: fcb43042ef55d2f46b0efa5d7746967cef38f056
Parent: 0b1faeef5f9243bb5fc5713a34bbf1ceab0de562
Author: Zhang, Yanmin <yanmin_zhang.com>
AuthorDate: Tue Jun 24 16:06:23 2008 +0800
Committer: Ingo Molnar <mingo>
CommitDate: Mon Jun 30 13:15:43 2008 +0200

x86: fix cpu hotplug crash

Vegard Nossum reported crashes during cpu hotplug tests:

http://marc.info/?l=linux-kernel&m=121413950227884&w=4

In function _cpu_up, the panic happens when calling __raw_notifier_call_chain at the second time. Kernel doesn't panic when calling it at the first time. If just say because of nr_cpu_ids, that's not right.

By checking the source code, I found that function do_boot_cpu is the culprit. Consider below call chain:

_cpu_up=>__cpu_up=>smp_ops.cpu_up=>native_cpu_up=>do_boot_cpu.

So do_boot_cpu is called in the end. In do_boot_cpu, if boot_error==true, cpu_clear(cpu, cpu_possible_map) is executed. So later on, when _cpu_up calls __raw_notifier_call_chain at the second time to report CPU_UP_CANCELED, because this cpu is already cleared from cpu_possible_map, get_cpu_sysdev returns NULL.

Many resources are related to cpu_possible_map, so it's better not to change it.

Below patch against 2.6.26-rc7 fixes it by removing the bit clearing in cpu_possible_map.

Signed-off-by: Zhang Yanmin <yanmin_zhang.com>
Tested-by: Vegard Nossum <vegard.nossum>
Acked-by: Rusty Russell <rusty.au>
Signed-off-by: Ingo Molnar <mingo>
Created attachment 311839 [details] Upstream fix for this issue
I tried virtual cpu removal and addition and it seems to work. However, if you remove cpu A, then B, then re-add A, then B ... you get a nasty panic very similar to what is described in the link above. That is, I can reproduce this issue in RHEL5. The patch above seems to make everything happy. P.
This request was evaluated by Red Hat Product Management for inclusion, but this component is not scheduled to be updated in the current Red Hat Enterprise Linux release. If you would like this request to be reviewed for the next minor release, ask your support representative to set the next rhel-x.y flag to "?".
Unfortunately the previous automated notification about the non-inclusion of this request in Red Hat Enterprise Linux 5.3 used the wrong text template. It should have read: this request has been reviewed by Product Management and is not planned for inclusion in the current minor release of Red Hat Enterprise Linux. If you would like this request to be reviewed for the next minor release, ask your support representative to set the next rhel-x.y flag to "?" or raise an exception.