Andrey Vagin from the Parallels Linux kernel team found an issue with incorrect APIC initialization on nodes with more than 8 CPUs when CONFIG_HOTPLUG_CPU is disabled.

The current RHEL5 kernel (2.6.18-238.1.1) contains the following code:

arch/x86_64/kernel/genapic.c

void __init clustered_apic_check(void)
{
...
#ifdef CONFIG_HOTPLUG_CPU
...
	genapic = &apic_physflat;
...
	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
		for (i = 0; i < NR_CPUS; i++) {
			if (bios_cpu_apicid[i] == BAD_APICID)
				continue;
			if (bios_cpu_apicid[i] > max_apic)
				max_apic = bios_cpu_apicid[i];
		}

		if (max_apic <= 8)
			genapic = &apic_flat;
	}
#else
	genapic = &apic_flat;
#endif

This means that if the kernel is compiled without CONFIG_HOTPLUG_CPU, a node with more than 8 CPUs will try to use "apic_flat", which won't work.

This is a real issue: we experienced a 64-CPU node hanging on boot (the last line in the log was "Jan 25 14:28:35 test-server Brought up 64 CPUs").

The change was introduced by the following patch:
linux-2.6-x86_64-unify-apic-mapping-code.patch

NOTE 1: The stock Red Hat 2.6.18-238.1.1 kernel is not affected because it is built with the CONFIG_HOTPLUG_CPU config option enabled.

NOTE 2: In the current Red Hat code, apic_flat is used only on AMD nodes with fewer than 8 CPUs. Why must the node be AMD? Mainline code (2.6.37) does not distinguish AMD nodes from Intel ones in this case:

./arch/x86/kernel/apic/probe_64.c

/*
 * Check the APIC IDs in bios_cpu_apicid and choose the APIC mode.
 */
void __init default_setup_apic_routing(void)
...
	if (apic == &apic_flat && num_possible_cpus() > 8)
		apic = &apic_physflat;
...

I wonder whether you have any issues with Intel nodes with fewer than 8 CPUs using apic_flat?

Thank you.
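For readers wondering why "apic_flat" cannot work past 8 CPUs: as far as I understand it, flat logical mode gives each CPU one bit of an 8-bit logical APIC ID, so there are only 8 distinct IDs to hand out. The following is a minimal user-space sketch of that limit, not kernel code. The helper names (flat_mode_can_address, flat_logical_id, flat_unaddressable_cpus) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flat logical mode: one bit per CPU in an 8-bit logical APIC ID. */
#define FLAT_LOGICAL_ID_BITS 8

/* Hypothetical helper: can this CPU number get its own bit in flat mode? */
static bool flat_mode_can_address(unsigned int cpu)
{
	return cpu < FLAT_LOGICAL_ID_BITS;
}

/* Hypothetical helper: the one-bit logical ID a CPU would get in flat mode. */
static uint8_t flat_logical_id(unsigned int cpu)
{
	/* CPU 8 and above would need a ninth bit, which does not exist. */
	assert(flat_mode_can_address(cpu));
	return (uint8_t)(1u << cpu);
}

/* Hypothetical helper: how many of nr_cpus CPUs are left without an ID. */
static unsigned int flat_unaddressable_cpus(unsigned int nr_cpus)
{
	return nr_cpus > FLAT_LOGICAL_ID_BITS ? nr_cpus - FLAT_LOGICAL_ID_BITS : 0;
}
```

Under this model, a 64-CPU node like the one in the report would leave 56 CPUs without a usable flat-mode ID, which is why such a node needs physical flat mode instead.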
> NOTE 1: The stock Red Hat 2.6.18-238.1.1 kernel is not affected because it is
> built with the CONFIG_HOTPLUG_CPU config option enabled.
>

Right -- we have not tested the !CONFIG_HOTPLUG_CPU case.

> NOTE 2: In the current Red Hat code, apic_flat is used only on AMD nodes
> with fewer than 8 CPUs. Why must the node be AMD? Mainline code (2.6.37) does
> not distinguish AMD nodes from Intel ones in this case:
> ./arch/x86/kernel/apic/probe_64.c
> /*
>  * Check the APIC IDs in bios_cpu_apicid and choose the APIC mode.
>  */
> void __init default_setup_apic_routing(void)
> ...
> 	if (apic == &apic_flat && num_possible_cpus() > 8)
> 		apic = &apic_physflat;
> ...
>
> I wonder whether you have any issues with Intel nodes with fewer than 8 CPUs
> using apic_flat?

None that I know of.

P.

>
> Thank you.
> > I wonder whether you have any issues with Intel nodes with fewer than 8 CPUs
> > using apic_flat?
>
> None that I know of.

Thank you for handling this. As there are no issues with Intel nodes with fewer than 8 CPUs using apic_flat, maybe it would be better to let Intel nodes also benefit from apic_flat performance, as is done in mainline nowadays?
Are you going to make this change?
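To make the difference between the two selection rules discussed above concrete, here is a rough user-space model of them. It is a sketch under stated assumptions, not the kernel's code: the enum and both function names are hypothetical, the RHEL5-style rule is transcribed from the snippet quoted in the original report, and the mainline-style rule assumes that flat mode is the starting choice before the quoted check runs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for &apic_flat and &apic_physflat. */
enum apic_mode { APIC_FLAT, APIC_PHYSFLAT };

/*
 * RHEL5-style rule, modeled on the quoted clustered_apic_check():
 * without CONFIG_HOTPLUG_CPU, flat mode is chosen unconditionally.
 */
static enum apic_mode choose_mode_rhel5_style(bool hotplug_cpu, bool is_amd,
					      unsigned int max_apic_id)
{
	if (!hotplug_cpu)
		return APIC_FLAT;	/* the case reported in this bug */
	if (is_amd && max_apic_id <= 8)
		return APIC_FLAT;
	return APIC_PHYSFLAT;
}

/*
 * Mainline-style rule, modeled on the quoted check in
 * default_setup_apic_routing(): vendor-independent, and flat mode is
 * kept only when there are at most 8 possible CPUs.
 * Assumption: flat mode is the default before this check.
 */
static enum apic_mode choose_mode_mainline_style(unsigned int possible_cpus)
{
	return possible_cpus > 8 ? APIC_PHYSFLAT : APIC_FLAT;
}
```

In this model, a 64-CPU node built without CONFIG_HOTPLUG_CPU ends up in flat mode under the RHEL5-style rule but in physical flat mode under the mainline-style rule, while a small Intel node gets flat mode only under the mainline-style rule.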
(In reply to comment #2)
> > > I wonder whether you have any issues with Intel nodes with fewer than 8 CPUs
> > > using apic_flat?
> >
> > None that I know of.
>
> Thank you for handling this. As there are no issues with Intel nodes with
> fewer than 8 CPUs using apic_flat, maybe it would be better to let Intel nodes
> also benefit from apic_flat performance, as is done in mainline nowadays?
> Are you going to make this change?

It is unlikely that we will change to apic_flat unless we see a significant bug against physical flat. Most of our customers run with apic flat and have been doing so for some time without issue.

P.