Bug 539684

Summary: EXPERIMENTAL EX/MC: Fix Xen NUMA [rhel-5.4.z]
Product: Red Hat Enterprise Linux 5 Reporter: Benjamin Kahn <bkahn>
Component: kernel-xen Assignee: Jiri Pirko <jpirko>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.4 CC: anton, berrange, bnagendr, clalance, dhoward, dzickus, emcnabb, lcm, pbonzini, pm-eus, rdoty, rkhan, xen-maint, yugzhang
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.18-164.7.1.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-12-15 17:20:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 526051    
Bug Blocks:    

Description Benjamin Kahn 2009-11-20 20:19:07 UTC
This bug has been copied from bug #526051 and has been proposed
to be backported to 5.4 z-stream (EUS).

Comment 2 Anton Arapov 2009-11-21 21:45:39 UTC
in kernel-2.6.18-164.7.1.el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.

Comment 3 Igor Zhang 2009-11-24 03:27:45 UTC
From Comment #8 in
https://bugzilla.redhat.com/show_bug.cgi?id=526051
we know that this bug cannot be reproduced every time. I tried testing it on amd-dinar-04.lab.bos.redhat.com (x86 dom0, numa=on, K10 platform) and could not reproduce it.

Comment 4 Chris Lalancette 2009-11-24 07:46:19 UTC
(In reply to comment #3)
> From Comment #8 in
> https://bugzilla.redhat.com/show_bug.cgi?id=526051
> we know that this bug cannot be reproduced every time. I tried testing it on
> amd-dinar-04.lab.bos.redhat.com (x86 dom0, numa=on, K10 platform) and could not
> reproduce it.

Hm, I'm not sure we should give up so easily.  I'm under the impression that with the right hardware, this can be reproduced every single time.  Bhavna, what hardware do we need to be able to reproduce this and then confirm the fix?

Chris Lalancette

Comment 6 Igor Zhang 2009-11-30 09:32:26 UTC
(In reply to comment #4)

Borrowing comment #5 from BZ526051:
> Ah, cool, good stuff.  Interestingly, I booted my Barcelona with exactly the
> same parameters this morning (i386 dom0, numa=on), and it booted just fine. 
> Maybe it has to do with how the memory is laid out, I'm not sure.  In any case,
> good that we have the fix now.
> 
> Chris Lalancette  

Just FYI, numa support isn't configured into the i386 brew kernel by default.

Comment 7 Igor Zhang 2009-11-30 09:59:15 UTC
(In reply to comment #3)
> From Comment #8 in
> https://bugzilla.redhat.com/show_bug.cgi?id=526051
> we know that this bug cannot be reproduced every time. I tried testing it on
> amd-dinar-04.lab.bos.redhat.com (x86 dom0, numa=on, K10 platform) and could not
> reproduce it.

I think I tested incorrectly on that machine (amd-dinar-04.lab.bos.redhat.com) because I knew little about NUMA. Today I retested on the same machine and reproduced the bug with kernel-xen-2.6.18-164.6.1.el5.x86_64.rpm.

Comment 8 Chris Lalancette 2009-11-30 10:07:39 UTC
(In reply to comment #7)
> (In reply to comment #3)
> > From Comment #8 in
> > https://bugzilla.redhat.com/show_bug.cgi?id=526051
> > we know that this bug cannot be reproduced every time. I tried testing it on
> > amd-dinar-04.lab.bos.redhat.com (x86 dom0, numa=on, K10 platform) and could not
> > reproduce it.
> 
> I think I tested incorrectly on that machine (amd-dinar-04.lab.bos.redhat.com)
> because I knew little about NUMA. Today I retested on the same machine and
> reproduced the bug with kernel-xen-2.6.18-164.6.1.el5.x86_64.rpm.

OK, great!  Just so you know...

(In reply to comment #6)
> (In reply to comment #4)
> 
> Borrowing comment #5 from BZ526051:
> > Ah, cool, good stuff.  Interestingly, I booted my Barcelona with exactly the
> > same parameters this morning (i386 dom0, numa=on), and it booted just fine. 
> > Maybe it has to do with how the memory is laid out, I'm not sure.  In any case,
> > good that we have the fix now.
> > 
> > Chris Lalancette  
> 
> Just FYI, numa support isn't configured into the i386 brew kernel by default.  

NUMA support *is* configured into the i386 Xen hypervisor, however.

Chris Lalancette

Comment 9 Igor Zhang 2009-11-30 10:37:15 UTC
Someone checked the kernel config options for us, and they showed:
<caiqian> kernel-2.6.18-i686-PAE.config:# CONFIG_X86_NUMAQ is not set
<caiqian> kernel-2.6.18-i686-PAE.config:# CONFIG_NUMA is not set
<caiqian> kernel-2.6.18-i686-PAE.config:CONFIG_ACPI_NUMA=y
<caiqian> kernel-2.6.18-i686-xen.config:# CONFIG_X86_NUMAQ is not set
<caiqian> kernel-2.6.18-i686-xen.config:# CONFIG_NUMA is not set
<caiqian> kernel-2.6.18-i686-xen.config:CONFIG_ACPI_NUMA=y
<caiqian> kernel-2.6.18-ia64.config:CONFIG_NUMA=y
<caiqian> kernel-2.6.18-ia64.config:CONFIG_ACPI_NUMA=y
...
From that, we can probably guess that NUMA isn't supported in the i386 Xen hypervisor either.
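
For anyone who wants to tabulate grep output like the above instead of reading it by eye, here is a minimal Python sketch. The function name `parse_config_grep` and the regular expression are illustrative assumptions, not part of any Red Hat or Xen tool, and the sample lines are copied from this comment. Note that it only describes the kernel configs listed; it says nothing about how the hypervisor itself was built.

```python
import re

# Sample lines copied from the comment above (the "<caiqian> " chat prefix is kept).
SAMPLE = """\
<caiqian> kernel-2.6.18-i686-PAE.config:# CONFIG_X86_NUMAQ is not set
<caiqian> kernel-2.6.18-i686-PAE.config:# CONFIG_NUMA is not set
<caiqian> kernel-2.6.18-i686-PAE.config:CONFIG_ACPI_NUMA=y
<caiqian> kernel-2.6.18-i686-xen.config:# CONFIG_X86_NUMAQ is not set
<caiqian> kernel-2.6.18-i686-xen.config:# CONFIG_NUMA is not set
<caiqian> kernel-2.6.18-i686-xen.config:CONFIG_ACPI_NUMA=y
<caiqian> kernel-2.6.18-ia64.config:CONFIG_NUMA=y
<caiqian> kernel-2.6.18-ia64.config:CONFIG_ACPI_NUMA=y
...
"""

# Matches "file:# CONFIG_X is not set" or "file:CONFIG_X=value" at the end of a line.
LINE_RE = re.compile(
    r"(?P<file>[^\s:]+):"
    r"(?:# (?P<unset>CONFIG_\w+) is not set|(?P<name>CONFIG_\w+)=(?P<value>\S+))$"
)


def parse_config_grep(text):
    """Return {file: {option: value}}; value is None when the option is not set."""
    result = {}
    for raw in text.splitlines():
        match = LINE_RE.search(raw.strip())
        if not match:
            continue  # skip lines such as "..." that are not config entries
        option = match.group("unset") or match.group("name")
        result.setdefault(match.group("file"), {})[option] = match.group("value")
    return result


if __name__ == "__main__":
    for config_file, options in sorted(parse_config_grep(SAMPLE).items()):
        print(config_file, options)
```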

Comment 10 Daniel Berrangé 2009-11-30 10:47:23 UTC
> Just FYI, numa support isn't configured into the i386 brew kernel by default.  

It does not need to be configured in the kernels. NUMA is managed exclusively by the Xen hypervisor; neither Dom0 nor DomU gets to see any NUMA topology when running on the hypervisor.


> <caiqian> kernel-2.6.18-i686-xen.config:# CONFIG_X86_NUMAQ is not set
> <caiqian> kernel-2.6.18-i686-xen.config:# CONFIG_NUMA is not set
> <caiqian> kernel-2.6.18-i686-xen.config:CONFIG_ACPI_NUMA=y
> <caiqian> kernel-2.6.18-ia64.config:CONFIG_NUMA=y
> <caiqian> kernel-2.6.18-ia64.config:CONFIG_ACPI_NUMA=y
> ...
> From that, we can probably guess that NUMA isn't supported in the i386 Xen
> hypervisor either.

AFAIK, the .config settings have *no* effect on the way the hypervisor is built. The hypervisor has very few compile-time config settings; pretty much everything is on by default all the time.

Comment 11 Igor Zhang 2009-11-30 11:17:05 UTC
Many thanks to Daniel and Chris. I can verify this bug on kernel-2.6.18-164.8.1.el5xen now. After booting successfully, I checked NUMA with the virsh command. It shows:
[root@amd-dinar-04 boot]# virsh  nodeinfo
CPU model:           x86_64
CPU(s):              24
CPU frequency:       2094 MHz
CPU socket(s):       1
Core(s) per socket:  12
Thread(s) per core:  1
NUMA cell(s):        4
Memory size:         7337984 kB

However, I cannot check NUMA in the usual ways.
Is this OK?

Igor.

Comment 12 Chris Lalancette 2009-11-30 11:38:01 UTC
(In reply to comment #11)
> Many thanks to Daniel and Chris. I can verify this bug on
> kernel-2.6.18-164.8.1.el5xen now. After booting successfully, I checked NUMA
> with the virsh command. It shows:
> [root@amd-dinar-04 boot]# virsh  nodeinfo
> CPU model:           x86_64
> CPU(s):              24
> CPU frequency:       2094 MHz
> CPU socket(s):       1
> Core(s) per socket:  12
> Thread(s) per core:  1
> NUMA cell(s):        4
> Memory size:         7337984 kB
> 
> However, I cannot check NUMA in the usual ways.
> Is this OK?

That looks reasonable to me.  The fact that the system did not crash, and that it is showing NUMA cell(s): 4 seems to say it is working.  You can do one final test and get the output from "xm info" which will show a bit more detailed information, but I would say this is VERIFIED.

Chris Lalancette
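
The check described above (the host boots and `virsh nodeinfo` reports more than one NUMA cell) can be scripted. The following is only a sketch: the helper names `parse_nodeinfo` and `numa_cell_count` are hypothetical, and the sample text is the output quoted in this comment rather than a live call to virsh.

```python
# Output of "virsh nodeinfo" as quoted in this bug report.
SAMPLE = """\
CPU model:           x86_64
CPU(s):              24
CPU frequency:       2094 MHz
CPU socket(s):       1
Core(s) per socket:  12
Thread(s) per core:  1
NUMA cell(s):        4
Memory size:         7337984 kB
"""


def parse_nodeinfo(text):
    """Split each 'key: value' line of nodeinfo output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a colon
            info[key.strip()] = value.strip()
    return info


def numa_cell_count(text):
    """Return the number of NUMA cells reported in nodeinfo output."""
    return int(parse_nodeinfo(text)["NUMA cell(s)"])


if __name__ == "__main__":
    cells = numa_cell_count(SAMPLE)
    print("NUMA cells:", cells)
    print("NUMA appears active" if cells > 1 else "single NUMA cell reported")
```

On a live host, the same functions could be fed the captured output of the command instead of the sample string.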

Comment 13 Igor Zhang 2009-11-30 12:53:21 UTC
Yes, it doesn't crash. I just wanted to make sure NUMA works normally once the system boots. The final test shows:
[root@amd-dinar-04 boot]# xm info
host                   : amd-dinar-04.lab.bos.redhat.com
release                : 2.6.18-164.8.1.el5xen
version                : #1 SMP Mon Nov 23 13:10:23 EST 2009
machine                : x86_64
nr_cpus                : 24
nr_nodes               : 4
sockets_per_node       : 0
cores_per_socket       : 12
threads_per_core       : 1
cpu_mhz                : 2094
hw_caps                : 178bfbff:efd3fbff:00000000:00000110:00802009:00000000:000837ff
total_memory           : 7166
free_memory            : 383
node_to_cpu            : node0:0-5
                         node1:6-11
                         node2:18-23
                         node3:12-17
xen_major              : 3
xen_minor              : 1
xen_extra              : .2-164.8.1.el5
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
cc_compile_by          : mockbuild
cc_compile_domain      : redhat.com
cc_compile_date        : Mon Nov 23 13:00:50 EST 2009
xend_config_format     : 2
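
As a rough cross-check of the output above, the sketch below parses the `nr_cpus`, `nr_nodes`, and `node_to_cpu` fields and confirms that every CPU is assigned to exactly one node. The function names are hypothetical, the sample is a trimmed copy of the output in this comment, and the handling of comma-separated CPU ranges is an assumption (the output here only shows a single range per node).

```python
import re

# Trimmed copy of the "xm info" output quoted in this comment.
SAMPLE = """\
nr_cpus                : 24
nr_nodes               : 4
node_to_cpu            : node0:0-5
                         node1:6-11
                         node2:18-23
                         node3:12-17
xen_major              : 3
"""


def parse_xm_info(text):
    """Parse 'key : value' lines; indented lines continue the previous value."""
    info = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and last_key is not None:
            info[last_key] += " " + line.strip()
            continue
        key, sep, value = line.partition(":")
        if sep:
            last_key = key.strip()
            info[last_key] = value.strip()
    return info


def node_to_cpus(info):
    """Expand 'nodeN:a-b' entries into {node number: [cpu, ...]}."""
    mapping = {}
    for node, spec in re.findall(r"node(\d+):(\S+)", info["node_to_cpu"]):
        cpus = []
        for part in spec.split(","):  # assumption: ranges may be comma-separated
            low, _, high = part.partition("-")
            cpus.extend(range(int(low), int(high or low) + 1))
        mapping[int(node)] = cpus
    return mapping


def topology_is_consistent(info):
    """True if the node count matches nr_nodes and each CPU appears exactly once."""
    mapping = node_to_cpus(info)
    seen = sorted(cpu for cpus in mapping.values() for cpu in cpus)
    return (len(mapping) == int(info["nr_nodes"])
            and seen == list(range(int(info["nr_cpus"]))))


if __name__ == "__main__":
    parsed = parse_xm_info(SAMPLE)
    print(node_to_cpus(parsed))
    print("consistent:", topology_is_consistent(parsed))
```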

Comment 16 errata-xmlrpc 2009-12-15 17:20:29 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1670.html