Bug 506590
Summary: | libvirt should ignore NUMA cells with missing topology
---|---
Product: | [Fedora] Fedora
Reporter: | erikj
Component: | libvirt
Assignee: | Daniel Berrangé <berrange>
Status: | CLOSED ERRATA
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | high
Priority: | high
Version: | 11
CC: | berrange, clalance, crobinso, itamar, j, markmc, veillard, virt-maint
Keywords: | Reopened
Hardware: | All
OS: | Linux
Fixed In Version: | 0.6.2-17.fc11
Doc Type: | Bug Fix
Last Closed: | 2009-09-04 04:10:06 UTC
Bug Blocks: | 480594
Description
erikj
2009-06-17 21:00:55 UTC
I forgot to mention that, on Nehalem systems, each socket is a separate node. This is a difference from non-Nehalem systems.

Thanks to Simon Phatigaraphong, here is some interesting output from the numactl command on this problem system. Kannan Somangili told us that RHEL5.3 showed sequential node numbers instead of skipping node 1 like it does here.

    [root@cct201 tmp]# numactl --hardware
    available: 3 nodes (0-2)
    node 0 cpus: 0 1 2 3 4 5 6 7
    node 0 size: 4087 MB
    node 0 free: 3410 MB
    libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
    node 1 cpus:
    node 1 size: <not available>
    node 1 free: <not available>
    node 2 cpus:
    node 2 size: 4096 MB
    node 2 free: 3962 MB
    No distance information available.

I will file a separate issue on numactl. That's because I don't think libvirtd should become useless if the topology isn't right. So bug on numactl to follow.

I think I'll hold off on a separate bug as I think it would just be a dupe of 499633. I'm open to advice.

Created attachment 348421 [details]
Ignore NUMA cells with missing topology
Could you try applying this patch and seeing if it solves the problems you have? It'll make libvirt just ignore NUMA cells with missing topology.
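The idea of ignoring cells with missing topology can be sketched as follows. This is only an illustration, not the attached libvirt patch (which is C code and is not reproduced in this report). It assumes the per-node CPU list is read from the Linux sysfs file `/sys/devices/system/node/nodeN/cpulist`; the function names `parse_cpulist` and `read_numa_cells` are hypothetical.

```python
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list means the node has no CPUs
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_numa_cells(node_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]}, skipping nodes whose topology is unreadable."""
    cells = {}
    for entry in sorted(os.listdir(node_root)):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # not a nodeN directory
        path = os.path.join(node_root, entry, "cpulist")
        try:
            with open(path) as handle:
                cells[int(match.group(1))] = parse_cpulist(handle.read())
        except (OSError, ValueError):
            # Missing or malformed topology: ignore this cell instead of failing.
            continue
    return cells


if __name__ == "__main__":
    for node_id, cpus in sorted(read_numa_cells().items()):
        print(f"node {node_id}: cpus {cpus}")
```

The design choice is that one unreadable node no longer aborts the whole scan; the remaining cells are still reported.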
Oh, and if this works can you attach the libvirt 'virsh capabilities' XML that results, just so we can make sure it's still correct.

I filed BZ 506795 as a request to change numactl to the upstream version that doesn't tip over for non-sequential nodes.

I filed BZ 506805 from the kernel angle, with links to a community discussion/patches. I'm going to run the test for this PV shortly.

Yes! I rebuilt libvirt with the patch from comment #5 and it worked.

note: I'm losing access to one of the few systems I can duplicate this on late today. Please let me know if there is anything else you want me to run on this. I'm not sure when I'll have access again :(

Per comment #6, here is the output:

    [root@cct201 ~]# virsh capabilities
    <capabilities>
      <host>
        <cpu>
          <arch>x86_64</arch>
        </cpu>
        <topology>
          <cells num='3'>
            <cell id='0'>
              <cpus num='8'>
                <cpu id='0'/> <cpu id='1'/> <cpu id='2'/> <cpu id='3'/>
                <cpu id='4'/> <cpu id='5'/> <cpu id='6'/> <cpu id='7'/>
              </cpus>
            </cell>
            <cell id='1'>
              <cpus num='64'>
                <cpu id='0'/> <cpu id='1'/> <cpu id='2'/> <cpu id='3'/>
                <cpu id='4'/> <cpu id='5'/> <cpu id='6'/> <cpu id='7'/>
                <cpu id='8'/> <cpu id='9'/> <cpu id='10'/> <cpu id='11'/>
                <cpu id='12'/> <cpu id='13'/> <cpu id='14'/> <cpu id='15'/>
                <cpu id='16'/> <cpu id='17'/> <cpu id='18'/> <cpu id='19'/>
                <cpu id='20'/> <cpu id='21'/> <cpu id='22'/> <cpu id='23'/>
                <cpu id='24'/> <cpu id='25'/> <cpu id='26'/> <cpu id='27'/>
                <cpu id='28'/> <cpu id='29'/> <cpu id='30'/> <cpu id='31'/>
                <cpu id='32'/> <cpu id='33'/> <cpu id='34'/> <cpu id='35'/>
                <cpu id='36'/> <cpu id='37'/> <cpu id='38'/> <cpu id='39'/>
                <cpu id='40'/> <cpu id='41'/> <cpu id='42'/> <cpu id='43'/>
                <cpu id='44'/> <cpu id='45'/> <cpu id='46'/> <cpu id='47'/>
                <cpu id='48'/> <cpu id='49'/> <cpu id='50'/> <cpu id='51'/>
                <cpu id='52'/> <cpu id='53'/> <cpu id='54'/> <cpu id='55'/>
                <cpu id='56'/> <cpu id='57'/> <cpu id='58'/> <cpu id='59'/>
                <cpu id='60'/> <cpu id='61'/> <cpu id='62'/> <cpu id='63'/>
              </cpus>
            </cell>
            <cell id='2'>
              <cpus num='0'>
              </cpus>
            </cell>
          </cells>
        </topology>
      </host>
      <guest>
        <os_type>hvm</os_type>
        <arch name='i686'>
          <wordsize>32</wordsize>
          <emulator>/usr/bin/qemu</emulator>
          <machine>pc</machine>
          <machine>isapc</machine>
          <domain type='qemu'>
          </domain>
          <domain type='kvm'>
            <emulator>/usr/bin/qemu-kvm</emulator>
          </domain>
        </arch>
        <features>
          <pae/>
          <nonpae/>
          <acpi default='on' toggle='yes'/>
          <apic default='on' toggle='no'/>
        </features>
      </guest>
      <guest>
        <os_type>hvm</os_type>
        <arch name='x86_64'>
          <wordsize>64</wordsize>
          <emulator>/usr/bin/qemu-system-x86_64</emulator>
          <machine>pc</machine>
          <machine>isapc</machine>
          <domain type='qemu'>
          </domain>
          <domain type='kvm'>
            <emulator>/usr/bin/qemu-kvm</emulator>
          </domain>
        </arch>
        <features>
          <acpi default='on' toggle='yes'/>
          <apic default='on' toggle='no'/>
        </features>
      </guest>
      <guest>
        <os_type>hvm</os_type>
        <arch name='mips'>
          <wordsize>32</wordsize>
          <emulator>/usr/bin/qemu-system-mips</emulator>
          <machine>mips</machine>
          <domain type='qemu'>
          </domain>
        </arch>
      </guest>
      <guest>
        <os_type>hvm</os_type>
        <arch name='mipsel'>
          <wordsize>32</wordsize>
          <emulator>/usr/bin/qemu-system-mipsel</emulator>
          <machine>mips</machine>
          <domain type='qemu'>
          </domain>
        </arch>
      </guest>
      <guest>
        <os_type>hvm</os_type>
        <arch name='sparc'>
          <wordsize>32</wordsize>
          <emulator>/usr/bin/qemu-system-sparc</emulator>
          <machine>sun4m</machine>
          <domain type='qemu'>
          </domain>
        </arch>
      </guest>
      <guest>
        <os_type>hvm</os_type>
        <arch name='ppc'>
          <wordsize>32</wordsize>
          <emulator>/usr/bin/qemu-system-ppc</emulator>
          <machine>g3bw</machine>
          <machine>mac99</machine>
          <machine>prep</machine>
          <domain type='qemu'>
          </domain>
        </arch>
      </guest>
    </capabilities>

Thanks for testing, Erik.

Okay, I applied the patch upstream; it will be in 0.6.5 next week. I'm not sure it really needs to be backported to F-11 though. Thanks! Daniel

Erik: I think the issue is resolved in F-11 for you by numactl-2.0.3-1.fc11 ... please re-open if you think we should backport the libvirt fix to F-11.
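For anyone re-checking a `virsh capabilities` dump like the one pasted above, a small script can summarize the `<cells>` section. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `summarize_cells` and the abbreviated `SAMPLE` document are made up for illustration and are not the reporter's data. It lists the CPU ids per cell and reports any CPU id that appears under more than one cell.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_cells(capabilities_xml):
    """Return (cells, shared): CPU ids per cell, and ids listed in several cells."""
    root = ET.fromstring(capabilities_xml)
    cells = {}
    seen = Counter()
    for cell in root.findall("./host/topology/cells/cell"):
        cpu_ids = [int(cpu.get("id")) for cpu in cell.findall("./cpus/cpu")]
        cells[int(cell.get("id"))] = cpu_ids
        seen.update(cpu_ids)
    shared = sorted(cpu for cpu, count in seen.items() if count > 1)
    return cells, shared


# Abbreviated, hypothetical sample with the same element layout as the
# capabilities XML in this report.
SAMPLE = """
<capabilities>
  <host>
    <topology>
      <cells num='2'>
        <cell id='0'><cpus num='2'><cpu id='0'/><cpu id='1'/></cpus></cell>
        <cell id='1'><cpus num='0'></cpus></cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""

if __name__ == "__main__":
    cells, shared = summarize_cells(SAMPLE)
    for cell_id, cpu_ids in sorted(cells.items()):
        print(f"cell {cell_id}: {len(cpu_ids)} CPUs")
    print("CPUs listed in more than one cell:", shared)
```

To run it against a live host, the output of `virsh capabilities` could be saved to a file and its contents passed to `summarize_cells` in place of `SAMPLE`.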
Just an FYI that I still use a patched libvirt because, even with the numactl fix, I was still seeing the QEMU Memory error issue that prevented proper startup.

Is that scary? Maybe I should provide more information on this if it's important.

For the record, I'm having the same problem (dual socket Nehalem, Supermicro X8DTT-F motherboard, 48GB DDR3), fully updated F11, libvirt-0.6.2-12.fc11.x86_64, numactl-2.0.3-1.fc11.x86_64. This bug seems pretty serious (no functionality at all, and no workaround) but I guess the number of folks who would want to do virtualization on this level of hardware under F11 is small so I can understand why a backport might not be forthcoming.

Fortunately I can build my own packages; anyone who happens across this ticket is welcome to grab my scratch build at http://koji.fedoraproject.org/koji/taskinfo?taskID=1491137 (at least until it expires).

(In reply to comment #12)
> Just an FYI that I still use a patched libvirt because, even with the numactl
> fix, I was still seeing the QEMU Memory error issue that prevented proper
> startup.
>
> Is that scary?

Yes, it certainly is. I don't really follow exactly what's going on here, but if the libvirt patch fixes it, we should just backport it.

Jes points out related kernel bug #507033 and bug #506805

Created attachment 357304 [details]
Ignore NUMA initialization failures
libvirt-0.6.2-15.fc11 has been submitted as an update for Fedora 11. http://admin.fedoraproject.org/updates/libvirt-0.6.2-15.fc11

libvirt-0.6.2-15.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update libvirt'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-8598

libvirt-0.6.2-16.fc11 has been submitted as an update for Fedora 11. http://admin.fedoraproject.org/updates/libvirt-0.6.2-16.fc11

libvirt-0.6.2-17.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update libvirt'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-8790

libvirt-0.6.2-17.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.