Created attachment 342850
log of starting up libvirtd by hand

I've got a Dell R710 and am trying to run virtualization on it. Here is some debug info danpb_ltop wanted:

[root@tessellate log]# rpm -q numactl
numactl-2.0.2-4.fc11.x86_64
[root@tessellate log]# ls /sys/devices/system/
clocksource/ cpu/ i8237/ i8259/ ioapic/ irqrouter/ kvm/ lapic/ machinecheck/ node/ timekeeping/
[root@tessellate log]# ls /sys/devices/system/node/
has_cpu has_normal_memory node1 online possible
[root@tessellate log]# cat /sys/devices/system/node/has_normal_memory
1
[root@tessellate log]# cat /sys/devices/system/node/node1/cpu
cpu0/ cpu1/ cpu10/ cpu11/ cpu12/ cpu13/ cpu14/ cpu15/ cpu2/ cpu3/ cpu4/ cpu5/ cpu6/ cpu7/ cpu8/ cpu9/ cpulist cpumap
[root@tessellate log]# cat /sys/devices/system/node/node1/cpu
cpu0/ cpu1/ cpu10/ cpu11/ cpu12/ cpu13/ cpu14/ cpu15/ cpu2/ cpu3/ cpu4/ cpu5/ cpu6/ cpu7/ cpu8/ cpu9/ cpulist cpumap
[root@tessellate log]# cat /sys/devices/system/node/node1/cpumap
00000000,0000ffff
[root@tessellate log]# numactl --hardware
available: 2 nodes (0-1)
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
node 0 cpus:
node 0 size: <not available>
node 0 free: <not available>
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 size: 3062 MB
node 1 free: 2494 MB
libnuma: Warning: Cannot parse distance information in sysfs: No such file or directory
No distance information available.
[root@tessellate log]# numactl --show
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
cpubind: 1
nodebind: 1
membind: 1
[root@tessellate log]#
The BIOS has options for node interleaving of memory. Enabling it seems to work around this problem:

[root@tessellate tjb]# numactl --hardware
available: 1 nodes (0-0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 3062 MB
node 0 free: 2579 MB
node distances:
node 0
0: 10
[root@tessellate tjb]# numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
cpubind: 0
nodebind: 0
membind: 0
[root@tessellate tjb]#
Pretty sure this is a numactl issue, so moving there
This is working as designed. Virt guests are by default not numa aware (they don't expose the sysfs files that libnuma uses to parse the node topology, hence the warning). Although the broken layout is odd. I've seen this work on several virt guests. Does this work with numactl 2.0.3? It should. If so, I'll just pull that version into rawhide. Thanks!
This isn't about numactl inside the guest, it's about the host OS, and numactl failures breaking libvirt. What's happening is that the topology declares 2 NUMA nodes; sysfs, however, has only populated a single directory:

# ls /sys/devices/system/node/
has_cpu has_normal_memory node1 online possible

Notice there is no 'node0' directory there. So when libvirt asks for the CPU map for node 0, it gets back an error from libnuma.

I think there are several things going on here:

- libvirt should be more robust against this. If we fail to get NUMA topology, we should just disable the NUMA bits of libvirtd and carry on, not deactivate the whole QEMU driver
- The kernel should have created a node0 directory, and put an empty cpumap there, since the topology clearly says there are 2 nodes, 1 just happens to be empty
- If the kernel doesn't want to do this, numactl should catch this scenario and return an empty CPU map for node 0, and not an error.
- Potentially there's a BIOS bug at play here too, because the reporter claims the kernel is missing 24 GB of his memory... quite possibly that should have been in node0
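A minimal sketch, in Python, of the "return an empty CPU map, not an error" behaviour suggested in the list above. It is not libnuma or libvirt code; the function names `parse_cpumap` and `node_cpus` are hypothetical, and the only assumption about sysfs is the cpumap format shown earlier in this report (comma-separated 32-bit hex words, e.g. `00000000,0000ffff`).

```python
import os


def parse_cpumap(text):
    """Turn a sysfs cpumap string such as '00000000,0000ffff' into a list of CPU ids."""
    # The words are 32-bit hex groups, most significant first, so joining
    # them gives one large hexadecimal bitmask.
    mask = int(text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def node_cpus(node, base="/sys/devices/system/node"):
    """Return the CPUs of a node; a missing node directory yields [] instead of an error."""
    path = os.path.join(base, "node%d" % node, "cpumap")
    try:
        with open(path) as f:
            return parse_cpumap(f.read())
    except FileNotFoundError:
        # Node is declared by the topology but not populated in sysfs
        # (the 'node0' case in this report): treat it as having no CPUs.
        return []
```

With the layout from this report, `node_cpus(0)` would come back empty while `node_cpus(1)` would list CPUs 0 through 15, so a caller never has to treat the absent directory as fatal.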
Yes, that would explain the odd behavior despite the failure to find the right directory when libnuma initialized. Again, I think numactl 2.0.3 has fixed at least a subset of this. Do me a favor and try it out on your system, and let me know what's fixed and what's not. I'll pull in that version and fix up the remaining bugs.
I don't see any 2.0.3 builds of numactl in koji. Where should I get it?
I was figuring that you'd just build from upstream sources, but I'll build you a package in koji if it helps...
http://koji.fedoraproject.org/koji/taskinfo?taskID=1341052 There you go
I installed the new version:

[root@tessellate tjb]# rpm -q numactl
numactl-2.0.3-1.fc12.x86_64
[root@tessellate tjb]# numactl --show
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
cpubind: 1
nodebind: 1
membind: 1
[root@tessellate tjb]# numactl --hardware
available: 1 nodes (1)
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 size: 3062 MB
node 1 free: 2675 MB
libnuma: Warning: Cannot parse distance information in sysfs: No such file or directory
No distance information available.
[root@tessellate tjb]#

I still get the out of memory error when starting libvirtd.
Ok, it looks like the output is reasonably sane there, but we still seem to be getting some warnings we shouldn't. You said above that your system registered node1 as available but never node0. That is odd to say the least, but it could happen in the event that node0 never came online. That would be indicative of something else wrong in the hardware, I imagine, but not something that should fail us. I'll write a libnuma patch and spin you a new package shortly.
Hey, do me a favor and provide me the directory listing of /sys/devices/system/node/node1 on this system, please? I need that to figure out which file open is failing for one of the two ENOENT errors here. Thanks!
tessellate> ls -l /sys/devices/system/node/node1
total 0
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu0 -> ../../cpu/cpu0/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu1 -> ../../cpu/cpu1/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu10 -> ../../cpu/cpu10/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu11 -> ../../cpu/cpu11/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu12 -> ../../cpu/cpu12/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu13 -> ../../cpu/cpu13/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu14 -> ../../cpu/cpu14/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu15 -> ../../cpu/cpu15/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu2 -> ../../cpu/cpu2/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu3 -> ../../cpu/cpu3/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu4 -> ../../cpu/cpu4/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu5 -> ../../cpu/cpu5/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu6 -> ../../cpu/cpu6/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu7 -> ../../cpu/cpu7/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu8 -> ../../cpu/cpu8/
lrwxrwxrwx. 1 root root 0 2009-05-08 10:32 cpu9 -> ../../cpu/cpu9/
-r--r--r--. 1 root root 4096 2009-05-08 10:32 cpulist
-r--r--r--. 1 root root 4096 2009-05-08 08:50 cpumap
-r--r--r--. 1 root root 4096 2009-05-08 10:32 distance
-r--r--r--. 1 root root 4096 2009-05-08 08:51 meminfo
-r--r--r--. 1 root root 4096 2009-05-08 10:32 numastat
-rw-r--r--. 1 root root 4096 2009-05-08 10:32 scan_unevictable_pages
tessellate>
Ok, I see what's going on. This new build should fix the errors in the numactl output:

http://koji.fedoraproject.org/koji/taskinfo?taskID=1343283

I can't guarantee that it'll fix libvirtd though; it depends entirely on how it uses the library (i.e. if it assumes node0 is the first node). You may need to open a separate bug for that. Let me know how this package goes.
[root@tessellate Download]# rpm -q numactl
numactl-2.0.3-2.fc12.x86_64
[root@tessellate Download]# numactl --hardware
available: 1 nodes (1)
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 1 size: 3062 MB
node 1 free: 2665 MB
No distance information available.
[root@tessellate Download]# numactl --show
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
cpubind: 1
nodebind: 1
membind: 1
[root@tessellate Download]#
Ok, looks good. I'll check that in shortly. Thanks!
fixed in numactl-2.0.3-rc3.1.f12. Thanks!
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
*** Bug 506532 has been marked as a duplicate of this bug. ***
Neil - could you or someone else point me at numactl-2.0.3-rc3.1.f12? In the development mirrors, I can only find:

numactl-2.0.3-1.fc12.x86_64

And that still has the problem. I'm still trying to figure out why I can't get libvirtd happy on this machine, and I think it's related to that memory error. But that's why I was running with -v in the first place :)

[root@cct201 tmp]# rpm -q numactl libvirt
numactl-2.0.3-1.fc12.x86_64
libvirt-0.6.2-11.fc11.x86_64
[root@cct201 tmp]# libvirtd -v
libnuma: Warning: /sys not mounted or invalid. Assuming one node: No such file or directory
libvir: QEMU error : out of memory
12:01:29.816: info : Received unexpected signal 17
12:01:29.816: info : Received unexpected signal 17
12:01:29.816: info : Received unexpected signal 17
http://koji.fedoraproject.org/koji/buildinfo?buildID=101429 sorry, no rc3 in the release name
Dang it! They put the warnings back! Well, I can remove the warnings, but that's not going to fix your out of memory issue. I've fixed the warnings in numactl-2.0.3-2.fc12
Sorry to bug you again. I have a new suggestion. I suggest that the error be fixed so that it doesn't say /sys is not mounted, and instead perhaps warns that an expected file under /sys does not exist. The error appears legit and I'll be filing a separate bug shortly that goes into why.

I do not have this problem on a 2-socket, non-Nehalem system. No warnings at all from libvirt in verbose mode.

I started to dig more into the code and learned more about the hardware. The hardware I observed this problem on was a 2-socket Nehalem system. In Nehalem-land, each socket is a separate node. OK so far, because it appears we were trying to just go to the 2nd node under /sys... HOWEVER, at least on this test 2-socket Nehalem system, the nodes are not numbered 0 and then 1. They are numbered 0 and then 2. libvirt (and its call chain) seem to assume the 2nd node is 1.

ls -ld /sys/devices/system/node/node*
drwxr-xr-x 2 root root 0 2009-06-17 14:57 /sys/devices/system/node/node0
drwxr-xr-x 2 root root 0 2009-06-17 14:57 /sys/devices/system/node/node2

I believe the QEMU error I saw, which I guess I'll file as a separate bug when I have more detail, is related to this issue as well.
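To illustrate the numbering point: a rough Python sketch of discovering node ids from the directory names under /sys/devices/system/node rather than counting 0..N-1. This is only an illustration, not what libvirt or libnuma actually do; `numa_node_ids` is a made-up name, and the `base` parameter exists only so the function can be pointed at a test directory.

```python
import os
import re


def numa_node_ids(base="/sys/devices/system/node"):
    """Return the node ids that really exist in sysfs, e.g. [0, 2] on the system above."""
    try:
        entries = os.listdir(base)
    except OSError:
        # No NUMA directory at all: report no nodes instead of failing.
        return []
    ids = []
    for name in entries:
        # Only 'node<N>' entries count; files such as 'online' or
        # 'possible' live in the same directory and are skipped.
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)
```

A caller would then iterate over the returned ids directly, so a gap between node0 and node2 never turns into a lookup of a nonexistent node1.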
Yeah, if numactl can't cope with non-contiguous NUMA node numbering and returns an error, then the libvirt QEMU driver will shut itself down, which would explain your problems there. Feel free to file a bug against libvirt for this - the failure to query NUMA topology should not cause libvirt to stop working, it should simply continue without NUMA topology info
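The "continue without NUMA topology info" behaviour could look roughly like the Python sketch below. It is purely illustrative: `init_driver` and the `query_topology` callback are hypothetical names and do not correspond to libvirt's real start-up code.

```python
import logging


def init_driver(query_topology):
    """Hypothetical driver start-up where NUMA data is optional."""
    try:
        topology = query_topology()
    except OSError as err:
        # A failed topology query is logged and the driver carries on
        # without NUMA information instead of shutting itself down.
        logging.warning("NUMA topology unavailable, continuing without it: %s", err)
        topology = None
    return {"active": True, "numa": topology}
```

The design choice is simply that the topology lookup sits behind its own error handler, so an ENOENT from a missing node directory cannot take the whole driver down with it.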
Thank you; I'm in the process of filing a bug. -Erik
BZ 506590 filed on the libvirt node numbering issue.
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
Today when using top it still failed.
(In reply to Christopher Meng from comment #28)
> Today when using top it still failed.

Please file a new bug; this report is still pointing at Fedora 11 and the last human comment was 4 years ago.