Bug 869361
Summary: | UV: KVM virt-manager fails to launch on large memory systems (>8TB) | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | George Beshers <gbeshers> |
Component: | libvirt | Assignee: | Libvirt Maintainers <libvirt-maint> |
Status: | CLOSED DUPLICATE | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 6.4 | CC: | acathrow, ajia, berrange, ctatman, dallan, dfults, dyasny, dyuan, gbeshers, gsun, honzhang, leiwang, loriann, mprivozn, qguan, randerso, rja, tee, wshi, xuzhang |
Target Milestone: | rc | Keywords: | OtherQA |
Target Release: | 6.5 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-06-07 14:46:13 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 844783 |
Description
George Beshers
2012-10-23 16:59:21 UTC
Does this bug have the wrong component? What is "UV"? What is the output of the following commands when run as root?

virsh list --all
virsh capabilities

UV = Ultra Violet -- SGI's x86_64 supercomputers. There are actually two systems where this fails:

UV1000 w/ 2048 CPUs and 8TB of memory
UV2000 w/ 2048 CPUs and 16TB of memory

In both cases that is 2048 cores -- we are not currently enabling HyperThreading. I will add the requested information soon.

I have the output from 'strace -o virsh-out -ff virsh capabilities' if that is helpful.

# virsh list --all
 Id Name State
----------------------------------------------------

# virsh capabilities
error: failed to get capabilities
error: Unable to encode message payload
error: Reconnected to the hypervisor

[root@uv48-sys ~]# topology
System type: UV100/1000
System name: uv48-sys
Serial number: UV-00000048
Partition number: 0
128 Blades
64 Routers
4096 CPUs
128 Nodes
9084.85 GB Memory Total
128.00 GB Max Memory on any Node
1 BASE I/O Riser
2 Network Controllers
1 Storage Controller
8 USB Controllers
1 VGA GPU

This has the wrong component, which is why no one was looking at it. From the info in comment #4 I'd guess it is probably not the amount of RAM that's the trigger, but rather the size of the NUMA topology causing very large capabilities XML.

Please provide the version of the libvirt RPM that is installed when seeing this behaviour.

George, I think this is the very same bug that we've chased a while ago. Let me find it.

Move to rhel6.5 tracker.

George, can you please provide both server & client side debug logs as well as the version requested in comment 7? http://wiki.libvirt.org/page/DebugLogs Thanks.

Michael, that info is in BZ 960683. This BZ should get closed out as replaced by BZ 960683. Sorry for the confusion.

*** This bug has been marked as a duplicate of bug 960683 ***

Hi Michal Privoznik, I'm verifying this bug in the latest libvirt 6.5 build.
First, I need to reproduce this bug in the old build; if it can be reproduced, then we test the latest build to verify the fix. But the problem is that we don't have a machine with more than 8TB of memory. Since this bug is a duplicate of bug 960683, and there is an attachment there to simulate huge CPU counts on small boxes, I added that patch to the old build and tried to reproduce this bug. The result is:

This bug (869361) can't be reproduced via that simulation patch.
The bug (960683) can be reproduced via that simulation patch.

PS. Here is the link to the simulation patch: https://bugzilla.redhat.com/attachment.cgi?id=756168

Would you please give me some advice on how I can simulate an environment to reproduce this bug? Or how can I verify this bug in the latest build? Thanks very much.

Well, I don't think this one needs to be reproduced. It is a duplicate. The original problem for this bug was encoding the NUMA topology into the capabilities XML. The encoded XML was too big for a libvirt packet. However, we've fixed it in the meantime, and now even huge XML can be sent through.

OK, I got it. Thanks for your quick reply.

(In reply to Michal Privoznik from comment #15)
> Well I don't think this one needs to be reproduced. It is a duplicate. The
> original problem for this bug was encoding numa topology into capabilities
> XML. The encoded XML was too big for a libvirt packet. However, we've fixed
> it meanwhile and now even huge XML can be sent through.
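
For readers who want a feel for why a large NUMA topology could push the capabilities XML past a packet limit, here is a minimal sketch in Python. It is not libvirt code. It builds a simplified topology fragment loosely modelled on the capabilities XML and compares its size to a limit. The function names `build_topology_xml` and `exceeds_limit` are hypothetical, and `ASSUMED_MAX_STRING_BYTES` is an assumed figure for illustration only; the actual limit depends on the libvirt version in use. The 128 nodes and 4096 CPUs come from the `topology` output in this report; the even split of 32 CPUs per node is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed limit, for illustration only. The real value depends on the
# libvirt version and is not stated in this bug report.
ASSUMED_MAX_STRING_BYTES = 65536


def build_topology_xml(num_cells: int, cpus_per_cell: int) -> bytes:
    """Build a simplified <topology> fragment with one <cpu> element per CPU.

    Real capabilities XML carries more detail per cell, so this is a
    lower bound on the size rather than an exact reproduction.
    """
    topology = ET.Element("topology")
    cells = ET.SubElement(topology, "cells", num=str(num_cells))
    cpu_id = 0
    for cell_id in range(num_cells):
        cell = ET.SubElement(cells, "cell", id=str(cell_id))
        cpus = ET.SubElement(cell, "cpus", num=str(cpus_per_cell))
        for _ in range(cpus_per_cell):
            ET.SubElement(cpus, "cpu", id=str(cpu_id))
            cpu_id += 1
    return ET.tostring(topology)


def exceeds_limit(num_cells: int, cpus_per_cell: int,
                  limit: int = ASSUMED_MAX_STRING_BYTES) -> bool:
    """Return True if the generated fragment is larger than the limit."""
    return len(build_topology_xml(num_cells, cpus_per_cell)) > limit


if __name__ == "__main__":
    # A small box versus the 128-node, 4096-CPU system from this report.
    for cells, cpus in [(2, 4), (128, 32)]:
        size = len(build_topology_xml(cells, cpus))
        verdict = "exceeds" if size > ASSUMED_MAX_STRING_BYTES else "fits within"
        print(f"{cells} cells x {cpus} CPUs: {size} bytes, {verdict} the assumed limit")
```

The point of the sketch is that the XML size grows with the number of NUMA cells and CPUs, not with the amount of RAM, which matches the guess made earlier in this thread.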