Red Hat Bugzilla – Full Text Bug Listing
|Summary:||UV: KVM virt-manager fails to launch on large memory systems (>8TB)|
|Product:||Red Hat Enterprise Linux 6||Reporter:||George Beshers <gbeshers>|
|Component:||libvirt||Assignee:||Libvirt Maintainers <libvirt-maint>|
|Status:||CLOSED DUPLICATE||QA Contact:||Virtualization Bugs <virt-bugs>|
|Version:||6.4||CC:||acathrow, ajia, berrange, ctatman, dallan, dfults, dyasny, dyuan, gbeshers, gsun, honzhang, leiwang, loriann, mprivozn, qguan, randerso, rja, tee, wshi, xuzhang|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2013-06-07 10:46:13 EDT||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:|
Description George Beshers 2012-10-23 12:59:21 EDT
Description of problem:
The problem occurs when trying to launch virt-manager on a large-memory system. The smallest system I've seen it on so far is uv48-sys, with 8TB of memory. Running virt-manager produces the error:
libvirtError: Unable to encode message payload
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Comment 2 Richard W.M. Jones 2012-10-23 13:26:11 EDT
Does this bug have the wrong component? What is "UV"?
What is the output of the following commands when run as root?
virsh list --all
virsh capabilities
Comment 3 George Beshers 2012-10-23 13:46:22 EDT
UV = Ultra Violet -- SGI's x86_64 supercomputers.
There are actually two systems where this fails:
UV1000 w/ 2048cpus and 8TB of memory
UV2000 w/ 2048cpus and 16TB of memory
In both cases that is 2048 cores -- we are not currently enabling HyperThreading. I will add the requested information soon.
Comment 4 Derek Fults 2012-10-29 11:27:54 EDT
I have the output from 'strace -o virsh-out -ff virsh capabilities' if that is helpful.
# virsh list --all
Id Name State
----------------------------------------------------
# virsh capabilities
error: failed to get capabilities
error: Unable to encode message payload
error: Reconnected to the hypervisor
[root@uv48-sys ~]# topology
System type: UV100/1000
System name: uv48-sys
Serial number: UV-00000048
Partition number: 0
128 Blades
64 Routers
4096 CPUs
128 Nodes
9084.85 GB Memory Total
128.00 GB Max Memory on any Node
1 BASE I/O Riser
2 Network Controllers
1 Storage Controller
8 USB Controllers
1 VGA GPU
Comment 5 Richard W.M. Jones 2013-06-05 07:20:27 EDT
This has the wrong component, which is why no one was looking at it.
Comment 6 Daniel Berrange 2013-06-05 10:13:00 EDT
From the info in comment #4 I'd guess it is probably not the amount of RAM that's the trigger, but rather the size of the NUMA topology causing very large capabilities XML
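To illustrate why the NUMA topology, rather than the RAM size, could drive the size of the capabilities XML, here is a minimal Python sketch. It builds a simplified <topology> fragment with one <cpu> element per CPU and reports its size. The function name build_topology_xml is hypothetical, and the element layout is a reduced approximation of what libvirt emits (real output carries more elements and attributes, so it would be larger still).

```python
import xml.etree.ElementTree as ET


def build_topology_xml(num_cells: int, cpus_per_cell: int) -> bytes:
    """Build a simplified <topology> fragment with one <cpu> element per CPU.

    Hypothetical helper for illustration only; not libvirt code.
    """
    topology = ET.Element("topology")
    cells = ET.SubElement(topology, "cells", num=str(num_cells))
    cpu_id = 0
    for cell_id in range(num_cells):
        cell = ET.SubElement(cells, "cell", id=str(cell_id))
        cpus = ET.SubElement(cell, "cpus", num=str(cpus_per_cell))
        for _ in range(cpus_per_cell):
            # Each CPU adds its own element, so size grows with the CPU count.
            ET.SubElement(cpus, "cpu", id=str(cpu_id))
            cpu_id += 1
    return ET.tostring(topology)


if __name__ == "__main__":
    # (2, 8) is a small box; (128, 32) matches the 128 nodes / 4096 CPUs
    # reported by the topology command in comment 4.
    for cells, per_cell in [(2, 8), (128, 32)]:
        xml_bytes = build_topology_xml(cells, per_cell)
        print(f"{cells} cells x {per_cell} CPUs: {len(xml_bytes)} bytes")
```

The point of the sketch is that the document size scales with the number of cells and CPUs, and the amount of memory does not appear in it at all.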
Comment 7 Daniel Berrange 2013-06-05 10:14:43 EDT
Please provide the version of the libvirt RPM that is installed when seeing this behaviour.
Comment 8 Michal Privoznik 2013-06-05 10:22:19 EDT
George, I think this is the very same bug that we've chased a while ago. Let me find it.
Comment 9 Michal Privoznik 2013-06-05 10:44:47 EDT
Comment 10 Russ Anderson 2013-06-06 22:27:55 EDT
Move to rhel6.5 tracker.
Comment 11 Michal Privoznik 2013-06-07 02:14:25 EDT
George, can you please provide both server & client side debug logs as well as version requested in comment 7? http://wiki.libvirt.org/page/DebugLogs Thanks.
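For the client side of that request, a rough Python sketch of collecting debug output from virsh is shown below. It relies on the LIBVIRT_DEBUG and LIBVIRT_LOG_OUTPUTS environment variables; the helper names and the log file name are hypothetical. Server-side logging is configured separately in libvirtd.conf and is not covered here; the wiki page above is the authoritative reference for both.

```python
import os
import shutil
import subprocess


def client_debug_env(log_path: str) -> dict:
    """Return a copy of the environment with libvirt client debug logging on.

    Hypothetical helper; the log path is whatever the caller chooses.
    """
    env = dict(os.environ)
    env["LIBVIRT_DEBUG"] = "1"  # 1 = debug level
    env["LIBVIRT_LOG_OUTPUTS"] = f"1:file:{log_path}"  # send debug log to a file
    return env


def run_virsh_capabilities(log_path: str = "virsh-client.log"):
    """Run `virsh capabilities` with client debug logging enabled.

    Returns None when virsh is not installed on this machine.
    """
    if shutil.which("virsh") is None:
        return None
    return subprocess.run(
        ["virsh", "capabilities"],
        env=client_debug_env(log_path),
        capture_output=True,
        text=True,
    )
```

Calling run_virsh_capabilities() on the affected machine should leave the client-side log in virsh-client.log, which can then be attached to the bug.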
Comment 12 Russ Anderson 2013-06-07 10:45:10 EDT
Comment 13 Russ Anderson 2013-06-07 10:46:13 EDT
*** This bug has been marked as a duplicate of bug 960683 ***
Comment 14 Xuesong Zhang 2013-10-15 05:08:05 EDT
Hi Michal Privoznik,
I'm verifying this bug in the latest libvirt 6.5 build. First, I need to reproduce the bug in the old build; if it can be reproduced, we then test the latest build to verify the fix. The problem is that we don't have a large machine with more than 8T of memory. Since this bug is a duplicate of bug 960683, and that bug has an attachment to simulate huge numbers of CPUs on small boxes, I added that patch to the old build and tried to reproduce this bug. The result is:
This bug (869361) can't be reproduced via the simulation patch.
The bug (960683) can be reproduced via the simulation patch.
PS. Here is the link to the simulation patch: https://bugzilla.redhat.com/attachment.cgi?id=756168
Would you please give me some advice on how I can simulate an environment to reproduce this bug? Or how can I verify this bug in the latest build?
Thanks very much.
Comment 15 Michal Privoznik 2013-10-15 05:30:42 EDT
Well, I don't think this one needs to be reproduced. It is a duplicate. The original problem for this bug was encoding the NUMA topology into the capabilities XML. The encoded XML was too big for a libvirt packet. However, we've fixed that in the meantime, and now even huge XML can be sent through.
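The failure mode described here, a payload that is too big for one packet, can be sketched in Python as a length-prefixed encoder with a size cap. This is a simplified illustration, not libvirt's RPC code: the names encode_message and EncodeError are hypothetical, the 65536-byte limit is an assumed value chosen for the example, and the framing (a 4-byte big-endian length word that counts itself) is this sketch's own choice.

```python
import struct

# Assumed limit for illustration only; not taken from libvirt's sources.
MAX_PAYLOAD = 65536


class EncodeError(Exception):
    """Raised when a payload does not fit into a single message."""


def encode_message(payload: bytes, max_payload: int = MAX_PAYLOAD) -> bytes:
    """Frame a payload with a 4-byte length word, rejecting oversized payloads."""
    if len(payload) > max_payload:
        # The sender refuses before anything goes on the wire, which is why
        # the caller sees an encoding error rather than a transport error.
        raise EncodeError("Unable to encode message payload")
    # Length word covers itself (4 bytes) plus the payload.
    return struct.pack(">I", 4 + len(payload)) + payload


if __name__ == "__main__":
    print(len(encode_message(b"<capabilities/>")))
    try:
        encode_message(b"x" * (MAX_PAYLOAD + 1))
    except EncodeError as exc:
        print(f"error: {exc}")
```

Under this model, raising the cap (or removing the fixed per-string limit) is what lets a very large capabilities document through, which matches the fix described above in spirit.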
Comment 16 Xuesong Zhang 2013-10-15 05:49:21 EDT
OK, I got it. Thanks for your quick reply.
(In reply to Michal Privoznik from comment #15)
> Well I don't think this one needs to be reproduced. It is a duplicate. The
> original problem for this bug was encoding NUMA topology into capabilities
> XML. The encoded XML was too big for a libvirt packet. However, we've fixed
> it meanwhile and now even huge XML can be sent through.