Bug 869361 - UV: KVM virt-manager fails to launch on large memory systems (>8TB)
Status: CLOSED DUPLICATE of bug 960683
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: x86_64 Linux
Priority: unspecified Severity: urgent
Target Milestone: rc
Target Release: 6.5
Assigned To: Libvirt Maintainers
Virtualization Bugs
: OtherQA
Depends On:
Blocks: 844783
Reported: 2012-10-23 12:59 EDT by George Beshers
Modified: 2013-10-15 05:49 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-06-07 10:46:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description George Beshers 2012-10-23 12:59:21 EDT
Description of problem:

The problem occurs when trying to launch virt-manager on a
large memory system.  The smallest I've seen so far was uv48-sys with
8TB of memory.

When running virt-manager you'll see the error:

     libvirtError: Unable to encode message payload.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
Comment 2 Richard W.M. Jones 2012-10-23 13:26:11 EDT
Does this bug have the wrong component?

What is "UV"?

What is the output of the following commands when run as root?

  virsh list --all
  virsh capabilities
Comment 3 George Beshers 2012-10-23 13:46:22 EDT
UV = Ultra Violet -- SGI's x86_64 supercomputers.

There are actually two systems where this fails,

   UV1000 w/ 2048cpus and 8TB of memory
   UV2000 w/ 2048cpus and 16TB of memory

In both cases that is 2048 cores -- we are not currently
enabling HyperThreading.

I will add the requested information soon.
Comment 4 Derek Fults 2012-10-29 11:27:54 EDT
I have the output from 'strace -o virsh-out -ff virsh capabilities'
if that is helpful.


# virsh list --all
 Id    Name                           State
----------------------------------------------------


#   virsh capabilities
error: failed to get capabilities
error: Unable to encode message payload
error: Reconnected to the hypervisor

[root@uv48-sys ~]# topology
System type: UV100/1000
System name: uv48-sys
Serial number: UV-00000048
Partition number: 0
     128 Blades
      64 Routers
    4096 CPUs
     128 Nodes
 9084.85 GB Memory Total
  128.00 GB Max Memory on any Node
       1 BASE I/O Riser
       2 Network Controllers
       1 Storage Controller
       8 USB Controllers
       1 VGA GPU
Comment 5 Richard W.M. Jones 2013-06-05 07:20:27 EDT
This has the wrong component, which is why no one was looking at it.
Comment 6 Daniel Berrange 2013-06-05 10:13:00 EDT
From the info in comment #4 I'd guess it is probably not the amount of RAM that's the trigger, but rather the size of the NUMA topology, which causes very large capabilities XML.
Comment 7 Daniel Berrange 2013-06-05 10:14:43 EDT
Please provide the version of the libvirt RPM that is installed when seeing this behaviour.
Comment 8 Michal Privoznik 2013-06-05 10:22:19 EDT
George, I think this is the very same bug that we've chased a while ago. Let me find it.
Comment 9 Michal Privoznik 2013-06-05 10:44:47 EDT
Found it:

https://bugzilla.redhat.com/show_bug.cgi?id=797279
Comment 10 Russ Anderson 2013-06-06 22:27:55 EDT
Move to rhel6.5 tracker.
Comment 11 Michal Privoznik 2013-06-07 02:14:25 EDT
George,

can you please provide both server & client side debug logs as well as version requested in comment 7?

http://wiki.libvirt.org/page/DebugLogs

Thanks.
Comment 12 Russ Anderson 2013-06-07 10:45:10 EDT
Michal, that info is in BZ 960683.

This BZ should get closed out as replaced by BZ 960683.
Sorry for the confusion.
Comment 13 Russ Anderson 2013-06-07 10:46:13 EDT

*** This bug has been marked as a duplicate of bug 960683 ***
Comment 14 Xuesong Zhang 2013-10-15 05:08:05 EDT
Hi Michal Privoznik,

   I'm verifying this bug in the latest libvirt build for 6.5.
   First, I need to reproduce this bug in the old build; if it can be reproduced, we then test the latest build to verify the fix. The problem is that we don't have a large machine with more than 8TB of memory.
   Since this bug is a duplicate of bug 960683, and that bug has an attachment to simulate huge numbers of CPUs on small boxes, I applied that patch to the old build and tried to reproduce this bug. The result is:
   This bug (869361) can't be reproduced via that simulation patch.
   Bug 960683 can be reproduced via that simulation patch.
   PS. Here is the link to the simulation patch: https://bugzilla.redhat.com/attachment.cgi?id=756168

   Would you please give me some advice on how I can simulate an environment to reproduce this bug? Or how can I verify this bug in the latest build? Thanks very much.
Comment 15 Michal Privoznik 2013-10-15 05:30:42 EDT
Well, I don't think this one needs to be reproduced. It is a duplicate. The original problem for this bug was encoding the NUMA topology into the capabilities XML. The encoded XML was too big for a libvirt packet. However, we've fixed it in the meantime and now even huge XML can be sent through.
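
To make the size argument concrete, here is a rough sketch (not libvirt code) that builds a capabilities-like NUMA topology fragment and compares its size against a limit. The element layout is a simplified assumption, real capabilities XML carries more per-cell data, and the 65536-byte limit is an illustrative assumption that does not come from this report. The 128 nodes / 4096 CPUs figures are taken from the topology output in comment 4.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: illustrative size limit only; not a value taken from this bug.
ASSUMED_LIMIT_BYTES = 65536


def synthetic_topology_xml(num_cells, cpus_per_cell):
    """Build a simplified <topology> fragment with one <cell> per NUMA node."""
    topology = ET.Element("topology")
    cells = ET.SubElement(topology, "cells", num=str(num_cells))
    cpu_id = 0
    for cell_id in range(num_cells):
        cell = ET.SubElement(cells, "cell", id=str(cell_id))
        cpus = ET.SubElement(cell, "cpus", num=str(cpus_per_cell))
        for _ in range(cpus_per_cell):
            ET.SubElement(cpus, "cpu", id=str(cpu_id))
            cpu_id += 1
    return ET.tostring(topology, encoding="unicode")


def exceeds_limit(xml_text, limit=ASSUMED_LIMIT_BYTES):
    """Return True if the UTF-8 encoded XML is larger than the given limit."""
    return len(xml_text.encode("utf-8")) > limit


small = synthetic_topology_xml(2, 8)      # a typical two-node box
large = synthetic_topology_xml(128, 32)   # 128 nodes, 4096 CPUs (comment 4)

if __name__ == "__main__":
    for name, xml_text in (("small", small), ("large", large)):
        size = len(xml_text.encode("utf-8"))
        print(name, size, "bytes, exceeds assumed limit:", exceeds_limit(xml_text))
```

Even this stripped-down fragment grows linearly with the CPU count, which is consistent with the topology size, rather than the amount of RAM, being the trigger.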
Comment 16 Xuesong Zhang 2013-10-15 05:49:21 EDT
OK, I got it. Thanks for your quick reply.

(In reply to Michal Privoznik from comment #15)
> Well, I don't think this one needs to be reproduced. It is a duplicate. The
> original problem for this bug was encoding the NUMA topology into the
> capabilities XML. The encoded XML was too big for a libvirt packet. However,
> we've fixed it in the meantime and now even huge XML can be sent through.
