Bug 1440683
Summary: | libvirt error "Unable to encode message payload" when using virConnectGetAllDomainStats API | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Richard W.M. Jones <rjones>
Component: | libvirt | Assignee: | Peter Krempa <pkrempa>
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | high | Priority: | high
Version: | 7.4 | Target Release: | 7.4
Hardware: | Unspecified | OS: | Unspecified
Target Milestone: | rc | Keywords: | OtherQA
Whiteboard: | QJ170216-012/74Z | Fixed In Version: | libvirt-3.2.0-7.el7
Clone Of: | 1422795 | Type: | Bug
Last Closed: | 2017-08-02 00:05:54 UTC | Bug Blocks: | 1298243, 1317092, 1422795, 1449577
CC: | chhu, chorn, dyuan, fj-lsoft-bm, fj-lsoft-ofuku, jsuchane, juzhou, lmen, mprivozn, mzhan, pkrempa, rjones, tumeya, tzheng, virt-bugs, xiaodwan, xuzhang, yoguma | |
Description
Richard W.M. Jones
2017-04-10 09:25:27 UTC
The problem is that even if we increase the limit arbitrarily, the user can then use an even more insane configuration which will still hit the problem. The question is what counts as a sane configuration in this respect. Can we get an idea of how large the buffer would need to be to make 494 disks work? (For me, 255 disks work.)

The current limit is 16MB, and if (for example) we had to increase that to 24MB or 32MB it doesn't seem like a great hardship. The limit is only there to stop someone exhausting memory or other resources in either the client or the daemon, but 32MB doesn't seem like it would cause problems.

Yes, it's exactly there to prevent overly large messages. I think even 128 or 256MiB would allow a similar kind of protection, and recent machines have lots of RAM anyway. The thing is that you still don't have the certainty that $big_number_of_disks will work. The bulk stats API is unique in this regard, since it transports a lot of text data (field names) and thus was the first to hit this limit. It also might be possible to split the returned data into multiple messages, at least for the case where you specify a list of domains. I'm not sure whether we can do this at the RPC level without breaking compatibility, though.

That does suggest another possibility -- using a simple, fixed Huffman encoding on the strings. (I guess this would need to be a different procedure number for backwards compatibility, though.) For a similar bug (in a different place) see bug 1443066.

(In reply to Peter Krempa from comment #6)
> Yes, it's exactly to prevent too large messages. I think even 128 or 256MiB
> will allow a similar kind of protection and recent machines have lots of ram
> anyways.
>
> The thing is that you still don't have the certainity that
> $big_number_of_disks will work.
>
> The bulk stats API is unique in this regard since it transports a lot of
> text data (field names) and thus was the first to hit this limit.
>
> It also might be possible to split the returned data into multiple messages,
> at least for the case where you specify a list of domains. I'm not sure
> though whether we can do this on the RPC level without breaking compat
> though.

This problem has been dealt with in the past in exactly this manner: a bug report comes in that the message limits are too small, we review the case, confirm that this is the problem, and increase the limits. I think we should do the same here as well.

So I've tried to figure out how much data you can actually return via this mechanism. The RPC packet payload has roughly a 16M limit (minus a little overhead). The data entries for a single disk take roughly 500 bytes, a network interface 400 bytes, one vCPU ~150 bytes, and the remainder, which does not scale with domain variables, is ~400 bytes. Taking a disk as 1k, this means we can return data for roughly 16k disks/interfaces. Unfortunately, if the call returns data for multiple VMs, the 16M allowance is split among them, since the data is returned in a single RPC call. This means that a VM with ~500 disks, plus the other data, will use ~1MiB of stats data when overestimating fairly strongly. This should still fit very well when returning data for a single VM. It looks like the call was used to return data for a larger number of VMs, which scales rather badly. Increasing RPC buffer sizes, while possible, won't help much for hosts with a lot of big VMs. A suggestion would be to use the statistics API to retrieve the information for only a limited number of VMs at a time, iteratively.
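For illustration only (this sketch is not part of the bug report, and the batch size of 16 is an arbitrary example value), one way to batch the retrieval is to list the domains once and then call virDomainListGetStats() on small slices of the list, so that each RPC reply stays well below the message-size limit:

```c
/* Sketch: retrieve bulk stats in small batches rather than with a single
 * virConnectGetAllDomainStats() call.  BATCH_SIZE is an arbitrary example. */
#include <stdlib.h>
#include <libvirt/libvirt.h>

#define BATCH_SIZE 16

int
fetch_stats_batched(virConnectPtr conn)
{
    virDomainPtr *doms = NULL;
    int ndoms = virConnectListAllDomains(conn, &doms,
                                         VIR_CONNECT_LIST_DOMAINS_ACTIVE);
    if (ndoms < 0)
        return -1;

    for (int i = 0; i < ndoms; i += BATCH_SIZE) {
        int n = (ndoms - i < BATCH_SIZE) ? ndoms - i : BATCH_SIZE;

        /* virDomainListGetStats() expects a NULL-terminated domain list. */
        virDomainPtr *batch = calloc(n + 1, sizeof(*batch));
        if (!batch)
            break;
        for (int j = 0; j < n; j++)
            batch[j] = doms[i + j];

        virDomainStatsRecordPtr *records = NULL;
        int nrecords = virDomainListGetStats(batch,
                                             VIR_DOMAIN_STATS_BLOCK |
                                             VIR_DOMAIN_STATS_INTERFACE,
                                             &records, 0);
        if (nrecords >= 0) {
            /* ... consume records[0 .. nrecords-1] here ... */
            virDomainStatsRecordListFree(records);
        }
        free(batch);
    }

    for (int i = 0; i < ndoms; i++)
        virDomainFree(doms[i]);
    free(doms);
    return 0;
}
```

The trade-off, as noted below, is more round trips per polling cycle.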
That way, while losing some programmer comfort, the RPC message size should be large enough to accommodate VMs with nearly insane configurations. It's also possible to retrieve the statistics in parallel, provided the sets of VMs don't overlap. Given the scenario above, I think the problem is the size explosion from returning data for multiple VMs while the data for each VM is huge due to the number of disks. The current configuration would allow retrieving statistics for roughly 16-32 guests. Doubling or quadrupling the message size would help slightly, but going beyond that won't be possible upstream. I can propose a patch to increase the buffer size, but as said, it will be hard to get past 100 VMs with such a configuration, so I'd suggest batching the calls to the stats API.

I think that making the 500 disks x 1 VM case work will be sufficient for this bug in RHEL 7.4. We might need to look at this again for 7.5, but I hope that limit shouldn't be too invasive for libvirt right now. However, I will add (as background) that the reason we changed virt-top to retrieve as many stats as possible in a single API call was that we previously had serious scalability problems due to the round-trip time of individual libvirt requests. So fewer round trips is better from our point of view.

I've doubled the upstream RPC message size:

```
commit 97863780e8846277842e746186a54c4d2941755b
Author: Peter Krempa <pkrempa>
Date:   Mon May 22 17:48:09 2017 +0200

    rpc: Bump maximum message size to 32M

    While most of the APIs are okay with 16M messages, the bulk stats
    API can run into the limit in big configurations. Before we devise
    a new plan for this, bump this limit slightly to accomodate some
    more configs.
```

I also wanted to add a suggestion on how to use the API for big configurations, but there's an ongoing discussion: https://www.redhat.com/archives/libvir-list/2017-May/msg00853.html

I have to set this back to ASSIGNED / FailedQA because, for reasons that I don't understand, this does not change the number of disks we can add to a guest before virConnectGetAllDomainStats throws "Unable to encode message payload".

```
10:44 < danpb> rjones: hmm, there's some dubious logic in virNetMessageEncodePayload() for increasing the buffer length
10:44 < danpb>     unsigned int newlen = (msg->bufferLength - VIR_NET_MESSAGE_LEN_MAX) * 4;
10:44 < danpb>     if (newlen > VIR_NET_MESSAGE_MAX) {
10:44 < danpb>         virReportError(VIR_ERR_RPC, "%s", _("Unable to encode message payload"));
10:44 < danpb>         goto error;
10:44 < danpb>     }
10:45 < danpb> we're taking the existing buffer length and multiplying it by 4 and then checking against the upper limit
10:46 < danpb> we start with 65 KB, that jumps to 256 KB, then 1 MB, then 4 MB, then 16 MB
10:47 < danpb> if you multiply 16 mb by 4, you'll get 64 mb which is bigger than the 32mb limit
```

I changed the loop so it doubles instead of multiplying by 4 on each iteration. However, this did not fix the problem.
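A minimal sketch of that doubling growth strategy, for illustration only (this is not the actual libvirt code, and the EXAMPLE_* constants are assumed example values):

```c
/* Illustrative sketch: grow the payload buffer by doubling rather than
 * quadrupling before checking the ceiling.  Not the real libvirt code;
 * the constants below are assumptions. */
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_MSG_MAX   (32u * 1024 * 1024)  /* assumed 32M message ceiling */
#define EXAMPLE_MSG_START (64u * 1024)         /* assumed initial payload size */

static bool
grow_payload_buffer(size_t *len)
{
    size_t newlen = *len ? *len * 2 : EXAMPLE_MSG_START;  /* double, not * 4 */
    if (newlen > EXAMPLE_MSG_MAX)
        return false;  /* growing further would exceed the RPC message limit */
    *len = newlen;
    return true;       /* caller reallocates the buffer and retries the XDR encode */
}
```

Doubling lets the final step land on the limit itself (16M -> 32M) instead of overshooting it (16M -> 64M), which is why the quadrupling version tripped the check in the IRC excerpt above.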
Next I investigated with gdb, and the behaviour is very strange. With 310 disks, the buffer grows to only 131072 bytes and the call then succeeds:

```
Breakpoint 1, virNetMessageEncodePayload (msg=msg@entry=0x562db2f24650,
    filter=0x562db21fd1b0 <xdr_remote_connect_get_all_domain_stats_ret>,
    data=0x7f9014000b50) at rpc/virnetmessage.c:362
362         newlen *= 2;
(gdb) print newlen
$15 = 131072
(gdb) cont
```

With 320 disks, the buffer doubles all the way past 33554432 and the call then fails:

```
Breakpoint 1, virNetMessageEncodePayload (msg=msg@entry=0x562db2f22ec0,
    filter=0x562db21fd1b0 <xdr_remote_connect_get_all_domain_stats_ret>,
    data=0x7f90200008e0) at rpc/virnetmessage.c:362
362         newlen *= 2;
(gdb) print newlen
$8 = 8388608
(gdb) cont
Continuing.

Breakpoint 1, virNetMessageEncodePayload (msg=msg@entry=0x562db2f22ec0,
    filter=0x562db21fd1b0 <xdr_remote_connect_get_all_domain_stats_ret>,
    data=0x7f90200008e0) at rpc/virnetmessage.c:362
362         newlen *= 2;
(gdb) print newlen
$9 = 16777216
(gdb) cont
Continuing.

Breakpoint 1, virNetMessageEncodePayload (msg=msg@entry=0x562db2f22ec0,
    filter=0x562db21fd1b0 <xdr_remote_connect_get_all_domain_stats_ret>,
    data=0x7f90200008e0) at rpc/virnetmessage.c:362
362         newlen *= 2;
(gdb) print newlen
$10 = 33554432
(gdb) cont
```

[virt-top then fails with the "Unable to encode message payload" error.]

I don't understand how there can be such a huge discontinuity between 310 and 320 disks, but I suspect there must be some further bug either in the loop or elsewhere in the libvirt code.

OK, I understand what the problem is. We're hitting this limit:

```
struct remote_connect_get_all_domain_stats_ret {
    remote_domain_stats_record retStats<REMOTE_DOMAIN_LIST_MAX>;
};
```

I'm fairly sure that the use of REMOTE_DOMAIN_LIST_MAX is simply wrong there. It should be another REMOTE_*_MAX limit, although I'm not sure which -- possibly we need to define a new limit. We don't need to increase VIR_NET_MESSAGE_MAX at all, but I am going to submit a patch upstream so that we double the size of the array instead of quadrupling it (though that won't be relevant to the eventual fix for this bug).

Verify this BZ per the following BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1422795#c26

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1846