Bug 869650
| Summary: | Can't add more than 256 logical networks | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Chris Pelland <cpelland> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 6.4 | CC: | abaron, acathrow, bazulay, cpelland, dallan, danken, dyasny, dyuan, iheim, jdenemar, lpeer, mavital, mprivozn, myakove, mzhan, pm-eus, rvaknin, rwu, weizhan, whuang, ydu, ykaul, zhpeng |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | libvirt-0.9.10-21.el6_3.6 | Doc Type: | Bug Fix |
| Doc Text: |
Cause:
The libvirt client communicates with the libvirt daemon over libvirt's RPC system. RPC messages have a maximum size limit to prevent memory exhaustion. Whenever the daemon was about to receive a message, it allocated a buffer of the full maximum size, so simply raising the limit would have made libvirtd use more memory.
Consequence:
The limit was 65536 bytes (including libvirt headers). This was not enough for some large XML documents, so oversized messages were dropped, leaving the client unable to fetch the data it needed.
Fix:
The buffer for incoming messages is now allocated dynamically. Most messages are small, so there is no need to allocate a 64 KB buffer for them. This allows the limit to be raised (up to 1 MB) without making libvirtd use more memory than it actually needs.
Result:
Libvirt can now send larger messages, so clients can fetch much more data.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-11-22 09:40:08 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 869557 | ||
| Bug Blocks: | |||
|
Description
Chris Pelland
2012-10-24 13:36:20 UTC
If we backported just the patch that sizes up the RPC limits, libvirt would consume much more memory. That's because before my patchset libvirt allocated the whole buffer (= the maximum message size) even for small messages, and this buffer stayed allocated for the whole API execution. That's why I think we should backport the preceding patch as well:
commit eb635de1fed3257c5c62b552d1ec981c9545c1d7
Author: Michal Privoznik <mprivozn>
AuthorDate: Fri Apr 27 14:49:48 2012 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Tue Jun 5 17:48:40 2012 +0200
rpc: Size up RPC limits
Since we are allocating RPC buffer dynamically, we can increase limits
for max. size of RPC message and RPC string. This is needed to cover
some corner cases where libvirt is run on such huge machines that their
capabilities XML is 4 times bigger than our current limit. This leaves
users with inability to even connect.
commit a2c304f6872f15c13c1cd642b74008009f7e115b
Author: Michal Privoznik <mprivozn>
AuthorDate: Thu Apr 26 17:21:24 2012 +0200
Commit: Michal Privoznik <mprivozn>
CommitDate: Tue Jun 5 17:48:40 2012 +0200
rpc: Switch to dynamically allocated message buffer
Currently, we are allocating buffer for RPC messages statically.
This is not such pain when RPC limits are small. However, if we want
ever to increase those limits, we need to allocate buffer dynamically,
based on RPC message len (= the first 4 bytes). Therefore we will
decrease our mem usage in most cases and still be flexible enough in
corner cases.
These patches have been around since 0.9.13 (June 25 2012), and just one bug has been found so far (probably worth backporting as well): bug 845521, fixed in this commit:
commit f8ef393ee3a67a61a4c991f50d62652ed81c2ebd
Author: Peter Krempa <pkrempa>
AuthorDate: Fri Aug 3 16:50:16 2012 +0200
Commit: Peter Krempa <pkrempa>
CommitDate: Fri Aug 3 23:30:01 2012 +0200
client: Free message when freeing client
The last message of the client was not freed leaking 4 bytes of memory
in the client when the remote daemon crashed while processing a message.
So I am okay with backporting these three patches; from my POV they are safe. BTW: the RPC code is used by *every* libvirt user, so if there were any bugs, they would have been discovered already.
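The idea behind the "dynamically allocated message buffer" commit above can be sketched as follows. This is a minimal Python illustration, not libvirt's actual C code: the names `read_exact`, `read_message`, `frame`, `LEN_PREFIX_SIZE` and `MAX_MESSAGE_SIZE` are hypothetical, and the framing (a big-endian 32-bit length word that counts itself) is an assumption of the sketch. The 1 MB limit is the figure given in the Doc Text.

```python
import io
import struct

LEN_PREFIX_SIZE = 4              # the length word: first 4 bytes of each message
MAX_MESSAGE_SIZE = 1024 * 1024   # 1 MB limit from the Doc Text; name is hypothetical


def read_exact(stream, count):
    """Read exactly `count` bytes, looping over short reads."""
    buf = bytearray()
    while len(buf) < count:
        chunk = stream.read(count - len(buf))
        if not chunk:
            raise EOFError(f"expected {count} bytes, got {len(buf)}")
        buf += chunk
    return bytes(buf)


def read_message(stream):
    """Read one length-prefixed message, allocating only what it needs.

    Assumption: the length word is a big-endian unsigned 32-bit integer
    that counts the whole message, including the length word itself.
    """
    (total_len,) = struct.unpack(">I", read_exact(stream, LEN_PREFIX_SIZE))
    if total_len < LEN_PREFIX_SIZE:
        raise ValueError(f"length {total_len} is shorter than its own prefix")
    if total_len > MAX_MESSAGE_SIZE:
        # Reject before allocating, so a bogus length cannot exhaust memory.
        raise ValueError(f"length {total_len} exceeds limit {MAX_MESSAGE_SIZE}")
    # The buffer is sized to this message, not to MAX_MESSAGE_SIZE.
    return read_exact(stream, total_len - LEN_PREFIX_SIZE)


def frame(payload):
    """Build a message in the same assumed framing (helper for the example)."""
    return struct.pack(">I", len(payload) + LEN_PREFIX_SIZE) + payload
```

With this shape, a small message costs a small buffer, and a message larger than the old 65536-byte limit is still accepted as long as it stays under the new maximum. That is why the limit can be raised without raising memory use for the common case.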
Moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-October/msg01191.html
Verified this bug with libvirt-0.9.10-21.el6_3.6.x86_64.
Steps:
1. Prepare a template XML file to define networks.
# cat templ.xml
<network>
<name>NET-#NIC#</name>
<forward mode='nat'/>
<bridge name='virbr-#NIC#' stp='on' delay='0' />
<ip address='192.168.221.#NIC#' netmask='255.255.255.255'>
</ip>
</network>
2. Prepare a script to define and start 250 networks automatically
# cat vnet.sh
#!/bin/bash
for i in {1..250}; do
sed "s/#NIC#/$i/g" templ.xml > net-$i.xml
virsh net-define net-$i.xml
virsh net-start NET-$i
rm -f net-$i.xml
sleep 1
done
3. Repeat steps 1-2 to add another 250 networks, and check the result:
# virsh net-list --all |wc -l
504
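The steps above do not say how the second batch of 250 networks avoids reusing the names and addresses of the first. One possible way, purely an assumption for illustration, is to vary the name, bridge and subnet per batch. The Python sketch below only renders the XML and does not call virsh. The `NET-<batch>-<n>` naming, the `192.168.<221+batch>.<n>` addressing and the helpers `render_network` and `render_all` are hypothetical and not taken from the report.

```python
import xml.etree.ElementTree as ET

# Same shape as templ.xml, with Python format fields instead of #NIC#.
TEMPLATE = """<network>
<name>NET-{batch}-{nic}</name>
<forward mode='nat'/>
<bridge name='virbr-{batch}-{nic}' stp='on' delay='0' />
<ip address='192.168.{subnet}.{nic}' netmask='255.255.255.255'>
</ip>
</network>
"""


def render_network(batch, nic):
    """Return the network XML for entry `nic` (1..250) of batch `batch`."""
    return TEMPLATE.format(batch=batch, nic=nic, subnet=221 + batch)


def render_all(batches=2, per_batch=250):
    """Render every definition and check that each one is well-formed XML."""
    docs = [render_network(b, n)
            for b in range(batches)
            for n in range(1, per_batch + 1)]
    for doc in docs:
        ET.fromstring(doc)  # raises ParseError if the XML is malformed
    return docs
```

Each string returned by `render_all()` could be written to a file and passed to `virsh net-define` as in the script above.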
Hi Michal, I have a question that needs your confirmation. When I define and start more than 500 networks and then restart the libvirtd service, the first virsh command takes several minutes (only the first time). Is this normal?
# service libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
# time virsh list --all
Id Name State
----------------------------------------------------
4 test running
real 3m16.719s
user 0m0.010s
sys 0m0.010s
# time virsh list --all
Id Name State
----------------------------------------------------
4 test running
real 0m0.063s
user 0m0.009s
sys 0m0.022s
Yes, when libvirt is starting up it autostarts some objects, such as domains, networks and storage pools. For each network, multiple commands are spawned (iptables, usually 13 times for the default network; then dnsmasq, to act as the DHCP server for domains). I believe this is the source of the delay. You can see it yourself - if your system is under heavy load then this is the case. However, I agree it should not last so long. If you think the same, we should open a new bug and leave this one VERIFIED.
(In reply to comment #9)
> However, I agree it should not last so long. If you think the same, we
> should open a new bug and leave this one VERIFIED.
I agree also, can you open a BZ with the data about how long these operations are taking?
(In reply to comment #12)
> (In reply to comment #9)
> > However, I agree it should not last so long. If you think the same, we
> > should open a new bug and leave this one VERIFIED.
>
> I agree also, can you open a BZ with the data about how long these
> operations are taking?
Yes, I have already opened a new bug: 877244.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2012-1484.html
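As a rough back-of-the-envelope check on the startup delay discussed in the comments above, the figures quoted there can be combined as below. This assumes every autostarted network costs as many spawned commands as the default network (13 iptables calls plus one dnsmasq) and that the whole 3m16s is spent spawning them. Both are simplifying assumptions, so treat the result as an order of magnitude only.

```python
NETWORKS = 500                          # "more than 500 networks"; used as a lower bound
IPTABLES_CALLS = 13                     # per network, figure quoted for the default network
DNSMASQ_CALLS = 1                       # one dnsmasq per network (assumed)
FIRST_CALL_SECONDS = 3 * 60 + 16.719    # "real 3m16.719s" from the first virsh call

spawns = NETWORKS * (IPTABLES_CALLS + DNSMASQ_CALLS)
per_spawn_ms = FIRST_CALL_SECONDS / spawns * 1000

print(f"{spawns} spawned commands, about {per_spawn_ms:.1f} ms each")
```

Under these assumptions the delay works out to roughly 7000 spawned commands at a few tens of milliseconds each, which is consistent with the explanation that command spawning dominates the startup time.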