Bug 981729 - Improve handling of "max_clients" setting
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Michal Privoznik
QA Contact: Virtualization Bugs
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs 992980 1058606 1086175
Reported: 2013-07-05 11:32 EDT by Daniel Berrange
Modified: 2014-06-17 20:52 EDT (History)
CC: 11 users

See Also:
Fixed In Version: libvirt-1.1.1-3.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 992980 1058606 1070221
Environment:
Last Closed: 2014-06-13 06:00:59 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Daniel Berrange 2013-07-05 11:32:09 EDT
Description of problem:
The 'max_clients' setting controls how many clients are allowed to connect to libvirt, as protection against denial-of-service (DoS) attacks from unauthenticated users.

This is problematic when starting large numbers of containers concurrently, because doing so can trigger a very large number of connections in a short period of time.

With the default max_clients=20, this easily causes container startup failure.

We can't simply raise the limit, since it serves as DoS protection, but we can improve the behaviour somewhat.

Currently libvirt will unconditionally accept() any incoming socket, regardless of the current number of clients. Thus, once the limit is hit, libvirt accepts and then immediately closes client connections.

It would be better if libvirt simply did not accept() the connection. Pending connections would then wait for an earlier connection to close before being served. The maximum number of queued connections can be controlled via the backlog argument to the listen() syscall, and could be fairly large (perhaps as much as 1000), since queued connections consume minimal resources. This would be exposed as a new "max_queued_clients" setting in libvirtd.conf.

A new 'max_anonymous_clients' setting could limit only those connections which have been accept()ed but not yet authenticated. This could be fairly low (perhaps the current 20).

The existing 'max_clients' setting could then serve as a limit on the total number of connections, and be set to a far higher value (perhaps several hundred or more).
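Taken together, the proposal might look as follows in /etc/libvirt/libvirtd.conf. This is only a sketch using the ballpark values suggested above; at this point the names and defaults (in particular 'max_anonymous_clients') are proposals, not shipped configuration.

```
# Total number of simultaneous client connections (DoS limit);
# could now be set far higher than the old default of 20.
max_clients = 200

# Pending connections the kernel may queue via the listen() backlog
# before libvirtd accept()s them; cheap, so this can be large.
max_queued_clients = 1000

# Proposed: limit on accept()ed but not yet authenticated clients.
max_anonymous_clients = 20
```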

Version-Release number of selected component (if applicable):
1.1.0-1.el6.

How reproducible:
Always

Steps to Reproduce:
1. Attempt to open 30 concurrent connections to libvirt

Actual results:
Only the first 20 succeed; the rest are dropped

Expected results:
10 connections are queued, pending close of 10 earlier connections

Additional info:
Comment 2 Alex Jia 2013-07-08 06:23:37 EDT
# tail -2 /etc/libvirt/libvirtd.conf
max_clients = 20
max_workers = 20


# for i in {1..30}; do virt-sandbox-service create -C -u httpd.service -N dhcp myapache$i;done

# for i in {1..30}; do virt-sandbox-service start myapache$i & done

XXX

Unable to open connection: Unable to open lxc:///: Cannot recv data: Connection reset by peer
Unable to open connection: Unable to open lxc:///: Cannot recv data: Connection reset by peer
Unable to open connection: Unable to open lxc:///: Cannot write data: Broken pipe
Unable to open connection: Unable to open lxc:///: Cannot write data: Broken pipe
Unable to open connection: Unable to open lxc:///: Cannot write data: Broken pipe
Unable to open connection: Unable to open lxc:///: Cannot write data: Broken pipe
Unable to open connection: Unable to open lxc:///: Cannot write data: Broken pipe
Unable to open connection: Unable to open lxc:///: Cannot write data: Broken pipe
Unable to open connection: Unable to open lxc:///: Cannot write data: Broken pipe
Unable to open connection: Unable to open lxc:///: Cannot write data: Broken pipe

And check libvirtd log:

2013-07-08 10:13:58.933+0000: 8034: error : virNetServerAddClient:262 : Too many active clients (20), dropping connection from 127.0.0.1;0
2013-07-08 10:13:58.941+0000: 8034: error : virNetServerAddClient:262 : Too many active clients (20), dropping connection from 127.0.0.1;0
2013-07-08 10:13:58.943+0000: 8034: error : virNetServerAddClient:262 : Too many active clients (20), dropping connection from 127.0.0.1;0
Comment 3 Michal Privoznik 2013-07-25 10:24:16 EDT
I've just proposed patches upstream:

https://www.redhat.com/archives/libvir-list/2013-July/msg01646.html
Comment 5 Daniel Berrange 2013-08-05 05:58:33 EDT
NB the upstream patches only implement half of this bug. There's still no separation of the limits for anonymous vs authenticated clients.
Comment 6 Michal Privoznik 2013-08-05 06:27:36 EDT
Ah, then I shouldn't have moved this to POST. Sorry.
Comment 7 Michal Privoznik 2013-08-05 06:50:47 EDT
After IRC discussion with Dan, we agreed to split this bug into two. The first part, introducing the "max_queued_clients" setting, is already done and stays tracked in this bug. For the issue Dan mentions in comment #5 I've cloned this bug into bug 992980. Hence moving to POST again.
Comment 9 Alex Jia 2013-12-02 04:58:09 EST
(In reply to Alex Jia from comment #2)
> # tail -2 /etc/libvirt/libvirtd.conf
> max_clients = 20
> max_workers = 20 

# tail -3 /etc/libvirt/libvirtd.conf

max_clients = 20
max_workers = 20
max_queued_clients = 20

> # for i in {1..30}; do virt-sandbox-service create -C -u httpd.service -N
> dhcp myapache$i;done
> 
> # for i in {1..30}; do virt-sandbox-service start myapache$i & done
> 

# rpm -q libvirt-sandbox libvirt kernel
libvirt-sandbox-0.5.0-6.el7.x86_64
libvirt-1.1.1-13.el7.x86_64
kernel-3.10.0-0.rc7.64.el7.x86_64

Using virsh to start the containers in parallel:

# for i in {1..30}; do virsh -c lxc:/// start myapache$i & done

I can't hit the following issues any more. Michal, is this the expected result, or must I run many more containers to reproduce them?

> 
> Unable to open connection: Unable to open lxc:///: Cannot write data: Broken
> pipe
> 
> And check libvirtd log:
> 
> 2013-07-08 10:13:58.933+0000: 8034: error : virNetServerAddClient:262 : Too
> many active clients (20), dropping connection from 127.0.0.1;0

<slice>

2013-12-02 08:52:16.173+0000: 12384: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 51
2013-12-02 08:52:16.178+0000: 12383: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 56
2013-12-02 08:52:16.183+0000: 12021: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 54
2013-12-02 08:52:16.191+0000: 12380: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 69
2013-12-02 08:52:16.198+0000: 15201: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 75
2013-12-02 08:52:16.202+0000: 15202: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 62
2013-12-02 08:52:16.203+0000: 12387: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 83
2013-12-02 08:52:16.212+0000: 15359: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 81
2013-12-02 08:52:16.227+0000: 12378: debug : lxcContainerWaitForContinue:392 : Wait continue on fd 67
2013-12-02 08:52:16.291+0000: 12384: debug : lxcContainerWaitForContinue:394 : Got continue on fd 51 1
2013-12-02 08:52:16.292+0000: 12018: debug : virLXCMonitorHandleEventInit:107 : Event init 19730
2013-12-02 08:52:16.292+0000: 12384: debug : virDomainFree:2428 : dom=0x7f9258014df0, (VM: name=myapache24, uuid=b295c4f7-7921-46e7-8142-ed795724671e)
2013-12-02 08:52:16.293+0000: 12024: debug : virDomainLookupByUUID:2186 : conn=0x7f929c004560, uuid=b295c4f7-7921-46e7-8142-ed795724671e
2013-12-02 08:52:16.293+0000: 12024: debug : virDomainFree:2428 : dom=0x7f9284008db0, (VM: name=myapache24, uuid=b295c4f7-7921-46e7-8142-ed795724671e)
2013-12-02 08:52:16.294+0000: 12018: debug : virConnectClose:1523 : conn=0x7f929c004560
2013-12-02 08:52:16.308+0000: 12383: debug : lxcContainerWaitForContinue:394 : Got continue on fd 56 1
2013-12-02 08:52:16.308+0000: 12018: debug : virLXCMonitorHandleEventInit:107 : Event init 19776
2013-12-02 08:52:16.309+0000: 12383: debug : virDomainFree:2428 : dom=0x7f926400d3b0, (VM: name=myapache27, uuid=0d1096c3-792c-4be8-a701-3e0067d12e0a)
2013-12-02 08:52:16.310+0000: 12385: debug : virDomainLookupByUUID:2186 : conn=0x7f9298011110, uuid=0d1096c3-792c-4be8-a701-3e0067d12e0a
2013-12-02 08:52:16.310+0000: 12385: debug : virDomainFree:2428 : dom=0x7f925c010e60, (VM: name=myapache27, uuid=0d1096c3-792c-4be8-a701-3e0067d12e0a)
2013-12-02 08:52:16.312+0000: 12018: debug : virConnectClose:1523 : conn=0x7f9298011110

</slice>
Comment 10 Alex Jia 2014-02-25 05:18:06 EST
Daniel, I can now successfully start 41 containers rather than 40; is that an expected result?

# tail -3 /etc/libvirt/libvirtd.conf 
max_clients = 20
max_workers = 20
max_queued_clients = 20

# for i in {1..50}; do virt-sandbox-service create -C -u httpd.service -N dhcp myapache$i;done

# for i in {1..50}; do virsh -c lxc:/// start myapache$i & done

# virsh -c lxc:/// -q list |wc -l
41

# rpm -q libvirt-daemon libvirt-sandbox kernel
libvirt-daemon-1.1.1-23.el7.x86_64
libvirt-sandbox-0.5.0-9.el7.x86_64
kernel-3.10.0-86.el7.x86_64

Additional info:

error: Failed to start domain myapache36
error: internal error: Failed to allocate free veth pair after 10 attempts

error: Failed to start domain myapache29
error: internal error: Failed to allocate free veth pair after 10 attempts

NOTE: 10 attempts may be too few for some users, who may want to change this. It would be better to have a configuration option for it; otherwise, we should document the 10-attempt limit in libvirtd.conf or the relevant guide.
Comment 11 Michal Privoznik 2014-02-25 10:51:28 EST
(In reply to Alex Jia from comment #10)
> Daniel, I can successfully start 41 containers not 40 now, is it an expected
> result? 
> 
> # tail -3 /etc/libvirt/libvirtd.conf 
> max_clients = 20
> max_workers = 20
> max_queued_clients = 20
> 
> # for i in {1..50}; do virt-sandbox-service create -C -u httpd.service -N
> dhcp myapache$i;done
> 
> # for i in {1..50}; do virsh -c lxc:/// start myapache$i & done
> 
> # virsh -c lxc:/// -q list |wc -l
> 41

Yes and no. The kernel does some caching on sockets, and partially opens connections even when the server is not currently responsive, so you may end up with more than 40 guests running. Hence I think any result greater than or equal to 40 is okay.

> 
> # rpm -q libvirt-daemon libvirt-sandbox kernel
> libvirt-daemon-1.1.1-23.el7.x86_64
> libvirt-sandbox-0.5.0-9.el7.x86_64
> kernel-3.10.0-86.el7.x86_64
> 
> Additional info:
> 
> error: Failed to start domain myapache36
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> error: Failed to start domain myapache29
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 

This is a bug in the internal implementation. Let me see if I can fix it.
Comment 12 Michal Privoznik 2014-02-25 11:08:22 EST
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2014-February/msg01548.html
Comment 14 Michal Privoznik 2014-02-26 07:32:26 EST
So, after discussion on my backport, the bug raised in comment 10 is a separate issue and deserves its own bug. I'm moving this back to MODIFIED, as the request here is complete, and I'm creating a new bug for the veth issue: bug 1070221.
Comment 15 dyuan 2014-03-11 23:40:50 EDT
Move to VERIFIED since the separate bug is filed and already verified.
Comment 16 Ludek Smid 2014-06-13 06:00:59 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
