Bug 1014604 - Race condition allocating veth devices with parallel LXC container creation

Product:           Red Hat Enterprise Linux 7
Component:         libvirt
Version:           7.0
Reporter:          Daniel Berrangé <berrange>
Assignee:          Daniel Berrangé <berrange>
QA Contact:        Virtualization Bugs <virt-bugs>
CC:                acathrow, ajia, berrange, dallan, dyuan, fullung, jdenemar, lsu, mprivozn
Severity:          unspecified
Priority:          unspecified
Hardware:          Unspecified
OS:                Unspecified
Target Milestone:  rc
Status:            CLOSED CURRENTRELEASE
Fixed In Version:  libvirt-1.1.1-26.el7
Doc Type:          Bug Fix
Type:              Bug
Last Closed:       2014-06-13 09:29:36 UTC
Bug Blocks:        910269, 992980, 1058606, 1086175

Description Daniel Berrangé 2013-10-02 11:37:08 UTC
Description of problem:
Since the LXC driver was switched to fine-grained per-VM locking, most of container creation happens in parallel. This exposed a flaw in the code allocating veth devices: its design has an inherent race condition when called in parallel, which causes startup failures when guests are started concurrently.
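
For illustration, here is a minimal standalone C sketch of the racy two-step pattern (spelled out in the fix commit quoted in comment 10): probe /sys/class/net for the first unused veth index, then shell out to "ip link add". Nothing reserves the index between the two steps, so parallel callers can pick the same name and the loser fails with "RTNETLINK answers: File exists". This is not libvirt's actual code; the probing and the peer-name choice are simplified.

/* Sketch only: pick the first apparently-free veth index, then create it. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* First index N for which /sys/class/net/vethN does not exist. */
static int first_free_veth_index(void)
{
    char path[64];
    struct stat st;

    for (int i = 0; i < 65536; i++) {
        snprintf(path, sizeof(path), "/sys/class/net/veth%d", i);
        if (stat(path, &st) < 0)
            return i;       /* looks free, but nothing reserves it */
    }
    return -1;
}

int main(void)
{
    int i = first_free_veth_index();
    char cmd[128];

    if (i < 0)
        return EXIT_FAILURE;
    /* By the time this runs, another caller may already have created vethN,
     * and "ip link add" fails with "RTNETLINK answers: File exists". */
    snprintf(cmd, sizeof(cmd),
             "ip link add veth%d type veth peer name veth%d", i, i + 1);
    return system(cmd) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}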



Version-Release number of selected component (if applicable):
libvirt-1.1.1-8.el7

How reproducible:
Sometimes

Steps to Reproduce:
1. Define config for many LXC containers each with bridged NICs
2. Start them all in parallel

Actual results:
Some guests will fail to start

Expected results:
All guests start

Additional info:

Comment 4 Alex Jia 2013-10-16 09:16:53 UTC
Reproduced this on libvirt-1.1.1-8.el7.x86_64 with libvirt-sandbox-0.5.0-5.el7.x86_64 and kernel-3.10.0-33.el7.x86_64.

# for i in {1..100}; do virt-sandbox-service -c lxc:/// create -N dhcp,source=default mylxcsh$i /bin/bash;done

# for i in {1..100}; do virsh -c lxc:/// start mylxcsh$i & done
<slice>

error: internal error: Child process (ip link add veth0 type veth peer name veth1) unexpected exit status 2: RTNETLINK answers: File exists

Domain mylxcsh5 started

Domain mylxcsh3 started

XXXXXX

Domain mylxcsh42 started

error: Failed to start domain mylxcsh96
error: error: Failed to start domain mylxcsh60
internal error: Child process (ip link add veth16 type veth peer name veth21) unexpected exit status 2: RTNETLINK answers: File exists

</slice>

# virsh -c lxc:/// -q list | wc -l
18


Tested it on 

# for i in {1..100}; do virsh -c lxc:/// start mylxcsh$i & done

<slice>

error: Failed to start domain mylxcsh11
error: internal error: Failed to allocate free veth pair after 10 attempts

error: Failed to start domain mylxcsh24
error: internal error: Failed to allocate free veth pair after 10 attempts

error: Failed to start domain mylxcsh10
error: internal error: Failed to allocate free veth pair after 10 attempts

error: Failed to start domain mylxcsh43
error: internal error: Failed to allocate free veth pair after 10 attempts

</slice>


Daniel, are 10 attempts acceptable, or can users change the number of attempts? I assume the 10 retries are hard-coded. Thanks.


# virsh -c lxc:/// -q list |wc -l
95

# tail -2 /etc/libvirt/libvirtd.conf
max_clients = 1024
max_workers = 1024

Comment 5 Alex Jia 2013-10-16 09:21:37 UTC
(In reply to Alex Jia from comment #4)
> Tested it on 

Tested it on libvirt-1.1.1-9.el7.x86_64 with libvirt-sandbox-0.5.0-5.el7.x86_64 and kernel-3.10.0-33.el7.x86_64.

Comment 6 Alex Jia 2013-10-31 06:07:14 UTC
Daniel, could you help confirm the issues in comment 4? Thanks.

Comment 7 Alex Jia 2013-12-02 10:27:01 UTC
(In reply to Alex Jia from comment #4)
> Tested it on 

Retested this.

# rpm -q libvirt libvirt-sandbox kernel
libvirt-1.1.1-13.el7.x86_64
libvirt-sandbox-0.5.0-6.el7.x86_64
kernel-3.10.0-0.rc7.64.el7.x86_64

> 
> # for i in {1..100}; do virsh -c lxc:/// start mylxcsh$i & done
> 
> <slice>
> 
> error: Failed to start domain mylxcsh11
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> error: Failed to start domain mylxcsh24
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> error: Failed to start domain mylxcsh10
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> error: Failed to start domain mylxcsh43
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> </slice>
> 

Domain mylxcsh36 started

error: Failed to start domain mylxcsh43
error: internal error: Failed to allocate free veth pair after 10 attempts

Note: only 1 container could not be successfully started.

> 
> Daniel, are 10 attempts acceptable? or could users change attempts times? I
> guess 10 times are hard code. Thanks.
> 
> 
> # virsh -c lxc:/// -q list |wc -l
> 95

#  virsh -c lxc:/// -q list|grep mylxcsh|wc -l
99

> 
> # tail -2 /etc/libvirt/libvirtd.conf
> max_clients = 1024
> max_workers = 1024

# tail -3 /etc/libvirt/libvirtd.conf
max_clients = 20
max_workers = 20
max_queued_clients = 20

Note: the above limits seem to have no effect even after restarting libvirtd.

In libvirt log:

# grep error /var/log/libvirt/libvirtd.log
2013-12-02 10:15:05.481+0000: 21664: error : virNetlinkEventCallback:340 : nl_recv returned with error: No buffer space available
2013-12-02 10:15:06.590+0000: 21873: error : virNetDevVethCreate:179 : internal error: Failed to allocate free veth pair after 10 attempts

Note: what does "No buffer space available" mean? Are 10 attempts enough, and can users change the number of attempts? Thanks.

Comment 8 Alex Jia 2013-12-06 10:43:21 UTC
src/util/virnetdevveth.c:67:#define MAX_DEV_NUM 65536
src/util/virnetdevveth.c:120:#define MAX_VETH_RETRIES 10

It is very easy to hit the error "Failed to allocate free veth pair after %d attempts" when starting LXC containers in parallel.

Comment 9 Luwen Su 2014-02-12 06:24:27 UTC
Also reproduced with the same steps on
libvirt-1.1.1-22.el7.x86_64
kernel-3.10.0-86.el7.x86_64
libvirt-sandbox-0.5.0-9.el7.x86_64

with errors like:
error: Failed to start domain mylxcsh89
error: internal error: Failed to allocate free veth pair after 10 attempts

Comment 10 Jiri Denemark 2014-02-26 16:21:12 UTC
The issue from comment #4 should be fixed by upstream commit v1.2.2-rc2-1-gc0d162c:

commit c0d162c68c2f19af8d55a435a9e372da33857048
Author: Michal Privoznik <mprivozn>
Date:   Tue Feb 25 16:41:07 2014 +0100

    virNetDevVethCreate: Serialize callers
    
    Consider dozen of LXC domains, each of them having this type of interface:
    
        <interface type='network'>
          <mac address='52:54:00:a7:05:4b'/>
          <source network='default'/>
        </interface>
    
    When starting these domain in parallel, all workers may meet in
    virNetDevVethCreate() where a race starts. Race over allocating veth
    pairs because allocation requires two steps:
    
      1) find first nonexistent '/sys/class/net/vnet%d/'
      2) run 'ip link add ...' command
    
    Now consider two threads. Both of them find N as the first unused veth
    index but only one of them succeeds allocating it. The other one fails.
    For such cases, we are running the allocation in a loop with 10 rounds.
    However this is very flaky synchronization. It should be rather used
    when libvirt is competing with other process than when libvirt threads
    fight each other. Therefore, internally we should use mutex to serialize
    callers, and do the allocation in loop (just in case we are competing
    with a different process). By the way we have something similar already
    since 1cf97c87.
    
    Signed-off-by: Michal Privoznik <mprivozn>
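
For illustration, here is a minimal standalone C sketch of that approach (not libvirt's actual virNetDevVethCreate; the helpers and peer-index choice are simplified): a mutex serializes the process's own threads around the probe-and-create step, while a bounded retry loop, mirroring the MAX_VETH_RETRIES limit noted in comment 8, is kept only for races against other processes.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

#define MAX_VETH_RETRIES 10   /* same hard-coded limit as noted in comment 8 */

static pthread_mutex_t veth_lock = PTHREAD_MUTEX_INITIALIZER;

/* First index N for which /sys/class/net/vethN does not exist. */
static int first_free_veth_index(void)
{
    char path[64];
    struct stat st;

    for (int i = 0; i < 65536; i++) {
        snprintf(path, sizeof(path), "/sys/class/net/veth%d", i);
        if (stat(path, &st) < 0)
            return i;
    }
    return -1;
}

/* Returns the allocated index, or -1 after MAX_VETH_RETRIES failures. */
static int create_veth_pair(void)
{
    char cmd[128];
    int ret = -1;

    pthread_mutex_lock(&veth_lock);
    for (int attempt = 0; attempt < MAX_VETH_RETRIES; attempt++) {
        int i = first_free_veth_index();

        if (i < 0)
            break;
        snprintf(cmd, sizeof(cmd),
                 "ip link add veth%d type veth peer name veth%d", i, i + 1);
        /* With the mutex held, "File exists" can only come from a process
         * outside this one racing for the same name. */
        if (system(cmd) == 0) {
            ret = i;
            break;
        }
    }
    pthread_mutex_unlock(&veth_lock);
    return ret;
}

int main(void)
{
    int i = create_veth_pair();

    if (i < 0) {
        fprintf(stderr, "Failed to allocate free veth pair after %d attempts\n",
                MAX_VETH_RETRIES);
        return EXIT_FAILURE;
    }
    printf("allocated veth%d/veth%d\n", i, i + 1);
    return EXIT_SUCCESS;
}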

Comment 12 Jiri Denemark 2014-02-26 16:22:21 UTC
*** Bug 1070221 has been marked as a duplicate of this bug. ***

Comment 15 Luwen Su 2014-03-11 09:19:39 UTC
Tested under libvirt-1.1.1-26.el7.x86_64 using the steps from comment 4.

All containers start up in parallel, and no errors were found in either the libvirtd log or the system log.

Setting this to VERIFIED.

Comment 16 Ludek Smid 2014-06-13 09:29:36 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.