Bug 1014604 - Race condition allocating veth devices with parallel LXC container creation
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Daniel Berrange
QA Contact: Virtualization Bugs
Duplicates: 1070221
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs 992980 1058606 1086175
Reported: 2013-10-02 07:37 EDT by Daniel Berrange
Modified: 2014-06-17 20:56 EDT (History)
9 users

See Also:
Fixed In Version: libvirt-1.1.1-26.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-13 05:29:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Daniel Berrange 2013-10-02 07:37:08 EDT
Description of problem:
Since the LXC driver was switched to fine-grained per-VM locking, most of container creation happens in parallel. This exposed a flaw in the code allocating veth devices: it had a designed-in race condition when called in parallel, which causes startup failures when starting guests concurrently.



Version-Release number of selected component (if applicable):
libvirt-1.1.1-8.el7

How reproducible:
Sometimes

Steps to Reproduce:
1. Define config for many LXC containers each with bridged NICs
2. Start them all in parallel

Actual results:
Some guests will fail to start

Expected results:
All guests start

Additional info:
Comment 4 Alex Jia 2013-10-16 05:16:53 EDT
Reproduced this on libvirt-1.1.1-8.el7.x86_64 with libvirt-sandbox-0.5.0-5.el7.x86_64 and kernel-3.10.0-33.el7.x86_64.

# for i in {1..100}; do virt-sandbox-service -c lxc:/// create -N dhcp,source=default mylxcsh$i /bin/bash;done

# for i in {1..100}; do virsh -c lxc:/// start mylxcsh$i & done
<slice>

error: internal error: Child process (ip link add veth0 type veth peer name veth1) unexpected exit status 2: RTNETLINK answers: File exists

Domain mylxcsh5 started

Domain mylxcsh3 started

XXXXXX

Domain mylxcsh42 started

error: Failed to start domain mylxcsh96
error: error: Failed to start domain mylxcsh60
internal error: Child process (ip link add veth16 type veth peer name veth21) unexpected exit status 2: RTNETLINK answers: File exists

</slice>

# virsh -c lxc:/// -q list | wc -l
18


Tested it on 

# for i in {1..100}; do virsh -c lxc:/// start mylxcsh$i & done

<slice>

error: Failed to start domain mylxcsh11
error: internal error: Failed to allocate free veth pair after 10 attempts

error: Failed to start domain mylxcsh24
error: internal error: Failed to allocate free veth pair after 10 attempts

error: Failed to start domain mylxcsh10
error: internal error: Failed to allocate free veth pair after 10 attempts

error: Failed to start domain mylxcsh43
error: internal error: Failed to allocate free veth pair after 10 attempts

</slice>


Daniel, are 10 attempts acceptable? Or can users change the number of attempts? I guess the 10 retries are hard-coded. Thanks.


# virsh -c lxc:/// -q list |wc -l
95

# tail -2 /etc/libvirt/libvirtd.conf
max_clients = 1024
max_workers = 1024
Comment 5 Alex Jia 2013-10-16 05:21:37 EDT
(In reply to Alex Jia from comment #4)
> Tested it on 

Tested it on libvirt-1.1.1-9.el7.x86_64 with libvirt-sandbox-0.5.0-5.el7.x86_64 and kernel-3.10.0-33.el7.x86_64.
Comment 6 Alex Jia 2013-10-31 02:07:14 EDT
Daniel, could you help confirm the issues in comment 4? Thanks.
Comment 7 Alex Jia 2013-12-02 05:27:01 EST
(In reply to Alex Jia from comment #4)
> Tested it on 

Retested this.

# rpm -q libvirt libvirt-sandbox kernel
libvirt-1.1.1-13.el7.x86_64
libvirt-sandbox-0.5.0-6.el7.x86_64
kernel-3.10.0-0.rc7.64.el7.x86_64

> 
> # for i in {1..100}; do virsh -c lxc:/// start mylxcsh$i & done
> 
> <slice>
> 
> error: Failed to start domain mylxcsh11
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> error: Failed to start domain mylxcsh24
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> error: Failed to start domain mylxcsh10
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> error: Failed to start domain mylxcsh43
> error: internal error: Failed to allocate free veth pair after 10 attempts
> 
> </slice>
> 

Domain mylxcsh36 started

error: Failed to start domain mylxcsh43
error: internal error: Failed to allocate free veth pair after 10 attempts

Note: only 1 container failed to start.

> 
> Daniel, are 10 attempts acceptable? Or can users change the number of
> attempts? I guess the 10 retries are hard-coded. Thanks.
> 
> 
> # virsh -c lxc:/// -q list |wc -l
> 95

#  virsh -c lxc:/// -q list|grep mylxcsh|wc -l
99

> 
> # tail -2 /etc/libvirt/libvirtd.conf
> max_clients = 1024
> max_workers = 1024

# tail -3 /etc/libvirt/libvirtd.conf
max_clients = 20
max_workers = 20
max_queued_clients = 20

Note: the above limits appear to have no effect even after restarting libvirtd.

In libvirt log:

# grep error /var/log/libvirt/libvirtd.log
2013-12-02 10:15:05.481+0000: 21664: error : virNetlinkEventCallback:340 : nl_recv returned with error: No buffer space available
2013-12-02 10:15:06.590+0000: 21873: error : virNetDevVethCreate:179 : internal error: Failed to allocate free veth pair after 10 attempts

Note: what does "No buffer space available" mean? And are 10 attempts enough? Can users change the number of attempts? Thanks.
Comment 8 Alex Jia 2013-12-06 05:43:21 EST
src/util/virnetdevveth.c:67:#define MAX_DEV_NUM 65536
src/util/virnetdevveth.c:120:#define MAX_VETH_RETRIES 10

It is very easy to hit the error "Failed to allocate free veth pair after %d attempts" when starting LXC containers in parallel.
Comment 9 Luwen Su 2014-02-12 01:24:27 EST
Also reproduced with the same steps on
libvirt-1.1.1-22.el7.x86_64
kernel-3.10.0-86.el7.x86_64
libvirt-sandbox-0.5.0-9.el7.x86_64

Something like:
error: Failed to start domain mylxcsh89
error: internal error: Failed to allocate free veth pair after 10 attempts
Comment 10 Jiri Denemark 2014-02-26 11:21:12 EST
The issue from comment #4 should be fixed by upstream commit v1.2.2-rc2-1-gc0d162c:

commit c0d162c68c2f19af8d55a435a9e372da33857048
Author: Michal Privoznik <mprivozn@redhat.com>
Date:   Tue Feb 25 16:41:07 2014 +0100

    virNetDevVethCreate: Serialize callers
    
    Consider dozen of LXC domains, each of them having this type of interface:
    
        <interface type='network'>
          <mac address='52:54:00:a7:05:4b'/>
          <source network='default'/>
        </interface>
    
    When starting these domain in parallel, all workers may meet in
    virNetDevVethCreate() where a race starts. Race over allocating veth
    pairs because allocation requires two steps:
    
      1) find first nonexistent '/sys/class/net/vnet%d/'
      2) run 'ip link add ...' command
    
    Now consider two threads. Both of them find N as the first unused veth
    index but only one of them succeeds allocating it. The other one fails.
    For such cases, we are running the allocation in a loop with 10 rounds.
    However this is very flaky synchronization. It should be rather used
    when libvirt is competing with other process than when libvirt threads
    fight each other. Therefore, internally we should use mutex to serialize
    callers, and do the allocation in loop (just in case we are competing
    with a different process). By the way we have something similar already
    since 1cf97c87.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Comment 12 Jiri Denemark 2014-02-26 11:22:21 EST
*** Bug 1070221 has been marked as a duplicate of this bug. ***
Comment 15 Luwen Su 2014-03-11 05:19:39 EDT
Tested under libvirt-1.1.1-26.el7.x86_64
using comment 4's steps.

All containers start up in parallel,
and no errors were found in either the libvirtd or system logs.

So setting it to VERIFIED.
Comment 16 Ludek Smid 2014-06-13 05:29:36 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
