Bug 981726

Summary: Fine grained locking in LXC driver
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.0
Status: CLOSED CURRENTRELEASE
Reporter: Daniel Berrangé <berrange>
Assignee: Michal Privoznik <mprivozn>
QA Contact: Virtualization Bugs <virt-bugs>
CC: acathrow, ajia, berrange, dallan, dyuan, gsun, lsu, mprivozn, weizhan, zpeng
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Fixed In Version: libvirt-1.1.1-1.el7
Bug Blocks: 960861
Last Closed: 2014-06-13 10:55:36 UTC

Attachments: lxc0.xml, myapache1.xml

Description Daniel Berrangé 2013-07-05 15:24:53 UTC
Description of problem:
The QEMU driver was recently changed to remove use of its global driver lock almost entirely, in favour of per-VM locks. The LXC driver still holds its global driver lock for long periods of time, in particular during container start and, even worse, container stop.

If you start 100 containers and then kill all the libvirt_lxc processes, the entire libvirt LXC driver will hang for a very long time due to the driver lock.

The LXC driver needs to be adapted to use the same locking model as the QEMU driver.

This will be critical to meeting OpenShift's requirement to run thousands of LXC instances per host.
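
For illustration, the difference between the two locking models is roughly the following. This is a minimal C sketch with simplified names and bodies, not the actual libvirt code:

/* Old model: every API call serializes on the one global driver mutex,
 * so a slow container stop blocks even an unrelated 'virsh list'. */
static int lxcDomainGetInfoOld(virLXCDriverPtr driver, virDomainPtr dom)
{
    lxcDriverLock(driver);      /* waits behind any long-running operation */
    /* ... look up the domain and read its state ... */
    lxcDriverUnlock(driver);
    return 0;
}

/* Fine-grained model: find the domain object (returned locked), then do
 * the work under the per-VM lock only; other domains stay unblocked. */
static int lxcDomainGetInfoNew(virLXCDriverPtr driver, virDomainPtr dom)
{
    virDomainObjPtr vm;

    if (!(vm = virDomainObjListFindByUUID(driver->domains, dom->uuid)))
        return -1;
    /* ... read state under the per-VM lock only ... */
    virObjectUnlock(vm);
    return 0;
}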

Version-Release number of selected component (if applicable):
libvirt-1.1.0-1.el7

How reproducible:
Always

Steps to Reproduce:
1. Start 100 containers
2. killall libvirt_lxc
3. virsh -c lxc:/// list

Actual results:
The virsh list command hangs for a very long time, until all VM cleanup is finished.

Expected results:
No hang.

Additional info:

Comment 2 Alex Jia 2013-07-08 07:37:54 UTC
I reproduced this issue with 200 running containers; 'virsh -c lxc:/// list' takes more than 6 minutes to list the running containers.


[root@dell-op790-03 ~]# ps -ef|grep sandbox|grep -v grep|wc -l
200

[root@dell-op790-03 ~]# killall libvirt_lxc

Note: in fact, even if I don't kill all of the libvirt_lxc processes, 'virsh -c lxc:/// list' is still very slow (more than 3 minutes) at listing running containers; that is another bug, bug 980839.

[root@dell-op790-03 ~]# time virsh -c lxc:/// list
 Id    Name                           State
----------------------------------------------------


real	6m45.384s
user	0m0.004s
sys	0m0.044s

Comment 3 Michal Privoznik 2013-07-17 14:00:30 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2013-July/msg01085.html

Comment 4 Michal Privoznik 2013-07-18 12:23:10 UTC
To POST:

commit dbeb04a65cb986db951af0382495b41692d1ad9d
Author:     Michal Privoznik <mprivozn>
AuthorDate: Wed Jul 17 09:37:09 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:54 2013 +0200

    Introduce lxcDomObjFromDomain
    
    Similarly to qemu driver, we can use a helper function to
    lookup a domain instead of copying multiple lines around.
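
For reference, a sketch of what such a lookup helper typically looks like (error handling simplified; the exact body in the patch may differ):

static virDomainObjPtr
lxcDomObjFromDomain(virDomainPtr domain)
{
    virLXCDriverPtr driver = domain->conn->privateData;
    virDomainObjPtr vm;
    char uuidstr[VIR_UUID_STRING_BUFLEN];

    vm = virDomainObjListFindByUUID(driver->domains, domain->uuid);
    if (!vm) {
        virUUIDFormat(domain->uuid, uuidstr);
        virReportError(VIR_ERR_NO_DOMAIN,
                       _("no domain with matching uuid '%s' (%s)"),
                       uuidstr, domain->name);
        return NULL;
    }

    return vm;   /* returned locked; the caller unlocks it when done */
}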

commit eb150c86b4c645de7aa1f002cc11a40f0adb43bf
Author:     Michal Privoznik <mprivozn>
AuthorDate: Wed Jul 17 09:20:26 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:54 2013 +0200

    Remove lxcDriverLock from almost everywhere
    
    With the majority of fields in the virLXCDriverPtr struct
    now immutable or self-locking, there is no need for practically
    any methods to be using the LXC driver lock. Only a handful
    of helper APIs now need it.

commit 2a82171affa266e4afe472ea7b79038e338aab52
Author:     Michal Privoznik <mprivozn>
AuthorDate: Wed Jul 17 09:14:42 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:54 2013 +0200

    lxc: Make activeUsbHostdevs use locks
    
    The activeUsbHostdevs item in LXCDriver are lockable, but the lock has
    to be called explicitly. Call the virObject(Un)Lock() in order to
    achieve mutual exclusion once lxcDriverLock is removed.
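
In practice the pattern is simply (a sketch; the surrounding function is omitted):

    virObjectLock(driver->activeUsbHostdevs);
    /* ... add or remove USB host devices attached to this container ... */
    virObjectUnlock(driver->activeUsbHostdevs);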

commit 64ec738e58917c7a94c862fca725335a4fff3e11
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jul 15 11:43:10 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:54 2013 +0200

    Stop accessing driver->caps directly in LXC driver
    
    The 'driver->caps' pointer can be changed on the fly. Accessing
    it currently requires the global driver lock. Isolate this
    access in a single helper, so a future patch can relax the
    locking constraints.
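
A sketch of the kind of helper meant here; the name and exact shape are assumptions based on the commit message:

static virCapsPtr
lxcDriverGetCapabilities(virLXCDriverPtr driver)
{
    virCapsPtr caps;

    lxcDriverLock(driver);      /* lock held only while taking the reference */
    caps = driver->caps;
    virObjectRef(caps);
    lxcDriverUnlock(driver);

    return caps;                /* caller releases it with virObjectUnref() */
}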

commit c86950533a0457cb8d6515088c925213ea629b9f
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jul 15 19:08:11 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:54 2013 +0200

    lxc: switch to virCloseCallbacks API

commit 4deeb74d01adfe1d0d36914785ba24ad7f74060e
Author:     Michal Privoznik <mprivozn>
AuthorDate: Tue Jul 16 19:20:24 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:54 2013 +0200

    Introduce annotations for virLXCDriverPtr fields
    
    Annotate the fields in virLXCDriverPtr to indicate the locking
    rules for their use.

commit 29bed27eb4434a148da68c8bdf48b52bad3c2567
Author:     Michal Privoznik <mprivozn>
AuthorDate: Tue Jul 16 19:05:06 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:54 2013 +0200

    lxc: Use atomic ops for driver->nactive
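
For example, a minimal sketch using the viratomic.h helpers (the surrounding logic is illustrative only):

    /* container started: bump the active count without taking the driver lock */
    virAtomicIntInc(&driver->nactive);

    /* container stopped: virAtomicIntDecAndTest() returns true when the
     * counter reaches zero, i.e. the last active container has gone away */
    if (virAtomicIntDecAndTest(&driver->nactive)) {
        /* e.g. release the host shutdown inhibitor */
    }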

commit 7fca37554c2059f13a3de32bcc29c3c1e404fbc6
Author:     Michal Privoznik <mprivozn>
AuthorDate: Tue Jul 16 17:45:05 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:53 2013 +0200

    Introduce a virLXCDriverConfigPtr object
    
    Currently the virLXCDriverPtr struct contains an wide variety
    of data with varying access needs. Move all the static config
    data into a dedicated virLXCDriverConfigPtr object. The only
    locking requirement is to hold the driver lock, while obtaining
    an instance of virLXCDriverConfigPtr. Once a reference is held
    on the config object, it can be used completely lockless since
    it is immutable.
    
    NB, not all APIs correctly hold the driver lock while getting
    a reference to the config object in this patch. This is safe
    for now since the config is never updated on the fly. Later
    patches will address this fully.
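
The access pattern described above, as a sketch (the helper name and the config field name are assumptions, not the verbatim patch):

static virLXCDriverConfigPtr
lxcDriverGetConfig(virLXCDriverPtr driver)
{
    virLXCDriverConfigPtr cfg;

    lxcDriverLock(driver);      /* lock held only while grabbing the reference */
    cfg = driver->config;
    virObjectRef(cfg);
    lxcDriverUnlock(driver);

    return cfg;                 /* immutable: read lockless, drop with virObjectUnref() */
}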

commit 7e94a1a4ea6581ea876c61df73173a743972a00e
Author:     Michal Privoznik <mprivozn>
AuthorDate: Thu Jul 18 13:34:55 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:53 2013 +0200

    virLXCDriver: Drop unused @cgroup
    
    It is not used anywhere, so it makes no sense to have it there.

commit 272769beccd7479c75e700a6cbc138d495db66fb
Author:     Michal Privoznik <mprivozn>
AuthorDate: Mon Jul 15 16:53:13 2013 +0200
Commit:     Michal Privoznik <mprivozn>
CommitDate: Thu Jul 18 14:16:53 2013 +0200

    qemu: Move close callbacks handling into util/virclosecallbacks.c


v1.1.0-251-gdbeb04a

Comment 5 Alex Jia 2013-07-23 10:56:43 UTC
After successfully starting 89 apache containers and then running the steps from Comment 2, list performance still seems poor.

Libvirt downstream (libvirt-1.1.0-2.el7.x86_64, libvirt-sandbox-0.2.1-1.el7.x86_64):

# for i in {1..100}; do time virt-sandbox-service start myapache$i & done

# ps -ef|grep sandbox|grep -v grep|wc -l
89

Note: only 89 containers were successfully started.

# killall libvirt_lxc && time virsh -c lxc:/// list
 Id    Name                           State
----------------------------------------------------


real	3m2.865s
user	0m0.007s
sys	0m0.007s

Libvirt upstream (commit 7729a16):

# ./daemon/libvirtd -d

# ps -ef|grep sandbox|grep -v grep|wc -l
89


# killall libvirt_lxc && time ./tools/virsh -c lxc:/// list
 Id    Name                           State
----------------------------------------------------


real	2m53.398s
user	0m0.004s
sys	0m0.012s

Comment 6 Michal Privoznik 2013-07-24 17:05:22 UTC
Alex,

do you have debug logging turned on? I've run oprofile several times and found that most of the time is spent in virLogVMessage and virAsprintfInternal (when formatting the log messages). However, I'm not able to get even close to your times:

libvirt.git # pgrep libvirt_lxc | wc -l
100
libvirt.git # killall libvirt_lxc && time virsh -c lxc:/// list
 Id    Name                           State
----------------------------------------------------


real    0m9.184s
user    0m0.010s
sys     0m0.000s

Can you share the domain XML with me? All my containers do is run bash.

Comment 7 Alex Jia 2013-07-25 03:26:42 UTC
(In reply to Michal Privoznik from comment #6)
> Alex,
> Can you share the domain XML with me? All my containers do is run bash.

I created systemd Linux containers, not generic Linux containers. You can create 100 apache containers with the following steps:

# for i in {1..100}; do virt-sandbox-service create -C -u httpd.service -N dhcp,source=default myapache$i;done

To start 100 containers:

# tail -2 /etc/libvirt/libvirtd.conf 
max_clients = 100
max_workers = 100

# for i in {1..100}; do virt-sandbox-service start myapache$i & done

Note: it should be "&", not ";", following "myapache$i". In addition, if your host doesn't have enough memory, some containers will be killed, so there may be fewer than 100 running containers. You can then check the LXC guest XML configuration with 'virsh -c lxc:/// dumpxml myapache$i'.

If you can't reproduce my case, please restart the libvirtd service and try again. As usual, the first listing of running containers may be very slow, and the second very fast if you don't restart libvirtd; it seems libvirt caches something.

Michal, also please show your container XML configuration and test steps, thanks.

Comment 8 Michal Privoznik 2013-07-25 07:21:31 UTC
Well, the problem is, I can't get virt-sandbox-service running on my system. Anyway, for the XML I'm using, see the attachment. But I think the other bottleneck here is our event loop. If an LXC container dies, the event loop issues a callback to clean up cgroups, mark the corresponding domain as inactive, etc. This is all done in a single thread, the event loop thread. So if one hundred containers die at once, the event loop must issue one hundred callbacks prior to executing 'virsh list'. However, I'm hesitant to say removing this bottleneck will significantly change the times you're measuring. As I've said, the cleanup callback removes all container processes from cgroups, and I think that is an even bigger bottleneck than the one in the event loop, especially if the apache you're starting spawns additional processes.

Having said that, I'm still curious how you got to 3 minutes. Can you run 'virsh dumpxml myapache1' and share the output with me?

Comment 9 Michal Privoznik 2013-07-25 07:22:36 UTC
Created attachment 778101 [details]
lxc0.xml

Comment 10 Alex Jia 2013-07-25 09:12:02 UTC
Created attachment 778145 [details]
myapache1.xml

Comment 11 Daniel Berrangé 2013-07-25 09:22:12 UTC
One thing to be aware of when testing this bug is that its goal was *not* to solve every scalability problem in the LXC driver. It was specifically just to switch the LXC driver to fine-grained locking instead of a global mutex. This allows greater concurrency for operations. For example, if 'virsh destroy foo' takes 3 seconds to complete, this change allows 'virsh dominfo bar' to proceed without blocking, since it is a different VM and thus a different lock. It should also have sped up the creation of multiple containers in parallel.

I fully expect that more scalability problems will exist. I'm not surprised that we still have problems with container shutdown, for the reasons Michal describes regarding the event loop. This is why I left bug 960861 open independently rather than marking it a duplicate of this.

Comment 12 Luwen Su 2013-08-02 08:26:36 UTC
Hi Michal,

The procedure also took a long time in my testing.

#rpm -q libvirt libvirt-sandbox
libvirt-1.1.1-1.el7.x86_64
libvirt-sandbox-0.5.0-1.el7.x86_64


1. Create and start
# for i in {1..100}; do virt-sandbox-service create -C -u httpd.service -N dhcp,source=default myapache$i;done

To start 100 containers:
# tail -2 /etc/libvirt/libvirtd.conf 
max_clients = 100
max_workers = 100

# for i in {1..100}; do virsh -c lxc:/// start myapache$i & done

2. Calculate and kill

# ps -ef|grep lxc|grep -v grep|wc -l
44
# killall libvirt_lxc && time virsh -c lxc:/// list
 Id    Name                           State
----------------------------------------------------

real	1m23.168s
user	0m0.010s
sys	0m0.008s

3. Lots of logs like:

libvirtd[25434]: Unable to remove /sys/fs/cgroup/cpu,cpuacct/machine/myapache11.libvirt-lxc/system/libvirtd.service/system/ (16)
libvirtd[25434]: Unable to remove /sys/fs/cgroup/cpu,cpuacct/machine/myapache11.libvirt-lxc/system/libvirtd.service/system//systemd-journald.service


Same XML as comment 10 (ajia's). Any suggestions?

Comment 13 Alex Jia 2013-08-02 08:34:38 UTC
(In reply to time.su from comment #12)
 
> Same XML as comment 10 (ajia's). Any suggestions?

In fact, you need to verify the bug with the LXC and QEMU drivers; I think you should try steps following DB's advice in Comment 11.

Comment 14 Michal Privoznik 2013-08-05 10:25:32 UTC
Well, I don't think the test you are using is right for the reasons I've described in comment #8. What about trying tests Daniel describes in comment #11?

Comment 15 Luwen Su 2013-08-06 02:42:23 UTC
(In reply to Michal Privoznik from comment #14)
> Well, I don't think the test you are using is right for the reasons I've
> described in comment #8. What about trying tests Daniel describes in comment
> #11?

Yeah, I think I made a mistake. I will have another try based on what you mentioned and the patches.

Comment 16 Luwen Su 2013-08-08 07:18:26 UTC
Hi Michal,
   I did two things when testing the bug.

   1.
    Just as your patches mention, I started 4 groups of LXC guests, each group with 10 guests, and then started and destroyed them 10 times individually.
    libvirt 1.1.1-2 saved more than 20 seconds on average compared to libvirt 1.0.6.

One group template; run it for group 1, group 2, and so on:
CONNECT="virsh -c lxc:///"
for j in {1..10}; do
    for i in {11..20}; do
        $CONNECT start lxc$i; $CONNECT destroy lxc$i
    done
done


    2.
    Start and destroy one guest 100 times in a loop, and at the same time query another guest's dominfo, like:

getinfo() { $CONNECT dominfo lxc2; }   # query a different guest while lxc1 cycles

for i in {1..100}; do
    {
        $CONNECT start lxc1; $CONNECT destroy lxc1
    } & getinfo
done

libvirt 1.1.1-2 cost
real	0m16.965s
user	0m3.995s
sys	0m2.463s

libvirt 1.0.6 cost 30 secs average


Is it okay for verified ?

Comment 17 Michal Privoznik 2013-08-08 08:39:46 UTC
(In reply to time.su from comment #16)
> Hi Michal,
>    I did two things when testing the bug.
> 
>    1.
>     Just as your patches mention, I started 4 groups of LXC guests, each
> group with 10 guests, and then started and destroyed them 10 times
> individually.
>     libvirt 1.1.1-2 saved more than 20 seconds on average compared to
> libvirt 1.0.6.
> 
> One group template; run it for group 1, group 2, and so on:
> CONNECT="virsh -c lxc:///"
> for j in {1..10}; do
>     for i in {11..20}; do
>         $CONNECT start lxc$i; $CONNECT destroy lxc$i
>     done
> done

You could get even better results if you wrote a python script that shares a connection object, so you don't have to connect and disconnect each time. But that's just cosmetics.

> 
> 
>     2.
>     Start and destroy one guest 100 times in a loop, and at the same time
> query another guest's dominfo, like:
> 
> getinfo() { $CONNECT dominfo lxc2; }   # query a different guest while lxc1 cycles
> 
> for i in {1..100}; do
>     {
>         $CONNECT start lxc1; $CONNECT destroy lxc1
>     } & getinfo
> done
> 
> libvirt 1.1.1-2 cost
> real	0m16.965s
> user	0m3.995s
> sys	0m2.463s
> 
> libvirt 1.0.6 cost 30 secs average
> 
> 
> Is it okay for verified ?

Yes. But maybe we should ask Dan if it's okay to move this to VERIFIED. I understand that even more locks can be broken into smaller ones, but I think this is a good starting position for that. I mean, we can do that later as rhel-7.0 life goes on (in a separate bug). Dan, what's your opinion?

Comment 18 Daniel Berrangé 2013-08-08 08:42:14 UTC
> > libvirt 1.1.1-2 cost
> > real	0m16.965s
> > user	0m3.995s
> > sys	0m2.463s
> > 
> > libvirt 1.0.6 cost 30 secs average
> > 
> > 
> > Is it okay for verified ?
> 
> Yes. But maybe we should ask Dan if it's okay to move this to VERIFIED. I
> understand that even more locks can be broken into smaller ones, but I think
> this is a good starting position for that. I mean, we can do that later as
> rhel-7.0 life goes on (in a separate bug). Dan, what's your opinion?

Yes, this bug was never intended to solve all problems - it is just one step in an incremental effort to improve scalability. From a coding & design POV it has achieved its aims, regardless of what performance numbers you actually see in testing.

Comment 19 Luwen Su 2013-08-09 10:30:49 UTC
Okay, per comments 16, 17, and 18, I am setting this one to VERIFIED.

Thank you both for your kind help and explanation.

Comment 20 Ludek Smid 2014-06-13 10:55:36 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.