Bug 1330712 - Performance optimization of docker devicemapper graph driver
Summary: Performance optimization of docker devicemapper graph driver
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Vivek Goyal
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-26 18:32 UTC by Jeremy Eder
Modified: 2020-04-13 21:30 UTC (History)
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-13 21:30:00 UTC
Target Upstream Version:


Attachments

Comment 1 Vivek Goyal 2016-04-26 18:47:56 UTC
Marian, the other day we were discussing the use case for an LVM graph driver. I am wondering whether it is worth doing a quick prototype implementation to see if it can provide better performance. This assumes that multiple device operations can proceed in parallel for the most part and, unlike docker devicemapper, do not get completely serialized.

Comment 2 Vivek Goyal 2016-04-26 18:48:55 UTC
Or, is there any chance that libdevmapper could be made thread safe, so that multiple docker threads could call into the library in parallel instead of having to serialize all the operations?

Comment 3 Daniel Walsh 2016-04-26 19:07:06 UTC
I am not sure why OpenShift would trust a brand new backend rather than OverlayFS, even if we could quickly build an LVM backend. But it would be good to know whether LVM has better performance.

Comment 5 Zdenek Kabelac 2016-04-27 08:41:23 UTC
Basically, you do not want to serialize all operations into a single thread's processing stack; you want to keep operations processed as independent commands.

And yes, it would be worth trying lvm2 commands as a direct replacement for docker's device-manipulation code.

Comment 6 Alasdair Kergon 2016-04-28 00:07:37 UTC
Try the new dm_udev_wait_immediate, which lets you wait for udev outside the library without holding the library mutex.

https://www.redhat.com/archives/lvm-devel/2016-April/msg00145.html
https://git.fedorahosted.org/cgit/lvm2.git/patch/?id=16019b518e287da19c87eb64229f5c3ca057cb05

Comment 9 Marian Csontos 2016-05-24 09:28:56 UTC
I analysed the out.csv data and found the following:

================================================================================
*lookupDevice*
================================================================================
  2. took
        <type 'float'>
        Nulls: False
        Min: 1.630804
        Max: 11622.047727
        Sum: 193923.888232
        Mean: 712.955471441
        Median: 289.74097
        Standard Deviation: 1261.40217467
        Unique values: 272

Row count: 272

================================================================================
*lookupDeviceWithLock*
================================================================================
  2. took
        <type 'float'>
        Nulls: False
        Min: 1.630804
        Max: 11622.047727
        Sum: 193914.978846
        Mean: 718.203625356
        Median: 291.2552555
        Standard Deviation: 1264.58521366
        Unique values: 270

Row count: 270

lookupDeviceWithLock calls lookupDevice, and waiting for the lock takes almost no time at all (ignoring those 2 extra calls to lookupDevice from elsewhere). So those 3 minutes are spent in lookupDevice, which does nothing but call loadMetadata:

    func (devices *DeviceSet) loadMetadata(hash string) *devInfo {
    
            info := &devInfo{Hash: hash, devices: devices}
    
            jsonData, err := ioutil.ReadFile(devices.metadataFile(info))
            if err != nil {
                    return nil
            }
    
            if err := json.Unmarshal(jsonData, &info); err != nil {
                    return nil
            }
    
            if info.DeviceID > maxDeviceID {
                    logrus.Errorf("Ignoring Invalid DeviceId=%d", info.DeviceID)
                    return nil
            }
    
            return info
    }

IIUC, looking at this function, lookupDevice spends those 3 minutes reading and parsing JSON files, and there is nothing LVM/DM could do to speed that up.

There is a cache (devices.Devices) in place, and I fail to understand why it does not help here.

Jeremy, Vivek, was the source file instrumented incorrectly and is the csv just plain wrong? Or is there something I am missing?

Everything would be easier and faster to resolve if Go had a proper profiling tool. :-/

Comment 10 Vivek Goyal 2016-05-25 12:53:33 UTC
Maybe the tests were done on a disk that was slow. Jeremy, any chance these tests were done on an AWS instance? I have often seen that the additional disks there can be very slow sometimes.

Comment 11 Marian Csontos 2016-05-25 13:31:58 UTC
(In reply to Vivek Goyal from comment #10)
> May be tests were done on a disk which was slow. Jeremy, any chance that
> these tests were done on an AWS instance. I have often seen that there
> additional disk can be very slow *sometimes*.

I hope not. Or does it make any sense to test performance on a platform where performance varies only sometimes?

Comment 12 Jeremy Eder 2016-05-25 15:28:34 UTC
Not EC2.  Done in a KVM guest.  Backing storage is 6x300GB SAS 10k RAID6.

Comment 13 Marian Csontos 2016-05-26 15:46:06 UTC
I am not a performance expert, but using VMs for performance testing just does not sound like a good idea to me. There are way too many layers to consider.

And we still do not know where the bottleneck is - is it CPU, I/O, or memory bound?

If it is absolutely essential to use VMs, e.g. because the infrastructure is built on them, I have a few easy optimizations to try.

IIUC, both the worker VMs and the built images should be reproducible/throwaway, so we can optimize for speed and ignore data safety.

The first optimization is of course using "writeback" caching on the VMs' disks. This greatly improves file-locking speed, as files do not have to be written through all the layers down to the slow HDDs.

Using the noop scheduler in the VMs and deadline on the host brought the best results for my test VMs.

Provide enough^TM memory to the VMs, no swap, and let the host do the swapping. (This may not be efficient in the "cloud". But does anyone expect consistent performance from a single VM in the cloud? Want more? Scale out! It rhymes with /.c.l..out/ after all.)

What is the backing storage for the VM images? Filesystem or LV? Raw, sparse, compressed? I suggest using linear LVs, as these have less overhead than anything on top of a FS.

Using RAID6 seems like a waste. I suggest using no redundancy, maybe RAID0.

Also, RAID6 for thin-pool metadata is not good. We recommend linear or RAID1 volumes on the fastest available storage, ideally on a different disk than the data disk(s).

I suggest using one of the disks for the metadata volume and the rest for data in RAID0.

Comment 14 Marian Csontos 2016-05-26 15:46:51 UTC
Also, I would absolutely love to see performance on real data. Even though DM cannot share buffers, I have a hunch OverlayFS will have significant overhead too, as there will not be that much sharing in an image-build service.

Comment 15 Daniel Walsh 2016-08-19 22:37:45 UTC
Any update on this, Vivek?

