Bug 1330712 - Performance optimization of docker devicemapper graph driver
Summary: Performance optimization of docker devicemapper graph driver
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assignee: Vivek Goyal
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-26 18:32 UTC by Jeremy Eder
Modified: 2020-04-13 21:30 UTC (History)
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-13 21:30:00 UTC
Target Upstream Version:


Attachments

Comment 1 Vivek Goyal 2016-04-26 18:47:56 UTC
Marian, the other day we were discussing the use case for an LVM graph driver. I am wondering whether it is worth doing a quick prototype implementation to see if it can provide better performance. This assumes that multiple device operations can proceed in parallel for the most part and, unlike docker devicemapper, do not get completely serialized.

Comment 2 Vivek Goyal 2016-04-26 18:48:55 UTC
Or, is there any chance that libdevmapper could be made thread safe, so that multiple docker threads could call into the library in parallel instead of having to serialize all the operations?

Comment 3 Daniel Walsh 2016-04-26 19:07:06 UTC
I am not sure why OpenShift would trust a brand new backend rather than OverlayFS, even if we could quickly build an LVM backend. But it would be good to know whether LVM has better performance.

Comment 5 Zdenek Kabelac 2016-04-27 08:41:23 UTC
Basically, you do not want to serialize all operations into a single thread's processing stack; you want to keep operations processed as independent commands.

And yes, it would be worth trying lvm2 commands as a direct replacement for docker's device-manipulation code.

Comment 6 Alasdair Kergon 2016-04-28 00:07:37 UTC
Try the new dm_udev_wait_immediate, which lets you wait for udev outside the library without holding the library mutex.

https://www.redhat.com/archives/lvm-devel/2016-April/msg00145.html
https://git.fedorahosted.org/cgit/lvm2.git/patch/?id=16019b518e287da19c87eb64229f5c3ca057cb05

Comment 9 Marian Csontos 2016-05-24 09:28:56 UTC
I analysed the out.csv data and found the following:

================================================================================
*lookupDevice*
================================================================================
  2. took
        <type 'float'>
        Nulls: False
        Min: 1.630804
        Max: 11622.047727
        Sum: 193923.888232
        Mean: 712.955471441
        Median: 289.74097
        Standard Deviation: 1261.40217467
        Unique values: 272

Row count: 272

================================================================================
*lookupDeviceWithLock*
================================================================================
  2. took
        <type 'float'>
        Nulls: False
        Min: 1.630804
        Max: 11622.047727
        Sum: 193914.978846
        Mean: 718.203625356
        Median: 291.2552555
        Standard Deviation: 1264.58521366
        Unique values: 270

Row count: 270

lookupDeviceWithLock calls lookupDevice, and waiting for the lock takes almost no time at all (ignoring those 2 extra calls to lookupDevice from elsewhere). So those 3 minutes are spent in lookupDevice, which does nothing but call loadMetadata:

    func (devices *DeviceSet) loadMetadata(hash string) *devInfo {
    
            info := &devInfo{Hash: hash, devices: devices}
    
            jsonData, err := ioutil.ReadFile(devices.metadataFile(info))
            if err != nil {
                    return nil
            }
    
            if err := json.Unmarshal(jsonData, &info); err != nil {
                    return nil
            }
    
            if info.DeviceID > maxDeviceID {
                    logrus.Errorf("Ignoring Invalid DeviceId=%d", info.DeviceID)
                    return nil
            }
    
            return info
    }

IIUC, looking at this function, lookupDevice spends those 3 minutes reading and parsing JSON files, and there is nothing LVM/DM could do to speed that up.

There is a cache (devices.Devices) in place, and I fail to understand why it does not help here.

Jeremy, Vivek, was the source file instrumented incorrectly and is the csv just plain wrong? Or is there something I am missing?

Everything would be easier and faster to resolve if Go had a proper profiling tool. :-/

Comment 10 Vivek Goyal 2016-05-25 12:53:33 UTC
Maybe the tests were done on a disk that was slow. Jeremy, any chance these tests were done on an AWS instance? I have often seen that the additional disks there can be very slow sometimes.

Comment 11 Marian Csontos 2016-05-25 13:31:58 UTC
(In reply to Vivek Goyal from comment #10)
> May be tests were done on a disk which was slow. Jeremy, any chance that
> these tests were done on an AWS instance. I have often seen that there
> additional disk can be very slow *sometimes*.

I hope not. Or does it make any sense to test performance on a platform where performance varies only sometimes?

Comment 12 Jeremy Eder 2016-05-25 15:28:34 UTC
Not EC2.  Done in a KVM guest.  Backing storage is 6x300GB SAS 10k RAID6.

Comment 13 Marian Csontos 2016-05-26 15:46:06 UTC
I am not a performance expert, but using VMs for performance testing just does not sound like a good idea to me. There are way too many layers to consider.

And we still do not know where the bottleneck is - is it CPU, I/O, or memory bound?

If it is absolutely essential to use VMs, e.g. because the infrastructure is built on them, I have a few easy optimizations to try.

IIUC, both the worker VMs and the built images should be reproducible/throwaway, so we can optimize for speed and ignore data safety.

The first optimization is of course using "writeback" caching on the VMs' disks. This greatly improves file-locking speed, as files do not have to be written through all the layers down to the slow HDDs.

Using the noop scheduler in the VMs and deadline on the host brought the best results for my test VMs.

Provide enough^TM memory to the VMs, no swap, and let the host do the swapping. (This may not be efficient in the "cloud". But does anyone expect consistent performance from a single VM in the cloud? Want more? Scale out! It rhymes with /.c.l..out/ after all.)

What is the backing storage for the VM images? Filesystem or LV? Raw, sparse, compressed? I suggest using linear LVs, as these have less overhead than anything on top of a FS.

Using RAID6 seems like a waste. I suggest using no redundancy, maybe RAID0.

Also, RAID6 for thin-pool metadata is not good. We recommend linear or RAID1 volumes on the fastest available storage, ideally on a different disk than the data disk(s).

I suggest using one of the disks for the metadata volume and the rest for data in RAID0.

Comment 14 Marian Csontos 2016-05-26 15:46:51 UTC
Also, I would absolutely love to see performance on real data. Even though DM cannot share buffers, I have a hunch OverlayFS will have significant overhead too, as there will not be that much sharing in an image-build service.

Comment 15 Daniel Walsh 2016-08-19 22:37:45 UTC
Any update on this, Vivek?

