Marian, the other day we were discussing the use case for an LVM graph driver. I am wondering if it is worth doing a quick prototype implementation to see whether it can provide better performance. This assumes that multiple device operations can progress in parallel for the most part and, unlike the docker devicemapper driver, do not get completely serialized.
Or, is there any chance that libdevmapper could be made thread safe, so that multiple docker threads can call into the library in parallel instead of having to serialize all operations?
I am not sure why OpenShift would trust a brand new backend rather than OverlayFS, even if we could quickly build an LVM backend. But it would be good to know if LVM has better performance.
Basically, you do not want to serialize all operations through a single-threaded processing stack; you want to keep operations as independent commands. And yes, it is worth trying lvm2 commands as a direct replacement for docker's device manipulation code.
Try the new dm_udev_wait_immediate, which will let you wait for udev outside the library while not holding the library mutex:
https://www.redhat.com/archives/lvm-devel/2016-April/msg00145.html
https://git.fedorahosted.org/cgit/lvm2.git/patch/?id=16019b518e287da19c87eb64229f5c3ca057cb05
I analysed the out.csv data and found the following:

*lookupDevice*
================================================================================
  2. took

	<type 'float'>
	Nulls: False
	Min: 1.630804
	Max: 11622.047727
	Sum: 193923.888232
	Mean: 712.955471441
	Median: 289.74097
	Standard Deviation: 1261.40217467
	Unique values: 272

Row count: 272

*lookupDeviceWithLock*
================================================================================
  2. took

	<type 'float'>
	Nulls: False
	Min: 1.630804
	Max: 11622.047727
	Sum: 193914.978846
	Mean: 718.203625356
	Median: 291.2552555
	Standard Deviation: 1264.58521366
	Unique values: 270

Row count: 270

lookupDeviceWithLock calls lookupDevice, and waiting for the lock takes almost no time at all (ignoring those 2 extra calls to lookupDevice from elsewhere). So those 3 minutes are spent in lookupDevice, which does nothing but call loadMetadata:

    func (devices *DeviceSet) loadMetadata(hash string) *devInfo {
    	info := &devInfo{Hash: hash, devices: devices}

    	jsonData, err := ioutil.ReadFile(devices.metadataFile(info))
    	if err != nil {
    		return nil
    	}

    	if err := json.Unmarshal(jsonData, &info); err != nil {
    		return nil
    	}

    	if info.DeviceID > maxDeviceID {
    		logrus.Errorf("Ignoring Invalid DeviceId=%d", info.DeviceID)
    		return nil
    	}

    	return info
    }

IIUC, looking at this function, lookupDevice is spending those 3 minutes reading and parsing JSON files, and there is nothing LVM/DM could do to speed that up. There is a cache (devices.Devices) in place, so I fail to understand this. Jeremy, Vivek, was the source file instrumented incorrectly, so that the csv is just plain wrong? Or is there something I am missing? Everything would be easier and faster to resolve if go had a proper profiling tool. :-/
Maybe the tests were done on a disk that was slow. Jeremy, any chance these tests were done on an AWS instance? I have often seen that the additional disk there can be very slow sometimes.
(In reply to Vivek Goyal from comment #10)
> May be tests were done on a disk which was slow. Jeremy, any chance that
> these tests were done on an AWS instance. I have often seen that there
> additional disk can be very slow *sometimes*.

I hope not. Or does it make any sense to test performance on a platform where performance varies sometimes?
Not EC2. Done in a KVM guest. Backing storage is 6x300GB SAS 10k RAID6.
I am not a performance expert, but using VMs for testing performance just does not sound like a good idea to me. Way too many layers to consider. And we still do not know where the bottleneck is: is it CPU, I/O, or memory bound?

If it is absolutely essential to use VMs, e.g. because the infrastructure is built on them, I have a few easy optimizations to try. IIUC both the worker VMs and the built images should be reproducible/throwaway, so we can optimize for speed and ignore data safety:

- First, of course, use "writeback" caching on the VM disks. This greatly improves file locking speed, as the file does not have to be written through all the layers down to the slow HDDs.
- Use the noop I/O scheduler in the VMs and deadline on the host; this combination brought the best results for my test VMs.
- Provide enough^TM memory to the VMs, no swap, and let the host do the swapping. (This may not be efficient in the "cloud". But does anyone expect consistent performance from a single VM in a cloud? Want more? Scale out! It rhymes with /.c.l..out/ after all.)

What's the backing storage for the VM images? Filesystem or LV? Raw, sparse, compressed? I suggest using linear LVs, as these have less overhead than anything sitting on top of a FS.

Using RAID6 seems like a waste. I suggest no redundancy, maybe RAID0. Also, RAID6 is not good for thin-pool metadata: we recommend linear or RAID1 volumes on the fastest available storage, ideally on a different disk than the data disk(s). I suggest using one of the disks for metadata volumes and the rest for data in RAID0.
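As a concrete illustration of the writeback suggestion, a libvirt disk definition along these lines would back a VM disk with a raw linear LV and writeback caching (the volume group and LV names here are made up; only the `cache='writeback'` attribute and block-device source are the point):

```xml
<!-- hypothetical libvirt <disk> element: raw linear LV, writeback caching -->
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='writeback'/>
  <source dev='/dev/vg_fast/worker_vm1'/>
  <target dev='vda' bus='virtio'/>
</disk>
```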
Also, I would absolutely love to see performance on real data. Even though DM cannot share buffers, I have a hunch OverlayFS will have significant overhead too, as there will not be that much sharing in an image build service.
Any update on this Vivek?