Bug 672985 - blkid runs very slow with many dm devices and a large blkid.tab file
Summary: blkid runs very slow with many dm devices and a large blkid.tab file
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: e2fsprogs
Version: 5.5
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Eric Sandeen
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-01-26 22:14 UTC by Dave Wysochanski
Modified: 2018-12-06 14:37 UTC (History)
4 users (show)

Fixed In Version: e2fsprogs-1.39-27.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-28 17:46:19 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Archive file which contains reproducer (2.54 KB, application/x-compressed-tar)
2011-01-26 22:22 UTC, Dave Wysochanski
no flags Details
Simple patch which improves the performance of blkid from O(n^2) to O(n). (646 bytes, patch)
2011-01-26 22:25 UTC, Dave Wysochanski
no flags Details | Diff

Description Dave Wysochanski 2011-01-26 22:14:48 UTC
Description of problem:

Issue was reported by a customer running with 1300 devices, but it can be seen with less than 100 devices.

blkid has code that excessively calls dm ioctls.  The problematic code is in lib/blkid/devname.c:probe_one():

/*
 * Probe a single block device to add to the device cache.
 */
static void probe_one(blkid_cache cache, const char *ptname,
                      dev_t devno, int pri, int only_if_new)
{
        blkid_dev dev = NULL;
        struct list_head *p;
        const char **dir;
        char *devname = NULL;

        /* See if we already have this device number in the cache. */
        list_for_each(p, &cache->bic_devs) {
                blkid_dev tmp = list_entry(p, struct blkid_struct_dev,
                                           bid_devs);
#ifdef HAVE_DEVMAPPER
                if (!dm_device_is_leaf(devno))
                        continue;
#endif
                if (tmp->bid_devno == devno) {
                        if (only_if_new)
                                return;
                        dev = blkid_verify(cache, tmp);
                        break;
               }
        }


The code is repeatedly calling dm_device_is_leaf(devno) when devno will never change.  Earlier we loop through all dm maps and call probe_one(), so in effect the whole algorithm is O(n^2), where 'n' is # of dm maps blkid can recognize.  I'll attach a simple patch which moves the call inside the 'if (tmp->bid_devno ...)' and reduces execution time to O(n).

Upstream, the dm code has been removed from libblkid, but I'm not sure we want this aggressive of a patch for 5.Y so I'll attach a very simple patch.

Version-Release number of selected component (if applicable):
e2fsprogs-1.39-23.el5_5.1

How reproducible:
Every time.  Will attach reproducer and patch to fix.

 
Actual results:
Excessive time to execute blkid, which may result in excessive boot times.


Expected results:
blkid should execute in a reasonable time.

Comment 1 Dave Wysochanski 2011-01-26 22:22:03 UTC
Created attachment 475490 [details]
Archive file which contains reproducer

To run the reproducer, do the following:
# tar -zxvf blkid-slowness-repro.tgz
# cd blkid-slowness-repro/
# make create-fake-device
# ./blkid-test.sh 

Test creates /tmp/oracle.bin, but will fail if the file already exists.
Edit test-params.sh to increase the number of MPATH devices, which will increase the time.

Comment 2 Dave Wysochanski 2011-01-26 22:25:36 UTC
Created attachment 475492 [details]
Simple patch which improves the performance of blkid from O(n^2) to O(n).

Simple patch which improves the performance from O(n^2) to O(n).

Example with MPATHS=100:
# ./blkid-test.sh 
Setting up test
Succesfully wrote superblock oracleasm to file /tmp/oracle.bin
Running new blkid

real	0m1.527s
user	0m0.272s
sys	0m1.168s
Running original blkid

real	0m44.102s
user	0m6.928s
sys	0m36.814s
Tearing down test

Comment 3 Dave Wysochanski 2011-01-27 22:49:05 UTC
Eric has backported an upstream fix which removes the libdevmapper calls altogether, and thus performs better than the original patch.  I have verified it in his scratch/test build of e2fsprogs-1.39-27.el5 and e2fsprogs-libs-1.39-27.el5:
https://brewweb.devel.redhat.com/taskinfo?taskID=3070354

The above MPATHS=100 test yields:
# ./blkid-test.sh 
Setting up test
Succesfully wrote superblock oracleasm to file /tmp/oracle.bin
Running new blkid

real	0m0.440s
user	0m0.096s
sys	0m0.316s
Tearing down test


I ran the test with MPATHS=1300, which should simulate the customers environment and got this:

# ./blkid-test.sh 
Setting up test
Succesfully wrote superblock oracleasm to file /tmp/oracle.bin
Running new blkid

real	1m20.305s
user	0m8.761s
sys	1m8.532s
Tearing down test


So the only bad news for the customer will be that there will be no more long coffee breaks while they wait for blkid to finish.

Comment 4 Eric Sandeen 2011-01-28 17:46:19 UTC
Dave, thanks very much for figuring out the testcase and testing the newer package.  I'll mark this as NEXTRELEASE since e2fsprogs-1.39-27.el5 and later, available in RHEL5.7 and later, will have the fix, as per bug #553216 (it's not exactly a dup of that bug ...)

Comment 5 Dave Wysochanski 2011-02-01 22:02:06 UTC
Fix has been verified by customer.


Note You need to log in before you can comment on or make changes to this bug.