Bug 672985

Summary: blkid runs very slow with many dm devices and a large blkid.tab file
Product: Red Hat Enterprise Linux 5 Reporter: Dave Wysochanski <dwysocha>
Component: e2fsprogsAssignee: Eric Sandeen <esandeen>
Status: CLOSED NEXTRELEASE QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.5CC: esandeen, fsorenso, jwest, sct
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: e2fsprogs-1.39-27.el5 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-28 17:46:19 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
Archive file which contains reproducer
none
Simple patch which improves the performance of blkid from O(n^2) to O(n). none

Description Dave Wysochanski 2011-01-26 22:14:48 UTC
Description of problem:

Issue was reported by a customer running with 1300 devices, but it can be seen with less than 100 devices.

blkid has code that excessively calls dm ioctls.  The problematic code is in lib/blkid/devname.c:probe_one():

/*
 * Probe a single block device to add to the device cache.
 */
static void probe_one(blkid_cache cache, const char *ptname,
                      dev_t devno, int pri, int only_if_new)
{
        blkid_dev dev = NULL;
        struct list_head *p;
        const char **dir;
        char *devname = NULL;

        /* See if we already have this device number in the cache. */
        list_for_each(p, &cache->bic_devs) {
                blkid_dev tmp = list_entry(p, struct blkid_struct_dev,
                                           bid_devs);
#ifdef HAVE_DEVMAPPER
                if (!dm_device_is_leaf(devno))
                        continue;
#endif
                if (tmp->bid_devno == devno) {
                        if (only_if_new)
                                return;
                        dev = blkid_verify(cache, tmp);
                        break;
               }
        }


The code is repeatedly calling dm_device_is_leaf(devno) when devno will never change.  Earlier we loop through all dm maps and call probe_one(), so in effect the whole algorithm is O(n^2), where 'n' is # of dm maps blkid can recognize.  I'll attach a simple patch which moves the call inside the 'if (tmp->bid_devno ...)' and reduces execution time to O(n).

Upstream, the dm code has been removed from libblkid, but I'm not sure we want this aggressive of a patch for 5.Y so I'll attach a very simple patch.

Version-Release number of selected component (if applicable):
e2fsprogs-1.39-23.el5_5.1

How reproducible:
Every time.  Will attach reproducer and patch to fix.

 
Actual results:
Excessive time to execute blkid, which may result in excessive boot times.


Expected results:
blkid should execute in a reasonable time.

Comment 1 Dave Wysochanski 2011-01-26 22:22:03 UTC
Created attachment 475490 [details]
Archive file which contains reproducer

To run the reproducer, do the following:
# tar -zxvf blkid-slowness-repro.tgz
# cd blkid-slowness-repro/
# make create-fake-device
# ./blkid-test.sh 

Test creates /tmp/oracle.bin, but will fail if the file already exists.
Edit test-params.sh to increase the number of MPATH devices, which will increase the time.

Comment 2 Dave Wysochanski 2011-01-26 22:25:36 UTC
Created attachment 475492 [details]
Simple patch which improves the performance of blkid from O(n^2) to O(n).

Simple patch which improves the performance from O(n^2) to O(n).

Example with MPATHS=100:
# ./blkid-test.sh 
Setting up test
Succesfully wrote superblock oracleasm to file /tmp/oracle.bin
Running new blkid

real	0m1.527s
user	0m0.272s
sys	0m1.168s
Running original blkid

real	0m44.102s
user	0m6.928s
sys	0m36.814s
Tearing down test

Comment 3 Dave Wysochanski 2011-01-27 22:49:05 UTC
Eric has backported an upstream fix which removes the libdevmapper calls altogether, and thus performs better than the original patch.  I have verified it in his scratch/test build of e2fsprogs-1.39-27.el5 and e2fsprogs-libs-1.39-27.el5:
https://brewweb.devel.redhat.com/taskinfo?taskID=3070354

The above MPATHS=100 test yields:
# ./blkid-test.sh 
Setting up test
Succesfully wrote superblock oracleasm to file /tmp/oracle.bin
Running new blkid

real	0m0.440s
user	0m0.096s
sys	0m0.316s
Tearing down test


I ran the test with MPATHS=1300, which should simulate the customers environment and got this:

# ./blkid-test.sh 
Setting up test
Succesfully wrote superblock oracleasm to file /tmp/oracle.bin
Running new blkid

real	1m20.305s
user	0m8.761s
sys	1m8.532s
Tearing down test


So the only bad news for the customer will be that there will be no more long coffee breaks while they wait for blkid to finish.

Comment 4 Eric Sandeen 2011-01-28 17:46:19 UTC
Dave, thanks very much for figuring out the testcase and testing the newer package.  I'll mark this as NEXTRELEASE since e2fsprogs-1.39-27.el5 and later, available in RHEL5.7 and later, will have the fix, as per bug #553216 (it's not exactly a dup of that bug ...)

Comment 5 Dave Wysochanski 2011-02-01 22:02:06 UTC
Fix has been verified by customer.