Description of problem: Issue was reported by a customer running with 1300 devices, but it can be seen with less than 100 devices. blkid has code that excessively calls dm ioctls. The problematic code is in lib/blkid/devname.c:probe_one(): /* * Probe a single block device to add to the device cache. */ static void probe_one(blkid_cache cache, const char *ptname, dev_t devno, int pri, int only_if_new) { blkid_dev dev = NULL; struct list_head *p; const char **dir; char *devname = NULL; /* See if we already have this device number in the cache. */ list_for_each(p, &cache->bic_devs) { blkid_dev tmp = list_entry(p, struct blkid_struct_dev, bid_devs); #ifdef HAVE_DEVMAPPER if (!dm_device_is_leaf(devno)) continue; #endif if (tmp->bid_devno == devno) { if (only_if_new) return; dev = blkid_verify(cache, tmp); break; } } The code is repeatedly calling dm_device_is_leaf(devno) when devno will never change. Earlier we loop through all dm maps and call probe_one(), so in effect the whole algorithm is O(n^2), where 'n' is # of dm maps blkid can recognize. I'll attach a simple patch which moves the call inside the 'if (tmp->bid_devno ...)' and reduces execution time to O(n). Upstream, the dm code has been removed from libblkid, but I'm not sure we want this aggressive of a patch for 5.Y so I'll attach a very simple patch. Version-Release number of selected component (if applicable): e2fsprogs-1.39-23.el5_5.1 How reproducible: Every time. Will attach reproducer and patch to fix. Actual results: Excessive time to execute blkid, which may result in excessive boot times. Expected results: blkid should execute in a reasonable time.
Created attachment 475490 [details] Archive file which contains reproducer To run the reproducer, do the following: # tar -zxvf blkid-slowness-repro.tgz # cd blkid-slowness-repro/ # make create-fake-device # ./blkid-test.sh Test creates /tmp/oracle.bin, but will fail if the file already exists. Edit test-params.sh to increase the number of MPATH devices, which will increase the time.
Created attachment 475492 [details] Simple patch which improves the performance of blkid from O(n^2) to O(n). Simple patch which improves the performance from O(n^2) to O(n). Example with MPATHS=100: # ./blkid-test.sh Setting up test Succesfully wrote superblock oracleasm to file /tmp/oracle.bin Running new blkid real 0m1.527s user 0m0.272s sys 0m1.168s Running original blkid real 0m44.102s user 0m6.928s sys 0m36.814s Tearing down test
Eric has backported an upstream fix which removes the libdevmapper calls altogether, and thus performs better than the original patch. I have verified it in his scratch/test build of e2fsprogs-1.39-27.el5 and e2fsprogs-libs-1.39-27.el5: https://brewweb.devel.redhat.com/taskinfo?taskID=3070354 The above MPATHS=100 test yields: # ./blkid-test.sh Setting up test Succesfully wrote superblock oracleasm to file /tmp/oracle.bin Running new blkid real 0m0.440s user 0m0.096s sys 0m0.316s Tearing down test I ran the test with MPATHS=1300, which should simulate the customers environment and got this: # ./blkid-test.sh Setting up test Succesfully wrote superblock oracleasm to file /tmp/oracle.bin Running new blkid real 1m20.305s user 0m8.761s sys 1m8.532s Tearing down test So the only bad news for the customer will be that there will be no more long coffee breaks while they wait for blkid to finish.
Dave, thanks very much for figuring out the testcase and testing the newer package. I'll mark this as NEXTRELEASE since e2fsprogs-1.39-27.el5 and later, available in RHEL5.7 and later, will have the fix, as per bug #553216 (it's not exactly a dup of that bug ...)
Fix has been verified by customer.