Description of problem:

In libmultipath:configure.c, the ACT_CREATE action in domap() fails to unlock the paths if the map is already present, e.g. (dm-mp RHEL5 code):

	case ACT_CREATE:
		if (lock_multipath(mpp, 1)) {
			condlog(3, "%s: failed to create map (in use)",
				mpp->alias);
			return DOMAP_RETRY;
		}

		if (dm_map_present(mpp->alias))
			break;	/* <<<---- leaves the switch with the paths still locked */

		r = dm_addmap(DM_DEVICE_CREATE, DEFAULT_TARGET, mpp, 1, 0);

		if (!r)
			r = dm_addmap(DM_DEVICE_CREATE, DEFAULT_TARGET, mpp, 1,
				      1);
		/*
		 * DM_DEVICE_CREATE is actually DM_DEV_CREATE plus
		 * DM_TABLE_LOAD. Failing the second part leaves an
		 * empty map. Clean it up.
		 */
		if (!r && dm_map_present(mpp->alias)) {
			condlog(3, "%s: failed to load map "
				"(a path might be in use)",
				mpp->alias);
			dm_flush_map(mpp->alias, DEFAULT_TARGET);
		}

		lock_multipath(mpp, 0);
		break;

This is fixed upstream with commit cc278d3cfe17eadb0ef6db6bf48675ff0a4a9270, but we don't have that fix in dm-mp for RHEL5 yet.

Also, whilst I'm at it, lock_multipath() itself is a bit strange: if it successfully locks one or more paths, but then fails on a particular path with EWOULDBLOCK, it returns without unlocking the paths that it did successfully lock, i.e. we get a partial lock. This seems bad and racy, and will (I guess) effectively prevent DOMAP_RETRY from succeeding without an intervening unlocking call.

Version-Release number of selected component (if applicable):
All RHEL versions of dm-mp.

-- Mark Goodwin
Thanks for the heads up! lock_multipath() now cleans up after itself if it fails halfway through, and domap() unlocks all the locked paths if the map is present.
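For readers who want to see the "clean up after itself" idea in isolation, here is a minimal standalone sketch. It is not the attached patch. It assumes the path locks are flock()-style advisory locks (which the EWOULDBLOCK mention suggests), and the names lock_all_or_none(), unlock_all() and demo_partial_lock_rollback() are hypothetical helpers invented for this illustration. It also assumes a writable /tmp.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: take an exclusive, non-blocking flock() on every
 * descriptor. On the first failure, release the locks already taken so
 * the caller never ends up holding a partial lock.
 * Returns 0 on success, -1 on failure with errno from the failed flock().
 */
static int lock_all_or_none(const int *fds, int n)
{
	assert(fds != NULL && n >= 0);
	for (int i = 0; i < n; i++) {
		if (flock(fds[i], LOCK_EX | LOCK_NB) != 0) {
			int saved = errno;
			while (--i >= 0)	/* roll back in reverse order */
				flock(fds[i], LOCK_UN);
			errno = saved;
			return -1;
		}
	}
	return 0;
}

/* Hypothetical helper: release every lock in the set. */
static void unlock_all(const int *fds, int n)
{
	for (int i = 0; i < n; i++)
		flock(fds[i], LOCK_UN);
}

/*
 * Demo: a rival descriptor holds the second file, so locking the set
 * must fail, and the first file must be lockable again afterwards.
 * Returns 0 if the behaviour matches, 1 otherwise.
 */
static int demo_partial_lock_rollback(void)
{
	char path_a[] = "/tmp/lockdemo_a_XXXXXX";
	char path_b[] = "/tmp/lockdemo_b_XXXXXX";
	int fds[2], rival, probe, rc = 1;

	fds[0] = mkstemp(path_a);
	fds[1] = mkstemp(path_b);
	/* Separate open() calls give separate open file descriptions,
	 * so their flock() locks conflict even inside one process. */
	rival = open(path_b, O_RDONLY);
	probe = open(path_a, O_RDONLY);
	if (fds[0] < 0 || fds[1] < 0 || rival < 0 || probe < 0)
		goto out;

	if (flock(rival, LOCK_EX | LOCK_NB) != 0)
		goto out;
	/* Second file is busy: the whole attempt must fail. */
	if (lock_all_or_none(fds, 2) == 0 || errno != EWOULDBLOCK)
		goto out;
	/* The first file must not have been left locked. */
	if (flock(probe, LOCK_EX | LOCK_NB) != 0)
		goto out;
	flock(probe, LOCK_UN);
	/* Once the rival lets go, a retry succeeds. */
	flock(rival, LOCK_UN);
	if (lock_all_or_none(fds, 2) != 0)
		goto out;
	unlock_all(fds, 2);
	rc = 0;
out:
	close(fds[0]);
	close(fds[1]);
	close(rival);
	close(probe);
	unlink(path_a);
	unlink(path_b);
	return rc;
}
```

Calling demo_partial_lock_rollback() from a main() exercises the failure and retry paths. Without the rollback loop, the probe lock on the first file would be refused, which is the situation that would keep a DOMAP_RETRY from ever succeeding.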
(In reply to comment #1)
> Thanks for the heads up! lock_multipath() now cleans up after itself if it
> fails halfway through, and domap() unlocks all the locked paths if the map is
> present.

Sounds good Ben - can you point me to the patch or attach it here please?

Also, can you speculate what actual issue(s) this fix will solve? It would be good to be able to relate this to various dm-mp support issues. I suspect it may have something to do with dm table corruption during reboot with a separately mounted /var, but I haven't been able to prove or reproduce that.

Cheers
-- Mark
Created attachment 377793
Fix to correctly unlock multipath device after failures during create.
I'm not sure that this will deal with corruption during boot. The stale lock can only be left behind when multipathd fails to create a device. multipath itself will simply exit, and drop the lock then. It seems like this should only be a problem when there is a race to create the multipath device. That would only happen if multipath were called at the same time as multipathd was creating a device, and multipath won the race.

Sometimes devices get rendered unusable if commands fail, but that is usually because device-mapper suspends the dm-device and then either can't or doesn't resume it.
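The point that a short-lived process drops its lock simply by exiting can be illustrated with a small sketch. It assumes flock()-style locks, which are released when the last descriptor for the open file goes away, as happens at process exit. The function name lock_released_on_close() is hypothetical, and closing the descriptor stands in for the process exiting.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Returns 1 if closing the holder's descriptor released its flock(),
 * 0 if the lock was still held afterwards, -1 if the setup failed.
 */
static int lock_released_on_close(void)
{
	char path[] = "/tmp/flock_close_XXXXXX";
	int holder = mkstemp(path);
	int waiter = open(path, O_RDONLY);	/* separate open file description */
	int result = -1;

	if (holder < 0 || waiter < 0)
		goto out;
	if (flock(holder, LOCK_EX | LOCK_NB) != 0)
		goto out;
	/* While the holder is alive, the waiter must be refused. */
	if (flock(waiter, LOCK_EX | LOCK_NB) == 0)
		goto out;

	/* Stand-in for the holder process exiting. */
	close(holder);
	holder = -1;

	result = (flock(waiter, LOCK_EX | LOCK_NB) == 0);
out:
	if (holder >= 0)
		close(holder);
	if (waiter >= 0)
		close(waiter);
	unlink(path);
	return result;
}
```

A long-running daemon keeps its descriptors open, so nothing releases the lock for it. That is why a missed unlock matters in multipathd but not in a one-shot multipath run.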
Verified that the proposed patch made it to 0.4.7-31.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0255.html