Description of problem:

dm-mpath panicked at dm_pg_init_complete+0x10 while I was testing multipath's reaction to a CLARiiON CX300 non-destructive microcode upgrade (NDU). This was with version 0.4.3-pre5 of the multipath tools and version 2.6.11-rc3-udm2 of the Linux kernel. I see no code in place that would prevent this problem from occurring on Red Hat AS 4 Update 1.

The panic is caused by corrupted memory in the path structure referenced by a multipath pg_init I/O completion. I suspect that the memory for the path structure (and its enclosing path group and multipath structures) was freed by the multipath destructor, subsequently reallocated for other use, and overwritten.

I suspect that the root cause is a pg_init request that is still outstanding when the multipath mapped device is suspended with a pending count of zero. Since pg_init requests are not accounted for in the pending count of the multipath mapped device structure, it is possible to have outstanding pg_init requests awaiting I/O completion when the pending count is zero. If the multipath table is destroyed via dm_table_destroy() before the pg_init I/O completion arrives, dm_pg_init_complete() can reference corrupted memory. This can happen either on the swap-in of a new dm table or on the close of the dm mapped device. My panic involves the former.

I suspect that this prerequisite state (a pg_init request outstanding while the pending count is 0 during a suspend of the multipath device) is reached in one of two ways. First, when a multipath has no pending requests and a call from user space to switch_pg_num() sends a pg_init request just before the multipath device is suspended. Second, when a multipath mapped device that has one or more pending requests and a pg_init request outstanding is suspended.
Reasonable fixes for the two cases, both of which I tried, are (1) to send a pg_init request only if one or more pending requests are queued for the multipath, and (2) not to dispatch pending requests while a pg_init request is outstanding. With these changes in place, I no longer hit this problem.
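The two fixes can be illustrated with a minimal user-space sketch in C. This is not the dm-mpath code or the patch described above: struct mp_model and every mp_* function are hypothetical names, and the single queued counter stands in for the mapped device's pending count. The sketch assumes that requests held by the multipath are what keep the pending count above zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; these names do not exist in dm-mpath. */
struct mp_model {
    int queued;             /* requests held by the multipath; stands in for the pending count */
    bool pg_init_in_flight; /* a pg_init was sent and has not completed yet */
};

/* Fix (1): send a pg_init only if at least one request is queued. */
static bool mp_try_pg_init(struct mp_model *m)
{
    if (m->pg_init_in_flight || m->queued == 0)
        return false;
    m->pg_init_in_flight = true;
    return true;
}

/* Fix (2): do not dispatch queued requests while a pg_init is outstanding.
 * Returns the number of requests dispatched. */
static int mp_dispatch_queued(struct mp_model *m)
{
    int n;

    if (m->pg_init_in_flight)
        return 0;
    n = m->queued;
    m->queued = 0; /* in this toy model, dispatched requests complete at once */
    return n;
}

/* pg_init I/O completion: the model state must still be alive here. */
static void mp_pg_init_complete(struct mp_model *m)
{
    assert(m->pg_init_in_flight);
    m->pg_init_in_flight = false;
}

/* Models the suspend-time check: teardown may proceed once nothing is pending. */
static bool mp_pending_is_zero(const struct mp_model *m)
{
    return m->queued == 0;
}
```

With both guards in place, pg_init_in_flight implies queued > 0 in this model, so a suspend that waits for the pending count to reach zero also ends up waiting for the pg_init completion.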
I think this is fixed by the updated patch I attached for bug #155428, and bug #15442 may also have an impact here. Does the bug still occur with these updates in place?
(In reply to comment #1)
> I think this is fixed by the updated patch I attached for bug #155428 and maybe
> bug #15442 also has an impact here.
> Does the bug still occur with these updates in place?

From reading the description for bug #155428, I cannot see how that bug and this one are at all related. Also, I am not able to access bug #15442 (or #155442???).
bug 154442
Marking as duplicate of bug 154442 as the fix is believed to be the same.

*** This bug has been marked as a duplicate of 154442 ***