Bug 156576 - Panic due to outstanding pg_init completion after multipath mapped device is suspended and the mapped device's multipath structure is destroyed via its destructor.
Status: CLOSED DUPLICATE of bug 154442
Product: Fedora
Classification: Fedora
Component: device-mapper-multipath
Version: rawhide
Hardware: All Linux
Priority: medium, Severity: high
Assigned To: Alasdair Kergon
Reported: 2005-05-01 20:53 EDT by Ed Goggin
Modified: 2007-11-30 17:11 EST

Doc Type: Bug Fix
Last Closed: 2005-07-02 15:56:59 EDT
Description Ed Goggin 2005-05-01 20:53:10 EDT
Description of problem:

dm-mpath panicked at dm_pg_init_complete+0x10 while I was testing multipath
reaction to a CLARiiON CX300 non-destructive ucode upgrade (NDU).

I am using version 0.4.3-pre5 of the multipath tools and version
2.6.11-rc3-udm2 of the Linux kernel.  I see no code in place that would
prevent this problem from occurring on Red Hat AS 4 Update 1.

The panic is occurring due to corrupted memory in the path structure
for a multipath pg_init i/o completion.  I suspect that the memory for
the path structure (and its encompassing path group and multipath structure)
has been freed by the multipath destructor, subsequently re-allocated
for other use, and written upon.

I suspect that the problem is caused by a pg_init request still being
outstanding when the multipath mapped device is suspended with a pending
count of zero.

Since pg_init requests are not accounted for in the pending
count of the multipath mapped device structure, it is possible
to have outstanding pg_init requests awaiting i/o completion
when the pending count is zero.  If the multipath table
is destroyed via dm_table_destroy() before the pg_init i/o
completion arrives, dm_pg_init_complete() can reference
corrupted memory.  This can happen either from the swap-in
of a new dm table or the closing of the dm mapped device.
My panic involves the former use case.
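
To illustrate the accounting gap described above, here is a minimal
userspace sketch in C.  It is a toy model and not the dm-mpath source:
the struct and every toy_* name are hypothetical, and "destroyed" merely
stands in for the memory having been freed by the destructor.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names here are hypothetical, not kernel symbols. */
struct toy_mpath {
    int pending;            /* I/O count that suspend waits on */
    int pg_init_in_flight;  /* pg_init requests, NOT included in pending */
    bool destroyed;         /* stands in for "freed by the destructor" */
};

/* Suspend looks only at the pending count. */
static bool toy_can_suspend(const struct toy_mpath *m)
{
    return m->pending == 0;
}

/* Table swap or device close: destroy as soon as suspend is allowed.
 * Returns true if the structure was destroyed. */
static bool toy_suspend_and_destroy(struct toy_mpath *m)
{
    if (!toy_can_suspend(m))
        return false;
    m->destroyed = true;
    return true;
}

/* Late pg_init completion.  Returns true when it would be touching
 * state that has already been destroyed. */
static bool toy_pg_init_complete_is_unsafe(struct toy_mpath *m)
{
    assert(m->pg_init_in_flight > 0);
    m->pg_init_in_flight--;
    return m->destroyed;
}
```

Starting from pending = 0 and pg_init_in_flight = 1, the model allows
toy_suspend_and_destroy() to succeed, after which
toy_pg_init_complete_is_unsafe() reports the stale access.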

I suspect that the prerequisite state, a pg_init request outstanding
while the multipath has a pending count of 0 at suspension time, is
reached in one of two possible ways.  First, a multipath with no pending
requests receives a call from user space to switch_pg_num(), which sends
a pg_init request just before the multipath device is suspended.  Second,
a multipath mapped device that has one or more pending requests and a
pg_init request outstanding is suspended.

Reasonable fixes for both cases (which I tried) are (1) to only send a pg_init
request if there are one or more pending requests queued for the multipath and 
(2) to not dispatch pending requests if there is a pg_init request outstanding.
With these changes in place, I no longer incurred this problem.
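
The two guards can be sketched in the same toy userspace style.  This is
not the patch that was tested: the guard_* names are hypothetical, and
the model assumes that requests queued on the multipath are still
counted as pending by the mapped device.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names here are hypothetical, not kernel symbols. */
struct guard_mpath {
    int queued;             /* requests held by the multipath (assumed to
                               count toward the pending total) */
    bool pg_init_in_flight; /* one pg_init request outstanding */
};

/* Guard (1): send pg_init only when at least one request is queued.
 * Refusing a second pg_init while one is in flight is an extra
 * assumption of this sketch. */
static bool guard_try_send_pg_init(struct guard_mpath *m)
{
    if (m->queued == 0 || m->pg_init_in_flight)
        return false;
    m->pg_init_in_flight = true;
    return true;
}

/* Guard (2): do not dispatch queued requests while pg_init is
 * outstanding.  Returns the number of requests dispatched. */
static int guard_dispatch_queued(struct guard_mpath *m)
{
    if (m->pg_init_in_flight)
        return 0;
    int n = m->queued;
    m->queued = 0;
    return n;
}

/* pg_init completion clears the outstanding flag. */
static void guard_pg_init_complete(struct guard_mpath *m)
{
    assert(m->pg_init_in_flight);
    m->pg_init_in_flight = false;
}

/* Property the guards aim for: pg_init outstanding implies queued > 0. */
static bool guard_invariant_holds(const struct guard_mpath *m)
{
    return !m->pg_init_in_flight || m->queued > 0;
}
```

Under these assumptions the queued requests keep the pending count
non-zero for as long as a pg_init request is outstanding, so a suspend
that waits for the count to drain cannot finish before the pg_init
completion arrives.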

Comment 1 Lars Marowsky-Bree 2005-05-03 06:19:11 EDT
I think this is fixed by the updated patch I attached for bug #155428 and maybe
bug #15442 also has an impact here.

Does the bug still occur with these updates in place?
Comment 2 Ed Goggin 2005-05-03 09:06:05 EDT
(In reply to comment #1)
> I think this is fixed by the updated patch I attached for bug #155428 and
> maybe bug #15442 also has an impact here.
> Does the bug still occur with these updates in place?

From reading the description for bug #155428, I cannot see how that bug
and this one are at all related.  Also, I am not able to access bug
#15442 (or #155442???).
Comment 3 Alasdair Kergon 2005-05-04 10:57:28 EDT
bug 154442
Comment 6 Alasdair Kergon 2005-07-02 15:56:59 EDT
Marking as duplicate of bug 154442 as the fix is believed to be the same.

*** This bug has been marked as a duplicate of 154442 ***
