Bug 154443
Summary: | kernel dm-multipath: can device be destroyed while outstanding pg_init I/O? | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Alasdair Kergon <agk> |
Component: | device-mapper-multipath | Assignee: | Alasdair Kergon <agk> |
Status: | CLOSED DUPLICATE | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.1 | CC: | agk, christophe.varoqui, dmo, tranlan |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-07-02 19:55:02 UTC | Type: | --- |
With the patch from bug 154442 I think this is now resolved:

- A pg_init can't be issued unless there's I/O outstanding.
- suspend waits for I/O => it waits for pg_init to complete.
- destroy first does suspend.

*** This bug has been marked as a duplicate of 154442 ***
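The three-step argument above can be sketched as a small user-space model. This is only an illustration of the reasoning, not the dm-multipath code: the `toy_` names and the struct layout are hypothetical, and a real suspend blocks until I/O drains, whereas the model simply returns false. The model also assumes that I/O waiting on a pg_init cannot complete until that pg_init does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_mpath {
    int  pending_io;         /* user I/Os submitted but not yet completed */
    bool pg_init_in_flight;  /* a pg_init command is outstanding */
    bool suspended;
    bool destroyed;
};

void toy_submit_io(struct toy_mpath *m)
{
    m->pending_io++;
}

/* Rule 1: a pg_init is only issued on behalf of outstanding I/O. */
bool toy_issue_pg_init(struct toy_mpath *m)
{
    if (m->pending_io == 0 || m->pg_init_in_flight)
        return false;
    m->pg_init_in_flight = true;
    return true;
}

void toy_pg_init_complete(struct toy_mpath *m)
{
    m->pg_init_in_flight = false;
}

/* Assumption of the model: I/O cannot finish while its pg_init is pending. */
bool toy_complete_io(struct toy_mpath *m)
{
    if (m->pending_io == 0 || m->pg_init_in_flight)
        return false;
    m->pending_io--;
    return true;
}

/* Rule 2: suspend succeeds only once no I/O is outstanding
 * (the real code would block here instead of returning false). */
bool toy_suspend(struct toy_mpath *m)
{
    if (m->pending_io > 0)
        return false;
    /* Follows from rule 1 and the completion rule above. */
    assert(!m->pg_init_in_flight);
    m->suspended = true;
    return true;
}

/* Rule 3: destroy always suspends first. */
bool toy_destroy(struct toy_mpath *m)
{
    if (!toy_suspend(m))
        return false;
    m->destroyed = true;
    return true;
}
```

In this model, `toy_destroy()` can never succeed while `pg_init_in_flight` is set, which is the property the bug asks about.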
Need to re-check the logic in the existing code.

Theory:
> Suspend cannot complete until there is no I/O outstanding, including
> any pg_init I/O.
> - queueing is disabled while suspended
> - the last I/O through cannot complete with an error until the last
>   path has failed, which can't happen until the pg_init for its PG
>   has completed.

Comments from Ed Goggin [based on a different version of the code]:

I think there are two other use cases besides the __switch_pg() case described above whereby a pg_init i/o can be left outstanding after its multipath structure's memory is freed.

First, the most common case -- it is possible for an uncompleted PG switch to be pending with no i/os queued to the multipath when a dev_suspend-initiated i/o suspension followed by a table swap occurs. process_queued_ios(), when called by multipath_presuspend(), will initiate a pg_init even though there are no queued user i/os. Because there are no i/os queued, the suspend will not block waiting for the pg_init to complete. If the dm_table_put() call on the old table frees the multipath structure memory before the pg_init i/o completes (the likely case), the problem occurs. This case can be prevented simply by having process_queued_ios() not initiate a pg_init i/o if there are no queued i/os pending on the multipath structure.

Second, a somewhat less common case -- a pg_init i/o is outstanding when a dev_suspend() call is made on a multipath for which the kernel thinks it currently has no valid paths. I think the sequence of events for the use case is as enumerated below. I think this case can be prevented by not calling dispatch_queued_ios() if there is an outstanding pg_init. This state is detected by the conditional logic (m->queue_io && !m->pg_init_required). In this case, any queued i/os will be dispatched only after the outstanding pg_init i/o completes, via the call to process_queued_ios() from dm_pg_init_complete().
dm_pg_init_complete() must also be changed to reset the queue_io flag if the suspended flag is set, even in the case of an error on the pg_init i/o; otherwise the i/os would never be dispatched in the "all-paths-down" use case.

(1) pg_init i/o is sent and causes queuing of one or more user i/os
(2) all paths are detected as failed in kernel
(3) multipath reloads new table via the dev_suspend() ioctl, causing i/o suspend followed by table swap
    (a) pre-suspend causes even the last i/o to fail on the last path -- even before the pg_init for its PG has completed
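Ed Goggin's suggested guards can be sketched as a small decision function. This is a user-space model under stated assumptions, not the dm-mpath.c source: the field names queue_io, pg_init_required, and suspended come from the comment above, while queue_size, the `toy_` functions, and the enum are hypothetical. Path-failure handling is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the multipath state discussed above. */
struct toy_mpath {
    unsigned queue_size;     /* assumed counter of queued user i/os */
    bool queue_io;           /* i/os must be queued rather than dispatched */
    bool pg_init_required;   /* a PG switch still needs its pg_init */
    bool suspended;
};

enum toy_action { TOY_NOTHING, TOY_START_PG_INIT, TOY_DISPATCH };

/* Models the decision the comment proposes for process_queued_ios(). */
enum toy_action toy_process_queued_ios(const struct toy_mpath *m)
{
    /* Second guard: a pg_init is outstanding, so do not dispatch;
     * the completion handler will trigger dispatch later. */
    if (m->queue_io && !m->pg_init_required)
        return TOY_NOTHING;

    if (m->pg_init_required) {
        /* First guard: no queued i/os means no pg_init is started,
         * so a suspend cannot race with one it is not waiting for. */
        if (m->queue_size == 0)
            return TOY_NOTHING;
        return TOY_START_PG_INIT;
    }

    return m->queue_size ? TOY_DISPATCH : TOY_NOTHING;
}

/* State change when the pg_init is actually sent. */
void toy_start_pg_init(struct toy_mpath *m)
{
    assert(m->pg_init_required);
    m->pg_init_required = false;
    m->queue_io = true;
}

/* Models the proposed dm_pg_init_complete() change: queue_io is reset
 * on success, and also on error when the device is suspended, so that
 * queued i/os are not stranded in the all-paths-down case. */
void toy_pg_init_complete(struct toy_mpath *m, int err)
{
    if (!err || m->suspended)
        m->queue_io = false;
    /* On error without suspend, the real code would rely on
     * path-failure handling, which this model omits. */
}
```

Walking the enumerated sequence through the model: with i/os queued and a pg_init outstanding, `toy_process_queued_ios()` returns `TOY_NOTHING`; once `toy_pg_init_complete()` runs with an error while suspended, the next call returns `TOY_DISPATCH`.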