Bug 185447

Summary: kernel dm: flush queued bios if suspend is interrupted
Product: Red Hat Enterprise Linux 4
Reporter: Alasdair Kergon <agk>
Component: kernel
Assignee: Alasdair Kergon <agk>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 4.0
CC: jbaron, jnomura, mbroz
Hardware: All
OS: Linux
Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Last Closed: 2006-08-10 22:41:21 UTC
Bug Blocks: 181409
Attachments:
Regression test for this bug (flags: none)

Description Alasdair Kergon 2006-03-14 21:27:20 UTC
If dm_suspend() is cancelled, bios already added
to the deferred list need to be submitted.
Otherwise they remain 'in limbo' until there's a dm_resume().
 
Signed-off-by: Jun'ichi Nomura <j-nomura.nec.com>
Signed-off-by: Alasdair G Kergon <agk>
 
Index: linux-2.6.16-rc5/drivers/md/dm.c
===================================================================
--- linux-2.6.16-rc5.orig/drivers/md/dm.c       2006-03-12 21:56:04.000000000 +0000
+++ linux-2.6.16-rc5/drivers/md/dm.c    2006-03-12 21:58:05.000000000 +0000
@@ -1093,6 +1093,7 @@ int dm_suspend(struct mapped_device *md,
 {
        struct dm_table *map = NULL;
        DECLARE_WAITQUEUE(wait, current);
+       struct bio *def;
        int r = -EINVAL;
  
        down(&md->suspend_lock);
@@ -1152,9 +1153,11 @@ int dm_suspend(struct mapped_device *md,
        /* were we interrupted ? */
        r = -EINTR;
        if (atomic_read(&md->pending)) {
+               clear_bit(DMF_BLOCK_IO, &md->flags);
+               def = bio_list_get(&md->deferred);
+               __flush_deferred_io(md, def);
                up_write(&md->io_lock);
                unlock_fs(md);
-               clear_bit(DMF_BLOCK_IO, &md->flags);
                goto out;
        }
        up_write(&md->io_lock);
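
To illustrate why the flush matters, here is a minimal user-space sketch of the
behaviour described above. It is not kernel code: every type and function name
in it (bio_model, md_model, cancel_suspend(), stranded_after_interrupt(), and
so on) is hypothetical and only imitates the deferred list, the DMF_BLOCK_IO
flag and the interrupted-suspend path. It assumes that a bio arriving while I/O
is blocked is parked on the deferred list, which is the situation the patch
addresses.

```c
/* User-space model only: these types imitate, but are not, the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bio_model {
    struct bio_model *next;
    bool submitted;
};

struct bio_list_model {
    struct bio_model *head, *tail;
};

struct md_model {
    bool block_io;                    /* stands in for DMF_BLOCK_IO */
    struct bio_list_model deferred;   /* stands in for md->deferred */
};

/* Append one bio to the tail of the list. */
static void list_add(struct bio_list_model *bl, struct bio_model *bio)
{
    bio->next = NULL;
    if (bl->tail)
        bl->tail->next = bio;
    else
        bl->head = bio;
    bl->tail = bio;
}

/* Detach and return the whole chain, leaving the list empty. */
static struct bio_model *list_get(struct bio_list_model *bl)
{
    struct bio_model *chain = bl->head;

    bl->head = bl->tail = NULL;
    return chain;
}

/* "Submit" every bio on a detached chain. */
static void flush_deferred(struct bio_model *chain)
{
    while (chain) {
        struct bio_model *next = chain->next;

        chain->next = NULL;
        chain->submitted = true;
        chain = next;
    }
}

/* While I/O is blocked, incoming bios are parked on the deferred list. */
static void model_request(struct md_model *md, struct bio_model *bio)
{
    if (md->block_io)
        list_add(&md->deferred, bio);
    else
        bio->submitted = true;
}

/* Interrupted suspend: unblock I/O and, with the fix, flush the list. */
static void cancel_suspend(struct md_model *md, bool with_fix)
{
    md->block_io = false;
    if (with_fix)
        flush_deferred(list_get(&md->deferred));
}

/* Returns how many bios are left unsubmitted after the interruption. */
int stranded_after_interrupt(bool with_fix)
{
    struct bio_model bios[3] = {{0}};
    struct md_model md = { .block_io = true };
    int i, stranded = 0;

    for (i = 0; i < 3; i++)
        model_request(&md, &bios[i]);
    assert(md.deferred.head != NULL);   /* bios were deferred, not submitted */

    cancel_suspend(&md, with_fix);

    for (i = 0; i < 3; i++)
        if (!bios[i].submitted)
            stranded++;
    return stranded;
}
```

In this model, stranded_after_interrupt(false) leaves all three bios on the
deferred list, while stranded_after_interrupt(true) leaves none. The real
patch does the equivalent with bio_list_get() and __flush_deferred_io() while
still holding md->io_lock.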

Comment 2 Jun'ichi NOMURA 2006-03-20 20:26:38 UTC
Created attachment 126361
Regression test for this bug

This test checks for the bug.
It prints "PASS" if the bug is fixed
and "FAIL" if it is not, as on RHEL4 U3 (2.6.9-34.EL).

I ran this test on 2.6.9-34.EL plus the proposed patch
and confirmed it's fixed.

Comment 3 Jun'ichi NOMURA 2006-03-20 20:30:14 UTC
If someone hits this bug, any process doing I/O on the map will stall.
The crash utility will show a backtrace like the one below:

crash> bt 6694
PID: 6694   TASK: e00000011ecb8000  CPU: 7   COMMAND: "dd"
 #0 [BSP:e00000011ecb93a0] context_switch at a000000100068500
 #1 [BSP:e00000011ecb9288] schedule at a000000100586fc0
 #2 [BSP:e00000011ecb9268] io_schedule at a000000100589830
 #3 [BSP:e00000011ecb9238] __wait_on_buffer at a000000100123010
 #4 [BSP:e00000011ecb91a8] __block_prepare_write at a00000010012b6f0
 #5 [BSP:e00000011ecb9170] block_prepare_write at a00000010012be40
 #6 [BSP:e00000011ecb9138] blkdev_prepare_write at a000000100134810
 #7 [BSP:e00000011ecb9060] generic_file_buffered_write at a0000001000d3520
 #8 [BSP:e00000011ecb8ff0] __generic_file_aio_write_nolock at a0000001000d4500
 #9 [BSP:e00000011ecb8fa0] generic_file_aio_write_nolock at a0000001000d4de0
#10 [BSP:e00000011ecb8f68] generic_file_write_nolock at a0000001000d5220
#11 [BSP:e00000011ecb8f30] blkdev_file_write at a000000100136f50
#12 [BSP:e00000011ecb8ee0] vfs_write at a000000100120090
#13 [BSP:e00000011ecb8e68] sys_write at a0000001001202b0
#14 [BSP:e00000011ecb8e68] ia64_ret_from_syscall at a00000010000f3e0

The workaround is to suspend the problematic map again and then resume it.


Comment 4 Jason Baron 2006-03-28 17:51:59 UTC
Committed in stream U4 build 34.9. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 8 Red Hat Bugzilla 2006-08-10 22:41:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html