If dm_suspend() is cancelled, bios already added to the deferred list need to be submitted. Otherwise they remain 'in limbo' until there's a dm_resume().

Signed-off-by: Jun'ichi Nomura <j-nomura.nec.com>
Signed-Off-By: Alasdair G Kergon <agk>

Index: linux-2.6.16-rc5/drivers/md/dm.c
===================================================================
--- linux-2.6.16-rc5.orig/drivers/md/dm.c	2006-03-12 21:56:04.000000000 +0000
+++ linux-2.6.16-rc5/drivers/md/dm.c	2006-03-12 21:58:05.000000000 +0000
@@ -1093,6 +1093,7 @@ int dm_suspend(struct mapped_device *md,
 {
 	struct dm_table *map = NULL;
 	DECLARE_WAITQUEUE(wait, current);
+	struct bio *def;
 	int r = -EINVAL;
 
 	down(&md->suspend_lock);
@@ -1152,9 +1153,11 @@ int dm_suspend(struct mapped_device *md,
 	/* were we interrupted ? */
 	r = -EINTR;
 	if (atomic_read(&md->pending)) {
+		clear_bit(DMF_BLOCK_IO, &md->flags);
+		def = bio_list_get(&md->deferred);
+		__flush_deferred_io(md, def);
 		up_write(&md->io_lock);
 		unlock_fs(md);
-		clear_bit(DMF_BLOCK_IO, &md->flags);
 		goto out;
 	}
 	up_write(&md->io_lock);
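To illustrate why the cancel path has to drain the deferred list, here is a minimal user-space sketch of the behaviour the patch changes. It is a simplified model, not kernel code: the struct layouts, queue_bio(), flush_deferred(), cancel_suspend() and demo_cancelled_suspend() are hypothetical stand-ins written for this example, and the block_io field only models the DMF_BLOCK_IO flag. Locking and the real bio submission are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel types; not the real definitions. */
struct bio { struct bio *bi_next; };
struct bio_list { struct bio *head, *tail; };

struct mapped_device {
    int block_io;              /* models the DMF_BLOCK_IO flag */
    struct bio_list deferred;  /* bios queued while I/O is blocked */
    int submitted;             /* count of bios passed down to the target */
};

static void bio_list_add(struct bio_list *bl, struct bio *bio)
{
    assert(bio != NULL);
    bio->bi_next = NULL;
    if (bl->tail)
        bl->tail->bi_next = bio;
    else
        bl->head = bio;
    bl->tail = bio;
}

/* Detach and return the whole chain, leaving the list empty. */
static struct bio *bio_list_get(struct bio_list *bl)
{
    struct bio *head = bl->head;
    bl->head = bl->tail = NULL;
    return head;
}

/* Model of the request path: defer while blocked, otherwise submit. */
static void queue_bio(struct mapped_device *md, struct bio *bio)
{
    if (md->block_io)
        bio_list_add(&md->deferred, bio);
    else
        md->submitted++;
}

/* Hypothetical model of __flush_deferred_io(): submit every bio in the chain. */
static void flush_deferred(struct mapped_device *md, struct bio *chain)
{
    while (chain) {
        struct bio *next = chain->bi_next;
        chain->bi_next = NULL;
        md->submitted++;
        chain = next;
    }
}

/* Cancel path of a suspend; 'fixed' selects the patched behaviour. */
static void cancel_suspend(struct mapped_device *md, int fixed)
{
    md->block_io = 0;
    if (fixed)
        flush_deferred(md, bio_list_get(&md->deferred));
}

/* Returns how many bios are still stuck on the deferred list after a cancel. */
int demo_cancelled_suspend(int fixed)
{
    struct mapped_device md = { 0 };
    struct bio a, b;
    int stuck = 0;

    md.block_io = 1;            /* suspend has started blocking I/O */
    queue_bio(&md, &a);
    queue_bio(&md, &b);
    cancel_suspend(&md, fixed); /* suspend is interrupted */

    for (struct bio *p = md.deferred.head; p; p = p->bi_next)
        stuck++;
    return stuck;
}
```

In this model, demo_cancelled_suspend(0) leaves both bios on the deferred list, which corresponds to the stalled I/O described in this report, while demo_cancelled_suspend(1) leaves none.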
Created attachment 126361
Regression test for this bug

This test checks for the bug. It prints "PASS" if the bug is fixed and "FAIL" on RHEL4 U3 (2.6.9-34.EL). I ran this test on 2.6.9-34.EL plus the proposed patch and confirmed that the bug is fixed.
If someone hits this bug, a process doing I/O on the map will stall. Crash will show a backtrace like the one below:

crash> bt 6694
PID: 6694  TASK: e00000011ecb8000  CPU: 7  COMMAND: "dd"
 #0 [BSP:e00000011ecb93a0] context_switch at a000000100068500
 #1 [BSP:e00000011ecb9288] schedule at a000000100586fc0
 #2 [BSP:e00000011ecb9268] io_schedule at a000000100589830
 #3 [BSP:e00000011ecb9238] __wait_on_buffer at a000000100123010
 #4 [BSP:e00000011ecb91a8] __block_prepare_write at a00000010012b6f0
 #5 [BSP:e00000011ecb9170] block_prepare_write at a00000010012be40
 #6 [BSP:e00000011ecb9138] blkdev_prepare_write at a000000100134810
 #7 [BSP:e00000011ecb9060] generic_file_buffered_write at a0000001000d3520
 #8 [BSP:e00000011ecb8ff0] __generic_file_aio_write_nolock at a0000001000d4500
 #9 [BSP:e00000011ecb8fa0] generic_file_aio_write_nolock at a0000001000d4de0
#10 [BSP:e00000011ecb8f68] generic_file_write_nolock at a0000001000d5220
#11 [BSP:e00000011ecb8f30] blkdev_file_write at a000000100136f50
#12 [BSP:e00000011ecb8ee0] vfs_write at a000000100120090
#13 [BSP:e00000011ecb8e68] sys_write at a0000001001202b0
#14 [BSP:e00000011ecb8e68] ia64_ret_from_syscall at a00000010000f3e0

The workaround is to suspend the problematic map again and then resume it.
Committed in stream U4 build 34.9. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0575.html