Bug 451819
Summary: | process hangs in async direct IO / possible race between dio_bio_end_aio() and dio_await_one() ? | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Issue Tracker <tao> | ||||
Component: | kernel | Assignee: | Jeff Moyer <jmoyer> | ||||
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 4.6 | CC: | bmr, john.daley, jplans, rlerch, shantikatta, sputhenp, tao | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: |
Previously, in the kernel, there was a race condition may have been encountered between dio_bio_end_aio() and dio_await_one(). This may have lead to a situation where direct I/O is left waiting indefinitely on an I/O process that has already completed. With this update, the the reference counting operations are locked so that the submission and completion paths see a unified state, which resolves this issue.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2009-05-18 19:08:11 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 391511, 455750, 456083, 458752, 461297 | ||||||
Attachments: |
|
Comment 2
Ulrich Obergfell
2008-06-17 16:42:12 UTC
I'll take a look. I'll backport: commit 5eb6c7a2ab413dea1ee6c08dd58263a1c2c2efa3 Author: Zach Brown <zach.brown> Date: Sun Dec 10 02:21:07 2006 -0800 [PATCH] dio: lock refcount operations and commit fcb82f8835c1d71b4fe5de1d9894f45370f80dab Author: Zach Brown <zach.brown> Date: Tue Jul 3 15:28:55 2007 -0700 dio: remove bogus refcounting BUG_ON Thanks for the detailed problem report. Created attachment 309947 [details]
dio: lock refcount operations
This is a patch combining and backporting the two aforementioned upstream
commits.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release. This bug has been copied as 4.7 z-stream (EUS) bug #456083 and now must be resolved in the current update release, set blocker flag. Committed in 78.1.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/ Updating PM score. *** Bug 467906 has been marked as a duplicate of this bug. *** Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: There is a race between dio_bio_end_aio() and dio_await_one() which can lead to a situation where a process performing asynchronous, direct I/O is left waiting indefinitely on an I/O that has already completed. The solution is to lock the reference counting operations so that the submission and completion paths see a unified state. re-post of analysis from comment #3 crash> sys KERNEL: /cores/20080425054953/work/vmlinux DUMPFILE: /cores/20080425054953/work/1822328-vmcore CPUS: 16 DATE: Thu Apr 24 03:14:27 2008 UPTIME: 03:18:57 LOAD AVERAGE: 268.20, 166.55, 80.32 TASKS: 3527 NODENAME: host.example.com RELEASE: 2.6.9-67.0.4.ELlargesmp VERSION: #1 SMP Fri Jan 18 05:10:01 EST 2008 MACHINE: x86_64 (2800 Mhz) MEMORY: 31.5 GB PANIC: "Oops: 0002 [1] SMP " (check log for details) crash> bt 8960 PID: 8960 TASK: 102edef7030 CPU: 4 COMMAND: "oracle" #0 [100b5397ac8] schedule at ffffffff8030e7e8 #1 [100b5397ba0] io_schedule at ffffffff8030f15e #2 [100b5397bc0] __blockdev_direct_IO at ffffffff8019c9b2 #3 [100b5397c70] ext3_direct_IO at ffffffffa011253a #4 [100b5397cf0] generic_file_direct_IO at ffffffff8015c8fa #5 [100b5397d30] generic_file_direct_write at ffffffff8015c980 #6 [100b5397d80] __generic_file_aio_write_nolock at ffffffff8015ccbc #7 [100b5397e00] generic_file_aio_write_nolock at ffffffff8015cf9f #8 [100b5397e30] generic_file_aio_write at ffffffff8015d069 #9 [100b5397e80] ext3_file_write at ffffffffa010df01 #10 [100b5397ea0] aio_pwrite at ffffffff8019e1ab #11 [100b5397eb0] aio_run_iocb at ffffffff8019d53f #12 [100b5397ed0] io_submit_one at ffffffff8019e560 #13 [100b5397f00] sys_io_submit at ffffffff8019e665 #14 [100b5397f80] tracesys at ffffffff801103ce (via system_call) RIP: 0000002a96f5f6a7 RSP: 0000007fbfff67b8 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: ffffffff801103ce RCX: ffffffffffffffff RDX: 0000007fbfff67c0 RSI: 0000000000000001 RDI: 0000002a971ce000 RBP: 0000000006003750 R8: 0000000000000000 R9: 0000000000000000 R10: 0000007fbfffb3b8 R11: 0000000000000202 R12: 0000007fbfff8900 R13: 0000000000000001 R14: 0000007fbfff8900 R15: 0000007fbfff88e0 ORIG_RAX: 00000000000000d1 CS: 0033 SS: 002b the process has been blocked for 7 minutes and 7 seconds ... crash> px per_cpu__runqueues | grep "\[4\]" [4]: 104000616e0 crash> px ((struct runqueue *)0x104000616e0)->timestamp_last_tick $1 = 0xac6c8e38db3 crash> px ((struct task_struct *)0x102edef7030)->last_ran $2 = 0xa6346839bb6 crash> pd (0xac6c8e38db3 - 0xa6346839bb6) / 1000000000 $3 = 427 <- seconds blocked the "running transaction" in /data1 ... crash> px ((struct task_struct *)0x102edef7030)->journal_info $4 = (void *) 0x10276615640 crash> px ((struct handle_s *)0x10276615640)->h_transaction $5 = (transaction_t *) 0x103fb8952c0 <---------------------------------+ | | crash> mount | grep /data1 | 10300025500 104000fe800 ext3 /dev/md10 /data1 | | crash> px ((struct super_block *)0x104000fe800)->s_fs_info | $6 = (void *) 0x100d1d68000 | | crash> px ((struct ext3_sb_info *)0x100d1d68000)->s_journal | $7 = (struct journal_s *) 0x1040016bc00 | | crash> px ((struct journal_s *)0x1040016bc00)->j_running_transaction | $8 = (transaction_t *) 0x103fb8952c0 <---------------------------------+ "kjournald" PID 30348 is waiting on "t_updates" ... crash> pd ((transaction_t *)0x103fb8952c0)->t_updates $9 = 1 crash> px &((struct journal_s *)0x1040016bc00)->j_wait_updates $10 = (wait_queue_head_t *) 0x1040016bce0 crash> waitq 0x1040016bce0 PID: 30348 TASK: 101fa8727f0 CPU: 8 COMMAND: "kjournald" crash> bt 30348 PID: 30348 TASK: 101fa8727f0 CPU: 8 COMMAND: "kjournald" #0 [100d1ca1ad8] schedule at ffffffff8030e7e8 #1 [100d1ca1bb0] journal_commit_transaction at ffffffffa00fa872 #2 [100d1ca1e80] kjournald at ffffffffa00fda5c #3 [100d1ca1f50] kernel_thread at ffffffff80110f47 192 processes are waiting on "T_LOCKED" ... crash> pd ((transaction_t *)0x103fb8952c0)->t_state $11 = T_LOCKED crash> px &((struct journal_s *)0x1040016bc00)->j_wait_transaction_locked $12 = (wait_queue_head_t *) 0x1040016bc68 crash> waitq 0x1040016bc68 | wc -l 192 crash> waitq 0x1040016bc68 PID: 25604 TASK: 10444f1f030 CPU: 4 COMMAND: "oracle" PID: 25542 TASK: 10216e75030 CPU: 15 COMMAND: "oracle" ... PID: 24278 TASK: 100120d1030 CPU: 1 COMMAND: "ls" ... PID: 22651 TASK: 1044b3b67f0 CPU: 3 COMMAND: "oracle" crash> bt 24278 PID: 24278 TASK: 100120d1030 CPU: 1 COMMAND: "ls" #0 [1000b047ca8] schedule at ffffffff8030e7e8 #1 [1000b047d80] start_this_handle at ffffffffa00f8359 #2 [1000b047e60] journal_start at ffffffffa00f84c6 #3 [1000b047e80] ext3_dirty_inode at ffffffffa0112797 #4 [1000b047ea0] __mark_inode_dirty at ffffffff80199b0c #5 [1000b047ed0] update_atime at ffffffff80192f5e #6 [1000b047f00] vfs_readdir at ffffffff8018c3a8 #7 [1000b047f30] sys_getdents64 at ffffffff8018c784 #8 [1000b047f80] tracesys at ffffffff801103ce (via system_call) RIP: 0000003369d8c68e RSP: 0000007fbffff320 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: ffffffff801103ce RCX: ffffffffffffffff RDX: 0000000000001000 RSI: 000000000051a598 RDI: 0000000000000003 RBP: 0000002a957836a0 R8: 0000003369f31ce8 R9: 0000000000000010 R10: 0000007fbffff201 R11: 0000000000000246 R12: 0000000000000000 R13: 0000007fbffff320 R14: 000000000051a560 R15: 000000000051a540 ORIG_RAX: 00000000000000d9 CS: 0033 SS: 002b the copy of the user iocb on the kernel stack of "oracle" PID 8960 ... crash> dis sys_io_submit ... asmlinkage long sys_io_submit(aio_context_t ctx_id, long nr, struct iocb __user * __user *iocbpp) { ... for (i=0; i<nr; i++) { ... struct iocb tmp; ... if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) { // dl = sizeof(tmp) 0xffffffff8019e645 <sys_io_submit+149>: mov $0x40,%dl 0xffffffff8019e647 <sys_io_submit+151>: mov %r12,%rsi // rdi = rsp (&tmp) 0xffffffff8019e64a <sys_io_submit+154>: mov %rsp,%rdi // call copy_from_user 0xffffffff8019e64d <sys_io_submit+157>: callq 0xffffffff801ee960 ... crash> bt -f 8960 PID: 8960 TASK: 102edef7030 CPU: 4 COMMAND: "oracle" ... #13 [100b5397f00] sys_io_submit at ffffffff8019e665 -+ 100b5397f08: 0000002a97ec2ea0 0000000000000000 | 100b5397f18: 0000001600000001 0000002a974a3e00 | copy of 100b5397f28: 0000000000100000 0000000000b00200 | user iocb 100b5397f38: 0000000000000000 0000000000000000 | -+ 100b5397f48: 0000000000000001 0000000000000001 100b5397f58: 0000007fbfff88e0 0000007fbfff8900 100b5397f68: 0000000000000001 0000007fbfff8900 100b5397f78: 0000000006003750 ffffffff801103ce ... crash> px *(struct iocb *)0x100b5397f08 $13 = { aio_data = 0x2a97ec2ea0, aio_key = 0x0, aio_reserved1 = 0x0, aio_lio_opcode = 0x1 = IOCB_CMD_PWRITE, aio_reqprio = 0x0, aio_fildes = 0x16 =22, file descriptor aio_buf = 0x2a974a3e00, | aio_nbytes = 0x100000, | aio_offset = 0xb00200, | aio_reserved2 = 0x0, | aio_reserved3 = 0x0 | } | v crash> files 8960 | grep " 22 " 22 1024869c880 1044fde6228 1056b3fcb18 REG /data1/oracle/archive/CC/1_59673_629046150.dbf the dio structure pertaining to the above iocb ... crash> dis __blockdev_direct_IO ... static struct bio *dio_await_one(struct dio *dio) { unsigned long flags; struct bio *bio = NULL; spin_lock_irqsave(&dio->bio_lock, flags); while (wait_for_more_bios(dio)) { set_current_state(TASK_UNINTERRUPTIBLE); if (wait_for_more_bios(dio)) { dio->waiter = current; spin_unlock_irqrestore(&dio->bio_lock, flags); io_schedule(); ... // rax = current 0xffffffff8019c98b <__blockdev_direct_IO+2780>: mov %gs:0x0,%rax // rbx = rbp->bio_lock 0xffffffff8019c994 <__blockdev_direct_IO+2789>: lea 0x318(%rbp),%rbx 0xffffffff8019c99b <__blockdev_direct_IO+2796>: mov %r12,%rsi // rbp->waiter = rax 0xffffffff8019c99e <__blockdev_direct_IO+2799>: mov %rax,0x328(%rbp) 0xffffffff8019c9a5 <__blockdev_direct_IO+2806>: mov %rbx,%rdi // call _spin_unlock_irqrestore 0xffffffff8019c9a8 <__blockdev_direct_IO+2809>: callq 0xffffffff8030f7ca // call io_schedule 0xffffffff8019c9ad <__blockdev_direct_IO+2814>: callq 0xffffffff8030f138 0xffffffff8019c9b2 <__blockdev_direct_IO+2819>: mov %rbx,%rdi crash> dis io_schedule 1 // save rbp (dio) 0xffffffff8030f138 <io_schedule>: push %rbp crash> bt -f 8960 PID: 8960 TASK: 102edef7030 CPU: 4 COMMAND: "oracle" ... #1 [100b5397ba0] io_schedule at ffffffff8030f15e 100b5397ba8: 00000100083a8e80 000001010012bb18 saved rbp (dio) 100b5397bb8: 000001010012b800 ffffffff8019c9b2 ... crash> px *(struct dio *)0x1010012b800 $14 = { bio = 0x0, inode = 0x1056b3fcb18, rw = 0x1 = WRITE, i_size = 0x11b49800, lock_type = 0x1 = DIO_LOCKING, blkbits = 0x9, blkfactor = 0x3, start_zero_done = 0x1, pages_in_io = 0x102, size = 0x100000, block_in_file = 0x5808, blocks_available = 0x0, final_block_in_request = 0x6001, first_block_in_page = 0x7, boundary = 0x0, reap_counter = 0x1, get_blocks = 0xffffffffa01114c8 <ext3_direct_io_get_blocks>, end_io = 0, final_block_in_bio = 0x9486140, next_block_for_io = 0x9486140, map_bh = { b_state = 0x0, b_this_page = 0x5245504f2e533253, b_page = 0x2c504f4320454341, b_count = { counter = 0x53345320 }, b_size = 0x1000, b_blocknr = 0x1290c27, b_data = 0x204552454857204 b_bdev = 0x100db0e7cc0, b_end_io = 0x2044495f414e491e, b_private = 0x0, b_assoc_buffers = { next = 0x492e4b5320444e41, prev = 0x333431203d2044 } }, cur_page = 0x0, cur_page_offset = 0x0, cur_page_len = 0xc00, cur_page_block = 0x948613a, curr_page = 0x40, total_pages = 0x101, curr_user_address = 0x2a974e3e00, pages = {0x10002cea048, 0x102ff4e21a0, 0x102ff4e21d8, 0x102ff4e2210, 0x102ff4e 2248, 0x102ff4e2280, 0x102ff4e22b8, 0x102ff4e22f0, 0x102ff4e2328, 0x102ff4e2360, 0x102ff4e2398, 0x102ff4e23d0, 0x102ff4e2408, 0x102ff4e2440, 0x102ff4e2478, 0x10 2ff4e24b0, 0x102ff4e24e8, 0x102ff4e2520, 0x102ff4e2558, 0x102ff4e2590, 0x102ff4e 25c8, 0x102ff4e2600, 0x102ff4e2638, 0x102ff4e2670, 0x102ff4e26a8, 0x102ff4e26e0, 0x102ff4e2718, 0x102ff4e2750, 0x102ff4e2788, 0x102ff4e27c0, 0x102ff4e27f8, 0x10 2ff4e2830, 0x102ff4e2868, 0x102ff4e28a0, 0x102ff4e28d8, 0x102ff4e2910, 0x102ff4e 2948, 0x102ff4e2980, 0x102ff4e29b8, 0x102ff4e29f0, 0x102ff4e2a28, 0x102ff4e2a60, 0x102ff4e2a98, 0x102ff4e2ad0, 0x102ff4e2b08, 0x102ff4e2b40, 0x102ff4e2b78, 0x10 2ff4e2bb0, 0x102ff4e2be8, 0x102ff4e2c20, 0x102ff4e2c58, 0x102ff4e2c90, 0x102ff4e 2cc8, 0x102ff4e2d00, 0x102ff4e2d38, 0x102ff4e2d70, 0x102ff4e2da8, 0x102ff4e2de0, 0x102ff4e2e18, 0x102ff4e2e50, 0x102ff4e2e88, 0x102ff4e2ec0, 0x102ff4e2ef8, 0x10 2ff4e2f30}, head = 0x40, tail = 0x40, page_errors = 0x0, refcount = { counter = 0x1 <----------+ reference count is 1 (no ongoing I/O) }, | bio_lock = { | lock = 0x1, | magic = 0xdead4ead | }, | bio_list = 0x0, | waiter = 0x102edef7030, <--+ but waiter is not NULL iocb = 0x107b7a3c780, | is_async = 0x1, | io_error = 0x0, | result = 0xe00 | } | | crash> ps | grep 102edef7030 v 8960 1 4 102edef7030 UN 0.1 10634640 32276 oracle repost of analysis from comment #5: details of direct I/O request ----------------------------- vaddr address space file data offset fs block +-------------+ +---------+ | | | | | | | | | I/O buffer | | | 3900000 1698f91e 2a97ce3e00 |/////////////| |/////////| 3900200 2a97ce4000 |/////////////| |/////////| 3900400 |/////////////| |/////////| 3900600 |/////////////| |/////////| 3900800 |/////////////| |/////////| 3900a00 |/////////////| |/////////| 3900c00 |/////////////| |/////////| 3900e00 |/////////////|-- write ->|/////////| 3901000 00000000 (hole) |/////////////| |/////////| 3901200 2a97ce5000 |/////////////| |/////////| 3901400 |/////////////| |/////////| 3901600 |/////////////| |/////////| 3901800 |/////////////| |/////////| 3901a00 |/////////////| |/////////| 3901c00 |/////////////| |/////////| 3901e00 |/////////////| |/////////| 3902000 00000000 (hole) | : | | : | : | : | | : | : | : | | : | : |/////////////| |/////////| 3a00000 00000000 (hole) 2a97be3e00 |/////////////| |/////////| 3a00200 | | | | 3a00400 | | | | 3a00600 | | | | 3a00800 | | | | 3a00a00 | | | | 3a00c00 | | | | | | | | +-------------+ +---------+ details from 1822328_vmcore_20080616 ------------------------------------ crash> sys KERNEL: /cores/20080425054953/work/vmlinux DUMPFILE: /cores/20080617043355/work/1822328_vmcore_20080616 CPUS: 16 DATE: Mon Jun 16 15:19:16 2008 UPTIME: 01:14:21 LOAD AVERAGE: 29.24, 34.39, 30.91 TASKS: 275 NODENAME: host.example.com RELEASE: 2.6.9-67.0.4.ELlargesmp VERSION: #1 SMP Fri Jan 18 05:10:01 EST 2008 MACHINE: x86_64 (3000 Mhz) MEMORY: 16 GB PANIC: "Oops: 0002 [1] SMP " (check log for details) crash> bt 21356 PID: 21356 TASK: 103f8f97030 CPU: 10 COMMAND: "oracle" #0 [100ba6ddac8] schedule at ffffffff8030e7e8 #1 [100ba6ddba0] io_schedule at ffffffff8030f15e #2 [100ba6ddbc0] __blockdev_direct_IO at ffffffff8019c9b2 #3 [100ba6ddc70] ext3_direct_IO at ffffffffa010d53a #4 [100ba6ddcf0] generic_file_direct_IO at ffffffff8015c8fa #5 [100ba6ddd30] generic_file_direct_write at ffffffff8015c980 #6 [100ba6ddd80] __generic_file_aio_write_nolock at ffffffff8015ccbc #7 [100ba6dde00] generic_file_aio_write_nolock at ffffffff8015cf9f #8 [100ba6dde30] generic_file_aio_write at ffffffff8015d069 #9 [100ba6dde80] ext3_file_write at ffffffffa0108f01 #10 [100ba6ddea0] aio_pwrite at ffffffff8019e1ab #11 [100ba6ddeb0] aio_run_iocb at ffffffff8019d53f #12 [100ba6dded0] io_submit_one at ffffffff8019e560 #13 [100ba6ddf00] sys_io_submit at ffffffff8019e665 #14 [100ba6ddf80] system_call at ffffffff8011026a RIP: 0000002a96f5f6a7 RSP: 0000007fbfff8890 RFLAGS: 00010246 RAX: 00000000000000d1 RBX: ffffffff8011026a RCX: 0000000000100000 RDX: 0000007fbfff6890 RSI: 0000000000000001 RDI: 0000002a971ce000 RBP: 0000000006003570 R8: 0000000000000000 R9: 0000000000000000 R10: 0000007fbfffb488 R11: 0000000000000202 R12: 0000007fbfff89d0 R13: 0000000000000001 R14: 0000007fbfff89d0 R15: 0000007fbfff89b0 ORIG_RAX: 00000000000000d1 CS: 0033 SS: 002b the process has been blocked for 24 minutes and 52 seconds ... crash> px per_cpu__runqueues | grep "\[10\]" [10]: 1040001d760 crash> px ((struct runqueue *)0x1040001d760)->timestamp_last_tick $1 = 0x420917ddc2d crash> px ((struct task_struct *)0x103f8f97030)->last_ran $2 = 0x2c50250db37 crash> pd (0x420917ddc2d - 0x2c50250db37) / 1000000000 $3 = 1492 <- seconds blocked the "running transaction" in /data ... crash> px ((struct task_struct *)0x103f8f97030)->journal_info $4 = (void *) 0x103f38461f0 crash> px ((struct handle_s *)0x103f38461f0)->h_transaction $5 = (transaction_t *) 0x100012f11c0 crash> mount | grep /data 10200042500 10037c5f800 ext3 /dev/md0 /data crash> px ((struct super_block *)0x10037c5f800)->s_fs_info $6 = (void *) 0x101e8de0000 crash> px ((struct ext3_sb_info *)0x101e8de0000)->s_journal $7 = (struct journal_s *) 0x10037d58c00 crash> px ((struct journal_s *)0x10037d58c00)->j_running_transaction $8 = (transaction_t *) 0x100012f11c0 "kjournald" PID 17538 is waiting on "t_updates" ... crash> pd ((transaction_t *)0x100012f11c0)->t_updates $9 = 1 crash> px &((struct journal_s *)0x10037d58c00)->j_wait_updates $10 = (wait_queue_head_t *) 0x10037d58ce0 crash> waitq 0x10037d58ce0 PID: 17538 TASK: 100dffb4030 CPU: 13 COMMAND: "kjournald" crash> bt 17538 PID: 17538 TASK: 100dffb4030 CPU: 13 COMMAND: "kjournald" #0 [101f0891ad8] schedule at ffffffff8030e7e8 #1 [101f0891bb0] journal_commit_transaction at ffffffffa00f5872 #2 [101f0891e80] kjournald at ffffffffa00f8a5c #3 [101f0891f50] kernel_thread at ffffffff80110f47 24 processes are waiting on "T_LOCKED" ... crash> pd ((transaction_t *)0x100012f11c0)->t_state $11 = T_LOCKED crash> px &((struct journal_s *)0x10037d58c00)->j_wait_transaction_locked $12 = (wait_queue_head_t *) 0x10037d58c68 crash> waitq 0x10037d58c68 | wc -l 24 crash> waitq 0x10037d58c68 PID: 21578 TASK: 101eb004030 CPU: 7 COMMAND: "ls" PID: 21329 TASK: 103f7782030 CPU: 5 COMMAND: "oracle" PID: 196 TASK: 103f8e93030 CPU: 14 COMMAND: "pdflush" PID: 21321 TASK: 101e7b077f0 CPU: 10 COMMAND: "oracle" PID: 21337 TASK: 101f8561030 CPU: 4 COMMAND: "oracle" PID: 21341 TASK: 103f4fbb030 CPU: 4 COMMAND: "oracle" PID: 21315 TASK: 101f03cc030 CPU: 4 COMMAND: "oracle" PID: 21319 TASK: 101f29877f0 CPU: 8 COMMAND: "oracle" PID: 21391 TASK: 10183645030 CPU: 2 COMMAND: "iozone" PID: 21386 TASK: 1018377f030 CPU: 7 COMMAND: "iozone" PID: 21384 TASK: 1018372a030 CPU: 15 COMMAND: "iozone" PID: 21387 TASK: 101837f97f0 CPU: 9 COMMAND: "iozone" PID: 21397 TASK: 10183fce030 CPU: 12 COMMAND: "iozone" PID: 21383 TASK: 1018372a7f0 CPU: 10 COMMAND: "iozone" PID: 21392 TASK: 10183efd7f0 CPU: 1 COMMAND: "iozone" PID: 21390 TASK: 101836457f0 CPU: 0 COMMAND: "iozone" PID: 21393 TASK: 10183efd030 CPU: 8 COMMAND: "iozone" PID: 21389 TASK: 10183e607f0 CPU: 14 COMMAND: "iozone" PID: 21394 TASK: 10183f647f0 CPU: 3 COMMAND: "iozone" PID: 21396 TASK: 10183fce7f0 CPU: 6 COMMAND: "iozone" PID: 21382 TASK: 101f4cfe030 CPU: 11 COMMAND: "iozone" PID: 21395 TASK: 10183f64030 CPU: 4 COMMAND: "iozone" PID: 21388 TASK: 101837f9030 CPU: 5 COMMAND: "iozone" PID: 21385 TASK: 1018377f7f0 CPU: 13 COMMAND: "iozone" crash> bt 21578 PID: 21578 TASK: 101eb004030 CPU: 7 COMMAND: "ls" #0 [101aaea3ca8] schedule at ffffffff8030e7e8 #1 [101aaea3d80] start_this_handle at ffffffffa00f3359 #2 [101aaea3e60] journal_start at ffffffffa00f34c6 #3 [101aaea3e80] ext3_dirty_inode at ffffffffa010d797 #4 [101aaea3ea0] __mark_inode_dirty at ffffffff80199b0c #5 [101aaea3ed0] update_atime at ffffffff80192f5e #6 [101aaea3f00] vfs_readdir at ffffffff8018c3a8 #7 [101aaea3f30] sys_getdents64 at ffffffff8018c784 #8 [101aaea3f80] system_call at ffffffff8011026a RIP: 0000003369d8c68e RSP: 0000007fbffff318 RFLAGS: 00010246 RAX: 00000000000000d9 RBX: ffffffff8011026a RCX: 0000000000000000 RDX: 0000000000001000 RSI: 000000000051ce18 RDI: 0000000000000003 RBP: 0000002a957836a0 R8: 0000003369f31ce8 R9: 0000000000000010 R10: 0000007fbffff201 R11: 0000000000000246 R12: 0000000000000001 R13: 0000007fbffff310 R14: 000000000051cde0 R15: 000000000051cdc0 ORIG_RAX: 00000000000000d9 CS: 0033 SS: 002b the copy of the user iocb on the kernel stack of "oracle" PID 21356 ... crash> bt -f 21356 PID: 21356 TASK: 103f8f97030 CPU: 10 COMMAND: "oracle" ... #13 [100ba6ddf00] sys_io_submit at ffffffff8019e665 -+ 100ba6ddf08: 0000002a97ec39b0 0000000000000000 | 100ba6ddf18: 0000001600000001 0000002a97ce3e00 | copy of 100ba6ddf28: 0000000000100000 0000000003900200 | user iocb 100ba6ddf38: 0000000000000000 0000000000000000 | -+ 100ba6ddf48: 0000000000000000 0000000000000001 100ba6ddf58: 0000007fbfff89b0 0000007fbfff89d0 100ba6ddf68: 0000000000000001 0000007fbfff89d0 100ba6ddf78: 0000000006003570 ffffffff8011026a ... crash> px *(struct iocb *)0x100ba6ddf08 $13 = { aio_data = 0x2a97ec39b0, aio_key = 0x0, aio_reserved1 = 0x0, aio_lio_opcode = 0x1 = IOCB_CMD_PWRITE, aio_reqprio = 0x0, aio_fildes = 0x16 =22, file descriptor aio_buf = 0x2a97ce3e00, | aio_nbytes = 0x100000, | aio_offset = 0x3900200, | aio_reserved2 = 0x0, | aio_reserved3 = 0x0 | } | v crash> files 21356 | grep " 22 " 22 103f64a2980 10037cc6138 101d1a0a810 REG /data/oracle/archive/DBTEST/1_472_657217547.dbf the dio structure pertaining to the above iocb ... crash> bt -f 21356 PID: 21356 TASK: 103f8f97030 CPU: 10 COMMAND: "oracle" ... #1 [100ba6ddba0] io_schedule at ffffffff8030f15e 100ba6ddba8: 00000101f83af210 00000101f3c8bb18 saved rbp (dio) 100ba6ddbb8: 00000101f3c8b800 ffffffff8019c9b2 ... crash> px *(struct dio *)0x101f3c8b800 $7 = { bio = 0x0, inode = 0x101d1a0a810, rw = 0x1 = WRITE, i_size = 0xbce0200, lock_type = 0x1 = DIO_LOCKING, blkbits = 0x9, blkfactor = 0x3, start_zero_done = 0x1, pages_in_io = 0x102, size = 0x100000, block_in_file = 0x1c808, blocks_available = 0x0, final_block_in_request = 0x1d001, first_block_in_page = 0x7, boundary = 0x0, reap_counter = 0x1, get_blocks = 0xffffffffa010c4c8 <ext3_direct_io_get_blocks>, end_io = 0, final_block_in_bio = 0xb4c7c8f8, next_block_for_io = 0xb4c7c8f8, map_bh = { b_state = 0x0, b_this_page = 0xfffffffffffffffc, b_page = 0x5d2, b_count = { counter = 0x2 }, b_size = 0x1000, b_blocknr = 0x1698f91e, b_data = 0x47b b_bdev = 0x103f8f263c0, b_end_io = 0xfffffffffffffffc, b_private = 0x0, b_assoc_buffers = { next = 0x9c00000002, prev = 0xfffffffffffffffc } }, cur_page = 0x0, cur_page_offset = 0x0, cur_page_len = 0xc00, cur_page_block = 0xb4c7c8f2, curr_page = 0x40, total_pages = 0x101, curr_user_address = 0x2a97d23e00, pages = {0x101fb401cb8, 0x101fe55fcd8, 0x101fe55fd10, 0x101fe55fd48, 0x101fe55 fd80, 0x101fe55fdb8, 0x101fe55fdf0, 0x101fe55fe28, 0x101fe5513e0, 0x101fe551418, 0x101fe551450, 0x101fe551488, 0x101fe5514c0, 0x101fe5514f8, 0x101fe551530, 0x10 1fe551568, 0x101fe5515a0, 0x101fe5515d8, 0x101fe551610, 0x101fe551648, 0x101fe55 1680, 0x101fe5516b8, 0x101fe5516f0, 0x101fe551728, 0x101fe551060, 0x101fe551098, 0x101fe5510d0, 0x101fe551108, 0x101fe551140, 0x101fe551178, 0x101fe5511b0, 0x10 1fe5511e8, 0x101fe551220, 0x101fe551258, 0x101fe551290, 0x101fe5512c8, 0x101fe55 1a00, 0x101fe551a38, 0x101fe551a70, 0x101fe551aa8, 0x101fe53f4e0, 0x101fe53f518, 0x101fe53f550, 0x101fe53f588, 0x101fe53f5c0, 0x101fe53f5f8, 0x101fe53f630, 0x10 1fe53f668, 0x101fe53f6a0, 0x101fe53f6d8, 0x101fe53f710, 0x101fe53f748, 0x101fe53 f780, 0x101fe53f7b8, 0x101fe53f7f0, 0x101fe53f828, 0x101fe53f860, 0x101fe53f898, 0x101fe53f8d0, 0x101fe53f908, 0x101fe53f940, 0x101fe53f978, 0x101fe53f9b0, 0x10 1fe53f9e8}, head = 0x40, tail = 0x40, page_errors = 0x0, refcount = { counter = 0x1 <----------+ reference count is 1 (no ongoing I/O) }, | bio_lock = { | lock = 0x1, | magic = 0xdead4ead | }, | bio_list = 0x0, | waiter = 0x103f8f97030, <--+ but waiter is not NULL iocb = 0x10198315480, | is_async = 0x1, | io_error = 0x0, | result = 0xe00 | } | | crash> ps | grep 103f8f97030 v 21356 1 10 103f8f97030 UN 0.2 10634640 30764 oracle Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1 @@ -There is a race between dio_bio_end_aio() and dio_await_one() which can lead to a situation where a process performing asynchronous, direct I/O is left waiting indefinitely on an I/O that has already completed. The solution is to lock the reference counting operations so that the submission and completion paths see a unified state.+Previously, in the kernel, there was a race condition may have been encountered between dio_bio_end_aio() and dio_await_one(). This may have lead to a situation where direct I/O is left waiting indefinitely on an I/O process that has already completed. With this update, the the reference counting operations are locked so that the submission and completion paths see a unified state, which resolves this issue. An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html |