Description of problem:
This issue is forked from bz 253990, where GFS2 glock locking is reverted from page-level locking back to the system call layer because of a 20%-70% stream IO performance degradation compared with GFS1. The work goes well for the writeback and ordered-write journaling modes, but data journaling mode keeps failing the tests. Most of the problems come from conflicting assumptions between gfs2_jdata_writepages(), introduced via bz 303351, and the existing GFS2 code structure inherited from the GFS1 days, which disallows mixing writepage with journal operations. One particularly troublesome area is mmap on top of data journaling. This bugzilla is opened to track these issues. Some of the problems have patches ready. Some of them are not yet well understood.

Version-Release number of selected component (if applicable):
RHEL 5.1 53.el5 gfs-kmod code base + bz303351 patches + bz253990 patches

How reproducible:
Running QA's dd_io test in jdata mode will most likely hit them.

Additional info:
Three sampled issues:

Case 1: recursive deadlock by journal semaphore (sd_log_flush_lock)

 [<c0610fdc>] rwsem_down_write_failed+0x128/0x143
 [<c0439619>] .text.lock.rwsem+0x42/0x89
 [<e0737e52>] gfs2_log_flush+0x18/0x420 [gfs2]
 [<e0737e52>] gfs2_log_flush+0x18/0x420 [gfs2]
 [<e073b645>] gfs2_jdata_writepages+0x34/0x46 [gfs2]
 [<c045e191>] do_writepages+0x20/0x32
 [<c0459c51>] __filemap_fdatawrite_range+0x65/0x70
 [<c0459cd5>] filemap_fdatawrite_range+0xf/0x13
 [<c0459d2a>] sync_page_range_nolock+0x51/0x93
 [<c045b0b3>] generic_file_aio_write_nolock+0x71/0x83
 [<e073d490>] gfs2_write+0x0/0xe [gfs2]
 [<c045b41a>] generic_file_write_nolock+0x86/0x9a
 [<c0436abf>] autoremove_wake_function+0x0/0x2d
 [<e074808b>] gfs2_do_trans_begin+0xc9/0xff [gfs2]
 [<e073cef8>] gfs2_write_i+0x2f9/0x41c [gfs2]
 [<e073d490>] gfs2_write+0x0/0xe [gfs2]
 [<e073d49b>] gfs2_write+0xb/0xe [gfs2]
 [<c047651d>] do_readv_writev+0x185/0x277
 [<c044f287>] audit_syscall_entry+0x11c/0x14e
 [<c0476646>] vfs_writev+0x37/0x43

Case 2: recursive transaction glock

Kernel BUG at fs/gfs2/glock.c:1122
Pid: 5894, comm: d_doio Not tainted 2.6.18-53.el5debug #1
RIP: [<ffffffff8844336e>] :gfs2:gfs2_glock_nq+0x11a/0x1e7
Call Trace:
 [<ffffffff884586ca>] :gfs2:gfs2_do_trans_begin+0xb8/0x129
 [<ffffffff8844a6ba>] :gfs2:gfs2_write_cache_jdata+0x141/0x359
 [<ffffffff8844b021>] :gfs2:gfs2_jdata_writepages+0x27/0x5c
 [<ffffffff8005ba1a>] do_writepages+0x23/0x32
 [<ffffffff8004fd0d>] __filemap_fdatawrite_range+0x57/0x61
 [<ffffffff800c71fb>] sync_page_range_nolock+0x3c/0x7a
 [<ffffffff800c75cd>] generic_file_aio_write_nolock+0x57/0x6c
 [<ffffffff800c79cc>] generic_file_write_nolock+0x8f/0xa8
 [<ffffffff80066b82>] _spin_unlock_irq+0x24/0x27
 [<ffffffff88447c07>] :gfs2:gfs2_log_reserve+0x133/0x189
 [<ffffffff800a016b>] autoremove_wake_function+0x0/0x2e
 [<ffffffff884586f9>] :gfs2:gfs2_do_trans_begin+0xe7/0x129
 [<ffffffff8844ca71>] :gfs2:gfs2_write_i+0x325/0x44b
 [<ffffffff80016ead>] vfs_write+0xce/0x174
 [<ffffffff800177a5>] sys_write+0x45/0x6e
 [<ffffffff8005e2a6>] tracesys+0xd5/0xdf

Case 3: journaled buffer is dirty but unmapped (cause unknown)

Kernel BUG at fs/buffer.c:2813
Pid: 5270, comm: glock_workqueue Not tainted 2.6.18-53.el5debug #1
RIP [<ffffffff8001ae2f>] submit_bh+0x1f/0x111
 [<ffffffff88446b75>] :gfs2:gfs2_ail1_start_one+0x12f/0x17f
 [<ffffffff88446c66>] :gfs2:gfs2_ail1_start+0xa1/0xc3
 [<ffffffff88447ca2>] :gfs2:gfs2_log_reserve+0xec/0x189
 [<ffffffff8844347e>] :gfs2:gfs2_glock_nq+0x1c1/0x1e7
 [<ffffffff88458b13>] :gfs2:gfs2_do_trans_begin+0x105/0x147
 [<ffffffff8844a874>] :gfs2:gfs2_write_cache_jdata+0x141/0x359
 [<ffffffff884439b0>] :gfs2:glock_work_func+0x0/0x43
 [<ffffffff8844b147>] :gfs2:gfs2_jdata_writepages+0x27/0x5c
 [<ffffffff8005ba1a>] do_writepages+0x23/0x32
 [<ffffffff8004fd0d>] __filemap_fdatawrite_range+0x57/0x61
 [<ffffffff88443fea>] :gfs2:inode_go_sync+0x78/0xdf
 [<ffffffff884427cf>] :gfs2:gfs2_glock_drop_th+0x20/0x138
 [<ffffffff88442e68>] :gfs2:run_queue+0xef/0x2e2
 [<ffffffff884439b0>] :gfs2:glock_work_func+0x0/0x43
 [<ffffffff884439b0>] :gfs2:glock_work_func+0x0/0x43
 [<ffffffff884439df>] :gfs2:glock_work_func+0x2f/0x43
 [<ffffffff8003d566>] remove_wait_queue+0x10/0x2c
 [<ffffffff8004d9c1>] run_workqueue+0x9a/0xf4
 [<ffffffff8004a288>] worker_thread+0x0/0x122
 [<ffffffff8009ffb7>] keventd_create_kthread+0x0/0x66
 [<ffffffff8004a378>] worker_thread+0xf0/0x122
 [<ffffffff8008cc76>] default_wake_function+0x0/0xe
 [<ffffffff8009ffb7>] keventd_create_kthread+0x0/0x66
 [<ffffffff800341ae>] kthread+0xfe/0x132
 [<ffffffff80066451>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff8005f079>] child_rip+0xa/0x11
 [<ffffffff8005e6a8>] restore_args+0x0/0x30
"Case 3: journaled buffer is dirty but unmapped" does not look like a result of the performance fix (system-call-level vs. page-level locking). It appears to be a long-standing latent GFS2 problem that was simply waiting for the right timing to surface.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
As discussed in the GFS2 team meeting, this bugzilla is only valid if we move glock locking back to the system call layer. That plan (syscall-level locking, bugzilla 253990) has been put off for now, so I am tentatively closing this bugzilla. It will be re-opened if needed.