Bug 236087
Summary: | GFS2: mmap problems with distributed test cases
---|---
Product: | Red Hat Enterprise Linux 5
Component: | kernel
Version: | 5.0
Status: | CLOSED ERRATA
Severity: | medium
Priority: | high
Hardware: | All
OS: | Linux
Reporter: | Nate Straz <nstraz>
Assignee: | Don Zickus <dzickus>
QA Contact: | Dean Jansa <djansa>
CC: | jbacik, kanderso, lwang, swhiteho
Fixed In Version: | RHBA-2007-0959
Doc Type: | Bug Fix
Last Closed: | 2007-11-07 19:46:35 UTC
Description
Nate Straz
2007-04-11 21:30:35 UTC
A secondary oops on a different node:

BUG: unable to handle kernel paging request at virtual address 40000001
 printing eip:
e05fabee
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: lock_dlm gfs2 dlm configfs qla2xxx
CPU:    0
EIP:    0060:[<e05fabee>]    Not tainted VLI
EFLAGS: 00010297   (2.6.21-rc6 #2)
EIP is at gfs2_readpage+0x65/0x17a [gfs2]
eax: 000015d8   ebx: ca8ee07c   ecx: 40000001   edx: 40000001
esi: c115e7a0   edi: 0001e9ec   ebp: cafa2030   esp: cb045dcc
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process d_doio (pid: 5592, ti=cb044000 task=cb297a30 task.ti=cb044000)
Stack: 000004de c115e7a0 d18c2000 c115e7a0 c115e7a0 cafa20d4 00000000 c0137335
       000201d2 c115e7a0 0001e9ec 00000000 000201d2 c115e7a0 0001e9ec 000004de
       c0137866 000000d0 00000020 df70dec0 df70df08 cafa20d4 cafa2030 00001387
Call Trace:
 [<c0137335>] add_to_page_cache+0x60/0x70
 [<c0137866>] do_generic_mapping_read+0x209/0x43b
 [<c013967a>] generic_file_aio_read+0x173/0x1a5
 [<c0136f78>] file_read_actor+0x0/0xd1
 [<c015165b>] do_sync_read+0xc7/0x10a
 [<c01297a5>] autoremove_wake_function+0x0/0x35
 [<c0151594>] do_sync_read+0x0/0x10a
 [<c0151dbe>] vfs_read+0x88/0x10a
 [<c01521bc>] sys_read+0x41/0x67
 [<c01030d8>] sysenter_past_esp+0x5d/0x81
 =======================
Code: 85 d1 00 00 00 8b 9d 90 01 00 00 8d 43 1c e8 38 10 e1 df 8b 53 38 eb 13 64 a1 08 00 00 00 8b 80 a4 00 00 00 39 42 0c 74 4f 89 ca <8b> 0a 0f 18 01 90 8d 43 38 39 c2 75 e0 31 c0 c6 43 1c 01 85 c0
EIP: [<e05fabee>] gfs2_readpage+0x65/0x17a [gfs2] SS:ESP 0068:cb045dcc
dlm: connecting to 1

Created attachment 152323 [details]
Patch that was applied to gfs2 when we ran this test.

We reverted the portion of the patch from gfs2_readpage, but were still able to hit panics with d_mmap1.

kernel BUG at fs/gfs2/ops_address.c:200!
invalid opcode: 0000 [#1]
SMP
Modules linked in: qla2xxx lock_dlm gfs2 dlm configfs
CPU:    0
EIP:    0060:[<e0533336>]    Not tainted VLI
EFLAGS: 00010202   (2.6.21-rc6 #2)
EIP is at stuffed_readpage+0x15/0xe4 [gfs2]
eax: c7a91678   ebx: c7a91678   ecx: 0000018a   edx: c134f0e0
esi: 00000000   edi: c134f0e0   ebp: c66b1dd8   esp: c66b1d9c
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process d_doio (pid: 4167, ti=c66b0000 task=c15ebab0 task.ti=c66b0000)
Stack: 00000000 00000003 00000000 0000019d c134f0e0 c7a91678 dcac7a30 c7a91678
       00000000 c134f0e0 c66b1dd8 e0533992 c66b1dd8 00000001 dc8ed000 c7331dd4
       c7331dd4 c7331d9c 00001047 00000003 00000202 00000000 000000c2 e0533960
Call Trace:
 [<e0533992>] gfs2_readpage+0x83/0xef [gfs2]
 [<e0533960>] gfs2_readpage+0x51/0xef [gfs2]
 [<c0137866>] do_generic_mapping_read+0x209/0x43b
 [<c013967a>] generic_file_aio_read+0x173/0x1a5
 [<c0136f78>] file_read_actor+0x0/0xd1
 [<c015165b>] do_sync_read+0xc7/0x10a
 [<c01297a5>] autoremove_wake_function+0x0/0x35
 [<c0151594>] do_sync_read+0x0/0x10a
 [<c0151dbe>] vfs_read+0x88/0x10a
 [<c01521bc>] sys_read+0x41/0x67
 [<c01030d8>] sysenter_past_esp+0x5d/0x81
 [<c0400000>] rpcauth_marshcred+0x39/0x52
 =======================
Code: ee 0f 95 c2 85 db 0f 94 c0 01 fb 08 c2 75 cb 5b 5e 5b 5e 5f 5d c3 55 57 56 53 83 ec 1c 89 44 24 14 89 54 24 10 83 7a 14 00 74 04 <0f> 0b eb fe 8b 44 24 14 8b 90 48 01 00 00 8b 88 4c 01 00 00 8d
EIP: [<e0533336>] stuffed_readpage+0x15/0xe4 [gfs2] SS:ESP 0068:c66b1d9c

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

It's interesting to note that Ken had left a case to cover this particular occurrence in the code, which was then removed:
http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-2.6-nmw.git;a=commitdiff;h=61057c6bb3a3d14cf2bea6ca20dc6d367e1d852e
The remaining question, then, is why this is an apparently valid code path, since we'd expect all inodes to be converted from being stuffed upon mmap().

Now I see what's happening. It's related to the ordering in the page fault path. We call the VFS's nopage (which calls gfs2_readpage()) before we add the buffers to the page (since before readpage, there is no page). As a result, if the first page fault on a stuffed file which has been extended is above the initial page mark, then we can come down this code path. So we need to add back Ken's fix for this, but with an appropriate flush_dcache_page() for the less coherent architectures.

Created attachment 153023 [details]
Patch to attempt to fix the problem encountered

When testing this patch, the one from bz #236039 should also be applied (i.e. the cleaned up version of the other patch attached to this bug). I'll push this patch upstream shortly.

Josef, Nate, is one of you in a position to give this a test run with the QE test suite? I'm pretty sure this is the right fix.

Patch has been pushed upstream now into the -nmw git tree.

Created attachment 154462 [details]
Patch for RHEL 5.1
The attached patch takes RHEL 5.1 up to the same level as upstream.
Again, this is one I'd like to get in, even if we need further changes later on
since Don is off on holiday shortly, so please open another bug if we need some
more changes in this area rather than getting this one back from POST.
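
For readers following the analysis earlier in this thread (a readpage call arriving for a stuffed file at a page beyond the first one), the intended behaviour can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the attached patch: `sketch_page`, `sketch_inode`, `sketch_stuffed_readpage` and `SKETCH_PAGE_SIZE` are hypothetical names invented for illustration, and the real kernel code works on `struct page` and would also need the flush_dcache_page() call mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this model */

/* Hypothetical stand-in for a page-cache page. */
struct sketch_page {
    unsigned long index;                    /* page offset within the file */
    unsigned char data[SKETCH_PAGE_SIZE];
    int uptodate;
};

/* Hypothetical stand-in for a stuffed inode: file data is stored inline. */
struct sketch_inode {
    const unsigned char *inline_data;
    size_t inline_len;                      /* must fit within one page */
};

/*
 * Model of reading one page of a stuffed file. Returns 0 on success.
 * Instead of treating a non-zero page index as a fatal error, it hands
 * back a zero-filled page, because a stuffed file that has been extended
 * holds no data beyond its first page.
 */
int sketch_stuffed_readpage(const struct sketch_inode *ip,
                            struct sketch_page *page)
{
    assert(ip->inline_len <= SKETCH_PAGE_SIZE);

    if (page->index != 0) {
        /* Beyond the first page: nothing stored, supply a zero page. */
        memset(page->data, 0, SKETCH_PAGE_SIZE);
        page->uptodate = 1;
        return 0;
    }

    /* First page: copy the inline data and zero the remainder. */
    memcpy(page->data, ip->inline_data, ip->inline_len);
    memset(page->data + ip->inline_len, 0,
           SKETCH_PAGE_SIZE - ip->inline_len);
    /* The kernel version would call flush_dcache_page() at this point. */
    page->uptodate = 1;
    return 0;
}
```

Calling `sketch_stuffed_readpage()` with `page->index` set to 1 yields an all-zero page marked up to date, whereas index 0 yields the inline bytes followed by zero padding.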

in 2.6.18-19.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

I have not been able to hit this with recent builds (-32).

moving to MODIFIED for errata tool
QE note this bz has already been verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0959.html