| Summary: | GFS2: iozone over nfs causes AIL error | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Abhijith Das <adas> | ||||
| Component: | kernel | Assignee: | Steve Whitehouse <swhiteho> | ||||
| Status: | CLOSED UPSTREAM | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | rawhide | CC: | adas, anprice, bmarzins, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, nstraz, rpeterso | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 740066 (view as bug list) | Environment: | |||||
| Last Closed: | 2011-10-31 08:37:23 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Bug Depends On: | |||||||
| Bug Blocks: | 740066 | ||||||
| Attachments: |
|
||||||
|
Description
Abhijith Das
2011-08-15 14:53:41 UTC
I think the way to approach this is to look to see which nfs operation is dirtying blocks without holding the i_mutex. We could probably check for having the i_mutex locked at each call to gfs2_trans_begin() and then see if the code path from nfs shows up. My thought is that we should not be seeing blocks which are dirty or pinned unless that is happening without the i_mutex being held. Steve,
I added code like this in front of *all* occurrences of gfs2_trans_begin. I guess I shouldn't have done it for all because I see it triggered a lot of times from different parts of the code - some of them pertaining to gfs2's internal files (quota, statfs etc).
if (!mutex_is_locked(&(dip->i_inode.i_mutex))) {
printk(KERN_WARNING "i_mutex not held in %s:%s:%d\n", __FILE__, __FUNCTION__, __LINE__);
dump_stack();
}
This is just before the withdraw:
[49619.639483] i_mutex not held in fs/gfs2/glops.c:gfs2_ail_flush:122
[49619.646415] Pid: 2341, comm: nfsd Tainted: G W 3.0.0 #2
[49619.653129] Call Trace:
[49619.655912] [<ffffffffa01e0bd5>] gfs2_ail_flush+0x5d/0x95 [gfs2]
[49619.662754] [<ffffffffa01e711a>] gfs2_fsync+0xb2/0xe8 [gfs2]
[49619.669193] [<ffffffff81143993>] vfs_fsync_range+0x23/0x25
[49619.675456] [<ffffffffa02527f6>] nfsd_commit+0x76/0xa1 [nfsd]
[49619.682007] [<ffffffffa025c312>] nfsd4_commit+0x28/0x37 [nfsd]
[49619.688648] [<ffffffffa025bc62>] nfsd4_proc_compound+0x244/0x42c [nfsd]
[49619.696162] [<ffffffffa024e82b>] nfsd_dispatch+0xeb/0x1ce [nfsd]
[49619.703024] [<ffffffffa018c695>] svc_process_common+0x2d7/0x4d8 [sunrpc]
[49619.710631] [<ffffffffa024e02a>] ? nfsd_shutdown+0x2a/0x2a [nfsd]
[49619.717566] [<ffffffffa018cab7>] svc_process+0x11c/0x13c [sunrpc]
[49619.724489] [<ffffffffa024e11b>] nfsd+0xf1/0x135 [nfsd]
[49619.730438] [<ffffffff8106b3d6>] kthread+0x82/0x8a
[49619.735898] [<ffffffff81493754>] kernel_thread_helper+0x4/0x10
[49619.742521] [<ffffffff8106b354>] ? kthread_worker_fn+0x14b/0x14b
[49619.749341] [<ffffffff81493750>] ? gs_change+0x13/0x13
[49621.504558] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL buffer ffff88006d4d2af8: blo cknr 142063 state 0x00000023 mapping ffff8800629cd620 page state 0x2000000000087 4
[49621.520403] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL glock 2:141579 mapping ffff8 800629cd620
[49621.529450] AIL error
[49621.531993] GFS2: fsid=Smoke_Cluster:gfs2.3: about to withdraw this file syst em
[49621.540220] GFS2: fsid=Smoke_Cluster:gfs2.3: telling LM to unmount
[49621.558625] GFS2: fsid=Smoke_Cluster:gfs2.3: withdrawn
[49621.564433] Pid: 2341, comm: nfsd Tainted: G W 3.0.0 #2
[49621.571177] Call Trace:
[49621.573990] [<ffffffffa01f6a73>] gfs2_lm_withdraw+0xd7/0x10a [gfs2]
[49621.581134] [<ffffffff81488f9f>] ? printk+0x68/0x71
[49621.586730] [<ffffffffa01e06c4>] __gfs2_ail_flush+0xe8/0x18c [gfs2]
[49621.593877] [<ffffffffa01e0bf3>] gfs2_ail_flush+0x7b/0x95 [gfs2]
[49621.600737] [<ffffffffa01e711a>] gfs2_fsync+0xb2/0xe8 [gfs2]
[49621.607195] [<ffffffff81143993>] vfs_fsync_range+0x23/0x25
[49621.613475] [<ffffffffa02527f6>] nfsd_commit+0x76/0xa1 [nfsd]
[49621.620059] [<ffffffffa025c312>] nfsd4_commit+0x28/0x37 [nfsd]
[49621.626720] [<ffffffffa025bc62>] nfsd4_proc_compound+0x244/0x42c [nfsd]
[49621.634262] [<ffffffffa024e82b>] nfsd_dispatch+0xeb/0x1ce [nfsd]
[49621.641131] [<ffffffffa018c695>] svc_process_common+0x2d7/0x4d8 [sunrpc]
[49621.648769] [<ffffffffa024e02a>] ? nfsd_shutdown+0x2a/0x2a [nfsd]
[49621.655730] [<ffffffffa018cab7>] svc_process+0x11c/0x13c [sunrpc]
[49621.662680] [<ffffffffa024e11b>] nfsd+0xf1/0x135 [nfsd]
[49621.668649] [<ffffffff8106b3d6>] kthread+0x82/0x8a
[49621.674131] [<ffffffff81493754>] kernel_thread_helper+0x4/0x10
[49621.680780] [<ffffffff8106b354>] ? kthread_worker_fn+0x14b/0x14b
[49621.687620] [<ffffffff81493750>] ? gs_change+0x13/0x13
[49621.693504] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL buffer ffff880064104478: blo cknr 142090 state 0x00000023 mapping ffff8800629cd620 page state 0x2000000000087 4
[49621.709370] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL glock 2:141579 mapping ffff8 800629cd620
[49621.718438] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL buffer ffff8800642fb680: blo cknr 142600 state 0x00000023 mapping ffff8800629cd620 page state 0x2000000000087 4
[49621.734296] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL glock 2:141579 mapping ffff8 800629cd620
[49621.743363] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL buffer ffff8800640cee38: blo cknr 143110 state 0x00000023 mapping ffff8800629cd620 page state 0x2000000000087 4
[49621.759221] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL glock 2:141579 mapping ffff8
.... there's a lot of above lines, followed by ....
[49628.115224] Pid: 2341, comm: nfsd Tainted: G W 3.0.0 #2
[49628.121944] Call Trace:
[49628.124717] [<ffffffffa01f6ae2>] gfs2_assert_withdraw_i+0x3c/0x4a [gfs2]
[49628.132335] [<ffffffffa01f673c>] gfs2_trans_end+0x170/0x1ed [gfs2]
[49628.139363] [<ffffffffa01e0bfb>] gfs2_ail_flush+0x83/0x95 [gfs2]
[49628.146557] [<ffffffffa01e711a>] gfs2_fsync+0xb2/0xe8 [gfs2]
[49628.152989] [<ffffffff81143993>] vfs_fsync_range+0x23/0x25
[49628.159235] [<ffffffffa02527f6>] nfsd_commit+0x76/0xa1 [nfsd]
[49628.165774] [<ffffffffa025c312>] nfsd4_commit+0x28/0x37 [nfsd]
[49628.172410] [<ffffffffa025bc62>] nfsd4_proc_compound+0x244/0x42c [nfsd]
[49628.179913] [<ffffffffa024e82b>] nfsd_dispatch+0xeb/0x1ce [nfsd]
[49628.186752] [<ffffffffa018c695>] svc_process_common+0x2d7/0x4d8 [sunrpc]
[49628.194354] [<ffffffffa024e02a>] ? nfsd_shutdown+0x2a/0x2a [nfsd]
[49628.201287] [<ffffffffa018cab7>] svc_process+0x11c/0x13c [sunrpc]
[49628.208211] [<ffffffffa024e11b>] nfsd+0xf1/0x135 [nfsd]
[49628.214155] [<ffffffff8106b3d6>] kthread+0x82/0x8a
[49628.219616] [<ffffffff81493754>] kernel_thread_helper+0x4/0x10
[49628.226237] [<ffffffff8106b354>] ? kthread_worker_fn+0x14b/0x14b
[49628.233050] [<ffffffff81493750>] ? gs_change+0x13/0x13
[49628.238893] GFS2: fsid=Smoke_Cluster:gfs2.3: tr_num_revoke = 259, tr_revokes = 1
[49628.247087] GFS2: Transaction created at: gfs2_ail_flush+0x6a/0x95 [gfs2]
I hope I understood you right and this info is helpful. I do have the whole /var/log/messages file saved off too in case you want to look at the other bits.
The code in gfs2_fsync looks like this in -nmw:
if (sync_state) {
587 mutex_lock(&inode->i_mutex);
588 ret = sync_inode_metadata(inode, 1);
589 if (ret) {
590 mutex_unlock(&inode->i_mutex);
591 return ret;
592 }
593 if (gfs2_is_jdata(ip))
594 filemap_write_and_wait(mapping);
595 gfs2_ail_flush(ip->i_gl);
596 mutex_unlock(&inode->i_mutex);
597 }
So I'm wondering how we managed to call gfs2_ail_flush() without holding the i_mutex? Maybe you are working off a slightly different version of the code or maybe I've missed another path to gfs2_ail_flush() from gfs2_fsync()
At least the bug trap has done its job and pointed out where the issue lies.
There is a good argument for using the i_mutex on the internal files. We have to bear in mind that we may need to annotate those locks wrt to i_mutex of the inode that is being operated on though. That is something that is still missing from quota, see bug #589070 for example. On the other hand, I doubt that these are likely to be a problem wrt to NFS since it can only fsync files which it is directly referencing, and the metadata fs is not exported. So we can probably treat any issues flagged up for internal files as a false positive for now.
I backed out my patch and running iozone on the latest nmw code directly on the gfs2 fs also crashes with this: [ 3291.997635] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL buffer ffff88006da6ba90: blocknr 141 687 state 0x00004029 mapping ffff8800748b3aa0 page state 0x20000000000864 [ 3292.013583] GFS2: fsid=Smoke_Cluster:gfs2.3: AIL glock 2:141687 mapping ffff8800748b3aa0 [ 3292.022672] AIL error [ 3292.025245] GFS2: fsid=Smoke_Cluster:gfs2.3: about to withdraw this file system [ 3292.034819] GFS2: fsid=Smoke_Cluster:gfs2.3: telling LM to unmount [ 3292.044028] GFS2: fsid=Smoke_Cluster:gfs2.3: withdrawn [ 3292.049796] Pid: 1931, comm: iozone Tainted: G W 3.0.0 #2 [ 3292.056718] Call Trace: [ 3292.059588] [<ffffffffa01d50b3>] gfs2_lm_withdraw+0xd7/0x10a [gfs2] [ 3292.066727] [<ffffffff81488f9f>] ? printk+0x68/0x71 [ 3292.072325] [<ffffffffa01bf4a4>] __gfs2_ail_flush+0xe8/0x18c [gfs2] [ 3292.079464] [<ffffffffa01bf94a>] gfs2_ail_flush+0x3b/0x52 [gfs2] [ 3292.086322] [<ffffffffa01c5dd6>] gfs2_fsync+0xb2/0xe8 [gfs2] [ 3292.092793] [<ffffffff81143993>] vfs_fsync_range+0x23/0x25 [ 3292.099051] [<ffffffff811439b1>] vfs_fsync+0x1c/0x1e [ 3292.104715] [<ffffffff81143a27>] do_fsync+0x2e/0x43 [ 3292.110284] [<ffffffff81143c7e>] sys_fsync+0x10/0x14 [ 3292.115951] [<ffffffff814915c2>] system_call_fastpath+0x16/0x1b [ 3292.124421] ------------[ cut here ]------------ [ 3292.129594] kernel BUG at fs/gfs2/trans.c:160! [ 3292.134572] invalid opcode: 0000 [#1] SMP [ 3292.139191] CPU 2 [ 3292.141253] Modules linked in: gfs2 dlm configfs sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 amd64_edac_mod edac_core edac_mce_amd k8temp forcedeth i2c_nforce2 i2c_core shpchp pcspkr serio_ra w ata_generic qla2xxx scsi_transport_fc scsi_tgt pata_acpi sata_nv pata_amd [last unloaded: scsi_wait_sc an] [ 3292.172832] [ 3292.174507] Pid: 1931, comm: iozone Tainted: G W 3.0.0 #2 InventecESC IR2400/MLB2203 [ 3292.184098] RIP: 0010:[<ffffffffa01d4e5b>] [<ffffffffa01d4e5b>] gfs2_trans_add_revoke+0x16/0x46 [gfs 2] [ 3292.194647] RSP: 0018:ffff88007b861e28 EFLAGS: 00010206 [ 3292.200592] RAX: ffff88013ba4b698 RBX: ffff8800748b3908 RCX: ffff88006c28e6b0 [ 3292.208573] RDX: ffff88006c28e680 RSI: ffff88006c28e660 RDI: ffff88013ba4b000 [ 3292.216557] RBP: ffff88007b861e28 R08: ffff88006c28e6b0 R09: ffffffff814915c2 [ 3292.224538] R10: ffff8800fb861b27 R11: 0000000000000000 R12: ffff88006da6ba90 [ 3292.232519] R13: ffff88013ba4b000 R14: ffff88006c28e6b0 R15: ffff88013ba4b670 [ 3292.240500] FS: 00007f43629b4720(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000 [ 3292.249540] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 3292.255969] CR2: 00007fff243c26b8 CR3: 000000013b81c000 CR4: 00000000000006e0 [ 3292.263953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3292.271934] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 3292.279915] Process iozone (pid: 1931, threadinfo ffff88007b860000, task ffff88007c7a5c80) [ 3292.289156] Stack: [ 3292.291411] ffff88007b861ea8 ffffffffa01bf4e2 0020000000000864 ffff88013b7949c0 [ 3292.299742] ffff88013b7949d8 0000000000004002 ffff8800748b39e8 ffff88013ba4b790 [ 3292.308099] 0000000000022977 ffff88006c28e660 0000000000000000 ffff88013ba4b000 [ 3292.316444] Call Trace: [ 3292.319197] [<ffffffffa01bf4e2>] __gfs2_ail_flush+0x126/0x18c [gfs2] [ 3292.326406] [<ffffffffa01bf94a>] gfs2_ail_flush+0x3b/0x52 [gfs2] [ 3292.333233] [<ffffffffa01c5dd6>] gfs2_fsync+0xb2/0xe8 [gfs2] [ 3292.339657] [<ffffffff81143993>] vfs_fsync_range+0x23/0x25 [ 3292.345893] [<ffffffff811439b1>] vfs_fsync+0x1c/0x1e [ 3292.351548] [<ffffffff81143a27>] do_fsync+0x2e/0x43 [ 3292.357104] [<ffffffff81143c7e>] sys_fsync+0x10/0x14 [ 3292.362762] [<ffffffff814915c2>] system_call_fastpath+0x16/0x1b [ 3292.369471] Code: 48 8b 73 40 4c 89 e7 48 83 c6 20 e8 70 fc ff ff 5b 41 5c c9 c3 55 48 89 e5 66 66 66 66 90 48 8b 46 20 48 8d 56 20 48 39 d0 74 02 <0f> 0b 48 8d 56 40 48 39 56 40 74 02 0f 0b 48 8d 56 50 48 39 56 [ 3292.391471] RIP [<ffffffffa01d4e5b>] gfs2_trans_add_revoke+0x16/0x46 [gfs2] [ 3292.399395] RSP <ffff88007b861e28> [ 3292.403568] ---[ end trace 69d50c31f4ab9b1d ]--- Trying to reproduce the issue here... iozone -a on its own isn't enough, now trying the command line listed above to see if that will trigger it. I've now run two tests with the above command line, and still no problem. Is there anything special I need to reproduce this? Created attachment 521511 [details]
Proposed fix for this issue
The plan is to ignore buffers on the ail list in the wrong state during fsync. There should in general be very few.
Due to lock ordering we have to hold both ail lock and log lock over the whole loop. I've run a few very basic tests and it seems to be holding up so far.
This patch works for me. I was able to run multiple iterations of iozone over nfs on gfs2 with this patch, whereas, without the patch I can hit the AIL error mentioned earlier. Patch has now been posted to cluster-devel |