Bug 245009
| Field | Value |
|---|---|
| Summary | GFS2 - filesystem consistency error |
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.0 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED DUPLICATE |
| Severity | low |
| Priority | low |
| Reporter | Abhijith Das <adas> |
| Assignee | Steve Whitehouse <swhiteho> |
| QA Contact | Dean Jansa <djansa> |
| CC | cluster-maint, nstraz |
| Doc Type | Bug Fix |
| Last Closed | 2007-08-14 13:48:36 UTC |
Description — Abhijith Das — 2007-06-20 15:01:31 UTC

Created attachment 157475 [details]
Debug patch

This should gather the required information. I'm not so sure about pushing this upstream yet, as I think it might be wise to merge some of it into the gfs2_consist_rgrp() call; I need to check the other callers of that first.
Problem occurred while running revolver recovery testing. Needs to be addressed prior to release.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

*** Bug 245647 has been marked as a duplicate of this bug. ***

I hit this bug again and here's the debugging info:

GFS2: current state 0 new state 0 block offset 5690
 [<e0489fb3>] gfs2_setbit+0x8a/0xb1 [gfs2]
 [<e048a961>] rgblk_free+0x13d/0x161 [gfs2]
 [<e048b789>] gfs2_free_data+0x20/0x89 [gfs2]
 [<e0472bec>] do_strip+0x381/0x3f4 [gfs2]
 [<e048115c>] gfs2_meta_indirect_buffer+0x214/0x268 [gfs2]
 [<e04717cd>] recursive_scan+0xea/0x16b [gfs2]
 [<e0471908>] trunc_dealloc+0xba/0xdb [gfs2]
 [<e047286b>] do_strip+0x0/0x3f4 [gfs2]
 [<e047acf0>] gfs2_glock_nq+0x164/0x184 [gfs2]
 [<e0486bd0>] gfs2_delete_inode+0xd3/0x142 [gfs2]
 [<e0486b3a>] gfs2_delete_inode+0x3d/0x142 [gfs2]
 [<e0486afd>] gfs2_delete_inode+0x0/0x142 [gfs2]
 [<c0183859>] generic_delete_inode+0xa5/0x10f
 [<c01832fd>] iput+0x64/0x66
 [<e048b408>] gfs2_inplace_reserve_i+0x525/0x57d [gfs2]
 [<e047243a>] gfs2_write_alloc_required+0x185/0x1c4 [gfs2]
 [<e0482396>] gfs2_prepare_write+0x16b/0x274 [gfs2]
 [<c0154027>] generic_file_buffered_write+0x230/0x607
 [<e047414b>] gfs2_dirent_find+0x3a/0x56 [gfs2]
 [<e04738bd>] gfs2_dirent_scan+0x9b/0x153 [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<c0128245>] current_fs_time+0x4a/0x55
 [<c01548a4>] __generic_file_aio_write_nolock+0x4a6/0x52a
 [<e047af20>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c0154b02>] generic_file_write+0x0/0x94
 [<c0154a58>] __generic_file_write_nolock+0x86/0x9a
 [<c01341a5>] autoremove_wake_function+0x0/0x2d
 [<e047b044>] gfs2_glock_dq+0x71/0x7b [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<e047af20>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<e04838e7>] gfs2_llseek+0x6f/0x94 [gfs2]
 [<c02fe3b0>] mutex_lock+0xb/0x19
 [<c0154b3c>] generic_file_write+0x3a/0x94
 [<c0154b02>] generic_file_write+0x0/0x94
 [<c016e643>] vfs_write+0xa1/0x143
 [<c016ec35>] sys_write+0x3c/0x63
 [<c0103f7b>] syscall_call+0x7/0xb
=======================
GFS2: fsid=Smoke_Cluster:gfs2.1.2: fatal: filesystem consistency error
GFS2: fsid=Smoke_Cluster:gfs2.1.2: RG = 10813675
GFS2: fsid=Smoke_Cluster:gfs2.1.2: function = gfs2_setbit, file = fs/gfs2/rgrp.c, line = 88
GFS2: fsid=Smoke_Cluster:gfs2.1.2: about to withdraw this file system
GFS2: fsid=Smoke_Cluster:gfs2.1.2: telling LM to withdraw
dlm: closing connection to node 1
dlm: connecting to 1
dlm: gfs2.1: group leave failed -512 0
GFS2: fsid=Smoke_Cluster:gfs2.1.2: withdrawn
 [<e047db64>] gfs2_lm_withdraw+0x73/0x7f [gfs2]
 [<e048e1cb>] gfs2_consist_rgrpd_i+0x2a/0x2f [gfs2]
 [<e048a961>] rgblk_free+0x13d/0x161 [gfs2]
 [<e048b789>] gfs2_free_data+0x20/0x89 [gfs2]
 [<e0472bec>] do_strip+0x381/0x3f4 [gfs2]
 [<e048115c>] gfs2_meta_indirect_buffer+0x214/0x268 [gfs2]
 [<e04717cd>] recursive_scan+0xea/0x16b [gfs2]
 [<e0471908>] trunc_dealloc+0xba/0xdb [gfs2]
 [<e047286b>] do_strip+0x0/0x3f4 [gfs2]
 [<e047acf0>] gfs2_glock_nq+0x164/0x184 [gfs2]
 [<e0486bd0>] gfs2_delete_inode+0xd3/0x142 [gfs2]
 [<e0486b3a>] gfs2_delete_inode+0x3d/0x142 [gfs2]
 [<e0486afd>] gfs2_delete_inode+0x0/0x142 [gfs2]
 [<c0183859>] generic_delete_inode+0xa5/0x10f
 [<c01832fd>] iput+0x64/0x66
 [<e048b408>] gfs2_inplace_reserve_i+0x525/0x57d [gfs2]
 [<e047243a>] gfs2_write_alloc_required+0x185/0x1c4 [gfs2]
 [<e0482396>] gfs2_prepare_write+0x16b/0x274 [gfs2]
 [<c0154027>] generic_file_buffered_write+0x230/0x607
 [<e047414b>] gfs2_dirent_find+0x3a/0x56 [gfs2]
 [<e04738bd>] gfs2_dirent_scan+0x9b/0x153 [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<c0128245>] current_fs_time+0x4a/0x55
 [<c01548a4>] __generic_file_aio_write_nolock+0x4a6/0x52a
 [<e047af20>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c0154b02>] generic_file_write+0x0/0x94
 [<c0154a58>] __generic_file_write_nolock+0x86/0x9a
 [<c01341a5>] autoremove_wake_function+0x0/0x2d
 [<e047b044>] gfs2_glock_dq+0x71/0x7b [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<e047af20>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<e04838e7>] gfs2_llseek+0x6f/0x94 [gfs2]
 [<c02fe3b0>] mutex_lock+0xb/0x19
 [<c0154b3c>] generic_file_write+0x3a/0x94
 [<c0154b02>] generic_file_write+0x0/0x94
 [<c016e643>] vfs_write+0xa1/0x143
 [<c016ec35>] sys_write+0x3c/0x63
 [<c0103f7b>] syscall_call+0x7/0xb
=======================
GFS2: fsid=Smoke_Cluster:gfs2.1.2: gfs2_delete_inode: -5

So the two states here are both showing "free", but for whatever reason it looks like GFS2 is trying to delete an already-deleted inode. My guess is that the node in question already did the deletion some time earlier, but because the change was journalled and not yet written back, there is a window in which an inode can appear "unlinked, but still allocated" in the in-place rgrp when it has in fact been marked free. So we might potentially be scanning through the wrong copy of the buffer for some reason. I can't see any other situation under which there could be a race like this, as the locking has all the other angles covered.

Created attachment 159314 [details]
Quick fix to rgblk_search

This won't fix this bug, but it should probably be folded into the fix for this bug when the root cause is eventually discovered. Zero is a valid block number, since it represents the offset from the start of the rgrp's data blocks.
I hit this bug again with the latest patches (including the one above in comment #7) with revolver.

GFS2: fsid=Smoke_Cluster:gfs2.1.3: fatal: filesystem consistency error
GFS2: fsid=Smoke_Cluster:gfs2.1.3: RG = 15728800
GFS2: fsid=Smoke_Cluster:gfs2.1.3: function = gfs2_setbit, file = fs/gfs2/rgrp.c, line = 85
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/filemap.c:553
invalid opcode: 0000 [1] SMP
last sysfs file: /fs/gfs2/Smoke_Cluster:gfs2.1/lock_module/block
CPU 1
Modules linked in: autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) lock_dlm(U) gfs2(U) dlm(U) configfs(U) sunrpc(U) ipv6(U) cpufreq_ondemand(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sg(U) i2c_nforce2(U) tg3(U) k8_edac(U) i2c_core(U) edac_mc(U) serio_raw(U) qla2xxx(U) pcspkr(U) k8temp(U) hwmon(U) scsi_transport_fc(U) shpchp(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) sata_nv(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 5893, comm: growfiles Not tainted 2.6.18-prep_07_13_2007 #1
RIP: 0010:[<ffffffff80017922>]  [<ffffffff80017922>] unlock_page+0xf/0x2f
RSP: 0018:ffff81006c11fbd8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810001a3df40 RCX: ffff81006f4bea60
RDX: ffff810056cab550 RSI: ffff810001a3df40 RDI: ffff810001a3df40
RBP: ffff810001a3df40 R08: ffff81000000cb00 R09: 0000000000054316
R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000e000
R13: 0000000000001000 R14: 000000001c34ef50 R15: 0000000000000000
FS: 00002aaaaaab8210(0000) GS:ffff8100026ed7c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000369b0c85f0 CR3: 000000006c0cf000 CR4: 00000000000006e0
Process growfiles (pid: 5893, threadinfo ffff81006c11e000, task ffff81007fd3b820)
Stack: fffffffffffffffb ffffffff8000f99a 0000000000001000 ffff81006c11ff50
 000000000000e000 0000000000000001 ffff81006c11fdd8 0000000000000001
 0000100000000000 ffff81007f0bf8c0 ffff810056cab660 ffffffff8844f6c0
Call Trace:
 [<ffffffff8000f99a>] generic_file_buffered_write+0x27a/0x6d8
 [<ffffffff88432758>] :gfs2:gfs2_dirent_scan+0xb3/0x17e
 [<ffffffff88438e18>] :gfs2:gfs2_glock_put+0x26/0x133
 [<ffffffff88432f48>] :gfs2:gfs2_dirent_find+0x0/0x54
 [<ffffffff8000ddc3>] current_fs_time+0x3b/0x40

I can hit this pretty consistently: all I have to do is mkfs.gfs2 with a small journal size (-J 8), and revolver will hit it very quickly without even killing a node or doing any recovery.

Call Trace:
 [<ffffffff883db9f2>] :gfs2:rgblk_free+0x139/0x158
 [<ffffffff883dba38>] :gfs2:gfs2_free_di+0x27/0xf0
 [<ffffffff883cbddb>] :gfs2:gfs2_dinode_dealloc+0x13c/0x17c
 [<ffffffff883d76c0>] :gfs2:gfs2_delete_inode+0xee/0x157
 [<ffffffff883d7618>] :gfs2:gfs2_delete_inode+0x46/0x157
 [<ffffffff883d75d2>] :gfs2:gfs2_delete_inode+0x0/0x157
 [<ffffffff8002efc1>] generic_delete_inode+0xc6/0x143
 [<ffffffff883dc2fc>] :gfs2:gfs2_inplace_reserve_i+0x574/0x5d0
 [<ffffffff883d28e7>] :gfs2:gfs2_prepare_write+0x18f/0x29b

So it appears to be a race between an inode being deleted and deallocated, and the bitmap being discovered in the deleted state by rgrp_try_unlink, which then tries to deallocate it again. I'm assuming the bits are twiddled by the time rgrp_try_unlink discovers it and tries to deallocate the inode itself.

Created attachment 159621 [details]
Proposed patch

My theory is that the rgrp_try_unlink function was grabbing hold of an unlinked inode that was still open. But with the file open, the disk inode may still have outstanding buffers and such in memory. I couldn't see any kind of iopen protection around rgrp_try_unlink. This is a proposed patch to rgrp.c that seems to improve things. I'm running revolver on it and it's only up to iteration 1.3, but it makes sense.
I'm also running with a bunch of test code for debugging bug #248176, so if revolver runs all night I'll need to sort through all the rubbish I added to the code, to determine what to keep and what to get rid of, before I can come up with a clean patch.

I'm having problems running the revolver test. Something weird is definitely going on and I'm not sure what. After a few iterations, it seems to get stuck. In this particular case, revolver complained that roth-03 was dead, despite the fact that I could ssh to it and ls my gfs2 file system just fine. However, a qarsh command to roth-03 to ls the gfs2 fs went into a permanent hang. I did a sysrq-t and saw that the call trace looks like:

qarshd S 0000000000000001 0 2761 2442 2762 2763 (NOTLB)
 ffff81006c1ef9c8 0000000000000082 0000000000800000 ffffffff8001b97c
 000000000000000a ffff81007b8a37e0 ffff8100026c8100 000000905ee68b4e
 000000000000ccaa ffff81007b8a39c8 0000000000000001 ffff8100026c8100
Call Trace:
 [<ffffffff8001b97c>] generic_make_request+0x1e8/0x1ff
 [<ffffffff80061849>] schedule_timeout+0x8a/0xad
 [<ffffffff80092af2>] process_timeout+0x0/0x5
 [<ffffffff80011301>] do_select+0x3fe/0x462
 [<ffffffff8001e1c3>] __pollwait+0x0/0xe2
 [<ffffffff800884c6>] default_wake_function+0x0/0xe
 [<ffffffff80031843>] ip_output+0x1fd/0x241
 [<ffffffff80086944>] __wake_up_common+0x3e/0x68
 [<ffffffff8002df00>] __wake_up+0x38/0x4f
 [<ffffffff80012858>] sock_def_readable+0x34/0x5f
 [<ffffffff8025a7cc>] unix_dgram_sendmsg+0x43d/0x4cf
 [<ffffffff80052b28>] sock_sendmsg+0xf3/0x110
 [<ffffffff800dd1a3>] core_sys_select+0x1bc/0x265
 [<ffffffff8000c1a0>] _atomic_dec_and_lock+0x39/0x57
 [<ffffffff8002c7c9>] mntput_no_expire+0x19/0x89
 [<ffffffff802094d2>] sys_sendto+0x11c/0x14f
 [<ffffffff800dd360>] sys_pselect7+0x114/0x210
 [<ffffffff800b269b>] audit_syscall_entry+0x14d/0x180
 [<ffffffff800dd4b0>] sys_pselect6+0x54/0x61
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

So it seems qarshd was waiting for the ls. The ls of gfs2 was waiting on a glock, despite the fact that I have Steve Whitehouse's "initial attempt at fix" patch for bug #248439. Call trace looks like this:

ls D ffffffff883c8abc 0 2762 2761 (NOTLB)
 ffff81006b755c88 0000000000000082 ffff81006e666480 ffffffff8839e11e
 0000000000000005 ffff81006d419080 ffffffff802dcae0 00000064b8c910f4
 0000000000004c4a ffff81006d419268 0000000000000000 ffff810072763800
Call Trace:
 [<ffffffff8839e11e>] :dlm:_request_lock+0x12d/0x24c
 [<ffffffff883c8abc>] :gfs2:just_schedule+0x0/0xd
 [<ffffffff883c8ac5>] :gfs2:just_schedule+0x9/0xd
 [<ffffffff800619b4>] __wait_on_bit+0x40/0x6f
 [<ffffffff883c8abc>] :gfs2:just_schedule+0x0/0xd
 [<ffffffff80061a4f>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009b482>] wake_bit_function+0x0/0x23
 [<ffffffff883c9a86>] :gfs2:glock_wait_internal+0x11e/0x259
 [<ffffffff883c9d65>] :gfs2:gfs2_glock_nq+0x1a4/0x1ca
 [<ffffffff883d5969>] :gfs2:gfs2_getattr+0x7d/0xc3
 [<ffffffff883d5961>] :gfs2:gfs2_getattr+0x75/0xc3
 [<ffffffff8000ddf5>] vfs_getattr+0x2d/0xa9
 [<ffffffff8003e645>] vfs_lstat_fd+0x2f/0x47
 [<ffffffff883ca14d>] :gfs2:gfs2_holder_uninit+0xd/0x1f
 [<ffffffff8002a63f>] sys_newlstat+0x19/0x31
 [<ffffffff8005b229>] tracesys+0x71/0xe0
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

On roth-01, I discovered a "growfiles" process that was eating up 100% of the cpu. A sysrq-t there revealed:

growfiles R running task 0 2745 2743 (NOTLB)
genesis D ffffffff883c8abc 0 2746 1 2747 2672 (NOTLB)
 ffff81006b28fb78 0000000000000086 ffff81006ecb8308 ffffffff883ca14d
 0000000000000003 ffff810069e1a860 ffffffff802dcae0 000000249029ae55
 0000000000003fbc ffff810069e1aa48 0000000000000000 0000000000000001
Call Trace:
 [<ffffffff883ca14d>] :gfs2:gfs2_holder_uninit+0xd/0x1f
 [<ffffffff883c8abc>] :gfs2:just_schedule+0x0/0xd
 [<ffffffff883c8ac5>] :gfs2:just_schedule+0x9/0xd
 [<ffffffff800619b4>] __wait_on_bit+0x40/0x6f
 [<ffffffff883c8abc>] :gfs2:just_schedule+0x0/0xd
 [<ffffffff80061a4f>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009b482>] wake_bit_function+0x0/0x23
 [<ffffffff883c9a86>] :gfs2:glock_wait_internal+0x11e/0x259
 [<ffffffff883c9d65>] :gfs2:gfs2_glock_nq+0x1a4/0x1ca
 [<ffffffff883d70b3>] :gfs2:gfs2_permission+0x7b/0xd4
 [<ffffffff883d70ab>] :gfs2:gfs2_permission+0x73/0xd4
 [<ffffffff8000d418>] permission+0x81/0xc8
 [<ffffffff80009656>] __link_path_walk+0x173/0xf42
 [<ffffffff8009b454>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8000e5b4>] link_path_walk+0x5c/0xe5
 [<ffffffff8000c7e3>] do_path_lookup+0x270/0x2e8
 [<ffffffff80012064>] getname+0x15b/0x1c1
 [<ffffffff800dc91b>] sys_mkdirat+0x3f/0xe4
 [<ffffffff8005b229>] tracesys+0x71/0xe0
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

So at this point, I'm thinking the glock hang patch isn't quite right. I'm tired, so I haven't done a full analysis, but my initial guess would be that roth-01's growfiles process took out a glock and went into an infinite loop somehow, and roth-03 was waiting for roth-01. The positive news is that I haven't seen our file system corruption of invalid metadata blocks after several attempts at revolver, all of which run for a few minutes and then lock up in a similar manner.

The patch in comment #10 cannot be correct. The "alloc" variable is really just used to skip whole bytes in case they are certain not to match.
It's a bit of a hack really, as it could have been written to do a proper match here and not just skip certain states. More importantly, this:

-       if (!IS_ERR(inode))
+       if (!inode->i_private && !IS_ERR(inode))
                return inode;
        }

cannot be correct either, since the IS_ERR test must be done before dereferencing. Also, gfs2_inode_lookup() calls iget(), which checks for inode->i_private anyway, so this cannot possibly occur (that was a fix to a previous bug).

I was able to get the call stack for growfiles using crash:

#0 [ffff81006ae558e8] schedule at ffffffff80060f39
 ffff81006ae558f0: 0000000000000082 0000000000000004
 ffff81006ae55900: ffff810072150000 0000000000000001
 ffff81006ae55910: ffff81006f7c97a0 ffff810068c9e0c0
 ffff81006ae55920: 0000241b8e29930d 0000000003cfcca2
 ffff81006ae55930: ffff81006f7c9990 ffff810000000000
 ffff81006ae55940: ffff81006af83040 0000000000000000
 ffff81006ae55950: 0000000000000001 0000000000000000
 ffff81006ae55960: ffff81006ae55aa8 ffffffff883cbccf
 ffff81006ae55970: ffff81000268ac58 ffff81007fe06400
 ffff81006ae55980: ffffffffffffffff ffffffff8004d454
 ffff81006ae55990: 0000000000000010 0000000000000286
 ffff81006ae559a0: ffff81006ed26550 ffffffff883cbccf
 ffff81006ae559b0: ffff81000268ac58 ffff81007fe06400
 ffff81006ae559c0: ffffffff883cbccf ffff81006ae559d8
 ffff81006ae559d0: ffffffff80089a18
#1 [ffff81006ae559d0] __cond_resched at ffffffff80089a18
 ffff81006ae559d8: ffff81006ae559e8 ffffffff8006105f
#2 [ffff81006ae559e0] cond_resched at ffffffff8006105f
 ffff81006ae559e8: 0000000000000001 ffffffff800580aa
#3 [ffff81006ae559f0] ifind at ffffffff800580aa
 ffff81006ae559f8: ffff81006b9f5580 000000000000f7d0
 ffff81006ae55a08: ffff81000268ac58 ffff81007fe06400
 ffff81006ae55a18: ffff81006ae55aa8 ffffffff800e06bd
#4 [ffff81006ae55a20] iget5_locked at ffffffff800e06bd
 ffff81006ae55a28: ffffffff883cbceb ffff81006b9f5580
 ffff81006ae55a38: 000000000000f7d0 0000000007c6df49
 ffff81006ae55a48: ffff810072150000 ffffffffffffffff
 ffff81006ae55a58: 0000000000000000 ffffffff883cba92
#5 [ffff81006ae55a60] gfs2_inode_lookup at ffffffff883cba92
 ffff81006ae55a68: 0000000000000212 ffff81006ae55a88
 ffff81006ae55a78: 0000000000000018 ffff81006ae55a98
 ffff81006ae55a88: 000000000000f7d1 ffffffff883db002
#6 [ffff81006ae55a90] rgblk_search at ffffffff883db002
 ffff81006ae55a98: 020281006ed26550 0000000300000005
 ffff81006ae55aa8: 0000000007c6df49 0000000000000005
 ffff81006ae55ab8: ffffffff883cbc29
#7 [ffff81006ae55ab8] gfs2_inode_lookup at ffffffff883cbc29
 ffff81006ae55ac0: ffff81006b9f5580 000000000000f7d0
 ffff81006ae55ad0: ffff81006ae55bc0 ffff810072150000
 ffff81006ae55ae0: ffff810068e1f040 ffff81006b9f5880
 ffff81006ae55af0: ffffffff883db6dc
#8 [ffff81006ae55af0] try_rgrp_unlink at ffffffff883db6dc
 ffff81006ae55af8: 0000000000000000 ffff810068e1f490
 ffff81006ae55b08: ffff81006b9f5580 ffffffff883dc199
#9 [ffff81006ae55b10] gfs2_inplace_reserve_i at ffffffff883dc199
 ffff81006ae55b18: ffff81006ae55ba8 000003296ae55ba8
 ffff81006ae55b28: ffffffff883e1a98 ffff810072150000
 ffff81006ae55b38: ffff810068e1f340 0000000000000000
 ffff81006ae55b48: 00000001883c3c47 0000000000000000
 ffff81006ae55b58: ffff8100721504e0 ffff8100721504e0
 ffff81006ae55b68: 0000000000000000 ffff810068e1f040
 ffff81006ae55b78: ffff810068e1f040 ffff81006ed14c78
 ffff81006ae55b88: ffffffff883c3c47 ffffffff883c2bb6
#10 [ffff81006ae55b90] gfs2_dirent_search at ffffffff883c2bb6
 ffff81006ae55b98: ffff81006ae55bf8 ffff810072150000
 ffff81006ae55ba8: ffff81006f7549d0 ffff81006ed24040
 ffff81006ae55bb8: 0000000000000000 0000000007c6df49
 ffff81006ae55bc8: ffff810068e1f340 ffff81006ed24040
 ffff81006ae55bd8: 0000000000000000 ffff810068e1f040
 ffff81006ae55be8: ffff810072150000 ffff810068e1f340
 ffff81006ae55bf8: ffff81006a0c9190 ffffffff883cc9c5
#11 [ffff81006ae55c00] gfs2_createi at ffffffff883cc9c5
 ffff81006ae55c08: ffff810001eb2d48 ffffffff00000000
 ffff81006ae55c18: ffff810068e14a60 0000000800000000
 ffff81006ae55c28: ffff81006ed14c78 ffff81006ae55d58
 ffff81006ae55c38: ffff81006ae55d90 ffffffff883d09e6
#12 [ffff81006ae55c40] gfs2_meta_wait at ffffffff883d09e6
 ffff81006ae55c48: 0000000000000000 ffff810072150000
 ffff81006ae55c58: 0000000007c4af10 000000000000125d
 ffff81006ae55c68: 0000000046a02aec 0000000009059200
 ffff81006ae55c78: ffff81006ae55ce8 0000000000000001
 ffff81006ae55c88: 0000000000000292 0000000000000003
 ffff81006ae55c98: ffff81006ae55cd8 ffffffff883c8da8
#13 [ffff81006ae55ca0] gfs2_glock_put at ffffffff883c8da8
 ffff81006ae55ca8: ffff81006ae55d68 ffff81006ae55cd8
 ffff81006ae55cb8: 0000000046a02aec 0000000009059200
 ffff81006ae55cc8: ffff81006fd50608 ffffffff883ca209
#14 [ffff81006ae55cd0] gfs2_glmutex_lock at ffffffff883ca209
 ffff81006ae55cd8: ffff81006ae55cd8 ffff81006ae55cd8
 ffff81006ae55ce8: 0000000000000000 ffff810075540e50
 ffff81006ae55cf8: 0000000000000000 00000000000000e1
 ffff81006ae55d08: 0000000000000000 ffff810068e1f040
 ffff81006ae55d18: ffff810068e1f040 ffff81006ae55d58
 ffff81006ae55d28: ffff81006ed14c78 ffff81006ae55ea8
 ffff81006ae55d38: ffff810072150000 ffffffff883d676d
#15 [ffff81006ae55d40] gfs2_create at ffffffff883d676d
 ffff81006ae55d48: 000081fd00000003 ffff81006ed14c48
 ffff81006ae55d58: ffff81006fd50660 ffff81006fd50660
 ffff81006ae55d68: ffff81006fd50608 0000000100000ab9
 ffff81006ae55d78: 0000000000000000 00000000000000c2
 ffff81006ae55d88: ffffffff883cbf4b
#16 [ffff81006ae55d88] gfs2_createi at ffffffff883cbf4b
 ffff81006ae55d90: ffff81006ed1eae0 ffff81006ed1eae0
 ffff81006ae55da0: ffff81006ed1ea88 0000000100000ab9
 ffff81006ae55db0: 0000000000000100 00000000000000c2
 ffff81006ae55dc0: ffffffff883ca479
#17 [ffff81006ae55dc0] gfs2_glock_nq_num at ffffffff883ca479
 ffff81006ae55dc8: 0000000000000003 ffff810068e1f040
 ffff81006ae55dd8: 00000000000081fd 0000000000000000
 ffff81006ae55de8: ffff81006ed14c48 ffff81006ae55ea8
 ffff81006ae55df8: 0000000000000006 ffffffff80039b23
#18 [ffff81006ae55e00] vfs_create at ffffffff80039b23
 ffff81006ae55e08: ffff81006d7d6c48 000000006ed14c48
 ffff81006ae55e18: ffff81006ae55ea8 0000000000000005
 ffff81006ae55e28: 0000000000008043 ffffffff8001a835
#19 [ffff81006ae55e30] open_namei at ffffffff8001a835
 ffff81006ae55e38: 000001fd7f44b880 ffff810072cda000
 ffff81006ae55e48: 0000000000000000 0000000000000000
 ffff81006ae55e58: ffff81007ceac180 ffff81006ed14c48
 ffff81006ae55e68: 0000000000000000 0000000000008042
 ffff81006ae55e78: 0000000000008042 00000000ffffff9c
 ffff81006ae55e88: 0000000000000005 ffff810072cda000
 ffff81006ae55e98: 0000000000000000 ffffffff80027074
#20 [ffff81006ae55ea0] do_filp_open at ffffffff80027074
 ffff81006ae55ea8: ffff81006d7d6c48 ffff81007ceac180
 ffff81006ae55eb8: 0000000bc3d7c7f2 ffff810072cda002
 ffff81006ae55ec8: 0000000000000300 ffff810000000000
 ffff81006ae55ed8: ffff81006e3d2000 0000000000000000
 ffff81006ae55ee8: 00000000fffffff2 0000000000000005
 ffff81006ae55ef8: ffff81006d7b9690 0000000000000005
 ffff81006ae55f08: ffff81006d7b9680 ffff810072cda000
 ffff81006ae55f18: 0000000000000000 000001ff00008043
 ffff81006ae55f28: ffff81007f525480 0000000000008042
 ffff81006ae55f38: 00000000ffffff9c 00000000000001ff
 ffff81006ae55f48: 0000000000008042 ffffffff800194b2
etc.

So apparently it's looping in rgrp_try_unlink. Back to the drawing board.

Steve, you made the comment that my suggested code

        if (!IS_ERR(inode) && !inode->i_private)
                return inode;

was unnecessary because if inode->i_private was NULL, then iopen was broken. Well, as an experiment, I changed the code to:

        BUG_ON(inode->i_private);
        if (!IS_ERR(inode))
                return inode;

And I hit the BUG. So if what you told me is correct, then iopen is indeed broken.

*** This bug has been marked as a duplicate of 248176 ***