Bug 245009
| Field | Value |
|---|---|
| Summary | GFS2 - filesystem consistency error |
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.0 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED DUPLICATE |
| Severity | low |
| Priority | low |
| Reporter | Abhijith Das <adas> |
| Assignee | Steve Whitehouse <swhiteho> |
| QA Contact | Dean Jansa <djansa> |
| CC | cluster-maint, nstraz |
| Doc Type | Bug Fix |
| Last Closed | 2007-08-14 13:48:36 UTC |
Description — Abhijith Das — 2007-06-20 15:01:31 UTC

Created attachment 157475 [details]
Debug patch

This should gather the required information. I'm not so sure about pushing this upstream yet, as I think it might be wise to merge some of it into the gfs2_consist_rgrp() call; I need to check the other callers of that first.
Problem occurred while running revolver recovery testing. Needs to be addressed prior to release.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

*** Bug 245647 has been marked as a duplicate of this bug. ***

I hit this bug again and here's the debugging info:

GFS2: current state 0 new state 0 block offset 5690
 [<e0489fb3>] gfs2_setbit+0x8a/0xb1 [gfs2]
 [<e048a961>] rgblk_free+0x13d/0x161 [gfs2]
 [<e048b789>] gfs2_free_data+0x20/0x89 [gfs2]
 [<e0472bec>] do_strip+0x381/0x3f4 [gfs2]
 [<e048115c>] gfs2_meta_indirect_buffer+0x214/0x268 [gfs2]
 [<e04717cd>] recursive_scan+0xea/0x16b [gfs2]
 [<e0471908>] trunc_dealloc+0xba/0xdb [gfs2]
 [<e047286b>] do_strip+0x0/0x3f4 [gfs2]
 [<e047acf0>] gfs2_glock_nq+0x164/0x184 [gfs2]
 [<e0486bd0>] gfs2_delete_inode+0xd3/0x142 [gfs2]
 [<e0486b3a>] gfs2_delete_inode+0x3d/0x142 [gfs2]
 [<e0486afd>] gfs2_delete_inode+0x0/0x142 [gfs2]
 [<c0183859>] generic_delete_inode+0xa5/0x10f
 [<c01832fd>] iput+0x64/0x66
 [<e048b408>] gfs2_inplace_reserve_i+0x525/0x57d [gfs2]
 [<e047243a>] gfs2_write_alloc_required+0x185/0x1c4 [gfs2]
 [<e0482396>] gfs2_prepare_write+0x16b/0x274 [gfs2]
 [<c0154027>] generic_file_buffered_write+0x230/0x607
 [<e047414b>] gfs2_dirent_find+0x3a/0x56 [gfs2]
 [<e04738bd>] gfs2_dirent_scan+0x9b/0x153 [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<c0128245>] current_fs_time+0x4a/0x55
 [<c01548a4>] __generic_file_aio_write_nolock+0x4a6/0x52a
 [<e047af20>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c0154b02>] generic_file_write+0x0/0x94
 [<c0154a58>] __generic_file_write_nolock+0x86/0x9a
 [<c01341a5>] autoremove_wake_function+0x0/0x2d
 [<e047b044>] gfs2_glock_dq+0x71/0x7b [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<e047af20>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<e04838e7>] gfs2_llseek+0x6f/0x94 [gfs2]
 [<c02fe3b0>] mutex_lock+0xb/0x19
 [<c0154b3c>] generic_file_write+0x3a/0x94
 [<c0154b02>] generic_file_write+0x0/0x94
 [<c016e643>] vfs_write+0xa1/0x143
 [<c016ec35>] sys_write+0x3c/0x63
 [<c0103f7b>] syscall_call+0x7/0xb
=======================
GFS2: fsid=Smoke_Cluster:gfs2.1.2: fatal: filesystem consistency error
GFS2: fsid=Smoke_Cluster:gfs2.1.2: RG = 10813675
GFS2: fsid=Smoke_Cluster:gfs2.1.2: function = gfs2_setbit, file = fs/gfs2/rgrp.c, line = 88
GFS2: fsid=Smoke_Cluster:gfs2.1.2: about to withdraw this file system
GFS2: fsid=Smoke_Cluster:gfs2.1.2: telling LM to withdraw
dlm: closing connection to node 1
dlm: connecting to 1
dlm: gfs2.1: group leave failed -512 0
GFS2: fsid=Smoke_Cluster:gfs2.1.2: withdrawn
 [<e047db64>] gfs2_lm_withdraw+0x73/0x7f [gfs2]
 [<e048e1cb>] gfs2_consist_rgrpd_i+0x2a/0x2f [gfs2]
 [<e048a961>] rgblk_free+0x13d/0x161 [gfs2]
 [<e048b789>] gfs2_free_data+0x20/0x89 [gfs2]
 [<e0472bec>] do_strip+0x381/0x3f4 [gfs2]
 [<e048115c>] gfs2_meta_indirect_buffer+0x214/0x268 [gfs2]
 [<e04717cd>] recursive_scan+0xea/0x16b [gfs2]
 [<e0471908>] trunc_dealloc+0xba/0xdb [gfs2]
 [<e047286b>] do_strip+0x0/0x3f4 [gfs2]
 [<e047acf0>] gfs2_glock_nq+0x164/0x184 [gfs2]
 [<e0486bd0>] gfs2_delete_inode+0xd3/0x142 [gfs2]
 [<e0486b3a>] gfs2_delete_inode+0x3d/0x142 [gfs2]
 [<e0486afd>] gfs2_delete_inode+0x0/0x142 [gfs2]
 [<c0183859>] generic_delete_inode+0xa5/0x10f
 [<c01832fd>] iput+0x64/0x66
 [<e048b408>] gfs2_inplace_reserve_i+0x525/0x57d [gfs2]
 [<e047243a>] gfs2_write_alloc_required+0x185/0x1c4 [gfs2]
 [<e0482396>] gfs2_prepare_write+0x16b/0x274 [gfs2]
 [<c0154027>] generic_file_buffered_write+0x230/0x607
 [<e047414b>] gfs2_dirent_find+0x3a/0x56 [gfs2]
 [<e04738bd>] gfs2_dirent_scan+0x9b/0x153 [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<c0128245>] current_fs_time+0x4a/0x55
 [<c01548a4>] __generic_file_aio_write_nolock+0x4a6/0x52a
 [<e047af20>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<c0154b02>] generic_file_write+0x0/0x94
 [<c0154a58>] __generic_file_write_nolock+0x86/0x9a
 [<c01341a5>] autoremove_wake_function+0x0/0x2d
 [<e047b044>] gfs2_glock_dq+0x71/0x7b [gfs2]
 [<e0479eef>] gfs2_glock_put+0x1e/0x103 [gfs2]
 [<e047af20>] gfs2_holder_uninit+0xb/0x1b [gfs2]
 [<e04838e7>] gfs2_llseek+0x6f/0x94 [gfs2]
 [<c02fe3b0>] mutex_lock+0xb/0x19
 [<c0154b3c>] generic_file_write+0x3a/0x94
 [<c0154b02>] generic_file_write+0x0/0x94
 [<c016e643>] vfs_write+0xa1/0x143
 [<c016ec35>] sys_write+0x3c/0x63
 [<c0103f7b>] syscall_call+0x7/0xb
=======================
GFS2: fsid=Smoke_Cluster:gfs2.1.2: gfs2_delete_inode: -5

So the two states here are both showing "free", but for whatever reason it looks like GFS2 is trying to delete an already-deleted inode. My guess is that the node in question already did the deletion some time earlier, but because the change was journalled and not yet written back, there is a window in which an inode can appear "unlinked, but still allocated" in the in-place rgrp when it has in fact been marked free. So we might potentially be scanning through the wrong copy of the buffer for some reason. I can't see any other situation under which there could be a race like this, as the locking has all the other angles covered.

Created attachment 159314 [details]
Quick fix to rgblk_search

This won't fix this bug, but it should probably be folded into the fix for this bug when the root cause is eventually discovered. Zero is a valid block number, since it represents the offset from the start of the rgrp's data blocks.
I hit this bug again with the latest patches (including the one above in comment #7) with revolver.

GFS2: fsid=Smoke_Cluster:gfs2.1.3: fatal: filesystem consistency error
GFS2: fsid=Smoke_Cluster:gfs2.1.3: RG = 15728800
GFS2: fsid=Smoke_Cluster:gfs2.1.3: function = gfs2_setbit, file = fs/gfs2/rgrp.c, line = 85
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/filemap.c:553
invalid opcode: 0000 [1] SMP
last sysfs file: /fs/gfs2/Smoke_Cluster:gfs2.1/lock_module/block
CPU 1
Modules linked in: autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) lock_dlm(U) gfs2(U) dlm(U) configfs(U) sunrpc(U) ipv6(U) cpufreq_ondemand(U) video(U) sbs(U) backlight(U) i2c_ec(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) sg(U) i2c_nforce2(U) tg3(U) k8_edac(U) i2c_core(U) edac_mc(U) serio_raw(U) qla2xxx(U) pcspkr(U) k8temp(U) hwmon(U) scsi_transport_fc(U) shpchp(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) sata_nv(U) libata(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
Pid: 5893, comm: growfiles Not tainted 2.6.18-prep_07_13_2007 #1
RIP: 0010:[<ffffffff80017922>]  [<ffffffff80017922>] unlock_page+0xf/0x2f
RSP: 0018:ffff81006c11fbd8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810001a3df40 RCX: ffff81006f4bea60
RDX: ffff810056cab550 RSI: ffff810001a3df40 RDI: ffff810001a3df40
RBP: ffff810001a3df40 R08: ffff81000000cb00 R09: 0000000000054316
R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000e000
R13: 0000000000001000 R14: 000000001c34ef50 R15: 0000000000000000
FS: 00002aaaaaab8210(0000) GS:ffff8100026ed7c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000369b0c85f0 CR3: 000000006c0cf000 CR4: 00000000000006e0
Process growfiles (pid: 5893, threadinfo ffff81006c11e000, task ffff81007fd3b820)
Stack: fffffffffffffffb ffffffff8000f99a 0000000000001000 ffff81006c11ff50
 000000000000e000 0000000000000001 ffff81006c11fdd8 0000000000000001
 0000100000000000 ffff81007f0bf8c0 ffff810056cab660 ffffffff8844f6c0
Call Trace:
 [<ffffffff8000f99a>] generic_file_buffered_write+0x27a/0x6d8
 [<ffffffff88432758>] :gfs2:gfs2_dirent_scan+0xb3/0x17e
 [<ffffffff88438e18>] :gfs2:gfs2_glock_put+0x26/0x133
 [<ffffffff88432f48>] :gfs2:gfs2_dirent_find+0x0/0x54
 [<ffffffff8000ddc3>] current_fs_time+0x3b/0x40

I can hit this pretty consistently: all I have to do is mkfs.gfs2 with a small journal size (-J 8), and revolver will hit it very quickly without even killing a node or doing any recovery.

Call Trace:
 [<ffffffff883db9f2>] :gfs2:rgblk_free+0x139/0x158
 [<ffffffff883dba38>] :gfs2:gfs2_free_di+0x27/0xf0
 [<ffffffff883cbddb>] :gfs2:gfs2_dinode_dealloc+0x13c/0x17c
 [<ffffffff883d76c0>] :gfs2:gfs2_delete_inode+0xee/0x157
 [<ffffffff883d7618>] :gfs2:gfs2_delete_inode+0x46/0x157
 [<ffffffff883d75d2>] :gfs2:gfs2_delete_inode+0x0/0x157
 [<ffffffff8002efc1>] generic_delete_inode+0xc6/0x143
 [<ffffffff883dc2fc>] :gfs2:gfs2_inplace_reserve_i+0x574/0x5d0
 [<ffffffff883d28e7>] :gfs2:gfs2_prepare_write+0x18f/0x29b

So it appears to be a race between an inode being deleted and deallocated, and the bitmap being discovered in the deleted state by rgrp_try_unlink, which then tries to deallocate it again. I'm assuming the bits are twiddled by the time rgrp_try_unlink discovers it and tries to deallocate the inode itself.

Created attachment 159621 [details]
Proposed patch

My theory is that the rgrp_try_unlink function was grabbing hold of an unlinked inode that was still open. But with the file open, the disk inode may still have outstanding buffers and such in memory. I couldn't see any kind of iopen protection around rgrp_try_unlink. This is a proposed patch to rgrp.c that seems to improve things. I'm running revolver on it and it's only up to iteration 1.3, but it makes sense.
I'm also running with a bunch of test code for debugging bug #248176, so if revolver runs all night I'll need to sort through all the rubbish I added to the code, to determine what to keep and what to get rid of, before I can come up with a clean patch.

I'm having problems running the revolver test. Something weird is definitely going on and I'm not sure what. After a few iterations, it seems to get stuck. In this particular case, revolver complained that roth-03 was dead, despite the fact that I could ssh to it and ls my gfs2 file system just fine. However, a qarsh command to roth-03 to ls the gfs2 fs went into a permanent hang. I did a sysrq-t and saw that the call trace looks like:

qarshd S 0000000000000001 0 2761 2442 2762 2763 (NOTLB)
 ffff81006c1ef9c8 0000000000000082 0000000000800000 ffffffff8001b97c
 000000000000000a ffff81007b8a37e0 ffff8100026c8100 000000905ee68b4e
 000000000000ccaa ffff81007b8a39c8 0000000000000001 ffff8100026c8100
Call Trace:
 [<ffffffff8001b97c>] generic_make_request+0x1e8/0x1ff
 [<ffffffff80061849>] schedule_timeout+0x8a/0xad
 [<ffffffff80092af2>] process_timeout+0x0/0x5
 [<ffffffff80011301>] do_select+0x3fe/0x462
 [<ffffffff8001e1c3>] __pollwait+0x0/0xe2
 [<ffffffff800884c6>] default_wake_function+0x0/0xe
 [<ffffffff80031843>] ip_output+0x1fd/0x241
 [<ffffffff80086944>] __wake_up_common+0x3e/0x68
 [<ffffffff8002df00>] __wake_up+0x38/0x4f
 [<ffffffff80012858>] sock_def_readable+0x34/0x5f
 [<ffffffff8025a7cc>] unix_dgram_sendmsg+0x43d/0x4cf
 [<ffffffff80052b28>] sock_sendmsg+0xf3/0x110
 [<ffffffff800dd1a3>] core_sys_select+0x1bc/0x265
 [<ffffffff8000c1a0>] _atomic_dec_and_lock+0x39/0x57
 [<ffffffff8002c7c9>] mntput_no_expire+0x19/0x89
 [<ffffffff802094d2>] sys_sendto+0x11c/0x14f
 [<ffffffff800dd360>] sys_pselect7+0x114/0x210
 [<ffffffff800b269b>] audit_syscall_entry+0x14d/0x180
 [<ffffffff800dd4b0>] sys_pselect6+0x54/0x61
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

So it seems qarshd was waiting for the ls. The ls of gfs2 was waiting on a glock, despite the fact that I have Steve Whitehouse's "initial attempt at fix" patch for bug #248439. Call trace looks like this:

ls D ffffffff883c8abc 0 2762 2761 (NOTLB)
 ffff81006b755c88 0000000000000082 ffff81006e666480 ffffffff8839e11e
 0000000000000005 ffff81006d419080 ffffffff802dcae0 00000064b8c910f4
 0000000000004c4a ffff81006d419268 0000000000000000 ffff810072763800
Call Trace:
 [<ffffffff8839e11e>] :dlm:_request_lock+0x12d/0x24c
 [<ffffffff883c8abc>] :gfs2:just_schedule+0x0/0xd
 [<ffffffff883c8ac5>] :gfs2:just_schedule+0x9/0xd
 [<ffffffff800619b4>] __wait_on_bit+0x40/0x6f
 [<ffffffff883c8abc>] :gfs2:just_schedule+0x0/0xd
 [<ffffffff80061a4f>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009b482>] wake_bit_function+0x0/0x23
 [<ffffffff883c9a86>] :gfs2:glock_wait_internal+0x11e/0x259
 [<ffffffff883c9d65>] :gfs2:gfs2_glock_nq+0x1a4/0x1ca
 [<ffffffff883d5969>] :gfs2:gfs2_getattr+0x7d/0xc3
 [<ffffffff883d5961>] :gfs2:gfs2_getattr+0x75/0xc3
 [<ffffffff8000ddf5>] vfs_getattr+0x2d/0xa9
 [<ffffffff8003e645>] vfs_lstat_fd+0x2f/0x47
 [<ffffffff883ca14d>] :gfs2:gfs2_holder_uninit+0xd/0x1f
 [<ffffffff8002a63f>] sys_newlstat+0x19/0x31
 [<ffffffff8005b229>] tracesys+0x71/0xe0
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

On roth-01, I discovered a "growfiles" process that was eating up 100% of the cpu. A sysrq-t there revealed:

growfiles R running task 0 2745 2743 (NOTLB)
genesis D ffffffff883c8abc 0 2746 1 2747 2672 (NOTLB)
 ffff81006b28fb78 0000000000000086 ffff81006ecb8308 ffffffff883ca14d
 0000000000000003 ffff810069e1a860 ffffffff802dcae0 000000249029ae55
 0000000000003fbc ffff810069e1aa48 0000000000000000 0000000000000001
Call Trace:
 [<ffffffff883ca14d>] :gfs2:gfs2_holder_uninit+0xd/0x1f
 [<ffffffff883c8abc>] :gfs2:just_schedule+0x0/0xd
 [<ffffffff883c8ac5>] :gfs2:just_schedule+0x9/0xd
 [<ffffffff800619b4>] __wait_on_bit+0x40/0x6f
 [<ffffffff883c8abc>] :gfs2:just_schedule+0x0/0xd
 [<ffffffff80061a4f>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8009b482>] wake_bit_function+0x0/0x23
 [<ffffffff883c9a86>] :gfs2:glock_wait_internal+0x11e/0x259
 [<ffffffff883c9d65>] :gfs2:gfs2_glock_nq+0x1a4/0x1ca
 [<ffffffff883d70b3>] :gfs2:gfs2_permission+0x7b/0xd4
 [<ffffffff883d70ab>] :gfs2:gfs2_permission+0x73/0xd4
 [<ffffffff8000d418>] permission+0x81/0xc8
 [<ffffffff80009656>] __link_path_walk+0x173/0xf42
 [<ffffffff8009b454>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8000e5b4>] link_path_walk+0x5c/0xe5
 [<ffffffff8000c7e3>] do_path_lookup+0x270/0x2e8
 [<ffffffff80012064>] getname+0x15b/0x1c1
 [<ffffffff800dc91b>] sys_mkdirat+0x3f/0xe4
 [<ffffffff8005b229>] tracesys+0x71/0xe0
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0

So at this point, I'm thinking the glock hang patch isn't quite right. I'm tired, so I haven't done a full analysis, but my initial guess would be that roth-01's growfiles process took out a glock and went into an infinite loop somehow, and roth-03 was waiting for roth-01. The positive news is that I haven't seen our file system corruption of invalid metadata blocks after several attempts at revolver, all of which run for a few minutes and then lock up in a similar manner.

The patch in comment #10 cannot be correct. The "alloc" variable is really just used to skip whole bytes in case they are certain not to match.
It's a bit of a hack really, as it could have been written to do a proper match here and not just skip certain states. More importantly, this:

-       if (!IS_ERR(inode))
+       if (!inode->i_private && !IS_ERR(inode))
                return inode;
        }

cannot be correct either, since the IS_ERR test must be done before dereferencing. Also, gfs2_inode_lookup() calls iget(), which checks for inode->i_private anyway, so this cannot possibly occur (that was a fix to a previous bug).

I was able to get the call stack for growfiles using crash:

#0 [ffff81006ae558e8] schedule at ffffffff80060f39
 ffff81006ae558f0: 0000000000000082 0000000000000004
 ffff81006ae55900: ffff810072150000 0000000000000001
 ffff81006ae55910: ffff81006f7c97a0 ffff810068c9e0c0
 ffff81006ae55920: 0000241b8e29930d 0000000003cfcca2
 ffff81006ae55930: ffff81006f7c9990 ffff810000000000
 ffff81006ae55940: ffff81006af83040 0000000000000000
 ffff81006ae55950: 0000000000000001 0000000000000000
 ffff81006ae55960: ffff81006ae55aa8 ffffffff883cbccf
 ffff81006ae55970: ffff81000268ac58 ffff81007fe06400
 ffff81006ae55980: ffffffffffffffff ffffffff8004d454
 ffff81006ae55990: 0000000000000010 0000000000000286
 ffff81006ae559a0: ffff81006ed26550 ffffffff883cbccf
 ffff81006ae559b0: ffff81000268ac58 ffff81007fe06400
 ffff81006ae559c0: ffffffff883cbccf ffff81006ae559d8
 ffff81006ae559d0: ffffffff80089a18
#1 [ffff81006ae559d0] __cond_resched at ffffffff80089a18
 ffff81006ae559d8: ffff81006ae559e8 ffffffff8006105f
#2 [ffff81006ae559e0] cond_resched at ffffffff8006105f
 ffff81006ae559e8: 0000000000000001 ffffffff800580aa
#3 [ffff81006ae559f0] ifind at ffffffff800580aa
 ffff81006ae559f8: ffff81006b9f5580 000000000000f7d0
 ffff81006ae55a08: ffff81000268ac58 ffff81007fe06400
 ffff81006ae55a18: ffff81006ae55aa8 ffffffff800e06bd
#4 [ffff81006ae55a20] iget5_locked at ffffffff800e06bd
 ffff81006ae55a28: ffffffff883cbceb ffff81006b9f5580
 ffff81006ae55a38: 000000000000f7d0 0000000007c6df49
 ffff81006ae55a48: ffff810072150000 ffffffffffffffff
 ffff81006ae55a58: 0000000000000000 ffffffff883cba92
#5 [ffff81006ae55a60] gfs2_inode_lookup at ffffffff883cba92
 ffff81006ae55a68: 0000000000000212 ffff81006ae55a88
 ffff81006ae55a78: 0000000000000018 ffff81006ae55a98
 ffff81006ae55a88: 000000000000f7d1 ffffffff883db002
#6 [ffff81006ae55a90] rgblk_search at ffffffff883db002
 ffff81006ae55a98: 020281006ed26550 0000000300000005
 ffff81006ae55aa8: 0000000007c6df49 0000000000000005
 ffff81006ae55ab8: ffffffff883cbc29
#7 [ffff81006ae55ab8] gfs2_inode_lookup at ffffffff883cbc29
 ffff81006ae55ac0: ffff81006b9f5580 000000000000f7d0
 ffff81006ae55ad0: ffff81006ae55bc0 ffff810072150000
 ffff81006ae55ae0: ffff810068e1f040 ffff81006b9f5880
 ffff81006ae55af0: ffffffff883db6dc
#8 [ffff81006ae55af0] try_rgrp_unlink at ffffffff883db6dc
 ffff81006ae55af8: 0000000000000000 ffff810068e1f490
 ffff81006ae55b08: ffff81006b9f5580 ffffffff883dc199
#9 [ffff81006ae55b10] gfs2_inplace_reserve_i at ffffffff883dc199
 ffff81006ae55b18: ffff81006ae55ba8 000003296ae55ba8
 ffff81006ae55b28: ffffffff883e1a98 ffff810072150000
 ffff81006ae55b38: ffff810068e1f340 0000000000000000
 ffff81006ae55b48: 00000001883c3c47 0000000000000000
 ffff81006ae55b58: ffff8100721504e0 ffff8100721504e0
 ffff81006ae55b68: 0000000000000000 ffff810068e1f040
 ffff81006ae55b78: ffff810068e1f040 ffff81006ed14c78
 ffff81006ae55b88: ffffffff883c3c47 ffffffff883c2bb6
#10 [ffff81006ae55b90] gfs2_dirent_search at ffffffff883c2bb6
 ffff81006ae55b98: ffff81006ae55bf8 ffff810072150000
 ffff81006ae55ba8: ffff81006f7549d0 ffff81006ed24040
 ffff81006ae55bb8: 0000000000000000 0000000007c6df49
 ffff81006ae55bc8: ffff810068e1f340 ffff81006ed24040
 ffff81006ae55bd8: 0000000000000000 ffff810068e1f040
 ffff81006ae55be8: ffff810072150000 ffff810068e1f340
 ffff81006ae55bf8: ffff81006a0c9190 ffffffff883cc9c5
#11 [ffff81006ae55c00] gfs2_createi at ffffffff883cc9c5
 ffff81006ae55c08: ffff810001eb2d48 ffffffff00000000
 ffff81006ae55c18: ffff810068e14a60 0000000800000000
 ffff81006ae55c28: ffff81006ed14c78 ffff81006ae55d58
 ffff81006ae55c38: ffff81006ae55d90 ffffffff883d09e6
#12 [ffff81006ae55c40] gfs2_meta_wait at ffffffff883d09e6
 ffff81006ae55c48: 0000000000000000 ffff810072150000
 ffff81006ae55c58: 0000000007c4af10 000000000000125d
 ffff81006ae55c68: 0000000046a02aec 0000000009059200
 ffff81006ae55c78: ffff81006ae55ce8 0000000000000001
 ffff81006ae55c88: 0000000000000292 0000000000000003
 ffff81006ae55c98: ffff81006ae55cd8 ffffffff883c8da8
#13 [ffff81006ae55ca0] gfs2_glock_put at ffffffff883c8da8
 ffff81006ae55ca8: ffff81006ae55d68 ffff81006ae55cd8
 ffff81006ae55cb8: 0000000046a02aec 0000000009059200
 ffff81006ae55cc8: ffff81006fd50608 ffffffff883ca209
#14 [ffff81006ae55cd0] gfs2_glmutex_lock at ffffffff883ca209
 ffff81006ae55cd8: ffff81006ae55cd8 ffff81006ae55cd8
 ffff81006ae55ce8: 0000000000000000 ffff810075540e50
 ffff81006ae55cf8: 0000000000000000 00000000000000e1
 ffff81006ae55d08: 0000000000000000 ffff810068e1f040
 ffff81006ae55d18: ffff810068e1f040 ffff81006ae55d58
 ffff81006ae55d28: ffff81006ed14c78 ffff81006ae55ea8
 ffff81006ae55d38: ffff810072150000 ffffffff883d676d
#15 [ffff81006ae55d40] gfs2_create at ffffffff883d676d
 ffff81006ae55d48: 000081fd00000003 ffff81006ed14c48
 ffff81006ae55d58: ffff81006fd50660 ffff81006fd50660
 ffff81006ae55d68: ffff81006fd50608 0000000100000ab9
 ffff81006ae55d78: 0000000000000000 00000000000000c2
 ffff81006ae55d88: ffffffff883cbf4b
#16 [ffff81006ae55d88] gfs2_createi at ffffffff883cbf4b
 ffff81006ae55d90: ffff81006ed1eae0 ffff81006ed1eae0
 ffff81006ae55da0: ffff81006ed1ea88 0000000100000ab9
 ffff81006ae55db0: 0000000000000100 00000000000000c2
 ffff81006ae55dc0: ffffffff883ca479
#17 [ffff81006ae55dc0] gfs2_glock_nq_num at ffffffff883ca479
 ffff81006ae55dc8: 0000000000000003 ffff810068e1f040
 ffff81006ae55dd8: 00000000000081fd 0000000000000000
 ffff81006ae55de8: ffff81006ed14c48 ffff81006ae55ea8
 ffff81006ae55df8: 0000000000000006 ffffffff80039b23
#18 [ffff81006ae55e00] vfs_create at ffffffff80039b23
 ffff81006ae55e08: ffff81006d7d6c48 000000006ed14c48
 ffff81006ae55e18: ffff81006ae55ea8 0000000000000005
 ffff81006ae55e28: 0000000000008043 ffffffff8001a835
#19 [ffff81006ae55e30] open_namei at ffffffff8001a835
 ffff81006ae55e38: 000001fd7f44b880 ffff810072cda000
 ffff81006ae55e48: 0000000000000000 0000000000000000
 ffff81006ae55e58: ffff81007ceac180 ffff81006ed14c48
 ffff81006ae55e68: 0000000000000000 0000000000008042
 ffff81006ae55e78: 0000000000008042 00000000ffffff9c
 ffff81006ae55e88: 0000000000000005 ffff810072cda000
 ffff81006ae55e98: 0000000000000000 ffffffff80027074
#20 [ffff81006ae55ea0] do_filp_open at ffffffff80027074
 ffff81006ae55ea8: ffff81006d7d6c48 ffff81007ceac180
 ffff81006ae55eb8: 0000000bc3d7c7f2 ffff810072cda002
 ffff81006ae55ec8: 0000000000000300 ffff810000000000
 ffff81006ae55ed8: ffff81006e3d2000 0000000000000000
 ffff81006ae55ee8: 00000000fffffff2 0000000000000005
 ffff81006ae55ef8: ffff81006d7b9690 0000000000000005
 ffff81006ae55f08: ffff81006d7b9680 ffff810072cda000
 ffff81006ae55f18: 0000000000000000 000001ff00008043
 ffff81006ae55f28: ffff81007f525480 0000000000008042
 ffff81006ae55f38: 00000000ffffff9c 00000000000001ff
 ffff81006ae55f48: 0000000000008042 ffffffff800194b2
etc.

So apparently it's looping in rgrp_try_unlink. Back to the drawing board.

Steve, you made the comment that my suggested code

        if (!IS_ERR(inode) && !inode->i_private)
                return inode;

was unnecessary because if inode->i_private was NULL, then iopen was broken. Well, as an experiment, I changed the code to:

        BUG_ON(inode->i_private);
        if (!IS_ERR(inode))
                return inode;

And I hit the BUG. So if what you told me is correct, then iopen is indeed broken.

*** This bug has been marked as a duplicate of 248176 ***