Bug 231910
| Summary: | GFS2 withdraws if you are trying to remove a file while doing a du |
|---|---|
| Product: | Red Hat Enterprise Linux 5 |
| Component: | kernel |
| Version: | 5.1 |
| Hardware: | All |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | high |
| Reporter: | Josef Bacik <jbacik> |
| Assignee: | Don Zickus <dzickus> |
| QA Contact: | Martin Jenner <mjenner> |
| CC: | adas, amirkin, bmarzins, kanderso, lwang, rkenna, swhiteho |
| Fixed In Version: | RHBA-2007-0959 |
| Doc Type: | Bug Fix |
| Last Closed: | 2007-11-07 19:43:46 UTC |
Description
Josef Bacik
2007-03-12 22:29:16 UTC
This looks like a problem related to a glock being demoted incorrectly. If the glock is in a transaction, the transaction must be committed before the glock is demoted. The glock's various operations in glops.c should take account of that, and it might well be a wrong assumption of the GLF_DEMOTE patch. I'll take a look at that today.

I've looked through the patch and I can't spot anything that might have the effect you are seeing. I've also traced through the code relating to xattrs and not found a lot that's enlightening there either. The check that's triggering on line 61 of lops.c just verifies that, at the end of the transaction, the glock in question is still locked in the exclusive state. The call to getxattr is, as a "read-only" call, a bit of a red herring, I think, as it just happens to be the next place the lock is used, and the EIO returned is due to the I/O failing as a result of the earlier triggered assertion. So there are two questions which spring to mind. First, are you using SELinux on this filesystem? Second, can you get the assertion to print out which type of glock actually caused it to trigger? That will be a big clue in solving this.

Our team has encountered the same assertion while running the bonnie++ test on a GFS2 partition. It is 100% reproducible. I'll try to add some debugging and find out what type of glock caused this assertion.

rh5cluster1.gsslab.rdu.redhat.com login: gl type is 2
GFS2: fsid=rhel5cluster:gfs2lfs2.0: fatal: assertion "gfs2_glock_is_held_excl(gl)" failed
GFS2: fsid=rhel5cluster:gfs2lfs2.0: function = glock_lo_after_commit, file = fs/gfs2/lops.c, line = 64
GFS2: fsid=rhel5cluster:gfs2lfs2.0: about to withdraw this file system
GFS2: fsid=rhel5cluster:gfs2lfs2.0: telling LM to withdraw

So the gl type is LM_TYPE_INODE.
glock type 2 state 0
GFS2: fsid=alpha:mygfs.0: fatal: assertion "gfs2_glock_is_held_excl(gl)" failed
GFS2: fsid=alpha:mygfs.0: function = glock_lo_after_commit, file = fs/gfs2/lops.c, line = 63
GFS2: fsid=alpha:mygfs.0: about to withdraw this file system
GFS2: fsid=alpha:mygfs.0: telling LM to withdraw

In our case the glock type is LM_TYPE_INODE too, and its state is LM_ST_UNLOCKED.

Created attachment 150247 [details]
A test patch to try and narrow down the cause of this bug
Here is a test patch whose aim is to gather information to help solve this
particular bug. In order for it to be useful, I'll need the full output once
the bug has triggered, and also, since one of the items is a pid, I'd like to
know which process it relates to. Since the main culprits here are the gfs2/dlm
daemons, a "ps" after mounting gfs2, but before running the test which triggers
this, should be almost as good as data gathered afterwards (in case that's not
possible).
I have been through all the glock and lops code as it applies to the inode
glocks looking for the cause of this bug, and it's rather elusive, it seems. We
know (from other asserts, not triggered) that the glock was certainly in the
exclusive mode when it was added to the transaction. The aim of the patch is to
catch the act of demotion of a lock that's still in the transaction and
thus find out which process is responsible.
The patch should apply cleanly to the current GFS2 -nmw git tree which has
recently been rebased against 2.6.21-rc4.
I have had reports that in some cases, there are glocks which have a zero ref
count (which should be impossible) triggering an assert elsewhere in the code
and I have a strong suspicion that the two are linked, but this seems the
easier symptom to use to track down the problem.
Do you plan to fix this bug in the RHEL5 (2.6.18-8.el5) kernel?

It will be fixed for release 5.1.0, but it needs to (and can) make it into the upstream kernel first, or at least as far as my -nmw git tree anyway. So the answer is yes, it will be fixed, but I'm not sure whether it will be in -8 or a later one.

Steve,
I hit this assert with your debug patch in place. This is with the latest code
from the nmw tree.
Below is the output from the debug patch, followed by 'ps' under kdb and finally
the backtrace('btp') of the active process, lock_dlm1. Let me know if you need
more info.
Glock 0xffff81007b7a28b8 (2, 5248259)
gl_flags =
gl_ref = 4
gl_state = 0
gl_owner = xdoio
gl_ip = 18446744071696625455
req_gh = no
req_bh = no
lvb_count = 0
object = yes
le = yes
reclaim = no
aspace = 0xffff810069c02040 nrpages = 0
ail = 0
Inode:
num = 65/5248259
type = 8
i_flags = 0
------------[ cut here ]------------
kernel BUG at fs/gfs2/glock.c:123!
invalid opcode: 0000 [1] SMP
Entering kdb (current=0xffff81007d362280, pid 6738) on processor 1 Oops: <NULL>
due to oops @ 0xffffffff880515b4
r15 = 0x0000000000000001 r14 = 0xffffffff88067880
r13 = 0x0000000000000000 r12 = 0xffffffff88069399
rbp = 0x0000000000000000 rbx = 0xffff81007b7a28b8
r11 = 0x0000000000000000 r10 = 0x000000000000001f
r9 = 0x0000000000000020 r8 = 0x000000000000000d
rax = 0x0000000000000000 rcx = 0x000000000000635c
rdx = 0x00000000ffffffff rsi = 0xffffffff807c3780
rdi = 0xffffffff806b9e7c orig_rax = 0xffffffffffffffff
rip = 0xffffffff880515b4 cs = 0x0000000000000010
eflags = 0x0000000000010246 rsp = 0xffff810071309e00
ss = 0x0000000000000000 &regs = 0xffff810071309d68
[1]kdb> ps
49 sleeping system daemon (state M) processes suppressed
Task Addr Pid Parent [*] cpu State Thread Command
0xffff8100717b6760 7303 7302 1 0 R 0xffff8100717b6a20 xdoio
0xffff81007d362280 6738 11 1 1 R 0xffff81007d362540 *lock_dlm1
0xffff810037d814e0 1 0 0 1 S 0xffff810037d817a0 init
0xffff81007f2089a0 950 11 0 1 R 0xffff81007f208c60 kjournald
0xffff81007f3594e0 999 1 0 1 S 0xffff81007f3597a0 udevd
0xffff81007f2130c0 5874 1 0 0 S 0xffff81007f213380 dhclient
0xffff81007f362080 5975 1 0 0 D 0xffff81007f362340 syslogd
0xffff81007f209080 5978 1 0 0 S 0xffff81007f209340 klogd
0xffff81007c7b03c0 5989 1 0 0 S 0xffff81007c7b0680 irqbalance
0xffff81007c001560 6009 1 0 0 S 0xffff81007c001820 portmap
0xffff81007f39ef00 6029 1 0 1 S 0xffff81007f39f1c0 rpc.statd
0xffff81007d26b760 6103 1 0 0 S 0xffff81007d26ba20 ccsd
0xffff810002ffcf00 6104 1 0 0 S 0xffff810002ffd1c0 ccsd
0xffff81007f362760 6109 1 0 0 S 0xffff81007f362a20 aisexec
0xffff81007f358040 6110 1 0 1 S 0xffff81007f358300 aisexec
0xffff81007f394ec0 6111 1 0 1 S 0xffff81007f395180 aisexec
0xffff81007cf10860 6118 1 0 0 S 0xffff81007cf10b20 aisexec
0xffff81007f1c2f80 6119 1 0 0 S 0xffff81007f1c3240 aisexec
0xffff81007c2655a0 6140 1 0 1 S 0xffff81007c265860 aisexec
[1]more>
Only 'q' or 'Q' are processed at more prompt, input ignored
0xffff81007f1c21c0 6141 1 0 0 S 0xffff81007f1c2480 aisexec
0xffff81007c024400 6142 1 0 0 S 0xffff81007c0246c0 aisexec
0xffff81007f244b20 6143 1 0 0 S 0xffff81007f244de0 aisexec
0xffff81007f18e820 6144 1 0 0 S 0xffff81007f18eae0 aisexec
0xffff81007f279520 6145 1 0 0 S 0xffff81007f2797e0 aisexec
0xffff81007f36d560 6178 1 0 0 S 0xffff81007f36d820 aisexec
0xffff81007f2137a0 6179 1 0 0 S 0xffff81007f213a60 aisexec
0xffff81007f244440 6735 1 0 0 S 0xffff81007f244700 aisexec
0xffff81007f245200 6736 1 0 0 S 0xffff81007f2454c0 aisexec
0xffff81007cf11620 6742 1 0 0 S 0xffff81007cf118e0 aisexec
0xffff81007d26a9a0 6744 1 0 0 S 0xffff81007d26ac60 aisexec
0xffff81007c2ec140 6117 1 0 0 S 0xffff81007c2ec400 groupd
0xffff81007cf10f40 6125 1 0 0 S 0xffff81007cf11200 fenced
0xffff81007f24f4e0 6131 1 0 1 S 0xffff81007f24f7a0 dlm_controld
0xffff81007d363040 6137 1 0 1 R 0xffff81007d363300 gfs_controld
0xffff81007f36ce80 6157 1 0 0 S 0xffff81007f36d140 dbus-daemon
0xffff81007f36c7a0 6168 1 0 0 S 0xffff81007f36ca60 clvmd
0xffff810037c62240 6185 1 0 0 S 0xffff810037c62500 clvmd
0xffff81007f358720 6186 1 0 1 S 0xffff81007f3589e0 clvmd
0xffff81007f362e40 6278 1 0 1 S 0xffff81007f363100 pcscd
0xffff81007f1ff720 6310 1 0 0 S 0xffff81007f1ff9e0 pcscd
0xffff81007f1c3660 6301 1 0 0 S 0xffff81007f1c3920 automount
[1]more>
Only 'q' or 'Q' are processed at more prompt, input ignored
0xffff81007f36c0c0 6302 1 0 0 S 0xffff81007f36c380 automount
0xffff81007f0b0860 6303 1 0 1 S 0xffff81007f0b0b20 automount
0xffff81007f18f5e0 6306 1 0 0 S 0xffff81007f18f8a0 automount
0xffff81007f1c28a0 6309 1 0 0 S 0xffff81007f1c2b60 automount
0xffff81007f1cc8e0 6320 1 0 1 S 0xffff81007f1ccba0 acpid
0xffff81007f24e720 6330 1 0 1 S 0xffff81007f24e9e0 cupsd
0xffff81007f278e40 6343 1 0 0 S 0xffff81007f279100 sshd
0xffff81007f39f5e0 6354 1 0 0 S 0xffff81007f39f8a0 xinetd
0xffff81007c000e80 6367 1 0 1 S 0xffff81007c001140 ntpd
0xffff810037c63000 6386 1 0 0 S 0xffff810037c632c0 sendmail
0xffff81007f2458e0 6393 1 0 1 S 0xffff81007f245ba0 sendmail
0xffff81007c264ec0 6409 1 0 1 S 0xffff81007c265180 gpm
0xffff81007f1cc200 6421 1 0 1 S 0xffff81007f1cc4c0 crond
0xffff81007f358e00 6447 1 0 1 S 0xffff81007f3590c0 xfs
0xffff81007f18ef00 6466 1 0 1 S 0xffff81007f18f1c0 atd
0xffff81007c264100 6494 1 0 1 S 0xffff81007c2643c0 yum-updatesd
0xffff81007f3955a0 6505 1 0 0 S 0xffff81007f395860 avahi-daemon
0xffff81007f3947e0 6506 6505 0 0 S 0xffff81007f394aa0 avahi-daemon
0xffff81007d2097a0 6516 1 0 0 S 0xffff81007d209a60 hald
0xffff81007c0258a0 6517 6516 0 1 S 0xffff81007c025b60 hald-runner
0xffff81007f209760 6523 6517 0 1 S 0xffff81007f209a20
hald-addon-acpi
0xffff81007c2ec820 6524 6517 0 0 S 0xffff81007c2ecae0
hald-addon-cpuf
[1]more>
Only 'q' or 'Q' are processed at more prompt, input ignored
0xffff81007c2ecf00 6530 6517 0 0 S 0xffff81007c2ed1c0
hald-addon-keyb
0xffff81007c0000c0 6532 6517 0 1 S 0xffff81007c000380
hald-addon-keyb
0xffff81007cf10180 6569 1 0 0 S 0xffff81007cf10440 smartd
0xffff81007d208300 6572 1 0 1 S 0xffff81007d2085c0 agetty
0xffff81007c0251c0 6573 1 0 1 S 0xffff81007c025480 mingetty
0xffff81007c2647e0 6576 1 0 1 S 0xffff81007c264aa0 mingetty
0xffff81007c7b6e40 6581 1 0 1 S 0xffff81007c7b7100 mingetty
0xffff81007f1855a0 6582 1 0 0 S 0xffff81007f185860 mingetty
0xffff81007f1847e0 6583 1 0 0 S 0xffff81007f184aa0 mingetty
0xffff81007f184100 6584 1 0 0 S 0xffff81007f1843c0 mingetty
0xffff81007cc90a60 6659 6343 0 0 S 0xffff81007cc90d20 sshd
0xffff81007c0cd8e0 6661 6659 0 0 S 0xffff81007c0cdba0 bash
0xffff81007d362280 6738 11 1 1 R 0xffff81007d362540 *lock_dlm1
0xffff81007f212300 7030 6354 0 0 S 0xffff81007f2125c0 btimed
0xffff81007f278080 7299 6354 0 1 S 0xffff81007f278340 qarshd
0xffff81007f34f8e0 7300 7299 0 1 R 0xffff81007f34fba0 xdoio
0xffff8100711ea720 7301 7300 0 0 R 0xffff8100711ea9e0 xdoio
0xffff810037c62920 7302 6354 0 0 S 0xffff810037c62be0 qarshd
0xffff8100717b6760 7303 7302 1 0 R 0xffff8100717b6a20 xdoio
0xffff8100717b6e40 7304 7303 0 0 R 0xffff8100717b7100 xdoio
0xffff81007f2082c0 7305 6354 0 0 S 0xffff81007f208580 qarshd
0xffff81007f24e040 7306 7305 0 1 R 0xffff81007f24e300 xdoio
[1]more>
Only 'q' or 'Q' are processed at more prompt, input ignored
0xffff81007f24ee00 7307 6354 0 1 S 0xffff81007f24f0c0 qarshd
0xffff81007f39e820 7308 7306 0 1 R 0xffff81007f39eae0 xdoio
0xffff8100717b6080 7309 7307 0 1 R 0xffff8100717b6340 xdoio
0xffff81007c7b6760 7310 7309 0 0 R 0xffff81007c7b6a20 xdoio
[1]kdb> btp 6738
Stack traceback for pid 6738
0xffff81007d362280 6738 11 1 1 R 0xffff81007d362540 *lock_dlm1
rsp rip Function (args)
0xffff810071309de8 0xffffffff880515b4 [gfs2]glock_in_wrong_state+0x42
0xffff810071309e28 0xffffffff88052d0b [gfs2]drop_bh+0x14b
0xffff810071309e58 0xffffffff880526e2 [gfs2]gfs2_glock_cb+0xcb
0xffff810071309e78 0xffffffff880c0896 [lock_dlm]gdlm_thread+0x512
0xffff810071309ea0 0xffffffff8021e1a8 default_wake_function
0xffff810071309ef0 0xffffffff880c0384 [lock_dlm]gdlm_thread
0xffff810071309f08 0xffffffff8023410e keventd_create_kthread
0xffff810071309f18 0xffffffff80234365 kthread+0xd1
0xffff810071309f48 0xffffffff8020a298 child_rip+0xa
0xffff810071309f60 0xffffffff8023410e keventd_create_kthread
0xffff810071309fc8 0xffffffff80234294 kthread
0xffff810071309fd8 0xffffffff8020a28e child_rip
Did the: print_symbol(KERN_INFO "tried to demote glock in transaction from %s\n", gl->gl_demoter); line not print out? Also, it would be good if you could turn the gl_ip into a symbol name, as it's not much use otherwise. A btp of the xdoio process would have been useful too, as the problem has probably been caused by a combination of the two threads.

It seems that my hunch that it should be possible to trap the bug earlier was correct. The odd thing here is that the glock appears to have no pages and is unlocked, as you'd expect at this stage in a demotion. It does, however, still appear to be part of a transaction, which is unexpected, since if it has run through the demotion code (as suggested by the lack of pages) then it should have flushed the log at the same time. We do at least now know that it's caused by a demote request from a remote node and that it's an exclusive -> unlocked transition. I'll keep looking to see if I can narrow down the cause further.

Hi Steve,
Here's the additional info you requested when I hit the bug again.
It has the print_symbol() stuff in it, plus backtraces for the lock_dlm/1 process
and the xdoio processes (4 of them). I also did an id on the gl_ip, and it looks
like it points to gfs2_glock_dq. Let me know if you need more info.
Thanks,
--Abhi
tried to demote glock in transaction from blocking_cb+0x16/0x33 [gfs2]
drop_bh: pid 3561
Glock 0xcc04ccf0 (2, 5243952)
gl_flags =
gl_ref = 4
gl_state = 0
gl_owner = xdoio
gl_ip = 3763384709
req_gh = no
req_bh = no
lvb_count = 0
object = yes
le = yes
reclaim = no
aspace = 0xdbf0ecc0 nrpages = 0
ail = 0
Inode:
num = 41/5243952
type = 8
i_flags = 0
------------[ cut here ]------------
kernel BUG at fs/gfs2/glock.c:123!
invalid opcode: 0000 [#1]
SMP
Modules linked in: lock_dlm gfs2 dlm configfs lpfc
CPU: 0
EIP: 0060:[<e0509b9b>] Not tainted VLI
EFLAGS: 00010292 (2.6.20 #1)
EIP is at glock_in_wrong_state+0x3c/0x40 [gfs2]
eax: 00000000 ebx: cc04ccf0 ecx: ffffffff edx: 00000046
esi: e0520a12 edi: cea25000 ebp: 00000000 esp: dbe8ff18
ds: 007b es: 007b ss: 0068
Process lock_dlm1 (pid: 3561, ti=dbe8e000 task=c343ca70 task.ti=dbe8e000)
Stack: e05207ce e0520a12 00000de9 cc04ccf0 00000000 e050afb6 dbe8ffac cea25000
e051f5a0 cea25000 cc04ccf0 dbe8ffac c2a559c0 e050aa7a 00000286 c012a8ee
df724000 fffefffe dbe8ffbc e006284c 00000000 00000000 00000082 df5abf34
Call Trace:
[<e050afb6>] drop_bh+0x118/0x132 [gfs2]
[<e050aa7a>] gfs2_glock_cb+0xb1/0x116 [gfs2]
[<c012a8ee>] remove_wait_queue+0x31/0x35
[<e006284c>] gdlm_thread+0x5af/0x5fc [lock_dlm]
[<c0116336>] default_wake_function+0x0/0xd
[<e006229d>] gdlm_thread+0x0/0x5fc [lock_dlm]
[<c012a6a1>] kthread+0xb0/0xdb
[<c012a5f1>] kthread+0x0/0xdb
[<c0104253>] kernel_thread_helper+0x7/0x10
=======================
Code: e8 33 9d c2 df 8b 43 34 8b 80 a4 00 00 00 89 74 24 04 c7 04 24 ce 07 52
e0 89 44 24 08 e8 06 15 c1 df 89 da 31 c0 e8 47 fb ff ff <0f> 0b eb fe 55 89 c
5 57 89 d7 56 53 e8 84 f6 ff ff 89 c3 89 c6
EIP: [<e0509b9b>] glock_in_wrong_state+0x3c/0x40 [gfs2] SS:ESP 0068:dbe8ff18
Entering kdb (current=0xc343ca70, pid 3561) on processor 0 Oops: invalid opcode
due to oops @ 0xe0509b9b
eax = 0x00000000 ebx = 0xcc04ccf0 ecx = 0xffffffff edx = 0x00000046
esi = 0xe0520a12 edi = 0xcea25000 esp = 0xdbe8ff18 eip = 0xe0509b9b
ebp = 0x00000000 xss = 0xc0380068 xcs = 0x00000060 eflags = 0x00010292
xds = 0x0050007b xes = 0x0000007b origeax = 0xffffffff &regs = 0xdbe8fee0
[0]kdb> ps
37 sleeping system daemon (state M) processes suppressed
Task Addr Pid Parent [*] cpu State Thread Command
0xc343ca70 3561 7 1 0 R 0xc343cc20 *lock_dlm1
0xdf586a70 1 0 0 0 S 0xdf586c20 init
0xdf50b550 960 1 0 0 S 0xdf50b700 udevd
0xdf263550 2658 1 0 0 S 0xdf263700 dhclient
0xdf6aa030 2759 1 0 0 S 0xdf6aa1e0 syslogd
0xdf69b550 2762 1 0 0 R 0xdf69b700 klogd
0xdf407030 2792 1 0 0 S 0xdf4071e0 portmap
0xdf6a6030 2812 1 0 0 S 0xdf6a61e0 rpc.statd
0xdf57d550 2882 1 0 0 S 0xdf57d700 ccsd
0xdf6b0a70 2883 1 0 0 S 0xdf6b0c20 ccsd
0xc14ea030 2888 1 0 0 R 0xc14ea1e0 aisexec
0xdf50ba70 2889 1 0 0 S 0xdf50bc20 aisexec
0xdf407a70 2920 1 0 0 S 0xdf407c20 aisexec
0xdf50b030 2927 1 0 0 S 0xdf50b1e0 aisexec
0xdec5c550 2928 1 0 0 S 0xdec5c700 aisexec
0xc14ea550 2947 1 0 0 S 0xc14ea700 aisexec
0xc1485550 2948 1 0 0 S 0xc1485700 aisexec
0xdf695a70 2949 1 0 0 S 0xdf695c20 aisexec
0xc1480a70 2950 1 0 0 S 0xc1480c20 aisexec
[0]more>
0xdf595550 2953 1 0 0 S 0xdf595700 aisexec
0xc1480550 2954 1 0 0 S 0xc1480700 aisexec
0xdf40b030 2983 1 0 0 S 0xdf40b1e0 aisexec
0xdf6aa550 2984 1 0 0 S 0xdf6aa700 aisexec
0xcea21a70 3556 1 0 0 S 0xcea21c20 aisexec
0xcea21550 3557 1 0 0 S 0xcea21700 aisexec
0xd0db5030 3570 1 0 0 S 0xd0db51e0 aisexec
0xd0dbfa70 3571 1 0 0 S 0xd0dbfc20 aisexec
0xdf595030 2926 1 0 0 S 0xdf5951e0 groupd
0xdec5ca70 2934 1 0 0 S 0xdec5cc20 fenced
0xc1485a70 2940 1 0 0 S 0xc1485c20 dlm_controld
0xdf695550 2946 1 0 0 S 0xdf695700 gfs_controld
0xdf57da70 2966 1 0 0 S 0xdf57dc20 dbus-daemon
0xdf6f5030 2977 1 0 0 S 0xdf6f51e0 clvmd
0xdf263030 2993 1 0 0 S 0xdf2631e0 clvmd
0xdf263a70 2994 1 0 0 S 0xdf263c20 clvmd
0xdf69b030 3081 1 0 0 S 0xdf69b1e0 pcscd
0xc4b8c550 3086 1 0 0 S 0xc4b8c700 pcscd
0xd6ea2550 3108 1 0 0 S 0xd6ea2700 automount
0xdec5ba70 3109 1 0 0 S 0xdec5bc20 automount
0xd6850a70 3110 1 0 0 S 0xd6850c20 automount
0xd6ea2a70 3113 1 0 0 S 0xd6ea2c20 automount
0xd6ea2030 3116 1 0 0 S 0xd6ea21e0 automount
[0]more>
0xcd7cea70 3126 1 0 0 S 0xcd7cec20 acpid
0xc69dfa70 3136 1 0 0 S 0xc69dfc20 cupsd
0xcd7ce030 3149 1 0 0 S 0xcd7ce1e0 sshd
0xdd688030 3160 1 0 0 S 0xdd6881e0 xinetd
0xc69df550 3173 1 0 0 S 0xc69df700 ntpd
0xc343c030 3193 1 0 0 S 0xc343c1e0 sendmail
0xc343c550 3201 1 0 0 S 0xc343c700 sendmail
0xdcc23a70 3220 1 0 0 S 0xdcc23c20 gpm
0xdcc2f030 3246 1 0 0 S 0xdcc2f1e0 crond
0xd8cfc550 3270 1 0 0 S 0xd8cfc700 xfs
0xdcc23550 3289 1 0 0 S 0xdcc23700 atd
0xdcc23030 3301 1 0 0 S 0xdcc231e0 yum-updatesd
0xde941550 3312 1 0 0 S 0xde941700 avahi-daemon
0xde941030 3313 3312 0 0 S 0xde9411e0 avahi-daemon
0xcd7ce550 3323 1 0 0 S 0xcd7ce700 hald
0xc69df030 3324 3323 0 0 S 0xc69df1e0 hald-runner
0xdcc2fa70 3330 3324 0 0 S 0xdcc2fc20 hald-addon-acpi
0xd8cfca70 3337 3324 0 0 S 0xd8cfcc20 hald-addon-keyb
0xdd688a70 3339 3324 0 0 S 0xdd688c20 hald-addon-keyb
0xd8cfc030 3348 3324 0 0 S 0xd8cfc1e0 hald-addon-stor
0xdd688550 3390 1 0 0 S 0xdd688700 smartd
0xdf6f2030 3393 1 0 0 S 0xdf6f21e0 agetty
0xd0d31550 3394 1 0 0 S 0xd0d31700 mingetty
[0]more>
Only 'q' or 'Q' are processed at more prompt, input ignored
0xde941a70 3395 1 0 0 S 0xde941c20 mingetty
0xdcc2f550 3396 1 0 0 S 0xdcc2f700 mingetty
0xd3f97a70 3397 1 0 0 S 0xd3f97c20 mingetty
0xd0d31030 3398 1 0 0 S 0xd0d311e0 mingetty
0xd3f97550 3399 1 0 0 S 0xd3f97700 mingetty
0xd0d31a70 3464 3149 0 0 S 0xd0d31c20 sshd
0xd3f97030 3468 3464 0 0 S 0xd3f971e0 bash
0xc343ca70 3561 7 1 0 R 0xc343cc20 *lock_dlm1
0xccffca70 3562 7 0 0 R 0xccffcc20 lock_dlm2
0xd6850550 3591 3160 0 0 S 0xd6850700 btimed
0xc4b8c030 3664 3160 0 0 S 0xc4b8c1e0 qarshd
0xc4b8ca70 3665 3664 0 0 R 0xc4b8cc20 xdoio
0xdf6aaa70 3666 3665 0 0 R 0xdf6aac20 xdoio
0xd6850030 3671 3160 0 0 S 0xd68501e0 qarshd
0xdf69ba70 3672 3671 0 0 R 0xdf69bc20 xdoio
0xd0db5550 3673 3672 0 0 R 0xd0db5700 xdoio
[0]kdb> btp 3561
Stack traceback for pid 3561
0xc343ca70 3561 7 1 0 R 0xc343cc20 *lock_dlm1
esp eip Function (args)
0xdbe8ff0c 0xe0509b9b [gfs2]glock_in_wrong_state+0x3c
0xdbe8ff2c 0xe050afb6 [gfs2]drop_bh+0x118
0xdbe8ff4c 0xe050aa7a [gfs2]gfs2_glock_cb+0xb1
0xdbe8ff54 0xc012a8ee remove_wait_queue+0x31
0xdbe8ff64 0xe006284c [lock_dlm]gdlm_thread+0x5af
0xdbe8ffa0 0xc0116336 default_wake_function
0xdbe8ffc4 0xe006229d [lock_dlm]gdlm_thread
0xdbe8ffcc 0xc012a6a1 kthread+0xb0
0xdbe8ffd8 0xc012a5f1 kthread
0xdbe8ffe4 0xc0104253 kernel_thread_helper+0x7
[0]kdb> btp 3665
Stack traceback for pid 3665
0xc4b8ca70 3665 3664 0 0 R 0xc4b8cc20 xdoio
esp eip Function (args)
0xcc063f40 0xc0410033 __sched_text_start+0x863
0xcc063f60 0xc039a83b net_rx_action+0x94
0xcc063f98 0xc011ea32 irq_exit+0x35
0xcc063f9c 0xc01055e3 do_IRQ+0x87
0xcc063fb4 0xc010367e work_resched+0x5
[0]kdb> btp 3666
Stack traceback for pid 3666
0xdf6aaa70 3666 3665 0 0 R 0xdf6aac20 xdoio
esp eip Function (args)
0xcc049f30 0xc0410033 __sched_text_start+0x863
0xcc049f48 0xc015c35c locks_free_lock+0x3a
0xcc049f50 0xc015d729 fcntl_setlk64+0x1c4
0xcc049fa4 0xc01163ed sys_sched_yield+0x59
0xcc049fb4 0xc01035a0 sysenter_past_esp+0x5d
[0]kdb> btp 3672
Stack traceback for pid 3672
0xdf69ba70 3672 3671 0 0 R 0xdf69bc20 xdoio
esp eip Function (args)
0xc7cb9f40 0xc0410033 __sched_text_start+0x863
0xc7cb9f64 0xc0121bab run_timer_softirq+0x36
0xc7cb9fb4 0xc010367e work_resched+0x5
[0]kdb> btp 3673
Stack traceback for pid 3673
0xd0db5550 3673 3672 0 0 R 0xd0db5700 xdoio
esp eip Function (args)
0xdbf53f30 0xc0410033 __sched_text_start+0x863
0xdbf53f48 0xc015c35c locks_free_lock+0x3a
0xdbf53f50 0xc015d729 fcntl_setlk64+0x1c4
0xdbf53fa4 0xc01163ed sys_sched_yield+0x59
0xdbf53fb4 0xc01035a0 sysenter_past_esp+0x5d
0xdbf53fdc 0xc0410033 __sched_text_start+0x863
gl_ip = 3763384709 = 0xe050b185
[0]kdb> id 0xe050b185
0xe050b185 gfs2_glock_dq+0x22: lea 0x1c(%ebx),%edi
0xe050b188 gfs2_glock_dq+0x25: mov %edi,%eax
0xe050b18a gfs2_glock_dq+0x27: call 0xc041151e _spin_lock
0xe050b18f gfs2_glock_dq+0x2c: mov 0x4(%esi),%eax
0xe050b192 gfs2_glock_dq+0x2f: mov (%esi),%edx
0xe050b194 gfs2_glock_dq+0x31: mov %edx,(%eax)
0xe050b196 gfs2_glock_dq+0x33: mov %eax,0x4(%edx)
0xe050b199 gfs2_glock_dq+0x36: lea 0x40(%ebx),%eax
0xe050b19c gfs2_glock_dq+0x39: mov %esi,(%esi)
0xe050b19e gfs2_glock_dq+0x3b: cmp %eax,0x40(%ebx)
0xe050b1a1 gfs2_glock_dq+0x3e: mov %esi,0x4(%esi)
0xe050b1a4 gfs2_glock_dq+0x41: jne 0xe050b1c4 gfs2_glock_dq+0x61
0xe050b1a6 gfs2_glock_dq+0x43: movb $0x1,0x1c(%ebx)
0xe050b1aa gfs2_glock_dq+0x47: mov 0x18(%ebp),%edx
0xe050b1ad gfs2_glock_dq+0x4a: test %edx,%edx
0xe050b1af gfs2_glock_dq+0x4c: je 0xe050b1b5 gfs2_glock_dq+0x52
Created attachment 151413 [details]
An updated test patch
Sorry, it looks like there was a race in the previous patch I posted. Here is
an updated version which fixes this problem.

I've had a good look through the results (which may well still be correct,
though it seems best to be certain) and I still can't see what is going on here.
I have been totally unable to recreate this issue. I've tried various types of bonnie++ runs, as well as du's in combination with rm's. Are people still able to reproduce this on the latest gfs2 code? If so, could someone please put a list of commands necessary to reproduce it into the bugzilla, along with information about their setup.

bonnie++ version: 1.02
Command line: bonnie++ -d DIR -s 2000 -n 200:50000:0:100 -u root -m server
Cluster config: only one node is present in cluster
<?xml version="1.0"?>
<cluster name="alpha" config_version="2">
<cman>
</cman>
<clusternodes>
<clusternode name="ts25" votes="1" nodeid="1">
<fence>
<method name="single">
<device name="human" ipaddr="ts25"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice name="human" agent="fence_manual"/>
</fencedevices>
</cluster>
I was not able to test the latest gfs2 code, but this bug can be easily reproduced on the RHEL5 kernel.

I can still reproduce this by rm -rf'ing the kernel source and then du'ing it on another node.

Hi Steve, can you please prepare a debug patch for the 2.6.18-el5.8 kernel?

*** Bug 218984 has been marked as a duplicate of this bug. ***

O.k. I've figured out basically what's wrong here. I have a patch that even stops it. However, I don't understand this chunk of code very well, so I have no idea if my solution is the correct way to solve the problem. What is happening is: one node has a file locked exclusively, and another node requests either a shared or exclusive lock on the file. In the process of lock_dlm handling this request, it calls inode_go_sync() to make sure that everything associated with this glock is synced to disk. However, at the end of inode_go_sync(), the glock and some associated buffers are still on the incore log. Actually, the gfs2_log_flush() in inode_go_sync() removes it from the log, but the filemap_fdatawrite() adds it back.

My solution to this is to always flush the log again at the end of inode_go_sync(), instead of only if there are blocks on the glock's active items list. This does make sure that the glock is definitely out of the log, but I'm not sure that this is the most efficient solution, or even the correct place to fix the problem. I'll post the patch and keep looking into the problem to see if there's a more sensible solution. If someone else, especially Steve, could have a look at this and tell me what they think, that would be great.

Created attachment 153557 [details]
Force gfs2_ail_empty_gl to flush log
This patch makes the problem go away. However, I'm still not sure if this is
the correct way to fix it.
It's possible that the problem is just that we are flushing things in the wrong
order; in fact, for ordered writes, it looks like this is certainly the case, as
we ought to ensure that we flush the log _after_ the data has been written to
disk. So in this case inode_go_sync() should do a flush and a wait on the
data before the log flush is called, so that the problem should disappear due to
the different ordering. For writeback, we can try something which is potentially
a bit more efficient, in that we don't need to wait for the data to be written
out before the log flush, so that (pseudo code) it might look something like this:
filemap_fdatawrite();
if (ordered_data)
    filemap_fdatawait();
gfs2_log_flush();
if (writeback_data)
    filemap_fdatawait();
The other thing that I wonder about is whether in fact the call to
filemap_fdatawrite() should be adding the glock to the transaction, although I
suppose that in the ordered case it has to, otherwise we'd not get proper
flushing of all the data at the correct time.
The changes to the ordering of these operations have a lot of potential for
changing the performance, so I'd suggest trying a few postmark runs before &
after any changes just to see what difference it will make.
Just out of interest, does this bug go away if you mount with writeback rather
than ordered?
Ben, is there any more progress on this one?

Created attachment 154134 [details]
flush after filemap_fdatawrite

This fixes the problem for me. It simply moves the flush until after the inode
filemap_fdatawrite, so that the databufs from it get flushed as well.

I'll push this into -nmw just as soon as Linus pulls the current tree.

Just a heads up: I went through and ran postmark on clean filesystems with a freshly installed kernel, both with and without this patch. I was wrong about the performance improvement. Doing five runs on each, it ran faster without the patch on three, and faster with it on two. But there wasn't a huge difference either way.

Now in the upstream git tree, so it's ready to be posted to rhkernel for RHEL at this stage.

*** Bug 232010 has been marked as a duplicate of this bug. ***

I've just posted this to rhkernel-list to try and ensure that it gets into RHEL 5.1 before Don goes off on holiday.

in 2.6.18-19.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

*** Bug 235462 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html