Bug 231910
Summary: GFS2 withdraws if you are trying to remove a file while doing a du

Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Josef Bacik <jbacik>
Assignee: Don Zickus <dzickus>
QA Contact: Martin Jenner <mjenner>
CC: adas, amirkin, bmarzins, kanderso, lwang, rkenna, swhiteho
Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Last Closed: 2007-11-07 19:43:46 UTC
Description
Josef Bacik
2007-03-12 22:29:16 UTC
This looks like a problem related to a glock being demoted incorrectly. If the glock is in a transaction, the transaction must be committed before the glock is demoted. The glock's various operations in glops.c should take account of that, and it might well be a wrong assumption of the GLF_DEMOTE patch. I'll take a look at that today.

I've looked through the patch and I can't spot anything that might have the effect you are seeing. I've also traced through the code relating to xattrs and not found a lot that's enlightening there either. The check that's triggering on line 61 of lops.c is just verifying that, at the end of the transaction, the glock in question is still locked in the exclusive state. The call to getxattr is, as a "read-only" call, a bit of a red herring, I think, as it just happens to be the next place the lock is used, and the EIO returned is due to the I/O failing as a result of the earlier triggered assertion. So two questions spring to mind. Firstly, are you using SELinux on this filesystem? Secondly, can you get the assertion to print out which type of glock actually caused it to trigger? That will be a big clue in solving this.

Our team has encountered the same assertion while running a bonnie++ test on a GFS2 partition. It is 100% reproducible. I'll try to add some debug and find out what type of glock caused this assertion.

    rh5cluster1.gsslab.rdu.redhat.com login:
    gl type is 2
    GFS2: fsid=rhel5cluster:gfs2lfs2.0: fatal: assertion "gfs2_glock_is_held_excl(gl)" failed
    GFS2: fsid=rhel5cluster:gfs2lfs2.0: function = glock_lo_after_commit, file = fs/gfs2/lops.c, line = 64
    GFS2: fsid=rhel5cluster:gfs2lfs2.0: about to withdraw this file system
    GFS2: fsid=rhel5cluster:gfs2lfs2.0: telling LM to withdraw

So the gl type is LM_TYPE_INODE.
    glock type 2 state 0
    GFS2: fsid=alpha:mygfs.0: fatal: assertion "gfs2_glock_is_held_excl(gl)" failed
    GFS2: fsid=alpha:mygfs.0: function = glock_lo_after_commit, file = fs/gfs2/lops.c, line = 63
    GFS2: fsid=alpha:mygfs.0: about to withdraw this file system
    GFS2: fsid=alpha:mygfs.0: telling LM to withdraw

In our case the glock type is LM_TYPE_INODE too, and its state is LM_ST_UNLOCKED.

Created attachment 150247 [details]
A test patch to try and narrow down the cause of this bug
Here is a test patch whose aim is to gather information to help solve this
particular bug. For it to be useful, I'll need the full output once the bug
has triggered, and also, since one of the items is a pid, I'd like to know
which process it relates to. Since the main culprits here are the gfs2/dlm
daemons, a "ps" taken after mounting gfs2, but before running the test which
triggers the bug, should be almost as good as data gathered afterwards (in
case that's not possible).
I have been through all the glock and lops code as it applies to the inode
glocks looking for the cause of this bug, and it seems rather elusive. We
know (from other asserts, not triggered) that the glock was certainly in the
exclusive mode when it was added to the transaction. The aim of the patch is
to catch the act of demotion of a lock that's still in the transaction and
thus find out which process is responsible.
The patch should apply cleanly to the current GFS2 -nmw git tree which has
recently been rebased against 2.6.21-rc4.
I have had reports that in some cases, there are glocks which have a zero ref
count (which should be impossible) triggering an assert elsewhere in the code
and I have a strong suspicion that the two are linked, but this seems the
easier symptom to use to track down the problem.
Do you plan to fix this bug in the RHEL5 (2.6.18-8.el5) kernel?

It will be fixed for release 5.1.0, but it needs to (and can) make it into the upstream kernel first, or at least as far as my -nmw git tree anyway. So the answer is yes, it will be fixed, but I'm not sure whether it will be in -8 or a later one.

Steve, I hit this assert with your debug patch in place. This is with the latest code from the nmw tree. Below is the output from the debug patch, followed by 'ps' under kdb and finally the backtrace ('btp') of the active process, lock_dlm1. Let me know if you need more info.

    Glock 0xffff81007b7a28b8 (2, 5248259)
      gl_flags =
      gl_ref = 4
      gl_state = 0
      gl_owner = xdoio
      gl_ip = 18446744071696625455
      req_gh = no
      req_bh = no
      lvb_count = 0
      object = yes
      le = yes
      reclaim = no
      aspace = 0xffff810069c02040
      nrpages = 0
      ail = 0
      Inode: num = 65/5248259  type = 8  i_flags = 0
    ------------[ cut here ]------------
    kernel BUG at fs/gfs2/glock.c:123!
    invalid opcode: 0000 [1] SMP
    Entering kdb (current=0xffff81007d362280, pid 6738) on processor 1
    Oops: <NULL> due to oops @ 0xffffffff880515b4
    r15 = 0x0000000000000001  r14 = 0xffffffff88067880  r13 = 0x0000000000000000  r12 = 0xffffffff88069399
    rbp = 0x0000000000000000  rbx = 0xffff81007b7a28b8  r11 = 0x0000000000000000  r10 = 0x000000000000001f
    r9  = 0x0000000000000020  r8  = 0x000000000000000d  rax = 0x0000000000000000  rcx = 0x000000000000635c
    rdx = 0x00000000ffffffff  rsi = 0xffffffff807c3780  rdi = 0xffffffff806b9e7c  orig_rax = 0xffffffffffffffff
    rip = 0xffffffff880515b4  cs = 0x0000000000000010  eflags = 0x0000000000010246
    rsp = 0xffff810071309e00  ss = 0x0000000000000000
    &regs = 0xffff810071309d68
    [1]kdb> ps
    49 sleeping system daemon (state M) processes suppressed
    Task Addr          Pid  Parent [*] cpu State Thread             Command
    0xffff8100717b6760 7303 7302   1   0   R     0xffff8100717b6a20 xdoio
    0xffff81007d362280 6738 11     1   1   R     0xffff81007d362540 *lock_dlm1
    0xffff810037d814e0 1    0      0   1   S     0xffff810037d817a0 init
    0xffff81007f2089a0 950  11     0   1   R     0xffff81007f208c60 kjournald
    0xffff81007f3594e0 999  1      0   1   S     0xffff81007f3597a0 udevd
    0xffff81007f2130c0 5874 1      0   0   S     0xffff81007f213380 dhclient
    0xffff81007f362080 5975 1      0   0   D     0xffff81007f362340 syslogd
    0xffff81007f209080 5978 1      0   0   S     0xffff81007f209340 klogd
    0xffff81007c7b03c0 5989 1      0   0   S     0xffff81007c7b0680 irqbalance
    0xffff81007c001560 6009 1      0   0   S     0xffff81007c001820 portmap
    0xffff81007f39ef00 6029 1      0   1   S     0xffff81007f39f1c0 rpc.statd
    0xffff81007d26b760 6103 1      0   0   S     0xffff81007d26ba20 ccsd
    0xffff810002ffcf00 6104 1      0   0   S     0xffff810002ffd1c0 ccsd
    0xffff81007f362760 6109 1      0   0   S     0xffff81007f362a20 aisexec
    0xffff81007f358040 6110 1      0   1   S     0xffff81007f358300 aisexec
    0xffff81007f394ec0 6111 1      0   1   S     0xffff81007f395180 aisexec
    0xffff81007cf10860 6118 1      0   0   S     0xffff81007cf10b20 aisexec
    0xffff81007f1c2f80 6119 1      0   0   S     0xffff81007f1c3240 aisexec
    0xffff81007c2655a0 6140 1      0   1   S     0xffff81007c265860 aisexec
    0xffff81007f1c21c0 6141 1      0   0   S     0xffff81007f1c2480 aisexec
    0xffff81007c024400 6142 1      0   0   S     0xffff81007c0246c0 aisexec
    0xffff81007f244b20 6143 1      0   0   S     0xffff81007f244de0 aisexec
    0xffff81007f18e820 6144 1      0   0   S     0xffff81007f18eae0 aisexec
    0xffff81007f279520 6145 1      0   0   S     0xffff81007f2797e0 aisexec
    0xffff81007f36d560 6178 1      0   0   S     0xffff81007f36d820 aisexec
    0xffff81007f2137a0 6179 1      0   0   S     0xffff81007f213a60 aisexec
    0xffff81007f244440 6735 1      0   0   S     0xffff81007f244700 aisexec
    0xffff81007f245200 6736 1      0   0   S     0xffff81007f2454c0 aisexec
    0xffff81007cf11620 6742 1      0   0   S     0xffff81007cf118e0 aisexec
    0xffff81007d26a9a0 6744 1      0   0   S     0xffff81007d26ac60 aisexec
    0xffff81007c2ec140 6117 1      0   0   S     0xffff81007c2ec400 groupd
    0xffff81007cf10f40 6125 1      0   0   S     0xffff81007cf11200 fenced
    0xffff81007f24f4e0 6131 1      0   1   S     0xffff81007f24f7a0 dlm_controld
    0xffff81007d363040 6137 1      0   1   R     0xffff81007d363300 gfs_controld
    0xffff81007f36ce80 6157 1      0   0   S     0xffff81007f36d140 dbus-daemon
    0xffff81007f36c7a0 6168 1      0   0   S     0xffff81007f36ca60 clvmd
    0xffff810037c62240 6185 1      0   0   S     0xffff810037c62500 clvmd
    0xffff81007f358720 6186 1      0   1   S     0xffff81007f3589e0 clvmd
    0xffff81007f362e40 6278 1      0   1   S     0xffff81007f363100 pcscd
    0xffff81007f1ff720 6310 1      0   0   S     0xffff81007f1ff9e0 pcscd
    0xffff81007f1c3660 6301 1      0   0   S     0xffff81007f1c3920 automount
    0xffff81007f36c0c0 6302 1      0   0   S     0xffff81007f36c380 automount
    0xffff81007f0b0860 6303 1      0   1   S     0xffff81007f0b0b20 automount
    0xffff81007f18f5e0 6306 1      0   0   S     0xffff81007f18f8a0 automount
    0xffff81007f1c28a0 6309 1      0   0   S     0xffff81007f1c2b60 automount
    0xffff81007f1cc8e0 6320 1      0   1   S     0xffff81007f1ccba0 acpid
    0xffff81007f24e720 6330 1      0   1   S     0xffff81007f24e9e0 cupsd
    0xffff81007f278e40 6343 1      0   0   S     0xffff81007f279100 sshd
    0xffff81007f39f5e0 6354 1      0   0   S     0xffff81007f39f8a0 xinetd
    0xffff81007c000e80 6367 1      0   1   S     0xffff81007c001140 ntpd
    0xffff810037c63000 6386 1      0   0   S     0xffff810037c632c0 sendmail
    0xffff81007f2458e0 6393 1      0   1   S     0xffff81007f245ba0 sendmail
    0xffff81007c264ec0 6409 1      0   1   S     0xffff81007c265180 gpm
    0xffff81007f1cc200 6421 1      0   1   S     0xffff81007f1cc4c0 crond
    0xffff81007f358e00 6447 1      0   1   S     0xffff81007f3590c0 xfs
    0xffff81007f18ef00 6466 1      0   1   S     0xffff81007f18f1c0 atd
    0xffff81007c264100 6494 1      0   1   S     0xffff81007c2643c0 yum-updatesd
    0xffff81007f3955a0 6505 1      0   0   S     0xffff81007f395860 avahi-daemon
    0xffff81007f3947e0 6506 6505   0   0   S     0xffff81007f394aa0 avahi-daemon
    0xffff81007d2097a0 6516 1      0   0   S     0xffff81007d209a60 hald
    0xffff81007c0258a0 6517 6516   0   1   S     0xffff81007c025b60 hald-runner
    0xffff81007f209760 6523 6517   0   1   S     0xffff81007f209a20 hald-addon-acpi
    0xffff81007c2ec820 6524 6517   0   0   S     0xffff81007c2ecae0 hald-addon-cpuf
    0xffff81007c2ecf00 6530 6517   0   0   S     0xffff81007c2ed1c0 hald-addon-keyb
    0xffff81007c0000c0 6532 6517   0   1   S     0xffff81007c000380 hald-addon-keyb
    0xffff81007cf10180 6569 1      0   0   S     0xffff81007cf10440 smartd
    0xffff81007d208300 6572 1      0   1   S     0xffff81007d2085c0 agetty
    0xffff81007c0251c0 6573 1      0   1   S     0xffff81007c025480 mingetty
    0xffff81007c2647e0 6576 1      0   1   S     0xffff81007c264aa0 mingetty
    0xffff81007c7b6e40 6581 1      0   1   S     0xffff81007c7b7100 mingetty
    0xffff81007f1855a0 6582 1      0   0   S     0xffff81007f185860 mingetty
    0xffff81007f1847e0 6583 1      0   0   S     0xffff81007f184aa0 mingetty
    0xffff81007f184100 6584 1      0   0   S     0xffff81007f1843c0 mingetty
    0xffff81007cc90a60 6659 6343   0   0   S     0xffff81007cc90d20 sshd
    0xffff81007c0cd8e0 6661 6659   0   0   S     0xffff81007c0cdba0 bash
    0xffff81007d362280 6738 11     1   1   R     0xffff81007d362540 *lock_dlm1
    0xffff81007f212300 7030 6354   0   0   S     0xffff81007f2125c0 btimed
    0xffff81007f278080 7299 6354   0   1   S     0xffff81007f278340 qarshd
    0xffff81007f34f8e0 7300 7299   0   1   R     0xffff81007f34fba0 xdoio
    0xffff8100711ea720 7301 7300   0   0   R     0xffff8100711ea9e0 xdoio
    0xffff810037c62920 7302 6354   0   0   S     0xffff810037c62be0 qarshd
    0xffff8100717b6760 7303 7302   1   0   R     0xffff8100717b6a20 xdoio
    0xffff8100717b6e40 7304 7303   0   0   R     0xffff8100717b7100 xdoio
    0xffff81007f2082c0 7305 6354   0   0   S     0xffff81007f208580 qarshd
    0xffff81007f24e040 7306 7305   0   1   R     0xffff81007f24e300 xdoio
    0xffff81007f24ee00 7307 6354   0   1   S     0xffff81007f24f0c0 qarshd
    0xffff81007f39e820 7308 7306   0   1   R     0xffff81007f39eae0 xdoio
    0xffff8100717b6080 7309 7307   0   1   R     0xffff8100717b6340 xdoio
    0xffff81007c7b6760 7310 7309   0   0   R     0xffff81007c7b6a20 xdoio
    [1]kdb> btp 6738
    Stack traceback for pid 6738
    0xffff81007d362280 6738 11 1 1 R 0xffff81007d362540 *lock_dlm1
    rsp                rip                Function (args)
    0xffff810071309de8 0xffffffff880515b4 [gfs2]glock_in_wrong_state+0x42
    0xffff810071309e28 0xffffffff88052d0b [gfs2]drop_bh+0x14b
    0xffff810071309e58 0xffffffff880526e2 [gfs2]gfs2_glock_cb+0xcb
    0xffff810071309e78 0xffffffff880c0896 [lock_dlm]gdlm_thread+0x512
    0xffff810071309ea0 0xffffffff8021e1a8 default_wake_function
    0xffff810071309ef0 0xffffffff880c0384 [lock_dlm]gdlm_thread
    0xffff810071309f08 0xffffffff8023410e keventd_create_kthread
    0xffff810071309f18 0xffffffff80234365 kthread+0xd1
    0xffff810071309f48 0xffffffff8020a298 child_rip+0xa
    0xffff810071309f60 0xffffffff8023410e keventd_create_kthread
    0xffff810071309fc8 0xffffffff80234294 kthread
    0xffff810071309fd8 0xffffffff8020a28e child_rip

Did the

    print_symbol(KERN_INFO "tried to demote glock in transaction from %s\n", gl->gl_demoter);

line not print out? Also, it would be good if you could turn the gl_ip into a symbol name, as it's not much use otherwise. A btp of the xdoio process would have been useful too, as the problem has probably been caused by a combination of the two threads.

It seems that my hunch that it should be possible to trap the bug earlier was correct. The odd thing here is that the glock appears to have no pages and is unlocked, as you'd expect at this stage in a demotion. It does, however, still appear to be part of a transaction, which is unexpected, since if it has run through the demotion code (as suggested by the lack of pages) then it should have flushed the log at the same time. We do at least now know that it's caused by a demote request from a remote node and that it's an exclusive -> unlocked transition. I'll keep looking to see if I can narrow down the cause further.

Hi Steve, here's the additional info you requested, from when I hit the bug again. It has the print_symbol() output in it, plus backtraces for the lock_dlm/1 process and the xdoio processes (4 of them). I also did an id on the gl_ip, and it looks like it points to gfs2_glock_dq. Let me know if you need more info. Thanks, --Abhi

    tried to demote glock in transaction from blocking_cb+0x16/0x33 [gfs2]
    drop_bh: pid 3561
    Glock 0xcc04ccf0 (2, 5243952)
      gl_flags =
      gl_ref = 4
      gl_state = 0
      gl_owner = xdoio
      gl_ip = 3763384709
      req_gh = no
      req_bh = no
      lvb_count = 0
      object = yes
      le = yes
      reclaim = no
      aspace = 0xdbf0ecc0
      nrpages = 0
      ail = 0
      Inode: num = 41/5243952  type = 8  i_flags = 0
    ------------[ cut here ]------------
    kernel BUG at fs/gfs2/glock.c:123!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: lock_dlm gfs2 dlm configfs lpfc
    CPU: 0
    EIP: 0060:[<e0509b9b>] Not tainted VLI
    EFLAGS: 00010292 (2.6.20 #1)
    EIP is at glock_in_wrong_state+0x3c/0x40 [gfs2]
    eax: 00000000 ebx: cc04ccf0 ecx: ffffffff edx: 00000046
    esi: e0520a12 edi: cea25000 ebp: 00000000 esp: dbe8ff18
    ds: 007b es: 007b ss: 0068
    Process lock_dlm1 (pid: 3561, ti=dbe8e000 task=c343ca70 task.ti=dbe8e000)
    Stack: e05207ce e0520a12 00000de9 cc04ccf0 00000000 e050afb6 dbe8ffac cea25000
           e051f5a0 cea25000 cc04ccf0 dbe8ffac c2a559c0 e050aa7a 00000286 c012a8ee
           df724000 fffefffe dbe8ffbc e006284c 00000000 00000000 00000082 df5abf34
    Call Trace:
     [<e050afb6>] drop_bh+0x118/0x132 [gfs2]
     [<e050aa7a>] gfs2_glock_cb+0xb1/0x116 [gfs2]
     [<c012a8ee>] remove_wait_queue+0x31/0x35
     [<e006284c>] gdlm_thread+0x5af/0x5fc [lock_dlm]
     [<c0116336>] default_wake_function+0x0/0xd
     [<e006229d>] gdlm_thread+0x0/0x5fc [lock_dlm]
     [<c012a6a1>] kthread+0xb0/0xdb
     [<c012a5f1>] kthread+0x0/0xdb
     [<c0104253>] kernel_thread_helper+0x7/0x10
    =======================
    Code: e8 33 9d c2 df 8b 43 34 8b 80 a4 00 00 00 89 74 24 04 c7 04 24 ce 07 52 e0 89 44 24 08 e8 06 15 c1 df 89 da 31 c0 e8 47 fb ff ff <0f> 0b eb fe 55 89 c5 57 89 d7 56 53 e8 84 f6 ff ff 89 c3 89 c6
    EIP: [<e0509b9b>] glock_in_wrong_state+0x3c/0x40 [gfs2] SS:ESP 0068:dbe8ff18
    Entering kdb (current=0xc343ca70, pid 3561) on processor 0
    Oops: invalid opcode due to oops @ 0xe0509b9b
    eax = 0x00000000  ebx = 0xcc04ccf0  ecx = 0xffffffff  edx = 0x00000046
    esi = 0xe0520a12  edi = 0xcea25000  esp = 0xdbe8ff18  eip = 0xe0509b9b
    ebp = 0x00000000  xss = 0xc0380068  xcs = 0x00000060  eflags = 0x00010292
    xds = 0x0050007b  xes = 0x0000007b  origeax = 0xffffffff  &regs = 0xdbe8fee0
    [0]kdb> ps
    37 sleeping system daemon (state M) processes suppressed
    Task Addr  Pid  Parent [*] cpu State Thread     Command
    0xc343ca70 3561 7      1   0   R     0xc343cc20 *lock_dlm1
    0xdf586a70 1    0      0   0   S     0xdf586c20 init
    0xdf50b550 960  1      0   0   S     0xdf50b700 udevd
    0xdf263550 2658 1      0   0   S     0xdf263700 dhclient
    0xdf6aa030 2759 1      0   0   S     0xdf6aa1e0 syslogd
    0xdf69b550 2762 1      0   0   R     0xdf69b700 klogd
    0xdf407030 2792 1      0   0   S     0xdf4071e0 portmap
    0xdf6a6030 2812 1      0   0   S     0xdf6a61e0 rpc.statd
    0xdf57d550 2882 1      0   0   S     0xdf57d700 ccsd
    0xdf6b0a70 2883 1      0   0   S     0xdf6b0c20 ccsd
    0xc14ea030 2888 1      0   0   R     0xc14ea1e0 aisexec
    0xdf50ba70 2889 1      0   0   S     0xdf50bc20 aisexec
    0xdf407a70 2920 1      0   0   S     0xdf407c20 aisexec
    0xdf50b030 2927 1      0   0   S     0xdf50b1e0 aisexec
    0xdec5c550 2928 1      0   0   S     0xdec5c700 aisexec
    0xc14ea550 2947 1      0   0   S     0xc14ea700 aisexec
    0xc1485550 2948 1      0   0   S     0xc1485700 aisexec
    0xdf695a70 2949 1      0   0   S     0xdf695c20 aisexec
    0xc1480a70 2950 1      0   0   S     0xc1480c20 aisexec
    0xdf595550 2953 1      0   0   S     0xdf595700 aisexec
    0xc1480550 2954 1      0   0   S     0xc1480700 aisexec
    0xdf40b030 2983 1      0   0   S     0xdf40b1e0 aisexec
    0xdf6aa550 2984 1      0   0   S     0xdf6aa700 aisexec
    0xcea21a70 3556 1      0   0   S     0xcea21c20 aisexec
    0xcea21550 3557 1      0   0   S     0xcea21700 aisexec
    0xd0db5030 3570 1      0   0   S     0xd0db51e0 aisexec
    0xd0dbfa70 3571 1      0   0   S     0xd0dbfc20 aisexec
    0xdf595030 2926 1      0   0   S     0xdf5951e0 groupd
    0xdec5ca70 2934 1      0   0   S     0xdec5cc20 fenced
    0xc1485a70 2940 1      0   0   S     0xc1485c20 dlm_controld
    0xdf695550 2946 1      0   0   S     0xdf695700 gfs_controld
    0xdf57da70 2966 1      0   0   S     0xdf57dc20 dbus-daemon
    0xdf6f5030 2977 1      0   0   S     0xdf6f51e0 clvmd
    0xdf263030 2993 1      0   0   S     0xdf2631e0 clvmd
    0xdf263a70 2994 1      0   0   S     0xdf263c20 clvmd
    0xdf69b030 3081 1      0   0   S     0xdf69b1e0 pcscd
    0xc4b8c550 3086 1      0   0   S     0xc4b8c700 pcscd
    0xd6ea2550 3108 1      0   0   S     0xd6ea2700 automount
    0xdec5ba70 3109 1      0   0   S     0xdec5bc20 automount
    0xd6850a70 3110 1      0   0   S     0xd6850c20 automount
    0xd6ea2a70 3113 1      0   0   S     0xd6ea2c20 automount
    0xd6ea2030 3116 1      0   0   S     0xd6ea21e0 automount
    0xcd7cea70 3126 1      0   0   S     0xcd7cec20 acpid
    0xc69dfa70 3136 1      0   0   S     0xc69dfc20 cupsd
    0xcd7ce030 3149 1      0   0   S     0xcd7ce1e0 sshd
    0xdd688030 3160 1      0   0   S     0xdd6881e0 xinetd
    0xc69df550 3173 1      0   0   S     0xc69df700 ntpd
    0xc343c030 3193 1      0   0   S     0xc343c1e0 sendmail
    0xc343c550 3201 1      0   0   S     0xc343c700 sendmail
    0xdcc23a70 3220 1      0   0   S     0xdcc23c20 gpm
    0xdcc2f030 3246 1      0   0   S     0xdcc2f1e0 crond
    0xd8cfc550 3270 1      0   0   S     0xd8cfc700 xfs
    0xdcc23550 3289 1      0   0   S     0xdcc23700 atd
    0xdcc23030 3301 1      0   0   S     0xdcc231e0 yum-updatesd
    0xde941550 3312 1      0   0   S     0xde941700 avahi-daemon
    0xde941030 3313 3312   0   0   S     0xde9411e0 avahi-daemon
    0xcd7ce550 3323 1      0   0   S     0xcd7ce700 hald
    0xc69df030 3324 3323   0   0   S     0xc69df1e0 hald-runner
    0xdcc2fa70 3330 3324   0   0   S     0xdcc2fc20 hald-addon-acpi
    0xd8cfca70 3337 3324   0   0   S     0xd8cfcc20 hald-addon-keyb
    0xdd688a70 3339 3324   0   0   S     0xdd688c20 hald-addon-keyb
    0xd8cfc030 3348 3324   0   0   S     0xd8cfc1e0 hald-addon-stor
    0xdd688550 3390 1      0   0   S     0xdd688700 smartd
    0xdf6f2030 3393 1      0   0   S     0xdf6f21e0 agetty
    0xd0d31550 3394 1      0   0   S     0xd0d31700 mingetty
    0xde941a70 3395 1      0   0   S     0xde941c20 mingetty
    0xdcc2f550 3396 1      0   0   S     0xdcc2f700 mingetty
    0xd3f97a70 3397 1      0   0   S     0xd3f97c20 mingetty
    0xd0d31030 3398 1      0   0   S     0xd0d311e0 mingetty
    0xd3f97550 3399 1      0   0   S     0xd3f97700 mingetty
    0xd0d31a70 3464 3149   0   0   S     0xd0d31c20 sshd
    0xd3f97030 3468 3464   0   0   S     0xd3f971e0 bash
    0xc343ca70 3561 7      1   0   R     0xc343cc20 *lock_dlm1
    0xccffca70 3562 7      0   0   R     0xccffcc20 lock_dlm2
    0xd6850550 3591 3160   0   0   S     0xd6850700 btimed
    0xc4b8c030 3664 3160   0   0   S     0xc4b8c1e0 qarshd
    0xc4b8ca70 3665 3664   0   0   R     0xc4b8cc20 xdoio
    0xdf6aaa70 3666 3665   0   0   R     0xdf6aac20 xdoio
    0xd6850030 3671 3160   0   0   S     0xd68501e0 qarshd
    0xdf69ba70 3672 3671   0   0   R     0xdf69bc20 xdoio
    0xd0db5550 3673 3672   0   0   R     0xd0db5700 xdoio
    [0]kdb> btp 3561
    Stack traceback for pid 3561
    0xc343ca70 3561 7 1 0 R 0xc343cc20 *lock_dlm1
    esp        eip        Function (args)
    0xdbe8ff0c 0xe0509b9b [gfs2]glock_in_wrong_state+0x3c
    0xdbe8ff2c 0xe050afb6 [gfs2]drop_bh+0x118
    0xdbe8ff4c 0xe050aa7a [gfs2]gfs2_glock_cb+0xb1
    0xdbe8ff54 0xc012a8ee remove_wait_queue+0x31
    0xdbe8ff64 0xe006284c [lock_dlm]gdlm_thread+0x5af
    0xdbe8ffa0 0xc0116336 default_wake_function
    0xdbe8ffc4 0xe006229d [lock_dlm]gdlm_thread
    0xdbe8ffcc 0xc012a6a1 kthread+0xb0
    0xdbe8ffd8 0xc012a5f1 kthread
    0xdbe8ffe4 0xc0104253 kernel_thread_helper+0x7
    [0]kdb> btp 3665
    Stack traceback for pid 3665
    0xc4b8ca70 3665 3664 0 0 R 0xc4b8cc20 xdoio
    esp        eip        Function (args)
    0xcc063f40 0xc0410033 __sched_text_start+0x863
    0xcc063f60 0xc039a83b net_rx_action+0x94
    0xcc063f98 0xc011ea32 irq_exit+0x35
    0xcc063f9c 0xc01055e3 do_IRQ+0x87
    0xcc063fb4 0xc010367e work_resched+0x5
    [0]kdb> btp 3666
    Stack traceback for pid 3666
    0xdf6aaa70 3666 3665 0 0 R 0xdf6aac20 xdoio
    esp        eip        Function (args)
    0xcc049f30 0xc0410033 __sched_text_start+0x863
    0xcc049f48 0xc015c35c locks_free_lock+0x3a
    0xcc049f50 0xc015d729 fcntl_setlk64+0x1c4
    0xcc049fa4 0xc01163ed sys_sched_yield+0x59
    0xcc049fb4 0xc01035a0 sysenter_past_esp+0x5d
    [0]kdb> btp 3672
    Stack traceback for pid 3672
    0xdf69ba70 3672 3671 0 0 R 0xdf69bc20 xdoio
    esp        eip        Function (args)
    0xc7cb9f40 0xc0410033 __sched_text_start+0x863
    0xc7cb9f64 0xc0121bab run_timer_softirq+0x36
    0xc7cb9fb4 0xc010367e work_resched+0x5
    [0]kdb> btp 3673
    Stack traceback for pid 3673
    0xd0db5550 3673 3672 0 0 R 0xd0db5700 xdoio
    esp        eip        Function (args)
    0xdbf53f30 0xc0410033 __sched_text_start+0x863
    0xdbf53f48 0xc015c35c locks_free_lock+0x3a
    0xdbf53f50 0xc015d729 fcntl_setlk64+0x1c4
    0xdbf53fa4 0xc01163ed sys_sched_yield+0x59
    0xdbf53fb4 0xc01035a0 sysenter_past_esp+0x5d
    0xdbf53fdc 0xc0410033 __sched_text_start+0x863

gl_ip = 3763384709 = 0xe050b185

    [0]kdb> id 0xe050b185
    0xe050b185 gfs2_glock_dq+0x22:  lea 0x1c(%ebx),%edi
    0xe050b188 gfs2_glock_dq+0x25:  mov %edi,%eax
    0xe050b18a gfs2_glock_dq+0x27:  call 0xc041151e _spin_lock
    0xe050b18f gfs2_glock_dq+0x2c:  mov 0x4(%esi),%eax
    0xe050b192 gfs2_glock_dq+0x2f:  mov (%esi),%edx
    0xe050b194 gfs2_glock_dq+0x31:  mov %edx,(%eax)
    0xe050b196 gfs2_glock_dq+0x33:  mov %eax,0x4(%edx)
    0xe050b199 gfs2_glock_dq+0x36:  lea 0x40(%ebx),%eax
    0xe050b19c gfs2_glock_dq+0x39:  mov %esi,(%esi)
    0xe050b19e gfs2_glock_dq+0x3b:  cmp %eax,0x40(%ebx)
    0xe050b1a1 gfs2_glock_dq+0x3e:  mov %esi,0x4(%esi)
    0xe050b1a4 gfs2_glock_dq+0x41:  jne 0xe050b1c4 gfs2_glock_dq+0x61
    0xe050b1a6 gfs2_glock_dq+0x43:  movb $0x1,0x1c(%ebx)
    0xe050b1aa gfs2_glock_dq+0x47:  mov 0x18(%ebp),%edx
    0xe050b1ad gfs2_glock_dq+0x4a:  test %edx,%edx
    0xe050b1af gfs2_glock_dq+0x4c:  je 0xe050b1b5 gfs2_glock_dq+0x52

Created attachment 151413 [details]
An updated test patch
Sorry, it looks like there was a race in the previous patch I posted. Here is
an updated version which fixes this problem.
I've had a good look through the results gathered so far (which may well
still be correct, though it seems best to be certain) and I still can't see
what is going on here.
I have been totally unable to recreate this issue. I've tried various types of bonnie++ runs, as well as du's in combination with rm's. Are people still able to reproduce this on the latest gfs2 code? If so, could someone please put a list of the commands necessary to reproduce it into the bugzilla, along with information about their setup.

bonnie++ version: 1.02
Command line: bonnie++ -d DIR -s 2000 -n 200:50000:0:100 -u root -m server
Cluster config (only one node is present in the cluster):

    <?xml version="1.0"?>
    <cluster name="alpha" config_version="2">
      <cman>
      </cman>
      <clusternodes>
        <clusternode name="ts25" votes="1" nodeid="1">
          <fence>
            <method name="single">
              <device name="human" ipaddr="ts25"/>
            </method>
          </fence>
        </clusternode>
      </clusternodes>
      <fencedevices>
        <fencedevice name="human" agent="fence_manual"/>
      </fencedevices>
    </cluster>

I was not able to test the latest gfs2 code, but this bug can be easily reproduced on the RHEL5 kernel.

I can still reproduce this by rm -rf'ing the kernel source and then du'ing on another node.

Hi Steve, can you please prepare a debug patch for the 2.6.18-el5.8 kernel?

*** Bug 218984 has been marked as a duplicate of this bug. ***

O.k. I've figured out basically what's wrong here. I have a patch that even stops it. However, I don't understand this chunk of code very well, so I have no idea if my solution is the correct way to solve the problem. What is happening is: one node has a file locked exclusively, and another node requests either a shared or exclusive lock on the file. In the process of lock_dlm handling this request, it calls inode_go_sync() to make sure that everything associated with this glock is synced to disk. However, at the end of inode_go_sync(), the glock and some associated buffers are still on the incore log. Actually, the gfs2_log_flush() in inode_go_sync() removes it from the log, but the filemap_fdatawrite() adds it back.
My solution is to always flush the log again at the end of inode_go_sync(), instead of only when there are blocks on the glock's active items list. This does make sure that the glock is definitely out of the log, but I'm not sure that it is the most efficient solution, or even the correct place to fix the problem. I'll post the patch and keep looking into the problem to see if there's a more sensible solution. If someone else, especially Steve, could have a look at this and tell me what they think, that would be great.

Created attachment 153557 [details]
Force gfs2_ail_empty_gl to flush log
This patch makes the problem go away. However, I'm still not sure if this is
the correct way to fix it.
It's possible that the problem is just that we are flushing things in the wrong order. In fact, for ordered writes it looks like this is certainly the case, as we ought to ensure that we flush the log _after_ the data has been written to disk. So in that case inode_go_sync() should do a flush and a wait on the data before the log flush is called, and the problem should disappear due to the different ordering. For writeback, we can try something potentially a bit more efficient, in that we don't need to wait for the data to be written out before the log flush, so that (pseudo code) it might look something like this:

    filemap_fdatawrite();
    if (ordered_data)
        filemap_fdatawait();
    gfs2_log_flush();
    if (writeback_data)
        filemap_fdatawait();

The other thing I wonder about is whether the call to filemap_fdatawrite() should be adding the glock to the transaction at all, although I suppose that in the ordered case it has to, otherwise we'd not get proper flushing of all the data at the correct time. The changes to the ordering of these operations have a lot of potential to change performance, so I'd suggest trying a few postmark runs before and after any changes just to see what difference they make. Just out of interest, does this bug go away if you mount with writeback rather than ordered?

Ben, is there any more progress on this one?

Created attachment 154134 [details]
flush after filemap_fdatawrite
This fixes the problem for me. It simply moves the flush until after the inode
filemap_fdatawrite, so that the databufs from it get flushed as well.
I'll push this into -nmw just as soon as Linus pulls the current tree.

Just a heads up: I went through and ran postmark on clean filesystems with a freshly installed kernel, both with and without this patch. I was wrong about the performance improvement. Doing five runs of each, it ran faster without the patch on three and faster with it on two, but there wasn't a huge difference either way.

Now in the upstream git tree, so it's ready to be posted to rhkernel for RHEL at this stage.

*** Bug 232010 has been marked as a duplicate of this bug. ***

I've just posted this to rhkernel-list to try and ensure that it gets into RHEL 5.1 before Don goes off on holiday.

in 2.6.18-19.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

*** Bug 235462 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html