Description of problem:
Hi, I'm trying to use GFS2 with the CTDB service and have run into a problem with fcntl() locks. An smbd process ends up in D state; this happens every time I run "smbstatus" on the cluster. What I see in log.ctdb is:

2009/05/25 17:54:11.251152 [ 7948]: server/ctdb_traverse.c:231 Traverse all timeout on database:locking.tdb

Version-Release number of selected component (if applicable):
kmod-gfs2-1.92-1.1.el5_2.2
kmod-gfs2-1.92-1.1.el5
gfs2-utils-0.1.53-1.el5_3.3

How reproducible:
Run smbstatus on a working CTDB + GFS2 cluster.

Actual results:
Here is some other info that may help:

[root@aramis ~]# ps ax | grep DN
25687 pts/2 S+ 0:00 grep DN
32187 ? DN 0:16 smbd
[root@aramis ~]# grep 32187 /proc/locks
13: POSIX ADVISORY WRITE 32187 fd:02:262155 10172 10172
14: FLOCK MSNFS READ 32187 fd:18:2845324 0 EOF
[root@aramis ~]# ps fax | grep ctdb
28805 pts/0 S+ 0:00 \_ grep ctdb
 7948 ? Ss 3:32 ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
 7950 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
 5551 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
 6170 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
 7245 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
16438 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
 8932 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
 4817 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
 6723 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
 7884 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
10697 ? S 0:00 \_ ctdbd --reclock=/fs_dlm/ctdb/ctdb.lock --public-addresses=/etc/ctdb/public_addresses --public-interface=bond1 -d 2 --notification-script=/etc/ctdb/notify.sh
[root@aramis ~]# strace -ttT -f -p 10697
Process 10697 attached - interrupt to quit
18:37:32.637990 fcntl(26, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=10172, len=1}
[1]+ Stopped strace -ttT -f -p 10697
[root@aramis ~]# strace -ttT -f -p 7884
Process 7884 attached - interrupt to quit
18:37:44.455618 fcntl(26, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=10172, len=1}
[2]+ Stopped strace -ttT -f -p 7884
[root@aramis ~]# strace -ttT -f -p 6723
Process 6723 attached - interrupt to quit
18:37:53.686407 fcntl(26, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=10172, len=1}
[3]+ Stopped strace -ttT -f -p 6723
[root@aramis ~]# strace -ttT -f -p 4817
Process 4817 attached - interrupt to quit
18:38:02.395082 fcntl(26, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=10172, len=1}
[4]+ Stopped strace -ttT -f -p 4817

Additional info:
Here is my smbstatus output, followed by a tail of the ctdb log:

Samba version 3.3.4
PID       Username     Group     Machine
-------------------------------------------------------------------
2:17791   fernando     usuarios  fernandonb    (192.168.4.181)
3:31736   alisson      usuarios  alissonpc     (192.168.4.166)
2:17442   fabiane      usuarios  fabianepc     (192.168.5.182)
3:15669   rodrigoc     usuarios  rodrigocpc    (192.168.5.67)
3:21737   renan        usuarios  renanpc       (192.168.4.191)
3:5579    clovis       usuarios  clovispc      (192.168.5.91)
3:28810   lacerda      usuarios  lacerdapc     (192.168.5.111)
2:13151   cristina     usuarios  giovannipc1   (192.168.4.221)
3:6308    fmoraes      usuarios  fmoraespc     (192.168.5.82)
2:15174   rosa         usuarios  rosapc        (192.168.5.47)
2:32187   canuto       usuarios  canutopc      (192.168.4.144)
3:9593    sergio       usuarios  sergiopc      (192.168.5.95)
2:23807   giovanni     usuarios  giovannipc    (192.168.4.224)
2:19120   flaviana     usuarios  flaviana10pc  (192.168.5.157)
3:32555   joaquim      usuarios  joaquimpc     (192.168.5.115)
3:24506   henrique     usuarios  henriquenb    (192.168.5.49)
0:15360   soraia       usuarios  soraiapc      (192.168.5.124)
2:11541   amarildo     usuarios  amarildomaq   (192.168.4.254)
3:2548    andrei       usuarios  andreipc      (192.168.5.33)
2:11445   kraus        usuarios  suportepc     (192.168.4.210)
0:16921   virginia     usuarios  virginiapc    (192.168.4.250)
0:11480   frederico    usuarios  fredericopc   (192.168.4.234)
0:6751    cicero       usuarios  ciceropc      (192.168.5.104)
0:15361   cristiano    usuarios  cristiano02pc (192.168.4.146)
0:29084   vargas       usuarios  repro01pc     (192.168.4.164)
2:1938    vanessa      usuarios  vanessapc     (192.168.5.107)
0:14708   leocampos    usuarios  leocampospc   (192.168.4.142)
1:2949    canuto       usuarios  canutopc      (192.168.4.144)
1:20740   gean         usuarios  geanpc        (192.168.5.123)
1:22698   vanessa      usuarios  vanessapc     (192.168.5.107)
1:11725   roni         usuarios  roni11pc      (192.168.4.251)
1:18545   marcia       usuarios  marciapc      (192.168.5.227)
1:20373   marceloluiz  usuarios  marceloluizpc (192.168.5.173)
1:18546   silvio       usuarios  silvionb      (192.168.4.172)
1:19669   claudia      usuarios  claudiapc     (192.168.4.233)
1:25232   andreza      usuarios  andrezapc     (192.168.5.174)
1:16812   eduardo      usuarios  eduardopc     (192.168.5.251)
1:20520   rodrigob     usuarios  rodrigobpc    (192.168.5.253)
1:9237    rodrigob     usuarios  rodrigobpc    (192.168.5.253)
1:10847   flaviocj     usuarios  flaviocj2pc   (192.168.5.214)
1:18660   pbrentan     usuarios  pbrentanmaq   (192.168.4.131)

Service    pid      machine        Connected at
-------------------------------------------------------
troca      1:18545  marciapc       Mon May 25 16:08:46 2009
TROCA      0:6751   ciceropc       Mon May 25 17:49:09 2009
troca      1:18546  silvionb       Mon May 25 16:08:46 2009
troca      0:16921  virginiapc     Mon May 25 18:07:19 2009
kraus      2:11445  suportepc      Mon May 25 16:08:48 2009
troca      1:10847  flaviocj2pc    Mon May 25 16:53:03 2009
docs       2:1938   vanessapc      Mon May 25 16:50:28 2009
troca      0:15360  soraiapc       Mon May 25 16:08:44 2009
cristiano  0:15361  cristiano02pc  Mon May 25 17:11:18 2009
troca      1:20373  marceloluizpc  Mon May 25 16:12:04 2009
troca      0:15361  cristiano02pc  Mon May 25 16:08:44 2009
util       1:22698  vanessapc      Mon May 25 16:47:06 2009
DOCS       1:11725  roni11pc       Mon May 25 18:51:44 2009
RONI       1:11725  roni11pc       Mon May 25 18:51:31 2009
TROCA      1:11725  roni11pc       Mon May 25 18:51:52 2009
UTIL       1:11725  roni11pc       Mon May 25 18:51:37 2009
renan      3:21737  renanpc        Mon May 25 18:28:04 2009
TROCA      2:32187  canutopc       Mon May 25 08:07:28 2009
troca      1:19669  claudiapc      Mon May 25 16:10:49 2009
troca      1:25232  andrezapc      Mon May 25 16:20:51 2009
troca      2:1938   vanessapc      Mon May 25 16:50:28 2009
IPC$       1:2949   canutopc       Mon May 25 15:45:36 2009
troca      1:9237   rodrigobpc     Mon May 25 17:48:30 2009
eduardo    1:16812  eduardopc      Mon May 25 18:02:15 2009
docs       1:22698  vanessapc      Mon May 25 16:47:06 2009
troca      3:31736  alissonpc      Mon May 25 16:08:44 2009
util       2:1938   vanessapc      Mon May 25 16:50:28 2009
CANUTO     2:32187  canutopc       Mon May 25 08:07:26 2009
util       3:2548   andreipc       Mon May 25 17:20:29 2009
troca      3:32555  joaquimpc      Mon May 25 16:46:44 2009
IPC$       3:9593   sergiopc       Mon May 25 17:02:36 2009
vanessa    2:1938   vanessapc      Mon May 25 16:50:28 2009
util       2:11445  suportepc      Mon May 25 16:08:48 2009
util       1:20740  geanpc         Mon May 25 16:12:43 2009
IPC$       1:11725  roni11pc       Mon May 25 18:51:52 2009
silvio     1:18546  silvionb       Mon May 25 16:08:44 2009
claudia    1:19669  claudiapc      Mon May 25 16:10:50 2009
henrique   3:24506  henriquenb     Mon May 25 17:52:10 2009
canuto     1:2949   canutopc       Mon May 25 15:42:19 2009
fmoraes    3:6308   fmoraespc      Mon May 25 16:21:21 2009
CICERO     0:6751   ciceropc       Mon May 25 17:49:02 2009
joaquim    3:32555  joaquimpc      Mon May 25 16:10:16 2009
soraia     0:15360  soraiapc       Mon May 25 16:08:44 2009
IPC$       1:20520  rodrigobpc     Mon May 25 17:10:48 2009
troca      2:11541  amarildomaq    Mon May 25 16:08:53 2009
troca      3:6308   fmoraespc      Mon May 25 16:21:21 2009
leocampos  0:14708  leocampospc    Mon May 25 16:09:09 2009
troca      3:21737  renanpc        Mon May 25 17:47:09 2009
alisson    3:31736  alissonpc      Mon May 25 16:11:28 2009
cristina   2:13151  giovannipc1    Mon May 25 17:10:16 2009
util       1:18660  pbrentanmaq    Mon May 25 16:08:59 2009
troca      3:5579   clovispc       Mon May 25 17:18:25 2009
TROCA      2:13151  giovannipc1    Mon May 25 17:10:17 2009
troca      2:17442  fabianepc      Mon May 25 18:16:20 2009
IPC$       2:32187  canutopc       Mon May 25 15:41:01 2009
frederico  0:11480  fredericopc    Mon May 25 17:57:22 2009
andrei     3:2548   andreipc       Mon May 25 17:20:29 2009
ROSA       2:15174  rosapc         Mon May 25 16:15:34 2009
docs       0:15361  cristiano02pc  Mon May 25 17:11:18 2009
TROCA      3:28810  lacerdapc      Mon May 25 18:00:04 2009
troca      1:20520  rodrigobpc     Mon May 25 17:10:38 2009
troca      1:18660  pbrentanmaq    Mon May 25 16:08:59 2009
troca      1:2949   canutopc       Mon May 25 15:42:17 2009
sergio     3:9593   sergiopc       Mon May 25 16:51:12 2009
IPC$       2:32187  canutopc       Mon May 25 15:41:01 2009
troca      2:17791  fernandonb     Mon May 25 16:20:16 2009
troca      2:23807  giovannipc     Mon May 25 16:31:20 2009
IPC$       1:2949   canutopc       Mon May 25 15:45:36 2009
pbrentan   1:18660  pbrentanmaq    Mon May 25 16:08:58 2009
troca      1:22698  vanessapc      Mon May 25 16:16:17 2009
troca      1:20740  geanpc         Mon May 25 18:11:47 2009
docs       3:2548   andreipc       Mon May 25 17:20:29 2009
troca      0:29084  repro01pc      Mon May 25 16:33:35 2009
TROCA      2:19120  flaviana10pc   Mon May 25 17:21:04 2009
LACERDA    3:28810  lacerdapc      Mon May 25 18:00:04 2009
troca      3:2548   andreipc       Mon May 25 17:12:54 2009
docs       2:11445  suportepc      Mon May 25 16:08:48 2009
vanessa    1:22698  vanessapc      Mon May 25 16:23:13 2009
rodrigoc   3:15669  rodrigocpc     Mon May 25 16:37:52 2009
marcia     1:18545  marciapc       Mon May 25 16:08:44 2009
troca      2:11445  suportepc      Mon May 25 16:08:47 2009
amarildo   2:11541  amarildomaq    Mon May 25 16:08:53 2009
troca      0:14708  leocampospc    Mon May 25 16:08:25 2009
troca      3:9593   sergiopc       Mon May 25 16:26:58 2009
util       0:15361  cristiano02pc  Mon May 25 16:08:44 2009
IPC$       1:9237   rodrigobpc     Mon May 25 17:48:35 2009

Locked files:
Pid      Uid   DenyMode    Access    R/W     Oplock           SharePath      Name            Time
--------------------------------------------------------------------------------------------------
1:22698  1688  DENY_NONE   0x100081  RDONLY  NONE             /home/vanessa  .               Mon May 25 16:23:14 2009
1:22698  1688  DENY_NONE   0x100081  RDONLY  NONE             /home/vanessa  .               Mon May 25 16:24:35 2009
1:2949   1551  DENY_NONE   0x20089   RDONLY  EXCLUSIVE+BATCH  /home/canuto   casa/bloco.dwg  Mon May 25 15:45:36 2009
3:6308   581   DENY_NONE   0x100081  RDONLY  NONE             /geral/troca   fmoraes         Mon May 25 16:21:38 2009
3:31736  1026  DENY_NONE   0x100081  RDONLY  NONE             /geral/troca   alisson         Mon May 25 17:09:33 2009
2:32187  1551  DENY_NONE   0x100081  RDONLY  NONE             /geral/troca   canuto          Mon May 25 15:40:44 2009
3:32555  1707  DENY_NONE   0x100081  RDONLY  NONE             /home/joaquim  .               Mon May 25 18:26:08 2009
3:32555  1707  DENY_NONE   0x100081  RDONLY  NONE             /home/joaquim  .               Mon May 25 18:26:08 2009
1:2949   1551  DENY_WRITE  0x20089   RDONLY  EXCLUSIVE+BATCH  /geral/troca   canuto/meu.dgn  Mon May 25 15:45:36 2009
No locked files

[root@aramis ~]# tail -1 /var/log/log.ctdb
2009/05/25 18:52:19.146643 [ 7948]: server/ctdb_traverse.c:231 Traverse all timeout on database:locking.tdb
I doubt that smbstatus triggers the hanging smbd. Rather, it uncovers an existing problem: smbstatus tries to walk the locks and cannot get past the lock held by the hanging smbd. That smbd seems to be attempting an operation that fails while it holds the lock.

Michael
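The blocking behaviour described above can be reproduced outside Samba. What follows is a minimal sketch in Python, assuming a Linux host; `probe_read_lock` is a hypothetical helper, not part of Samba or CTDB. It requests the same kind of one-byte read lock seen in the strace output (F_RDLCK at start=10172, len=1), but non-blocking, so the caller gets an answer instead of hanging behind a write lock held by another process.

```python
import errno
import fcntl
import os


def probe_read_lock(path, offset, length=1):
    """Return True if a shared POSIX lock on the byte range can be taken
    right now, False if another process holds a conflicting lock.

    Hypothetical diagnostic helper: the lock is dropped again before
    returning, so a successful probe leaves nothing behind.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # Non-blocking counterpart of the F_SETLKW/F_RDLCK call in the trace.
            fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB,
                        length, offset, os.SEEK_SET)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False  # a write lock on this range is held elsewhere
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
        return True
    finally:
        # Caveat: closing the fd drops every POSIX lock this process holds
        # on the file, so only call this from a process that holds none.
        os.close(fd)
```

A call such as `probe_read_lock(path, 10172)` would use the offset from the /proc/locks output above; which file to pass is an assumption that has to be checked against the device and inode numbers shown there.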
Created attachment 345533 [details]
strace, sysrq-t, gfs2_tool lockdumps

Here is what the package contains.

On one node:

# strace -tt -s 256 -v -o /tmp/strace-$(uname -n).out smbstatus

where smbstatus is the command I've been using to trigger the hang. While it is hung, I run this on every node:

# echo 't' > /proc/sysrq-trigger
# gfs2_tool lockdump <mountpoint> > /tmp/lockdump-<mountpoint>-$(uname -n).out
# echo 't' > /proc/sysrq-trigger

Attached are:
- strace*.out
- /var/log/messages
- lockdump-geral-*.out

If there is anything more that would help, please let me know.
Created attachment 345534 [details] output of net conf list = samba conf
OK, I've noticed that with the attached Samba config the GFS2 distributed locking wasn't working correctly, as you can see in this paste:

[root@athos samba-3.3.4]# smbstatus | grep cap-04
3:3863 1522 DENY_WRITE 0x2019f RDWR NONE /geral/troca Backup-Recuperados/juliana/cap-04.odt Wed May 27 12:32:46 2009
0:27752 1442 DENY_WRITE 0x2019f RDWR NONE /geral/troca Backup-Recuperados/juliana/cap-04.odt Wed May 27 12:32:09 2009

The same file is open on two different nodes, and both opens are RDWR with DENY_WRITE. The Samba community advised me to use this in smb.conf:

fileid:algorithm = fsname
vfs objects = fileid

This solves THAT (oplocks, dlm) problem. I don't know whether it is related to GFS2 or whether it will turn out to be the solution to this bug, but I thought it could help.
Again, the Samba community guided me, this time to disable the flock() calls in smbd (smbd/open.c) and recompile it. So far everything is working as expected, without zombie processes, but the system has only been running for a few hours. This seems to be a WORKAROUND for the issue. I'll wait a few days before posting it as a workaround.
I don't think I understand the issue from the above reports. Can you state clearly what the problem is? There is nothing in the strace attached to indicate any issue with flock that I can see. I didn't think that smb used flock anyway.
Hi Steve, Well, the problem is that I get a zombie smbd process after a few hours of using Samba+CTDB with GFS2. I can't reproduce the problem right now, because I've patched smbd/open.c to test the workaround proposed by a member of the Samba team. IIRC the flock() call in open.c is only there for use with GPFS, and disabling it should not hurt when using another filesystem. A filtered output of "echo 't' > /proc/sysrq-trigger", taken while a zombie smbd process was running, can be viewed at this paste: http://pastebin.ca/1436324 If you need more info, just let me know and I'll try to provide it. As the servers are in production use I can't experiment much with them, but I'll try to set up an equivalent environment to reproduce the problem.
(In reply to comment #6)
> I don't think I understand the issue from the above reports. Can you state
> clearly what the problem is? There is nothing in the strace attached to
> indicate any issue with flock that I can see. I didn't think that smb used
> flock anyway.

Samba normally does not use flock(). If I recall correctly, this flock is a GPFS-specific optimization that has to do with share modes. It's not strictly required, but it seems to show we have a problem with gfs2 if a process gets stuck in D state when using it.
Hi folks, as promised, here is the patch that comments out the flock() call in smbd/open.c for samba-3.3.4. It follows as plain text, and I'll attach it too.

diff -Nur samba-3.3.4.orig/source/smbd/open.c samba-3.3.4.noflock/source/smbd/open.c
--- samba-3.3.4.orig/source/smbd/open.c 2009-05-28 14:21:55.000000000 -0300
+++ samba-3.3.4.noflock/source/smbd/open.c 2009-05-27 11:55:59.000000000 -0300
@@ -2005,7 +2005,7 @@
     locking database for permission to set this deny mode. If
     the kernel refuses the operations then the kernel is wrong.
     note that GPFS supports it as well - jmcd */
-
+/*
     if (fsp->fh->fd != -1) {
         ret_flock = SMB_VFS_KERNEL_FLOCK(fsp, share_access);
         if(ret_flock == -1 ){
@@ -2016,7 +2016,7 @@
             return NT_STATUS_SHARING_VIOLATION;
         }
     }
-
+*/
     /*
      * At this point onwards, we can guarentee that the share entry
      * is locked, whether we created the file or not, and that the
Created attachment 345954 [details]
This is a workaround for the problem described here.

As mentioned before, this is not a solution for the bug; it is a workaround for using samba-3.3.4+ctdb with a GFS2 filesystem. All the patch does is comment out a flock() call in smbd when opening files.
Created attachment 346051 [details]
Patch to return error on LOCK_MAND

Browsing through the smbd code, I noticed that it uses LOCK_MAND with LOCK_READ, LOCK_WRITE or LOCK_RW instead of the LOCK_SH and LOCK_EX that gfs2 recognizes. GFS2's locking behaviour is undefined (probably erroneous) with LOCK_MAND and friends, and this could be the reason why smbd hangs.

According to Steve Whitehouse, this should have been caught by a check on the setgid (S_ISGID) bit of the inode in question, as a LOCK_MAND should not be issued against an inode that doesn't have this bit set. This patch removes the check against the S_ISGID bit of inode->i_mode and directly checks for LOCK_MAND in fl->fl_type.

Could you verify that this is indeed the problem? If so, this patch should gracefully return an error code instead of crashing smbd.

Thanks!
--Abhi
Hi Abhi, OK. I'll test these changes and post the results. --Flavio
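To make the described check concrete, here is a small user-space model of it, written in Python rather than kernel C. It is only an illustration: `gfs2_flock_model` is a hypothetical name, the flag values are the Linux ones for flock(2), and the -EOPNOTSUPP return value is an assumption made for the sketch; the real behaviour is whatever the attached patch implements.

```python
import errno

# Linux flock(2) operation bits.
LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN = 1, 2, 4, 8
LOCK_MAND, LOCK_READ, LOCK_WRITE, LOCK_RW = 32, 64, 128, 192


def gfs2_flock_model(operation):
    """Toy model of the check: reject any request that carries LOCK_MAND,
    accept the plain shared/exclusive/unlock requests.

    Returns 0 on acceptance or a negative errno, in kernel style.
    """
    if operation & LOCK_MAND:
        return -errno.EOPNOTSUPP  # assumed error code for this sketch
    if (operation & ~LOCK_NB) in (LOCK_SH, LOCK_EX, LOCK_UN):
        return 0
    return -errno.EINVAL
```

With this model, a request such as `LOCK_MAND | LOCK_READ` is refused up front, while `LOCK_EX | LOCK_NB` is passed through to normal lock handling.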
The patch for this is upstream. I think we should go ahead and post this for 5.5 and if we get some positive feedback about it, we can also dup for 5.4 (maybe .z depending on timing).
Hi, I'm using this patch without any problem. As Abhi and I discussed with 'vl' from the Samba team, the Samba code does not check for -EOPNOTSUPP from the flock() call, so this has the same effect as the Samba workaround. Since 'vl' may consider this a "bug" in Samba, we probably need to find out how Samba will handle this in newer versions. As I said on IRC, I had to move my users off the system because it was really unstable, so now I have no more than 10 users testing on it. Even with none of these patches/changes applied, 10 users don't seem to be a problem. I don't know how to test this setup now, since I can't put all my users back on it. Maybe it is possible to write a benchmark/testing tool to simulate user access; I'd be glad to use it on my testing setup.
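For illustration, a caller that wanted to tell "not supported by this filesystem" apart from a real sharing conflict could look roughly like the sketch below. It is Python, not Samba's code; `classify_flock_error` and `set_share_mode` are hypothetical names, and the numeric fallbacks for LOCK_MAND and LOCK_READ are the Linux values, used only if the Python build does not export them. What a LOCK_MAND request actually returns depends on the kernel and filesystem, so no particular outcome is claimed.

```python
import errno
import fcntl

# Use the constants from the fcntl module when present; otherwise fall
# back to the Linux values (an assumption for non-glibc builds).
LOCK_MAND = getattr(fcntl, "LOCK_MAND", 32)
LOCK_READ = getattr(fcntl, "LOCK_READ", 64)


def classify_flock_error(err):
    """Map an errno from flock(2) to a coarse outcome string."""
    if err == errno.EOPNOTSUPP:
        return "unsupported"  # filesystem refuses this kind of lock
    if err in (errno.EWOULDBLOCK, errno.EAGAIN):
        return "conflict"     # someone else holds an incompatible lock
    return "error"


def set_share_mode(fd, operation=LOCK_MAND | LOCK_READ):
    """Try the flock() call and report the outcome instead of raising."""
    try:
        fcntl.flock(fd, operation)
    except OSError as exc:
        return classify_flock_error(exc.errno)
    return "ok"
```

A caller could treat "unsupported" as non-fatal and continue with its own locking database, and map only "conflict" to a sharing violation.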
Posted patch in comment #11 to rhkernel-list for inclusion in RHEL 5.5
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-169.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~ RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here, by March 3rd 2010 (2010-03-03) or sooner. Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value. If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days