GlusterFS does not work well with MS Office 2010 when Samba is configured with "posix locking = yes". Despite the documented instructions to the contrary (Gluster 3.2: Adding the CIFS Protocol, http://gluster.org/community/documentation/index.php/Gluster_3.2:_Adding_the_CIFS_Protocol), the Samba "posix locking" option cannot be set to "yes" when GlusterFS is used.

The problem can be seen when attempting to open a large (e.g. 25 MB) .doc file stored on a GlusterFS NAS volume through a Samba CIFS share. The file cannot be opened with MS Office Word 2010 from Windows 2008 R2 or Windows 7: Word hangs for a long time at 0% downloaded and then fails with error code 0x700468. The failure occurs with multiple client machines.

By contrast, all of the following work fine:

- Opening the same .doc file from a Windows 2000 or Windows XP client with MS Office Word 2003.
- Opening the same .doc file from a Windows 7 client with MS Office Word 2010 when the file is on a Windows 2000 file server.
- Opening the same .doc file from a Windows 7 client with MS Office Word 2010 when Samba bypasses GlusterFS and exports the brick storage directly.

Only when GlusterFS is in the loop does the problem occur.

A Gluster user has documented Samba settings that do work for this case ([Gluster-users] Using Samba and MSOffice files, http://gluster.org/pipermail/gluster-users/2011-November/009127.html):

    posix locking = no
    directory mask = 2770
    force directory mode = 2770
    create mask = 2770
    force create mode = 2770
    force security mode = 2770
    force directory security mode = 2770
    force group = <your group>
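For reference, those settings go into the share definition in smb.conf. A minimal sketch (the share name and path are placeholders, and "force group" keeps the <your group> placeholder from the post above):

    [glustervol]
        path = /mnt/glustervol
        read only = no
        posix locking = no
        directory mask = 2770
        force directory mode = 2770
        create mask = 2770
        force create mode = 2770
        force security mode = 2770
        force directory security mode = 2770
        force group = <your group>

The key line is "posix locking = no": it stops smbd from mapping Windows byte-range locks onto fcntl() locks on the underlying (GlusterFS-backed) filesystem, which is the code path where the failures described below occur.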
Can you please check with the 3.3.0 release and see if it's fixed?
Can you point me to the change list, or to where in the source the fix was implemented, so that when I have time I can add logging to make sure that the code path with the fix was executed? Thanks.
Jeff, this bug has not been reproduced locally, and we do not have the logs from when you faced the issue, so we have not yet root-caused it to any specific 'bug'. We would like to know if you continue to see the problem with version 3.3.0; if so, we could spend effort on understanding what is causing it now (given that a few fixes in the locks xlator have made it into the code since the 3.2.0 release). Thanks
I use RHEL 6.4 x86_64 with:

    glusterfs-fuse-3.3.0-1.el6.x86_64
    glusterfs-server-3.3.0-1.el6.x86_64
    glusterfs-3.3.0-1.el6.x86_64
    samba-common-3.5.10-125.el6.x86_64
    samba-3.5.10-125.el6.x86_64
    samba-winbind-clients-3.5.10-125.el6.x86_64

and ext4 filesystems as bricks in a distributed-replicated volume. I see the same problem accessing Office 2010 files through Samba on a Gluster volume. Here are some log extracts.

First, the Samba log:

[2012/08/30 16:15:31.969035, 10] locking/locking.c:260(do_lock)
  do_lock: lock flavour WINDOWS_LOCK lock type WRITE start=2147483539 len=1 blocking_lock=true requested for fnum 20081 file lpar_report_20120809.xls
[2012/08/30 16:15:31.969072, 10] lib/dbwrap_tdb.c:100(db_tdb_fetch_locked)
  Locking key 13000000000000000145
[2012/08/30 16:15:31.969117, 10] lib/dbwrap_tdb.c:129(db_tdb_fetch_locked)
  Allocated locked data 0x0x7fe8b46e29a0
[2012/08/30 16:15:31.969133, 10] locking/brlock.c:1896(brl_get_locks_internal)
  brl_get_locks_internal: 1 current locks on file_id 13:8f042d8895574501:0
[2012/08/30 16:15:31.969165, 10] locking/brlock.c:48(print_lock_struct)
  [0]: smbpid = 65279, tid = 1, pid = 2803, start = 2147483539, size = 1, fnum = 20081, PENDING_WRITE WINDOWS_LOCK
[2012/08/30 16:15:31.969197, 5] locking/posix.c:968(set_posix_lock_windows_flavour)
  set_posix_lock_windows_flavour: File lpar_report_20120809.xls, offset = 2147483539, count = 1, type = WRITE
[2012/08/30 16:15:31.969224, 10] locking/posix.c:172(posix_lock_in_range)
  posix_lock_in_range: offset_out = 2147483539, count_out = 1
[2012/08/30 16:15:31.969245, 10] locking/posix.c:739(posix_lock_list)
  posix_lock_list: curr: start=2147483539,size=1
[2012/08/30 16:15:31.969261, 5] locking/posix.c:1046(set_posix_lock_windows_flavour)
  set_posix_lock_windows_flavour: Real lock: Type = WRITE: offset = 2147483539, count = 1
[2012/08/30 16:15:31.969279, 8] locking/posix.c:197(posix_fcntl_lock)
  posix_fcntl_lock 26 6 2147483539 1 1
[2012/08/30 16:15:31.969296, 8] ../lib/util/util.c:217(fcntl_lock)
  fcntl_lock 26 6 2147483539 1 1
[2012/08/30 16:15:31.970195, 3] ../lib/util/util.c:228(fcntl_lock)
  fcntl_lock: fcntl lock gave errno 11 (Resource temporarily unavailable)
[2012/08/30 16:15:31.970235, 3] ../lib/util/util.c:247(fcntl_lock)
  fcntl_lock: lock failed at offset 2147483539 count 1 op 6 type 1 (Resource temporarily unavailable)
[2012/08/30 16:15:31.970259, 8] locking/posix.c:227(posix_fcntl_lock)
  posix_fcntl_lock: Lock call failed
[2012/08/30 16:15:31.970275, 5] locking/posix.c:1051(set_posix_lock_windows_flavour)
  set_posix_lock_windows_flavour: Lock fail !: Type = WRITE: offset = 2147483539, count = 1. Errno = Resource temporarily unavailable
[2012/08/30 16:15:31.970298, 10] locking/locking.c:280(do_lock)
  do_lock: returning status=NT_STATUS_FILE_LOCK_CONFLICT
[2012/08/30 16:15:31.970315, 10] lib/dbwrap_tdb.c:42(db_tdb_record_destr)
  Unlocking key 13000000000000000145
[2012/08/30 16:15:31.970370, 10] smbd/blocking.c:476(process_lockingX)
  process_lockingX: only got 0 locks of 3 needed for file lpar_report_20120809.xls, fnum = 20081. Waiting....
[2012/08/30 16:15:31.970394, 10] smbd/blocking.c:123(recalc_brl_timeout)
[2012/08/30 16:15:31.970404, 10] smbd/blocking.c:129(recalc_brl_timeout)
  Next timeout = 4.999991 seconds from now.
[2012/08/30 16:15:36.970532, 10] lib/events.c:131(run_events)
  Running timed event "brl_timeout_fn" 0x7fe8b46679d0
[2012/08/30 16:15:36.970604, 3] smbd/sec_ctx.c:310(set_sec_ctx)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2012/08/30 16:15:36.970623, 5] auth/token_util.c:525(debug_nt_user_token)
  NT user token: (NULL)
[2012/08/30 16:15:36.970637, 5] auth/token_util.c:551(debug_unix_user_token)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2012/08/30 16:15:36.970681, 5] smbd/uid.c:369(change_to_root_user)
  change_to_root_user: now uid=(0,0) gid=(0,0)
[2012/08/30 16:15:36.970703, 10] smbd/blocking.c:690(process_blocking_lock_queue)
  Processing BLR = 0x7fe8b46bc190
[2012/08/30 16:15:36.970719, 10] locking/locking.c:260(do_lock)
  do_lock: lock flavour WINDOWS_LOCK lock type WRITE start=2147483539 len=1 blocking_lock=true requested for fnum 20081 file lpar_report_20120809.xls
[2012/08/30 16:15:36.970756, 10] lib/dbwrap_tdb.c:100(db_tdb_fetch_locked)
  Locking key 13000000000000000145
[2012/08/30 16:15:36.970789, 10] lib/dbwrap_tdb.c:129(db_tdb_fetch_locked)
  Allocated locked data 0x0x7fe8b46e29a0

Second, the trace output from the locks translator of the Gluster volume (/var/log/glusterfs/bricks/export1.log):

[2012-08-30 16:15:11.926498] I [common.c:323:pl_trace_out] 0-clvolume1-locks: [GRANTED] Locker = {Pid=2803, lk-owner=7c258ce5198bf1ee, Transport=0x1460760, Frame=139} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLKW, type=READ, start=23596, len=0, pid=2803, lk-owner=7c258ce5198bf1ee}
[2012-08-30 16:15:11.944204] I [common.c:271:pl_trace_in] 0-clvolume1-locks: [REQUEST] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=140} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:11.944268] I [common.c:323:pl_trace_out] 0-clvolume1-locks: [TRYAGAIN] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=140} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:16.950534] I [common.c:271:pl_trace_in] 0-clvolume1-locks: [REQUEST] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=141} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:16.950633] I [common.c:323:pl_trace_out] 0-clvolume1-locks: [TRYAGAIN] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=141} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:21.956933] I [common.c:271:pl_trace_in] 0-clvolume1-locks: [REQUEST] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=142} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:21.957299] I [common.c:323:pl_trace_out] 0-clvolume1-locks: [TRYAGAIN] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=142} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:26.963536] I [common.c:271:pl_trace_in] 0-clvolume1-locks: [REQUEST] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=143} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:26.963631] I [common.c:323:pl_trace_out] 0-clvolume1-locks: [TRYAGAIN] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=143} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:31.969699] I [common.c:271:pl_trace_in] 0-clvolume1-locks: [REQUEST] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=144} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:31.970044] I [common.c:323:pl_trace_out] 0-clvolume1-locks: [TRYAGAIN] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=144} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:36.971559] I [common.c:271:pl_trace_in] 0-clvolume1-locks: [REQUEST] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=145} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}
[2012-08-30 16:15:36.971689] I [common.c:323:pl_trace_out] 0-clvolume1-locks: [TRYAGAIN] Locker = {Pid=2803, lk-owner=9c1b8784017e12dd, Transport=0x1460760, Frame=145} Lockee = {gfid=f093990d-e7c9-4a95-8f04-2d8895574501, fd=0x1444afc, path=/test/lpar_report_20120809.xls} Lock = {lock=FCNTL, cmd=SETLK, type=WRITE, start=2147483539, len=1, pid=2803, lk-owner=9c1b8784017e12dd}

With kind regards,
Andreas Schmechtig
System Developer
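For concreteness, the operation that keeps failing in both logs corresponds to an fcntl() byte-range lock like the sketch below. This is an illustration reconstructed from the log values, not code from Samba or GlusterFS; the file path is the one from the trace. On the Gluster-backed share this call fails with errno 11 (EAGAIN), which smbd then reports as NT_STATUS_FILE_LOCK_CONFLICT:

/* Sketch: the one-byte POSIX write lock that fails in the logs above.
 * Offset 2147483539 sits just below the 2 GiB boundary, in the range
 * MS Office uses for its advisory sharing-lock bytes. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Path taken from the trace; adjust to wherever the share is mounted. */
    int fd = open("/test/lpar_report_20120809.xls", O_RDWR);
    if (fd == -1) { perror("open"); return 1; }

    struct flock fl = {
        .l_type   = F_WRLCK,        /* type = WRITE in the log       */
        .l_whence = SEEK_SET,
        .l_start  = 2147483539,     /* start = 2147483539 in the log */
        .l_len    = 1,              /* len = 1 in the log            */
    };

    if (fcntl(fd, F_SETLK, &fl) == -1)      /* cmd = SETLK in the log */
        fprintf(stderr, "lock failed: %s (errno %d)\n",
                strerror(errno), errno);    /* errno 11 = EAGAIN here */
    else
        puts("lock granted");
    return 0;
}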
Oh, it is RHEL 6.3 and not 6.4. If you need further logs or any test activity, do not hesitate to contact me. This bug forces many of our employees to copy Excel files from the Samba share on Gluster to their local desktop, edit the files there, and copy them back to the Samba share.

With kind regards,
Andreas Schmechtig
System Developer
Note: a possible theory. The two locks in play are:

    READ LOCK:  pid = 2803, lk-owner = 7c258ce5198bf1ee, fd = 0x1444afc
    WRITE LOCK: pid = 2803, lk-owner = 9c1b8784017e12dd, fd = 0x1444afc

One reason why this (same pid, different lk-owners) can happen is that the read lock was acquired via flock() [a BSD lock] while the write lock is attempted via fcntl() [a POSIX lock]. On Linux the two lock spaces are orthogonal, and Samba is probably expecting the write lock to be granted in the "posix space" without contending with the "bsd space" read lock. However, I notice that in our fuse-bridge.c we are ignoring the ff->flock field in setlk/getlk, merging both styles of lock into the same lock space (hence they end up contending because of the mismatching lk-owners). This needs to be confirmed.

Possible fixes:

1. Pass the flock flag through to the locks xlator and create a new 'locking domain' for flock.

2. Add an extra condition in xlators/features/locks/src/common.c:same_owner() - treat the owner as the same if either the lk-owners match or the pids match, given a common transport - and only insert, but not merge, the two locks from the different lk-owners. This lets us remain agnostic to the bsd/posix locking spaces (until we stumble upon that improbable app which holds two fds - one inherited from its parent and one opened by itself - and expects flock() on them to contend).

I personally prefer option 1. :-)
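To make the orthogonality concrete, here is a minimal sketch (an illustration, not code from the report; the test file name is hypothetical). On a local Linux filesystem both calls succeed, because flock() and fcntl() locks never contend with each other; under the theory above, the same program run against a FUSE-mounted GlusterFS volume would see the fcntl() call fail with EAGAIN, because both locks land in one lock space with mismatching lk-owners:

/* Sketch: take a BSD-style shared lock and then a POSIX write lock on
 * the same open file. On a local Linux filesystem both succeed; if the
 * filesystem merges the two lock spaces, the second call fails. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>

int main(void)
{
    int fd = open("locktest.dat", O_RDWR | O_CREAT, 0644); /* hypothetical file */
    if (fd == -1) { perror("open"); return 1; }

    if (flock(fd, LOCK_SH) == -1) {          /* "bsd space" read lock */
        perror("flock");
        return 1;
    }

    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1 };
    if (fcntl(fd, F_SETLK, &fl) == -1)       /* "posix space" write lock */
        perror("fcntl (EAGAIN if the two spaces were merged)");
    else
        puts("fcntl lock granted: the two lock spaces are orthogonal");
    return 0;
}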
Just noticed - the thing which probably refutes the theory in the previous comment is that the read lock has an offset (23596), whereas flock() is a full-file lock. Given this, I cannot think of a situation where different lk-owners could be set for two lock requests at different offsets (which eliminates flock) on the same fd from the same PID (which eliminates fcntl). I suggest we start by getting an 'strace -f' of smbd for the session.
I will try to generate an 'strace -f' of smbd for the session (direct access of an Excel file) tomorrow.
Created attachment 608548 [details] output of strace smbd
Created attachment 608549 [details] Debug Level 10 Samba Log
Created attachment 608550 [details] Trace output of the lock translator from the gluster volume clvolume1
The output provided seems to have been taken with 'strace -p <pid>' without the -f flag (either that, or the wrong smbd was traced). There are no traces of any of the file access.
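For example, something along these lines will follow the forked children as well (the <pid> placeholder is the smbd process serving the client; -ttt adds timestamps and -o writes the trace to a file):

    strace -f -ttt -o /tmp/smbd.trace -p <pid>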
Hello Anand, excuse me, you are right. I forgot to use the -f parameter when tracing smbd. I will provide a new set of log files from today (2012/09/03).

With kind regards,
Andreas Schmechtig
Created attachment 609273 [details] Gluster volume log
Created attachment 609275 [details] Samba Log
Created attachment 609277 [details] Gluster volume log
Created attachment 609279 [details] Samba strace
Hello Anand, could you check the logs from the last 3 months? Please give us an update on this issue and on when it will be solved. We need a solution by the end of the year. Regards, Hermann
Hermann, we suspect this was due to a locking-related issue inside glusterfs, which should be fixed by http://review.gluster.org/4164. That patch is only in the upstream codebase (3.4.0qa6 contains it). If you can confirm that it fixes the issue, we can backport it to the 3.3.x branch too.
Hi, this fix hasn't helped with fcntl locking when running CTDB. There is a tool shipped with the package called "ping_pong" which, when run on two nodes sharing the same GlusterFS file, is supposed to alternately lock the shared file: https://wiki.samba.org/index.php/Ping_pong http://junkcode.samba.org/ftp/unpacked/junkcode/ping_pong.c This fcntl locking does not work on a GlusterFS volume mounted via FUSE or NFS. I have tried the latest 3.4.2 version along with 3.5.0beta2 and it still doesn't work. Thanks, Rich
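For anyone unfamiliar with the tool, the core of what ping_pong exercises is roughly the following (a simplified sketch, not the actual ping_pong.c; run one instance per node against the same file on the shared mount):

/* Simplified sketch of the ping_pong fcntl test. Each instance loops,
 * taking a blocking one-byte write lock on the "next" byte and then
 * releasing the previous one. With two instances on two nodes, a
 * correctly behaving cluster filesystem makes them alternate and the
 * counter climbs steadily; on a broken one, an instance blocks forever. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void lock_byte(int fd, short type, off_t off)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                        .l_start = off, .l_len = 1 };
    if (fcntl(fd, F_SETLKW, &fl) == -1) {   /* block until granted */
        perror("fcntl");
        exit(1);
    }
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <shared file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDWR | O_CREAT, 0644);
    if (fd == -1) { perror("open"); return 1; }

    unsigned long count = 0;
    int i = 0;
    for (;;) {
        lock_byte(fd, F_WRLCK, (i + 1) % 3);   /* grab the next byte...    */
        lock_byte(fd, F_UNLCK, i);             /* ...then drop the old one */
        i = (i + 1) % 3;
        if (++count % 100000 == 0)
            printf("%lu lock/unlock cycles\n", count);
    }
}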
Could someone confirm that this fix is in the 3.5.0 beta releases? If so, it still doesn't work ;-(
Ah, the problem I have had with the 3.5.0beta releases is that I'm unable to create a distributed setup: adding the 2nd server to the volume fails at the "peer probe" stage.
The version that this bug was reported against no longer receives updates from the Gluster Community. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug. If there is no update before 9 December 2014, this bug will be closed automatically.