Bug 1095525

Summary: GlusterFS MacOSX FUSE loop on symlink
Product: [Community] GlusterFS Reporter: dmons
Component: fuseAssignee: Harshavardhana <fharshav>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainlineCC: bugs, cww, dennis, fharshav, gluster-bugs, jclift
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Mac OS   
Whiteboard:
Fixed In Version: glusterfs-3.6.0beta1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-11-11 08:31:57 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description dmons 2014-05-08 01:27:50 UTC
Description of problem:
GlusterFS built from git on MacOSX, mounted with OSXFUSE.

On the MacOSX client, when a symlink exists to another directory in the tree, an infinite loop of directories occurs when cd / ls operations are performed.

Version-Release number of selected component (if applicable):
git, built 2014-05-06

How reproducible:
Always

Steps to Reproduce:

Client/FUSE mount side (/prod is our GlusterFS volume mounted)

# cd /prod/backups/glustertest
# mkdir -p a/b/c/d
# mkdir -p 1/2/3/4
# find `pwd`
/prod/backups/glustertest
/prod/backups/glustertest/a
/prod/backups/glustertest/a/b
/prod/backups/glustertest/a/b/c
/prod/backups/glustertest/a/b/c/d
/prod/backups/glustertest/a/b/1
/prod/backups/glustertest/1
/prod/backups/glustertest/1/2
/prod/backups/glustertest/1/2/3
/prod/backups/glustertest/1/2/3/4
# cd a/b
# ls
c
# ln -s ../../1
# ls -al
total 32
drwxr-xr-x@ 3 root wheel 50 8 May 08:52 .
drwxr-xr-x@ 3 root wheel 42 8 May 08:49 ..
lrwxrwxrwx@ 1 root wheel 7 8 May 08:52 1 -> ../../1
drwxr-xr-x@ 3 root wheel 42 8 May 08:49 c
# cd 1
# ls
2
# cd 2 ; pwd ; ls
/prod/backups/glustertest/a/b/1/2/2
2
# cd 2 ; pwd ; ls
/prod/backups/glustertest/a/b/1/2/2/2
2
# cd 2 ; pwd ; ls
/prod/backups/glustertest/a/b/1/2/2/2/2
2
# cd 2 ; pwd ; ls
/prod/backups/glustertest/a/b/1/2/2/2/2/2
2
# cd 2 ; pwd ; ls
/prod/backups/glustertest/a/b/1/2/2/2/2/2/2
2
# find `pwd`
/prod/backups/glustertest/a/b/1/2/2/2/2/2/2
/prod/backups/glustertest/a/b/1/2/2/2/2/2/2/2
/prod/backups/glustertest/a/b/1/2/2/2/2/2/2/2/3
/prod/backups/glustertest/a/b/1/2/2/2/2/2/2/2/3/4
#

Actual results:
As above

Expected results:
No infinite loop on cd/ls, as per Linux/FUSE mount which works as expected.

Additional info:

Mac OSX 10.8.5 with OSXFUSE 2.6.4

# make -v
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
This program built for i386-apple-darwin11.3.0

# gcc -v
Configured with:
--prefix=/Applications/Xcode.app/Contents/Developer/usr
--with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
Target: x86_64-apple-darwin12.5.0
Thread model: posix

Comment 1 Harshavardhana 2014-05-08 21:07:10 UTC
Were the glusterfsd (server) part running on OSX or is it just the client?

Comment 2 dmons 2014-05-08 21:29:33 UTC
Only glusterfs is running on the client (OSX).

The glusterd/glusterfsd server is our production cluster (CentOS6, Gluster 3.4.1GA RPM install from gluster.org).

Comment 3 Harshavardhana 2014-05-08 21:38:49 UTC
I haven't been able to reproduce this with OSX server and OSX client. I guess 3.4.1GA v/s master branch might just not be compatible - few things could have changed.

Let me test it with upstream master on Linux server v/s upstream master on OSX client.

Comment 4 Harshavardhana 2014-05-08 22:05:58 UTC
(In reply to Harshavardhana from comment #3)
> I haven't been able to reproduce this with OSX server and OSX client. I
> guess 3.4.1GA v/s master branch might just not be compatible - few things
> could have changed.
> 
> Let me test it with upstream master on Linux server v/s upstream master on
> OSX client.

Spoke too early, i was able to reproduce this easily. This looks like a symlink issue with OSXFuse. Its not able to handle symlinks properly.

# cat /tmp/inode_.txt 
# /mnt/gfs
total 16
12116464252699970785 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 1
10468144669891119994 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 a

# /mnt/gfs/1
total 8
10483121240921989804 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 2

# /mnt/gfs/a/b/
total 16
11615896904052905179 lrwxrwxrwx@ 0 root  wheel    7 May  8 15:13 1 -> ../../1
12887854228130538625 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 c

# /mnt/gfs/a/b/1/
total 24
12116464252699970785 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 .
                   1 drwxr-xr-x@ 5 root  wheel  340 May  8 15:13 ..
10483121240921989804 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 2

# /mnt/gfs/a/b/1/2
total 24
12116464252699970785 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 .
                   1 drwxr-xr-x@ 5 root  wheel  340 May  8 15:13 ..
10483121240921989804 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 2

# /mnt/gfs/a/b/1/2/2
total 24
12116464252699970785 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 .
                   1 drwxr-xr-x@ 5 root  wheel  340 May  8 15:13 ..
10483121240921989804 drwxr-xr-x@ 3 root  wheel  204 May  8 15:13 2

Comment 5 Harshavardhana 2014-05-08 22:24:24 UTC
bash-3.2# ls -li
total 16
12 lrwxrwxrwx  1 root  wheel     7 May  8 15:21 1 -> ../../1
 8 drwxr-xr-x  1 root  wheel  4096 May  8 15:20 c
bash-3.2# cd 1
ls -bash-3.2# ls -li
total 8
4 drwxr-xr-x  1 root  wheel  4096 May  8 15:19 2
bash-3.2# ls -lia
total 24
3 drwxr-xr-x  1 root  wheel  4096 May  8 15:23 .
2 drwxr-xr-x  1 root  wheel  4096 May  8 15:19 ..
4 drwxr-xr-x  1 root  wheel  4096 May  8 15:19 2
bash-3.2# cd 2
ls -libash-3.2# ls -lia
total 24
4 drwxr-xr-x  1 root  wheel  4096 May  8 15:19 .
3 drwxr-xr-x  1 root  wheel  4096 May  8 15:23 ..
5 drwxr-xr-x  1 root  wheel  4096 May  8 15:20 3
bash-3.2# pwd
/mnt/gfs/test/a/b/1/2
bash-3.2# 

On sshfs similar test works, so it is entirely our fault :-)

Comment 6 Justin Clift 2014-06-25 03:54:25 UTC
This will need to be a blocker for 3.6. ;)

Comment 7 Harshavardhana 2014-06-27 01:08:31 UTC
A "cd <directory>" on OSX

================================================================================================================
[2014-06-27 01:15:30.189899] T [fuse-bridge.c:802:fuse_getattr_resume] 0-glusterfs-fuse: 1: GETATTR 140329739169788 (/1)
[2014-06-27 01:15:30.189985] T [rpc-clnt.c:1381:rpc_clnt_record] 0-dht-client-1: Auth Info: pid: 93507, uid: 0, gid: 0, owner: 0000000000000000
[2014-06-27 01:15:30.190036] T [rpc-clnt.c:1238:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 152, payload: 20, rpc hdr: 132
[2014-06-27 01:15:30.190109] T [rpc-clnt.c:1573:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x34 Program: GlusterFS 3.3, ProgVers: 330, Proc: 1) to rpc-transport (dht-client-1)
[2014-06-27 01:15:30.190162] D [rpc-clnt-ping.c:230:rpc_clnt_start_ping] 0-dht-client-1: ping timeout is 0, returning
[2014-06-27 01:15:30.190220] T [rpc-clnt.c:1381:rpc_clnt_record] 0-dht-client-0: Auth Info: pid: 93507, uid: 0, gid: 0, owner: 0000000000000000
[2014-06-27 01:15:30.190270] T [rpc-clnt.c:1238:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 152, payload: 20, rpc hdr: 132
[2014-06-27 01:15:30.190309] T [rpc-clnt.c:660:rpc_clnt_reply_init] 0-dht-client-1: received rpc message (RPC XID: 0x34 Program: GlusterFS 3.3, ProgVers: 330, Proc: 1) from rpc-transport (dht-client-1)
[2014-06-27 01:15:30.190333] T [rpc-clnt.c:1573:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x36 Program: GlusterFS 3.3, ProgVers: 330, Proc: 1) to rpc-transport (dht-client-0)
[2014-06-27 01:15:30.190432] D [rpc-clnt-ping.c:230:rpc_clnt_start_ping] 0-dht-client-0: ping timeout is 0, returning
[2014-06-27 01:15:30.190524] T [rpc-clnt.c:660:rpc_clnt_reply_init] 0-dht-client-0: received rpc message (RPC XID: 0x36 Program: GlusterFS 3.3, ProgVers: 330, Proc: 1) from rpc-transport (dht-client-0)
[2014-06-27 01:15:30.190615] T [fuse-bridge.c:735:fuse_attr_cbk] 0-glusterfs-fuse: 1: STAT() /1 => 12748640506127241761
[2014-06-27 01:15:30.190669] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 128/128 
[2014-06-27 01:15:30.190711] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-06-27 01:15:30.190771] T [fuse-bridge.c:1403:fuse_readlink_resume] 0-glusterfs-fuse: 0 READLINK /a/b/1/50044947-2662-4bcf-b32d-59afca845fe2
[2014-06-27 01:15:30.190848] T [rpc-clnt.c:1381:rpc_clnt_record] 0-dht-client-0: Auth Info: pid: 93507, uid: 0, gid: 0, owner: 0000000000000000
[2014-06-27 01:15:30.190898] T [rpc-clnt.c:1238:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 156, payload: 24, rpc hdr: 132
[2014-06-27 01:15:30.190961] T [rpc-clnt.c:1573:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x37 Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) to rpc-transport (dht-client-0)
[2014-06-27 01:15:30.191010] D [rpc-clnt-ping.c:230:rpc_clnt_start_ping] 0-dht-client-0: ping timeout is 0, returning
[2014-06-27 01:15:30.191141] T [rpc-clnt.c:660:rpc_clnt_reply_init] 0-dht-client-0: received rpc message (RPC XID: 0x37 Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) from rpc-transport (dht-client-0)
[2014-06-27 01:15:30.191211] T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse: 0: /a/b/1 => ../../1    
[2014-06-27 01:15:30.191265] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 24/24 
[2014-06-27 01:15:30.191311] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-06-27 01:15:30.191367] T [fuse-bridge.c:1403:fuse_readlink_resume] 0-glusterfs-fuse: 1 READLINK /a/b/1/50044947-2662-4bcf-b32d-59afca845fe2    
[2014-06-27 01:15:30.191440] T [rpc-clnt.c:1381:rpc_clnt_record] 0-dht-client-0: Auth Info: pid: 93507, uid: 0, gid: 0, owner: 0000000000000000
[2014-06-27 01:15:30.191490] T [rpc-clnt.c:1238:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 156, payload: 24, rpc hdr: 132
[2014-06-27 01:15:30.191552] T [rpc-clnt.c:1573:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x38 Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) to rpc-transport (dht-client-0)
[2014-06-27 01:15:30.191600] D [rpc-clnt-ping.c:230:rpc_clnt_start_ping] 0-dht-client-0: ping timeout is 0, returning
[2014-06-27 01:15:30.191728] T [rpc-clnt.c:660:rpc_clnt_reply_init] 0-dht-client-0: received rpc message (RPC XID: 0x38 Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) from rpc-transport (dht-client-0)
[2014-06-27 01:15:30.191792] T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse: 1: /a/b/1 => ../../1
[2014-06-27 01:15:30.191844] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 24/24 
================================================================================================================

A "cd <directory>" on Linux
================================================================================================================
[2014-06-23 10:12:21.217148] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 2
[2014-06-23 10:12:21.217160] T [fuse-bridge.c:651:fuse_lookup_resume] 0-glusterfs-fuse: 188: LOOKUP /1(494c43af-a20b-410b-8aca-107c639148c4)
[2014-06-23 10:12:21.217178] T [dht-hashfn.c:97:dht_hash_compute] 0-test-dht: trying regex for 1
[2014-06-23 10:12:21.217193] T [rpc-clnt.c:1360:rpc_clnt_record] 0-test-client-0: Auth Info: pid: 13886, uid: 0, gid: 0, owner: 0000000000000000
[2014-06-23 10:12:21.217201] T [rpc-clnt.c:1217:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 192, payload: 120, rpc hdr: 72
[2014-06-23 10:12:21.217219] T [rpc-clnt.c:1557:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x225x Program: GlusterFS 3.3, ProgVers: 33
0, Proc: 27) to rpc-transport (test-client-0)
[2014-06-23 10:12:21.217328] T [rpc-clnt.c:671:rpc_clnt_reply_init] 0-test-client-0: received rpc message (RPC XID: 0x225x Program: GlusterFS 3.
3, ProgVers: 330, Proc: 27) from rpc-transport (test-client-0)
[2014-06-23 10:12:21.217347] T [io-cache.c:224:ioc_lookup_cbk] 0-test-io-cache: locked inode(0x1024e70)
[2014-06-23 10:12:21.217354] T [io-cache.c:233:ioc_lookup_cbk] 0-test-io-cache: unlocked inode(0x1024e70)
[2014-06-23 10:12:21.217359] T [io-cache.c:242:ioc_lookup_cbk] 0-test-io-cache: locked table(0xfd3dc0)
[2014-06-23 10:12:21.217364] T [io-cache.c:247:ioc_lookup_cbk] 0-test-io-cache: unlocked table(0xfd3dc0)
[2014-06-23 10:12:00.378266] T [io-cache.c:224:ioc_lookup_cbk] 0-test-io-cache: locked inode(0x1024e70)  
[2014-06-23 10:12:00.378273] T [io-cache.c:233:ioc_lookup_cbk] 0-test-io-cache: unlocked inode(0x1024e70)
[2014-06-23 10:12:00.378278] T [io-cache.c:242:ioc_lookup_cbk] 0-test-io-cache: locked table(0xfd3dc0)
[2014-06-23 10:12:00.378283] T [io-cache.c:247:ioc_lookup_cbk] 0-test-io-cache: unlocked table(0xfd3dc0)
[2014-06-23 10:12:00.378296] T [fuse-bridge.c:517:fuse_entry_cbk] 0-glusterfs-fuse: 179: LOOKUP() /1 => -8445920024747489084
[2014-06-23 10:12:00.378325] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 4
[2014-06-23 10:12:00.378336] T [fuse-bridge.c:651:fuse_lookup_resume] 0-glusterfs-fuse: 180: LOOKUP /1/2(2365d505-6e92-4f31-899b-da4c9c144542)
[2014-06-23 10:12:00.378353] T [dht-hashfn.c:97:dht_hash_compute] 0-test-dht: trying regex for 2
[2014-06-23 10:12:00.378368] T [rpc-clnt.c:1360:rpc_clnt_record] 0-test-client-0: Auth Info: pid: 13878, uid: 0, gid: 0, owner: 0000000000000000
[2014-06-23 10:12:00.378376] T [rpc-clnt.c:1217:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 192, payload: 120, rpc hdr: 72
[2014-06-23 10:12:00.378394] T [rpc-clnt.c:1557:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x217x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (test-client-0)
[2014-06-23 10:12:00.378504] T [rpc-clnt.c:671:rpc_clnt_reply_init] 0-test-client-0: received rpc message (RPC XID: 0x217x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (test-client-0)
[2014-06-23 10:12:00.378523] T [io-cache.c:224:ioc_lookup_cbk] 0-test-io-cache: locked inode(0x1025140)
[2014-06-23 10:12:00.378529] T [io-cache.c:233:ioc_lookup_cbk] 0-test-io-cache: unlocked inode(0x1025140)
[2014-06-23 10:12:00.378534] T [io-cache.c:242:ioc_lookup_cbk] 0-test-io-cache: locked table(0xfd3dc0)
[2014-06-23 10:12:00.378539] T [io-cache.c:247:ioc_lookup_cbk] 0-test-io-cache: unlocked table(0xfd3dc0)
[2014-06-23 10:12:00.378552] T [fuse-bridge.c:517:fuse_entry_cbk] 0-glusterfs-fuse: 180: LOOKUP() /1/2 => -8530985046552328894
[2014-06-23 10:12:00.379568] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-06-23 10:12:00.379591] T [fuse-bridge.c:1802:fuse_readlink_resume] 0-glusterfs-fuse: 181 READLINK /a/b/1/30df5070-02a0-44d8-a244-bf8bfa1dd465
[2014-06-23 10:12:00.379613] T [rpc-clnt.c:1360:rpc_clnt_record] 0-test-client-0: Auth Info: pid: 13878, uid: 0, gid: 0, owner: 0000000000000000
[2014-06-23 10:12:00.379623] T [rpc-clnt.c:1217:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 96, payload: 24, rpc hdr: 72
[2014-06-23 10:12:00.379660] T [rpc-clnt.c:1557:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x218x Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) to rpc-transport (test-client-0)
[2014-06-23 10:12:00.379779] T [rpc-clnt.c:671:rpc_clnt_reply_init] 0-test-client-0: received rpc message (RPC XID: 0x218x Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) from rpc-transport (test-client-0)
[2014-06-23 10:12:00.379807] T [fuse-bridge.c:1771:fuse_readlink_cbk] 0-glusterfs-fuse: 181: /a/b/1 => ../../1
[2014-06-23 10:12:00.380580] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-06-23 10:12:00.380597] T [fuse-bridge.c:1802:fuse_readlink_resume] 0-glusterfs-fuse: 182 READLINK /a/b/1/30df5070-02a0-44d8-a244-bf8bfa1dd465
[2014-06-23 10:12:00.380617] T [rpc-clnt.c:1360:rpc_clnt_record] 0-test-client-0: Auth Info: pid: 13878, uid: 0, gid: 0, owner: 0000000000000000
[2014-06-23 10:12:00.380625] T [rpc-clnt.c:1217:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 96, payload: 24, rpc hdr: 72
[2014-06-23 10:12:00.380658] T [rpc-clnt.c:1557:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x219x Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) to rpc-transport (test-client-0)
[2014-06-23 10:12:00.380758] T [rpc-clnt.c:671:rpc_clnt_reply_init] 0-test-client-0: received rpc message (RPC XID: 0x219x Program: GlusterFS 3.3, ProgVers: 330, Proc: 2) from rpc-transport (test-client-0)
[2014-06-23 10:12:00.380779] T [fuse-bridge.c:1771:fuse_readlink_cbk] 0-glusterfs-fuse: 182: /a/b/1 => ../../1
================================================================================================================

In-fact 

# ls 2 
3 

Works! on OSX - since 'ls' initiates a READDIR operation it succeeds, so is 'ls -lR' which would initiate an OPENDIR operation. 

cd - in-fact on OSX is not generating a LOOKUP(), will be verifying more for a way to get it fixed. Atleast we have a clue now

Comment 8 Harshavardhana 2014-07-02 01:05:34 UTC
Tested with only posix.vol

# cat posix.vol 
volume repl-posix
    type storage/posix
    option volume-id c4a8a4c7-dd14-41d6-997d-56e148d45b60
    option directory /tmp/posix
end-volume

# /usr/local/sbin/glusterfs -f posix.vol -L TRACE /mnt

Re did the same tests - with same result

[2014-07-02 01:24:58.050155] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 2    
[2014-07-02 01:24:58.050201] T [fuse-bridge.c:802:fuse_getattr_resume] 0-glusterfs-fuse: 0: GETATTR 140400748704668 (/1)
[2014-07-02 01:24:58.050271] T [fuse-bridge.c:735:fuse_attr_cbk] 0-glusterfs-fuse: 0: STAT() /1 => 11980375756493645147
[2014-07-02 01:24:58.050332] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 128/128 
[2014-07-02 01:24:58.050396] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-07-02 01:24:58.050444] T [fuse-bridge.c:1403:fuse_readlink_resume] 0-glusterfs-fuse: 1 READLINK /a/b/1/0079f739-385c-47bb-90be-96dfd083758a
[2014-07-02 01:24:58.050517] T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse: 1: /a/b/1 => ../../1
[2014-07-02 01:24:58.050568] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 24/24 
[2014-07-02 01:24:58.050631] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-07-02 01:24:58.050680] T [fuse-bridge.c:1403:fuse_readlink_resume] 0-glusterfs-fuse: 0 READLINK /a/b/1/0079f739-385c-47bb-90be-96dfd083758a
[2014-07-02 01:24:58.050753] T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse: 0: /a/b/1 => ../../1
[2014-07-02 01:24:58.050803] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 24/24 
[2014-07-02 01:24:59.632849] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 2
[2014-07-02 01:24:59.632941] T [fuse-bridge.c:802:fuse_getattr_resume] 0-glusterfs-fuse: 1: GETATTR 140400748704668 (/1)
[2014-07-02 01:24:59.633055] T [fuse-bridge.c:735:fuse_attr_cbk] 0-glusterfs-fuse: 1: STAT() /1 => 11980375756493645147
[2014-07-02 01:24:59.633112] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 128/128 
[2014-07-02 01:24:59.633186] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 2
[2014-07-02 01:24:59.633239] T [fuse-bridge.c:2496:fuse_opendir_resume] 0-glusterfs-fuse: 0: OPENDIR /1
[2014-07-02 01:24:59.633324] T [fuse-bridge.c:955:fuse_fd_cbk] 0-glusterfs-fuse: 0: OPENDIR() /1 => 0x7fb198c0c53c
[2014-07-02 01:24:59.633377] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 32/32 
[2014-07-02 01:24:59.633443] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 1
[2014-07-02 01:24:59.633490] T [fuse-bridge.c:3004:fuse_statfs_resume] 0-glusterfs-fuse: 1: STATFS
[2014-07-02 01:24:59.633566] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 96/96 
[2014-07-02 01:24:59.633629] T [fuse-bridge.c:2638:fuse_readdir_resume] 0-glusterfs-fuse: 0: READDIR (0x7fb198c0c53c, size=1536, offset=0)    
[2014-07-02 01:24:59.633749] T [fuse-bridge.c:2582:fuse_readdir_cbk] 0-glusterfs-fuse: 0: READDIR => 3/1536,0
[2014-07-02 01:24:59.633803] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 112/112 
[2014-07-02 01:24:59.633867] T [fuse-bridge.c:2638:fuse_readdir_resume] 0-glusterfs-fuse: 1: READDIR (0x7fb198c0c53c, size=1536, offset=3)
[2014-07-02 01:24:59.633962] T [fuse-bridge.c:2582:fuse_readdir_cbk] 0-glusterfs-fuse: 1: READDIR => 0/1536,3
[2014-07-02 01:24:59.634013] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 16/16 
[2014-07-02 01:24:59.634084] T [fuse-bridge.c:2875:fuse_releasedir] 0-glusterfs-fuse: 0: RELEASEDIR 0x7fb198c0c53c    
[2014-07-02 01:24:59.634157] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 16/16 
[2014-07-02 01:24:59.634227] T [posix-helpers.c:1146:posix_janitor_thread_proc] 0-repl-posix: janitor: closing dir fd=0x7fb198c0d090

Very much related to the interaction with osxfuse v/s glusterfs implementation, i will be digging more into OSXFUSE code to see how is the behavior implemented.

Comment 9 Harshavardhana 2014-07-04 19:23:56 UTC
It would seem like LOOKUP() is never sent on a symlink using OSXFUSE after some good amount of gdb breakpoints and searching, which is quite strange and i need to see how the higher level fuse libraries are implemented in OSXFUSE to handle symlinks eg: libosxfuse.dylib

Comment 10 Justin Clift 2014-07-04 20:04:21 UTC
The OSXFUSE Google Group may also be useful later on too:

  https://groups.google.com/forum/#!forum/osxfuse-group

Comment 11 Harshavardhana 2014-07-12 08:37:53 UTC
Without the change
===================
[2014-07-12 08:41:29.815370] T [fuse-bridge.c:1403:fuse_readlink_resume] 0-glusterfs-fuse: 1 READLINK /a/b/1/9526d413-96ff-4ab8-a36a-2d718d01ea23
[2014-07-12 08:41:29.815473] T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse: 1: /a/b/1 => ../../1
[2014-07-12 08:41:29.815549] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 24/24 
[2014-07-12 08:41:29.815640] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-07-12 08:41:29.815715] T [fuse-bridge.c:1403:fuse_readlink_resume] 0-glusterfs-fuse: 0 READLINK /a/b/1/9526d413-96ff-4ab8-a36a-2d718d01ea23
[2014-07-12 08:41:29.815820] T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse: 0: /a/b/1 => ../../1
[2014-07-12 08:41:29.815897] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 24/24 
[2014-07-12 08:41:29.815988] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-07-12 08:41:29.816064] T [fuse-bridge.c:1403:fuse_readlink_resume] 0-glusterfs-fuse: 1 READLINK /a/b/1/9526d413-96ff-4ab8-a36a-2d718d01ea23
[2014-07-12 08:41:29.816165] T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse: 1: /a/b/1 => ../../1
[2014-07-12 08:41:29.816250] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 24/24 

With the change
================
[2014-07-12 08:43:22.010481] T [fuse-bridge.c:1400:fuse_readlink_resume] 0-glusterfs-fuse: 1 READLINK /a/b/1/9526d413-96ff-4ab8-a36a-2d718d01ea23
[2014-07-12 08:43:22.010580] T [fuse-bridge.c:1370:fuse_readlink_cbk] 0-glusterfs-fuse: 1: /a/b/1 => ../../1 (size:7)
[2014-07-12 08:43:22.010654] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 23/23
[2014-07-12 08:43:48.410167] T [fuse-bridge.c:735:fuse_attr_cbk] 0-glusterfs-fuse: 0: LOOKUP() / => 1
[2014-07-12 08:43:48.410361] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 128/128 
[2014-07-12 08:43:48.410926] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 2
[2014-07-12 08:43:48.411068] T [fuse-bridge.c:802:fuse_getattr_resume] 0-glusterfs-fuse: 1: GETATTR 140440227558348 (/a)
[2014-07-12 08:43:48.411179] T [fuse-bridge.c:735:fuse_attr_cbk] 0-glusterfs-fuse: 1: STAT() /a => 13439084044268099284
[2014-07-12 08:43:48.411258] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 128/128 
[2014-07-12 08:43:48.411999] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 4
[2014-07-12 08:43:48.412080] T [fuse-bridge.c:802:fuse_getattr_resume] 0-glusterfs-fuse: 0: GETATTR 140440227559564 (/a/b)
[2014-07-12 08:43:48.412179] T [fuse-bridge.c:735:fuse_attr_cbk] 0-glusterfs-fuse: 0: STAT() /a/b => 13229400354144744016
[2014-07-12 08:43:48.412255] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 128/128 
[2014-07-12 08:43:48.412389] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-07-12 08:43:48.412464] T [fuse-bridge.c:1400:fuse_readlink_resume] 0-glusterfs-fuse: 1 READLINK /a/b/1/9526d413-96ff-4ab8-a36a-2d718d01ea23
[2014-07-12 08:43:48.412566] T [fuse-bridge.c:1370:fuse_readlink_cbk] 0-glusterfs-fuse: 1: /a/b/1 => ../../1 (size:7)
[2014-07-12 08:43:48.412640] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 23/23 
[2014-07-12 08:43:48.412744] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 2
[2014-07-12 08:43:48.412815] T [fuse-bridge.c:802:fuse_getattr_resume] 0-glusterfs-fuse: 0: GETATTR 140440227563820 (/1)
[2014-07-12 08:43:48.412911] T [fuse-bridge.c:735:fuse_attr_cbk] 0-glusterfs-fuse: 0: STAT() /1 => 11186291260020135826
[2014-07-12 08:43:48.412986] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 128/128 
[2014-07-12 08:43:48.413099] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-07-12 08:43:48.413173] T [fuse-bridge.c:1400:fuse_readlink_resume] 0-glusterfs-fuse: 1 READLINK /a/b/1/9526d413-96ff-4ab8-a36a-2d718d01ea23
[2014-07-12 08:43:48.413272] T [fuse-bridge.c:1370:fuse_readlink_cbk] 0-glusterfs-fuse: 1: /a/b/1 => ../../1 (size:7)
[2014-07-12 08:43:48.413346] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 23/23 
[2014-07-12 08:43:48.413448] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 4
[2014-07-12 08:43:48.413520] T [fuse-bridge.c:802:fuse_getattr_resume] 0-glusterfs-fuse: 0: GETATTR 140440227564380 (/1/2)
[2014-07-12 08:43:48.413617] T [fuse-bridge.c:735:fuse_attr_cbk] 0-glusterfs-fuse: 0: STAT() /1/2 => 10750174612723609863
[2014-07-12 08:43:48.413691] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 128/128 
[2014-07-12 08:43:48.413797] T [fuse-resolve.c:53:fuse_resolve_loc_touchup] 0-fuse: return value inode_path 6
[2014-07-12 08:43:48.413871] T [fuse-bridge.c:1400:fuse_readlink_resume] 0-glusterfs-fuse: 1 READLINK /a/b/1/9526d413-96ff-4ab8-a36a-2d718d01ea23
[2014-07-12 08:43:48.413969] T [fuse-bridge.c:1370:fuse_readlink_cbk] 0-glusterfs-fuse: 1: /a/b/1 => ../../1 (size:7)
[2014-07-12 08:43:48.414043] T [fuse-bridge.c:193:send_fuse_iov] 0-glusterfs-fuse: writev() result 23/23

Following change helps us fix this issue 

$ git diff
diff --git a/xlators/mount/fuse/src/fuse-bridge.c b/xlators/mount/fuse/src/fuse-bridge.c
index 473911a..ae2f4b5 100644
--- a/xlators/mount/fuse/src/fuse-bridge.c
+++ b/xlators/mount/fuse/src/fuse-bridge.c
@@ -1365,13 +1365,11 @@ fuse_readlink_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                      uuid_utoa (state->loc.gfid));
 
         if (op_ret > 0) {
-                ((char *)linkname)[op_ret] = '\0';
-
                 gf_log ("glusterfs-fuse", GF_LOG_TRACE,
-                        "%"PRIu64": %s => %s", frame->root->unique,
-                        state->loc.path, linkname);
+                        "%"PRIu64": %s => %s (size:%d)", frame->root->unique,
+                        state->loc.path, linkname, op_ret);
 
-                send_fuse_data (this, finh, (void *)linkname, op_ret + 1);
+                send_fuse_data (this, finh, (void *)linkname, op_ret);


This has been a Day 1 bug in readlink_cbk implementation where a linkname() returned in readlink() is being null terminated - this used to be the norm when GlusterFS was using libfuse - after using lower level calls into /dev/fuse this doesn't seem to be a requirement. 

ON linux perhaps there are safe-guards against this but on OSX fuse it populates wrong values internally due to perhaps an overlow() - since we ioctl() into the FUSE device it doesn't reply back with a valid error in this scenario - this means that OSX FUSE in-fact in this scenario doesn't validate. 

But with the above fix it works since there is no necessity to NULL terminate the linkname anymore.

Comment 12 Anand Avati 2014-07-12 09:46:29 UTC
REVIEW: http://review.gluster.org/8300 (fuse: fuse_readlink_cbk() - linkname NULL termination unnecessary) posted (#1) for review on master by Harshavardhana (harsha)

Comment 13 Anand Avati 2014-07-16 22:32:23 UTC
REVIEW: http://review.gluster.org/8300 (fuse: fuse_readlink_cbk() - linkname NULL termination unnecessary) posted (#2) for review on master by Harshavardhana (harsha)

Comment 14 Harshavardhana 2014-07-17 07:28:43 UTC
Same issue affects FreeBSD too :-)

Comment 15 Anand Avati 2014-07-17 20:04:38 UTC
REVIEW: http://review.gluster.org/8300 (fuse: fuse_readlink_cbk() - linkname NULL termination unnecessary) posted (#3) for review on master by Harshavardhana (harsha)

Comment 16 Anand Avati 2014-07-19 01:37:16 UTC
COMMIT: http://review.gluster.org/8300 committed in master by Anand Avati (avati) 
------
commit d30c3dcf1137a0a94c0be0d0c57de86aadad2a33
Author: Harshavardhana <harsha>
Date:   Sat Jul 12 02:15:14 2014 -0700

    fuse: fuse_readlink_cbk() - linkname NULL termination unnecessary
    
    op_ret incremented to compensate for NULL terminating character
    leads to self referential loop where OSXFUSE which would reply
    on a same READLINK() over and over again
    
    ~~~
    [2014-07-12 08:41:29.815473]
       T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse:
                                                  1: /a/b/1 => ../../1
    [2014-07-12 08:41:29.815820]
       T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse:
                                                  0: /a/b/1 => ../../1
    [2014-07-12 08:41:29.816165]
       T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse:
                                                  1: /a/b/1 => ../../1
    ~~~
    
    It happens due to the problem being (op_ret + 1) > strlen(linkname),
    for some odd reason this isn't an issue on Linux where there are odd
    safegaurds on these things - Example of following code
    
    ~~~
    ((char *)linkname)[op_ret] = '\0';
    send_fuse_data (this, finh, (void *)linkname, op_ret + 2048); <---- Here!
    ~~~
    
    This behaves normally with no issue, the reasoning i have is that
    internally 'readlink()' is verified with strlen() again or perhaps the size
    is re-adjusted to the strlen() of `linkname`
    
    This isn't the case on OSX, one needs to make sure that
    ~~~
    strlen(linkname) == op_ret
    ~~~
    Otherwise you would get READLINK() loops as shown above.
    
    This patch fixes the problem.
    
    Many thanks to Anand Avati for helping me out on this.
    
    Change-Id: Ia35818de78a5e4d89bad03ab06e2c5ed6e6753a4
    BUG: 1095525
    Signed-off-by: Harshavardhana <harsha>
    Reviewed-on: http://review.gluster.org/8300
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Anand Avati <avati>

Comment 17 Anand Avati 2014-07-19 01:42:16 UTC
REVIEW: http://review.gluster.org/8336 (fuse: fuse_readlink_cbk() - linkname NULL termination unnecessary) posted (#1) for review on release-3.6 by Harshavardhana (harsha)

Comment 18 Anand Avati 2014-07-22 01:36:31 UTC
REVIEW: http://review.gluster.org/8336 (fuse: fuse_readlink_cbk() - linkname NULL termination unnecessary) posted (#2) for review on release-3.6 by Harshavardhana (harsha)

Comment 19 Anand Avati 2014-07-28 17:11:34 UTC
REVIEW: http://review.gluster.org/8336 (fuse: fuse_readlink_cbk() - linkname NULL termination unnecessary) posted (#3) for review on release-3.6 by Harshavardhana (harsha)

Comment 20 Anand Avati 2014-08-03 14:56:50 UTC
COMMIT: http://review.gluster.org/8336 committed in release-3.6 by Vijay Bellur (vbellur) 
------
commit 9fa27cbfdd93b24d8e25c28dc673df4ff499736c
Author: Harshavardhana <harsha>
Date:   Sat Jul 12 02:15:14 2014 -0700

    fuse: fuse_readlink_cbk() - linkname NULL termination unnecessary
    
    op_ret incremented to compensate for NULL terminating character
    leads to self referential loop where OSXFUSE which would reply
    on a same READLINK() over and over again
    
    ~~~
    [2014-07-12 08:41:29.815473]
       T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse:
                                                  1: /a/b/1 => ../../1
    [2014-07-12 08:41:29.815820]
       T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse:
                                                  0: /a/b/1 => ../../1
    [2014-07-12 08:41:29.816165]
       T [fuse-bridge.c:1372:fuse_readlink_cbk] 0-glusterfs-fuse:
                                                  1: /a/b/1 => ../../1
    ~~~
    
    It happens due to the problem being (op_ret + 1) > strlen(linkname),
    for some odd reason this isn't an issue on Linux where there are odd
    safegaurds on these things - Example of following code
    
    ~~~
    ((char *)linkname)[op_ret] = '\0';
    send_fuse_data (this, finh, (void *)linkname, op_ret + 2048); <---- Here!
    ~~~
    
    This behaves normally with no issue, the reasoning i have is that
    internally 'readlink()' is verified with strlen() again or perhaps the size
    is re-adjusted to the strlen() of `linkname`
    
    This isn't the case on OSX, one needs to make sure that
    ~~~
    strlen(linkname) == op_ret
    ~~~
    Otherwise you would get READLINK() loops as shown above.
    
    This patch fixes the problem.
    
    Many thanks to Anand Avati for helping me out on this.
    
    Change-Id: Ia35818de78a5e4d89bad03ab06e2c5ed6e6753a4
    BUG: 1095525
    Signed-off-by: Harshavardhana <harsha>
    Reviewed-on: http://review.gluster.org/8336
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 21 Niels de Vos 2014-09-22 12:39:49 UTC
A beta release for GlusterFS 3.6.0 has been released. Please verify if the release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 22 Niels de Vos 2014-11-11 08:31:57 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users