sosreports:

[nchilaka@rhsqe-repo nchilaka]$ /home/repo/sosreports/nchilaka/bug.1302355
[nchilaka@rhsqe-repo nchilaka]$ hostname
rhsqe-repo.lab.eng.blr.redhat.com
Which NFS server did you use to mount the volume?
I saw the problem with at least 2 different servers at different times: 10.70.37.202 and 10.70.37.120.
Here is our analysis so far:

NFS clients send an ACCESS fop to get the permissions of a file. The GlusterFS brick server encapsulates the file permissions in op_errno before replying to the gluster-NFS server. In the packet trace collected, we have seen a few ACCESS fops with op_errno set to zero (and that too for the root gfid/inode). That means the brick processes have been sending NULL permissions. We have seen this issue hit with cold-tier bricks.

When we checked the brick processes, we found that the posix-acl xlator is at times checking against NULL perms. The posix-acl xlator (when no ACL is set) stores the permissions of an inode in its 'ctx->perm' structure and uses those bits to decide access permission for any user. This 'ctx->perm' is updated by posix_acl_ctx_update(), which gets called as part of a variety of fops (lookup, stat, readdir etc.).

gluster-nfs server:

(gdb)
1453            gf_msg (this->name, GF_LOG_WARNING,
(gdb) c
Continuing.

Breakpoint 1, client3_3_access (frame=0x7f9885331570, this=0x7f9874031150,
    data=0x7f986bffe7a0) at client-rpc-fops.c:3520
3520    {
(gdb) p this->name
$10 = 0x7f9874030ae0 "nagvol-client-20"
(gdb)

volfile:

volume nagvol-client-20
    type protocol/client
    option send-gids true
    option password 492e0b7f-255f-469d-8b9c-6982079dbcd1
    option username 727545cc-cd4f-4d92-9fed-81763b6d3d29
    option transport-type tcp
    option remote-subvolume /rhs/brick2/nagvol
    option remote-host 10.70.35.108
    option ping-timeout 42
end-volume

Brick process:

Breakpoint 3, posix_acl_ctx_update (inode=0x7f7b1a1b306c, this=this@entry=0x7f7b340106b0,
    buf=buf@entry=0x7f7acc4d9148) at posix-acl.c:734
734     {
(gdb) p inode>gfid
No symbol "gfid" in current context.
(gdb) p inode->gfid
$23 = '\000' <repeats 15 times>, "\001"
(gdb) n
743             ctx = posix_acl_ctx_get (inode, this);
(gdb)
744             if (!ctx) {
(gdb)
743             ctx = posix_acl_ctx_get (inode, this);
(gdb)
744             if (!ctx) {
(gdb)
749             LOCK(&inode->lock);
(gdb)
751             ctx->uid = buf->ia_uid;
(gdb)
753             ctx->perm = st_mode_from_ia (buf->ia_prot, buf->ia_type);
(gdb)
751             ctx->uid = buf->ia_uid;
(gdb)
752             ctx->gid = buf->ia_gid;
(gdb)
753             ctx->perm = st_mode_from_ia (buf->ia_prot, buf->ia_type);
(gdb)
753             ctx->perm = st_mode_from_ia (buf->ia_prot, buf->ia_type);
(gdb)
755             acl = ctx->acl_access;
(gdb) p/x ctx->perm
$24 = 0x4000
(gdb) p buf->ia_prot
$25 = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000',
  owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'},
  group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'},
  other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}
(gdb) bt
#0  posix_acl_ctx_update (inode=0x7f7b1a1b306c, this=this@entry=0x7f7b340106b0,
    buf=buf@entry=0x7f7acc4d9148) at posix-acl.c:755
#1  0x00007f7b39932af5 in posix_acl_readdirp_cbk (frame=0x7f7b46619a14, cookie=<optimized out>,
    this=0x7f7b340106b0, op_ret=7, op_errno=0, entries=0x7f7ab008da50, xdata=0x0) at posix-acl.c:1625
#2  0x00007f7b39b46bef in br_stub_readdirp_cbk (frame=0x7f7b46627674, cookie=<optimized out>,
    this=0x7f7b3400f270, op_ret=7, op_errno=0, entries=0x7f7ab008da50, dict=0x0) at bit-rot-stub.c:2546
#3  0x00007f7b3b0af163 in posix_readdirp (frame=0x7f7b46618aa0, this=<optimized out>, fd=<optimized out>,
    size=<optimized out>, off=<optimized out>, dict=<optimized out>) at posix.c:6022
#4  0x00007f7b48b10535 in default_readdirp (frame=0x7f7b46618aa0, this=0x7f7b34009240,
    fd=0x7f7b2c005408, size=0, off=0, xdata=0x7f7b48ddd5b8) at defaults.c:2101
#5  0x00007f7b48b10535 in default_readdirp (frame=0x7f7b46618aa0, this=0x7f7b3400a880,
    fd=0x7f7b2c005408, size=0, off=0, xdata=0x7f7b48ddd5b8) at defaults.c:2101
#6  0x00007f7b48b10535 in default_readdirp (frame=0x7f7b46618aa0, this=0x7f7b3400d2e0,
    fd=0x7f7b2c005408, size=0, off=0, xdata=0x7f7b48ddd5b8) at defaults.c:2101
#7  0x00007f7b39b404db in br_stub_readdirp (frame=0x7f7b46627674, this=0x7f7b3400f270,
    fd=0x7f7b2c005408, size=0, offset=0, dict=0x7f7b48ddd5b8) at bit-rot-stub.c:2581
#8  0x00007f7b39930949 in posix_acl_readdirp (frame=0x7f7b46619a14, this=0x7f7b340106b0,
    fd=0x7f7b2c005408, size=0, offset=0, dict=0x7f7b48ddd5b8) at posix-acl.c:1674
#9  0x00007f7b39718817 in pl_readdirp (frame=0x7f7b4661b0ec, this=0x7f7b34011ac0,
    fd=0x7f7b2c005408, size=0, offset=0, xdata=0x7f7b48ddd5b8) at posix.c:2213
#10 0x00007f7b39506c85 in up_readdirp (frame=0x7f7b46631504, this=0x7f7b34012e60,
    fd=0x7f7b2c005408, size=0, off=0, dict=0x7f7b48ddd5b8) at upcall.c:1342
#11 0x00007f7b48b1e19d in default_readdirp_resume (frame=0x7f7b4662b4f0, this=0x7f7b340142d0,
    fd=0x7f7b2c005408, size=0, off=0, xdata=0x7f7b48ddd5b8) at defaults.c:1657
#12 0x00007f7b48b3b17d in call_resume (stub=0x7f7b46113684) at call-stub.c:2576
#13 0x00007f7b392f6363 in iot_worker (data=0x7f7b340546a0) at io-threads.c:215
#14 0x00007f7b47973dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f7b472ba21d in clone () from /lib64/libc.so.6
(gdb) f 1
#1  0x00007f7b39932af5 in posix_acl_readdirp_cbk (frame=0x7f7b46619a14, cookie=<optimized out>,
    this=0x7f7b340106b0, op_ret=7, op_errno=0, entries=0x7f7ab008da50, xdata=0x0) at posix-acl.c:1625
1625                    posix_acl_ctx_update (entry->inode, this, &entry->d_stat);
(gdb) p entry
$26 = (gf_dirent_t *) 0x7f7acc4d9120
(gdb) p entry->d_stat
$27 = {ia_ino = 0, ia_gfid = '\000' <repeats 15 times>, "\001", ia_dev = 0, ia_type = IA_IFDIR,
  ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000',
    owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'},
    group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'},
    other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}},
  ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_rdev = 0, ia_size = 0, ia_blksize = 0, ia_blocks = 0,
  ia_atime = 0, ia_atime_nsec = 0, ia_mtime = 0, ia_mtime_nsec = 0, ia_ctime = 0, ia_ctime_nsec = 0}

As we can see above, as part of readdirp we ended up with a root inode entry that hasn't been updated with the right stat (entry->d_stat). Looking at the code:

int
posix_make_ancestryfromgfid (xlator_t *this, char *path, int pathsize,
                             gf_dirent_t *head, int type, uuid_t gfid,
                             const size_t handle_size,
                             const char *priv_base_path,
                             inode_table_t *itable, inode_t **parent,
                             dict_t *xdata, int32_t *op_errno)
{
        char        *linkname   = NULL; /* "../../<gfid[0]>/<gfid[1]/"
                                           "<gfidstr>/<NAME_MAX>" */
        ...........
        ...........
        if (__is_root_gfid (gfid)) {
                if (parent) {
                        if (*parent) {
                                inode_unref (*parent);
                        }
                        *parent = inode_ref (itable->root);
                }

                inode = itable->root;

                memset (&iabuf, 0, sizeof (iabuf));
                gf_uuid_copy (iabuf.ia_gfid, inode->gfid);
                iabuf.ia_type = inode->ia_type;

                ret = posix_make_ancestral_node (priv_base_path, path,
                                                 pathsize, head, "/", &iabuf,
                                                 inode, type, xdata);
                if (ret < 0)
                        *op_errno = ENOMEM;
                return ret;
        }
        .............
}

For the root inode entry we do not seem to be fetching the stat (for other entries a 'posix_resolve()' call is made, which updates the stat). So we suspect this resulted in a root entry with NULL perms, which in turn got stored in the posix-acl xlator's 'ctx->perm', resulting in an EPERM error for the ACCESS fop. But it is not yet clear why this issue hasn't been hit until now.
Patches merged: https://code.engineering.redhat.com/gerrit/71534 https://code.engineering.redhat.com/gerrit/71538 https://code.engineering.redhat.com/gerrit/71536
*** Bug 1320887 has been marked as a duplicate of this bug. ***
Created a distributed-disperse volume, enabled quota and set a usage limit.
Mounted the volume over NFS on two clients (one RHEL client, the other Fedora).
Copied 30 mp3 songs to the clients and started playing them in random mode.
Didn't see any errors from vlc or hangs while playing songs.

[ashah@localhost Music]$ vlc
VLC media player 2.0.10 Twoflower (revision 2.0.10-0-g2912d78)
[0x9b5108] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.

Bug verified on build glusterfs-3.7.9-5.el7rhgs.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240