Description of problem:
------------------------
On a 6x3 volume, some bricks were brought down while rebalance was in progress. This caused the mount to become read-only (client quorum was enabled). While rebalance was still in progress, the bricks were brought back up, which triggered self-heal on the volume. While self-heal was running, an attempt was made to remove a directory from the mount point. After a while, one of the bricks was found to have crashed.

The following is from the logs of the brick that crashed:

[2015-04-22 12:31:23.720702] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/netprio_cgroup.h
frame : type(0) op(0)
frame : type(0) op(20)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(40)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-04-22 12:31:23
configuration details:
[2015-04-22 12:31:23.731428] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/raw.h
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7dev
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3ad30221c6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3ad303de2f]
/lib64/libc.so.6[0x3ad14326a0]
/lib64/libpthread.so.0(pthread_spin_lock+0x0)[0x3ad180c380]
/usr/lib64/glusterfs/3.7dev/xlator/features/quota.so(+0x589b)[0x7f484d94689b]
/usr/lib64/glusterfs/3.7dev/xlator/features/quota.so(quota_fill_inodectx+0x1fa)[0x7f484d94f6aa]
/usr/lib64/glusterfs/3.7dev/xlator/features/quota.so(quota_readdirp_cbk+0x13e)[0x7f484d94fa9e]
/usr/lib64/glusterfs/3.7dev/xlator/features/marker.so(marker_readdirp_cbk+0x13e)[0x7f484db71bbe]
/usr/lib64/libglusterfs.so.0(default_readdirp_cbk+0xc2)[0x3ad302e622]
/usr/lib64/glusterfs/3.7dev/xlator/features/locks.so(pl_readdirp_cbk+0x18b)[0x7f484e5b6cfb]
/usr/lib64/glusterfs/3.7dev/xlator/features/access-control.so(posix_acl_readdirp_cbk+0x27a)[0x7f484e7d0b7a]
/usr/lib64/glusterfs/3.7dev/xlator/features/bitrot-stub.so(br_stub_readdirp_cbk+0x214)[0x7f484e9db304]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(posix_do_readdir+0x1b8)[0x7f484f871498]
/usr/lib64/glusterfs/3.7dev/xlator/storage/posix.so(posix_readdirp+0x1ee)[0x7f484f872fde]
/usr/lib64/libglusterfs.so.0(default_readdirp+0x83)[0x3ad3027333]
/usr/lib64/libglusterfs.so.0(default_readdirp+0x83)[0x3ad3027333]
/usr/lib64/libglusterfs.so.0(default_readdirp+0x83)[0x3ad3027333]
/usr/lib64/glusterfs/3.7dev/xlator/features/bitrot-stub.so(br_stub_readdirp+0x259)[0x7f484e9d8e29]
/usr/lib64/glusterfs/3.7dev/xlator/features/access-control.so(posix_acl_readdirp+0x19d)[0x7f484e7cd4bd]
/usr/lib64/glusterfs/3.7dev/xlator/features/locks.so(pl_readdirp+0x204)[0x7f484e5b5d94]
/usr/lib64/libglusterfs.so.0(default_readdirp+0x83)[0x3ad3027333]
/usr/lib64/libglusterfs.so.0(default_readdirp_resume+0x142)[0x3ad3029db2]
/usr/lib64/libglusterfs.so.0(call_resume+0x80)[0x3ad3046470]
/usr/lib64/glusterfs/3.7dev/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f484e1a1388]
/lib64/libpthread.so.0[0x3ad18079d1]
/lib64/libc.so.6(clone+0x6d)[0x3ad14e88fd]
---------

The attempt to remove a directory from the mount point failed:

# rm -fr linux-3.19.4
rm: cannot remove `linux-3.19.4/include/crypto': Directory not empty
rm: cannot remove `linux-3.19.4/include/drm': Directory not empty
rm: cannot remove `linux-3.19.4/include/media': Directory not empty
rm: cannot remove `linux-3.19.4/include/net/netfilter': Directory not empty
rm: cannot remove `linux-3.19.4/include/net/bluetooth': Directory not empty

I also see a lot of the following messages in the brick logs:

<snip>
[2015-04-22 12:31:14.461836] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/netns
[2015-04-22 12:31:18.132176] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/caif/caif_layer.h
[2015-04-22 12:31:23.675448] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/ip6_fib.h
[2015-04-22 12:31:23.691089] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/ip_fib.h
[2015-04-22 12:31:23.699589] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/lib80211.h
[2015-04-22 12:31:23.711344] E [posix.c:4433:posix_removexattr] 0-vol-posix: null gfid for path /linux-3.19.4/include/net/neighbour.h
</snip>

See volume info below:

# gluster volume info vol

Volume Name: vol
Type: Distributed-Replicate
Volume ID: 133fe4f3-987c-474d-9904-c28475d4812f
Status: Started
Number of Bricks: 6 x 3 = 18
Transport-type: tcp
Bricks:
Brick1: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick2: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick3: vm5-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick4: vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick1/b1
Brick5: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick2/b1
Brick6: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick2/b1
Brick7: vm5-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick2/b1
Brick8: vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick2/b1
Brick9: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick3/b1
Brick10: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick3/b1
Brick11: vm5-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick3/b1
Brick12: vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick3/b1
Brick13: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1
Brick14: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1
Brick15: vm5-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1
Brick16: vm6-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick4/b1
Brick17: vm3-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick5/b1
Brick18: vm4-rhsqa13.lab.eng.blr.redhat.com:/rhs/brick5/b1
Options Reconfigured:
cluster.quorum-type: auto
client.event-threads: 4
server.event-threads: 5
features.uss: enable
features.quota: on
cluster.consistent-metadata: on

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
On the server: glusterfs-3.7dev-0.965.git2788ddd.el6.x86_64
On the client: glusterfs-3.7dev-0.1009.git8b987be.el6.x86_64

How reproducible:
------------------
Saw it once.

Steps to Reproduce:
--------------------
1. On a 6x3 volume, started a remove-brick operation on one replica set.
2. After data migration for the remove-brick operation completed, executed remove-brick stop.
3. Started a rebalance operation on the volume.
4. While rebalance was in progress, killed two bricks in each of three replica sets.
5. After a while, with rebalance still running, started the volume using force.
6. While monitoring volume heal info output, noticed that one of the bricks was not connected (see BZ#1214169 for details).
7. While self-heal was in progress, tried to remove a directory from the mount point: # rm -fr linux-3.19.4
8. After a while, a brick was found to have crashed (the brick showed up as disconnected in heal info output).

Actual results:
----------------
Brick process crashed.
Expected results:
------------------
Brick process is not expected to crash.

Additional info:
REVIEW: http://review.gluster.org/10416 (quota: Validate NULL inode from the entries received in readdirp_cbk) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)
COMMIT: http://review.gluster.org/10416 committed in master by Raghavendra G (rgowdapp)
------
commit a152fa7ad96053093b88a010bff20e48aa5e5b70
Author: vmallika <vmallika>
Date:   Tue Apr 28 12:52:56 2015 +0530

    quota: Validate NULL inode from the entries received in readdirp_cbk

    In quota readdirp_cbk, the inode ctx is filled for all the entries received.
    In marker readdirp_cbk, files/directories are inspected for dirty.
    There is no guarantee that entry->inode is populated. If entry->inode is
    NULL, the entry needs to be treated as readdir.

    Change-Id: Id2d17bb89e4770845ce1f13d73abc2b3c5826c06
    BUG: 1215550
    Signed-off-by: vmallika <vmallika>
    Reviewed-on: http://review.gluster.org/10416
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Tested-by: Raghavendra G <rgowdapp>
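For context, the guard the commit message describes can be illustrated with a minimal sketch. This is a simplified, hypothetical rendition, not the actual quota translator code: the toy_inode/toy_dirent types and the fill_inodectx_sketch/readdirp_cbk_sketch names are stand-ins invented for illustration. The only point it demonstrates is that a readdirp callback must not dereference entry->inode (here, to take a per-inode lock, as the pthread_spin_lock frame in the backtrace suggests) when the inode was never populated; such entries are skipped, i.e. treated like plain readdir entries.

/*
 * Minimal, self-contained sketch of the NULL-inode guard described in the
 * commit message.  Types and names are simplified stand-ins, NOT the real
 * GlusterFS structures or APIs.  Build with: gcc -pthread sketch.c
 */
#include <pthread.h>
#include <stdio.h>

/* Toy stand-in for an inode whose quota context is protected by a spinlock. */
struct toy_inode {
    pthread_spinlock_t lock;
    long long          size;   /* pretend quota accounting state */
};

/* Toy stand-in for one entry returned by readdirp. */
struct toy_dirent {
    const char       *name;
    struct toy_inode *inode;   /* may be NULL if the entry was not linked */
};

/* Stand-in for filling per-inode context: updates state under the lock. */
static void fill_inodectx_sketch(struct toy_inode *inode, long long size)
{
    pthread_spin_lock(&inode->lock);   /* would crash if inode were NULL */
    inode->size = size;
    pthread_spin_unlock(&inode->lock);
}

/* Stand-in for a readdirp callback: walk the entries and fill contexts. */
static void readdirp_cbk_sketch(struct toy_dirent *entries, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (entries[i].inode == NULL) {
            /* The fix: no inode was populated for this entry, so treat it
             * like a plain readdir entry and skip the ctx update instead
             * of dereferencing a NULL pointer. */
            printf("skipping %s: no inode\n", entries[i].name);
            continue;
        }
        fill_inodectx_sketch(entries[i].inode, 4096);
    }
}

int main(void)
{
    struct toy_inode linked;
    pthread_spin_init(&linked.lock, PTHREAD_PROCESS_PRIVATE);
    linked.size = 0;

    struct toy_dirent entries[] = {
        { "netprio_cgroup.h", &linked },
        { "raw.h",            NULL    },  /* entry without a populated inode */
    };

    readdirp_cbk_sketch(entries, sizeof(entries) / sizeof(entries[0]));
    pthread_spin_destroy(&linked.lock);
    return 0;
}

Without the NULL check, the second entry in this sketch would hit the same class of failure seen in the backtrace: a spinlock taken through a pointer derived from a NULL inode.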
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
This bug is getting closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user