Description of problem:
glusterfsd processes crashed (all the bricks that were online) when one of the nodes went down a second time. The same node had gone down once and, after coming back, stayed online for about an hour. When the node went down again, all the brick processes crashed.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.37rhs built on Oct 30 2013 14:34:21

How reproducible:
Not sure. It happened while I was testing geo-rep with a lot of renames and one node down.

Steps to Reproduce:
Will add the steps as a comment after confirming a few things.

Actual results:
3 glusterfsd processes crashed.

Expected results:
No crash

Additional info:
Back trace from the logs.

[2013-10-31 18:04:56.897575] I [server-handshake.c:569:server_setvolume] 0-master-server: accepted client from harrier.blr.redhat.com-2292-2013/10/31-18:04:56:819148-master-client-3-0 (version: 3.4.0.37rhs)
[2013-10-31 18:45:12.201727] W [posix-helpers.c:788:posix_handle_pair] 0-master-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)
[2013-10-31 18:45:12.201775] E [posix.c:915:posix_mknod] 0-master-posix: setting xattrs on /rhs/bricks/brick3/network_shared/starting_gate.tmp failed (Operation not supported)
[2013-10-31 18:45:12.252969] W [changelog-helpers.c:321:changelog_local_init] (-->/usr/lib64/libglusterfs.so.0(default_setxattr+0x83) [0x3bf201ea13] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d) [0x7f691f76687d] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x67) [0x7f691f9710f7]))) 0-master-changelog: inode needed for version checking !!!
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-31 18:45:12configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.37rhs
/lib64/libc.so.6[0x3dbf832960]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_get_usable_buffer+0x0)[0x7f691f973be0]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x73)[0x7f691f971103]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d)[0x7f691f76687d]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x3bf201ea13]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_setxattr_wrapper+0x142)[0x7f691f3361a2]
/usr/lib64/libglusterfs.so.0(call_resume+0x3ae)[0x3bf2031afe]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f691f33dac8]
/lib64/libpthread.so.0[0x3dc0007851]
/lib64/libc.so.6(clone+0x6d)[0x3dbf8e894d]

[2013-10-31 18:45:13.762678] W [changelog-helpers.c:321:changelog_local_init] (-->/usr/lib64/libglusterfs.so.0(default_setxattr+0x83) [0x302381ea13] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d) [0x7f14f7bec87d] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x67) [0x7f14f7df70f7]))) 0-master-changelog: inode needed for version checking !!!
pending frames:
frame : type(0) op(31)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-31 18:45:13configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.37rhs
/lib64/libc.so.6[0x3b3a032960]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_get_usable_buffer+0x0)[0x7f14f7df9be0]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x73)[0x7f14f7df7103]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d)[0x7f14f7bec87d]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x302381ea13]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_setxattr_wrapper+0x142)[0x7f14f77bc1a2]
/usr/lib64/libglusterfs.so.0(call_resume+0x3ae)[0x3023831afe]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f14f77c3ac8]
/lib64/libpthread.so.0[0x3b3a807851]
/lib64/libc.so.6(clone+0x6d)[0x3b3a0e894d]

A lot of renames were being issued from the client at the time of the crash, and a geo-replication session was running on the master where the crash occurred.
I am able to hit this issue with the build glusterfs-3.4.0.38rhs-1.el6rhs.x86_64. It was hit while renaming files. Unlike the steps above, it was hit on the first renames, and no brick-down operation was in progress. The steps were straightforward:
1. Create files on master.
2. Create symlinks to the files.
3. Create hardlinks to the files.
4. Start renaming the files.
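
The steps above can be approximated with a small script. This is only a sketch of the workload, not the script used in the original test: the function name, the file naming scheme, and the default file count are made up for illustration, and it assumes the master volume is already mounted on the client and that its mount path is passed in as root.

```python
import os


def run_rename_workload(root, count=100):
    """Create files, symlinks and hard links under root, then rename the files.

    root is assumed to be a directory on the mounted master volume
    (hypothetical path, supplied by the caller). Returns the new file paths.
    """
    os.makedirs(root, exist_ok=True)
    files = [os.path.join(root, "file_%d" % i) for i in range(count)]

    # Step 1: create files.
    for i, path in enumerate(files):
        with open(path, "w") as fh:
            fh.write("data %d\n" % i)

    # Step 2: create a symlink to each file.
    for i, path in enumerate(files):
        os.symlink(path, os.path.join(root, "symlink_%d" % i))

    # Step 3: create a hard link to each file.
    for i, path in enumerate(files):
        os.link(path, os.path.join(root, "hardlink_%d" % i))

    # Step 4: rename every original file.
    renamed = []
    for i, path in enumerate(files):
        new_path = os.path.join(root, "renamed_%d" % i)
        os.rename(path, new_path)
        renamed.append(new_path)
    return renamed
```

Calling run_rename_workload() with a directory on the client mount while the geo-replication session is active should generate the same kind of create, link, and rename traffic. Whether it triggers the crash on a given build is not guaranteed.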
https://code.engineering.redhat.com/gerrit/#/c/15047/
Verified in the build glusterfs-3.4.0.39rhs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1769.html