Bug 1025476

Summary: All glusterfsd processes crashed when one of the nodes went down twice in succession while renames were happening
Product: Red Hat Gluster Storage
Reporter: M S Vishwanath Bhat <vbhat>
Component: glusterfs
Assignee: Venky Shankar <vshankar>
Status: CLOSED ERRATA
QA Contact: Vijaykumar Koppad <vkoppad>
Severity: high
Priority: high
Version: 2.1
CC: amarts, bbandari, grajaiya, mzywusko, vbellur, vkoppad
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: glusterfs-3.4.0.39rhs
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2013-11-27 15:45:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description M S Vishwanath Bhat 2013-10-31 19:04:46 UTC
Description of problem:
All online glusterfsd (brick) processes crashed when one of the nodes went down a second time. The same node had gone down once before; after coming back it stayed online for about an hour. When the node went down again, all the brick processes crashed.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.37rhs built on Oct 30 2013 14:34:21


How reproducible:
Not sure. It happened while I was testing geo-replication with a lot of renames and one node down.

Steps to Reproduce:

Will add the steps as a comment after confirming a few things.

Actual results:
3 glusterfsd processes crashed.


Expected results:
No crash

Additional info:


Back trace from the logs.

[2013-10-31 18:04:56.897575] I [server-handshake.c:569:server_setvolume] 0-master-server: accepted client from harrier.blr.redhat.com-2292-2013/10/31-18:04:56:819148-master-client-3-0 (version: 3.4.0.37rhs)
[2013-10-31 18:45:12.201727] W [posix-helpers.c:788:posix_handle_pair] 0-master-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)
[2013-10-31 18:45:12.201775] E [posix.c:915:posix_mknod] 0-master-posix: setting xattrs on /rhs/bricks/brick3/network_shared/starting_gate.tmp failed (Operation not supported)
[2013-10-31 18:45:12.252969] W [changelog-helpers.c:321:changelog_local_init] (-->/usr/lib64/libglusterfs.so.0(default_setxattr+0x83) [0x3bf201ea13] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d) [0x7f691f76687d] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x67) [0x7f691f9710f7]))) 0-master-changelog: inode needed for version checking !!!
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-31 18:45:12
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.37rhs
/lib64/libc.so.6[0x3dbf832960]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_get_usable_buffer+0x0)[0x7f691f973be0]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x73)[0x7f691f971103]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d)[0x7f691f76687d]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x3bf201ea13]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_setxattr_wrapper+0x142)[0x7f691f3361a2]
/usr/lib64/libglusterfs.so.0(call_resume+0x3ae)[0x3bf2031afe]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f691f33dac8]
/lib64/libpthread.so.0[0x3dc0007851]
/lib64/libc.so.6(clone+0x6d)[0x3dbf8e894d]



[2013-10-31 18:45:13.762678] W [changelog-helpers.c:321:changelog_local_init] (-->/usr/lib64/libglusterfs.so.0(default_setxattr+0x83) [0x302381ea13] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d) [0x7f14f7bec87d] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x67) [0x7f14f7df70f7]))) 0-master-changelog: inode needed for version checking !!!
pending frames:
frame : type(0) op(31)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-31 18:45:13
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.37rhs
/lib64/libc.so.6[0x3b3a032960]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_get_usable_buffer+0x0)[0x7f14f7df9be0]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x73)[0x7f14f7df7103]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d)[0x7f14f7bec87d]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x302381ea13]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_setxattr_wrapper+0x142)[0x7f14f77bc1a2]
/usr/lib64/libglusterfs.so.0(call_resume+0x3ae)[0x3023831afe]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f14f77c3ac8]
/lib64/libpthread.so.0[0x3b3a807851]
/lib64/libc.so.6(clone+0x6d)[0x3b3a0e894d]


A lot of renames were happening from the client at the time, and a geo-replication session was running on the master where the crash occurred.

Comment 2 Vijaykumar Koppad 2013-11-01 11:10:39 UTC
I am able to hit this issue with the build glusterfs-3.4.0.38rhs-1.el6rhs.x86_64. It was hit while renaming files; unlike the scenario above, it was hit on the first round of renames, with no brick-down operation happening. The steps were straightforward:

1. Create files on the master.
2. Create symlinks to the files.
3. Create hardlinks to the files.
4. Start renaming the files.

Comment 4 Vijaykumar Koppad 2013-11-07 11:58:44 UTC
Verified in the build glusterfs-3.4.0.39rhs.

Comment 6 errata-xmlrpc 2013-11-27 15:45:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html