Bug 1025476 - All glusterfsd processes crashed when one of the node went down two times in succession and while renames were happening
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Assigned To: Venky Shankar
QA Contact: Vijaykumar Koppad
Keywords: ZStream
Reported: 2013-10-31 15:04 EDT by M S Vishwanath Bhat
Modified: 2016-05-31 21:56 EDT (History)

Fixed In Version: glusterfs-3.4.0.39rhs
Doc Type: Bug Fix
Last Closed: 2013-11-27 10:45:32 EST
Type: Bug

Attachments: None
Description M S Vishwanath Bhat 2013-10-31 15:04:46 EDT
Description of problem:
All online glusterfsd (brick) processes crashed when one of the nodes went down a second time. The node had gone down once before; after coming back, it stayed online for about an hour. When it went down again, all the brick processes crashed.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.37rhs built on Oct 30 2013 14:34:21


How reproducible:
Not sure. It happened while I was testing geo-replication with a lot of renames and one node down.

Steps to Reproduce:

Will add the steps as a comment after confirming a few things.

Actual results:
3 glusterfsd processes crashed.


Expected results:
No crash

Additional info:


Back trace from the logs.

[2013-10-31 18:04:56.897575] I [server-handshake.c:569:server_setvolume] 0-master-server: accepted client from harrier.blr.redhat.com-2292-2013/10/31-18:04:56:819148-master-client-3-0 (version: 3.4.0.37rhs)
[2013-10-31 18:45:12.201727] W [posix-helpers.c:788:posix_handle_pair] 0-master-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)
[2013-10-31 18:45:12.201775] E [posix.c:915:posix_mknod] 0-master-posix: setting xattrs on /rhs/bricks/brick3/network_shared/starting_gate.tmp failed (Operation not supported)
[2013-10-31 18:45:12.252969] W [changelog-helpers.c:321:changelog_local_init] (-->/usr/lib64/libglusterfs.so.0(default_setxattr+0x83) [0x3bf201ea13] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d) [0x7f691f76687d] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x67) [0x7f691f9710f7]))) 0-master-changelog: inode needed for version checking !!!
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-31 18:45:12
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.37rhs
/lib64/libc.so.6[0x3dbf832960]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_get_usable_buffer+0x0)[0x7f691f973be0]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x73)[0x7f691f971103]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d)[0x7f691f76687d]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x3bf201ea13]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_setxattr_wrapper+0x142)[0x7f691f3361a2]
/usr/lib64/libglusterfs.so.0(call_resume+0x3ae)[0x3bf2031afe]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f691f33dac8]
/lib64/libpthread.so.0[0x3dc0007851]
/lib64/libc.so.6(clone+0x6d)[0x3dbf8e894d]



[2013-10-31 18:45:13.762678] W [changelog-helpers.c:321:changelog_local_init] (-->/usr/lib64/libglusterfs.so.0(default_setxattr+0x83) [0x302381ea13] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d) [0x7f14f7bec87d] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x67) [0x7f14f7df70f7]))) 0-master-changelog: inode needed for version checking !!!
pending frames:
frame : type(0) op(31)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-31 18:45:13
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.37rhs
/lib64/libc.so.6[0x3b3a032960]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_get_usable_buffer+0x0)[0x7f14f7df9be0]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x73)[0x7f14f7df7103]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d)[0x7f14f7bec87d]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x302381ea13]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_setxattr_wrapper+0x142)[0x7f14f77bc1a2]
/usr/lib64/libglusterfs.so.0(call_resume+0x3ae)[0x3023831afe]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f14f77c3ac8]
/lib64/libpthread.so.0[0x3b3a807851]
/lib64/libc.so.6(clone+0x6d)[0x3b3a0e894d]


A lot of renames were happening from the client at the time, and a geo-replication session was running on the master, where the crash occurred.
Comment 2 Vijaykumar Koppad 2013-11-01 07:10:39 EDT
I was able to hit this issue with the build glusterfs-3.4.0.38rhs-1.el6rhs.x86_64. It was hit while renaming files; unlike the steps above, it happened on the first renames, with no brick-down operation involved. The steps were straightforward:

1. Create files on master.
2. Create symlinks to the files.
3. Create hardlinks to the files.
4. Start renaming the files.
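The steps above can be sketched as a shell loop. This is only an illustrative sketch: MOUNT is an assumed client-side mount of the master volume (it defaults to a scratch directory here so the sketch runs standalone), and the file count is arbitrary.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; MOUNT stands in for a client
# mount of the master volume (assumption, not from the report).
MOUNT=${MOUNT:-$(mktemp -d)}

# 1. Create files on master.
for i in $(seq 1 10); do echo data > "$MOUNT/file$i"; done
# 2. Create symlinks to the files.
for i in $(seq 1 10); do ln -s "file$i" "$MOUNT/sym$i"; done
# 3. Create hardlinks to the files.
for i in $(seq 1 10); do ln "$MOUNT/file$i" "$MOUNT/hard$i"; done
# 4. Start renaming the files.
for i in $(seq 1 10); do mv "$MOUNT/file$i" "$MOUNT/renamed$i"; done
```

On a geo-replicated volume, each of these operations is recorded by the changelog translator, which is where the backtraces above show the crash.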
Comment 4 Vijaykumar Koppad 2013-11-07 06:58:44 EST
Verified in the build glusterfs-3.4.0.39rhs.
Comment 6 errata-xmlrpc 2013-11-27 10:45:32 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html
