Bug 1025476 - All glusterfsd processes crashed when one of the nodes went down twice in succession while renames were happening
Summary: All glusterfsd processes crashed when one of the nodes went down two times in ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Venky Shankar
QA Contact: Vijaykumar Koppad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-10-31 19:04 UTC by M S Vishwanath Bhat
Modified: 2016-06-01 01:56 UTC (History)

Fixed In Version: glusterfs-3.4.0.39rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-11-27 15:45:32 UTC
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2013:1769
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Storage 2.1 enhancement and bug fix update #1
Last Updated: 2013-11-27 20:17:39 UTC

Description M S Vishwanath Bhat 2013-10-31 19:04:46 UTC
Description of problem:
All glusterfsd processes (every brick that was online) crashed when one of the nodes went down for the second time. The same node had gone down once before; after coming back, it stayed online for about an hour. When it went down again, all the brick processes crashed.

Version-Release number of selected component (if applicable):
glusterfs 3.4.0.37rhs built on Oct 30 2013 14:34:21


How reproducible:
Not sure. It just happened while I was testing geo-rep with a lot of renames and one node down.

Steps to Reproduce:

Will add the steps as a comment after confirming a few things.

Actual results:
3 glusterfsd processes crashed.


Expected results:
No crash

Additional info:


Backtrace from the logs:

[2013-10-31 18:04:56.897575] I [server-handshake.c:569:server_setvolume] 0-master-server: accepted client from harrier.blr.redhat.com-2292-2013/10/31-18:04:56:819148-master-client-3-0 (version: 3.4.0.37rhs)
[2013-10-31 18:45:12.201727] W [posix-helpers.c:788:posix_handle_pair] 0-master-posix: Extended attributes not supported (try remounting brick with 'user_xattr' flag)
[2013-10-31 18:45:12.201775] E [posix.c:915:posix_mknod] 0-master-posix: setting xattrs on /rhs/bricks/brick3/network_shared/starting_gate.tmp failed (Operation not supported)
[2013-10-31 18:45:12.252969] W [changelog-helpers.c:321:changelog_local_init] (-->/usr/lib64/libglusterfs.so.0(default_setxattr+0x83) [0x3bf201ea13] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d) [0x7f691f76687d] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x67) [0x7f691f9710f7]))) 0-master-changelog: inode needed for version checking !!!
pending frames:
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-31 18:45:12configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.37rhs
/lib64/libc.so.6[0x3dbf832960]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_get_usable_buffer+0x0)[0x7f691f973be0]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x73)[0x7f691f971103]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d)[0x7f691f76687d]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x3bf201ea13]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_setxattr_wrapper+0x142)[0x7f691f3361a2]
/usr/lib64/libglusterfs.so.0(call_resume+0x3ae)[0x3bf2031afe]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f691f33dac8]
/lib64/libpthread.so.0[0x3dc0007851]
/lib64/libc.so.6(clone+0x6d)[0x3dbf8e894d]



[2013-10-31 18:45:13.762678] W [changelog-helpers.c:321:changelog_local_init] (-->/usr/lib64/libglusterfs.so.0(default_setxattr+0x83) [0x302381ea13] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d) [0x7f14f7bec87d] (-->/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x67) [0x7f14f7df70f7]))) 0-master-changelog: inode needed for version checking !!!
pending frames:
frame : type(0) op(31)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-10-31 18:45:13configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.37rhs
/lib64/libc.so.6[0x3b3a032960]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_get_usable_buffer+0x0)[0x7f14f7df9be0]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so(changelog_setxattr+0x73)[0x7f14f7df7103]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/access-control.so(posix_acl_setxattr+0x23d)[0x7f14f7bec87d]
/usr/lib64/libglusterfs.so.0(default_setxattr+0x83)[0x302381ea13]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_setxattr_wrapper+0x142)[0x7f14f77bc1a2]
/usr/lib64/libglusterfs.so.0(call_resume+0x3ae)[0x3023831afe]
/usr/lib64/glusterfs/3.4.0.37rhs/xlator/performance/io-threads.so(iot_worker+0x158)[0x7f14f77c3ac8]
/lib64/libpthread.so.0[0x3b3a807851]
/lib64/libc.so.6(clone+0x6d)[0x3b3a0e894d]
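
The two backtraces above have the same frames and differ only in load addresses. As a reading aid, here is a minimal Python sketch that pulls the shared object, symbol and offset out of frame lines of the form shown in these logs. The regular expression and the function name parse_frames are assumptions made for illustration; they are not part of glusterfs or its tooling.

```python
import os
import re

# Matches frame lines as they appear in the log above, for example:
#   /path/to/changelog.so(changelog_setxattr+0x73)[0x7f691f971103]
#   /lib64/libc.so.6[0x3dbf832960]
# The pattern is an assumption based only on the lines in this report.
FRAME_RE = re.compile(
    r"^(?P<obj>/[^\s(\[]+)"
    r"(?:\((?P<sym>[^+()\s]+)\+(?P<off>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(log_text):
    """Return (object basename, symbol, offset) for each frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(
                (
                    os.path.basename(match.group("obj")),
                    match.group("sym"),  # None when the frame has no symbol
                    match.group("off"),  # None when the frame has no symbol
                )
            )
    return frames


# First three frames of the first backtrace, copied from the log above.
SAMPLE = "\n".join(
    [
        "/lib64/libc.so.6[0x3dbf832960]",
        "/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so"
        "(changelog_get_usable_buffer+0x0)[0x7f691f973be0]",
        "/usr/lib64/glusterfs/3.4.0.37rhs/xlator/features/changelog.so"
        "(changelog_setxattr+0x73)[0x7f691f971103]",
    ]
)

if __name__ == "__main__":
    for obj, sym, off in parse_frames(SAMPLE):
        print(obj, sym, off)
```

Comparing the parsed tuples from both backtraces (ignoring addresses) is a quick way to confirm that both bricks crashed on the same path, through changelog_setxattr into changelog_get_usable_buffer.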


A lot of renames were happening from the client at the time, and a geo-replication session was running on the master where the crash occurred.

Comment 2 Vijaykumar Koppad 2013-11-01 11:10:39 UTC
I am able to hit this issue with the build glusterfs-3.4.0.38rhs-1.el6rhs.x86_64. It was hit while renaming files. Unlike in the description above, it was hit on the first renames, and no brick was going down at the time. The steps were straightforward:

1. Create files on the master.
2. Create symlinks to the files.
3. Create hardlinks to the files.
4. Start renaming the files.
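
The four steps above can be sketched as a small Python workload. This is only an illustration under stated assumptions: the function name run_workload, the file names and the file count are made up here, and the original test ran against a mounted master volume with a geo-replication session active, which this script does not set up. Running it is not guaranteed to reproduce the crash.

```python
import os
import sys
import tempfile


def run_workload(root, count=10):
    """Create files, symlinks and hardlinks under root, then rename the files.

    All names and the default count are hypothetical; the report does not
    say how many files were used or what they were called.
    """
    # Step 1: create files.
    for i in range(count):
        with open(os.path.join(root, f"file{i}"), "w") as handle:
            handle.write("data\n")
    # Step 2: create symlinks to the files.
    for i in range(count):
        os.symlink(os.path.join(root, f"file{i}"), os.path.join(root, f"sym{i}"))
    # Step 3: create hardlinks to the files.
    for i in range(count):
        os.link(os.path.join(root, f"file{i}"), os.path.join(root, f"hard{i}"))
    # Step 4: rename the files.
    for i in range(count):
        os.rename(os.path.join(root, f"file{i}"), os.path.join(root, f"renamed{i}"))
    return sorted(os.listdir(root))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Assumption: the argument is an empty directory on the mounted
        # master volume.
        run_workload(sys.argv[1])
    else:
        # Without an argument, exercise the steps in a scratch directory.
        with tempfile.TemporaryDirectory() as scratch:
            print(len(run_workload(scratch)), "entries created")
```

Keeping each step in its own loop mirrors the order given in the comment: all files exist, with their symlinks and hardlinks, before the first rename is issued.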

Comment 4 Vijaykumar Koppad 2013-11-07 11:58:44 UTC
Verified in the build glusterfs-3.4.0.39rhs.

Comment 6 errata-xmlrpc 2013-11-27 15:45:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html