Bug 762737 (GLUSTER-1005) - Solaris servers crash
Summary: Solaris servers crash
Keywords:
Status: CLOSED DUPLICATE of bug 762790
Alias: GLUSTER-1005
Product: GlusterFS
Classification: Community
Component: logging
Version: 3.0.4
Hardware: All
OS: Solaris
Priority: low
Severity: medium
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-06-16 12:02 UTC by Lakshmipathi G
Modified: 2015-12-01 16:45 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Lakshmipathi G 2010-06-16 12:02:40 UTC
While running system_light over DHT (4 Solaris servers + 1 client), all four servers crashed.

server-log:
--
[2010-06-16 18:49:46] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys (61)
[2010-06-16 18:49:47] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys/fstest_ac2ce25b84f49745b8a1bc7ab6c22993 (61)
[2010-06-16 18:49:47] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys/fstest_ac2ce25b84f49745b8a1bc7ab6c22993/fstest_64be9649ca4203790a6f12dbd602d244 (61)
[2010-06-16 18:49:49] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys/fstest_ac2ce25b84f49745b8a1bc7ab6c22993/fstest_64be9649ca4203790a6f12dbd602d244 (61)
[2010-06-16 18:49:50] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys/fstest_ac2ce25b84f49745b8a1bc7ab6c22993/fstest_64be9649ca4203790a6f12dbd602d244 (61)
[2010-06-16 18:49:53] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys/fstest_ac2ce25b84f49745b8a1bc7ab6c22993/fstest_64be9649ca4203790a6f12dbd602d244 (61)
[2010-06-16 18:49:54] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys/fstest_ac2ce25b84f49745b8a1bc7ab6c22993/fstest_64be9649ca4203790a6f12dbd602d244 (61)
[2010-06-16 18:49:55] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1// (61)
[2010-06-16 18:49:55] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys (61)
[2010-06-16 18:49:55] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys/fstest_ac2ce25b84f49745b8a1bc7ab6c22993 (61)
[2010-06-16 18:49:55] D [compat.c:365:solaris_getxattr] libglusterfs: Couldn't read extended attribute for the file /export/home/gluster/laks/export1//sys/fstest_dbc278eff91a7a2b3679dfd1b8cd716c (61)
[2010-06-16 18:50:00] E [posix.c:485:posix_lookup] posix1: lstat on /sys/_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_12345 failed: File name too long
pending frames:
frame : type(1) op(LOOKUP)

patchset: v3.0.4-43-g4437568
signal received: 11
time of crash: 2010-06-16 18:50:00
configuration details:
dlfcn 1
libpthread 1
spinlock 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.5rc6

=============================
(gdb) bt full
#0  0xfed9a227 in _lwp_kill () from /lib/libc.so.1
No symbol table info available.
#1  0xfed9598f in thr_kill () from /lib/libc.so.1
No symbol table info available.
#2  0xfed41ed3 in raise () from /lib/libc.so.1
No symbol table info available.
#3  0xfee56c99 in gf_print_trace (signum=11) at common-utils.c:467
	tm = (struct tm *) 0x0
	msg = "signal received: 11\n\0004437568\n", '\0' <repeats 994 times>
	timestr = "2010-06-16 18:50:00\n", '\0' <repeats 235 times>
	utime = 1276694400
	fd = 5
#4  0xfed975af in __sighndlr () from /lib/libc.so.1
No symbol table info available.
#5  0xfed8d290 in call_user_handler () from /lib/libc.so.1
No symbol table info available.
#6  <signal handler called>
No symbol table info available.
#7  0xfee63c10 in vasprintf (result=0x0, format=0xfec39d38 "%lld: LOOKUP %s (%lld) ==> %d (%s)", args=0xfde6a8f8 "3T\021") at compat.c:470
	p = 0xfec39d46 "s (%lld) ==> %d (%s)"
	total_width = 95
	ap = 0xfde6a900 "X�\025\b"
#8  0xfee51aaf in _gf_log (domain=0x80b2ed8 "server-tcp", file=0x4f <Address 0x4f out of bounds>, function=0xfec38ec0 "server_lookup_cbk", line=2471, 
    level=GF_LOG_DEBUG, fmt=0xfec39d38 "%lld: LOOKUP %s (%lld) ==> %d (%s)") at logging.c:502
	basename = 0x0
	new_logfile = (FILE *) 0xfde6a7c0
	utime = 1276694400
	tm = (struct tm *) 0xfde6a7c0
	timestr = "2010-06-16 18:50:00", '\0' <repeats 169 times>, "�\022��\000\000\000\001\000\000\000\000�\022��\000��������\000���\022��\004\000\000\000�\022��\000���C\000\000\000\000\000\000\000��e\022���\022��\000\000\000"
	str1 = 0x816bb50 "[2010-06-16 18:50:00] D [server-protocol.c:2471:server_lookup_cbk] server-tcp: "
	str2 = 0x0
	msg = 0x0
	len = 0
	ret = 79
	level_strings = {0xfee6ef34 "", 0xfee6ef35 "C", 0xfee6ef37 "E", 0xfee6ef39 "W", 0xfee6ef3b "N", 0xfee6ef3d "D", 0xfee6ef3f "T", 0xfee6ef34 ""}
	__PRETTY_FUNCTION__ = "_gf_log"
	__FUNCTION__ = "_gf_log"
#9  0xfec2bfae in server_lookup_cbk (frame=0x813853c, cookie=0x8104b68, this=0x80b3c50, op_ret=-1, op_errno=78, inode=0x814b158, stbuf=0xfde6ac50, dict=0x0, 
    postparent=0xfde6abc0) at server-protocol.c:2466
	state = (server_state_t *) 0x8141bb8
	dict_len = 0
	hdrlen = 272
	gf_errno = 0
	ret = 135463876
	link_inode = (inode_t *) 0x81303c4
	fresh_loc = {path = 0x0, name = 0x0, ino = 0, inode = 0x0, parent = 0x0}
	__FUNCTION__ = "server_lookup_cbk"
#10 0xfec5373d in iot_lookup_cbk (frame=0x8104b68, cookie=0x8140e88, this=0x80b0d60, op_ret=-1, op_errno=78, inode=0x814b158, buf=0xfde6ac50, xattr=0x0, 
    postparent=0xfde6abc0) at io-threads.c:322
	fn = (fop_lookup_cbk_t) 0xfec2bb4c <server_lookup_cbk>
	_parent = (call_frame_t *) 0x813853c
	old_THIS = (xlator_t *) 0x80b0d60
#11 0xfec76293 in pl_lookup_cbk (frame=0x8140e88, cookie=0x818f150, this=0x80b0cc8, op_ret=-1, op_errno=78, inode=0x814b158, buf=0xfde6ac50, dict=0x0, 
    postparent=0xfde6abc0) at posix.c:1123
	fn = (ret_fn_t) 0xfec536dc <iot_lookup_cbk>
	_parent = (call_frame_t *) 0x8104b68
	old_THIS = (xlator_t *) 0x80b0cc8
#12 0xfec93d7d in posix_lookup (frame=0x818f150, this=0x80b0bc8, loc=0x817d278, xattr_req=0x80b8868) at posix.c:522
	fn = (fop_lookup_cbk_t) 0xfec761e8 <pl_lookup_cbk>
	_parent = (call_frame_t *) 0x8140e88
	old_THIS = (xlator_t *) 0x80b0bc8
	buf = {st_dev = 0, st_pad1 = {0, 0, 0}, st_ino = 0, st_mode = 0, st_nlink = 0, st_uid = 0, st_gid = 0, st_rdev = 0, st_pad2 = {0, 0}, st_size = 0, st_atim = {
    tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, st_blksize = 0, st_blocks = 0, 
  st_fstype = '\0' <repeats 15 times>, st_pad4 = {0, 0, 0, 0, 0, 0, 0, 0}}
	op_ret = -1
	entry_ret = -1
	op_errno = 78
	xattr = (dict_t *) 0x0
	pathdup = 0x8130308 ""
	parentpath = 0x0
	postparent = {st_dev = 22937, st_pad1 = {0, 0, 0}, st_ino = 7681, st_mode = 16877, st_nlink = 3, st_uid = 0, st_gid = 0, st_rdev = 0, st_pad2 = {0, 0}, 
  st_size = 512, st_atim = {tv_sec = 1276693659, tv_nsec = 149559000}, st_mtim = {tv_sec = 1276694396, tv_nsec = 126044000}, st_ctim = {tv_sec = 1276694396, 
    tv_nsec = 126044000}, st_blksize = 8192, st_blocks = 2, st_fstype = "ufs", '\0' <repeats 12 times>, st_pad4 = {0, 0, 0, 0, 0, 0, 0, 0}}
	__FUNCTION__ = "posix_lookup"
#13 0xfec76446 in pl_lookup (frame=0x8140e88, this=0x80b0cc8, loc=0x817d278, xattr_req=0x80b8868) at posix.c:1163
	_new = (call_frame_t *) 0x818f150
	old_THIS = (xlator_t *) 0x80b0cc8
	tmp_cbk = (fop_lookup_cbk_t) 0
	local = (pl_local_t *) 0x80b0cc8
	__FUNCTION__ = "pl_lookup"
#14 0xfec537ef in iot_lookup_wrapper (frame=0x80b0d60, this=0x80b0d60, loc=0x817d278, xattr_req=0x80b8868) at io-threads.c:332
	_new = (call_frame_t *) 0x8140e88
	old_THIS = (xlator_t *) 0x80b0d60
	tmp_cbk = (fop_lookup_cbk_t) 0
#15 0xfee61755 in call_resume (stub=0x817d260) at call-stub.c:2673
	old_THIS = (xlator_t *) 0xfee940e0
	__FUNCTION__ = "call_resume"
#16 0xfec565b5 in iot_worker_unordered (arg=0x80bade8) at io-threads.c:2453
	stub = (call_stub_t *) 0x0
#17 0xfed971c0 in _thr_setup () from /lib/libc.so.1
No symbol table info available.
#18 0xfed974b0 in L3_doit () from /lib/libc.so.1
No symbol table info available.
#19 0xfe2b2a00 in ?? ()
No symbol table info available.
#20 0x00000000 in ?? ()
No symbol table info available.

Comment 1 Pavan Vilas Sondur 2010-06-17 04:22:17 UTC
It is known to crash in vasprintf if built with -m32 (default). Can you verify if it happens when built with -m64 too?

Comment 2 Lakshmipathi G 2010-06-29 09:48:38 UTC
No, it didn't crash when built with the -m64 flag.

Comment 3 Amar Tumballi 2010-07-19 06:57:21 UTC
All Solaris crash issues point to something related to argument printing. Need to verify/fix.
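
For illustration only: the compat.c source is not included in this report, so the following is a sketch of the class of bug being suspected, not the actual GlusterFS code. A hand-written width estimator has to pull each argument off the va_list with the type that matches its length modifier. If `%lld` were consumed as an `int`, a 32-bit build would advance by 4 bytes instead of 8 and the following `%s` would read a bogus pointer, which would be consistent with frame #7 stopping at "s (%lld) ==> %d (%s)". With -m64 the variadic slots are typically 8 bytes wide, which may be why the mismatch stays hidden there. The names `sketch_width` and `sketch_width_of` are hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: upper bound on the formatted length.
 * Handles only %d, %u, %s and the l / ll length modifiers. */
static size_t sketch_width(const char *format, va_list args)
{
    size_t total = strlen(format) + 1;
    const char *p = format;

    while ((p = strchr(p, '%')) != NULL) {
        int longs = 0;

        p++;                        /* skip '%' */
        if (*p == '%') {            /* literal "%%" consumes no argument */
            p++;
            continue;
        }
        while (*p == 'l') {         /* count the l / ll length modifier */
            longs++;
            p++;
        }
        switch (*p) {
        case 'd':
        case 'u':
            /* Consume exactly the type the caller passed. */
            if (longs >= 2)
                (void)va_arg(args, long long);
            else if (longs == 1)
                (void)va_arg(args, long);
            else
                (void)va_arg(args, int);
            total += 21;            /* sign plus up to 20 decimal digits */
            break;
        case 's': {
            char *s = va_arg(args, char *);
            total += strlen(s ? s : "(null)");
            break;
        }
        default:                    /* other conversions are out of scope */
            break;
        }
        if (*p != '\0')
            p++;
    }
    return total;
}

/* Variadic front end so the estimator can be called directly. */
static size_t sketch_width_of(const char *format, ...)
{
    va_list args;
    size_t total;

    va_start(args, format);
    total = sketch_width(format, args);
    va_end(args);
    return total;
}
```

Calling `sketch_width_of` with the format string from frame #7 and arguments of the matching types (`long long`, `char *`, `long long`, `int`, `char *`) returns a buffer size large enough for the formatted message on both 32-bit and 64-bit builds.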

Comment 4 shishir gowda 2010-09-21 05:20:34 UTC
Moving all Solaris bugs to target milestone 3.2.0.

Comment 5 shishir gowda 2010-11-16 07:08:37 UTC
This seems to be the crash seen in the Solaris vasprintf for the %llu/%lld formats.

This was fixed as part of bug 762790.

*** This bug has been marked as a duplicate of bug 1058 ***
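
As a hedged aside on how a compat vasprintf can avoid parsing length modifiers such as %llu/%lld at all: the sketch below lets vsnprintf compute the required size and then formats into a buffer of exactly that size. This is not the change made for bug 762790, which is not shown here. It assumes a C99-conforming vsnprintf (one that returns the needed length when given a NULL buffer and size 0), and the names `sketch_vasprintf` and `sketch_asprintf` are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical replacement: size the buffer with vsnprintf instead of
 * walking the format string by hand. */
static int sketch_vasprintf(char **result, const char *format, va_list args)
{
    va_list probe;
    int needed;

    if (result == NULL || format == NULL)
        return -1;

    /* Measure on a copy so the caller's va_list is still intact
     * for the second, real formatting pass. */
    va_copy(probe, args);
    needed = vsnprintf(NULL, 0, format, probe);
    va_end(probe);
    if (needed < 0)
        return -1;

    *result = malloc((size_t)needed + 1);
    if (*result == NULL)
        return -1;

    return vsnprintf(*result, (size_t)needed + 1, format, args);
}

/* Variadic front end mirroring asprintf. */
static int sketch_asprintf(char **result, const char *format, ...)
{
    va_list args;
    int ret;

    va_start(args, format);
    ret = sketch_vasprintf(result, format, args);
    va_end(args);
    return ret;
}
```

The explicit NULL check on `result` also means a call like the one in frame #7 (result=0x0) would return -1 instead of writing through a null pointer. The caller owns the returned buffer and must free() it.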

