Bug 803815 - [glusterfs-3.3.0qa27]: glusterfs server crashed
Summary: [glusterfs-3.3.0qa27]: glusterfs server crashed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: protocol
Version: mainline
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
TreeView+ depends on / blocked
 
Reported: 2012-03-15 17:12 UTC by Raghavendra Bhat
Modified: 2013-12-19 00:07 UTC (History)
2 users (show)

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:30:36 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: glusterfs-3.3.0qa40
Embargoed:


Attachments (Terms of Use)

Description Raghavendra Bhat 2012-03-15 17:12:09 UTC
Description of problem:

3 replica volume. 3 fuse and 3 nfs client running different tests such as rdd, ping_pong, fs-perf-test etc. One of the servers crashed when statedump was issued.

This is the backtrace.

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 3555
Reading symbols from /usr/local/sbin/glusterfsd...done.
Reading symbols from /usr/local/lib/libglusterfs.so.0...done.
Loaded symbols for /usr/local/lib/libglusterfs.so.0
Reading symbols from /usr/local/lib/libgfrpc.so.0...done.
Loaded symbols for /usr/local/lib/libgfrpc.so.0
Reading symbols from /usr/local/lib/libgfxdr.so.0...done.
Loaded symbols for /usr/local/lib/libgfxdr.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x7f519fdfd700 (LWP 3594)]
[New Thread 0x7f51b518b700 (LWP 3572)]
[New Thread 0x7f51b538d700 (LWP 3570)]
[New Thread 0x7f51b75a6700 (LWP 3560)]
[New Thread 0x7f51b83d1700 (LWP 3558)]
[New Thread 0x7f51b8dd2700 (LWP 3557)]
[New Thread 0x7f51b97d3700 (LWP 3556)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/rpc-transport/socket.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/rpc-transport/socket.so
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/xlator/storage/posix.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/xlator/storage/posix.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/xlator/features/access-control.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/xlator/features/access-control.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/xlator/features/locks.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/xlator/features/locks.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/io-threads.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/xlator/performance/io-threads.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/xlator/features/index.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/xlator/features/index.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/xlator/features/marker.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/xlator/features/marker.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/xlator/debug/io-stats.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/xlator/debug/io-stats.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/xlator/protocol/server.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/xlator/protocol/server.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/auth/login.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/auth/login.so
Reading symbols from /usr/local/lib/glusterfs/3.3.0qa27/auth/addr.so...done.
Loaded symbols for /usr/local/lib/glusterfs/3.3.0qa27/auth/addr.so
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x0000003c3807bc48 in _L_lock_9291 () from /lib64/libc.so.6
#2  0x0000003c380795e2 in malloc () from /lib64/libc.so.6
#3  0x0000003c3807f842 in strdup () from /lib64/libc.so.6
#4  0x0000003c3809bcf1 in tzset_internal () from /lib64/libc.so.6
#5  0x0000003c3809be69 in __tz_convert () from /lib64/libc.so.6
#6  0x00007f51bacee760 in gf_print_trace (signum=11) at ../../../libglusterfs/src/common-utils.c:408
#7  <signal handler called>
#8  0x0000003c3807881e in _int_malloc () from /lib64/libc.so.6
#9  0x0000003c38079288 in calloc () from /lib64/libc.so.6
#10 0x00007f51bad0db8d in __gf_calloc (nmemb=1, size=18, type=45) at ../../../libglusterfs/src/mem-pool.c:145
#11 0x00007f51bacd3798 in _dict_set (this=0x2679110, key=0x2dc36d5 "glusterfs.content", value=0x3e75940)
    at ../../../libglusterfs/src/dict.c:281
#12 0x00007f51bacd3956 in dict_set (this=0x2679110, key=0x2dc36d5 "glusterfs.content", value=0x3e75940)
    at ../../../libglusterfs/src/dict.c:324
#13 0x00007f51bacd85bd in dict_unserialize (orig_buf=0x2dc3610 "", size=221, fill=0x7fff2218fd10) at ../../../libglusterfs/src/dict.c:2671
#14 0x00007f51b5ad7b8b in server_lookup (req=0x7f51b53f4b14) at ../../../../../xlators/protocol/server/src/server3_1-fops.c:4890
#15 0x00007f51baaad0a9 in rpcsvc_handle_rpc_call (svc=0xd3c330, trans=0x16784e0, msg=0xf65fd0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:514
#16 0x00007f51baaad44c in rpcsvc_notify (trans=0x16784e0, mydata=0xd3c330, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xf65fd0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:610
#17 0x00007f51baab2da8 in rpc_transport_notify (this=0x16784e0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xf65fd0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#18 0x00007f51b77c6270 in socket_event_poll_in (this=0x16784e0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1686
#19 0x00007f51b77c67f4 in socket_event_handler (fd=25, idx=7, data=0x16784e0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1801
#20 0x00007f51bad0d01c in event_dispatch_epoll_handler (event_pool=0xd10c20, events=0xd2a5a0, i=0) at ../../../libglusterfs/src/event.c:794
#21 0x00007f51bad0d23f in event_dispatch_epoll (event_pool=0xd10c20) at ../../../libglusterfs/src/event.c:856
#22 0x00007f51bad0d5ca in event_dispatch (event_pool=0xd10c20) at ../../../libglusterfs/src/event.c:956
#23 0x0000000000407dbd in main (argc=19, argv=0x7fff22190268) at ../../../glusterfsd/src/glusterfsd.c:1612
(gdb) info thr
  8 Thread 0x7f51b97d3700 (LWP 3556)  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
  7 Thread 0x7f51b8dd2700 (LWP 3557)  0x0000003c3880b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7f51b83d1700 (LWP 3558)  0x0000003c3880b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f51b75a6700 (LWP 3560)  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
  4 Thread 0x7f51b538d700 (LWP 3570)  0x0000003c3880b3dc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f51b518b700 (LWP 3572)  0x0000003c3880b75b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f519fdfd700 (LWP 3594)  0x0000003c3880b75b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7f51ba87f700 (LWP 3555)  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x0000003c3807bc48 in _L_lock_9291 () from /lib64/libc.so.6
#2  0x0000003c380795e2 in malloc () from /lib64/libc.so.6
#3  0x0000003c3807f842 in strdup () from /lib64/libc.so.6
#4  0x0000003c3809bcf1 in tzset_internal () from /lib64/libc.so.6
#5  0x0000003c3809be69 in __tz_convert () from /lib64/libc.so.6
#6  0x00007f51bacee760 in gf_print_trace (signum=11) at ../../../libglusterfs/src/common-utils.c:408
#7  <signal handler called>
#8  0x0000003c3807881e in _int_malloc () from /lib64/libc.so.6
#9  0x0000003c38079288 in calloc () from /lib64/libc.so.6
#10 0x00007f51bad0db8d in __gf_calloc (nmemb=1, size=18, type=45) at ../../../libglusterfs/src/mem-pool.c:145
#11 0x00007f51bacd3798 in _dict_set (this=0x2679110, key=0x2dc36d5 "glusterfs.content", value=0x3e75940)
    at ../../../libglusterfs/src/dict.c:281
#12 0x00007f51bacd3956 in dict_set (this=0x2679110, key=0x2dc36d5 "glusterfs.content", value=0x3e75940)
    at ../../../libglusterfs/src/dict.c:324
#13 0x00007f51bacd85bd in dict_unserialize (orig_buf=0x2dc3610 "", size=221, fill=0x7fff2218fd10) at ../../../libglusterfs/src/dict.c:2671
#14 0x00007f51b5ad7b8b in server_lookup (req=0x7f51b53f4b14) at ../../../../../xlators/protocol/server/src/server3_1-fops.c:4890
#15 0x00007f51baaad0a9 in rpcsvc_handle_rpc_call (svc=0xd3c330, trans=0x16784e0, msg=0xf65fd0) at ../../../../rpc/rpc-lib/src/rpcsvc.c:514
#16 0x00007f51baaad44c in rpcsvc_notify (trans=0x16784e0, mydata=0xd3c330, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xf65fd0)
    at ../../../../rpc/rpc-lib/src/rpcsvc.c:610
#17 0x00007f51baab2da8 in rpc_transport_notify (this=0x16784e0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0xf65fd0)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#18 0x00007f51b77c6270 in socket_event_poll_in (this=0x16784e0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1686
#19 0x00007f51b77c67f4 in socket_event_handler (fd=25, idx=7, data=0x16784e0, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1801
#20 0x00007f51bad0d01c in event_dispatch_epoll_handler (event_pool=0xd10c20, events=0xd2a5a0, i=0) at ../../../libglusterfs/src/event.c:794
#21 0x00007f51bad0d23f in event_dispatch_epoll (event_pool=0xd10c20) at ../../../libglusterfs/src/event.c:856
#22 0x00007f51bad0d5ca in event_dispatch (event_pool=0xd10c20) at ../../../libglusterfs/src/event.c:956
#23 0x0000000000407dbd in main (argc=19, argv=0x7fff22190268) at ../../../glusterfsd/src/glusterfsd.c:1612
(gdb) f 6
#6  0x00007f51bacee760 in gf_print_trace (signum=11) at ../../../libglusterfs/src/common-utils.c:408
408	                tm    = localtime (&utime);
(gdb) l
403	
404	        {
405	                /* Dump the timestamp of the crash too, so the previous logs
406	                   can be related */
407	                utime = time (NULL);
408	                tm    = localtime (&utime);
409	                strftime (timestr, 256, "%Y-%m-%d %H:%M:%S\n", tm);
410	                ret = write (fd, "time of crash: ", 15);
411	                if (ret < 0)
412	                        goto out;
(gdb) f 10
#10 0x00007f51bad0db8d in __gf_calloc (nmemb=1, size=18, type=45) at ../../../libglusterfs/src/mem-pool.c:145
145	        ptr = calloc (1, tot_size);
(gdb) p ptr
$1 = 0x0
(gdb) t 5
[Switching to thread 5 (Thread 0x7f51b75a6700 (LWP 3560))]#0  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x0000003c3809c08d in _L_lock_2164 () from /lib64/libc.so.6
#2  0x0000003c3809be47 in __tz_convert () from /lib64/libc.so.6
#3  0x00007f51bacdc6ed in _gf_log (domain=0xd383c0 "mirror-server", 
    file=0x7f51b5ada840 "../../../../../xlators/protocol/server/src/server.c", function=0x7f51b5adb210 "grace_time_handler", line=52, 
    level=GF_LOG_INFO, fmt=0x7f51b5ada88b "grace timer expired") at ../../../libglusterfs/src/logging.c:550
#4  0x00007f51b5ab9bd0 in grace_time_handler (data=0x3dbbc10) at ../../../../../xlators/protocol/server/src/server.c:52
#5  0x00007f51bacf25b4 in gf_timer_proc (ctx=0xcf8010) at ../../../libglusterfs/src/timer.c:177
#6  0x0000003c388077f1 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003c380e592d in clone () from /lib64/libc.so.6
(gdb) f 3
#3  0x00007f51bacdc6ed in _gf_log (domain=0xd383c0 "mirror-server", 
    file=0x7f51b5ada840 "../../../../../xlators/protocol/server/src/server.c", function=0x7f51b5adb210 "grace_time_handler", line=52, 
    level=GF_LOG_INFO, fmt=0x7f51b5ada88b "grace timer expired") at ../../../libglusterfs/src/logging.c:550
550	        tm    = localtime (&tv.tv_sec);
(gdb) l
545	log:
546	        ret = gettimeofday (&tv, NULL);
547	        if (-1 == ret)
548	                goto out;
549	
550	        tm    = localtime (&tv.tv_sec);
551	
552	        pthread_mutex_lock (&logfile_mutex);
553	        {
554	                va_start (ap, fmt);
(gdb) t 8
[Switching to thread 8 (Thread 0x7f51b97d3700 (LWP 3556))]#0  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003c380f508e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x0000003c3807b8c2 in _L_lock_2468 () from /lib64/libc.so.6
#2  0x0000003c380761f5 in mallinfo () from /lib64/libc.so.6
#3  0x00007f51bad13b4d in gf_proc_dump_mem_info () at ../../../libglusterfs/src/statedump.c:220
#4  0x00007f51bad14f3b in gf_proc_dump_info (signum=10) at ../../../libglusterfs/src/statedump.c:654
#5  0x000000000040748c in glusterfs_sigwaiter (arg=0x7fff22190060) at ../../../glusterfsd/src/glusterfsd.c:1353
#6  0x0000003c388077f1 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003c380e592d in clone () from /lib64/libc.so.6
(gdb) f 3
#3  0x00007f51bad13b4d in gf_proc_dump_mem_info () at ../../../libglusterfs/src/statedump.c:220
220	        info = mallinfo ();
(gdb) l
215	{
216	#ifdef HAVE_MALLOC_STATS
217	        struct mallinfo info;
218	
219	        memset (&info, 0, sizeof (struct mallinfo));
220	        info = mallinfo ();
221	
222	        gf_proc_dump_add_section ("mallinfo");
223	        gf_proc_dump_write ("mallinfo_arena", "%d", info.arena);
224	        gf_proc_dump_write ("mallinfo_ordblks", "%d", info.ordblks);
(gdb) quit
A debugging session is active.

	Inferior 1 [process 3555] will be detached.

Quit anyway? (y or n) Detaching from program: /usr/local/sbin/glusterfsd, process 3555

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:

glusterfs server crashed when statedump command was issued.

Expected results:

glusterfs server should not crash.

Additional info:

gluster volume info
 
Volume Name: mirror
Type: Replicate
Volume ID: b28afc4a-5d3b-44c0-b202-590748a6a23d
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.156.9:/export/mirror
Brick2: 10.16.156.12:/export/miror
Brick3: 10.16.156.15:/export/mirror
Options Reconfigured:
performance.stat-prefetch: off
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
[root@gqas005 ~]#

Comment 1 Anand Avati 2012-03-18 06:27:36 UTC
CHANGE: http://review.gluster.com/2958 (protocol/server: add and remove the transports from the list, inside the lock) merged in master by Anand Avati (avati)

Comment 2 Anand Avati 2012-03-20 14:49:08 UTC
CHANGE: http://review.gluster.com/2980 (protocol/server: remove the transport from the list irrespective of lock-heal is on/off) merged in master by Vijay Bellur (vijay)

Comment 3 Raghavendra Bhat 2012-05-10 10:46:05 UTC
Checked with glusterfs-3.3.0qa40 and when statedump is issued server did not crash since we are properly doing add/remove of the transports from the list within the lock.


Note You need to log in before you can comment on or make changes to this bug.