Bug 761762 (GLUSTER-30)

Summary: GlusterFS (2-way replicate) hangs if both servers haven't been started
Product: [Community] GlusterFS
Reporter: Vikas Gorur <vikas>
Component: replicate
Assignee: Vikas Gorur <vikas>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Version: mainline
CC: gluster-bugs, gowda
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix

Description Basavanagowda Kanur 2009-06-18 11:08:37 UTC
This situation arises when the FUSE processing thread has not been started.

The FUSE processing thread starts only after it receives a CHILD_UP notification from its child. In this case, since both servers are down, cluster/replicate never sends CHILD_UP, so the thread is never started.
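
The dependency described here can be illustrated with a small model. This is only a sketch under stated assumptions, not the GlusterFS source: the types `struct mount` and `struct replicate` and the functions `mount_notify`, `replicate_notify` and `mount_can_serve` are hypothetical names invented for illustration, and the assumption that replicate forwards only CHILD_UP is taken from the description in this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes named after the notifications in this report. */
enum event { EVENT_CHILD_UP, EVENT_CHILD_DOWN };

/* Toy stand-in for the FUSE mount layer (not the real data structure). */
struct mount {
    bool fuse_thread_started;
};

/* Toy stand-in for a 2-way replicate translator. */
struct replicate {
    bool child_up[2];
    struct mount *parent;
};

/* The mount layer starts its processing thread only on CHILD_UP. */
static void mount_notify(struct mount *m, enum event ev)
{
    if (ev == EVENT_CHILD_UP && !m->fuse_thread_started)
        m->fuse_thread_started = true; /* real code would spawn a thread here */
}

/* Assumption of this model: replicate forwards CHILD_UP as soon as any
 * child comes up, and forwards nothing while every child is down. */
static void replicate_notify(struct replicate *r, size_t idx, enum event ev)
{
    assert(idx < 2);
    r->child_up[idx] = (ev == EVENT_CHILD_UP);
    if (ev == EVENT_CHILD_UP)
        mount_notify(r->parent, EVENT_CHILD_UP);
}

/* Requests such as statfs can be serviced only once the thread runs. */
static bool mount_can_serve(const struct mount *m)
{
    return m->fuse_thread_started;
}
```

In this model, delivering EVENT_CHILD_DOWN for both children leaves `mount_can_serve` false indefinitely, which corresponds to the hang reported here; a single EVENT_CHILD_UP is enough to make it true.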

--
Gowda

Comment 1 Vikas Gorur 2009-06-18 13:40:34 UTC
Client: AFR (replicate) with two remote subvolumes, write-behind (wb), io-cache (ioc)
Servers: io-threads (iot), locks, posix

gdb -p of the client process shows:

(gdb) info thr
2 Thread 1084934464 (LWP 24495)  0x0000003b8c8983b1 in nanosleep () from /lib64/libc.so.6
* 1 Thread 48008229728640 (LWP 24494)  0x0000003b8d00a4b6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

(gdb) thr 1
(gdb) bt
#0  0x0000003b8d00a4b6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002ba9c95782b9 in event_dispatch_epoll (event_pool=0x13d1690) at event.c:830
#2  0x00002ba9c95786fc in event_dispatch (event_pool=0x13d1690) at event.c:975
#3  0x0000000000405429 in main (argc=7, argv=0x7fffe1565a88) at glusterfsd.c:1200

(gdb) thr 2
(gdb) bt
#0  0x0000003b8c8983b1 in nanosleep () from /lib64/libc.so.6
#1  0x0000003b8c8cbae4 in usleep () from /lib64/libc.so.6
#2  0x00002ba9c95659b1 in gf_timer_proc (ctx=0x13d0010) at timer.c:177
#3  0x0000003b8d006307 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003b8c8d1ded in clone () from /lib64/libc.so.6

(gdb)  p ((call_pool_t *)ctx->pool)->all_frames
$3 = {next = 0x13d0370, prev = 0x13d0370}

Curiously, breakpoints at fuse_statfs and afr_statfs were not hit,
even though the 'df' process is hung in the statfs call.