Bug 761762 (GLUSTER-30) - GlusterFS (2-way replicate) hangs if both servers haven't been started
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-30
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Vikas Gorur
Reported: 2009-06-18 13:40 UTC by Vikas Gorur
Modified: 2009-06-24 07:27 UTC (History)
Doc Type: Bug Fix

Description Basavanagowda Kanur 2009-06-18 11:08:37 UTC
This situation occurs when the fuse process thread has not been started.

The fuse process thread starts only after it receives a CHILD_UP notification from its child. In this case, since both servers are down, cluster/replicate never sends CHILD_UP.

--
Gowda
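
The startup dependency described above can be modeled with a small sketch. This is a toy Python model, not GlusterFS code: the class FuseBridgeModel, its methods, and the demo() helper are hypothetical names, and the queue only stands in for the kernel's request channel. It shows that when no CHILD_UP arrives, the reader thread never starts, so a queued statfs request is never handed to any handler.

```python
import queue
import threading


class FuseBridgeModel:
    """Toy model: the request-reading thread starts only on CHILD_UP."""

    def __init__(self):
        self.requests = queue.Queue()  # stands in for the kernel request channel
        self.handled = []              # requests that actually reached a handler
        self.reader = None             # reader thread, not started until CHILD_UP

    def notify(self, event):
        # Start the reader thread the first time a child reports CHILD_UP.
        if event == "CHILD_UP" and self.reader is None:
            self.reader = threading.Thread(target=self._read_loop, daemon=True)
            self.reader.start()

    def _read_loop(self):
        while True:
            request = self.requests.get()
            self.handled.append(request)  # where a statfs handler would run
            self.requests.task_done()

    def submit(self, request):
        # A caller such as df enqueues a request; it is served only if the reader runs.
        self.requests.put(request)


def demo(servers_up):
    """Return (handled requests, requests still queued) for one statfs call."""
    bridge = FuseBridgeModel()
    if servers_up:
        bridge.notify("CHILD_UP")
    bridge.submit("statfs")
    if bridge.reader is not None:
        bridge.requests.join()  # wait until the reader has processed the request
    return bridge.handled, bridge.requests.qsize()
```

In this model, demo(False) leaves the request sitting in the queue with no handler invoked, which mirrors a caller blocked in statfs while no translator code runs.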

Comment 1 Vikas Gorur 2009-06-18 13:40:34 UTC
Client: AFR with two remote subvolumes, wb (write-behind), ioc (io-cache)
Servers: iot (io-threads), locks, posix

gdb -p of the client process shows:

(gdb) info thr
2 Thread 1084934464 (LWP 24495)  0x0000003b8c8983b1 in nanosleep () from /lib64/libc.so.6
* 1 Thread 48008229728640 (LWP 24494)  0x0000003b8d00a4b6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

(gdb) thr 1
(gdb) bt
#0  0x0000003b8d00a4b6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00002ba9c95782b9 in event_dispatch_epoll (event_pool=0x13d1690) at event.c:830
#2  0x00002ba9c95786fc in event_dispatch (event_pool=0x13d1690) at event.c:975
#3  0x0000000000405429 in main (argc=7, argv=0x7fffe1565a88) at glusterfsd.c:1200

(gdb) thr 2
(gdb) bt
#0  0x0000003b8c8983b1 in nanosleep () from /lib64/libc.so.6
#1  0x0000003b8c8cbae4 in usleep () from /lib64/libc.so.6
#2  0x00002ba9c95659b1 in gf_timer_proc (ctx=0x13d0010) at timer.c:177
#3  0x0000003b8d006307 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003b8c8d1ded in clone () from /lib64/libc.so.6

(gdb)  p ((call_pool_t *)ctx->pool)->all_frames
$3 = {next = 0x13d0370, prev = 0x13d0370}

Curiously, breakpoints at fuse_statfs and afr_statfs were not hit,
even though the 'df' process is hung in the statfs call.
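
As an aid to reading the all_frames output above, here is a minimal Python sketch of a circular doubly linked list head in the style of the Linux kernel's list_head. It assumes all_frames follows that convention; ListHead, list_add_tail, list_empty, and classify are illustrative names written for this sketch. When next and prev hold the same address, the list is either empty (both point back at the head) or contains exactly one entry. The gdb output does not show the address of the head itself, so it cannot distinguish the two cases on its own.

```python
class ListHead:
    """Circular doubly linked list node; a fresh head points at itself."""

    def __init__(self):
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    # Link 'new' in just before 'head', i.e. at the tail of the list.
    new.prev = head.prev
    new.next = head
    head.prev.next = new
    head.prev = new


def list_empty(head):
    # An empty list is one whose head points back at itself.
    return head.next is head


def classify(head):
    # next == prev alone does not prove emptiness; compare against the head too.
    if head.next is not head.prev:
        return "two or more entries"
    return "empty" if list_empty(head) else "one entry"
```

Printing the address of the list head itself in gdb and comparing it with next would settle which case applies.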

