NFS volume config is standard config generated from volgen with "namelookup" turned off in NFS/server.

Backtrace
===================
Program terminated with signal 11, Segmentation fault.
#0  dht_iatt_merge (this=0x1376d30, to=0x7fc3a01a9f50, from=0x13b90, subvol=0x136c5f0) at dht-helper.c:349
349         to->ia_dev = from->ia_dev;
Missing separate debuginfos, use: debuginfo-install glibc-2.10.2-1.x86_64 libgcc-4.4.1-2.fc11.x86_64
(gdb) bt
#0  dht_iatt_merge (this=0x1376d30, to=0x7fc3a01a9f50, from=0x13b90, subvol=0x136c5f0) at dht-helper.c:349
#1  0x00007fc3a6457634 in dht_selfheal_dir_mkdir_cbk (frame=0x7fc3a01abc50, cookie=0x14b3480, this=0x1376d30, op_ret=<value optimized out>, op_errno=<value optimized out>, inode=<value optimized out>, stbuf=0x0, preparent=0x3df2b69e80, postparent=0x13b90) at dht-selfheal.c:220
#2  0x00007fc3a668c881 in client_mkdir (frame=0x14b3480, this=<value optimized out>, loc=0x7fc3a01a9ce8, mode=<value optimized out>) at client-protocol.c:1062
#3  0x00007fc3a6457283 in dht_selfheal_dir_mkdir (frame=0x7fc3a01abc50, loc=<value optimized out>, layout=0x7fc3a01a8870, force=<value optimized out>) at dht-selfheal.c:271
#4  0x00007fc3a64573b2 in dht_selfheal_restore (frame=0x7fc3a01abc50, dir_cbk=<value optimized out>, loc=0x7fc3a01a9ce8, layout=0x7fc3a01a8870) at dht-selfheal.c:533
#5  0x00007fc3a6464b56 in dht_rmdir_cbk (frame=0x7fc3a01abc50, cookie=0x14b1ce0, this=0x1376d30, op_ret=0, op_errno=<value optimized out>, preparent=<value optimized out>, postparent=0x7fff0169e390) at dht-common.c:3380
#6  0x00007fc3a6693d42 in client_rmdir_cbk (frame=0x14b1ce0, hdr=<value optimized out>, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4625
#7  0x00007fc3a667e70a in protocol_client_pollin (this=0x136b800, trans=0x1384550) at client-protocol.c:6435
#8  0x00007fc3a6684d28 in notify (this=0x1376d30, event=<value optimized out>, data=0x1384550) at client-protocol.c:6554
#9  0x0000003df2c14863 in xlator_notify (xl=0x136b800, event=2, data=0x1384550) at xlator.c:919
#10 0x00007fc3a52071d8 in socket_event_handler (fd=<value optimized out>, idx=20, data=0x1384550, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:831
#11 0x0000003df2c303cd in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:804
#12 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:867
#13 0x0000000000404272 in main (argc=<value optimized out>, argv=<value optimized out>) at glusterfsd.c:1494
=============================
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-3: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-2: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-2: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-4: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-4: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-5: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-6: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-5: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-6: connection to 10.1.100.31:9697 failed (Connection refused)
pending frames:

patchset: v3.0.0-252-g17efe56
signal received: 11
time of crash: 2010-09-02 02:24:47
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs nfs_beta_rc11
/lib64/libc.so.6[0x3df28332f0]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_iatt_merge+0x1d)[0x7fc3a64554ad]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_selfheal_dir_mkdir_cbk+0x94)[0x7fc3a6457634]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/protocol/client.so(client_mkdir+0x261)[0x7fc3a668c881]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_selfheal_dir_mkdir+0x293)[0x7fc3a6457283]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_selfheal_restore+0x52)[0x7fc3a64573b2]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_rmdir_cbk+0x3d6)[0x7fc3a6464b56]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/protocol/client.so(client_rmdir_cbk+0xe2)[0x7fc3a6693d42]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7fc3a667e70a]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/protocol/client.so(notify+0xe8)[0x7fc3a6684d28]
/usr/lib64/libglusterfs.so.0(xlator_notify+0x43)[0x3df2c14863]
/usr/lib64/glusterfs/nfs_beta_rc11/transport/socket.so(socket_event_handler+0xc8)[0x7fc3a52071d8]
/usr/lib64/libglusterfs.so.0[0x3df2c303cd]
glusterfsd(main+0x892)[0x404272]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3df281ea4d]
glusterfsd[0x4027e9]
Use mount type as nfs so I can differentiate bugs found on an NFS mount from those found on a FUSE mount. Use version as nfs-alpha. We haven't released nfs beta, just release candidates.
(In reply to comment #1)
.... .... ....
> Program terminated with signal 11, Segmentation fault.
> #0 dht_iatt_merge (this=0x1376d30, to=0x7fc3a01a9f50, from=0x13b90,
> subvol=0x136c5f0) at dht-helper.c:349
> 349 to->ia_dev = from->ia_dev;
> Missing separate debuginfos, use: debuginfo-install glibc-2.10.2-1.x86_64
> libgcc-4.4.1-2.fc11.x86_64
> (gdb) bt
> #0 dht_iatt_merge (this=0x1376d30, to=0x7fc3a01a9f50, from=0x13b90,
> subvol=0x136c5f0) at dht-helper.c:349
> #1 0x00007fc3a6457634 in dht_selfheal_dir_mkdir_cbk (frame=0x7fc3a01abc50,
> cookie=0x14b3480, this=0x1376d30, op_ret=<value optimized out>,
> op_errno=<value optimized out>, inode=<value optimized out>, stbuf=0x0,
> preparent=0x3df2b69e80, postparent=0x13b90) at dht-selfheal.c:220
> #2 0x00007fc3a668c881 in client_mkdir (frame=0x14b3480, this=<value optimized
> out>, loc=0x7fc3a01a9ce8, mode=<value optimized out>)
> at client-protocol.c:1062

The STACK_UNWIND on a failed client_mkdir does not pass NULL for all of the mkdir_cbk arguments, so postparent arrives in dht holding a garbage pointer (0x13b90 in frame #1 above), and dht_iatt_merge crashes when it dereferences it.
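
To illustrate the unwind problem described above, here is a minimal sketch of a failure path that passes NULL explicitly for every buffer argument. This is not the client-protocol.c code: the struct, the callback type, and every name ending in _sketch are hypothetical stand-ins, and the argument list is only loosely modelled on the mkdir_cbk arguments visible in frame #1.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct iatt; only ia_dev is modelled. */
struct iatt_sketch {
    unsigned long ia_dev;
};

/* Hypothetical callback shape, loosely based on the stbuf/preparent/
 * postparent arguments seen in the backtrace. */
typedef int (*mkdir_cbk_sketch_t)(int op_ret, int op_errno,
                                  struct iatt_sketch *stbuf,
                                  struct iatt_sketch *preparent,
                                  struct iatt_sketch *postparent);

/* Failure path: every buffer argument is passed as NULL explicitly, so the
 * callback never receives whatever happened to be left in the argument
 * slots. */
int unwind_mkdir_failure_sketch(mkdir_cbk_sketch_t cbk, int op_errno)
{
    return cbk(-1, op_errno, NULL, NULL, NULL);
}

/* Example callback: returns 1 only if the call failed and all three buffers
 * arrived as NULL, 0 otherwise. */
int all_null_cbk_sketch(int op_ret, int op_errno,
                        struct iatt_sketch *stbuf,
                        struct iatt_sketch *preparent,
                        struct iatt_sketch *postparent)
{
    (void)op_errno; /* not inspected in this sketch */
    return (op_ret == -1 && stbuf == NULL &&
            preparent == NULL && postparent == NULL) ? 1 : 0;
}
```

Calling unwind_mkdir_failure_sketch(all_null_cbk_sketch, errno_value) returns 1, because the callback sees NULL in all three buffer slots instead of leftover values.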
Harsha, there are two parts to the fixes, one in dht and the other in proto/client. Both will be available in rc13 later today. Only the dht patch is relevant for mainline. The bug in proto/client does not exist anymore in mainline code.
PATCH: http://patches.gluster.com/patch/4472 in master (cluster/dht: check for op_ret in dht_selfheal_dir_mkdir_cbk ())
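
For readers without access to the patch, the idea named in its title (check op_ret before touching the buffers) can be sketched roughly as below. This is an assumption about the general shape of such a guard, not the contents of the patch: struct iatt_model, iatt_merge_model and selfheal_mkdir_cbk_model are hypothetical names, and the extra NULL check is an addition for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct iatt; only ia_dev is modelled. */
struct iatt_model {
    unsigned long ia_dev;
};

/* Unguarded merge, mirroring the crashing statement
 * "to->ia_dev = from->ia_dev;" from the backtrace. */
static void iatt_merge_model(struct iatt_model *to,
                             const struct iatt_model *from)
{
    to->ia_dev = from->ia_dev;
}

/* Guarded callback: merge only when the operation succeeded and a buffer was
 * actually supplied. Returns 1 if a merge happened, 0 if it was skipped. */
int selfheal_mkdir_cbk_model(int op_ret, struct iatt_model *merged,
                             const struct iatt_model *stbuf)
{
    if (op_ret != 0 || stbuf == NULL)
        return 0; /* failed call: the buffers must not be dereferenced */

    iatt_merge_model(merged, stbuf);
    return 1;
}
```

With this guard a failed mkdir leaves the merged attributes untouched, whatever the buffer pointers contain.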