Bug 763235 (GLUSTER-1503)
Summary: | segfault in distribute during failover testing | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Harshavardhana <fharshav> |
Component: | protocol | Assignee: | Shehjar Tikoo <shehjart> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | nfs-alpha | CC: | cww, gluster-bugs, shehjart |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | Type: | --- | |
Regression: | RTP | Mount Type: | nfs |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Description
Harshavardhana
2010-09-01 23:38:46 UTC
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-3: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-2: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-2: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-4: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-4: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-5: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-6: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-5: connection to 10.1.100.31:9697 failed (Connection refused)
[2010-09-02 02:24:47] E [socket.c:762:socket_connect_finish] 10.1.100.31-6: connection to 10.1.100.31:9697 failed (Connection refused)
pending frames:

patchset: v3.0.0-252-g17efe56
signal received: 11
time of crash: 2010-09-02 02:24:47
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs nfs_beta_rc11
/lib64/libc.so.6[0x3df28332f0]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_iatt_merge+0x1d)[0x7fc3a64554ad]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_selfheal_dir_mkdir_cbk+0x94)[0x7fc3a6457634]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/protocol/client.so(client_mkdir+0x261)[0x7fc3a668c881]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_selfheal_dir_mkdir+0x293)[0x7fc3a6457283]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_selfheal_restore+0x52)[0x7fc3a64573b2]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/cluster/distribute.so(dht_rmdir_cbk+0x3d6)[0x7fc3a6464b56]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/protocol/client.so(client_rmdir_cbk+0xe2)[0x7fc3a6693d42]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7fc3a667e70a]
/usr/lib64/glusterfs/nfs_beta_rc11/xlator/protocol/client.so(notify+0xe8)[0x7fc3a6684d28]
/usr/lib64/libglusterfs.so.0(xlator_notify+0x43)[0x3df2c14863]
/usr/lib64/glusterfs/nfs_beta_rc11/transport/socket.so(socket_event_handler+0xc8)[0x7fc3a52071d8]
/usr/lib64/libglusterfs.so.0[0x3df2c303cd]
glusterfsd(main+0x892)[0x404272]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3df281ea4d]
glusterfsd[0x4027e9]

Use mount type as nfs so I can differentiate bugs found on NFS mount vs FUSE mount. Use version as nfs-alpha. We haven't released nfs beta, just release candidates.

(In reply to comment #1)
....
....
....
> Program terminated with signal 11, Segmentation fault.
> #0 dht_iatt_merge (this=0x1376d30, to=0x7fc3a01a9f50, from=0x13b90,
> subvol=0x136c5f0) at dht-helper.c:349
> 349 to->ia_dev = from->ia_dev;
> Missing separate debuginfos, use: debuginfo-install glibc-2.10.2-1.x86_64
> libgcc-4.4.1-2.fc11.x86_64
> (gdb) bt
> #0 dht_iatt_merge (this=0x1376d30, to=0x7fc3a01a9f50, from=0x13b90,
> subvol=0x136c5f0) at dht-helper.c:349
> #1 0x00007fc3a6457634 in dht_selfheal_dir_mkdir_cbk (frame=0x7fc3a01abc50,
> cookie=0x14b3480, this=0x1376d30, op_ret=<value optimized out>,
> op_errno=<value optimized out>, inode=<value optimized out>, stbuf=0x0,
> preparent=0x3df2b69e80, postparent=0x13b90) at dht-selfheal.c:220
> #2 0x00007fc3a668c881 in client_mkdir (frame=0x14b3480, this=<value optimized
> out>, loc=0x7fc3a01a9ce8, mode=<value optimized out>)
> at client-protocol.c:1062

The STACK_UNWIND on a failed client_mkdir is not called with NULL for all of the mkdir_cbk arguments, so postparent arrives in dht holding a garbage pointer (0x13b90 in the backtrace above), and dht_iatt_merge crashes when it dereferences it.

Harsha, there are two parts to the fixes, one in dht and the other in proto/client. Both will be available in rc13 later today.

Only the dht patch is relevant for mainline. The bug in proto/client does not exist anymore in mainline code.

PATCH: http://patches.gluster.com/patch/4472 in master (cluster/dht: check for op_ret in dht_selfheal_dir_mkdir_cbk ())
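
The two-sided fix described in the comments can be illustrated with a small sketch. This is not the GlusterFS code or the actual patch: `struct iatt_sketch`, `mkdir_cbk_sketch_t`, `iatt_merge_sketch`, `failing_mkdir_sketch` and `selfheal_mkdir_cbk_sketch` are hypothetical names invented here, and the argument list is heavily simplified. It only shows the general pattern, assuming the failure path should pass NULL for every pointer argument and the callback should check op_ret before merging anything.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the attribute structure
 * that dht_iatt_merge copies from in the backtrace. */
struct iatt_sketch {
    uint64_t ia_dev;
    uint64_t ia_size;
};

/* Hypothetical callback type, loosely modelled on the mkdir_cbk
 * arguments visible in the gdb output (op_ret, op_errno, stbuf,
 * postparent). 'merged' stands in for the caller's local state. */
typedef int (*mkdir_cbk_sketch_t)(int op_ret, int op_errno,
                                  const struct iatt_sketch *stbuf,
                                  const struct iatt_sketch *postparent,
                                  struct iatt_sketch *merged);

/* Like the crashing line (to->ia_dev = from->ia_dev), this
 * dereferences 'from' unconditionally, so callers must only
 * pass valid pointers. */
static void iatt_merge_sketch(struct iatt_sketch *to,
                              const struct iatt_sketch *from)
{
    to->ia_dev = from->ia_dev;
    to->ia_size += from->ia_size;
}

/* Callee side (assumption): on failure, unwind with op_ret == -1
 * and an explicit NULL for every pointer argument, never with
 * uninitialised values. */
static int failing_mkdir_sketch(mkdir_cbk_sketch_t cbk,
                                struct iatt_sketch *merged)
{
    return cbk(-1, ENOTCONN, NULL, NULL, merged);
}

/* Caller side (assumption): check op_ret first and leave the
 * attribute pointers untouched when the operation failed. */
static int selfheal_mkdir_cbk_sketch(int op_ret, int op_errno,
                                     const struct iatt_sketch *stbuf,
                                     const struct iatt_sketch *postparent,
                                     struct iatt_sketch *merged)
{
    if (op_ret == -1) {
        return op_errno;   /* nothing to merge on failure */
    }
    iatt_merge_sketch(merged, stbuf);
    iatt_merge_sketch(merged, postparent);
    return 0;
}
```

Calling `failing_mkdir_sketch(selfheal_mkdir_cbk_sketch, &merged)` returns the error code without touching `merged`. Either guard alone would avoid the dereference seen in the backtrace, but having both means the callback does not depend on every caller unwinding correctly.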