Bug 762173 (GLUSTER-441) - booster NFS re-exporting distribute volume doesn't respond
Summary: booster NFS re-exporting distribute volume doesn't respond
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-441
Product: GlusterFS
Classification: Community
Component: booster
Version: mainline
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-12-05 05:12 UTC by Harshavardhana
Modified: 2015-03-23 01:03 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Description Harshavardhana 2009-12-05 02:12:35 UTC
Backtrace from the gdb attached to unfsd

#0  sp_cache_remove_entry (cache=0x7fc3480042f0, name=<value optimized out>, remove_all=<value optimized out>)
    at stat-prefetch.c:390
#1  0x00007fc34e5fc610 in sp_cache_free (cache=0x0) at stat-prefetch.c:462
#2  0x00007fc34e5fc662 in sp_fd_ctx_free (fd_ctx=0x7fc3480055d0) at stat-prefetch.c:530
#3  0x00007fc34e5fc6c8 in sp_release (this=0x182cd80, fd=<value optimized out>) at stat-prefetch.c:3842
#4  0x00000031e102c817 in fd_destroy (fd=<value optimized out>) at fd.c:406
#5  fd_unref (fd=<value optimized out>) at fd.c:448
#6  0x00007fc34fcea10a in glusterfs_closedir (dirfd=0x1837140) at libglusterfsclient.c:5680
#7  0x00007fc34ff14285 in closedir (dh=<value optimized out>) at booster.c:1864
#8  0x000000000040c7a4 in inet_addr ()
#9  0x0000000000406a33 in inet_addr ()
#10 0x00000000004092ba in inet_addr ()
#11 0x0000000000403b69 in inet_addr ()
#12 0x0000003b9290a8a9 in svc_getreq_common_internal () from /lib64/libc.so.6
#13 0x0000003b9290a231 in svc_getreq_poll_internal () from /lib64/libc.so.6
#14 0x000000000040479b in inet_addr ()
#15 0x0000003b9281ea2d in __libc_start_main () from /lib64/libc.so.6
#16 0x0000000000402b79 in inet_addr ()

Comment 1 Amar Tumballi 2009-12-05 02:14:58 UTC
Some more debugging showed:

(gdb) p *(sp_cache_t *)0x7fc3480042f0                                                                            
$2 = {table = 0x1837190, expected_offset = 0, lock = 1, miss = 0, hits = 0, ref = 1}
(gdb) fr 0
#0  sp_cache_remove_entry (cache=0x7fc3480042f0, name=<value optimized out>, remove_all=<value optimized out>)
    at stat-prefetch.c:390
390             if (this->private == NULL)
(gdb) p *this
Cannot access memory at address 0x100000000
(gdb) l
385             this = THIS;
386
387             if (this == NULL)
388                     goto out;
389
390             if (this->private == NULL)
391                     goto out;
392
393             priv = this->private;
394

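THIS seems to be wrong :O It holds 0x100000000, which is not NULL, so the check on line 387 passes, but the address is not readable, so the dereference on line 390 crashes.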

Comment 2 Harshavardhana 2009-12-05 05:12:04 UTC
This is with ERROR logging. 

[2009-12-05 03:03:13] E [socket.c:760:socket_connect_finish] availtvn4-3: connection to 192.168.101.163:10002 failed (Connection refused)
[2009-12-05 03:03:13] E [socket.c:760:socket_connect_finish] availtvn4-4: connection to 192.168.101.163:10002 failed (Connection refused)
[2009-12-05 03:03:13] E [socket.c:760:socket_connect_finish] availtvn4-4: connection to 192.168.101.163:10002 failed (Connection refused)
[2009-12-05 03:05:26] E [socket.c:760:socket_connect_finish] availtvn1-1: connection to 192.168.101.160:10002 failed (Connection refused)
[2009-12-05 03:05:26] E [socket.c:760:socket_connect_finish] availtvn1-1: connection to 192.168.101.160:10002 failed (Connection refused)
[2009-12-05 03:05:26] E [socket.c:760:socket_connect_finish] availtvn1-2: connection to 192.168.101.160:10002 failed (Connection refused)
[2009-12-05 03:05:26] E [socket.c:760:socket_connect_finish] availtvn1-2: connection to 192.168.101.160:10002 failed (Connection refused)
[2009-12-05 03:05:26] E [socket.c:760:socket_connect_finish] availtvn1-3: connection to 192.168.101.160:10002 failed (Connection refused)
[2009-12-05 03:05:26] E [socket.c:760:socket_connect_finish] availtvn1-3: connection to 192.168.101.160:10002 failed (Connection refused)
[2009-12-05 05:02:37] E [booster.c:2807:chdir] booster: chdir failed: Structure needs cleaning

The same configuration with "DEBUG" logging gives:

[2009-12-05 05:05:54] N [client-protocol.c:6252:client_setvolume_cbk] availtvn2-1: Connected to 192.168.101.161:10002, attached to remote volume 'brick1'.
[2009-12-05 05:05:54] N [client-protocol.c:6252:client_setvolume_cbk] availtvn2-1: Connected to 192.168.101.161:10002, attached to remote volume 'brick1'.
[2009-12-05 05:05:54] N [client-protocol.c:6252:client_setvolume_cbk] availtvn2-2: Connected to 192.168.101.161:10002, attached to remote volume 'brick2'.
[2009-12-05 05:05:54] N [client-protocol.c:6252:client_setvolume_cbk] availtvn2-2: Connected to 192.168.101.161:10002, attached to remote volume 'brick2'.
[2009-12-05 05:05:54] N [client-protocol.c:6252:client_setvolume_cbk] availtvn2-3: Connected to 192.168.101.161:10002, attached to remote volume 'brick3'.
[2009-12-05 05:05:54] N [client-protocol.c:6252:client_setvolume_cbk] availtvn2-3: Connected to 192.168.101.161:10002, attached to remote volume 'brick3'.
[2009-12-05 05:05:54] D [dht-diskusage.c:71:dht_du_info_cbk] distribute: on subvolume 'availtvn2-1': avail_percent is: 99.00 and avail_space is: 13630190026752
[2009-12-05 05:05:54] D [dht-diskusage.c:71:dht_du_info_cbk] distribute: on subvolume 'availtvn2-1': avail_percent is: 99.00 and avail_space is: 13630190026752
[2009-12-05 05:05:54] D [dht-diskusage.c:71:dht_du_info_cbk] distribute: on subvolume 'availtvn2-2': avail_percent is: 99.00 and avail_space is: 14040060981248
[2009-12-05 05:05:54] D [dht-diskusage.c:71:dht_du_info_cbk] distribute: on subvolume 'availtvn2-3': avail_percen

In the first case, with the log level set to ERROR, it didn't segfault, but in the second case, with DEBUG, it segfaulted.

Comment 3 Amar Tumballi 2009-12-05 06:11:21 UTC
Without stat-prefetch, NFS worked fine. It seems like LD_PRELOAD is interfering with some pthreads APIs, and THIS (which is based on pthreads) is getting corrupted.
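
To illustrate why a pthreads-based per-thread pointer is fragile here, below is a minimal sketch that assumes the pointer lives in pthread thread-specific data. All `demo_*` names are hypothetical and are not the GlusterFS implementation of THIS. The point it shows: a thread that never stored a value reads NULL, and anything that disturbs the key or the per-thread slot changes what the getter returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a translator object; NOT the real xlator_t. */
typedef struct demo_xlator {
    const char *name;
    void *private;
} demo_xlator_t;

static pthread_key_t demo_this_key;
static pthread_once_t demo_this_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, no matter which thread gets here first. */
static void demo_this_key_create(void)
{
    pthread_key_create(&demo_this_key, NULL);
}

/* Store the per-thread pointer for the calling thread only. */
int demo_this_set(demo_xlator_t *xl)
{
    pthread_once(&demo_this_once, demo_this_key_create);
    return pthread_setspecific(demo_this_key, xl);
}

/* Read the calling thread's pointer; NULL if this thread never set one. */
demo_xlator_t *demo_this_get(void)
{
    pthread_once(&demo_this_once, demo_this_key_create);
    return pthread_getspecific(demo_this_key);
}

/* Thread body: report what a thread sees without calling demo_this_set(). */
static void *demo_worker(void *arg)
{
    (void)arg;
    return demo_this_get();
}

/* Start a fresh thread and hand back the pointer it observed. */
int demo_this_seen_by_new_thread(demo_xlator_t **seen)
{
    pthread_t tid;
    void *ret = NULL;

    if (pthread_create(&tid, NULL, demo_worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    *seen = ret;
    return 0;
}
```

In this sketch the value set by one thread is invisible to another, so a callback that runs on a thread where nothing was stored, or where the slot was disturbed, cannot rely on the getter.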

Comment 4 Anand Avati 2009-12-06 08:10:11 UTC
PATCH: http://patches.gluster.com/patch/2582 in master (THIS: set THIS pointers before forget/release/releasedir callbacks)
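
The idea named in the patch title can be sketched as follows. This is not the actual patch; every `demo_*` name is hypothetical, and a C11 `_Thread_local` variable stands in for THIS to keep the example short. The callback reads the per-thread pointer, as `sp_cache_remove_entry` does in the gdb listing above, so the caller sets that pointer before invoking the callback and restores the previous value afterwards.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_xlator demo_xlator_t;

/* Hypothetical stand-in for an fd; NOT the real fd_t. */
typedef struct demo_fd {
    int released;
    demo_xlator_t *released_by;
} demo_fd_t;

/* Hypothetical stand-in for a translator; NOT the real xlator_t. */
struct demo_xlator {
    const char *name;
    void *private;
    int (*release)(demo_xlator_t *this, demo_fd_t *fd);
};

/* Stand-in for THIS: one pointer per thread, NULL until set. */
static _Thread_local demo_xlator_t *demo_this;

/* Release callback that relies on the per-thread pointer, not its argument. */
int demo_release(demo_xlator_t *this, demo_fd_t *fd)
{
    demo_xlator_t *cur = demo_this;

    (void)this;
    if (cur == NULL || cur->private == NULL)
        return -1;              /* nothing valid to work with */
    fd->released = 1;
    fd->released_by = cur;
    return 0;
}

/* Set the per-thread pointer before the callback, restore it afterwards. */
int demo_call_release(demo_xlator_t *xl, demo_fd_t *fd)
{
    demo_xlator_t *saved = demo_this;
    int ret;

    demo_this = xl;
    ret = xl->release(xl, fd);
    demo_this = saved;
    return ret;
}
```

Calling `demo_release` directly on a thread that never set the pointer fails, while going through `demo_call_release` gives the callback a valid translator for the duration of the call.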

