| Summary: | Everyday crashes on some glusterfs clients | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Thomas Séglard <tsg> |
| Component: | replicate | Assignee: | Pavan Vilas Sondur <pavan> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | 3.0.4 | CC: | gluster-bugs, vijay |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Vijay Bellur
2010-07-07 07:42:38 UTC
Replicated setup. 2 servers and 6 clients. On 4 clients everything is ok on 2 other clients I see multiple crashs during the day. On these 2 "bad" clients we are using a script which does many reads in one particular directory (mounted with glusterfs). This script is in PHP and uses threads.
I post below the (same) configuration file used on all clients and full gdb backtrace.
** Servers
[...]
#################
# CONF BK2010
#################
volume posix4
type storage/posix
option directory /data/exports/export_conf2010
end-volume
volume locks4
type features/locks
subvolumes posix4
end-volume
volume brick4
type performance/io-threads
option thread-count 8
subvolumes locks4
end-volume
[...]
** Clients
## file auto generated by /bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /bin/glusterfs-volgen --name bkconf2010 --raid 1 10.110.0.41:/data/exports/export_conf2010 10.110.0.4
2:/data/exports/export_conf2010
# RAID 1
# TRANSPORT-TYPE tcp
volume 10.110.0.42-4
type protocol/client
option transport-type tcp
option remote-host 10.110.0.42
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume brick4
end-volume
volume 10.110.0.41-4
type protocol/client
option transport-type tcp
option remote-host 10.110.0.41
option transport.socket.nodelay on
option transport.remote-port 6996
option remote-subvolume brick4
end-volume
volume mirror-4
type cluster/replicate
subvolumes 10.110.0.41-4 10.110.0.42-4
end-volume
volume readahead4
type performance/read-ahead
option page-count 4
subvolumes mirror-4
end-volume
volume iocache4
type performance/io-cache
option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ))`MB
option cache-timeout 1
subvolumes readahead4
end-volume
volume quickread4
type performance/quick-read
option cache-timeout 1
option max-file-size 64kB
subvolumes iocache4
end-volume
volume writebehind4
type performance/write-behind
option enable-trickling-writes false
option flush-behind on
option cache-size 4MB
subvolumes quickread4
end-volume
volume statprefetch4
type performance/stat-prefetch
subvolumes writebehind4
end-volume
** Backtrace
shell# gdb /sbin/glusterfs /core.17091
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libglusterfs.so.0...done.
Loaded symbols for /lib/libglusterfs.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/glusterfs/3.0.4/xlator/protocol/client.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/protocol/client.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/cluster/replicate.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/cluster/replicate.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/read-ahead.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/read-ahead.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/io-cache.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/io-cache.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/quick-read.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/quick-read.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/write-behind.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/write-behind.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/stat-prefetch.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/stat-prefetch.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/mount/fuse.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/mount/fuse.so
Reading symbols from /lib/glusterfs/3.0.4/transport/socket.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/transport/socket.so
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Core was generated by `/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/bkconf2010.vol /dat'.
Program terminated with signal 11, Segmentation fault.
[New process 17097]
[New process 17091]
[New process 17092]
#0 0x00007f40d3fea114 in ra_fstat (frame=0x7f40c404f5b0, this=0x2534bd0, fd=0x25eaca0)
at read-ahead.c:750
750 read-ahead.c: No such file or directory.
in read-ahead.c
(gdb) thread apply all bt full
Thread 3 (process 17092):
#0 0x00007f40d4f12071 in nanosleep () from /lib/libc.so.6
No symbol table info available.
#1 0x00007f40d4f3d534 in usleep () from /lib/libc.so.6
No symbol table info available.
#2 0x00007f40d5610652 in gf_timer_proc (ctx=0x252d010) at timer.c:177
now = 1278460080777811
now_tv = {tv_sec = 1278460080, tv_usec = 777811}
event = (gf_timer_t *) 0x253c550
reg = (gf_timer_registry_t *) 0x25380a0
__FUNCTION__ = "gf_timer_proc"
#3 0x00007f40d51cdfc7 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4 0x00007f40d4f4364d in clone () from /lib/libc.so.6
No symbol table info available.
#5 0x0000000000000000 in ?? ()
No symbol table info available.
Thread 2 (process 17091):
#0 0x00007f40d51d2645 in ?? () from /lib/libpthread.so.0
No symbol table info available.
#1 0x00007f40d3dda73c in ioc_open_cbk (frame=0x7f40c404cf40, cookie=0x7f40c404bd80, this=0x25354b0,
op_ret=0, op_errno=117, fd=0x25903c0) at io-cache.c:474
tmp_ioc_inode = 0
local = (ioc_local_t *) 0x7f40c404bcb0
table = (ioc_table_t *) 0x2538730
ioc_inode = (ioc_inode_t *) 0x0
inode = (inode_t *) 0x7f40cc000bb0
weight = 4294967295
path = 0x7f40c401a9c0 "/extMapping.php"
__FUNCTION__ = "ioc_open_cbk"
#2 0x00007f40d3fe7454 in ra_open_cbk (frame=0x7f40c404bd80, cookie=0x7f40c404bde0, this=0x2534bd0,
op_ret=0, op_errno=117, fd=0x25903c0) at read-ahead.c:116
fn = (fop_open_cbk_t) 0x7f40d3dda698 <ioc_open_cbk>
_parent = (call_frame_t *) 0x7f40c404cf40
old_THIS = (xlator_t *) 0x2534bd0
conf = (ra_conf_t *) 0x2538810
file = (ra_file_t *) 0x7f40c404c400
ret = 0
wbflags = 0
__FUNCTION__ = "ra_open_cbk"
#3 0x00007f40d421a79a in afr_open_cbk (frame=0x7f40c404bde0, cookie=0x0, this=0x2534350, op_ret=0,
op_errno=0, fd=0x25903c0) at afr-open.c:140
fn = (fop_open_cbk_t) 0x7f40d3fe7150 <ra_open_cbk>
_parent = (call_frame_t *) 0x7f40c404bd80
old_THIS = (xlator_t *) 0x2534350
__local = (afr_local_t *) 0x7f40c404d040
__this = (xlator_t *) 0x2534350
local = (afr_local_t *) 0x7f40c404d040
priv = (afr_private_t *) 0x25389c0
child_index = 0
ctx = 139916143515392
fd_ctx = (afr_fd_ctx_t *) 0x7f40c4085700
ret = 0
call_count = 0
__FUNCTION__ = "afr_open_cbk"
#4 0x00007f40d4462f49 in client_open_cbk (frame=0x7f40c401aab0, hdr=0x7f40c404c760, hdrlen=116,
iobuf=0x0) at client-protocol.c:4090
fn = (ret_fn_t) 0x7f40d421a2f5 <afr_open_cbk>
_parent = (call_frame_t *) 0x7f40c404bde0
old_THIS = (xlator_t *) 0x2533dd0
op_ret = 0
op_errno = 0
fd = (fd_t *) 0x25903c0
remote_fd = 7
rsp = (gf_fop_open_rsp_t *) 0x7f40c404c7cc
local = (client_local_t *) 0x7f40c404b850
conf = (client_conf_t *) 0x2538b80
fdctx = (client_fd_ctx_t *) 0x7f40c404c7e0
ino = 36135004
gen = 5480397927761314920
#5 0x00007f40d446ac19 in protocol_client_interpret (this=0x2533dd0, trans=0x2539700,
hdr_p=0x7f40c404c760 "", hdrlen=116, iobuf=0x0) at client-protocol.c:6529
ret = -1
frame = (call_frame_t *) 0x7f40c401aab0
hdr = (gf_hdr_common_t *) 0x7f40c404c760
callid = 292901
type = 4
op = 10
conn = (client_connection_t *) 0x2539b60
__FUNCTION__ = "protocol_client_interpret"
#6 0x00007f40d446b8df in protocol_client_pollin (this=0x2533dd0, trans=0x2539700)
at client-protocol.c:6827
conf = (client_conf_t *) 0x2538b80
ret = 0
iobuf = (struct iobuf *) 0x0
hdr = 0x7f40c404c760 ""
hdrlen = 116
#7 0x00007f40d446bf53 in notify (this=0x2533dd0, event=2, data=0x2539700) at client-protocol.c:6946
i = 0
ret = -1
child_down = 1
was_not_down = 0
trans = (transport_t *) 0x2539700
conn = (client_connection_t *) 0x0
conf = (client_conf_t *) 0x2538b80
parent = (xlator_list_t *) 0x0
__FUNCTION__ = "notify"
#8 0x00007f40d56015dc in xlator_notify (xl=0x2533dd0, event=2, data=0x2539700) at xlator.c:924
old_THIS = (xlator_t *) 0x7f40d58417c0
ret = 0
#9 0x00007f40d337f1ef in socket_event_poll_in (this=0x2539700) at socket.c:731
ret = 0
#10 0x00007f40d337f4e9 in socket_event_handler (fd=9, idx=1, data=0x2539700, poll_in=1, poll_out=0,
poll_err=0) at socket.c:831
this = (transport_t *) 0x2539700
priv = (socket_private_t *) 0x2539a90
ret = 0
#11 0x00007f40d5626203 in event_dispatch_epoll_handler (event_pool=0x252e900, events=0x253b790, i=0)
at event.c:804
event_data = (struct event_data *) 0x253b794
handler = (event_handler_t) 0x7f40d337f432 <socket_event_handler>
data = (void *) 0x2539700
idx = 1
ret = -1
__FUNCTION__ = "event_dispatch_epoll_handler"
#12 0x00007f40d56263d2 in event_dispatch_epoll (event_pool=0x252e900) at event.c:867
events = (struct epoll_event *) 0x253b790
size = 1
i = 0
ret = 1
__FUNCTION__ = "event_dispatch_epoll"
#13 0x00007f40d56266e3 in event_dispatch (event_pool=0x252e900) at event.c:975
ret = -1
__FUNCTION__ = "event_dispatch"
#14 0x00000000004063d3 in main (argc=4, argv=0x7fffdda5e748) at glusterfsd.c:1425
ctx = (glusterfs_ctx_t *) 0x252d010
cmd_args = (cmd_args_t *) 0x252d010
pool = (call_pool_t *) 0x252e9a0
stbuf = {st_dev = 139916439052352, st_ino = 15773923, st_nlink = 4294967487,
st_mode = 3584355008, st_uid = 32576, st_gid = 3584410456, pad0 = 32576, st_rdev = 140736912025280,
st_size = 139916439044096, st_blksize = 4200312, st_blocks = 0, st_atim = {tv_sec = 4429057797,
tv_nsec = 705412764499181569}, st_mtim = {tv_sec = 139916438987952, tv_nsec = 140736912025416},
st_ctim = {tv_sec = 1970169159, tv_nsec = 11}, __unused = {32768, 4, 140736912025416}}
tmp_logfile = '\0' <repeats 1023 times>
tmp_logfile_dyn = 0x0
tmp_logfilebase = 0x0
timestr = '\0' <repeats 255 times>
utime = 1278420147
tm = (struct tm *) 0x0
ret = 63
lim = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
specfp = (FILE *) 0x252ec60
graph = (xlator_t *) 0x252eec0
trav = (xlator_t *) 0x0
fuse_volume_found = 0
xl_count = 8
process_mode = 1 '\001'
pipe_fd = {6, 7}
gf_success = 0
gf_failure = -1
__FUNCTION__ = "main"
Thread 1 (process 17097):
#0 0x00007f40d3fea114 in ra_fstat (frame=0x7f40c404f5b0, this=0x2534bd0, fd=0x25eaca0)
at read-ahead.c:750
file = (ra_file_t *) 0x7f40c404c400
iter_fd = (fd_t *) 0x25903c0
inode = (inode_t *) 0x7f40cc000bb0
ret = 0
tmp_file = 139916143281152
#1 0x00007f40d5606c1c in default_fstat (frame=0x7f40c404f550, this=0x25354b0, fd=0x25eaca0)
at defaults.c:673
_new = (call_frame_t *) 0x7f40c404f5b0
old_THIS = (xlator_t *) 0x25354b0
tmp_cbk = (fop_fstat_cbk_t) 0x7f40d56069c6 <default_fstat_cbk>
#2 0x00007f40d3bceca5 in qr_fstat (frame=0x7f40c404f4f0, this=0x2535e50, fd=0x25eaca0)
at quick-read.c:1184
_new = (call_frame_t *) 0x7f40c404f550
old_THIS = (xlator_t *) 0x2535e50
tmp_cbk = (fop_fstat_cbk_t) 0x7f40d3bce64f <qr_fstat_cbk>
qr_fd_ctx = (qr_fd_ctx_t *) 0x253f600
need_open = 0 '\0'
can_wind = 1 '\001'
need_unwind = 0 '\0'
value = 39056896
ret = 0
op_ret = -1
op_errno = -1
stub = (call_stub_t *) 0x0
loc = {path = 0x0, name = 0x0, ino = 0, inode = 0x0, parent = 0x0}
path = 0x254f0e0 "/extMapping.php"
flags = 32768
#3 0x00007f40d39be44d in wb_fstat_helper (frame=0x7f40c404c3a0, this=0x25367f0, fd=0x25eaca0)
at write-behind.c:698
_new = (call_frame_t *) 0x7f40c404f4f0
old_THIS = (xlator_t *) 0x25367f0
tmp_cbk = (fop_fstat_cbk_t) 0x7f40d39be170 <wb_fstat_cbk>
#4 0x00007f40d561bacd in call_resume_wind (stub=0x7f40c404f170) at call-stub.c:2607
__FUNCTION__ = "call_resume_wind"
#5 0x00007f40d562180a in call_resume (stub=0x7f40c404f170) at call-stub.c:4304
old_THIS = (xlator_t *) 0x25367f0
__FUNCTION__ = "call_resume"
#6 0x00007f40d39c15ce in wb_resume_other_requests (frame=0x7f40c404c3a0, file=0x254f1c0,
other_requests=0x42656e20) at write-behind.c:1621
ret = 0
request = (wb_request_t *) 0x7f40c404f480
dummy = (wb_request_t *) 0x42656df0
fops_removed = 0
wind = 1 '\001'
stub = (call_stub_t *) 0x7f40c404f170
#7 0x00007f40d39c1678 in wb_do_ops (frame=0x7f40c404c3a0, file=0x254f1c0, winds=0x42656e40,
unwinds=0x42656e30, other_requests=0x42656e20) at write-behind.c:1649
ret = 0
#8 0x00007f40d39c1cef in wb_process_queue (frame=0x7f40c404c3a0, file=0x254f1c0, flush_all=1 '\001')
at write-behind.c:1817
winds = {next = 0x42656e40, prev = 0x42656e40}
unwinds = {next = 0x42656e30, prev = 0x42656e30}
other_requests = {next = 0x7f40c404f4b0, prev = 0x7f40c404f4b0}
size = 131072
conf = (wb_conf_t *) 0x25384d0
count = 1
ret = -1
#9 0x00007f40d39be7b7 in wb_fstat (frame=0x7f40c404c3a0, this=0x25367f0, fd=0x25eaca0)
at write-behind.c:755
file = (wb_file_t *) 0x254f1c0
local = (wb_local_t *) 0x7f40c404c570
tmp_file = 39121344
stub = (call_stub_t *) 0x7f40c404f170
request = (wb_request_t *) 0x7f40c404f480
ret = -1
op_errno = 22
__FUNCTION__ = "wb_fstat"
#10 0x00007f40d5606c1c in default_fstat (frame=0x7f40c404c9f0, this=0x2537270, fd=0x25eaca0)
at defaults.c:673
_new = (call_frame_t *) 0x7f40c404c3a0
old_THIS = (xlator_t *) 0x2537270
tmp_cbk = (fop_fstat_cbk_t) 0x7f40d56069c6 <default_fstat_cbk>
#11 0x00007f40d358cf34 in fuse_getattr (this=0x252eec0, finh=0x7f40c404cd40, msg=0x7f40c404cd68)
at fuse-bridge.c:812
_new = (call_frame_t *) 0x7f40c404c9f0
old_THIS = (xlator_t *) 0x252eec0
tmp_cbk = (fop_fstat_cbk_t) 0x7f40d358c057 <fuse_attr_cbk>
frame = (call_frame_t *) 0x7f40c404c988
xl = (xlator_t *) 0x2537270
state = (fuse_state_t *) 0x7f40c404c830
fd = (fd_t *) 0x25eaca0
ret = 0
__FUNCTION__ = "fuse_getattr"
#12 0x00007f40d3598cb9 in fuse_thread_proc (data=0x252eec0) at fuse-bridge.c:3190
mount_point = 0x0
this = (xlator_t *) 0x252eec0
priv = (fuse_private_t *) 0x2538280
res = 56
iobuf = (struct iobuf *) 0x252e8c8
finh = (fuse_in_header_t *) 0x7f40c404cd40
iov_in = {{iov_base = 0x7f40c404cd40, iov_len = 80}, {iov_base = 0x7f40d4e54000,
iov_len = 131072}}
msg = (void *) 0x7f40c404cd68
msg0_size = 168
ret = 0
now = {tv_sec = 1278420147, tv_usec = 623243}
timeout = {tv_sec = 1278420148, tv_nsec = 623243000}
__FUNCTION__ = "fuse_thread_proc"
#13 0x00007f40d51cdfc7 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#14 0x00007f40d4f4364d in clone () from /lib/libc.so.6
No symbol table info available.
#15 0x0000000000000000 in ?? ()
No symbol table info available.
(En réponse au commentaire 1)
> This looks to be a duplicate of bug 762600. Bug 868 has been fixed in the 3.0.5
> release candidates. Can you please check if your crash is fixed by 3.0.5rc9 or
> by applying this patch on to GlusterFS 3.0.4?
>
> http://patches.gluster.com/patch/3213
I applied your patch on glusterfs 3.0.4 and now everything is perfect. I set this ticket has "solved". Thanks !
|