Bug 762781 (GLUSTER-1049) - [3.0.5rc9]:client crash in distribute
Summary: [3.0.5rc9]:client crash in distribute
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1049
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.0.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: shishir gowda
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-07-05 18:02 UTC by Raghavendra Bhat
Modified: 2015-12-01 16:45 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Raghavendra Bhat 2010-07-05 18:02:16 UTC
The glusterfs client crashed in dht_selfheal_layout_new_directory. This is the backtrace of the core generated.


GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-slackware-linux"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/libglusterfs.so.0...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/libglusterfs.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/protocol/client.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/protocol/client.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/cluster/replicate.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/cluster/replicate.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/cluster/distribute.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/cluster/distribute.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/read-ahead.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/read-ahead.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/io-cache.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/io-cache.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/quick-read.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/quick-read.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/write-behind.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/write-behind.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/stat-prefetch.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/performance/stat-prefetch.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/mount/fuse.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/xlator/mount/fuse.so
Reading symbols from /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/transport/socket.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc9/lib/glusterfs/3.0.5rc9/transport/socket.so
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `/opt/glusterfs/3.0.5rc9/sbin/glusterfs -f client.vol /mnt/hd/ -l /f/afr_client.'.
Program terminated with signal 8, Arithmetic exception.
[New process 3959]
[New process 3965]
[New process 3960]
#0  0x00007fb11f6e1df4 in dht_selfheal_layout_new_directory (frame=0x7fb1143b0b90, loc=0x7fb1143b66e8, layout=0x7fb11437b3f0)
    at ../../../../../xlators/cluster/dht/src/dht-selfheal.c:337
337		chunk = ((unsigned long) 0xffffffff) / cnt;
(gdb) bt
#0  0x00007fb11f6e1df4 in dht_selfheal_layout_new_directory (frame=0x7fb1143b0b90, loc=0x7fb1143b66e8, layout=0x7fb11437b3f0)
    at ../../../../../xlators/cluster/dht/src/dht-selfheal.c:337
#1  0x00007fb11f6e231a in dht_selfheal_new_directory (frame=0x7fb1143b0b90, dir_cbk=0x7fb11f6f7c57 <dht_mkdir_selfheal_cbk>, 
    layout=0x7fb11437b3f0) at ../../../../../xlators/cluster/dht/src/dht-selfheal.c:445
#2  0x00007fb11f6f7ffb in dht_mkdir_cbk (frame=0x7fb1143b0b90, cookie=0x7fb10c031100, this=0x612040, op_ret=-1, op_errno=11, 
    inode=0x0, stbuf=0x7fb10c028dc0, preparent=0x7fb10c028ee0, postparent=0x7fb10c028f70)
    at ../../../../../xlators/cluster/dht/src/dht-common.c:3058
#3  0x00007fb11f92048c in afr_mkdir_unwind (frame=0x7fb10c015ac8, this=0x611de0)
    at ../../../../../xlators/cluster/afr/src/afr-dir-write.c:644
#4  0x00007fb11f920a93 in afr_mkdir_done (frame=0x7fb10c015ac8, this=0x611de0)
    at ../../../../../xlators/cluster/afr/src/afr-dir-write.c:777
#5  0x00007fb11f9330d2 in afr_unlock (frame=0x7fb10c015ac8, this=0x611de0)
    at ../../../../../xlators/cluster/afr/src/afr-transaction.c:551
#6  0x00007fb11f936e40 in afr_lock_rec (frame=0x7fb10c015ac8, this=0x611de0, child_index=2)
    at ../../../../../xlators/cluster/afr/src/afr-transaction.c:1268
#7  0x00007fb11f936728 in afr_lock_cbk (frame=0x7fb10c015ac8, cookie=0x1, this=0x611de0, op_ret=-1, op_errno=2)
    at ../../../../../xlators/cluster/afr/src/afr-transaction.c:1118
#8  0x00007fb11fb7ed3e in client_entrylk_cbk (frame=0x7fb10c03bc10, hdr=0x7fb10c02c610, hdrlen=108, iobuf=0x0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:5484
#9  0x00007fb11fb819e5 in protocol_client_interpret (this=0x610f90, trans=0x617a20, hdr_p=0x7fb10c02c610 "", hdrlen=108, iobuf=0x0)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:6571
#10 0x00007fb11fb826ab in protocol_client_pollin (this=0x610f90, trans=0x617a20)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:6869
#11 0x00007fb11fb82d1f in notify (this=0x610f90, event=2, data=0x617a20)
    at ../../../../../xlators/protocol/client/src/client-protocol.c:6988
#12 0x00007fb120d342fa in xlator_notify (xl=0x610f90, event=2, data=0x617a20) at ../../../libglusterfs/src/xlator.c:929
#13 0x00007fb11e064257 in socket_event_poll_in (this=0x617a20) at ../../../../transport/socket/src/socket.c:771
#14 0x00007fb11e064551 in socket_event_handler (fd=14, idx=1, data=0x617a20, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../transport/socket/src/socket.c:871
#15 0x00007fb120d58f1f in event_dispatch_epoll_handler (event_pool=0x60a320, events=0x61b4f0, i=0)
    at ../../../libglusterfs/src/event.c:804
#16 0x00007fb120d590ee in event_dispatch_epoll (event_pool=0x60a320) at ../../../libglusterfs/src/event.c:867
#17 0x00007fb120d593ff in event_dispatch (event_pool=0x60a320) at ../../../libglusterfs/src/event.c:975
#18 0x000000000040634b in main (argc=6, argv=0x7fff071417c8) at ../../../glusterfsd/src/glusterfsd.c:1425
(gdb) f 0
#0  0x00007fb11f6e1df4 in dht_selfheal_layout_new_directory (frame=0x7fb1143b0b90, loc=0x7fb1143b66e8, layout=0x7fb11437b3f0)
    at ../../../../../xlators/cluster/dht/src/dht-selfheal.c:337
337		chunk = ((unsigned long) 0xffffffff) / cnt;
(gdb) p cnt
$1 = 0
(gdb) q


This is an integer division by zero: cnt is 0 at dht-selfheal.c:337, which raises the arithmetic exception (signal 8). cnt should be checked before dividing.
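
A minimal sketch of the kind of check meant here. It is not the actual GlusterFS patch: the helper name layout_chunk, its signature, and the ENOTCONN error code are assumptions made for illustration. Only the division itself is taken from line 337 above.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of GlusterFS): computes the size of the
 * hash-range chunk given to each subvolume, as in
 *     chunk = ((unsigned long) 0xffffffff) / cnt;
 * but refuses to divide when no subvolume is available.
 *
 * Returns 0 on success and stores the result in *chunk.
 * Returns a negative errno value when cnt is 0; *chunk is left untouched.
 * ENOTCONN is an assumed choice of error code.
 */
int layout_chunk(uint32_t cnt, uint32_t *chunk)
{
    if (cnt == 0) {
        /* Every subvolume failed (e.g. servers down): nothing to divide by. */
        return -ENOTCONN;
    }

    *chunk = 0xffffffffU / cnt;
    return 0;
}
```

With a guard like this, the caller could fail the mkdir self-heal with an error when cnt is 0 instead of crashing the client. How the error should be propagated inside dht is left open here.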

How it occurred:


Distributed-replicate setup (4 servers).

3 clients (2 with the flush-behind option on in write-behind, and one with it off).

On one of the "on" clients: untarring the Linux kernel in a loop.

On the other "on" client: "find . | xargs stat" in a loop.

On the "off" client: "rm -rf <linux kernel>" in a loop and "ls -lR".


While all of these were running simultaneously, I brought a server down and back up.

Comment 1 shishir gowda 2010-08-12 04:35:22 UTC
Resolved in fix 966 (clang fixes)

