Bug 761843 (GLUSTER-111)

Summary: Both servers aborted in simple-afr configuration
Product: [Community] GlusterFS
Reporter: Vikas Gorur <vikas>
Component: core
Assignee: Anand Avati <aavati>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: low
Version: mainline
CC: chrisw, gluster-bugs, vinayak
Hardware: All
OS: Linux
Doc Type: Bug Fix
Regression: RTNR

Description Vikas Gorur 2009-07-06 12:39:31 UTC
Setup is /share/meter/config/simple-afr

Two servers run on brick1 and brick2, and the client runs on brick4.

After a few dd runs, both servers crashed with an identical backtrace:

#0  0x0000003788a30155 in raise () from /lib64/libc.so.6
#1  0x0000003788a31bf0 in abort () from /lib64/libc.so.6
#2  0x0000003788a6a3db in __libc_message () from /lib64/libc.so.6
#3  0x0000003788a71d21 in _int_malloc () from /lib64/libc.so.6
#4  0x0000003788a72bb3 in calloc () from /lib64/libc.so.6
#5  0x00002b738b933c41 in gf_dirent_for_name (name=0x1f3c0f0b "dd-write-16k")
    at gf-dirent.c:57
#6  0x00002b738c3c17e2 in posix_readdir (frame=0x2aaaac004a00, this=0x1f3b9ea0, 
    fd=<value optimized out>, size=4096, off=0) at posix.c:3588
#7  0x00002b738b91b839 in default_readdir (frame=0x2aaaac006dc0, this=0x1f3ba850, 
    fd=0x2aaaac003260, size=4096, off=0) at defaults.c:1400
#8  0x00002b738c7df3b0 in iot_readdir_wrapper (frame=0x2aaaac0051e0, this=0x1f3bb120, 
    fd=0x2aaaac003260, size=4096, offset=0) at io-threads.c:1689
#9  0x00002b738b92ab05 in call_resume_wind (stub=0x2aaaac003ae0) at call-stub.c:2661
#10 0x00002b738b92ec8c in call_resume (stub=0x2aaaac003ae0) at call-stub.c:4169
#11 0x00002b738c7df008 in iot_worker_ordered (arg=<value optimized out>)
    at io-threads.c:1943
#12 0x0000003789206307 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003788ad1ded in clone () from /lib64/libc.so.6

Core files, logs, etc. are available at /share/tickets/<this ticket number>

Comment 1 Anand Avati 2009-07-07 01:16:16 UTC
This bug is suspected to be a duplicate of #102. Please confirm whether this can be reproduced with the latest master snapshot.

Comment 2 Anand Avati 2009-07-16 00:47:30 UTC
PATCH: http://patches.gluster.com/patch/721/ (Support for option transport.socket.nodelay)

Comment 3 Anand Avati 2009-07-16 00:51:12 UTC
(In reply to comment #2)
> PATCH: http://patches.gluster.com/patch/721/ (Support for option
> transport.socket.nodelay)

This was my mistake :-) The patch and the bug are unrelated!

Comment 4 Anand Avati 2009-07-28 16:47:52 UTC
The following patch fixes this crash: 

http://patches.gluster.com/patch/672/ (mem-pool: Do not perform chunkhead2ptr on MALLOCed memory)