Bug 761947 (GLUSTER-215)

Summary: crash on ib-verbs in 2.0.6-rc4
Product: [Community] GlusterFS Reporter: Christian Marnitz <christian.marnitz>
Component: ib-verbs Assignee: Raghavendra G <raghavendra>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: low    
Version: mainline CC: gluster-bugs
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Attachments:
Description Flags
Logfile of this node-2 none
gdb result of node-2 none

Description Christian Marnitz 2009-08-15 05:59:28 UTC
Created attachment 58
Patch for inn.spec file

Comment 1 Christian Marnitz 2009-08-15 06:00:01 UTC
Created attachment 59
Broken XF86Config file generated by installer

Comment 2 Christian Marnitz 2009-08-15 08:58:39 UTC
gdb /usr/local/sbin/glusterfsd -c /core.21040
---------------------------------------------------------------------------------
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /usr/local/lib/libglusterfs.so.0...done.
Loaded symbols for /usr/local/lib/libglusterfs.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/local/lib/glusterfs/2.0.6rc4/xlator/storage/posix.so...done.
Loaded symbols for /usr/local/lib/glusterfs/2.0.6rc4/xlator/storage/posix.so
Reading symbols from /usr/local/lib/glusterfs/2.0.6rc4/xlator/features/locks.so...done.
Loaded symbols for /usr/local/lib/glusterfs/2.0.6rc4/xlator/features/locks.so
Reading symbols from /usr/local/lib/glusterfs/2.0.6rc4/xlator/performance/io-threads.so...done.
Loaded symbols for /usr/local/lib/glusterfs/2.0.6rc4/xlator/performance/io-threads.so
Reading symbols from /usr/local/lib/glusterfs/2.0.6rc4/xlator/protocol/server.so...done.
Loaded symbols for /usr/local/lib/glusterfs/2.0.6rc4/xlator/protocol/server.so
Reading symbols from /usr/local/lib/glusterfs/2.0.6rc4/transport/ib-verbs.so...done.
Loaded symbols for /usr/local/lib/glusterfs/2.0.6rc4/transport/ib-verbs.so
Reading symbols from /usr/lib/libibverbs.so.1...done.
Loaded symbols for /usr/lib/libibverbs.so.1
Reading symbols from /usr/lib/libipathverbs-rdmav2.so...done.
Loaded symbols for /usr/lib/libipathverbs-rdmav2.so
Reading symbols from /usr/lib/libnes-rdmav2.so...done.
Loaded symbols for /usr/lib/libnes-rdmav2.so
Reading symbols from /usr/lib/libmthca-rdmav2.so...done.
Loaded symbols for /usr/lib/libmthca-rdmav2.so
Reading symbols from /usr/lib/libcxgb3-rdmav2.so...done.
Loaded symbols for /usr/lib/libcxgb3-rdmav2.so
Reading symbols from /usr/lib/libmlx4-rdmav2.so...done.
Loaded symbols for /usr/lib/libmlx4-rdmav2.so
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/local/lib/glusterfs/2.0.6rc4/auth/addr.so...done.
Loaded symbols for /usr/local/lib/glusterfs/2.0.6rc4/auth/addr.so
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Core was generated by `/usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f /etc/glusterfs/gluster'.
Program terminated with signal 11, Segmentation fault.
[New process 21049]
[New process 21054]
[New process 21053]
[New process 21050]
[New process 21048]
[New process 21047]
[New process 21046]
[New process 21045]
[New process 21044]
[New process 21043]
[New process 21041]
[New process 21040]
#0  inode_ref (inode=0x0) at inode.c:393
393          pthread_mutex_lock (&table->lock);
(gdb) bt
#0  inode_ref (inode=0x0) at inode.c:393
#1  0x00007fc314aa4411 in server_link_resume (frame=0x7fc30c04dda0, this=0x60fa50, oldloc=0x7fc30c03a7d0, newloc=0x7fc30c03a7f8) at server-protocol.c:6069
#2  0x00007fc31628a09c in call_resume (stub=0x7fc30c03a7a0) at call-stub.c:2363
#3  0x00007fc314ab1b52 in server_stub_resume (stub=0x7fc30c03a7a0, op_ret=6355536, op_errno=22, inode=0x7fc30c04a0f0, parent=0x620730) at server-protocol.c:3376
#4  0x00007fc314ab39d7 in __do_path_resolve_cbk (frame=0x7fc30c01fa50, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=22, inode=0x7fc30c04a0f0,
    stbuf=0x41738fd0, dict=0x0) at server-dentry.c:265
#5  0x00007fc314cc2ce4 in iot_lookup_cbk (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=201566200, op_errno=6424368, inode=0x16,
    buf=0x41738fd0, xattr=0x0) at io-threads.c:222
#6  0x00007fc316280a64 in default_lookup_cbk (frame=<value optimized out>, cookie=<value optimized out>, this=<value optimized out>, op_ret=201566200, op_errno=6424368,
    inode=0x16, buf=0x41738fd0, dict=0x0) at defaults.c:46
#7  0x00007fc3150df9a3 in posix_lookup (frame=0x64a110, this=0x60df60, loc=0x7fc30c01f850, xattr_req=0x0) at posix.c:308
#8  0x00007fc316283fa5 in default_lookup (frame=0x65bd70, this=0x60e930, loc=0x7fc30c01f850, xattr_req=0x0) at defaults.c:61
#9  0x00007fc314cc6115 in iot_lookup_wrapper (frame=0x7fc30c00e070, this=0x60f150, loc=0x7fc30c01f850, xattr_req=0x0) at io-threads.c:231
#10 0x00007fc31628a303 in call_resume (stub=0x7fc30c01f820) at call-stub.c:2633
#11 0x00007fc314cc3dbe in iot_worker_unordered (arg=<value optimized out>) at io-threads.c:2058
#12 0x00007fc315e523f7 in start_thread () from /lib/libpthread.so.0
#13 0x00007fc315bc1b3d in clone () from /lib/libc.so.6
#14 0x0000000000000000 in ?? ()

--

What we did:

We have 8 servers with the attached config. The client uses replicate over 1+2, 3+4, 5+6, and 7+8, and then distribute over these 4 replicate groups. The client is an FTP server. We uploaded 2500 small files of around 750 kb each over gigabit, and the FTP server stored them in the glusterfs mount point. This crash occurred only on node-2; all other nodes are fine.

greetings
christian

Comment 3 Raghavendra G 2009-08-27 06:10:59 UTC
In server_stub_resume, switch (stub->fop) is hitting the default case (server-protocol.c:3376), whereas in call_resume on the same stub, switch (stub->fop) is correctly hitting GF_FOP_LINK (call-stub.c:2363). It's quite strange, since the same stub is being passed to call_resume from server_stub_resume.

If gdb is displaying the line numbers correctly, oldloc is not filled in by server_stub_resume, so oldloc.parent may be NULL, and calling inode_ref on it results in the crash.
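
To make the suspected failure mode concrete, here is a minimal standalone sketch. It is not the GlusterFS source and not the eventual patch: the struct layouts, inode_ref_sketch, link_resume_checked and link_demo are hypothetical stand-ins, and returning -EINVAL is an assumed choice. It only shows why taking a reference on a NULL parent faults inside the ref function, and what a guard before that call could look like.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the real GlusterFS types. */
typedef struct inode_table { pthread_mutex_t lock; } inode_table_t;
typedef struct inode { inode_table_t *table; int ref; } inode_t;
typedef struct loc { inode_t *inode; inode_t *parent; } loc_t;

/* Same shape as frame #0: the table is reached through the inode,
 * so passing inode == NULL is an invalid memory access. */
static inode_t *inode_ref_sketch(inode_t *inode)
{
    inode_table_t *table = inode->table;
    pthread_mutex_lock(&table->lock);
    inode->ref++;
    pthread_mutex_unlock(&table->lock);
    return inode;
}

/* Hypothetical guard: refuse to resume a link when either parent was
 * never resolved, instead of handing NULL to the ref function. */
static int link_resume_checked(loc_t *oldloc, loc_t *newloc)
{
    if (oldloc->parent == NULL || newloc->parent == NULL)
        return -EINVAL; /* assumed error code for this sketch */
    inode_ref_sketch(oldloc->parent);
    inode_ref_sketch(newloc->parent);
    return 0;
}

/* Exercises both paths: an unresolved oldloc is rejected without
 * touching any inode, a resolved pair takes two references. */
static void link_demo(void)
{
    inode_table_t table;
    pthread_mutex_init(&table.lock, NULL);
    inode_t dir = { &table, 0 };
    loc_t resolved = { NULL, &dir };
    loc_t unresolved = { NULL, NULL };

    assert(link_resume_checked(&unresolved, &resolved) == -EINVAL);
    assert(dir.ref == 0);
    assert(link_resume_checked(&resolved, &resolved) == 0);
    assert(dir.ref == 2);
    pthread_mutex_destroy(&table.lock);
}
```

Calling link_demo() from a test program runs both cases. Where the real check belongs (in the resume path or in the link handler itself) depends on how the lookup failure is propagated, which this sketch does not model.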

Comment 4 Anand Avati 2009-09-14 02:10:26 UTC
PATCH: http://patches.gluster.com/patch/1291 in master (protocol/server: server_stub_resume should check for failure of lookup when oldloc.parent is NULL.)

Comment 5 Anand Avati 2009-09-14 02:10:34 UTC
PATCH: http://patches.gluster.com/patch/1292 in release-2.0 (protocol/server: server_stub_resume should check for failure of lookup when oldloc.parent is NULL.)