Bug 763586 (GLUSTER-1854) - GlusterFS 3.1 beta crashes over Infiniband with rdma issue
Summary: GlusterFS 3.1 beta crashes over Infiniband with rdma issue
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1854
Product: GlusterFS
Classification: Community
Component: rdma
Version: 3.1-beta
Hardware: x86_64
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-10-07 16:52 UTC by Jacob Shucart
Modified: 2010-10-27 11:43 UTC (History)
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)
Server logs. (43.31 KB, application/octet-stream)
2010-10-07 13:54 UTC, Jacob Shucart
this is /etc/glusterfs (95.38 KB, application/octet-stream)
2010-10-07 15:21 UTC, Jacob Shucart

Description Jacob Shucart 2010-10-07 13:54:02 UTC
Created attachment 339


Attached is the server log.

Comment 1 Vijay Bellur 2010-10-07 14:13:49 UTC
Is a core file available? It should normally be in /. If yes, we would like to see
a backtrace from the core.
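
For reference, a quick way to pull a backtrace from the core is a gdb session along these lines (the core-file name here is an assumption; point gdb at the actual core produced on the server):

# gdb /usr/sbin/glusterfs /core.<pid>
(gdb) bt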

Thanks,
Vijay

Comment 2 Jacob Shucart 2010-10-07 15:21:08 UTC
Created attachment 340

Comment 3 Mike Robbert 2010-10-07 15:57:26 UTC
Vijay,
I found the core file. It is 500MB, and when I tried to upload it, it gave an error after taking close to an hour. I have installed gdb and run a bt on the file.

(gdb) bt
#0  0x0000003f5da2e21d in raise () from /lib64/tls/libc.so.6
#1  0x0000003f5da2fa1e in abort () from /lib64/tls/libc.so.6
#2  0x0000003f5da27ae1 in __assert_fail () from /lib64/tls/libc.so.6
#3  0x00002b2f1d4ee537 in mem_put (pool=Variable "pool" is not available.
) at mem-pool.c:463
#4  0x00002aaaabb1379d in rdma_handshake_pollerr (this=0x72dd948) at rdma.c:314
#5  0x00002aaaabb13b3e in rdma_event_handler (fd=Variable "fd" is not available.
) at rdma.c:4358
#6  0x00002b2f1d4ed7cb in event_dispatch_epoll (event_pool=0x7248048) at event.c:812
#7  0x0000000000405416 in main ()

Let me know if there is any more information I can pull from this file or if there is a better place I can put the file.

Mike

(In reply to comment #2)
> Is a core file available? It should be normally in /. If yes, would like to see
> a backtrace from the core.
> 
> Thanks,
> Vijay

Comment 4 Vijay Bellur 2010-10-07 16:01:26 UTC
Thanks, Mike. This should do for now. We will get back to you if we need more information. If you could preserve the core until we root-cause this, that would be great.

Thanks,
Vijay

Comment 5 Jacob Shucart 2010-10-07 16:52:31 UTC
The servers are Dell PE1950s with 16 GB of RAM running RHEL 4 Update 5. The backend storage is a DDN S2A9550 controller pair attached via Fibre Channel, with one brick per server. The InfiniBand card is:

0c:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)

[root@iosrv-7-1 ~]# gluster volume info

Volume Name: gluster-test
Type: Distribute
Status: Started
Number of Bricks: 4
Transport-type: rdma
Bricks:
Brick1: iosrv-7-3:/mnt/brick3
Brick2: iosrv-7-4:/mnt/brick4
Brick3: 172.16.2.4:/mnt/brick1
Brick4: iosrv-7-2:/mnt/brick2

The clients are all Dell, mostly PE1950 with either 16 or 32 GB of RAM and they are running RHEL 5.4. 

I have attached the current log file from /var/log/glusterfs/ on the head node. My most recent action was to stop the volume, unmount everywhere, kill any remaining glusterfs processes on the servers, then restart the server and mount the client on the head node of the cluster. ls of the mount is hung.

The server process just crashed with the following log:

[2010-10-07 09:37:00.63148] E [rdma.c:4350:rdma_event_handler] rpc-transport/rdma: gluster-test-client-0: pollin received on tcp socket (peer: 172.16.2.5:6971) after handshake is complete
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)

patchset: v3.1.0qa7-389-gc5a5fea
signal received: 6
time of crash: 2010-10-07 09:37:00
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.0beta
/lib64/tls/libc.so.6[0x3f5da2e2b0]
/lib64/tls/libc.so.6(gsignal+0x3d)[0x3f5da2e21d]
/lib64/tls/libc.so.6(abort+0xfe)[0x3f5da2fa1e]
/lib64/tls/libc.so.6(__assert_fail+0xf1)[0x3f5da27ae1]
/usr/lib64/libglusterfs.so.0[0x2b2f1d4ee537]
/usr/lib64/glusterfs/3.1.0beta/rpc-transport/rdma.so[0x2aaaabb1379d]
/usr/lib64/glusterfs/3.1.0beta/rpc-transport/rdma.so[0x2aaaabb13b3e]
/usr/lib64/libglusterfs.so.0[0x2b2f1d4ed7cb]
/usr/sbin/glusterfs(main+0x326)[0x405416]
/lib64/tls/libc.so.6(__libc_start_main+0xdb)[0x3f5da1c3fb]
/usr/sbin/glusterfs[0x40313a]
---------

Comment 6 Raghavendra G 2010-10-08 09:56:27 UTC
Hi Mike,

Can you attach gdb and execute the command 'thr apply all bt full'? It will give us more information to help with debugging.
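
For example, a session along these lines will capture that output to a file (the core-file path here is an assumption; use the actual core on the server):

# gdb /usr/sbin/glusterfs /core.<pid>
(gdb) set logging file backtrace-full.txt
(gdb) set logging on
(gdb) thread apply all bt full
(gdb) set logging off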

regards,
Raghavendra
(In reply to comment #4)
> Vijay,
> I found the core file. It is 500Mb and when I tried to upload it gave an error
> after taking close to an hour to upload. I have installed gdb and run a bt on
> the file.
> 
> (gdb) bt
> #0  0x0000003f5da2e21d in raise () from /lib64/tls/libc.so.6
> #1  0x0000003f5da2fa1e in abort () from /lib64/tls/libc.so.6
> #2  0x0000003f5da27ae1 in __assert_fail () from /lib64/tls/libc.so.6
> #3  0x00002b2f1d4ee537 in mem_put (pool=Variable "pool" is not available.
> ) at mem-pool.c:463
> #4  0x00002aaaabb1379d in rdma_handshake_pollerr (this=0x72dd948) at rdma.c:314
> #5  0x00002aaaabb13b3e in rdma_event_handler (fd=Variable "fd" is not
> available.
> ) at rdma.c:4358
> #6  0x00002b2f1d4ed7cb in event_dispatch_epoll (event_pool=0x7248048) at
> event.c:812
> #7  0x0000000000405416 in main ()
> 
> Let me know if there is any more information I can pull from this file or if
> there is a better place I can put the file.
> 
> Mike
> 
> (In reply to comment #2)
> > Is a core file available? It should be normally in /. If yes, would like to see
> > a backtrace from the core.
> > 
> > Thanks,
> > Vijay

Comment 7 Raghavendra G 2010-10-08 10:23:18 UTC
Hi Mike,

Can you also give the command lines you used to start glusterfs (glusterfs, glusterfsd and glusterd) on each of the clients and servers?

regards,
Raghavendra

Comment 8 Mike Robbert 2010-10-08 13:07:29 UTC
(In reply to comment #7)
> Hi Mike,
> 
> Can you also give the commandlines you used to start glusterfs (glusterfs,
> glusterfsd and glusterd) on each of the client and server?
> 
> regards,
> Raghavendra

Here is the full backtrace:

(gdb) thr apply all bt full

Thread 12 (process 17931):
#0  0x0000003f5e10c28b in do_sigwait () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x0000003f5e10c33d in sigwait () from /lib64/tls/libpthread.so.0
No symbol table info available.
#2  0x0000000000404f9d in ?? ()
No symbol table info available.
#3  0x0000003f5e1060a0 in __free_tcb () from /lib64/tls/libpthread.so.0
No symbol table info available.
#4  0x000000005e10d280 in ?? ()
No symbol table info available.
#5  0x0000000000004a03 in ?? ()
No symbol table info available.
#6  0x0000003f5e10d280 in _thread_db_sizeof_pthread () from /lib64/tls/libpthread.so.0
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 11 (process 17933):
#0  0x0000003f5da8fc45 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
No symbol table info available.
#1  0x0000003f5dac08f8 in usleep () from /lib64/tls/libc.so.6
No symbol table info available.
#2  0x00002b2f1d4dca8d in gf_timer_proc (ctx=0x7246010) at timer.c:182
        now = 1286465819254181
        now_tv = {tv_sec = 1286465819, tv_usec = 254181}
        event = (gf_timer_t *) 0x2aaab9fc4c58
        reg = (gf_timer_registry_t *) 0x724c8b8
        __FUNCTION__ = "gf_timer_proc"
#3  0x0000003f5e106137 in start_thread () from /lib64/tls/libpthread.so.0
No symbol table info available.
#4  0x0000003f5dac7113 in clone () from /lib64/tls/libc.so.6
No symbol table info available.

Thread 10 (process 17934):
#0  0x0000003f5e10b19f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000030a930934c in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
No symbol table info available.
#2  0x00002aaaabb13375 in rdma_send_completion_proc (data=0x724e8a0) at rdma.c:3430
        chan = Variable "chan" is not available.

Thread 9 (process 17935):
#0  0x0000003f5e10b19f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000030a930934c in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
No symbol table info available.
#2  0x00002aaaabb12ee2 in rdma_recv_completion_proc (data=0x7266e10) at rdma.c:3261
        chan = Variable "chan" is not available.

Thread 8 (process 17936):
#0  0x0000003f5e10b19f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000030a930934c in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
No symbol table info available.
#2  0x00002aaaabb13375 in rdma_send_completion_proc (data=0x725e100) at rdma.c:3430
        chan = Variable "chan" is not available.

Thread 7 (process 17937):
#0  0x0000003f5e10b19f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000030a930934c in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
No symbol table info available.
#2  0x00002aaaabb12ee2 in rdma_recv_completion_proc (data=0x726dfd0) at rdma.c:3261
        chan = Variable "chan" is not available.

Thread 6 (process 17938):
#0  0x0000003f5e10b19f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000030a930934c in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
No symbol table info available.
#2  0x00002aaaabb13375 in rdma_send_completion_proc (data=0x72ba050) at rdma.c:3430
        chan = Variable "chan" is not available.

Thread 5 (process 17939):
#0  0x0000003f5e10b19f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000030a930934c in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
No symbol table info available.
#2  0x00002aaaabb12ee2 in rdma_recv_completion_proc (data=0x72ba070) at rdma.c:3261
        chan = Variable "chan" is not available.

Thread 4 (process 17940):
#0  0x0000003f5e10b19f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000030a930934c in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
No symbol table info available.
#2  0x00002aaaabb13375 in rdma_send_completion_proc (data=0x72befd0) at rdma.c:3430
        chan = Variable "chan" is not available.

Thread 3 (process 17941):
#0  0x0000003f5e10b19f in __read_nocancel () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000030a930934c in ibv_get_cq_event () from /usr/lib64/libibverbs.so.1
No symbol table info available.
#2  0x00002aaaabb12ee2 in rdma_recv_completion_proc (data=0x72e1350) at rdma.c:3261
        chan = Variable "chan" is not available.

Thread 2 (process 17944):
#0  0x0000003f5e10aecb in __lll_mutex_lock_wait () from /lib64/tls/libpthread.so.0
No symbol table info available.
#1  0x00000000072dddd0 in ?? ()
No symbol table info available.
#2  0x00002aaabfb010b4 in ?? ()
No symbol table info available.
#3  0x0000003f5e107c94 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
No symbol table info available.
#4  0x0000000000000004 in ?? ()
No symbol table info available.
#5  0x00002aaaba3b2528 in ?? ()
No symbol table info available.
#6  0x00000000072ddd10 in ?? ()
No symbol table info available.
#7  0x00002aaabfb01034 in ?? ()
No symbol table info available.
#8  0x00002b2f1d4ef36b in iobref_unref (iobref=0x72dd948) at iobuf.c:531
No locals.
#9  0x00000000ffffffff in ?? ()
No symbol table info available.
#10 0x00000000072e05a8 in ?? ()
No symbol table info available.
#11 0x00002aaaabb1033b in __rdma_ioq_churn_entry (peer=0x2b2f1d4ee483, entry=0x2aaab307b280)
    at rdma.c:1683
        ret = 120446224
        priv = (rdma_private_t *) 0xfffffffffffffffc
        device = (rdma_device_t *) 0x72dddd0
        options = (rdma_options_t *) 0x2aaaba367db8
        post = (rdma_post_t *) 0xfffffffffffffffc
#12 0x00002aaaabb136f4 in rdma_disconnect (this=0x72ddcf8) at rdma.c:3901
        ret = Variable "ret" is not available.

Thread 1 (process 17930):
#0  0x0000003f5da2e21d in raise () from /lib64/tls/libc.so.6
No symbol table info available.
#1  0x0000003f5da2fa1e in abort () from /lib64/tls/libc.so.6
No symbol table info available.
#2  0x0000003f5da27ae1 in __assert_fail () from /lib64/tls/libc.so.6
No symbol table info available.
#3  0x00002b2f1d4ee537 in mem_put (pool=Variable "pool" is not available.
) at mem-pool.c:463
        head = Variable "head" is not available.
(gdb) 

gdb claims this corefile was generated by this command line:
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster-test --volume-name gluste'.

All I ever did on the servers was run '/etc/init.d/glusterd start'

On the client I used mount with the following fstab entry
iosrv-7-1:gluster-test      /mnt/gluster-test       glusterfs       defaults,_netdev 0 0
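
The equivalent manual mount (illustrative, assuming the same volume and mount point) would be:

# mount -t glusterfs iosrv-7-1:gluster-test /mnt/gluster-test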

Also, here is some IP information that might be useful 
The gluster servers are
iosrv-7-1  eth0 172.16.2.4  ib0 172.16.34.4
iosrv-7-2  eth0 172.16.2.5  ib0 172.16.34.5
iosrv-7-3  eth0 172.16.2.6  ib0 172.16.34.6
iosrv-7-4  eth0 172.16.2.7  ib0 172.16.34.7

The frontend of the cluster is
ra5 eth0 172.16.0.1 ib0 172.16.32.2

other clients will be in the range
eth0 172.16.8.1 -  160 and 172.16.12.1 - 84
ib0  172.16.40.1 - 160 and 172.16.44.1 - 84

Comment 9 Vijay Bellur 2010-10-09 10:37:02 UTC
PATCH: http://patches.gluster.com/patch/5384 in master (rpc-transport/rdma: fix the bug which was causing an ioq-entry being mem_put twice.)
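
For context, "mem_put twice" means the same ioq entry was returned to its memory pool a second time, which trips the pool's sanity check seen as the __assert_fail/mem_put frames in the backtrace above. A minimal stand-alone sketch of that failure pattern (illustrative only; this is not GlusterFS code or the actual patch):

/* double_put.c: returning the same object to a pool twice trips the
 * pool's sanity check and aborts, analogous to the duplicate mem_put(). */
#include <assert.h>
#include <stdlib.h>

struct pool_item { int in_use; };

static void pool_put (struct pool_item *item)
{
        assert (item->in_use);   /* fires on the second put */
        item->in_use = 0;
}

int main (void)
{
        struct pool_item *item = calloc (1, sizeof (*item));
        item->in_use = 1;

        pool_put (item);   /* normal completion path */
        pool_put (item);   /* second put, e.g. from an error/disconnect path: abort */

        free (item);
        return 0;
}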

Comment 10 Mike Robbert 2010-10-12 15:18:30 UTC
What is the proper procedure for me to utilize this fix? I do not see the patch in git, and in fact I can't even get the git version to start up properly because the startup scripts point to different binaries and configuration files that don't exist.

Thanks,
Mike

Comment 11 Raghavendra G 2010-10-13 02:47:14 UTC
(In reply to comment #10)
> What is the proper procedure for me to utilize this fix? I do not see the patch
> in git 

The patch went in as commit d1f938c16e309431382ab. 
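
For reference, once you have a checkout of the right repository, you can confirm the commit is present with something like (illustrative):

sh$ git log --oneline -1 d1f938c16e309431382ab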

> and in fact I can't even get the git version to startup properly because
> the startup scripts are pointing to different binaries and configuration files
> that don't exist. 

Can you please give the names of the scripts you are facing problems with?

> 
> Thanks,
> Mike

regards,
Raghavendra

Comment 12 Amar Tumballi 2010-10-13 03:23:30 UTC
> > and in fact I can't even get the git version to startup properly because
> > the startup scripts are pointing to different binaries and configuration files
> > that don't exist. 
> 
> Can you please give the names of the scripts you are facing problems with?
> 

Mike, can you also try with our 3.1.0 release and see if it's fixed? If it is, we will mark this as RESOLVED.

Comment 13 Mike Robbert 2010-10-13 12:49:05 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > What is the proper procedure for me to utilize this fix? I do not see the patch
> > in git 
> 
> The patch went in as commit d1f938c16e309431382ab. 

The documentation at http://www.gluster.org/download/gluster-source-code/, which I had to dig to find from the new 3.1 documentation, first pointed me to 
sh$ git clone git://git.sv.gnu.org/gluster.git glusterfs
That repo does not contain the commit you mention; it doesn't even contain the rdma code. The last commit there is from Aug. 12.

> 
> > and in fact I can't even get the git version to startup properly because
> > the startup scripts are pointing to different binaries and configuration files
> > that don't exist. 
> 
> Can you please give the names of the scripts you are facing problems with?

Again, using the code listed above, the only init.d script that I could find, and the one that was installed, is glusterfsd. That init script doesn't run by default.

Anyway, I later logged into the IRC channel and was pointed to: http://www.gluster.com/community/documentation/index.php/Development_Work_Flow
which contains a link to:
git clone git://git.gluster.com/glusterfs.git ./glusterfs.git
That code contains the patch and the correct init scripts, so I was able to get it compiled and started. Unfortunately, I can't test further because I am now blocked by bug 763587, which I will work on today.
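
For reference, building that tree typically follows the usual autotools steps; a sketch (configure options and install prefix are assumptions, adjust for your environment):

sh$ git clone git://git.gluster.com/glusterfs.git glusterfs.git
sh$ cd glusterfs.git
sh$ ./autogen.sh
sh$ ./configure
sh$ make && make install
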
> 
> > 
> > Thanks,
> > Mike
> 
> regards,
> Raghavendra

Comment 14 Raghavendra G 2010-10-25 02:10:08 UTC
Hi Michael,

Did you get a chance to test whether this bug is fixed? If so, we can close the bug.

regards,
Raghavendra

(In reply to comment #13)
> (In reply to comment #11)
> > (In reply to comment #10)
> > > What is the proper procedure for me to utilize this fix? I do not see the patch
> > > in git 
> > 
> > The patch went in as commit d1f938c16e309431382ab. 
> 
> The documentation at http://www.gluster.org/download/gluster-source-code/,
> which I had to dig to find from the new 3.1 documentation, first pointed me to 
> sh$ git clone git://git.sv.gnu.org/gluster.git glusterfs
> That repo does not contain the commit that you mention, it doesn't even contain
> the rdma code. The last commit there is from Aug. 12
> 
> > 
> > > and in fact I can't even get the git version to startup properly because
> > > the startup scripts are pointing to different binaries and configuration files
> > > that don't exist. 
> > 
> > Can you please give the names of the scripts you are facing problems with?
> 
> Again using the code listed above the only init.d script that I could find and
> the one that installed is glusterfsd. That init script doesn't run by default.
> 
> Anyways, I later logged into the IRC channel and was pointed to:
> http://www.gluster.com/community/documentation/index.php/Development_Work_Flow
> which contains a link to:
> git clone git://git.gluster.com/glusterfs.git ./glusterfs.git
> That code contains the patch and correct init scripts so I was able to get it
> compiled and started. Unfortunately I can't test further because I am now
> blocked by bug 763587 which I will work on today.
> > 
> > > 
> > > Thanks,
> > > Mike
> > 
> > regards,
> > Raghavendra

Comment 15 Jacob Shucart 2010-10-25 12:42:24 UTC
Raghavendra,

Please see comment #13, as it says that Michael did not find the fix. Was this fix included in the released version of GlusterFS 3.1.0?

Comment 16 Raghavendra G 2010-10-26 02:18:41 UTC
Jacob,

Yes, this fix is included in release-3.1.0.

regards,
Raghavendra

Comment 17 Jacob Shucart 2010-10-26 13:26:34 UTC
I asked Michael whether it works with the 3.1.0 release. According to him:

Nope. I'm at a complete standstill because I cannot create a volume in my current state.

Is there additional information that Michael can gather regarding this issue?

Comment 18 Raghavendra G 2010-10-27 03:25:05 UTC
Jacob,

Even with release-3.1.0, is the customer running into the same crash mentioned at the beginning of this bug report? If so, is it possible to run gdb on the core and get the output of the command 'thr apply all bt full'? Also, is it possible to attach the file rpc/rpc-transport/rdma/src/rdma.c from the glusterfs sources?

If the customer is not running into the crash, is it possible to explain in detail what the issue is?

regards,
Raghavendra

Comment 19 Harshavardhana 2010-10-27 04:00:16 UTC
(In reply to comment #18)
> Jacob,
> 
> Even with release-3.1.0, is the customer running into same crash, mentioned at
> the begining of this bug report? If so, is it possible to run gdb on core and
> get output of command 'thr apply all bt full'? Also, is it possible to attach
> the file rpc/rpc-transport/rdma/src/rdma.c from glusterfs sources?
> 
> If the customer is not running into crash, is it possible to explain in detail
> what is the issue?
> 
> regards,
> Raghavendra

This issue is resolved in 3.1; rdma works with the native glusterfs mount. The issue was that a few drivers were not loaded automatically (namely ib_ipoib, ib_uverbs, and fuse), which left the mount point hung.
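
For reference, those modules can be loaded and verified with something along these lines (illustrative; exact module requirements depend on the kernel/OFED setup):

# modprobe ib_ipoib
# modprobe ib_uverbs
# modprobe fuse
# lsmod | egrep 'ib_ipoib|ib_uverbs|fuse'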

We can close this ticket; it's not valid anymore.

