Bug 761833 (GLUSTER-101) - ib-verbs config crashing while dd'ing with a big mtu size
Summary: ib-verbs config crashing while dd'ing with a big mtu size
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-101
Product: GlusterFS
Classification: Community
Component: ib-verbs
Version: 2.0.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Raghavendra G
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-26 14:30 UTC by Pavan Vilas Sondur
Modified: 2009-07-16 06:18 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Pavan Vilas Sondur 2009-06-26 14:30:21 UTC
If the code is changed as shown below, dd'ing a file with a block size of 1G on the mount point crashes the glusterfs client.

[root@brick1 glusterfs]# diff -uNr build/glusterfs-2.0.0-custom/transport/ib-verbs/src/ib-verbs.c build/glusterfs-2.0.0/transport/ib-verbs/src/ib-verbs.c
--- build/glusterfs-2.0.0-custom/transport/ib-verbs/src/ib-verbs.c      2009-06-26 06:44:42.000000000 -0700
+++ build/glusterfs-2.0.0/transport/ib-verbs/src/ib-verbs.c     2009-04-24 03:10:57.000000000 -0700
@@ -1269,8 +1269,8 @@
 
         /* TODO: validate arguments from options below */
 
-        options->send_size = 1024 ;/*this->xl->ctx->page_size;*/
-        options->recv_size = 1024; /*this->xl->ctx->page_size;*/
+        options->send_size = this->xl->ctx->page_size;
+        options->recv_size = this->xl->ctx->page_size;
         options->send_count = 32;
         options->recv_count = 32;

The client crashes, and the backtrace indicates memory corruption.

(gdb) bt
#0  0x0000003caaa6faa7 in malloc_consolidate () from /lib64/libc.so.6
#1  0x0000003caaa71c72 in _int_malloc () from /lib64/libc.so.6
#2  0x0000003caaa72efd in malloc () from /lib64/libc.so.6
#3  0x0000003caaa6129a in __fopen_internal () from /lib64/libc.so.6
#4  0x00002aaaab66022a in internal_setent () from /lib64/libnss_files.so.2
#5  0x00002aaaab660aba in _nss_files_gethostbyname2_r () from /lib64/libnss_files.so.2
#6  0x0000003caaab9def in gaih_inet () from /lib64/libc.so.6
#7  0x0000003caaabb7aa in getaddrinfo () from /lib64/libc.so.6
#8  0x00002b29bdf9ee5e in gf_resolve_ip6 (hostname=0x1a344790 "brick1", port=23456, family=0, dnscache=0x1a347a50, addr_info=0x7fffecb211e0)
    at common-utils.c:100
#9  0x00002aaaaaab1f04 in ibverbs_client_get_remote_sockaddr (this=0x1a347a00, sockaddr=0x7fffecb21230, sockaddr_len=0x7fffecb212b8) at name.c:238
#10 0x00002aaaaaaae8be in ib_verbs_connect (this=0x1a347a00) at ib-verbs.c:2052
#11 0x00002b29bea4ba0f in client_protocol_reconnect (trans_ptr=<value optimized out>) at client-protocol.c:6405
#12 0x00002b29bea52d19 in notify (this=0x1a343e20, event=4, data=0x1a347a00) at client-protocol.c:7000
#13 0x00002aaaaaaae714 in ib_verbs_handshake_pollerr (this=0x1a347a00) at ib-verbs.c:1902
#14 0x00002aaaaaab0812 in ib_verbs_event_handler (fd=<value optimized out>, idx=568, data=0x1a347a00, poll_in=1, poll_out=0, poll_err=16) at ib-verbs.c:2004
#15 0x00002b29bdfaced5 in event_dispatch_epoll (event_pool=0x1a33e380) at event.c:804
#16 0x0000000000403769 in main (argc=8, argv=0x7fffecb22448) at glusterfsd.c:1154

I've moved volfiles, logfiles and 2 cores to /share/tickets/Bug#/

Comment 1 Raghavendra G 2009-07-10 02:31:40 UTC
When an attempt was made to write a buffer larger than
ib-verbs-work-request-send-size to the network, ib-verbs returned ENOTCONN.
Nevertheless, the ioq_entry corresponding to the write was appended to the
pending list of ioq_entries waiting to be written to the network. This
resulted in a double free of the header: once in protocol_client_xfer and
again during cleanup of the transport. The transport cleanup was triggered
by timeouts of subsequent operations, since the ioq_entry corresponding to
the writev was blocking all other operations from reaching the server.

A fix has been submitted for review at 
http://patches.gluster.com/patch/726/
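
The ownership rule behind the double free can be sketched in a few lines of C. This is only an illustration of the idea described above, not the code from the patch: the types sketch_ioq_entry and sketch_peer and the functions sketch_submit and sketch_cleanup are hypothetical stand-ins, and the size check is a simplified model of the failing first write. The assumption is that a rejected write must leave the header with the caller, so that transport cleanup never sees it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the GlusterFS definitions. */
struct sketch_ioq_entry {
    char *header;                  /* owned by the queue once appended */
    size_t len;
    struct sketch_ioq_entry *next;
};

struct sketch_peer {
    size_t send_size;              /* models ib-verbs-work-request-send-size */
    struct sketch_ioq_entry *head; /* pending entries, oldest first */
    struct sketch_ioq_entry *tail;
};

/*
 * Returns 0 if the entry was queued (the queue now owns header),
 * or -1 with errno set if it was rejected (the caller still owns header).
 */
int sketch_submit(struct sketch_peer *peer, char *header, size_t len)
{
    struct sketch_ioq_entry *entry;

    assert(peer != NULL && header != NULL);

    if (len > peer->send_size) {
        /* First write attempt fails: report it and do NOT queue the entry. */
        errno = ENOTCONN;
        return -1;
    }

    entry = calloc(1, sizeof(*entry));
    if (entry == NULL) {
        errno = ENOMEM;
        return -1;
    }
    entry->header = header;
    entry->len = len;

    /* Append to the tail of the pending list. */
    if (peer->tail != NULL)
        peer->tail->next = entry;
    else
        peer->head = entry;
    peer->tail = entry;
    return 0;
}

/* Transport cleanup: frees every queued entry and its header; returns the count. */
size_t sketch_cleanup(struct sketch_peer *peer)
{
    size_t freed = 0;
    struct sketch_ioq_entry *entry = peer->head;

    while (entry != NULL) {
        struct sketch_ioq_entry *next = entry->next;
        free(entry->header);
        free(entry);
        entry = next;
        freed++;
    }
    peer->head = NULL;
    peer->tail = NULL;
    return freed;
}
```

In this sketch, a caller in the role of protocol_client_xfer frees the header itself only when sketch_submit returns -1. Because a rejected entry never reaches the pending list, sketch_cleanup cannot free the same header a second time, and the rejected entry cannot block later operations.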

Comment 2 Anand Avati 2009-07-16 03:15:06 UTC
PATCH: http://patches.gluster.com/patch/726 in master (ib-verbs: don't append ioq_entry to pending_list if first attempt of writing to network fails)

