If the code is changed as follows, dd'ing a file with a block size of 1G on the mount point crashes the glusterfs client.

[root@brick1 glusterfs]# diff -uNr build/glusterfs-2.0.0-custom/transport/ib-verbs/src/ib-verbs.c build/glusterfs-2.0.0/transport/ib-verbs/src/ib-verbs.c
--- build/glusterfs-2.0.0-custom/transport/ib-verbs/src/ib-verbs.c 2009-06-26 06:44:42.000000000 -0700
+++ build/glusterfs-2.0.0/transport/ib-verbs/src/ib-verbs.c 2009-04-24 03:10:57.000000000 -0700
@@ -1269,8 +1269,8 @@
 /* TODO: validate arguments from options below */
- options->send_size = 1024 ;/*this->xl->ctx->page_size;*/
- options->recv_size = 1024; /*this->xl->ctx->page_size;*/
+ options->send_size = this->xl->ctx->page_size;
+ options->recv_size = this->xl->ctx->page_size;
 options->send_count = 32;
 options->recv_count = 32;

The client crashes and the backtrace indicates memory corruption.

(gdb) bt
#0 0x0000003caaa6faa7 in malloc_consolidate () from /lib64/libc.so.6
#1 0x0000003caaa71c72 in _int_malloc () from /lib64/libc.so.6
#2 0x0000003caaa72efd in malloc () from /lib64/libc.so.6
#3 0x0000003caaa6129a in __fopen_internal () from /lib64/libc.so.6
#4 0x00002aaaab66022a in internal_setent () from /lib64/libnss_files.so.2
#5 0x00002aaaab660aba in _nss_files_gethostbyname2_r () from /lib64/libnss_files.so.2
#6 0x0000003caaab9def in gaih_inet () from /lib64/libc.so.6
#7 0x0000003caaabb7aa in getaddrinfo () from /lib64/libc.so.6
#8 0x00002b29bdf9ee5e in gf_resolve_ip6 (hostname=0x1a344790 "brick1", port=23456, family=0, dnscache=0x1a347a50, addr_info=0x7fffecb211e0) at common-utils.c:100
#9 0x00002aaaaaab1f04 in ibverbs_client_get_remote_sockaddr (this=0x1a347a00, sockaddr=0x7fffecb21230, sockaddr_len=0x7fffecb212b8) at name.c:238
#10 0x00002aaaaaaae8be in ib_verbs_connect (this=0x1a347a00) at ib-verbs.c:2052
#11 0x00002b29bea4ba0f in client_protocol_reconnect (trans_ptr=<value optimized out>) at client-protocol.c:6405
#12 0x00002b29bea52d19 in notify (this=0x1a343e20, event=4, data=0x1a347a00) at client-protocol.c:7000
#13 0x00002aaaaaaae714 in ib_verbs_handshake_pollerr (this=0x1a347a00) at ib-verbs.c:1902
#14 0x00002aaaaaab0812 in ib_verbs_event_handler (fd=<value optimized out>, idx=568, data=0x1a347a00, poll_in=1, poll_out=0, poll_err=16) at ib-verbs.c:2004
#15 0x00002b29bdfaced5 in event_dispatch_epoll (event_pool=0x1a33e380) at event.c:804
#16 0x0000000000403769 in main (argc=8, argv=0x7fffecb22448) at glusterfsd.c:1154

I've moved the volfiles, logfiles and 2 cores to /share/tickets/Bug#/
When a write of a buffer larger than ib-verbs-work-request-send-size was attempted, ib-verbs returned ENOTCONN. Nevertheless, the ioq_entry corresponding to the write was still appended to the pending list of ioq_entries waiting to be written to the network. This resulted in a double free of the header: once in protocol_client_xfer and again during cleanup of the transport. The transport cleanup was triggered by timeouts of subsequent operations, because the ioq_entry corresponding to the writev blocked every other operation from reaching the server.

A fix has been submitted for review at http://patches.gluster.com/patch/726/
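
The ownership problem described above can be illustrated with a small, self-contained C sketch. This is not the glusterfs code or the submitted patch: the struct layouts and the names try_write, transport_submit, transport_cleanup and demo_xfer are hypothetical, and the model is simplified so that every accepted entry stays on the pending list until cleanup. It only shows the rule the fix enforces: if the first write attempt fails, the entry is not queued, so the caller remains the sole owner of the header and frees it exactly once.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the real glusterfs structures. */
struct ioq_entry {
    void *header;               /* owned by the entry once it is queued */
    size_t size;
    struct ioq_entry *next;
};

struct transport {
    size_t send_size;           /* models ib-verbs-work-request-send-size */
    struct ioq_entry *pending;  /* entries waiting to be written */
    size_t pending_count;
};

/* Models the first write attempt: oversized buffers are rejected. */
static int try_write(const struct transport *t, size_t size)
{
    return (size > t->send_size) ? -ENOTCONN : 0;
}

/*
 * Returns 0 if the entry was queued (the transport now owns header),
 * or a negative errno if nothing was queued (the caller still owns header).
 */
static int transport_submit(struct transport *t, void *header, size_t size)
{
    struct ioq_entry *entry;
    int ret = try_write(t, size);

    if (ret < 0)
        return ret;             /* the rule: never queue a failed entry */

    entry = malloc(sizeof(*entry));
    if (entry == NULL)
        return -ENOMEM;

    entry->header = header;
    entry->size = size;
    entry->next = t->pending;
    t->pending = entry;
    t->pending_count++;
    return 0;
}

/* Cleanup frees every queued entry together with the header it owns. */
static void transport_cleanup(struct transport *t)
{
    while (t->pending != NULL) {
        struct ioq_entry *entry = t->pending;
        t->pending = entry->next;
        free(entry->header);
        free(entry);
    }
    t->pending_count = 0;
}

/* Caller side: frees the header itself only when the submit failed. */
static int demo_xfer(struct transport *t, size_t size)
{
    void *header = malloc(16);
    int ret;

    if (header == NULL)
        return -ENOMEM;

    ret = transport_submit(t, header, size);
    if (ret < 0)
        free(header);           /* safe: the entry was never queued */
    return ret;
}
```

If transport_submit queued the entry before returning the error, demo_xfer and transport_cleanup would both free the same header, which is the kind of heap corruption that later surfaces in an unrelated malloc call, as in the backtrace above.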
PATCH: http://patches.gluster.com/patch/726 in master (ib-verbs: don't append ioq_entry to pending_list if first attempt of writing to network fails)