Backtrace:

Program terminated with signal 11, Segmentation fault.
#0  client_readv_cbk (frame=0x2aaab0d965d0, hdr=0x2aaaac399f10, hdrlen=<value optimized out>, iobuf=0x0) at client-protocol.c:4188
4188            vector.iov_base = iobuf->ptr;
(gdb) bt
#0  client_readv_cbk (frame=0x2aaab0d965d0, hdr=0x2aaaac399f10, hdrlen=<value optimized out>, iobuf=0x0) at client-protocol.c:4188
#1  0x00002b9101b3e1ba in protocol_client_pollin (this=0x143c9010, trans=0x143d1500) at client-protocol.c:6435
#2  0x00002b9101b4cb52 in notify (this=0xd9, event=2, data=0x143d1500) at client-protocol.c:6554
#3  0x00002b910107b433 in xlator_notify (xl=0x143c9010, event=2, data=0x143d1500) at xlator.c:919
#4  0x00002aaaaab9b073 in socket_event_handler (fd=<value optimized out>, idx=4, data=0x143d1500, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
#5  0x00002b91010964e5 in event_dispatch_epoll (event_pool=0x143c1350) at event.c:804
#6  0x0000000000404367 in main (argc=5, argv=0x7fff0dcf97f8) at glusterfsd.c:1494
(gdb) p *vectore
No symbol "vectore" in current context.
(gdb) p *vector
Structure has no component named operator*.
(gdb) p vector
$1 = {iov_base = 0x0, iov_len = 7893}
(gdb) fr 1
#1  0x00002b9101b3e1ba in protocol_client_pollin (this=0x143c9010, trans=0x143d1500) at client-protocol.c:6435
6435                    ret = protocol_client_interpret (this, trans, hdr, hdrlen,
(gdb) list
6430
6431            ret = transport_receive (trans, &hdr, &hdrlen, &iobuf);
6432
6433            if (ret == 0)
6434            {
6435                    ret = protocol_client_interpret (this, trans, hdr, hdrlen,
6436                                                     iobuf);
6437            }
6438
6439            /* TODO: use mem-pool */
(gdb) list transport_receive
319
320
321     int32_t
322     transport_receive (transport_t *this, char **hdr_p, size_t *hdrlen_p,
323                        struct iobuf **iobuf_p)
324     {
325             int32_t ret = -1;
326
327             GF_VALIDATE_OR_GOTO("transport", this, fail);
328
(gdb)
329             if (this->peer_trans) {
330                     *hdr_p = this->handover.msg->hdr;
331                     *hdrlen_p = this->handover.msg->hdrlen;
332                     *iobuf_p = this->handover.msg->iobuf;
333
334                     return 0;
335             }
336
337             ret = this->ops->receive (this, hdr_p, hdrlen_p, iobuf_p);
338     fail:
(gdb)
The symptom is that the transport code is receiving a NULL iobuf even though op_ret says that the readv fop has returned with over 7000 bytes of data.
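To make the failure mode concrete, here is a minimal sketch of the kind of defensive check that would turn this crash into an error reply. It is not the actual client-protocol.c code and not a proposed patch: `sketch_iobuf` and `sketch_fill_read_vector` are hypothetical names, only the `ptr` field seen in the backtrace is modelled, and the choice of EINVAL as the error is an assumption. It also does not explain why the transport delivered a NULL iobuf in the first place.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for the real struct iobuf; only the ptr field
 * dereferenced at client-protocol.c:4188 is modelled here. */
struct sketch_iobuf {
        void *ptr;
};

/* Fill *vector from a readv reply.
 * Returns 0 on success. Returns -1 and sets *op_errno when the header
 * claims data (op_ret > 0) but no payload buffer was delivered, which is
 * the state seen in the core (iobuf=0x0, iov_len = 7893). */
int
sketch_fill_read_vector (ssize_t op_ret, struct sketch_iobuf *iobuf,
                         struct iovec *vector, int *op_errno)
{
        vector->iov_base = NULL;
        vector->iov_len  = 0;

        if (op_ret <= 0)
                return 0;       /* error or EOF reply: no payload expected */

        if (iobuf == NULL || iobuf->ptr == NULL) {
                *op_errno = EINVAL;     /* assumed error code */
                return -1;
        }

        vector->iov_base = iobuf->ptr;
        vector->iov_len  = (size_t) op_ret;
        return 0;
}
```

With the values from the core (op_ret of 7893 and a NULL iobuf) the helper returns -1 instead of dereferencing the NULL pointer, so a caller could unwind the frame with an error rather than crash.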
GNFS crashed after running for some time while running SFS2008. The volfile is
-------------------------------------------------
volume brick5
  type protocol/client
  option transport-type socket
  option transport.socket.remote-port 7001
  option remote-host 10.3.10.15
  option remote-subvolume posix1-locked-iot
end-volume

volume brick6
  type protocol/client
  option transport-type socket
  option transport.socket.remote-port 7001
  option remote-host 10.3.10.16
  option remote-subvolume posix1-locked-iot
end-volume

volume brick7
  type protocol/client
  option transport-type socket
  option transport.socket.remote-port 7001
  option remote-host 10.3.10.17
  option remote-subvolume posix1-locked-iot
end-volume

volume brick8
  type protocol/client
  option transport-type socket
  option transport.socket.remote-port 7001
  option remote-host 10.3.10.18
  option remote-subvolume posix1-locked-iot
end-volume

volume dist
  type cluster/distribute
  subvolumes brick5 brick6 brick7 brick8
end-volume

volume for-wb
  type performance/io-threads
  subvolumes dist
end-volume

volume distribute
  type performance/write-behind
  option window-size 1Gb
  subvolumes for-wb
end-volume

#volume distribute
#  type performance/read-ahead
#  subvolumes for-ra
#end-volume

volume nfsserver
  type nfs/server
  subvolumes distribute
  option rpc-auth.addr.allow *
end-volume
--------------------------------------------------------------------------
The log file is :
--------------------------------------------------------------------------
[2010-08-22 22:11:21] W [xlator.c:651:validate_xlator_volume_options] distribute: option 'window-size' is deprecated, preferred is 'cache-size', continuing with correction
[2010-08-22 22:11:21] W [xlator.c:651:validate_xlator_volume_options] brick8: option 'transport.socket.remote-port' is deprecated, preferred is 'remote-port', continuing with correction
[2010-08-22 22:11:21] W [xlator.c:651:validate_xlator_volume_options] brick7: option 'transport.socket.remote-port' is deprecated, preferred is 'remote-port', continuing with correction
[2010-08-22 22:11:21] W [xlator.c:651:validate_xlator_volume_options] brick6: option 'transport.socket.remote-port' is deprecated, preferred is 'remote-port', continuing with correction
[2010-08-22 22:11:21] W [xlator.c:651:validate_xlator_volume_options] brick5: option 'transport.socket.remote-port' is deprecated, preferred is 'remote-port', continuing with correction
[2010-08-22 22:11:21] N [glusterfsd.c:1477:main] glusterfs: Successfully started
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick7: Connected to 10.3.10.17:7001, attached to remote volume 'posix1-locked-iot'.
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick8: Connected to 10.3.10.18:7001, attached to remote volume 'posix1-locked-iot'.
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick5: Connected to 10.3.10.15:7001, attached to remote volume 'posix1-locked-iot'.
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick6: Connected to 10.3.10.16:7001, attached to remote volume 'posix1-locked-iot'.
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick5: Connected to 10.3.10.15:7001, attached to remote volume 'posix1-locked-iot'.
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick7: Connected to 10.3.10.17:7001, attached to remote volume 'posix1-locked-iot'.
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick7: Connected to 10.3.10.17:7001, attached to remote volume 'posix1-locked-iot'.
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick8: Connected to 10.3.10.18:7001, attached to remote volume 'posix1-locked-iot'.
[2010-08-22 22:11:21] N [client-protocol.c:5857:client_setvolume_cbk] brick6: Connected to 10.3.10.16:7001, attached to remote volume 'posix1-locked-iot'.
pending frames:

patchset: v3.0.0-245-g849f5ec
signal received: 11
time of crash: 2010-08-22 22:29:02
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs nfs_beta_rc10
/lib64/libc.so.6[0x32df6302d0]
/opt/gnfs/lib/glusterfs/nfs_beta_rc10/xlator/protocol/client.so(client_readv_cbk+0x2a2)[0x2b9101b56a02]
/opt/gnfs/lib/glusterfs/nfs_beta_rc10/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x2b9101b3e1ba]
/opt/gnfs/lib/glusterfs/nfs_beta_rc10/xlator/protocol/client.so(notify+0x212)[0x2b9101b4cb52]
/opt/gnfs/lib/libglusterfs.so.0(xlator_notify+0x43)[0x2b910107b433]
/opt/gnfs/lib/glusterfs/nfs_beta_rc10/transport/socket.so(socket_event_handler+0xd3)[0x2aaaaab9b073]
/opt/gnfs/lib/libglusterfs.so.0[0x2b91010964e5]
/opt/gnfs/sbin/glusterfs(main+0xb17)[0x404367]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x32df61d994]
/opt/gnfs/sbin/glusterfs[0x4027a9]
---------
-------------------------------------------------------------------------------
and the core file is at gluster.163.210:/gluster/pbt/core.23900.tbz
Shehjar, In 3.1 we will not be having this protocol code, hence this bug may not be valid anymore. What do you want to do with this?
We ran into this again while running SFS:

Program terminated with signal 11, Segmentation fault.
#0  client_readv_cbk (frame=0x2aaab8efea00, hdr=0x2aaab611e390, hdrlen=<value optimized out>, iobuf=0x0) at client-protocol.c:4188
4188            vector.iov_base = iobuf->ptr;
(gdb) bt
#0  client_readv_cbk (frame=0x2aaab8efea00, hdr=0x2aaab611e390, hdrlen=<value optimized out>, iobuf=0x0) at client-protocol.c:4188
#1  0x00002ad5510d61ba in protocol_client_pollin (this=0x800e70, trans=0x80cc00) at client-protocol.c:6435
#2  0x00002ad5510e4b52 in notify (this=0x94, event=2, data=0x80cc00) at client-protocol.c:6554
#3  0x00002ad550613433 in xlator_notify (xl=0x800e70, event=2, data=0x80cc00) at xlator.c:919
#4  0x00002aaaaab9b073 in socket_event_handler (fd=<value optimized out>, idx=0, data=0x80cc00, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
#5  0x00002ad55062e4e5 in event_dispatch_epoll (event_pool=0x7fb350) at event.c:804
#6  0x0000000000404367 in main (argc=5, argv=0x7fffc4bd2e28) at glusterfsd.c:1494
(In reply to comment #3)
> Shehjar, In 3.1 we will not be having this protocol code, hence this bug may
> not be valid anymore. What do you want to do with this?

Hi Amar,
This bug is turning out to be a blocker for SFS tests on the nfs-beta branch. If a quick fix is possible, let's check that out; otherwise, I think we can safely keep this as low prio, unless of course AB says that SFS over the nfs-beta branch is a prio. This crash does not happen on any other tests with nfs-beta-rcX, only with SFS.
The crash did not occur when I ran GNFS with the following command:

./dsh tc4 "export GLUSTERFS_DISABLE_MEM_ACCT=1;/opt/gnfs/sbin/glusterfs -f /share/shehjart/volfiles/gnfs-1v-4d.vol -l /tmp/gnnnn3"

The SFS test ran to completion with all 5 iterations finishing. Performance is slightly lower, but this needs more investigation before it can be confirmed.
Sorry, last comment was meant for BUG-1499 :P
Reducing the priority, as we have removed the legacy protocol from the build and work on running SFS over NFS has started in mainline.
Shehjar/Prithu ji, I will close this bug, as NFS has now started working on mainline and the legacy xlator has been removed from the build. This particular backtrace snapshot is no longer valid. Please file a new bug for any mainline failures. -Amar