Use of STACK_UNWIND_STRICT will prevent this kind of bug from happening.
The glusterfs client crashed in afr_readlink_cbk.

Setup: 4 afr servers, 3 of them running with error-gen loaded above posix (the simulated error is ENOMEM). The second server ran without error-gen. 2 clients with distributed replicate. A sanity script was running on one client and a kernel compilation on the other. The 2nd server (the one without error-gen) was brought down and then brought up again.

This is the backtrace of the core generated:

GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-slackware-linux"...
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/libglusterfs.so.0...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/libglusterfs.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/protocol/client.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/protocol/client.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/cluster/replicate.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/cluster/replicate.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/cluster/distribute.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/cluster/distribute.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/write-behind.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/write-behind.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/read-ahead.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/read-ahead.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/io-cache.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/io-cache.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/quick-read.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/quick-read.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/stat-prefetch.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/performance/stat-prefetch.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/mount/fuse.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/xlator/mount/fuse.so
Reading symbols from /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/transport/socket.so...done.
Loaded symbols for /opt/glusterfs/3.0.5rc3/lib/glusterfs/3.0.5rc3/transport/socket.so
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /usr/lib64/libgcc_s.so.1...done.
Loaded symbols for /usr/lib64/libgcc_s.so.1
Core was generated by `/opt/glusterfs/3.0.5rc3/sbin/glusterfs -f client1.vol /mnt/hd/ -l /tmp/error_cl'.
Program terminated with signal 11, Segmentation fault.
[New process 13957]
[New process 13963]
[New process 13958]
#0 0x00007ff5c2a34019 in afr_readlink_cbk (frame=0x7ff5b411eca0, cookie=0x0, this=0x611ae0, op_ret=-1, op_errno=22, buf=0x0, sbuf=0x60000000d) at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:474
474 sbuf->st_ino = local->cont.readlink.ino;
(gdb) bt
#0 0x00007ff5c2a34019 in afr_readlink_cbk (frame=0x7ff5b411eca0, cookie=0x0, this=0x611ae0, op_ret=-1, op_errno=22, buf=0x0, sbuf=0x60000000d) at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:474
#1 0x00007ff5c2c7939c in client_readlink (frame=0x7ff5bc101d70, this=0x610d50, loc=0x7ff5b4129520, size=4096) at ../../../../../xlators/protocol/client/src/client-protocol.c:924
#2 0x00007ff5c2a33fe5 in afr_readlink_cbk (frame=0x7ff5b411eca0, cookie=0x0, this=0x611ae0, op_ret=-1, op_errno=12, buf=0x0, sbuf=0x7ffffbbbeaa0) at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:463
#3 0x00007ff5c2c87dfb in client_readlink_cbk (frame=0x7ff5b410f630, hdr=0x7ff5bc0d4c50, hdrlen=188, iobuf=0x0) at ../../../../../xlators/protocol/client/src/client-protocol.c:4688
#4 0x00007ff5c2c8d9dd in protocol_client_interpret (this=0x6108a0, trans=0x618140, hdr_p=0x7ff5bc0d4c50 "", hdrlen=188, iobuf=0x0) at ../../../../../xlators/protocol/client/src/client-protocol.c:6570
#5 0x00007ff5c2c8e6a3 in protocol_client_pollin (this=0x6108a0, trans=0x618140) at ../../../../../xlators/protocol/client/src/client-protocol.c:6868
#6 0x00007ff5c2c8ed17 in notify (this=0x6108a0, event=2, data=0x618140) at ../../../../../xlators/protocol/client/src/client-protocol.c:6987
#7 0x00007ff5c3e402fa in xlator_notify (xl=0x6108a0, event=2, data=0x618140) at ../../../libglusterfs/src/xlator.c:929
#8 0x00007ff5c1170257 in socket_event_poll_in (this=0x618140) at ../../../../transport/socket/src/socket.c:771
#9 0x00007ff5c1170551 in socket_event_handler (fd=13, idx=5, data=0x618140, poll_in=1, poll_out=0, poll_err=0) at ../../../../transport/socket/src/socket.c:871
#10 0x00007ff5c3e64f1f in event_dispatch_epoll_handler (event_pool=0x60a320, events=0x61b200, i=1) at ../../../libglusterfs/src/event.c:804
#11 0x00007ff5c3e650ee in event_dispatch_epoll (event_pool=0x60a320) at ../../../libglusterfs/src/event.c:867
#12 0x00007ff5c3e653ff in event_dispatch (event_pool=0x60a320) at ../../../libglusterfs/src/event.c:975
#13 0x000000000040634b in main (argc=6, argv=0x7ffffbbbf758) at ../../../glusterfsd/src/glusterfsd.c:1425
(gdb) f 0
#0 0x00007ff5c2a34019 in afr_readlink_cbk (frame=0x7ff5b411eca0, cookie=0x0, this=0x611ae0, op_ret=-1, op_errno=22, buf=0x0, sbuf=0x60000000d) at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:474
474 sbuf->st_ino = local->cont.readlink.ino;
(gdb) p sbuf
$1 = (struct stat *) 0x60000000d
(gdb) p sbuf->st_ino
Cannot access memory at address 0x600000015
(gdb) q

Here sbuf contains a garbage value. afr_readlink_cbk is being called from client_readlink through STACK_UNWIND like this: STACK_UNWIND (frame, -1, EINVAL, NULL). But afr_readlink_cbk takes its arguments as afr_readlink_cbk (frame, cookie, this, op_ret, op_errno, buf, sbuf). Since no argument is passed after buf (the link buffer, which is NULL here), afr_readlink_cbk picks up a garbage value for sbuf.
I think the bug is due to accessing sbuf->st_ino in afr_readlink_cbk. This happens when readlink has failed on all nodes; in this case (from the backtrace) one failed with ENOMEM and the other with EINVAL. In posix, if readlink fails we still pass the stbuf address without performing the lstat, and this ends up being accessed in the failure path of afr, which results in a crash. The cookie value is extracted in the UNWIND path from the frame at the xlator where the STACK_WIND was performed, so this is not a case of a missing argument in client protocol.
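The failure-path concern above can be illustrated with a small guard. This is only a sketch under assumptions: fixup_readlink_ino is a hypothetical helper, not the actual AFR code, and it simply refuses to touch the stat buffer unless the call succeeded and a buffer was supplied.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not taken from the AFR sources): write the cached
 * inode number into the reply stat buffer only on the success path.
 * Returns 1 if sbuf was updated, 0 if it was left untouched.
 */
static int
fixup_readlink_ino (int op_ret, struct stat *sbuf, ino_t cached_ino)
{
        /* On failure sbuf may be NULL, stale, or garbage: do not touch it. */
        if (op_ret == -1 || sbuf == NULL)
                return 0;

        sbuf->st_ino = cached_ino;
        return 1;
}
```

Note that the op_ret check is the part that matters for this crash: a NULL check alone would not catch a garbage pointer such as 0x60000000d.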
#0 0x00007ff5c2a34019 in afr_readlink_cbk (frame=0x7ff5b411eca0, cookie=0x0, this=0x611ae0, op_ret=-1, op_errno=22, buf=0x0, sbuf=0x60000000d) at ../../../../../xlators/cluster/afr/src/afr-inode-read.c:474
#1 0x00007ff5c2c7939c in client_readlink (frame=0x7ff5bc101d70, this=0x610d50, loc=0x7ff5b4129520, size=4096) at ../../../../../xlators/protocol/client/src/client-protocol.c:924

This looks like a regression of the fixes to bug 762683. client_readlink failed and returned EINVAL with the wrong arguments:

    STACK_UNWIND (frame, -1, EINVAL, NULL);
    return 0;
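To show why a typed unwind catches a short argument list like the one in client_readlink, here is a minimal standalone sketch. Everything prefixed demo_ or DEMO_ is hypothetical and heavily simplified; it is not the GlusterFS macro or frame structure, only an illustration of the idea of casting the stored callback to an op-specific function pointer type before calling it.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Generic function pointer type used to store any callback untyped. */
typedef void (*demo_generic_fn_t) (void);

/* Hypothetical, simplified call frame. */
typedef struct demo_frame {
        demo_generic_fn_t  ret;     /* callback to invoke on unwind */
        void              *cookie;
} demo_frame_t;

/* Op-specific callback signature: every reply parameter is spelled out. */
typedef int (*demo_readlink_cbk_t) (demo_frame_t *frame, void *cookie,
                                    int op_ret, int op_errno,
                                    const char *buf, struct stat *sbuf);

/*
 * Strict unwind: the callback is cast to the typed pointer for `op`, so the
 * compiler rejects a call that passes too few (or too many) arguments.
 */
#define DEMO_UNWIND_STRICT(op, frame, ...)                                 \
        do {                                                               \
                demo_##op##_cbk_t fn = (demo_##op##_cbk_t) (frame)->ret;   \
                fn ((frame), (frame)->cookie, __VA_ARGS__);                \
        } while (0)

/* State recorded by the demo callback so a caller can inspect it. */
static int last_errno = 0;
static int last_sbuf_was_null = -1;

static int
demo_readlink_cbk (demo_frame_t *frame, void *cookie, int op_ret,
                   int op_errno, const char *buf, struct stat *sbuf)
{
        (void) frame; (void) cookie; (void) op_ret; (void) buf;
        last_errno = op_errno;
        last_sbuf_was_null = (sbuf == NULL);
        return 0;
}

/* Failure path: omitting the final NULL here would fail to compile. */
static void
demo_fail_readlink (demo_frame_t *frame)
{
        DEMO_UNWIND_STRICT (readlink, frame, -1, EINVAL, NULL, NULL);
}
```

With an untyped variadic unwind, dropping the last argument compiles silently and the callback reads whatever happens to be in that argument slot, which matches the garbage sbuf seen in frame #0.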
Patch has been taken into release-3.0 and master.
Created attachment 234: xdpyinfo output
For 3.0.5rc3: I ran a readlink program, attached gdb to the client process, and set breakpoints at client_readlink and afr_readlink_cbk. I forced the code path that leads to the crash (by setting some variables in gdb), and when it tried to access sbuf->st_ino it crashed. For 3.0.5rc6: I repeated the same procedure, but because client_readlink now uses STACK_UNWIND_STRICT instead of STACK_UNWIND, afr_readlink_cbk receives sbuf as NULL and hence does not access its members, so it did not crash. Thus moving to the verified state. Attached is the program which can be used as the testcase for this bug.
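The attachment itself is not reproduced in this thread. As a rough idea of what a readlink exerciser could look like, here is a minimal sketch; exercise_readlink is a hypothetical helper and is not the attached program. The 4096-byte buffer is chosen only to mirror the size=4096 seen in the client_readlink frame.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical test helper: call readlink(2) on `path` `count` times and
 * return how many of those calls failed.
 */
static int
exercise_readlink (const char *path, int count)
{
        char target[4096];
        int  failures = 0;

        for (int i = 0; i < count; i++) {
                ssize_t len = readlink (path, target, sizeof (target) - 1);

                if (len < 0) {
                        failures++;     /* errno holds the reason */
                        continue;
                }
                target[len] = '\0';     /* readlink does not NUL-terminate */
        }

        return failures;
}
```

Pointing such a helper at a symlink under the glusterfs mount point while gdb is attached to the client process would drive the client_readlink path described above.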