<snip>
        LOCK (&frame->lock);
        {
                local = frame->local;

                if (op_ret >= 0)
                        local->op_ret = op_ret;

                local->op_errno = op_errno;
        }
        UNLOCK (&frame->lock);

        call_count = afr_frame_return (frame);

        if (call_count == 0) {
                if (local->op_ret == 0) {
                        ret = afr_fd_ctx_set (this, local->fd);   <<<
<snip>

The bug is in how local->op_ret is set. It is only overwritten when a call succeeds, so it keeps the 0 from an earlier successful opendir. If the last opendir call to return (call_count == 0) fails and previous ones have succeeded, the local->op_ret == 0 branch is still taken and it results in a crash.
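The failure mode can be sketched outside of GlusterFS as follows. This is a minimal illustration, not the code from the patch: the `sketch_` types and functions are hypothetical stand-ins for the frame-local state and the callback, and the guard shown (falling back to the fd saved in the local and checking it for NULL) is only one plausible way to avoid the dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real GlusterFS types. */
struct sketch_fd {
    int ctx_set;                /* 1 once the fd context has been set */
};

struct sketch_local {
    int op_ret;                 /* aggregated result, starts at -1 */
    int op_errno;               /* errno of the most recent reply */
    int call_count;             /* replies still outstanding */
    struct sketch_fd *fd;       /* fd saved when the call was wound */
};

/* Records one child's reply the same way the snippet above does:
 * op_ret is overwritten only on success, so an earlier success
 * survives a later failure. Returns the number of replies pending. */
int sketch_record_reply(struct sketch_local *local, int op_ret, int op_errno)
{
    assert(local != NULL);
    if (op_ret >= 0)
        local->op_ret = op_ret;
    local->op_errno = op_errno;
    return --local->call_count;
}

/* Runs once the last reply has arrived. cbk_fd is the fd handed to the
 * last callback and may be NULL when that particular child failed.
 * Returns 1 if the fd context was set, 0 if it was skipped. */
int sketch_finish(struct sketch_local *local, struct sketch_fd *cbk_fd)
{
    struct sketch_fd *fd;

    assert(local != NULL);
    if (local->op_ret != 0)
        return 0;               /* no child succeeded */

    /* Aggregated success says nothing about this reply's fd, so do not
     * dereference cbk_fd blindly; fall back to the saved fd. */
    fd = (cbk_fd != NULL) ? cbk_fd : local->fd;
    if (fd == NULL)
        return 0;

    fd->ctx_set = 1;
    return 1;
}
```

With two children, a successful first reply followed by a failed second one (op_ret=-1, op_errno=19, fd NULL, as in frame #0 of the backtrace) leaves the aggregated op_ret at 0, which is exactly the case where an unguarded dereference of the callback's fd would crash.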
Found this crash with error-gen on Replicate with 4 subvolumes, 2 of which were error-gen subvolumes.

volume replicate
  type cluster/replicate
  subvolumes client1 client2 client3-error-gen client4-error-gen
end-volume

(gdb) bt
#0  0x00007f410694c18e in afr_opendir_cbk (frame=0x143a3d0, cookie=0x143d680, this=0x142bed0, op_ret=-1, op_errno=19, fd=0x0) at afr-dir-read.c:242
#1  0x00007f4106b9854d in error_gen_opendir (frame=0x143d680, this=0x142bbd0, loc=0x14350b8, fd=0x14426c0) at error-gen.c:1272
#2  0x00007f410694c909 in afr_opendir (frame=0x143a3d0, this=0x142bed0, loc=0x14350b8, fd=0x14426c0) at afr-dir-read.c:314
#3  0x00007f4107f8078b in default_opendir (frame=0x143a370, this=0x142c790, loc=0x14350b8, fd=0x14426c0) at defaults.c:701
#4  0x00007f4106b986d3 in error_gen_opendir (frame=0x143af40, this=0x142d0b0, loc=0x14350b8, fd=0x14426c0) at error-gen.c:1276
#5  0x00007f4107f8078b in default_opendir (frame=0x1441040, this=0x142d340, loc=0x14350b8, fd=0x14426c0) at defaults.c:701
#6  0x00007f410630f6e2 in fuse_opendir (this=0x1425550, finh=0x143aba0, msg=0x143abc8) at fuse-bridge.c:2131
#7  0x00007f410631433e in fuse_thread_proc (data=0x1425550) at fuse-bridge.c:3191
#8  0x00007f4107b46a04 in start_thread (arg=<value optimized out>) at pthread_create.c:300
#9  0x00007f41078b080d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
PATCH: http://patches.gluster.com/patch/3271 in master (cluster/afr: Don't dereference fd ptr - it might be NULL due to a failed call.)
PATCH: http://patches.gluster.com/patch/3272 in release-3.0 (cluster/afr: Don't dereference fd ptr - it might be NULL due to a failed call.)