+++ This bug was initially created as a clone of Bug #1648687 +++

Description of problem:
afr_open() stores the fd reference in the local->cont.open structure. But the callback function afr_open_ftruncate_cbk() (used when the open flags include O_TRUNC) references local->fd in AFR_STACK_UNWIND, resulting in an error or a crash.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

--- Additional comment from Soumya Koduri on 2018-11-11 12:34:37 EST ---
Patch posted for review - https://review.gluster.org/#/c/glusterfs/+/21617/

--- Additional comment from Worker Ant on 2018-11-11 12:45:42 EST ---
REVIEW: https://review.gluster.org/21617 (afr: open_ftruncate_cbk should read fd from local->cont.open struct) posted (#2) for review on master by soumya k

--- Additional comment from Worker Ant on 2018-11-15 00:08:03 EST ---
REVIEW: https://review.gluster.org/21617 (afr: open_ftruncate_cbk should read fd from local->cont.open struct) posted (#8) for review on master by Ravishankar N
Upstream Patch: https://review.gluster.org/#/c/glusterfs/+/21617/
Hi Soumya,

Can you please share your observations about the fix, or provide steps to reproduce this issue? This would help in validating the fix.
Hi Anees,

The issue is that when afr receives an OPEN request with the O_TRUNC flag set, it sends a NULL pointer instead of a valid fd in the response to the upper xlators. This results in either a crash (if those upper xlators try to access that NULL pointer) or error messages in the log file (e.g., as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1642488#c0).

Here are a few cases where I observed this issue:
1) Export a volume via nfs-ganesha and try to write to a file or perform a truncate operation.
2) Write a simple gfapi program (will attach in the bug) and execute it against the replicate volume created.
3) A few upstream users also reported this issue - https://bugzilla.redhat.com/show_bug.cgi?id=1642488#c4

I observed that the above cases log an error message on a dist-rep volume, whereas on a 1*3 volume the client process crashes.
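For readers following the thread, the mismatch described above can be sketched with a small stand-alone C model. This is not GlusterFS code: the model_* types and functions are hypothetical stand-ins, and the only things taken from the report are the two field names (local->fd and local->cont.open.fd) and the fact that the open path fills in only the latter. It assumes, as the comment above states, that local->fd is NULL on this path.

```c
#include <assert.h>
#include <fcntl.h>   /* O_TRUNC */
#include <stddef.h>  /* NULL */

/* Hypothetical stand-ins for illustration; NOT the real GlusterFS types. */
typedef struct model_fd {
    int id;
} model_fd_t;

typedef struct model_local {
    model_fd_t *fd;              /* generic slot: left unset by the open path */
    struct {
        struct {
            model_fd_t *fd;      /* where the open path keeps its fd reference */
            int flags;
        } open;
    } cont;
} model_local_t;

/* Models the open path: the fd is recorded only under cont.open. */
static void model_open(model_local_t *local, model_fd_t *fd, int flags)
{
    assert(local != NULL && fd != NULL);
    local->fd = NULL;
    local->cont.open.fd = fd;
    local->cont.open.flags = flags;
}

/* The truncate callback is only relevant when O_TRUNC was requested. */
static int model_needs_truncate(const model_local_t *local)
{
    return (local->cont.open.flags & O_TRUNC) != 0;
}

/* Models the buggy callback: hands the unset generic slot to the caller. */
static model_fd_t *model_ftruncate_cbk_buggy(const model_local_t *local)
{
    return local->fd;
}

/* Models the fix: hands back the fd that the open path actually stored. */
static model_fd_t *model_ftruncate_cbk_fixed(const model_local_t *local)
{
    return local->cont.open.fd;
}
```

In this model, an upper layer that dereferences the result of model_ftruncate_cbk_buggy() would fault on a NULL pointer, while one that only checks it would report an error. That matches the two symptoms (crash versus log messages) described in this comment.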
Hi Soumya,

Thank you for the gfapi program and the offline help in verifying the fix.

Verified the test case with the steps below:
1. Executed the gfapi program as per the steps provided in comment #11.
2. Executed the same program with multiple clients.
3. No client process crash was seen.
4. No NULL pointer errors like those mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1642488#c0 were seen.

Tested on both replica 3 and dist-replica volumes. Setting this to verified.
The above test verification was done on the latest build:

# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-34.el7rhgs.x86_64
glusterfs-server-3.12.2-34.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-34.el7rhgs.x86_64
glusterfs-fuse-3.12.2-34.el7rhgs.x86_64
glusterfs-events-3.12.2-34.el7rhgs.x86_64

-Thanks,
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0263