Bug 1118387

Summary: Server crashes if an NFSv4 client calls OP4_WRITE on a directory whose inode is not cached
Product: [Retired] nfs-ganesha Reporter: Philippe DENIEL <philippe.deniel>
Component: Protocols-NFS4 Assignee: Frank Filz <ffilz>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.1 CC: dpal, kkeithle, lieb
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-06-24 11:10:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Philippe DENIEL 2014-07-10 14:57:27 UTC
Hi,

I found a bug by chance (I made a typo when typing a command line). The 
annoying point: it's actually a major bug.
It happens with NFSv4 and NFSv4.1, whichever FSAL you use.
I use the latest 2.2-1 from Frank's repo. The stack from gdb is at the 
end of this mail.

Reproducer is:
     - start Ganesha and wait for the grace period to end
     - mount Ganesha's share (let's say you mounted it under /mnt)
     - immediately do "echo something > /mnt/dir" where "/mnt/dir" is a 
directory that already exists
Instead of returning "Is a directory", Ganesha crashes.
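
For reference, the client side of the last step can be sketched in C. This is a minimal sketch, assuming the shell redirection amounts to an open() for writing with create and truncate flags; the function name write_open_errno is made up for illustration. Against a healthy server (or any local filesystem) the call on an existing directory should fail with EISDIR.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Roughly what "echo something > path" does before writing: open the
 * path for writing, creating or truncating it.
 * Returns 0 if the open succeeded, otherwise the errno set by open().
 * Caution: an existing regular file at 'path' will be truncated.
 */
int write_open_errno(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return errno;

	close(fd);
	return 0;
}
```

Calling write_open_errno("/mnt/dir") right after the mount, before anything has listed /mnt, is the request that triggers the crash described here; the expected result is EISDIR.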

If you "ls" right after the mount (or do anything else that calls 
readdir() and populates the inode cache on the server side), you get 
normal behaviour and no crash.
To reproduce the bug, you must operate on a non-cached directory.

This bug is pretty annoying: if the server restarts, the client 
reconnects, waits for the grace period and then redoes the same write 
request to the directory... and the server crashes again.
The loop can be broken by doing an ls to populate the cache during the 
grace period.

Segfault observed in gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff90dfa700 (LWP 29901)]
0x00000000004c6636 in cache_inode_setattr (entry=0x0, 
attr=0x7fff90df8a60, is_open_write=true) at 
/opt/GANESHA/src/cache_inode/cache_inode_setattr.c:63
63              struct fsal_obj_handle *obj_handle = entry->obj_handle;
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 
krb5-libs-1.9-22.el6.x86_64 libattr-2.4.44-7.el6.x86_64 
libblkid-2.17.2-12.14.el6_5.x86_64 libcap-2.16-5.2.el6.x86_64 
libcom_err-1.41.12-11.el6.x86_64 libcom_err-1.41.90.wc3-7.el6.x86_64 
libselinux-2.0.94-5.el6.x86_64 libuuid-2.17.2-12.14.el6_5.x86_64 
nfs-utils-lib-1.1.5-6.el6.x86_64
(gdb) where
#0  0x00000000004c6636 in cache_inode_setattr (entry=0x0, 
attr=0x7fff90df8a60, is_open_write=true)
     at /opt/GANESHA/src/cache_inode/cache_inode_setattr.c:63
#1  0x0000000000463a1b in open4_create (arg=0x7fff54002628, 
data=0x7fff90df8d50, res=0x7fff6c000c78, parent=0x7eb370, 
entry=0x7fff90df8c38,
     filename=0x7fff6c0011f0 "dir") at 
/opt/GANESHA/src/Protocols/NFS/nfs4_op_open.c:737
#2  0x0000000000463b34 in open4_claim_null (arg=0x7fff54002628, 
data=0x7fff90df8d50, res=0x7fff6c000c78, entry=0x7fff90df8c38)
     at /opt/GANESHA/src/Protocols/NFS/nfs4_op_open.c:802
#3  0x0000000000464c0d in nfs4_op_open (op=0x7fff54002620, 
data=0x7fff90df8d50, resp=0x7fff6c000c70)
     at /opt/GANESHA/src/Protocols/NFS/nfs4_op_open.c:1188
#4  0x00000000004547cc in nfs4_Compound (arg=0x7fff540009f0, 
worker=0x7fff6c0008c0, req=0x7fff54000938, res=0x7fff6c000a20)
     at /opt/GANESHA/src/Protocols/NFS/nfs4_Compound.c:679
#5  0x0000000000449ba8 in nfs_rpc_execute (req=0x7fff540008c0, 
worker_data=0x7fff6c0008c0) at 
/opt/GANESHA/src/MainNFSD/nfs_worker_thread.c:1257
#6  0x000000000044a5ac in worker_run (ctx=0x885340) at 
/opt/GANESHA/src/MainNFSD/nfs_worker_thread.c:1506
#7  0x00000000004f07f9 in fridgethr_start_routine (arg=0x885340) at 
/opt/GANESHA/src/support/fridgethr.c:561
#8  0x00000030368077e1 in start_thread () from /lib64/libpthread.so.0
#9  0x00000030364e577d in clone () from /lib64/libc.so.6
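
Frame #0 shows cache_inode_setattr() being entered with entry=0x0 and faulting on entry->obj_handle. The kind of guard that would turn this into an error return can be sketched as below. This is only an illustration under stated assumptions: the struct, enum and function names are hypothetical stand-ins, not the real nfs-ganesha types, and the actual fix may well belong in the caller (open4_create) rather than in the callee.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the nfs-ganesha types. */
struct obj_handle_sketch {
	int type;
};

struct cache_entry_sketch {
	struct obj_handle_sketch *obj_handle;
};

enum sketch_status {
	SKETCH_OK = 0,
	SKETCH_INVALID_ARGUMENT = 1
};

/*
 * Check the entry before dereferencing it, so that a missing cache
 * entry produces an error status instead of a segmentation fault.
 */
enum sketch_status setattr_sketch(struct cache_entry_sketch *entry)
{
	if (entry == NULL || entry->obj_handle == NULL)
		return SKETCH_INVALID_ARGUMENT;

	/* The attribute update on entry->obj_handle would go here. */
	return SKETCH_OK;
}
```

With a guard of this shape, a NULL entry yields SKETCH_INVALID_ARGUMENT, which the protocol layer could then map to an NFSv4 error for the client.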

Comment 1 Frank Filz 2015-10-05 21:45:27 UTC
Does this still occur in V2.3-rc5?