Bug 1414324

Summary: NFS segfaults during file renames in parallel with create
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: shilpa <smanjara>
Component: RGW
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED ERRATA
QA Contact: Ramakrishnan Periyasamy <rperiyas>
Severity: medium
Priority: unspecified
Version: 2.1
CC: cbodley, ceph-eng-bugs, hnallurv, kbader, kdreyer, mbenjamin, owasserm, sweil, tserlin
Target Milestone: rc
Keywords: Rebase
Target Release: 2.2
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-10.2.5-11.el7cp; Ubuntu: ceph_10.2.5-5redhat1xenial
Type: Bug
Last Closed: 2017-03-14 15:47:59 UTC

Description shilpa 2017-01-18 09:58:06 UTC
Description of problem:
Mounted the NFS share on two clients and created files of random sizes in a 10x10 directory structure. After about 46K files had been created, renaming files at the top level started returning Input/Output errors.

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.5-3.el7cp.x86_64

Steps to Reproduce:
1. Configure nfs-ganesha on the RGW server, create an S3 bucket, and mount the export on two clients. (Illustrative sketches of the export configuration and the workload follow the backtrace below.)
2. Start creating files of random sizes with -b 10 -d 10 -n 1000, where -n is the number of files created at each directory level.
3. In parallel, start an S3 workload that creates objects in the same bucket, 'my-new-bucket'.
4. Everything worked fine up to this point: about 46K files were created, and both the S3 operations and the file operations succeeded.
5. On the other mount point, start renaming files from the top of the tree. During this process, an I/O error occurred and ls commands hung.
6. The ganesha.nfsd process exited with a segmentation fault:

    -2> 2017-01-18 09:01:42.399707 7f6b2a7f4700  2 req 0:0.937762:: :list_bucket:http status=200
    -1> 2017-01-18 09:01:42.399712 7f6b2a7f4700  1 ====== process_request req done req=0x7f6b2a7f2a30 http_status=200 ======
     0> 2017-01-18 09:01:42.401168 7f6adb756700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f6adb756700 thread_name:ganesha.nfsd

 ceph version 10.2.5-3.el7cp (1337a819287fd59af47dbbe186c465dfa1b384e7)
 1: (()+0x56e10a) [0x7f6c617a610a]
 2: (()+0xf370) [0x7f6c6dffb370]
 3: (mdcache_dirent_rename()+0x1b7) [0x7f6c6fb41137]
 4: (()+0x10628a) [0x7f6c6fb3728a]
 5: (fsal_rename()+0x169) [0x7f6c6fa70099]
 6: (nfs4_op_rename()+0x192) [0x7f6c6faab6c2]
 7: (nfs4_Compound()+0x63d) [0x7f6c6fa96fcd]
 8: (nfs_rpc_execute()+0x5bc) [0x7f6c6fa8817c]
 9: (()+0x587da) [0x7f6c6fa897da]
 10: (()+0xe2459) [0x7f6c6fb13459]
 11: (()+0x7dc5) [0x7f6c6dff3dc5]
 12: (clone()+0x6d) [0x7f6c6d6c273d]
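
For reference, a minimal nfs-ganesha export stanza for the RGW FSAL along the lines of step 1 might look like the following. This is an illustrative sketch, not the configuration from the reported run; the user ID, keys, and ceph.conf path are placeholders.

    EXPORT
    {
        Export_ID = 1;
        Path = "/";
        Pseudo = "/";
        Access_Type = RW;
        SecType = "sys";
        NFS_Protocols = 4;
        Transports = TCP;

        FSAL {
            Name = RGW;
            User_Id = "testuser";            # placeholder RGW user
            Access_Key_Id = "<access-key>";
            Secret_Access_Key = "<secret-key>";
        }
    }

    RGW {
        ceph_conf = "/etc/ceph/ceph.conf";   # placeholder path
    }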
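A hedged sketch of the create/S3/rename workload from steps 2-5 follows, assuming boto3 as the S3 client and placeholder mount points and endpoint. The original run used a Crefi-style tool with -b 10 -d 10 -n 1000 and an unspecified S3 client, not this script.

    #!/usr/bin/env python3
    # Illustrative reproducer sketch only. Mount points, endpoint, credentials,
    # and the tree layout are assumptions, not the exact QE tooling.
    import os
    import random
    import threading

    import boto3  # assumed S3 client library

    MOUNT_A = "/mnt/nfs1"      # first client: file creation (step 2)
    MOUNT_B = "/mnt/nfs2"      # second client: renames (step 5)
    BUCKET = "my-new-bucket"   # bucket named in the report

    def make_tree(root, breadth=10, depth=10, nfiles=1000):
        """Write files of random sizes into a breadth x depth directory tree."""
        for b in range(breadth):
            path = root
            for d in range(depth):
                path = os.path.join(path, "d%02d_%02d" % (b, d))
                os.makedirs(path, exist_ok=True)
                for n in range(nfiles):
                    with open(os.path.join(path, "f%04d" % n), "wb") as f:
                        f.write(os.urandom(random.randint(1, 64) * 1024))

    def s3_workload(nobjects=10000):
        """Step 3: create objects in the same bucket over the S3 API."""
        # Placeholder RGW endpoint; credentials come from the environment.
        s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8000")
        for i in range(nobjects):
            s3.put_object(Bucket=BUCKET, Key="obj-%06d" % i, Body=os.urandom(4096))

    def rename_top_level(root):
        """Step 5: rename entries at the top of the tree from the second mount."""
        for name in os.listdir(root):
            os.rename(os.path.join(root, name), os.path.join(root, name + ".renamed"))

    if __name__ == "__main__":
        creator = threading.Thread(target=make_tree, args=(os.path.join(MOUNT_A, BUCKET),))
        s3ops = threading.Thread(target=s3_workload)
        creator.start()
        s3ops.start()
        # Once roughly 46K files exist, trigger the renames from the second client.
        rename_top_level(os.path.join(MOUNT_B, BUCKET))
        creator.join()
        s3ops.join()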

Comment 12 Ramakrishnan Periyasamy 2017-02-09 10:06:30 UTC
No crash observed; moving this bug to the verified state.

Comment 14 errata-xmlrpc 2017-03-14 15:47:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html