Bug 1420231

Summary: nfs daemon crashed while deleting all the directories on the mountpoint
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Hemanth Kumar <hyelloji>
Component: RGW
Assignee: Matt Benjamin (redhat) <mbenjamin>
Status: CLOSED ERRATA
QA Contact: Hemanth Kumar <hyelloji>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 2.2
CC: cbodley, ceph-eng-bugs, hnallurv, kbader, mbenjamin, owasserm, sweil, tserlin
Target Milestone: rc
Target Release: 2.2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.5-26.el7cp, nfs-ganesha-2.4.2-5.el7cp; Ubuntu: ceph_10.2.5-18redhat1, nfs-ganesha_2.4.2-5redhat1
Doc Type: If docs needed, set a value
Last Closed: 2017-03-14 15:49:01 UTC
Type: Bug

Description Hemanth Kumar 2017-02-08 09:21:59 UTC
Description of problem:
-----------------------
While deleting all the directories created in the NFS share from one of the clients, I/O errors were seen on the clients and the NFS daemon on the RGW node crashed.

Version-Release number of selected component (if applicable):
-------------------------
ceph-radosgw-10.2.5-22.el7cp.x86_64
nfs-ganesha-2.4.2-4.el7cp.x86_64
nfs-ganesha-rgw-2.4.2-4.el7cp.x86_64

Steps to Reproduce:
-------------------
A few directories containing a large amount of data had been created. All the cluster nodes and RGW clients were upgraded to the latest builds by running yum update, the running daemons were restarted, and deletion of all the created directories was initiated.
The NFS daemon crashed on the RGW node.

Client :-
----------

# ls
bucket1  bucket2  bucket3  bucketlist  copy  copynew  dir1  dir2  dir3  dir4  dir5  dir6  dir7  folder1  folder3
[root@magna048 hell]# rm -rf *
rm: cannot remove ‘dir1’: Directory not empty
rm: cannot remove ‘dir6’: Input/output error
rm: cannot remove ‘dir7’: Input/output error
rm: cannot remove ‘folder1’: Input/output error
rm: cannot remove ‘folder3’: Input/output error

# ls
^C^C^C^C^Cls: cannot open directory .: Input/output error
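
When re-running the delete step, it can help to record which entries fail and with which error code, rather than reading them off the rm output. The following is only a minimal sketch: the function name delete_all is hypothetical, and the demo runs against a throwaway local directory, not a real NFS mount.

```python
import errno
import os
import shutil
import tempfile


def delete_all(mountpoint):
    """Try to remove every top-level entry under mountpoint.

    Returns a dict mapping entry name to the symbolic errno (for example
    "EIO" or "ENOTEMPTY") for every entry that could not be removed.
    """
    failures = {}
    for name in sorted(os.listdir(mountpoint)):
        path = os.path.join(mountpoint, name)
        try:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)   # recursive delete, like rm -rf
            else:
                os.remove(path)
        except OSError as exc:
            failures[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    return failures


if __name__ == "__main__":
    # Demo on a temporary local directory (an assumption for illustration).
    demo = tempfile.mkdtemp()
    for d in ("dir1", "dir2"):
        os.makedirs(os.path.join(demo, d, "sub"))
    print(delete_all(demo))
    os.rmdir(demo)
```

On a mount showing the behaviour reported here, the returned dict would be expected to carry EIO for the entries that rm reported as "Input/output error".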

-----------------------------------------------------------------------
rgw Log
--------

     0> 2017-02-07 12:43:52.810428 7f024cf31700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f024cf31700 thread_name:ganesha.nfsd

 ceph version 10.2.5-22.el7cp (5cec6848b914e87dd6178e559dedae8a37cc08a3)
 1: (()+0x57322a) [0x7f04129d122a]
 2: (()+0xf370) [0x7f041f226370]
 3: (rgw::RGWFileHandle::reclaim()+0x1d7) [0x7f041296b477]
 4: (cohort::lru::LRU<std::mutex>::evict_block()+0x102) [0x7f0412979092]
 5: (rgw::RGWLibFS::lookup_fh(rgw::RGWFileHandle*, char const*, unsigned int)+0x393) [0x7f04129801e3]
 6: (rgw::RGWLibFS::stat_leaf(rgw::RGWFileHandle*, char const*, unsigned int)+0xf51) [0x7f041296e891]
 7: (rgw_lookup()+0xba) [0x7f041296fd6a]
 8: (()+0x4bb7) [0x7f041c0bfbb7]
 9: (()+0x6c16) [0x7f041c0c1c16]
 10: (rgw::RGWReaddirRequest::send_response()+0x453) [0x7f041297da43]
 11: (rgw::RGWLibProcess::process_request(rgw::RGWLibRequest*, rgw::RGWLibIO*)+0x637) [0x7f0412982ad7]
 12: (rgw::RGWLibProcess::process_request(rgw::RGWLibRequest*)+0x29c) [0x7f0412983c8c]
 13: (rgw::RGWFileHandle::readdir(bool (*)(char const*, void*, unsigned long), void*, unsigned long*, bool*, unsigned int)+0xb31) [0x7f041296ab21]
 14: (()+0x4ab9) [0x7f041c0bfab9]
 15: (mdcache_dirent_populate()+0x112) [0x7f0420d6b242]
 16: (()+0x105621) [0x7f0420d61621]
 17: (fsal_readdir()+0x15d) [0x7f0420c9a61d]
 18: (nfs4_op_readdir()+0x26b) [0x7f0420cd535b]
 19: (nfs4_Compound()+0x63d) [0x7f0420cc1ded]
 20: (nfs_rpc_execute()+0x5bc) [0x7f0420cb2f9c]
 21: (()+0x585fa) [0x7f0420cb45fa]
 22: (()+0xe2289) [0x7f0420d3e289]
 23: (()+0x7dc5) [0x7f041f21edc5]
 24: (clone()+0x6d) [0x7f041e8ed73d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
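
Many frames in the backtrace above are unresolved and appear as (()+0x...). To read the call path quickly, the resolved frames can be filtered out of the log. This is a rough sketch that assumes each frame has the form "N: (symbol+0xoffset) [0xaddress]" as in the log above; the name resolved_frames is hypothetical.

```python
import re

# One backtrace frame: " 3: (symbol+0x1d7) [0x7f041296b477]"
FRAME_RE = re.compile(
    r"^\s*(\d+): \((.*)\+0x([0-9a-f]+)\) \[0x([0-9a-f]+)\]\s*$"
)


def resolved_frames(backtrace):
    """Return (frame number, symbol) for every frame with a resolved symbol."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        # Unresolved frames carry an empty symbol, printed as "()".
        if match and match.group(2) != "()":
            frames.append((int(match.group(1)), match.group(2)))
    return frames


if __name__ == "__main__":
    sample = """\
 2: (()+0xf370) [0x7f041f226370]
 3: (rgw::RGWFileHandle::reclaim()+0x1d7) [0x7f041296b477]
 4: (cohort::lru::LRU<std::mutex>::evict_block()+0x102) [0x7f0412979092]
"""
    for number, symbol in resolved_frames(sample):
        print(number, symbol)
```

Applied to the full log, the innermost resolved frame is rgw::RGWFileHandle::reclaim(), reached from cohort::lru::LRU<std::mutex>::evict_block() during an rgw_lookup() issued while an NFSv4 readdir was being processed.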

Comment 8 Hemanth Kumar 2017-02-22 11:42:33 UTC
No crash was seen in the latest builds while deleting the directories.
Moving to Verified state.

Comment 10 errata-xmlrpc 2017-03-14 15:49:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0514.html