Description of problem:
While trying to mount an NFS share on a client, the ganesha.nfsd process crashes with the following crash dump:

     0> 2017-01-16 07:36:02.166385 7f8a610ec700 -1 *** Caught signal (Aborted) **
 in thread 7f8a610ec700 thread_name:rgw_obj_expirer

 ceph version 10.2.5-3.el7cp (1337a819287fd59af47dbbe186c465dfa1b384e7)
 1: (()+0x56e10a) [0x7f8a8467510a]
 2: (()+0xf370) [0x7f8a90eca370]
 3: (gsignal()+0x37) [0x7f8a904cf1d7]
 4: (abort()+0x148) [0x7f8a904d08c8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f8a78804ab5]
 6: (()+0x5ea26) [0x7f8a78802a26]
 7: (()+0x5ea53) [0x7f8a78802a53]
 8: (()+0x5ec73) [0x7f8a78802c73]
 9: (operator new(unsigned long)+0x7d) [0x7f8a7880320d]
 10: (std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)+0x59) [0x7f8a78861ce9]
 11: (std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long)+0x1b) [0x7f8a788628fb]
 12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)+0x5c) [0x7f8a78862fcc]
 13: (RGWObjectExpirer::process_single_shard(std::string const&, utime_t const&, utime_t const&)+0x133) [0x7f8a844aced3]
 14: (RGWObjectExpirer::inspect_all_shards(utime_t const&, utime_t const&)+0xb2) [0x7f8a844ad542]
 15: (RGWObjectExpirer::OEWorker::entry()+0x7f) [0x7f8a844ad7ef]
 16: (()+0x7dc5) [0x7f8a90ec2dc5]
 17: (clone()+0x6d) [0x7f8a9059173d]

Version-Release number of selected component (if applicable):
nfs-ganesha-2.4.1-3.el7cp.x86_64
nfs-ganesha-rgw-2.4.1-3.el7cp.x86_64
ceph-radosgw-10.2.5-3.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Configure NFS on the RGW server and start the ganesha.nfsd process.
2. Try to mount the export on a client:
   # mount -t nfs -o nfsvers=4.1,sync,noauto,soft,proto=tcp magna039.ceph.redhat.com:/ /mntr
   mount.nfs: access denied by server while mounting magna039.ceph.redhat.com:/
3. After some time, check for the ganesha.nfsd process on the server.

Expected results:
The mount should succeed and the process should not crash.

Additional info:

NFS_Core_Param {
        #Use supplied name other than IP In NSM operations
        NSM_Use_Caller_Name = true;
        #Copy lock states into "/var/lib/nfs/ganesha" dir
        Clustered = false;
        #By default port number '2049' is used for NFS service.
        #Configure ports for MNT, NLM, RQuota services.
        #The ports chosen here are from '/etc/sysconfig/nfs'
        #
        MNT_Port = 20048;
        NLM_Port = 32803;
        Rquota_Port = 875;
}

CACHEINODE {
        Entries_HWMark = 25000;
}

EXPORT_DEFAULTS {
        # To reflect nfsnobody
        Anonymous_uid = 65534;
        Anonymous_gid = 65534;
}

EXPORT {
        Export_ID=1;
        Path = "/";
        Pseudo = "/";
        Access_Type = RW;
        NFS_Protocols = 4;
        Transport_Protocols = TCP;

        FSAL {
                Name = RGW;
                User_Id = testuser;
                Access_Key_Id = "I5P9C2G5VH0Y24ZA7F13";
                Secret_Access_Key = "cMCTan57vOfRpnZIhL5pz8EJE0tFx61gicWcXful";
        }
}

RGW {
        name = "client.rgw.magna039";
        ceph_conf = "/etc/ceph/ceph.conf";
        init_args = "--randomvar=specialk";
}
After adding more detailed logging directives (see /etc/ganesha/ganesha.conf) and redirecting logging to a file, I see the following likely root cause:

[root@magna039 ganesha]# /usr/bin/ganesha.nfsd -f /etc/ganesha/ganesha.conf -F
2017-01-16 18:34:50.720992 7ff78f6a20c0 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/-admin/keyring: (2) No such file or directory
2017-01-16 18:34:50.722071 7ff78f6a20c0 -1 monclient(hunting): authenticate NOTE: no keyring found; disabled cephx authentication
2017-01-16 18:34:50.722536 7ff78f6a20c0 -1 Couldn't init storage provider (RADOS)
*** Caught signal (Segmentation fault) **

That is, I think the problem with the path to the radosgw admin keyring is preventing RGW from starting within the NFS-Ganesha instance.
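The keyring path in that log, /var/lib/ceph/radosgw/-admin/keyring, looks like a path template expanded with an empty cluster name and the id "admin". The following is a minimal sketch of that expansion, not Ceph's own code. It assumes a keyring template of the form /var/lib/ceph/radosgw/$cluster-$id/keyring (inferred from the log line, not confirmed from this system's ceph.conf) and that the entity name falls back to client.admin when none is supplied. The helper keyring_path is hypothetical.

```python
from string import Template

# Assumed keyring template, inferred from the failing path in the log;
# the real value comes from the "keyring" setting in ceph.conf.
KEYRING_TEMPLATE = Template("/var/lib/ceph/radosgw/$cluster-$id/keyring")


def keyring_path(cluster: str, name: str) -> str:
    """Expand the assumed template for an entity name of the form type.id."""
    # "client.rgw.magna039" -> type "client", id "rgw.magna039"
    _type, _, ident = name.partition(".")
    return KEYRING_TEMPLATE.substitute(cluster=cluster, id=ident)


# No cluster and the assumed default entity name: reproduces the path in the log.
print(keyring_path("", "client.admin"))
# Cluster and name set explicitly, using the values from this report.
print(keyring_path("ceph", "client.rgw.magna039"))
```

Under these assumptions, the first call yields the path reported as missing, and the second yields a path under ceph-rgw.magna039, which is where a keyring for client.rgw.magna039 would be expected.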
Update with working setup:

1. There is a segfault on shutdown after a failure to initialize RADOS, triggered proximately by the misconfiguration (tracker 17638). This won't be fixed in 2.2, but it is being worked on.

2. The root cause of the misconfiguration is missing values for the radosgw arguments "--name" and "--cluster". As of 2.1, the correct way to set these values (on an installation that requires them, such as this one) is to pass them as parameters in the RGW FSAL configuration block:

RGW {
        ceph_conf = "/etc/ceph/ceph.conf";
        cluster = "ceph";
        name = "client.rgw.magna039";
        init_args = "-d --debug-rgw=16";
}

It turns out that currently the "init_args" value should be passed as a set of separate, null-terminated strings appended to the librgw_create(...) argv argument, not passed in on a single line.
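To illustrate the difference between a single-line init_args string and separate argv entries, here is a minimal sketch that splits the string into tokens and builds a C-style array of null-terminated strings with ctypes. It does not call librgw. The helper names build_rgw_argv and to_c_argv, the "nfs-ganesha" argv[0] placeholder, and the exact order and "--opt=value" spelling of the leading options are assumptions for illustration, not the FSAL's actual code.

```python
import ctypes
import shlex


def build_rgw_argv(name, cluster, ceph_conf, init_args):
    """Return a flat argv list with init_args split into separate tokens."""
    argv = ["nfs-ganesha"]               # argv[0]; placeholder program name (assumption)
    argv.append(f"--conf={ceph_conf}")   # option spelling and order are assumptions
    argv.append(f"--name={name}")
    argv.append(f"--cluster={cluster}")
    # One argv entry per token, instead of one entry holding the whole line.
    argv.extend(shlex.split(init_args))
    return argv


def to_c_argv(argv):
    """Build a char*[] whose elements are separate null-terminated strings."""
    encoded = [arg.encode() for arg in argv]
    return (ctypes.c_char_p * len(encoded))(*encoded)


argv = build_rgw_argv(
    name="client.rgw.magna039",
    cluster="ceph",
    ceph_conf="/etc/ceph/ceph.conf",
    init_args="-d --debug-rgw=16",
)
c_argv = to_c_argv(argv)  # shape of what a C caller would hand to librgw_create
print(len(argv), argv)
```

With the values from the configuration block above, "-d" and "--debug-rgw=16" end up as two separate entries at the end of the list rather than one combined string.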