Description of problem:

When performing a scale validation of 500 exports on 50 clients, we can see that the NFS mounts are taking far longer than expected. This behaviour is visible only after 300+ mounts. At times the mounts take more than 10-15 hours to complete.

Version-Release number of selected component (if applicable):
rhcs 7.0, ganesha 5.5

How reproducible:
Always

Steps to Reproduce:
1. Deploy the rhcs 7.0 cluster
2. Create 500 exports
3. Try mounting the exports from the clients

Actual results:
Mounts are taking longer than expected

Expected results:
Mounts should succeed without major delays

Additional info:
Log: http://magna002.ceph.redhat.com/cephci-jenkins/nfs_ganesha_scale_50_clients_500_exports.log
config: http://pastebin.test.redhat.com/1109131
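(For reference, the client-side mount step in the reproducer is along these lines. This is a hypothetical sketch; the actual pseudo paths, counts per client, and the ingress VIP come from the linked config:)

# mount N exports on each client; names and the VIP placeholder are illustrative
for i in $(seq 1 10); do
    mkdir -p /mnt/export_$i
    mount -t nfs -o vers=4.1 <ingress-vip>:/export_$i /mnt/export_$i
done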
Yes, let's get a fix in to consolidate Ganesha EXPORTs into a single (or a few) Ceph clients/mounts and see if that resolves this issue.
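(For context on what "consolidation" means here: each EXPORT with FSAL CEPH currently creates its own libcephfs client, so 500 exports means 500 Ceph clients. A minimal sketch of what consolidated EXPORT blocks might look like, assuming, as the later comments suggest, that client sharing is keyed off a common cmount_path in the FSAL block:)

EXPORT {
    Export_Id = 1;
    Path = "/exp001";
    Pseudo = "/ceph/exp001";
    Access_Type = RW;
    Protocols = 4;
    FSAL {
        Name = CEPH;
        # exports with the same cmount_path share one libcephfs client/mount
        cmount_path = "/";
    }
}

EXPORT {
    Export_Id = 2;
    Path = "/exp002";
    Pseudo = "/ceph/exp002";
    Access_Type = RW;
    Protocols = 4;
    FSAL {
        Name = CEPH;
        cmount_path = "/";
    }
}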
Hi Frank,

Could you confirm if this is the new cephfs client consolidation patch - http://registry-proxy.engineering.redhat.com/rh-osbs/rhceph:7-72.0.TEST.gerrithub1170100

Do we need to make any other changes, or will deploying NFS with CephFS using this build be sufficient to run our test?
(In reply to Manisha Saini from comment #9)
> Hi Frank,
>
> Could you confirm if this is the new cephfs client consolidation patch -
> http://registry-proxy.engineering.redhat.com/rh-osbs/rhceph:7-72.0.TEST.
> gerrithub1170100

I get a 404 error on this link. Please check that it is correct. Yes, I'm on the vpn.
(In reply to Kaleb KEITHLEY from comment #10)
> (In reply to Manisha Saini from comment #9)
> > Hi Frank,
> >
> > Could you confirm if this is the new cephfs client consolidation patch -
> > http://registry-proxy.engineering.redhat.com/rh-osbs/rhceph:7-72.0.TEST.
> > gerrithub1170100
>
> I get a 404 error on this link. Please check that it is correct. Yes, I'm on
> the vpn.

Visiting http://registry-proxy.engineering.redhat.com won't work.

The Brew link for that container is:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2701372

and you would have to pull with podman, e.g.:

$ podman pull registry-proxy.engineering.redhat.com/rh-osbs/rhceph:7-72.0.TEST.gerrithub1170100

Anyway, the -patches branch for that custom nfs-ganesha build in that container is:
https://gitlab.cee.redhat.com/ceph/nfs-ganesha/-/commits/private-tserlin-ffilz-gerrithub-1170100-rebase-back-onto-V5.5

Thomas
Hi Frank,

I tried both scenarios for applying the additional changes, i.e. 1. modifying the export file and 2. modifying the ganesha.conf file with the export info, but I am unable to get this working in a containerised deployment.

Scenario 1: Editing the export file and adding "cmount_path": "/" in the export file
===============

As noted in the doc - https://docs.google.com/document/d/1o4DBWc9oP-ayMmHtP7XimB1FCAICGp9I0akQ9In0cLk/edit - the export file is not getting updated with "cmount_path": "/", and "user_id" gets added back again when updating the export file.

Scenario 2: Editing ganesha.conf with an export block
=================

Below is the ganesha.conf file after I updated it with an export block.

[ceph: root@argo018 /]# ceph config-key get mgr/cephadm/services/nfs/ganesha.conf
# This file is generated by cephadm.
NFS_CORE_PARAM {
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 4;
        NFS_Port = 12049;
        HAProxy_Hosts = 10.8.128.216, 172.20.21.16, 2620:52:0:880:ae1f:6bff:fe0a:1844, 10.8.128.218, 172.20.21.18, 2620:52:0:880:ec4:7aff:fef3:e9f4, 10.8.128.219, 172.20.21.19, 2620:52:0:880:ae1f:6bff:fe0a:17ec, 10.8.128.220, 172.20.21.20, 2620:52:0:880:ec4:7aff:fef3:e9ee, 10.8.128.221, 172.20.21.21, 2620:52:0:880:ae1f:6bff:fe0a:180e;
}

NFSv4 {
        Delegations = false;
        RecoveryBackend = 'rados_cluster';
        Minor_Versions = 1, 2;
}

RADOS_KV {
        UserId = "nfs.nfsganesha.0.0.argo018.apiqnm";
        nodeid = "nfs.nfsganesha.0";
        pool = ".nfs";
        namespace = "nfsganesha";
}

RGW {
        cluster = "ceph";
        name = "client.nfs.nfsganesha.0.0.argo018.apiqnm-rgw";
}

EXPORT {
        # Export Id (mandatory, each EXPORT must have a unique Export_Id)
        Export_Id = 2;

        # Exported path (mandatory)
        Path = "/";

        # Pseudo Path (required for NFS v4)
        Pseudo = "/ganesha2";

        # Required for access (default is None)
        # Could use CLIENT blocks instead
        Access_Type = RW;
        SecType = "sys";
        Protocols = 3,4;
        MaxRead = 9000000;
        MaxWrite = 9000000;
        Squash = No_Root_Squash;

        # Exporting FSAL
        FSAL {
                Name = CEPH;
                cmount_path = "/";
        }
}
===============

When restarting NFS Ganesha (redeploying the nfs daemon), the ganesha service started on argo018 but failed on argo019.
[ceph: root@argo016 /]# ceph orch ps | grep nfs
haproxy.nfs.nfsganesha.argo018.bayiim     argo018  *:2049,9049  running (114m)  10m ago  114m  17.2M  -  2.4.17-9f97155  bda92490ac6c  a519fb6243e6
haproxy.nfs.nfsganesha.argo019.cjdvgw     argo019  *:2049,9049  running (114m)  10m ago  114m  18.6M  -  2.4.17-9f97155  bda92490ac6c  6cb692b58a51
keepalived.nfs.nfsganesha.argo018.jcxjht  argo018               running (114m)  10m ago  114m  1765k  -  2.2.4           b79b516c07ed  7a4d08395cde
keepalived.nfs.nfsganesha.argo019.hknukw  argo019               running (114m)  10m ago  114m  1765k  -  2.2.4           b79b516c07ed  4fef36f212fb
nfs.nfsganesha.0.0.argo018.apiqnm         argo018  *:12049      running (10m)   10m ago  114m  46.9M  -  5.5             ffbec4c9b233  dd6cf28dce72
nfs.nfsganesha.1.0.argo019.pfkqrp         argo019  *:12049      unknown         10m ago  114m      -  -  <unknown>       <unknown>     <unknown>

================

When I tried mounting the share from the argo018 machine, the mount command failed:

[root@argo021 mnt]# mount -t nfs -o vers=4,port=2049 10.8.128.218:/ganesha1 /mnt/test/
mount.nfs: Connection refused

================

ganesha.log of the argo018 machine:

42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_start_grace :STATE :EVENT :grace reload client info completed from backend
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] rados_cluster_grace_enforcing :CLIENT ID :EVENT :rados_cluster_grace_enforcing: ret=0
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] create_export :FSAL :CRIT :Unable to init Ceph handle.
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] mdcache_fsal_create_export :FSAL :MAJ :Failed to call create_export on underlying FSAL Ceph
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] fsal_cfg_commit :CONFIG :CRIT :Could not create export for (/ganesha2) to (/)
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] export_commit_common :CONFIG :WARN :A protocol is specified for export 2 that is not enabled in NFS_CORE_PARAM, fixing up
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] export_commit_common :CONFIG :CRIT :fsal_export is NULL
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:54): 1 validation errors in block FSAL
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:54): Errors processing block (FSAL)
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:29): 1 validation errors in block EXPORT
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:29): Errors processing block (EXPORT)
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:24): Unknown block (RGW)
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] lower_my_caps :NFS STARTUP :EVENT :currently set capabilities are: cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_sys>
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] gsh_dbus_pkginit :DBUS :CRIT :dbus_bus_get failed (Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory)
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] gsh_dbus_register_path :DBUS :CRIT :dbus_connection_register_object_path called with no DBUS connection
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] gsh_dbus_register_path :DBUS :CRIT :dbus_connection_register_object_path called with no DBUS connection
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] gsh_dbus_register_path :DBUS :CRIT :dbus_connection_register_object_path called with no DBUS connection
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] gsh_dbus_register_path :DBUS :CRIT :dbus_connection_register_object_path called with no DBUS connection
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] find_keytab_entry :NFS CB :WARN :Configuration file does not specify default realm while getting default realm name
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] gssd_refresh_krb5_machine_credential :NFS CB :CRIT :ERROR: gssd_refresh_krb5_machine_credential: no usable keytab entry found in keytab /etc/krb5.keytab for connection with host localhost
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:2)
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[dbus] gsh_dbus_thread :DBUS :CRIT :DBUS not initialized, service thread exiting
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[dbus] gsh_dbus_thread :DBUS :EVENT :shutdown
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
42:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
42:40 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
42:50 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
43:00 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
43:10 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
43:20 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
43:30 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
43:40 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
43:50 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)
44:00 : epoch 651f1fb6 : argo018 : ganesha.nfsd-2[reaper] nfs_try_lift_grace :STATE :EVENT :check grace:reclaim complete(0) clid count(0)

===============

ganesha.log of the argo019 machine:

Oct 05 20:43:16 argo019 bash[4042305]: 92df895698f29bef0cc23a351104616ea136ce033668e3c04edbd8cd7832a521
Oct 05 20:43:16 argo019 podman[4042305]: 2023-10-05 20:43:16.881949627 +0000 UTC m=+0.025476943 image pull registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:59d44bf3148860191743c61856075f58315e4ecc425ef436ba32a12e6790e4d4
Oct 05 20:43:16 argo019 systemd[1]: Started Ceph nfs.nfsganesha.1.0.argo019.pfkqrp for 0662c37c-4315-11ee-b3bd-ac1f6b0a1844.
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] init_logging :LOG :NULL :LOG: Setting log level for all components to NIV_EVENT
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 5.5
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] monitoring_init :NFS STARTUP :EVENT :Init monitoring at 0.0.0.0:9587
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] fsal_init_fds_limit :MDCACHE LRU :EVENT :Setting the system-imposed limit on FDs to 1048576.
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] rados_kv_connect :CLIENT ID :EVENT :Failed to connect: -13
Oct 05 20:43:16 argo019 ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp[4042327]: 05/10/2023 20:43:16 : epoch 651f1fe4 : argo019 : ganesha.nfsd-2[main] rados_cluster_init :CLIENT ID :EVENT :Failed to connect to cluster: -13
Oct 05 20:43:17 argo019 podman[4042353]: 2023-10-05 20:43:17.366389648 +0000 UTC m=+0.032668356 container died 92df895698f29bef0cc23a351104616ea136ce033668e3c04edbd8cd7832a521 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:59d44bf3148860191743c61856075f58315e4ecc425ef436ba32a12e6790e4d4, name=ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp, RELEASE=main, summary=Provides the latest Red Hat Ceph Storage 7 on RHEL 9 in a fully featured and supported base image., CEPH_POINT_RELEASE=, GIT_BRANCH=main, com.redhat.license_terms=https://www.redhat.com/agreements, io.openshift.tags=rhceph ceph, vendor=Red Hat, Inc., GIT_REPO=https://github.com/ceph/ceph-container.git, name=rhceph, build-date=2023-10-04T22:16:49, distribution-scope=public, com.redhat.component=rhceph-container, io.k8s.description=Red Hat Ceph Storage 7, io.k8s.display-name=Red Hat Ceph Storage 7 on RHEL 9, description=Red Hat Ceph Storage 7, io.openshift.expose-services=, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/7-81.1.TEST.gerrithub1170100, vcs-type=git, GIT_CLEAN=True, ceph=True, io.buildah.version=1.29.0, architecture=x86_64, GIT_COMMIT=54fe819971d3d2dbde321203c5644c08d10742d5, version=7, vcs-ref=6a3109234de1e767361375a550322ef998fe07ed, maintainer=Guillaume Abrioux <gabrioux>, release=81.1.TEST.gerrithub1170100)
Oct 05 20:43:17 argo019 podman[4042353]: 2023-10-05 20:43:17.381389564 +0000 UTC m=+0.047668269 container remove 92df895698f29bef0cc23a351104616ea136ce033668e3c04edbd8cd7832a521 (image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:59d44bf3148860191743c61856075f58315e4ecc425ef436ba32a12e6790e4d4, name=ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844-nfs-nfsganesha-1-0-argo019-pfkqrp, build-date=2023-10-04T22:16:49, GIT_REPO=https://github.com/ceph/ceph-container.git, url=https://access.redhat.com/containers/#/registry.access.redhat.com/rhceph/images/7-81.1.TEST.gerrithub1170100, description=Red Hat Ceph Storage 7, vendor=Red Hat, Inc., GIT_BRANCH=main, name=rhceph, release=81.1.TEST.gerrithub1170100, io.openshift.tags=rhceph ceph, com.redhat.license_terms=https://www.redhat.com/agreements, io.k8s.display-name=Red Hat Ceph Storage 7 on RHEL 9, io.buildah.version=1.29.0, vcs-ref=6a3109234de1e767361375a550322ef998fe07ed, vcs-type=git, GIT_COMMIT=54fe819971d3d2dbde321203c5644c08d10742d5, distribution-scope=public, RELEASE=main, com.redhat.component=rhceph-container, ceph=True, version=7, maintainer=Guillaume Abrioux <gabrioux>, io.openshift.expose-services=, CEPH_POINT_RELEASE=, io.k8s.description=Red Hat Ceph Storage 7, architecture=x86_64, GIT_CLEAN=True, summary=Provides the latest Red Hat Ceph Storage 7 on RHEL 9 in a fully featured and supported base image.)
Oct 05 20:43:17 argo019 systemd[1]: ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844.1.0.argo019.pfkqrp.service: Main process exited, code=exited, status=139/n/a
Oct 05 20:43:17 argo019 systemd[1]: ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844.1.0.argo019.pfkqrp.service: Failed with result 'exit-code'.
Oct 05 20:43:27 argo019 systemd[1]: ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844.1.0.argo019.pfkqrp.service: Scheduled restart job, restart counter is at 5.
Oct 05 20:43:27 argo019 systemd[1]: Stopped Ceph nfs.nfsganesha.1.0.argo019.pfkqrp for 0662c37c-4315-11ee-b3bd-ac1f6b0a1844.
Oct 05 20:43:27 argo019 systemd[1]: ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844.1.0.argo019.pfkqrp.service: Start request repeated too quickly.
Oct 05 20:43:27 argo019 systemd[1]: ceph-0662c37c-4315-11ee-b3bd-ac1f6b0a1844.1.0.argo019.pfkqrp.service: Failed with result 'exit-code'.
Oct 05 20:43:27 argo019 systemd[1]: Failed to start Ceph nfs.nfsganesha.1.0.argo019.pfkqrp for 0662c37c-4315-11ee-b3bd-ac1f6b0a1844.
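(For reference, Scenario 1, "modifying the export file", was driven roughly along these lines, presumably via the mgr/nfs export interface. This is a sketch; the cluster name "nfsganesha" and the pseudo path /ganesha2 match this setup, and "cmount_path" is the new FSAL field from the consolidation patch:)

# dump the current export definition for the pseudo path
ceph nfs export get nfsganesha /ganesha2 > export.json

# edit export.json: inside the "fsal" block, add
#     "cmount_path": "/"
# (and, per the doc, drop "user_id" so it can be regenerated)

# push the edited definition back
ceph nfs export apply nfsganesha -i export.json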
The most straightforward approach to accomplish this is to directly edit the export file for each export. However, I'm having difficulty determining the exact method to achieve this. The steps I've attempted so far haven't successfully updated the export block, particularly setting "cmount_path" to "/" and removing the "user_id" field. I have been successful in modifying other entries, such as changing permissions to read-only (RO) and adding client permissions. Could you please provide guidance on how to make this work in my test environment so I can test the patch? I'm also uncertain whom I should contact to better understand how the RADOS side works when updating the Ganesha export file.
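(In case it helps, the raw export objects live in the .nfs pool under the cluster's namespace, so one way to inspect and hand-edit them directly is sketched below. The object name "export-2" follows the conf-nfs listing shown later in this bug; adjust the namespace and export id to match:)

# list objects in the cluster's namespace
rados ls -p .nfs -N nfsganesha

# fetch an export object, edit it locally, and write it back
rados get -p .nfs -N nfsganesha export-2 /tmp/export-2.conf
# ... add cmount_path = "/" inside the FSAL block ...
rados put -p .nfs -N nfsganesha export-2 /tmp/export-2.conf

# then restart the nfs daemons so they re-read the exports, e.g.:
ceph orch restart nfs.nfsganesha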
The best bet for an immediate workaround might be to set up the exports using sub-directories.

Add the following to your ganesha.conf:

CEPH {
    # change this to the actual location of your ceph configuration
    # that should give Ganesha access to the MDS and any cephx keys required.
    ceph_conf = "/etc/ceph/ceph.conf";
}

Then add a ceph root export:

EXPORT {
    # pick an export id:
    Export_Id = 99999;

    # Exported path (mandatory)
    Path = "/";

    # Pseudo Path (required for NFS v4)
    Pseudo = "/ceph";

    # Required for access (default is None)
    # Could use CLIENT blocks instead
    Access_Type = RW;
    SecType = "sys";
    Protocols = 4;
    Squash = No_Root_Squash;

    # Exporting FSAL
    FSAL {
        Name = CEPH;
        cmount_path = "/";
    }
}

Then mount that from a client and create directories for your exports:

mount server:/ceph /mnt
mkdir /mnt/exp001
mkdir /mnt/exp002
etc.

You may want to write a script to create 100 directories (see the sketch after this comment)...

Then add an export for each:

EXPORT {
    # change this for each export
    Export_Id = 1001;

    # Exported path (mandatory), change for each export
    Path = "/exp001";

    # Pseudo Path (required for NFS v4), change for each export
    Pseudo = "/ceph/exp001";

    # Required for access (default is None)
    # Could use CLIENT blocks instead
    Access_Type = RW;
    SecType = "sys";
    Protocols = 4;
    Squash = No_Root_Squash;

    # Exporting FSAL
    FSAL {
        Name = CEPH;
        cmount_path = "/";
    }
}

After adding those to the config file, you can either restart Ganesha, or invoke a dynamic export config reload with SIGHUP:

killall -HUP ganesha.nfsd

That SHOULD get you a working setup with 100 exports.

Alternatively, contact Mark Kogan for how he set his system up.
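(A minimal shell sketch of the directory/export generation step above, assuming the root export is reachable as server:/ceph and that exports are appended to /etc/ganesha/ganesha.conf; ids and paths are illustrative:)

#!/bin/sh
# create 100 sub-directories under the root export and
# append a matching EXPORT block for each to ganesha.conf
mount server:/ceph /mnt
for i in $(seq -w 1 100); do
    mkdir -p "/mnt/exp${i}"
    cat >> /etc/ganesha/ganesha.conf <<EOF
EXPORT {
    Export_Id = 1$i;
    Path = "/exp$i";
    Pseudo = "/ceph/exp$i";
    Access_Type = RW;
    SecType = "sys";
    Protocols = 4;
    Squash = No_Root_Squash;
    FSAL {
        Name = CEPH;
        cmount_path = "/";
    }
}
EOF
done
# reload the export list
killall -HUP ganesha.nfsd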
Logged into the setup:

1. Found that the NFS Ganesha container was running on the cali019 node only and not running on the cali015 node.

[root@cali015 ~]# cephadm ls | grep nfs
        "name": "nfs.cephfs-nfs.0.0.cali015.gxdbuc",
        "systemd_unit": "ceph-4e687a60-638e-11ee-8772-b49691cee574.0.0.cali015.gxdbuc",
        "service_name": "nfs.cephfs-nfs",
        "name": "haproxy.nfs.cephfs-nfs.cali015.xmcwkq",
        "systemd_unit": "ceph-4e687a60-638e-11ee-8772-b49691cee574.cephfs-nfs.cali015.xmcwkq",
        "service_name": "ingress.nfs.cephfs-nfs",
        "name": "keepalived.nfs.cephfs-nfs.cali015.amerio",
        "systemd_unit": "ceph-4e687a60-638e-11ee-8772-b49691cee574.cephfs-nfs.cali015.amerio",
        "service_name": "ingress.nfs.cephfs-nfs",

[root@cali015 ~]# cephadm logs --name haproxy.nfs.cephfs-nfs.cali015.xmcwkq
Inferring fsid 4e687a60-638e-11ee-8772-b49691cee574
-- No entries --

[root@cali015 ~]# cephadm logs --name nfs.cephfs-nfs.0.0.cali015.gxdbuc
Inferring fsid 4e687a60-638e-11ee-8772-b49691cee574
-- No entries --

2. On node cali019, NFS Ganesha faced syntax errors while processing the exports:

Nov 17 10:57:05 cali019 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-1-0-cali019-tptjzg[530205]: 17/11/2023 10:57:05 : epoch 65530ef4 : cali019 : ganesha.nfsd-2[sigmgr] sigmgr_thread :MAIN :EVENT :SIGHUP_HANDLER: Received SIGHUP.... initiating export list reload
Nov 17 10:57:05 cali019 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-1-0-cali019-tptjzg[530205]: 17/11/2023 10:57:05 : epoch 65530ef4 : cali019 : ganesha.nfsd-2[sigmgr] ganesha_yyerror :CONFIG :CRIT :Config file (rados://.nfs/cephfs-nfs/conf-nfs.cephfs-nfs:1) error: syntax error
Nov 17 10:57:05 cali019 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-1-0-cali019-tptjzg[530205]:
Nov 17 10:57:05 cali019 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-1-0-cali019-tptjzg[530205]: 17/11/2023 10:57:05 : epoch 65530ef4 : cali019 : ganesha.nfsd-2[sigmgr] reread_config :CONFIG :CRIT :Error while parsing new configuration file /etc/ganesha/ganesha.conf
Nov 17 10:57:05 cali019 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-1-0-cali019-tptjzg[530205]: 17/11/2023 10:57:05 : epoch 65530ef4 : cali019 : ganesha.nfsd-2[sigmgr] config_errs_to_log :CONFIG :CRIT :Config File (rados://.nfs/cephfs-nfs/conf-nfs.cephfs-nfs:1): Unexpected character (%)
Nov 17 10:57:05 cali019 ceph-4e687a60-638e-11ee-8772-b49691cee574-nfs-cephfs-nfs-1-0-cali019-tptjzg[530205]: 17/11/2023 10:57:05 : epoch 65530ef4 : cali019 : ganesha.nfsd-2[sigmgr] config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:0): Configuration syntax errors found

In summary, the NFS Ganesha process is running but not serving any exports.
More details:

NFS Ganesha refers to the file "/etc/ganesha/ganesha.conf". This file has the below entries (the last entry points to the exports-related config):

# cat /etc/ganesha/ganesha.conf
# This file is generated by cephadm.
NFS_CORE_PARAM {
        Enable_NLM = false;
        Enable_RQUOTA = false;
        Protocols = 4;
        NFS_Port = 12049;
}

NFSv4 {
        Delegations = false;
        RecoveryBackend = 'rados_cluster';
        Minor_Versions = 1, 2;
}

RADOS_KV {
        UserId = "nfs.cephfs-nfs.1.0.cali019.tptjzg";
        nodeid = "nfs.cephfs-nfs.1";
        pool = ".nfs";
        namespace = "cephfs-nfs";
}

RADOS_URLS {
        UserId = "nfs.cephfs-nfs.1.0.cali019.tptjzg";
        watch_url = "rados://.nfs/cephfs-nfs/conf-nfs.cephfs-nfs";
}

RGW {
        cluster = "ceph";
        name = "client.nfs.cephfs-nfs.1.0.cali019.tptjzg-rgw";
}

%url    rados://.nfs/cephfs-nfs/conf-nfs.cephfs-nfs

The error is reported for the 1st line in the file/rados object "rados://.nfs/cephfs-nfs/conf-nfs.cephfs-nfs", and the rados object has the below contents, which is what leads to this issue:

[ceph: root@cali013 /]# rados ls -p .nfs --all | grep conf
cephfs-nfs      conf-nfs.cephfs-nfs

[ceph: root@cali013 /]# rados get -N cephfs-nfs -p ".nfs" conf-nfs.cephfs-nfs /nfs_conf
[ceph: root@cali013 /]# cat /nfs_conf
%%url "rados://.nfs/cephfs-nfs/export-1"    <= problematic extra %
%url "rados://.nfs/cephfs-nfs/export-2"
%url "rados://.nfs/cephfs-nfs/export-3"
%url "rados://.nfs/cephfs-nfs/export-4"
...
...
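(If the stray '%' is indeed the only corruption, one possible hand-repair is sketched below, assuming cephadm/the mgr does not immediately rewrite the object:)

# fetch the watched config object
rados get -N cephfs-nfs -p .nfs conf-nfs.cephfs-nfs /tmp/nfs_conf

# fix the doubled '%' on the first line
sed -i '1s/^%%url/%url/' /tmp/nfs_conf

# write it back; ganesha should pick up the change via its watch on the object
rados put -N cephfs-nfs -p .nfs conf-nfs.cephfs-nfs /tmp/nfs_conf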
This is needed for 7.0. I don't think it needs doc text.
That makes sense. So this is a setup issue and not a bug. Can we close this as not a bug?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780