Bug 1570084 - [Ganesha] Rebooted node doesn't sync ongoing volume export entries in "showmount -e" in parallel when node is coming up
Summary: [Ganesha] Rebooted node doesn't sync ongoing volume export entries in "showmount -e" in parallel when node is coming up
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nfs-ganesha
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Kaleb KEITHLEY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks: 1503143
 
Reported: 2018-04-20 15:24 UTC by Manisha Saini
Modified: 2019-04-29 11:56 UTC (History)
10 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: A volume is exported before nfs-ganesha start-up has completed.
Consequence: The dbus command sent to export the volume fails.
Workaround (if any): Restart the nfs-ganesha process.
Result: The volume export fails.
Clone Of:
Environment:
Last Closed: 2019-04-29 11:56:07 UTC
Target Upstream Version:


Attachments

Description Manisha Saini 2018-04-20 15:24:43 UTC
Description of problem:

When a node comes back up after a reboot, it does not sync, in parallel, the ongoing volume export entries into "showmount -e" (on the rebooted node). Because it is busy re-exporting the older volumes (those exported before the reboot), it skips the export entries for volumes whose export is still in progress in parallel on the other 3 nodes.

When ~25 volumes are exported, it skips 1-2 volume entries; with ~100 volume exports in progress, it skips around 5-6 volume entries.

This mismatch in export entries across nodes causes problems during volume unexport (staging failed errors).
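Such a mismatch can be spotted by capturing "showmount -e" from each node and diffing the sorted lists. A minimal sketch; the file names are hypothetical and assumed to hold output saved beforehand with "showmount -e <node>":

```shell
# Sketch: find export entries that differ between two nodes.
# node1.exports and node2.exports are assumed to hold the output of
# "showmount -e" captured on each node beforehand.
sort node1.exports > node1.sorted
sort node2.exports > node2.sorted
# comm -3 prints only lines unique to one file, i.e. the missing exports
comm -3 node1.sorted node2.sorted
```

Any line printed by comm is an export present on one node but missing on the other.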


Version-Release number of selected component (if applicable):

# rpm -qa | grep ganesha
glusterfs-ganesha-3.12.2-7.el7rhgs.x86_64
nfs-ganesha-gluster-2.5.5-5.el7rhgs.x86_64
nfs-ganesha-2.5.5-5.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.5.5-5.el7rhgs.x86_64



How reproducible:
3/3

Steps to Reproduce:
1. Create a 4-node ganesha cluster.
2. Create and start ~100 volumes.
3. Start exporting all ~100 volumes in a loop.
4. While the exports are in progress, reboot one node.
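The export loop in step 3 might look like the following sketch. The volume names (vol1..vol100) are assumptions, and it uses the standard per-volume ganesha.enable option; the commands are echoed here as a dry run:

```shell
# Dry-run sketch of the export loop from step 3. Volume names
# (vol1..vol100) are assumptions; drop the "echo" to actually run it.
for i in $(seq 1 100); do
    echo gluster volume set "vol${i}" ganesha.enable on
done
```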

Actual results:

Some of the volume export entries fail to appear in "showmount -e" on the rebooted node.


Expected results:
All volumes exported via ganesha should appear on the rebooted node.


Additional info:
--
Rebooted node snippet where volume entries from vol58 to vol61 are missing 
--

----------------------------------
/vol37 (everyone)
/vol38 (everyone)
/vol39 (everyone)
/vol40 (everyone)
/vol41 (everyone)
/vol42 (everyone)
/vol43 (everyone)
/vol44 (everyone)
/vol45 (everyone)
/vol46 (everyone)
/vol47 (everyone)
/vol48 (everyone)
/vol49 (everyone)
/vol50 (everyone)
/vol51 (everyone)
/vol52 (everyone)
/vol53 (everyone)
/vol54 (everyone)
/vol55 (everyone)
/vol56 (everyone)
/vol57 (everyone)
/vol62 (everyone)
/vol63 (everyone)
/vol64 (everyone)

-------------------------

One of the other 3 nodes:

-------------

/vol38 (everyone)
/vol39 (everyone)
/vol40 (everyone)
/vol41 (everyone)
/vol42 (everyone)
/vol43 (everyone)
/vol44 (everyone)
/vol45 (everyone)
/vol46 (everyone)
/vol47 (everyone)
/vol48 (everyone)
/vol49 (everyone)
/vol50 (everyone)
/vol51 (everyone)
/vol52 (everyone)
/vol53 (everyone)
/vol54 (everyone)
/vol55 (everyone)
/vol56 (everyone)
/vol57 (everyone)
/vol58 (everyone)
/vol59 (everyone)
/vol60 (everyone)
/vol61 (everyone)
/vol62 (everyone)

---------------

Rebooted node ganesha.log snippet
---------------------------------

20/04/2018 20:47:26 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] glusterfs_create_export :FSAL :EVENT :Volume vol56 exported at : '/'
20/04/2018 20:47:29 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] glusterfs_create_export :FSAL :EVENT :Volume vol57 exported at : '/'
20/04/2018 20:47:31 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
20/04/2018 20:47:31 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
20/04/2018 20:47:32 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
20/04/2018 20:47:32 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
20/04/2018 20:47:32 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 90
20/04/2018 20:47:32 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists
20/04/2018 20:47:32 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (-1765328160:0)
20/04/2018 20:47:32 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
20/04/2018 20:47:33 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
20/04/2018 20:47:33 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
20/04/2018 20:47:33 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
20/04/2018 20:47:33 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
20/04/2018 20:47:33 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
20/04/2018 20:47:33 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_start :NFS STARTUP :EVENT :             NFS SERVER INITIALIZED
20/04/2018 20:47:33 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
20/04/2018 20:47:33 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
20/04/2018 20:47:49 : epoch c6970000 : dhcp37-218.lab.eng.blr.redhat.com : ganesha.nfsd-4152[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume vol62 exported at : '/'

-------------------------------

Comment 2 Daniel Gryniewicz 2018-04-20 16:59:19 UTC
These exports are being added by dbus, correct?  I think there's a window between when exports are read on startup and when the DBUS thread is started, during which exports could be lost.

I'm tempted to say "don't do that".  I'm not sure we can close this hole entirely.  Frank, what are your thoughts?
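For context, the per-volume export in this setup is sent to the running ganesha.nfsd over D-Bus. A hedged sketch of that call (interface names as used by the gluster ganesha scripts; the volume name and config path are placeholders, and the command is echoed as a dry run):

```shell
# Dry-run sketch of the D-Bus AddExport call used to export a volume
# at runtime. VOL and CONF are placeholders; drop "echo" to send it.
VOL=vol58
CONF=/run/gluster/shared_storage/nfs-ganesha/exports/export.${VOL}.conf
echo dbus-send --print-reply --system --dest=org.ganesha.nfsd \
    /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport \
    "string:${CONF}" "string:EXPORT(Path=/${VOL})"
```

If ganesha.nfsd's D-Bus thread is not yet up when such a call arrives, the export is lost, which matches the window described in the comment above.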

