Created attachment 1044954 [details]
coredump of nfs-ganesha process

Description of problem:
I tried to execute volume start twice and saw an nfs-ganesha coredump.

(gdb) bt
#0  0x00000030ba632625 in raise () from /lib64/libc.so.6
#1  0x00000030ba633e05 in abort () from /lib64/libc.so.6
#2  0x00000030ba62b74e in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000030ba62b810 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000051a2c1 in free_export ()
#5  0x00000000005070b9 in export_init ()
#6  0x0000000000534597 in proc_block ()
#7  0x000000000053526d in load_config_from_node ()
#8  0x000000000051c393 in gsh_export_addexport ()
#9  0x000000000052ed50 in dbus_message_entrypoint ()
#10 0x00000030bda1cefe in ?? () from /lib64/libdbus-1.so.3
#11 0x00000030bda10b4c in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#12 0x00000030bda10dd9 in ?? () from /lib64/libdbus-1.so.3
#13 0x000000000052f913 in gsh_dbus_thread ()
#14 0x00000030baa07a51 in start_thread () from /lib64/libpthread.so.0
#15 0x00000030ba6e896d in clone () from /lib64/libc.so.6

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-6.el6rhs.x86_64
nfs-ganesha-2.2.0-3.el6rhs.x86_64

How reproducible:
Always

Actual results:
Coredump with the backtrace shown in the Description section.

ganesha.log:
01/07/2015 13:59:32 : epoch 559386bb : nfs11 : ganesha.nfsd-13136[dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume vol2 exported at : '/'
01/07/2015 13:59:37 : epoch 559386bb : nfs11 : ganesha.nfsd-13136[dbus_heartbeat] export_commit_common :CONFIG :CRIT :Pseudo path (/vol2) is a duplicate
01/07/2015 13:59:37 : epoch 559386bb : nfs11 : ganesha.nfsd-13136[dbus_heartbeat] export_commit_common :CONFIG :CRIT :Duplicate export id = 12

Expected results:
Running volume start is not supposed to crash the nfs-ganesha process.

Additional info:
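The two CRIT messages in ganesha.log correspond to the per-volume export block that the Gluster ganesha hook submits over DBus. As an illustration only (the id, paths, and hostname below are assumptions, not taken from this report), such a block looks roughly like:

```
EXPORT {
    Export_Id = 12;          # re-submitted on the second volume start -> "Duplicate export id"
    Path = "/vol2";
    Pseudo = "/vol2";        # already registered -> "Pseudo path (/vol2) is a duplicate"
    Access_Type = RW;
    FSAL {
        Name = "GLUSTER";
        Hostname = "localhost";
        Volume = "vol2";
    }
}
```

When the second volume start re-submits this block, export_commit_common rejects it as a duplicate, and the subsequent cleanup path hits the assert in free_export(), matching the backtrace above.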
Hit the same backtrace again, but with a different scenario:

(gdb) bt
#0  0x0000003a9dc32625 in raise () from /lib64/libc.so.6
#1  0x0000003a9dc33e05 in abort () from /lib64/libc.so.6
#2  0x0000003a9dc2b74e in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003a9dc2b810 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000051a2c1 in free_export ()
#5  0x00000000005070b9 in export_init ()
#6  0x0000000000534597 in proc_block ()
#7  0x000000000053526d in load_config_from_node ()
#8  0x000000000051c393 in gsh_export_addexport ()
#9  0x000000000052ed50 in dbus_message_entrypoint ()
#10 0x0000003aa041cefe in ?? () from /lib64/libdbus-1.so.3
#11 0x0000003aa0410b4c in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#12 0x0000003aa0410dd9 in ?? () from /lib64/libdbus-1.so.3
#13 0x000000000052f913 in gsh_dbus_thread ()
#14 0x0000003a9e007a51 in start_thread () from /lib64/libpthread.so.0
#15 0x0000003a9dce896d in clone () from /lib64/libc.so.6

Steps:
Run an automated test for self-heal:
1. Create a 6x2 dist-rep volume, enable ganesha, and mount it.
2. While creating some directories/files, kill one brick process from each replica pair.
3. Allow I/O to complete and start self-heal.
4. Self-heal completes successfully.
5. Create a new volume again; the mount fails.
6. The ganesha process crashes on all the nodes.

Not raising a new bug since the backtrace is the same.
This issue is fixed as part of bug 1237053.
This issue is fixed, but the fix works only when SELinux is in permissive mode.
With the SELinux workaround, Apeksha has been able to verify the bug. Keeping the bug state the same until the SELinux fix is available in a build for RHEL 6.7.
(In reply to Soumya Koduri from comment #3) > This issue is fixed as part of bug1237053 Can this be closed as a duplicate?
In enforcing mode, after applying the workaround mentioned in bug https://bugzilla.redhat.com/show_bug.cgi?id=1239017, I don't see any AVC denials for showmount. Also tried executing volume force multiple times; the ganesha process didn't crash.
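As a minimal sketch of the permissive-mode workaround discussed in these comments (this is illustrative, not the official fix from bug 1239017; setenforce needs root and the change is runtime-only):

```shell
# Check the current SELinux mode; fall back gracefully on systems
# where the SELinux tools are not installed.
if command -v getenforce >/dev/null 2>&1; then
    selinux_mode=$(getenforce)
else
    selinux_mode="unavailable"   # SELinux tools not installed
fi
echo "SELinux mode: $selinux_mode"

# Until the SELinux policy fix lands, ganesha only works in
# permissive mode, so switch at runtime if currently enforcing.
if [ "$selinux_mode" = "Enforcing" ]; then
    setenforce 0 && echo "switched to Permissive (runtime only)" \
        || echo "setenforce failed (are you root?)"
fi
```

Note that setenforce 0 does not persist across reboots; the mode reverts to whatever /etc/selinux/config specifies.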
The fix is already available downstream, but it does not work when SELinux is in enforcing mode; it works fine in permissive mode. The fix will be fully available once we have the next SELinux build.
Doc text is edited. Please sign off to be included in Known Issues.
Updated the doc text. Kindly verify the same.
doc text looks good to me.
Attached to RHGS 3.1 Update 1 (z-stream) Tracker BZ
Since the SELinux fixes are available, moving it to ON_QA.
Verified on glusterfs-3.7.1-12.el7rhgs.x86_64
Jiffin, Could you review and sign-off the edited doc text?
Looks good to me, verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html