Bug 1325975

Summary: nfs-ganesha crashes with segfault error while doing refresh config on volume.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Shashank Raj <sraj>
Component: nfs-ganesha
Assignee: Soumya Koduri <skoduri>
Status: CLOSED ERRATA
QA Contact: Shashank Raj <sraj>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: asrivast, jthottan, kkeithle, ndevos, nlevinki, rhinduja, sashinde, skoduri, smohan
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.1.3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.9-3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1326627
Environment:
Last Closed: 2016-06-23 05:16:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1330892
Bug Blocks: 1311817, 1326627

Description Shashank Raj 2016-04-11 13:59:13 UTC
Description of problem:
nfs-ganesha crashes with segfault error while doing refresh config on volume.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1

How reproducible:
Always

Steps to Reproduce:
1. Configure a 4-node cluster with ganesha.
2. Perform the root squash test case with ID 342834, which does the following:

2016-04-11 18:34:01,242 INFO run "gluster volume create newvolume replica 2     transport tcp  dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick2 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick0/newvolume_brick3 dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick4 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick5 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick6 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick7 dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick8 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick9 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick10 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick11 --mode=script force" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is 
 volume create: newvolume: success: please start the volume to access data


2016-04-11 18:34:06,076 INFO run "gluster volume start newvolume " on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is 
 volume start: newvolume: success


2016-04-11 18:34:07,194 INFO run "gluster volume set newvolume ganesha.enable on --mode=script" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is 
 volume set: success


2016-04-11 18:34:18,089 INFO run "showmount -e localhost" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is 
 Export list for localhost:
/newvolume (everyone)


2016-04-11 18:34:18,111 INFO run Executing test -d /mnt/nfs1460379829.45 || mkdir -p /mnt/nfs1460379829.45 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,127 INFO run "test -d /mnt/nfs1460379829.45 || mkdir -p /mnt/nfs1460379829.45" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,127 INFO run Executing mount -t nfs -o vers=4 10.70.36.217:/newvolume /mnt/nfs1460379829.45 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,252 INFO run "mount -t nfs -o vers=4 10.70.36.217:/newvolume /mnt/nfs1460379829.45" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,252 INFO run Executing dd if=/dev/zero of=/mnt/nfs1460379829.45/file.dd bs=1024 count=1024 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,416 INFO run "dd if=/dev/zero of=/mnt/nfs1460379829.45/file.dd bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,416 ERROR run "dd if=/dev/zero of=/mnt/nfs1460379829.45/file.dd bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: STDERR is 
 1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0906276 s, 11.6 MB/s



2016-04-11 18:34:18,416 INFO run Executing mkdir /mnt/nfs1460379829.45/dir on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,456 INFO run "mkdir /mnt/nfs1460379829.45/dir" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,456 INFO run Executing chmod 777 /mnt/nfs1460379829.45/dir on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,479 INFO run "chmod 777 /mnt/nfs1460379829.45/dir" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0



2016-04-11 18:34:18,479 INFO run Executing sed -i  s/'Squash=.*'/'Squash="Root_squash";'/g /etc/ganesha/exports/export.newvolume.conf on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:34:18,496 INFO run "sed -i  s/'Squash=.*'/'Squash="Root_squash";'/g /etc/ganesha/exports/export.newvolume.conf" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,496 INFO set_root_squash edited the export file newvolume successfully for rootsquash on
2016-04-11 18:34:18,496 INFO run Executing /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:34:39,136 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:39,136 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is 
 Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Refresh-config completed on dhcp37-174.
Success: refresh-config completed.



2016-04-11 18:34:44,141 INFO run Executing dd if=/dev/zero of=/mnt/nfs1460379829.45/dir/file.1 bs=1024 count=1024 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:44,300 INFO run "dd if=/dev/zero of=/mnt/nfs1460379829.45/dir/file.1 bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:44,300 ERROR run "dd if=/dev/zero of=/mnt/nfs1460379829.45/dir/file.1 bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: STDERR is 
 1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0852137 s, 12.3 MB/s



2016-04-11 18:34:44,300 INFO run Executing mkdir /mnt/nfs1460379829.45/dir/dir1 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:44,337 INFO run "mkdir /mnt/nfs1460379829.45/dir/dir1" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:44,337 INFO run Executing touch /mnt/nfs1460379829.45/dir/dir1/file.1 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:44,364 INFO run "touch /mnt/nfs1460379829.45/dir/dir1/file.1" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:44,428 INFO RootSquashEnable342834 posix.stat_result(st_mode=33188, st_ino=-8765314141872974386, st_dev=39L, st_nlink=1, st_uid=4294967294, st_gid=4294967294, st_size=0, st_atime=1460341524, st_mtime=1460341524, st_ctime=1460341523)
2016-04-11 18:34:44,431 INFO RootSquashEnable342834 4294967294
2016-04-11 18:34:44,432 INFO RootSquashEnable342834 4294967294
2016-04-11 18:34:44,435 INFO RootSquashEnable342834 file.1 has got the correct uid/gid post root-squash is put to on




2016-04-11 18:34:44,435 INFO run Executing service glusterd stop on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:34:44,548 INFO run "service glusterd stop" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:44,549 ERROR run "service glusterd stop" on dhcp37-180.lab.eng.blr.redhat.com: STDERR is 
 Redirecting to /bin/systemctl stop  glusterd.service

2016-04-11 18:34:54,559 INFO run Executing service glusterd start on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:35:00,530 INFO run "service glusterd start" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:35:00,530 ERROR run "service glusterd start" on dhcp37-180.lab.eng.blr.redhat.com: STDERR is 
 Redirecting to /bin/systemctl start  glusterd.service



2016-04-11 18:53:00,671 INFO RootSquashEnable342834 posix.stat_result(st_mode=33188, st_ino=-8765314141872974386, st_dev=39L, st_nlink=1, st_uid=4294967294, st_gid=4294967294, st_size=0, st_atime=1460341524, st_mtime=1460341524, st_ctime=1460341523)
2016-04-11 18:53:00,713 INFO RootSquashEnable342834 4294967294
2016-04-11 18:53:00,714 INFO RootSquashEnable342834 4294967294
2016-04-11 18:53:00,716 INFO RootSquashEnable342834 file.1 has got the correct uid/gid post root-squash is put to on



2016-04-11 18:53:00,716 INFO run Executing sed -i  s/'Squash=.*'/'Squash="No_root_squash";'/g /etc/ganesha/exports/export.newvolume.conf on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:53:00,734 INFO run "sed -i  s/'Squash=.*'/'Squash="No_root_squash";'/g /etc/ganesha/exports/export.newvolume.conf" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:53:00,734 INFO set_root_squash edited the export file newvolume successfully for rootsquash off
2016-04-11 18:53:00,734 INFO run Executing /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:53:43,711 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 1
2016-04-11 18:53:43,711 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is 
 Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Refresh-config completed on dhcp37-174.
Error: refresh-config failed on localhost.



Observe that refresh-config fails on localhost and the nfs-ganesha process crashes with a segmentation fault.


The observed backtrace is as follows:

(gdb) bt
#0  __inode_ref_reduce_by_n (inode=inode@entry=0x7f54c8002b78, nref=nref@entry=0) at inode.c:686
#1  0x00007f54e1ddea2a in inode_table_destroy (inode_table=0x7f54c8002b50) at inode.c:1794
#2  0x00007f54e1ddeb21 in inode_table_destroy_all (ctx=ctx@entry=0x7f54dc007050) at inode.c:1725
#3  0x00007f54e206a07f in pub_glfs_fini (fs=0x7f54dc06c040) at glfs.c:1133
#4  0x00007f54e2491051 in export_release (exp_hdl=0x7f54dc06bf30) at /usr/src/debug/nfs-ganesha-2.3.1/src/FSAL/FSAL_GLUSTER/export.c:84
#5  0x00007f54f6fcbacb in free_export_resources (export=0x7f54dc038eb8) at /usr/src/debug/nfs-ganesha-2.3.1/src/support/exports.c:1519
#6  0x00007f54f6fda9e3 in free_export (export=0x7f54dc038eb8) at /usr/src/debug/nfs-ganesha-2.3.1/src/support/export_mgr.c:252
#7  0x00007f54f6fdc6a4 in gsh_export_removeexport (args=<optimized out>, reply=<optimized out>, error=0x7f54e3e9e2e0)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/support/export_mgr.c:1077
#8  0x00007f54f6fe95e9 in dbus_message_entrypoint (conn=0x7f54f75da7a0, msg=0x7f54f75dacd0, user_data=<optimized out>)
    at /usr/src/debug/nfs-ganesha-2.3.1/src/dbus/dbus_server.c:518
#9  0x00007f54f688bc86 in _dbus_object_tree_dispatch_and_unlock () from /lib64/libdbus-1.so.3
#10 0x00007f54f687de49 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#11 0x00007f54f687e0e2 in _dbus_connection_read_write_dispatch () from /lib64/libdbus-1.so.3
#12 0x00007f54f6fea640 in gsh_dbus_thread (arg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.3.1/src/dbus/dbus_server.c:743
#13 0x00007f54f54a7dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f54f4b761cd in clone () from /lib64/libc.so.6
(gdb) l
681	
682	        if (!nref)
683	                inode->ref = 0;
684	
685	        if (!inode->ref) {
686	                inode->table->active_size--;
687	
688	                if (inode->nlookup)
689	                        __inode_passivate (inode);
690	                else
(gdb) p inode->table
$1 = (inode_table_t *) 0x36e9



Ganesha service status on the mounted node:

[root@dhcp37-180 ~]# service nfs-ganesha status -l
Redirecting to /bin/systemctl status  -l nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Mon 2016-04-11 04:47:03 IST; 2min 25s ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 29889 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
 Main PID: 23858 (code=killed, signal=SEGV)

Apr 11 04:34:28 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[23858]: [main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Apr 11 04:35:28 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[23858]: [reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
Apr 11 04:38:48 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:40:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:41:48 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:45:56 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:46:26 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:47:03 dhcp37-180.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Apr 11 04:47:03 dhcp37-180.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha.service entered failed state.
Apr 11 04:47:03 dhcp37-180.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.service failed.


Message in /var/log/messages:


[ 2504.300515] ganesha.nfsd[23886]: segfault at 3759 ip 00007fdc715b5c61 sp 00007fdc73677040 error 6 in libglusterfs.so.0.0.1[7fdc7156f000+c6000]



Actual results:

nfs-ganesha crashes with segfault error while doing refresh config on volume.

Expected results:

No crash should be observed.

Additional info:

The issue can be easily reproduced by running test case ID 342834 under rootsquash in distaf.

Comment 2 Soumya Koduri 2016-04-12 17:52:10 UTC
I could reproduce the issue - 

Breakpoint 1, inode_table_destroy (inode_table=0x7f523802db80) at inode.c:1729
1729	        inode_t  *tmp = NULL, *trav = NULL;
(gdb) b __inode_retire
Breakpoint 2 at 0x7f525a57bba1: file inode.c, line 430.
(gdb) p inode_table->active
$1 = {next = 0x7f5240001444, prev = 0x7f5238164ec4}
(gdb) p inode_table->active_size
$2 = 3
(gdb) p &inode_table->active
$3 = (struct list_head *) 0x7f523802dbe0
(gdb) p/x &inode_table->active->next-104
$4 = 0x7f523802d8a0
(gdb) p/x &inode_table->active->next->next-104
$5 = 0x7f5240001104
(gdb) p/x &inode_table->active->next->next->next-104
$6 = 0xb96ed4
(gdb) p/x &inode_table->active->next->next->next->next-104
$7 = 0x7f5238164b84
(gdb) p/x &inode_table->active->next->next->next->next->next-104
$8 = 0x7f523802d8a0
(gdb) p/x &inode_table->active->next->next->next->next
$9 = 0x7f5238164ec4
(gdb) p *(inode_t *)$4
$10 = {table = 0x7f5238019000, 
  gfid = "\276\272\376\312", '\000' <repeats 11 times>, lock = 0, nlookup = 0, 
  fd_count = 0, ref = 0, ia_type = IA_INVAL, fd_list = {next = 0x0, 
    prev = 0x7f522002a3a0}, dentry_list = {next = 0x0, prev = 0x100000001}, 
  hash = {next = 0x3802d92400000001, prev = 0x3802d92c00007f52}, list = {
    next = 0x7f52, prev = 0x0}, _ctx = 0x100000000}
(gdb) p &inode_table->active
$11 = (struct list_head *) 0x7f523802dbe0
(gdb) p &inode_table->active->next->list-104
There is no member named list.
(gdb) p &inode_table->active->next
$12 = (struct list_head **) 0x7f523802dbe0
(gdb) p inode_table->active->next
$13 = (struct list_head *) 0x7f5240001444
(gdb) p inode_table->active->next-104
$14 = (struct list_head *) 0x7f5240000dc4
(gdb) p *(inode_t *)inode_table->active->next-104
Structure has no component named operator-.
(gdb) p *(inode_t *)(inode_table->active->next-104)
$15 = {table = 0x0, gfid = '\000' <repeats 15 times>, lock = 0, nlookup = 0, 
  fd_count = 0, ref = 0, ia_type = IA_INVAL, fd_list = {next = 0x0, prev = 0x0}, 
  dentry_list = {next = 0x0, prev = 0x0}, hash = {next = 0x0, prev = 0x0}, 
  list = {next = 0x0, prev = 0x0}, _ctx = 0x0}
(gdb) p inode_table
$16 = (inode_table_t *) 0x7f523802db80
(gdb) p inode_table->active
$17 = {next = 0x7f5240001444, prev = 0x7f5238164ec4}
(gdb) p inode_table->active_size
$18 = 3
(gdb) p inode_table->active->next
$19 = (struct list_head *) 0x7f5240001444
(gdb) p inode_table->active->next->next
$20 = (struct list_head *) 0xb97214
(gdb) p inode_table->active->next->next->next
$21 = (struct list_head *) 0x7f5238164ec4
(gdb) p/x *(inode_t *) $19-104
Structure has no component named operator-.
(gdb) p/x *(inode_t *)($19-104)
$22 = {table = 0x0, gfid = {0x0 <repeats 16 times>}, lock = 0x0, nlookup = 0x0, 
  fd_count = 0x0, ref = 0x0, ia_type = 0x0, fd_list = {next = 0x0, prev = 0x0}, 
  dentry_list = {next = 0x0, prev = 0x0}, hash = {next = 0x0, prev = 0x0}, 
  list = {next = 0x0, prev = 0x0}, _ctx = 0x0}
(gdb) b __inode_retire
Note: breakpoint 2 also set at pc 0x7f525a57bba1.
Breakpoint 3 at 0x7f525a57bba1: file inode.c, line 430.
(gdb) p &$17
$23 = (struct list_head *) 0x7f523802dbe0
(gdb) p inode_table->active->next->next->next->next
$24 = (struct list_head *) 0x7f523802dbe0
(gdb) p/x *(inode_t *)(0x7f5240001444-104)
$25 = {table = 0x7f523802db80, gfid = {0xad, 0x7e, 0xb1, 0x60, 0x3, 0x1b, 0x4f, 
    0x8e, 0x99, 0xf, 0x38, 0x2e, 0xe4, 0xb8, 0xf2, 0x93}, lock = 0x1, 
  nlookup = 0x0, fd_count = 0x1, ref = 0x1, ia_type = 0x1, fd_list = {
    next = 0x7f524000177c, prev = 0x7f524000177c}, dentry_list = {
    next = 0x7f52400019ac, prev = 0x7f52400019ac}, hash = {next = 0x7f5238120700, 
    prev = 0x7f5238120700}, list = {next = 0xb97214, prev = 0x7f523802dbe0}, 
  _ctx = 0x7f52400014b0}
(gdb) p/x *(inode_t *)(0xb97214-104)
$26 = {table = 0x7f523802db80, gfid = {0x6d, 0xa, 0x7a, 0x59, 0x76, 0xf4, 0x40, 
    0x6f, 0xaf, 0x35, 0x3b, 0x8e, 0xaf, 0x78, 0xec, 0xd6}, lock = 0x1, 
  nlookup = 0x0, fd_count = 0x0, ref = 0x1, ia_type = 0x2, fd_list = {
    next = 0xb971e4, prev = 0xb971e4}, dentry_list = {next = 0xb96f5c, 
    prev = 0xb96f5c}, hash = {next = 0x7f523811ab30, prev = 0x7f523811ab30}, 
  list = {next = 0x7f5238164ec4, prev = 0x7f5240001444}, _ctx = 0xbc9b40}
(gdb) p/x *(inode_t *)(0x7f5238164ec4-104)
$27 = {table = 0x7f523802db80, gfid = {0x0 <repeats 15 times>, 0x1}, lock = 0x1, 
  nlookup = 0x0, fd_count = 0x0, ref = 0x1, ia_type = 0x2, fd_list = {
    next = 0x7f5238164e94, prev = 0x7f5238164e94}, dentry_list = {
    next = 0x7f5238164ea4, prev = 0x7f5238164ea4}, hash = {next = 0x7f523802dde0, 
    prev = 0x7f523802dde0}, list = {next = 0x7f523802dbe0, prev = 0xb97214}, 
  _ctx = 0x7f5238164f30}
(gdb) p/x (0x7f5238164ec4-104)
$28 = 0x7f5238164e5c
(gdb) p/x (0x7f5238164ec4-104)
$29 = 0x7f5238164e5c
(gdb) p/x (0xb97214-104)
$30 = 0xb971ac
(gdb) p/x (0x7f5240001444-104)
$31 = 0x7f52400013dc
(gdb) n
1731	        if (inode_table == NULL)
(gdb) 
1756	        pthread_mutex_lock (&inode_table->lock);
(gdb) 
1768	                while (!list_empty (&inode_table->lru)) {
(gdb) 
1776	                list_for_each_entry_safe (trav, tmp, &inode_table->active,
(gdb) 
1782	                        if (trav != inode_table->root)
(gdb) p trav
$32 = (inode_t *) 0x7f52400013dc
(gdb) p tmp
$33 = (inode_t *) 0xb971ac
(gdb) p *trav
$34 = {table = 0x7f523802db80, 
  gfid = "\255~\261`\003\033O\216\231\017\070.\344\270", <incomplete sequence \362\223>, lock = 1, nlookup = 0, fd_count = 1, ref = 1, ia_type = IA_IFREG, 
  fd_list = {next = 0x7f524000177c, prev = 0x7f524000177c}, dentry_list = {
    next = 0x7f52400019ac, prev = 0x7f52400019ac}, hash = {next = 0x7f5238120700, 
    prev = 0x7f5238120700}, list = {next = 0xb97214, prev = 0x7f523802dbe0}, 
  _ctx = 0x7f52400014b0}
(gdb) p *tmp
$35 = {table = 0x7f523802db80, 
  gfid = "m\nzYv\364@o\257\065;\216\257x\354", <incomplete sequence \326>, 
  lock = 1, nlookup = 0, fd_count = 0, ref = 1, ia_type = IA_IFDIR, fd_list = {
    next = 0xb971e4, prev = 0xb971e4}, dentry_list = {next = 0xb96f5c, 
    prev = 0xb96f5c}, hash = {next = 0x7f523811ab30, prev = 0x7f523811ab30}, 
  list = {next = 0x7f5238164ec4, prev = 0x7f5240001444}, _ctx = 0xbc9b40}
(gdb) c
Continuing.
[Thread 0x7f52345fb700 (LWP 5032) exited]

Breakpoint 2, __inode_retire (inode=0x7f52400013dc) at inode.c:430
430	        dentry_t      *dentry = NULL;
(gdb) c
Continuing.

Breakpoint 2, __inode_retire (inode=0xb971ac) at inode.c:430
430	        dentry_t      *dentry = NULL;
(gdb) bt
#0  __inode_retire (inode=0xb971ac) at inode.c:430
#1  0x00007f525a57bd55 in __inode_unref (inode=0xb971ac) at inode.c:473
#2  0x00007f525a57b001 in __dentry_unset (dentry=0x7f52400019ac) at inode.c:141
#3  0x00007f525a57bc78 in __inode_retire (inode=0x7f52400013dc) at inode.c:445
#4  0x00007f525a57c367 in __inode_ref_reduce_by_n (inode=0x7f52400013dc, nref=0)
    at inode.c:686
#5  0x00007f525a57e7d2 in inode_table_destroy (inode_table=0x7f523802db80)
    at inode.c:1789
#6  0x00007f525a57e63c in inode_table_destroy_all (ctx=0x7f5220003c00)
    at inode.c:1720
#7  0x00007f525e4e2e42 in pub_glfs_fini (fs=0x7f5220003a80) at glfs.c:1158
#8  0x00007f525e6c6281 in export_release (exp_hdl=0x7f5220003960)
    at /home/guest/Documents/workspace/nfs-ganesha/src/FSAL/FSAL_GLUSTER/export.c:86
#9  0x000000000050c317 in free_export_resources (export=0x7f5220000ec8)
    at /home/guest/Documents/workspace/nfs-ganesha/src/support/exports.c:1497
#10 0x000000000051d5a8 in free_export (export=0x7f5220000ec8)
    at /home/guest/Documents/workspace/nfs-ganesha/src/support/export_mgr.c:250
#11 0x000000000051e93d in put_gsh_export (export=0x7f5220000ec8)
    at /home/guest/Documents/workspace/nfs-ganesha/src/support/export_mgr.c:631
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) l
425	
426	
427	static void
428	__inode_retire (inode_t *inode)
429	{
430	        dentry_t      *dentry = NULL;
431	        dentry_t      *t = NULL;
432	
433	        if (!inode) {
434	                gf_msg_callingfn (THIS->name, GF_LOG_WARNING, 0,
(gdb) 
435	                                  LG_MSG_INODE_NOT_FOUND, "inode not found");
436	                return;
437	        }
438	
439	        list_move_tail (&inode->list, &inode->table->purge);
440	        inode->table->purge_size++;
441	
442	        __inode_unhash (inode);
443	
444	        list_for_each_entry_safe (dentry, t, &inode->dentry_list, inode_list) {
(gdb) 

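As can be seen above, the 'tmp' inode entry is moved from the active list to the purge list before the next iteration starts: retiring 'trav' (0x7f52400013dc) calls __dentry_unset, which in turn unrefs and retires inode 0xb971ac, the very entry that list_for_each_entry_safe had already fetched as 'tmp'. Since inode entries can move from one list to another between iterations, it's best not to fetch them early. So the fix can be to use 'list_each_entry', which is safe here because no other thread will be accessing these inodes.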

Comment 3 Soumya Koduri 2016-04-13 09:12:27 UTC
Fix posted upstream - 

http://review.gluster.org/13987

Comment 5 Shashank Raj 2016-04-20 18:04:47 UTC
While executing the gluster operations automation suite, I also hit the segfault below, and ganesha crashes on the mounted node.

The following backtrace is observed:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f6de71fb700 (LWP 20592)]
__inode_ref_reduce_by_n (inode=inode@entry=0x7f6d71e9b5a8, nref=nref@entry=0) at inode.c:686
686                     inode->table->active_size--;
(gdb) bt
#0  __inode_ref_reduce_by_n (inode=inode@entry=0x7f6d71e9b5a8, nref=nref@entry=0) at inode.c:686
#1  0x00007f6de4865a2a in inode_table_destroy (inode_table=0x7f6d71e9b580) at inode.c:1794
#2  0x00007f6de4865b21 in inode_table_destroy_all (ctx=ctx@entry=0x7f6de0007f90) at inode.c:1725
#3  0x00007f6de4af107f in pub_glfs_fini (fs=0x7f6de0007e30) at glfs.c:1133
#4  0x00007f6de4f18051 in export_release () from /usr/lib64/ganesha/libfsalgluster.so
#5  0x00007f6dfa327acb in free_export_resources ()
#6  0x00007f6dfa3369e3 in free_export ()
#7  0x00007f6dfa3386a4 in gsh_export_removeexport ()
#8  0x00007f6dfa3455e9 in dbus_message_entrypoint ()
#9  0x00007f6df9be7c86 in _dbus_object_tree_dispatch_and_unlock () from /lib64/libdbus-1.so.3
#10 0x00007f6df9bd9e49 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#11 0x00007f6df9bda0e2 in _dbus_connection_read_write_dispatch () from /lib64/libdbus-1.so.3
#12 0x00007f6dfa346640 in gsh_dbus_thread ()
#13 0x00007f6df8803dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f6df7ed21cd in clone () from /lib64/libc.so.6

Comment 8 Shashank Raj 2016-05-03 12:27:46 UTC
Verified this bug with the latest glusterfs-3.7.9-3 build; after running the automated rootsquash regression cases, this issue was not observed.

The automation test case with ID 342834, which originally reproduced this issue, now works fine, and no refresh-config failure or crash is seen. Verified on both v3 and v4 ganesha mounts.

Based on the above observations, moving this bug to the verified state.

Comment 10 errata-xmlrpc 2016-06-23 05:16:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240