Description of problem:
nfs-ganesha crashes with a segfault while running refresh-config on a volume.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1

How reproducible:
Always

Steps to Reproduce:
1. Configure a 4-node cluster with ganesha.
2. Run the root squash test case with ID 342834, which does the following:

2016-04-11 18:34:01,242 INFO run "gluster volume create newvolume replica 2 transport tcp dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick2 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick0/newvolume_brick3 dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick4 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick5 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick6 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick7 dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick8 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick9 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick10 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick11 --mode=script force" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is
volume create: newvolume: success: please start the volume to access data
2016-04-11 18:34:06,076 INFO run "gluster volume start newvolume " on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is
volume start: newvolume: success
2016-04-11 18:34:07,194 INFO run "gluster volume set newvolume ganesha.enable on --mode=script" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is
volume set: success
2016-04-11 18:34:18,089 INFO run "showmount -e localhost" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is
Export list for localhost:
/newvolume (everyone)
2016-04-11 18:34:18,111 INFO run Executing test -d /mnt/nfs1460379829.45 || mkdir -p /mnt/nfs1460379829.45 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,127 INFO run "test -d /mnt/nfs1460379829.45 || mkdir -p /mnt/nfs1460379829.45" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,127 INFO run Executing mount -t nfs -o vers=4 10.70.36.217:/newvolume /mnt/nfs1460379829.45 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,252 INFO run "mount -t nfs -o vers=4 10.70.36.217:/newvolume /mnt/nfs1460379829.45" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,252 INFO run Executing dd if=/dev/zero of=/mnt/nfs1460379829.45/file.dd bs=1024 count=1024 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,416 INFO run "dd if=/dev/zero of=/mnt/nfs1460379829.45/file.dd bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,416 ERROR run "dd if=/dev/zero of=/mnt/nfs1460379829.45/file.dd bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: STDERR is
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0906276 s, 11.6 MB/s
2016-04-11 18:34:18,416 INFO run Executing mkdir /mnt/nfs1460379829.45/dir on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,456 INFO run "mkdir /mnt/nfs1460379829.45/dir" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,456 INFO run Executing chmod 777 /mnt/nfs1460379829.45/dir on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:18,479 INFO run "chmod 777 /mnt/nfs1460379829.45/dir" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,479 INFO run Executing sed -i s/'Squash=.*'/'Squash="Root_squash";'/g /etc/ganesha/exports/export.newvolume.conf on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:34:18,496 INFO run "sed -i s/'Squash=.*'/'Squash="Root_squash";'/g /etc/ganesha/exports/export.newvolume.conf" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:18,496 INFO set_root_squash edited the export file newvolume successfully for rootsquash on
2016-04-11 18:34:18,496 INFO run Executing /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:34:39,136 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:39,136 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is
Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Refresh-config completed on dhcp37-174.
Success: refresh-config completed.
2016-04-11 18:34:44,141 INFO run Executing dd if=/dev/zero of=/mnt/nfs1460379829.45/dir/file.1 bs=1024 count=1024 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:44,300 INFO run "dd if=/dev/zero of=/mnt/nfs1460379829.45/dir/file.1 bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:44,300 ERROR run "dd if=/dev/zero of=/mnt/nfs1460379829.45/dir/file.1 bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: STDERR is
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0852137 s, 12.3 MB/s
2016-04-11 18:34:44,300 INFO run Executing mkdir /mnt/nfs1460379829.45/dir/dir1 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:44,337 INFO run "mkdir /mnt/nfs1460379829.45/dir/dir1" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:44,337 INFO run Executing touch /mnt/nfs1460379829.45/dir/dir1/file.1 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-11 18:34:44,364 INFO run "touch /mnt/nfs1460379829.45/dir/dir1/file.1" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:44,428 INFO RootSquashEnable342834 posix.stat_result(st_mode=33188, st_ino=-8765314141872974386, st_dev=39L, st_nlink=1, st_uid=4294967294, st_gid=4294967294, st_size=0, st_atime=1460341524, st_mtime=1460341524, st_ctime=1460341523)
2016-04-11 18:34:44,431 INFO RootSquashEnable342834 4294967294
2016-04-11 18:34:44,432 INFO RootSquashEnable342834 4294967294
2016-04-11 18:34:44,435 INFO RootSquashEnable342834 file.1 has got the correct uid/gid post root-squash is put to on
2016-04-11 18:34:44,435 INFO run Executing service glusterd stop on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:34:44,548 INFO run "service glusterd stop" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:34:44,549 ERROR run "service glusterd stop" on dhcp37-180.lab.eng.blr.redhat.com: STDERR is
Redirecting to /bin/systemctl stop glusterd.service
2016-04-11 18:34:54,559 INFO run Executing service glusterd start on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:35:00,530 INFO run "service glusterd start" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:35:00,530 ERROR run "service glusterd start" on dhcp37-180.lab.eng.blr.redhat.com: STDERR is
Redirecting to /bin/systemctl start glusterd.service
2016-04-11 18:53:00,671 INFO RootSquashEnable342834 posix.stat_result(st_mode=33188, st_ino=-8765314141872974386, st_dev=39L, st_nlink=1, st_uid=4294967294, st_gid=4294967294, st_size=0, st_atime=1460341524, st_mtime=1460341524, st_ctime=1460341523)
2016-04-11 18:53:00,713 INFO RootSquashEnable342834 4294967294
2016-04-11 18:53:00,714 INFO RootSquashEnable342834 4294967294
2016-04-11 18:53:00,716 INFO RootSquashEnable342834 file.1 has got the correct uid/gid post root-squash is put to on
2016-04-11 18:53:00,716 INFO run Executing sed -i s/'Squash=.*'/'Squash="No_root_squash";'/g /etc/ganesha/exports/export.newvolume.conf on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:53:00,734 INFO run "sed -i s/'Squash=.*'/'Squash="No_root_squash";'/g /etc/ganesha/exports/export.newvolume.conf" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-11 18:53:00,734 INFO set_root_squash edited the export file newvolume successfully for rootsquash off
2016-04-11 18:53:00,734 INFO run Executing /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume on dhcp37-180.lab.eng.blr.redhat.com
2016-04-11 18:53:43,711 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 1
2016-04-11 18:53:43,711 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is
Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Refresh-config completed on dhcp37-174.
Error: refresh-config failed on localhost.

Observe that refresh-config fails on localhost and the nfs-ganesha process crashes with a segmentation fault.

The trace observed is as below:

(gdb) bt
#0  __inode_ref_reduce_by_n (inode=inode@entry=0x7f54c8002b78, nref=nref@entry=0) at inode.c:686
#1  0x00007f54e1ddea2a in inode_table_destroy (inode_table=0x7f54c8002b50) at inode.c:1794
#2  0x00007f54e1ddeb21 in inode_table_destroy_all (ctx=ctx@entry=0x7f54dc007050) at inode.c:1725
#3  0x00007f54e206a07f in pub_glfs_fini (fs=0x7f54dc06c040) at glfs.c:1133
#4  0x00007f54e2491051 in export_release (exp_hdl=0x7f54dc06bf30) at /usr/src/debug/nfs-ganesha-2.3.1/src/FSAL/FSAL_GLUSTER/export.c:84
#5  0x00007f54f6fcbacb in free_export_resources (export=0x7f54dc038eb8) at /usr/src/debug/nfs-ganesha-2.3.1/src/support/exports.c:1519
#6  0x00007f54f6fda9e3 in free_export (export=0x7f54dc038eb8) at /usr/src/debug/nfs-ganesha-2.3.1/src/support/export_mgr.c:252
#7  0x00007f54f6fdc6a4 in gsh_export_removeexport (args=<optimized out>, reply=<optimized out>, error=0x7f54e3e9e2e0) at /usr/src/debug/nfs-ganesha-2.3.1/src/support/export_mgr.c:1077
#8  0x00007f54f6fe95e9 in dbus_message_entrypoint (conn=0x7f54f75da7a0, msg=0x7f54f75dacd0, user_data=<optimized out>) at /usr/src/debug/nfs-ganesha-2.3.1/src/dbus/dbus_server.c:518
#9  0x00007f54f688bc86 in _dbus_object_tree_dispatch_and_unlock () from /lib64/libdbus-1.so.3
#10 0x00007f54f687de49 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#11 0x00007f54f687e0e2 in _dbus_connection_read_write_dispatch () from /lib64/libdbus-1.so.3
#12 0x00007f54f6fea640 in gsh_dbus_thread (arg=<optimized out>) at /usr/src/debug/nfs-ganesha-2.3.1/src/dbus/dbus_server.c:743
#13 0x00007f54f54a7dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f54f4b761cd in clone () from /lib64/libc.so.6
(gdb) l
681
682             if (!nref)
683                     inode->ref = 0;
684
685             if (!inode->ref) {
686                     inode->table->active_size--;
687
688                     if (inode->nlookup)
689                             __inode_passivate (inode);
690                     else
(gdb) p inode->table
$1 = (inode_table_t *) 0x36e9

ganesha service status on the mounted node:

[root@dhcp37-180 ~]# service nfs-ganesha status -l
Redirecting to /bin/systemctl status -l nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Mon 2016-04-11 04:47:03 IST; 2min 25s ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 29889 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
 Main PID: 23858 (code=killed, signal=SEGV)

Apr 11 04:34:28 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[23858]: [main] nfs_start :NFS STARTUP :EVENT :-------------------------------------------------
Apr 11 04:35:28 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[23858]: [reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
Apr 11 04:38:48 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:40:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:41:48 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:45:56 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:46:26 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[23858]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume testvolume exported at : '/'
Apr 11 04:47:03 dhcp37-180.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.service: main process exited, code=killed, status=11/SEGV
Apr 11 04:47:03 dhcp37-180.lab.eng.blr.redhat.com systemd[1]: Unit nfs-ganesha.service entered failed state.
Apr 11 04:47:03 dhcp37-180.lab.eng.blr.redhat.com systemd[1]: nfs-ganesha.service failed.

Message in /var/log/messages:

[ 2504.300515] ganesha.nfsd[23886]: segfault at 3759 ip 00007fdc715b5c61 sp 00007fdc73677040 error 6 in libglusterfs.so.0.0.1[7fdc7156f000+c6000]

Actual results:
nfs-ganesha crashes with a segfault while running refresh-config on the volume.

Expected results:
No crash should be observed.

Additional info:
The issue can be easily reproduced by running case ID 342834 under rootsquash in distaf.
I could reproduce the issue -

Breakpoint 1, inode_table_destroy (inode_table=0x7f523802db80) at inode.c:1729
1729            inode_t *tmp = NULL, *trav = NULL;
(gdb) b __inode_retire
Breakpoint 2 at 0x7f525a57bba1: file inode.c, line 430.
(gdb) p inode_table->active
$1 = {next = 0x7f5240001444, prev = 0x7f5238164ec4}
(gdb) p inode_table->active_size
$2 = 3
(gdb) p &inode_table->active
$3 = (struct list_head *) 0x7f523802dbe0
(gdb) p/x &inode_table->active->next-104
$4 = 0x7f523802d8a0
(gdb) p/x &inode_table->active->next->next-104
$5 = 0x7f5240001104
(gdb) p/x &inode_table->active->next->next->next-104
$6 = 0xb96ed4
(gdb) p/x &inode_table->active->next->next->next->next-104
$7 = 0x7f5238164b84
(gdb) p/x &inode_table->active->next->next->next->next->next-104
$8 = 0x7f523802d8a0
(gdb) p/x &inode_table->active->next->next->next->next
$9 = 0x7f5238164ec4
(gdb) p *(inode_t *)$4
$10 = {table = 0x7f5238019000, gfid = "\276\272\376\312", '\000' <repeats 11 times>,
  lock = 0, nlookup = 0, fd_count = 0, ref = 0, ia_type = IA_INVAL,
  fd_list = {next = 0x0, prev = 0x7f522002a3a0},
  dentry_list = {next = 0x0, prev = 0x100000001},
  hash = {next = 0x3802d92400000001, prev = 0x3802d92c00007f52}, list = {
    next = 0x7f52, prev = 0x0}, _ctx = 0x100000000}
(gdb) p &inode_table->active
$11 = (struct list_head *) 0x7f523802dbe0
(gdb) p &inode_table->active->next->list-104
There is no member named list.
(gdb) p &inode_table->active->next
$12 = (struct list_head **) 0x7f523802dbe0
(gdb) p inode_table->active->next
$13 = (struct list_head *) 0x7f5240001444
(gdb) p inode_table->active->next-104
$14 = (struct list_head *) 0x7f5240000dc4
(gdb) p *(inode_t *)inode_table->active->next-104
Structure has no component named operator-.
(gdb) p *(inode_t *)(inode_table->active->next-104)
$15 = {table = 0x0, gfid = '\000' <repeats 15 times>, lock = 0, nlookup = 0,
  fd_count = 0, ref = 0, ia_type = IA_INVAL, fd_list = {next = 0x0, prev = 0x0},
  dentry_list = {next = 0x0, prev = 0x0}, hash = {next = 0x0, prev = 0x0},
  list = {next = 0x0, prev = 0x0}, _ctx = 0x0}
(gdb) p inode_table
$16 = (inode_table_t *) 0x7f523802db80
(gdb) p inode_table->active
$17 = {next = 0x7f5240001444, prev = 0x7f5238164ec4}
(gdb) p inode_table->active_size
$18 = 3
(gdb) p inode_table->active->next
$19 = (struct list_head *) 0x7f5240001444
(gdb) p inode_table->active->next->next
$20 = (struct list_head *) 0xb97214
(gdb) p inode_table->active->next->next->next
$21 = (struct list_head *) 0x7f5238164ec4
(gdb) p/x *(inode_t *) $19-104
Structure has no component named operator-.
(gdb) p/x *(inode_t *)($19-104)
$22 = {table = 0x0, gfid = {0x0 <repeats 16 times>}, lock = 0x0, nlookup = 0x0,
  fd_count = 0x0, ref = 0x0, ia_type = 0x0, fd_list = {next = 0x0, prev = 0x0},
  dentry_list = {next = 0x0, prev = 0x0}, hash = {next = 0x0, prev = 0x0},
  list = {next = 0x0, prev = 0x0}, _ctx = 0x0}
(gdb) b __inode_retire
Note: breakpoint 2 also set at pc 0x7f525a57bba1.
Breakpoint 3 at 0x7f525a57bba1: file inode.c, line 430.
(gdb) p &$17
$23 = (struct list_head *) 0x7f523802dbe0
(gdb) p inode_table->active->next->next->next->next
$24 = (struct list_head *) 0x7f523802dbe0
(gdb) p/x *(inode_t *)(0x7f5240001444-104)
$25 = {table = 0x7f523802db80, gfid = {0xad, 0x7e, 0xb1, 0x60, 0x3, 0x1b, 0x4f,
    0x8e, 0x99, 0xf, 0x38, 0x2e, 0xe4, 0xb8, 0xf2, 0x93}, lock = 0x1,
  nlookup = 0x0, fd_count = 0x1, ref = 0x1, ia_type = 0x1, fd_list = {
    next = 0x7f524000177c, prev = 0x7f524000177c}, dentry_list = {
    next = 0x7f52400019ac, prev = 0x7f52400019ac},
  hash = {next = 0x7f5238120700, prev = 0x7f5238120700},
  list = {next = 0xb97214, prev = 0x7f523802dbe0}, _ctx = 0x7f52400014b0}
(gdb) p/x *(inode_t *)(0xb97214-104)
$26 = {table = 0x7f523802db80, gfid = {0x6d, 0xa, 0x7a, 0x59, 0x76, 0xf4, 0x40,
    0x6f, 0xaf, 0x35, 0x3b, 0x8e, 0xaf, 0x78, 0xec, 0xd6}, lock = 0x1,
  nlookup = 0x0, fd_count = 0x0, ref = 0x1, ia_type = 0x2, fd_list = {
    next = 0xb971e4, prev = 0xb971e4},
  dentry_list = {next = 0xb96f5c, prev = 0xb96f5c},
  hash = {next = 0x7f523811ab30, prev = 0x7f523811ab30},
  list = {next = 0x7f5238164ec4, prev = 0x7f5240001444}, _ctx = 0xbc9b40}
(gdb) p/x *(inode_t *)(0x7f5238164ec4-104)
$27 = {table = 0x7f523802db80, gfid = {0x0 <repeats 15 times>, 0x1}, lock = 0x1,
  nlookup = 0x0, fd_count = 0x0, ref = 0x1, ia_type = 0x2, fd_list = {
    next = 0x7f5238164e94, prev = 0x7f5238164e94}, dentry_list = {
    next = 0x7f5238164ea4, prev = 0x7f5238164ea4},
  hash = {next = 0x7f523802dde0, prev = 0x7f523802dde0},
  list = {next = 0x7f523802dbe0, prev = 0xb97214}, _ctx = 0x7f5238164f30}
(gdb) p/x (0x7f5238164ec4-104)
$28 = 0x7f5238164e5c
(gdb) p/x (0x7f5238164ec4-104)
$29 = 0x7f5238164e5c
(gdb) p/x (0xb97214-104)
$30 = 0xb971ac
(gdb) p/x (0x7f5240001444-104)
$31 = 0x7f52400013dc
(gdb) n
1731            if (inode_table == NULL)
(gdb)
1756            pthread_mutex_lock (&inode_table->lock);
(gdb)
1768            while (!list_empty (&inode_table->lru)) {
(gdb)
1776            list_for_each_entry_safe (trav, tmp, &inode_table->active,
(gdb)
1782                    if (trav != inode_table->root)
(gdb) p trav
$32 = (inode_t *) 0x7f52400013dc
(gdb) p tmp
$33 = (inode_t *) 0xb971ac
(gdb) p *trav
$34 = {table = 0x7f523802db80,
  gfid = "\255~\261`\003\033O\216\231\017\070.\344\270", <incomplete sequence \362\223>,
  lock = 1, nlookup = 0, fd_count = 1, ref = 1, ia_type = IA_IFREG,
  fd_list = {next = 0x7f524000177c, prev = 0x7f524000177c}, dentry_list = {
    next = 0x7f52400019ac, prev = 0x7f52400019ac},
  hash = {next = 0x7f5238120700, prev = 0x7f5238120700},
  list = {next = 0xb97214, prev = 0x7f523802dbe0}, _ctx = 0x7f52400014b0}
(gdb) p *tmp
$35 = {table = 0x7f523802db80,
  gfid = "m\nzYv\364@o\257\065;\216\257x\354", <incomplete sequence \326>,
  lock = 1, nlookup = 0, fd_count = 0, ref = 1, ia_type = IA_IFDIR, fd_list = {
    next = 0xb971e4, prev = 0xb971e4},
  dentry_list = {next = 0xb96f5c, prev = 0xb96f5c},
  hash = {next = 0x7f523811ab30, prev = 0x7f523811ab30},
  list = {next = 0x7f5238164ec4, prev = 0x7f5240001444}, _ctx = 0xbc9b40}
(gdb) c
Continuing.
[Thread 0x7f52345fb700 (LWP 5032) exited]

Breakpoint 2, __inode_retire (inode=0x7f52400013dc) at inode.c:430
430             dentry_t *dentry = NULL;
(gdb) c
Continuing.

Breakpoint 2, __inode_retire (inode=0xb971ac) at inode.c:430
430             dentry_t *dentry = NULL;
(gdb) bt
#0  __inode_retire (inode=0xb971ac) at inode.c:430
#1  0x00007f525a57bd55 in __inode_unref (inode=0xb971ac) at inode.c:473
#2  0x00007f525a57b001 in __dentry_unset (dentry=0x7f52400019ac) at inode.c:141
#3  0x00007f525a57bc78 in __inode_retire (inode=0x7f52400013dc) at inode.c:445
#4  0x00007f525a57c367 in __inode_ref_reduce_by_n (inode=0x7f52400013dc, nref=0) at inode.c:686
#5  0x00007f525a57e7d2 in inode_table_destroy (inode_table=0x7f523802db80) at inode.c:1789
#6  0x00007f525a57e63c in inode_table_destroy_all (ctx=0x7f5220003c00) at inode.c:1720
#7  0x00007f525e4e2e42 in pub_glfs_fini (fs=0x7f5220003a80) at glfs.c:1158
#8  0x00007f525e6c6281 in export_release (exp_hdl=0x7f5220003960) at /home/guest/Documents/workspace/nfs-ganesha/src/FSAL/FSAL_GLUSTER/export.c:86
#9  0x000000000050c317 in free_export_resources (export=0x7f5220000ec8) at /home/guest/Documents/workspace/nfs-ganesha/src/support/exports.c:1497
#10 0x000000000051d5a8 in free_export (export=0x7f5220000ec8) at /home/guest/Documents/workspace/nfs-ganesha/src/support/export_mgr.c:250
#11 0x000000000051e93d in put_gsh_export (export=0x7f5220000ec8) at /home/guest/Documents/workspace/nfs-ganesha/src/support/export_mgr.c:631
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) l
425
426
427     static void
428     __inode_retire (inode_t *inode)
429     {
430             dentry_t *dentry = NULL;
431             dentry_t *t = NULL;
432
433             if (!inode) {
434                     gf_msg_callingfn (THIS->name, GF_LOG_WARNING, 0,
(gdb)
435                                       LG_MSG_INODE_NOT_FOUND, "inode not found");
436                     return;
437             }
438
439             list_move_tail (&inode->list, &inode->table->purge);
440             inode->table->purge_size++;
441
442             __inode_unhash (inode);
443
444             list_for_each_entry_safe (dentry, t, &inode->dentry_list, inode_list) {
(gdb)

As can be seen above, the 'tmp' inode entry (0xb971ac) is moved from the active list to the purge list while 'trav' (0x7f52400013dc) is still being retired, that is, before the loop reaches it in the next iteration.
Since inode entries can be moved from one list to another between iterations, it's best not to fetch the next entry early. So the fix can be to use 'list_each_entry'; this is safe because no other thread will be accessing these inodes.
Fix posted upstream - http://review.gluster.org/13987
The same segfault was also hit while executing the gluster operations automation suite; ganesha crashed on the mounted node. The following backtrace was observed:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f6de71fb700 (LWP 20592)]
__inode_ref_reduce_by_n (inode=inode@entry=0x7f6d71e9b5a8, nref=nref@entry=0) at inode.c:686
686                     inode->table->active_size--;
(gdb) bt
#0  __inode_ref_reduce_by_n (inode=inode@entry=0x7f6d71e9b5a8, nref=nref@entry=0) at inode.c:686
#1  0x00007f6de4865a2a in inode_table_destroy (inode_table=0x7f6d71e9b580) at inode.c:1794
#2  0x00007f6de4865b21 in inode_table_destroy_all (ctx=ctx@entry=0x7f6de0007f90) at inode.c:1725
#3  0x00007f6de4af107f in pub_glfs_fini (fs=0x7f6de0007e30) at glfs.c:1133
#4  0x00007f6de4f18051 in export_release () from /usr/lib64/ganesha/libfsalgluster.so
#5  0x00007f6dfa327acb in free_export_resources ()
#6  0x00007f6dfa3369e3 in free_export ()
#7  0x00007f6dfa3386a4 in gsh_export_removeexport ()
#8  0x00007f6dfa3455e9 in dbus_message_entrypoint ()
#9  0x00007f6df9be7c86 in _dbus_object_tree_dispatch_and_unlock () from /lib64/libdbus-1.so.3
#10 0x00007f6df9bd9e49 in dbus_connection_dispatch () from /lib64/libdbus-1.so.3
#11 0x00007f6df9bda0e2 in _dbus_connection_read_write_dispatch () from /lib64/libdbus-1.so.3
#12 0x00007f6dfa346640 in gsh_dbus_thread ()
#13 0x00007f6df8803dc5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f6df7ed21cd in clone () from /lib64/libc.so.6
Verified this bug with the latest glusterfs-3.7.9-3 build. After running the automated rootsquash regression cases, the issue was not observed. The automation test case with ID 342834, which originally reproduced the issue, now works fine, with no refresh-config failure or crash seen. Verified on both v3 and v4 ganesha mounts. Based on the above observations, moving this bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240