Description of problem:
A failed refresh config generates a large number of "inode.c:1792:inode_table_destroy" messages in ganesha-gfapi.log

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1
nfs-ganesha-gluster-2.3.1-2.el7rhgs.x86_64
nfs-ganesha-2.3.1-2.el7rhgs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 4 node cluster and configure ganesha on it.
2. Create a volume and enable ganesha on it.
3. Perform a refresh config scenario where it fails on one of the nodes.
4. Observe that the message below gets generated continuously in ganesha-gfapi.log:

[2016-04-11 22:22:41.432307] W [inode.c:1792:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x3cf) [0x7f15d962907f] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7f15d939db21] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xf6) [0x7f15d939da16] ) 0-gfapi: Active inode(0x7f15bc007bb4) with refcount(0) found during cleanup

Actual results:
Continuous inode_table_destroy warnings are logged in ganesha-gfapi.log after the failed refresh config.

Expected results:
There should not be such a large number of messages getting logged in ganesha-gfapi.log.

Additional info:
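Since the warning repeats rapidly, the count of matching log lines is a quick way to confirm the symptom. A minimal sketch of that check; the temp file and its two sample lines are illustrative stand-ins (shortened copies of the reported warning), not output from a live cluster:

```shell
# Count inode_table_destroy warnings in a gfapi log.
# A temp file with sample lines is used here; on an affected node the
# real file to inspect is ganesha-gfapi.log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[2016-04-11 22:22:41.432307] W [inode.c:1792:inode_table_destroy] 0-gfapi: Active inode(0x7f15bc007bb4) with refcount(0) found during cleanup
[2016-04-11 22:22:41.432399] W [inode.c:1792:inode_table_destroy] 0-gfapi: Active inode(0x7f15bc007cc8) with refcount(0) found during cleanup
EOF
count=$(grep -c 'inode_table_destroy' "$LOG")
echo "inode_table_destroy warnings: $count"
rm -f "$LOG"
```

On an affected setup the count keeps climbing while the looping thread runs, so re-running the grep a few seconds apart shows the growth.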
Shashank, I suspect the issue in this BZ is the same as bug 1325975. To confirm that, could you list the exact steps performed to reproduce this issue (maybe distaf logs), as you had listed in bug 1325975?
Please follow the steps below:

Create a volume:
gluster volume create newvolume replica 2 transport tcp dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick2 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick0/newvolume_brick3 dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick4 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick5 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick6 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick7 dhcp37-180.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick8 dhcp37-158.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick9 dhcp37-127.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick10 dhcp37-174.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick11 --mode=script force" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is volume create: newvolume: success: please start the volume to access data

Start the volume:
2016-04-12 12:38:45,478 INFO run "gluster volume start newvolume " on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is volume start: newvolume: success

Enable ganesha on the volume:
2016-04-12 12:38:46,653 INFO run "gluster volume set newvolume ganesha.enable on --mode=script" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is volume set: success

Do a showmount:
2016-04-12 12:38:57,561 INFO run "showmount -e localhost" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is Export list for localhost: /newvolume (everyone)

Mount the volume on one client and write a file on the mount point:
2016-04-12 12:38:57,598 INFO run Executing mount -t nfs -o vers=4 10.70.36.217:/newvolume /mnt/nfs1460444907.67 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-12 12:38:57,731 INFO run "mount -t nfs -o vers=4 10.70.36.217:/newvolume /mnt/nfs1460444907.67" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:38:57,731 INFO run Executing dd if=/dev/zero of=/mnt/nfs1460444907.67/file.dd bs=1024 count=1024 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-12 12:38:57,929 INFO run "dd if=/dev/zero of=/mnt/nfs1460444907.67/file.dd bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:38:57,929 ERROR run "dd if=/dev/zero of=/mnt/nfs1460444907.67/file.dd bs=1024 count=1024" on dhcp37-206.lab.eng.blr.redhat.com: STDERR is 1024+0 records in 1024+0 records out 1048576 bytes (1.0 MB) copied, 0.124464 s, 8.4 MB/s

Mount the volume on the other client and write a file:
2016-04-12 12:38:57,949 INFO run Executing mount -t nfs -o vers=4 10.70.36.217:/newvolume /mnt/nfs1460444907.67 on dhcp46-206.lab.eng.blr.redhat.com
2016-04-12 12:38:58,048 INFO run "mount -t nfs -o vers=4 10.70.36.217:/newvolume /mnt/nfs1460444907.67" on dhcp46-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:38:58,048 INFO run Executing dd if=/dev/zero of=/mnt/nfs1460444907.67/file.dd bs=1024 count=1024 on dhcp46-206.lab.eng.blr.redhat.com
2016-04-12 12:38:58,223 INFO run "dd if=/dev/zero of=/mnt/nfs1460444907.67/file.dd bs=1024 count=1024" on dhcp46-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:38:58,223 ERROR run "dd if=/dev/zero of=/mnt/nfs1460444907.67/file.dd bs=1024 count=1024" on dhcp46-206.lab.eng.blr.redhat.com: STDERR is 1024+0 records in 1024+0 records out 1048576 bytes (1.0 MB) copied, 0.110896 s, 9.5 MB/s

Create a dir from the first client, change the squash parameter to Root_squash, and do a refresh config from the mounted node:
2016-04-12 12:38:58,223 INFO run Executing mkdir /mnt/nfs1460444907.67/dir on dhcp37-206.lab.eng.blr.redhat.com
2016-04-12 12:38:58,260 INFO run "mkdir /mnt/nfs1460444907.67/dir" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:38:58,260 INFO run Executing chmod 777 /mnt/nfs1460444907.67/dir on dhcp37-206.lab.eng.blr.redhat.com
2016-04-12 12:38:58,281 INFO run "chmod 777 /mnt/nfs1460444907.67/dir" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:38:58,281 INFO run Executing sed -i s/'Squash=.*'/'Squash="Root_squash";'/g /etc/ganesha/exports/export.newvolume.conf on dhcp37-180.lab.eng.blr.redhat.com
2016-04-12 12:38:58,299 INFO run "sed -i s/'Squash=.*'/'Squash="Root_squash";'/g /etc/ganesha/exports/export.newvolume.conf" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:38:58,299 INFO set_root_squash edited the export file newvolume successfully for rootsquash on
2016-04-12 12:38:58,299 INFO run Executing /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume on dhcp37-180.lab.eng.blr.redhat.com
2016-04-12 12:39:17,508 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:39:17,509 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is Refresh-config completed on dhcp37-127. Refresh-config completed on dhcp37-158. Refresh-config completed on dhcp37-174. Success: refresh-config completed.

Create another directory inside the previously created dir, create a file, and perform write and read on that file as below:
2016-04-12 12:39:17,509 INFO refresh_config refresh-config script successfully ran for newvolume
2016-04-12 12:39:22,514 INFO run Executing mkdir /mnt/nfs1460444907.67/dir/dir1 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-12 12:39:22,558 INFO run "mkdir /mnt/nfs1460444907.67/dir/dir1" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:39:22,559 INFO run Executing touch /mnt/nfs1460444907.67/dir/dir1/file.1 on dhcp37-206.lab.eng.blr.redhat.com
2016-04-12 12:39:22,588 INFO run "touch /mnt/nfs1460444907.67/dir/dir1/file.1" on dhcp37-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:39:22,651 INFO RootSquashEnable342825 posix.stat_result(st_mode=33188, st_ino=-6078174508180817136, st_dev=39L, st_nlink=1, st_uid=4294967294, st_gid=4294967294, st_size=0, st_atime=1460406603, st_mtime=1460406603, st_ctime=1460406603)
2016-04-12 12:39:22,655 INFO RootSquashEnable342825 4294967294
2016-04-12 12:39:22,656 INFO RootSquashEnable342825 4294967294
2016-04-12 12:39:22,658 INFO RootSquashEnable342825 file.1 has got the correct uid/gid post root-squash is put to on
2016-04-12 12:39:22,659 INFO run Executing echo "Hello World" >> /mnt/nfs1460444907.67/dir/dir1/file.1 on dhcp46-206.lab.eng.blr.redhat.com
2016-04-12 12:39:22,692 INFO run "echo "Hello World" >> /mnt/nfs1460444907.67/dir/dir1/file.1" on dhcp46-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:39:22,693 INFO RootSquashEnable342825 write allowed
2016-04-12 12:39:22,693 INFO run Executing cat /mnt/nfs1460444907.67/dir/dir1/file.1 on dhcp46-206.lab.eng.blr.redhat.com
2016-04-12 12:39:22,706 INFO run "cat /mnt/nfs1460444907.67/dir/dir1/file.1" on dhcp46-206.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:39:22,706 INFO run "cat /mnt/nfs1460444907.67/dir/dir1/file.1" on dhcp46-206.lab.eng.blr.redhat.com: STDOUT is Hello World

Change the squash parameter again to No_root_squash from the mounted node and do a refresh config:
2016-04-12 12:39:22,706 INFO run Executing sed -i s/'Squash=.*'/'Squash="No_root_squash";'/g /etc/ganesha/exports/export.newvolume.conf on dhcp37-180.lab.eng.blr.redhat.com
2016-04-12 12:39:22,722 INFO run "sed -i s/'Squash=.*'/'Squash="No_root_squash";'/g /etc/ganesha/exports/export.newvolume.conf" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 0
2016-04-12 12:39:22,723 INFO set_root_squash edited the export file newvolume successfully for rootsquash off
2016-04-12 12:39:22,723 INFO run Executing /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume on dhcp37-180.lab.eng.blr.redhat.com
2016-04-12 12:40:06,011 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: RETCODE is 1
2016-04-12 12:40:06,011 INFO run "/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume" on dhcp37-180.lab.eng.blr.redhat.com: STDOUT is Refresh-config completed on dhcp37-127. Refresh-config completed on dhcp37-158. Refresh-config completed on dhcp37-174. Error: refresh-config failed on localhost.

Observe that the refresh config fails this time on localhost and the message below is continuously generated in ganesha-gfapi.log:
[2016-04-11 22:22:41.432307] W [inode.c:1792:inode_table_destroy] (-->/lib64/libgfapi.so.0(glfs_fini+0x3cf) [0x7f15d962907f] -->/lib64/libglusterfs.so.0(inode_table_destroy_all+0x51) [0x7f15d939db21] -->/lib64/libglusterfs.so.0(inode_table_destroy+0xf6) [0x7f15d939da16] ) 0-gfapi: Active inode(0x7f15bc007bb4) with refcount(0) found during cleanup
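The squash toggle in the logs above is just a sed edit of the export file followed by refresh-config. A sketch of the edit against a temporary copy; the real target is /etc/ganesha/exports/export.newvolume.conf, which needs the ganesha setup from the steps above:

```shell
# Toggle the Squash parameter in a throwaway copy of an export file.
EXPORT=$(mktemp)
echo 'Squash="Root_squash";' > "$EXPORT"
sed -i 's/Squash=.*/Squash="No_root_squash";/g' "$EXPORT"
RESULT=$(cat "$EXPORT")
echo "$RESULT"
rm -f "$EXPORT"
# On a real node, the edit is followed by:
#   /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume
```

Because `Squash=.*` matches to the end of the line, the replacement works regardless of the previous squash value, which is why the same sed call toggles in both directions.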
I could reproduce the issue. As suspected, it is similar to bug 1325975. The dbus thread enters an infinite loop while destroying inodes in inode_table_destroy. The patch http://review.gluster.org/13987 (posted upstream) should fix this issue as well.
Shashank, could you please give qe_ack?
Removing the needinfo as qa ack has been provided.
The fix has been merged downstream: https://code.engineering.redhat.com/gerrit/73330
Verified this bug with the latest glusterfs-3.7.9-3 build. Tried reproducing this issue with the following scenarios:

1) Stopped glusterd on one node, changed acl settings in the volume export file, and performed refresh config from another node:

[root@dhcp37-180 exports]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume
Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Error: refresh-config failed on dhcp37-174.

It failed on the node where glusterd was stopped, and I did not observe any “inode_table_destroy” messages in ganesha-gfapi.log.

2) Stopped the ganesha service on one node, changed acl settings in the volume export file, and performed refresh config from another node:

[root@dhcp37-180 exports]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume
Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Error: refresh-config failed on dhcp37-174.

It failed on the node where the ganesha service was stopped, and I did not observe any “inode_table_destroy” messages in ganesha-gfapi.log.

3) Stopped the ganesha service on one node, changed acl settings in the volume export file, and performed refresh config from that node:

[root@dhcp37-180 exports]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume
Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Refresh-config completed on dhcp37-174.
Error: refresh-config failed on localhost.

It failed on localhost, where the ganesha service was stopped, and I did not observe any “inode_table_destroy” messages in ganesha-gfapi.log.

4) Stopped glusterd on one node, changed acl settings in the volume export file, and performed refresh config from that node:

[root@dhcp37-180 exports]# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /etc/ganesha/ newvolume
Refresh-config completed on dhcp37-127.
Refresh-config completed on dhcp37-158.
Refresh-config completed on dhcp37-174.
Error: refresh-config failed on localhost.

It failed on localhost, where the glusterd service was stopped, and I did not observe any “inode_table_destroy” messages in ganesha-gfapi.log.

5) Also ran the automation case which originally reproduced this issue, and did not hit it.

Since I did not observe "inode_table_destroy" messages in any failed refresh config case, marking this bug as Verified.
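The pass criterion in each scenario above reduces to: after a failed refresh-config, the gfapi log gains no inode_table_destroy warnings. A sketch of that check against a synthetic log; the file content is a deliberately fake placeholder, not real gluster output:

```shell
# Build a log that records a refresh-config failure but contains no
# inode_table_destroy warnings, mimicking the verified (fixed) behaviour.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
placeholder log line with no warning
placeholder: refresh-config failed on localhost
EOF
# grep -c exits non-zero when there are no matches, so guard with || true.
bad=$(grep -c 'inode_table_destroy' "$LOG" || true)
echo "inode_table_destroy warnings: $bad"
rm -f "$LOG"
```

A count of zero after the failure is what distinguishes the fixed build from the affected one, where the warning count grows continuously.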
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1247