Bug 1262776 - nfs-ganesha: ganesha process coredump with "pub_glfs_fsync (glfd=0x7f1078018e70) at glfs-fops.c"
Summary: nfs-ganesha: ganesha process coredump with "pub_glfs_fsync (glfd=0x7f1078018e...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: nfs-ganesha
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Jiffin
QA Contact: Shashank Raj
URL:
Whiteboard:
Depends On:
Blocks: 1299184
 
Reported: 2015-09-14 10:23 UTC by Saurabh
Modified: 2016-11-08 03:52 UTC
CC: 10 users

Fixed In Version: nfs-ganesha-2.3.1-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-23 05:35:58 UTC
Embargoed:


Attachments
nfs11 nfs-ganesha coredump (3.56 MB, application/x-bzip)
2015-09-14 10:50 UTC, Saurabh


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:1288 0 normal SHIPPED_LIVE nfs-ganesha update for Red Hat Gluster Storage 3.1 update 3 2016-06-23 09:12:51 UTC

Description Saurabh 2015-09-14 10:23:13 UTC
With acls and quota enabled for the volume, data creation hung because nfs-ganesha coredumped.
During this I/O, add-brick and rebalance were also attempted and completed successfully.

Version-Release number of selected component (if applicable):
glusterfs-3.7.1-14.el7rhgs.x86_64
nfs-ganesha-2.2.0-7.el7rhgs.x86_64

How reproducible:
The test was executed only once.

Steps to Reproduce:
1. create a volume of 6x2 type, start it
2. configure nfs-ganesha
3. enable acls for the volume 
4. enable quota on the volume and set a limit of 25GB on "/"
5. mount the volume, start creating data
6. while data creation is going on, execute add-brick and rebalance
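
For reference, a rough shell sketch of steps 1-6 above; the volume name, host names, brick paths, the ganesha VIP, the mount point and the IO pattern are placeholders rather than values from this setup:

# 1. create and start a 6x2 distributed-replicate volume
#    (consecutive brick pairs become the replica sets)
gluster volume create testvol replica 2 \
    host1:/bricks/b0 host2:/bricks/b0 host3:/bricks/b0 host4:/bricks/b0 \
    host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1 host4:/bricks/b1 \
    host1:/bricks/b2 host2:/bricks/b2 host3:/bricks/b2 host4:/bricks/b2
gluster volume start testvol

# 2. export the volume through the already-configured nfs-ganesha cluster
gluster volume set testvol ganesha.enable on

# 3. enable acls: set "Disable_ACL = false;" in the volume's export block and
#    refresh the ganesha configuration (ganesha-ha.sh --refresh-config)

# 4. quota with a 25GB hard limit on the volume root
gluster volume quota testvol enable
gluster volume quota testvol limit-usage / 25GB

# 5. mount over NFS and keep writing data
mount -t nfs -o vers=4 <ganesha-vip>:/testvol /mnt/testvol
for i in $(seq 1 100); do dd if=/dev/zero of=/mnt/testvol/file$i bs=1M count=100; done &

# 6. while the writes are running, expand and rebalance the volume
gluster volume add-brick testvol host1:/bricks/b3 host2:/bricks/b3
gluster volume rebalance testvol start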

Actual results:
Data creation hangs because nfs-ganesha has coredumped:

#0  0x00007f10b99471ac in pub_glfs_fsync (glfd=0x7f1078018e70) at glfs-fops.c:1170
1170		__GLFS_ENTRY_VALIDATE_FD (glfd, invalid_fs);

(gdb) bt
#0  0x00007f10b99471ac in pub_glfs_fsync (glfd=0x7f1078018e70) at glfs-fops.c:1170
#1  0x00007f10b9d6bc76 in commit () from /usr/lib64/ganesha/libfsalgluster.so
#2  0x00000000004d5cee in cache_inode_commit ()
#3  0x0000000000460dba in nfs4_op_commit ()
#4  0x000000000045eab5 in nfs4_Compound ()
#5  0x0000000000453a01 in nfs_rpc_execute ()
#6  0x00000000004545ad in worker_run ()
#7  0x000000000050afeb in fridgethr_start_routine ()
#8  0x00007f10bba7cdf5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f10bb5a21ad in clone () from /lib64/libc.so.6
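
The faulting line is the entry validation of the glfd handle, which suggests the FSAL_GLUSTER commit path handed pub_glfs_fsync() a handle that was already invalid or freed. For reference, the backtrace can be regenerated from the attached core on a matching build roughly as follows; the core path is a placeholder and the binary location and debuginfo package names are assumptions:

debuginfo-install nfs-ganesha glusterfs-api   # from yum-utils; versions must match the crashing build
gdb /usr/bin/ganesha.nfsd /path/to/core
(gdb) bt            # backtrace of the crashing thread
(gdb) frame 0       # land in pub_glfs_fsync()
(gdb) print *glfd   # inspect the (likely stale) glfs fd object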



Expected results:
NFS-Ganesha should not crash with the operations mentioned above.

Note: 
Earlier, with the same steps, I had filed bz 1262191; filing a new BZ since the backtrace is different this time.

Comment 2 Saurabh 2015-09-14 10:50:13 UTC
Created attachment 1073179 [details]
nfs11 nfs-ganesha coredump

Comment 3 Jiffin 2015-09-15 12:16:51 UTC
The fix has been posted upstream: https://review.gerrithub.io/#/c/246586/

Comment 8 Shashank Raj 2016-04-19 10:08:15 UTC
Verified this bug with glusterfs-3.7.9-1 and nfs-ganesha-2.3.1-3 using the steps below:

1) Create a 4 node cluster and configure ganesha on it.
2) Create a 6x2 volume and start it.

Volume Name: testvolume
Type: Distributed-Replicate
Volume ID: 814b88fe-30a4-47cc-841b-beaf7b348254
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.180:/bricks/brick0/b0
Brick2: 10.70.37.158:/bricks/brick0/b0
Brick3: 10.70.37.127:/bricks/brick0/b0
Brick4: 10.70.37.174:/bricks/brick0/b0
Brick5: 10.70.37.180:/bricks/brick1/b1
Brick6: 10.70.37.158:/bricks/brick1/b1
Brick7: 10.70.37.127:/bricks/brick1/b1
Brick8: 10.70.37.174:/bricks/brick1/b1
Brick9: 10.70.37.180:/bricks/brick2/b2
Brick10: 10.70.37.158:/bricks/brick2/b2
Brick11: 10.70.37.127:/bricks/brick2/b2
Brick12: 10.70.37.174:/bricks/brick2/b2
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
nfs.disable: on
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
nfs-ganesha: enable


3) Enable acls on the volume.

[root@dhcp37-180 exports]# cat export.testvolume.conf 
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file and run ganesha-ha.sh --refresh-config.
EXPORT{
      Export_Id= 2 ;
      Path = "/testvolume";
      FSAL {
           name = GLUSTER;
           hostname="localhost";
          volume="testvolume";
           }
      Access_type = RW;
      Disable_ACL = false;
      Squash="No_root_squash";
      Pseudo="/testvolume";
      Protocols = "3", "4" ;
      Transports = "UDP","TCP";
      SecType = "sys";
     }
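
As the warning at the top of the file notes, a manual edit such as Disable_ACL = false is applied with a refresh rather than by re-running the CLI; a sketch, assuming the shared-storage config layout used by RHGS 3.1 (the script and directory paths are assumptions):

# re-export the volume with the edited block, without restarting nfs-ganesha
/usr/libexec/ganesha/ganesha-ha.sh --refresh-config /var/run/gluster/shared_storage/nfs-ganesha testvolume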


4) Enable quota on the volume and set limit-usage as 25GB on /

[root@dhcp37-180 exports]# gluster volume quota testvolume list
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/                                         25.0GB     80%(20.0GB)   10.9GB  14.1GB              No                   No
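
The limits listed above would have been set along these lines; the 80% soft limit is the gluster default, so only the hard limit needs to be set explicitly:

gluster volume quota testvolume enable
gluster volume quota testvolume limit-usage / 25GB
gluster volume quota testvolume list /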


5) Mount the volume with version=4 and start creating IO on the mount point.

6) While IO is in progress, perform add-brick and rebalance on the volume.
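
A sketch of steps 5 and 6; the VIP, mount point, IO pattern and the paths of the added bricks are assumptions, not values from the test run:

mount -t nfs -o vers=4 <ganesha-vip>:/testvolume /mnt/testvolume
for i in $(seq 1 50); do dd if=/dev/zero of=/mnt/testvolume/file$i bs=1M count=200; done &

gluster volume add-brick testvolume 10.70.37.180:/bricks/brick3/b3 10.70.37.158:/bricks/brick3/b3
gluster volume rebalance testvolume start
gluster volume rebalance testvolume status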

7) No crash is seen on the nodes; IO continues during the process and does not hang.

Based on the above observation, marking this bug as Verified.

Comment 10 errata-xmlrpc 2016-06-23 05:35:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2016:1288

