Bug 1255471

Summary: [libgfapi] crash when NFS Ganesha Volume is 100% full
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Harold Miller <hamiller>
Component: glusterfs
Assignee: Bipin Kunal <bkunal>
Status: CLOSED ERRATA
QA Contact: Saurabh <saujain>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: asrivast, bkunal, byarlaga, divya, mzywusko, ndevos, nlevinki, rcyriac, saujain, skoduri, sreber, vagarwal, vbellur
Target Milestone: ---
Keywords: Patch, ZStream
Target Release: RHGS 3.1.1
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.1-13
Doc Type: Bug Fix
Doc Text:
Previously, on certain occasions, the libgfapi returned incorrect errors. NFS-Ganesha would handle the incorrect error in such a way that the procedures were retried. However, the used file descriptor should have been marked as bad, and no longer used. As a consequence, using a bad file descriptor caused access to memory that was freed and made NFS-Ganesha segfault. With this fix, libgfapi returns correct errors and marks the file descriptor as bad if the file descriptor should not be used again. Now, NFS-Ganesha does not try to reuse bad file descriptors and prevents segmentation faults.
Story Points: ---
Clone Of:
Cloned To: 1262798 (view as bug list)
Environment:
Last Closed: 2015-10-05 07:24:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1218535, 1240920, 1263094    
Bug Blocks: 1251815, 1262798    

Description Harold Miller 2015-08-20 16:32:15 UTC
Description of problem: Writing more data than the NFS volume will hold crashes all nodes in the Gluster volume.


Version-Release number of selected component (if applicable):


How reproducible: consistent


Steps to Reproduce:

1. Set up two VMs with Red Hat Gluster Storage
2. Configure NFS-Ganesha
3. Set up a test volume
4. Fill the volume with a file that is too big for it:
The test volume was 2 GB and I wrote a 2.5 GB file to the mounted directory using dd (see the command sketch below).
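
For reference, a command-level sketch of the reproduction, assuming the bricks sit on ~2 GB filesystems, the volume is called testvol, and the Ganesha virtual IP is 10.70.44.200 (all names are illustrative):

# On a storage node: create, start, and export a deliberately small volume.
gluster volume create testvol replica 2 gluster1:/rhs/brick1/testvol gluster2:/rhs/brick1/testvol
gluster volume start testvol
gluster volume set testvol ganesha.enable on

# On the NFS client: mount over NFSv3 and write more data than the volume can hold.
mount -t nfs -o vers=3 10.70.44.200:/testvol /mnt
dd if=/dev/zero of=/mnt/bigfile bs=1M count=2560    # 2.5 GB into a 2 GB volume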


Actual results:
5. NFS-Ganesha crashes on the node from which the volume was initially mounted:

19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 22(CACHE_INODE_IO_ERROR) for entry 0x2ab57a0
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.

6. Even worse, after a few seconds, the NFS-Ganesha process on the second node also crashes:

19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 22(CACHE_INODE_IO_ERROR) for entry 0x2ab57a0
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.

Expected results: Error message, but no crash


Additional info: From customer - "Conclusion: By writing a file to a too small volume it's possible to crash the ENTIRE HA cluster. Quite bad :-("

Comment 3 Harold Miller 2015-08-21 13:56:27 UTC
Gluster version is glusterfs-3.7.1-11.el6rhs.x86_64

Comment 7 Saurabh 2015-09-15 09:46:32 UTC
In order to verify the fix, I tested it with the following steps (a command-level sketch follows the list):
1. Create a volume of 6x2 type.
2. Enable quota on the volume.
3. Set a quota limit of 2 GB.
4. Configure nfs-ganesha.
5. Mount the volume using vers=3.
6. Use dd to create a file of 3 GB.
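
A command-level sketch of the above, assuming the volume is vol1 and the Ganesha virtual IP is 10.70.44.200 (both illustrative):

# On a storage node: enable quota and cap the volume at 2 GB.
gluster volume quota vol1 enable
gluster volume quota vol1 limit-usage / 2GB

# On the client: mount over NFSv3 and write a 3 GB file to exceed the limit.
mount -t nfs -o vers=3 10.70.44.200:/vol1 /mnt
dd if=/dev/urandom of=/mnt/f.1 bs=1M count=3072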
Result: nfs-ganesha dumps core on only one node, with the following backtrace:
(gdb) bt
#0  0x00007f74c81c6b22 in pub_glfs_pwritev (glfd=0x7f74a832b930, iovec=iovec@entry=0x7f74c97f87f0, iovcnt=iovcnt@entry=1, offset=2352373760, flags=0) at glfs-fops.c:936
#1  0x00007f74c81c6e7a in pub_glfs_pwrite (glfd=<optimized out>, buf=<optimized out>, count=<optimized out>, offset=<optimized out>, flags=<optimized out>) at glfs-fops.c:1051
#2  0x00007f74c85ebbe0 in file_write () from /usr/lib64/ganesha/libfsalgluster.so
#3  0x00000000004d458e in cache_inode_rdwr_plus ()
#4  0x00000000004d53a9 in cache_inode_rdwr ()
#5  0x000000000045db41 in nfs3_write ()
#6  0x0000000000453a01 in nfs_rpc_execute ()
#7  0x00000000004545ad in worker_run ()
#8  0x000000000050afeb in fridgethr_start_routine ()
#9  0x00007f74d94f4df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f74d901a1ad in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x00007f74c81c6b22 in pub_glfs_pwritev (glfd=0x7f74a832b930, iovec=iovec@entry=0x7f74c97f87f0, iovcnt=iovcnt@entry=1, offset=2352373760, flags=0) at glfs-fops.c:936
936		__GLFS_ENTRY_VALIDATE_FD (glfd, invalid_fs);
(gdb) p * glfd
$1 = {openfds = {next = 0x0, prev = 0x7f74a000ce90}, fs = 0x7f74a8324f20, offset = 140139014803232, fd = 0x7f74a8324f20, entries = {next = 0x78, prev = 0x78}, next = 0x7800000001, 
  readdirbuf = 0x10200000002000}
(gdb) p * glfd->fd
$2 = {pid = 0, flags = -1473070784, refcount = 32628, inode_list = {next = 0x1, prev = 0x7f74a832baa0}, inode = 0x7800000078, lock = 8192, _ctx = 0x0, xl_count = 0, lk_ctx = 0x0, anonymous = _gf_false}
(gdb) p * glfd->fd->inode
Cannot access memory at address 0x7800000078
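
For reference, a sketch of how such a backtrace can be pulled from the ganesha.nfsd core dump, assuming the daemon binary is /usr/bin/ganesha.nfsd and the core file is core.5969 (the file name is illustrative):

# Install debug symbols for the packages that appear in the stack.
debuginfo-install -y nfs-ganesha glusterfs
# Open the core against the daemon and inspect the crashing frame.
gdb /usr/bin/ganesha.nfsd core.5969
(gdb) bt
(gdb) frame 0
(gdb) print *glfd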



# gluster volume info vol1
 
Volume Name: vol1
Type: Distributed-Replicate
Volume ID: 3176319c-c033-4d81-a1c2-e46d92a94e9c
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.44.108:/rhs/brick1/d1r11
Brick2: 10.70.44.109:/rhs/brick1/d1r21
Brick3: 10.70.44.110:/rhs/brick1/d2r11
Brick4: 10.70.44.111:/rhs/brick1/d2r21
Brick5: 10.70.44.108:/rhs/brick1/d3r11
Brick6: 10.70.44.109:/rhs/brick1/d3r21
Brick7: 10.70.44.110:/rhs/brick1/d4r11
Brick8: 10.70.44.111:/rhs/brick1/d4r21
Brick9: 10.70.44.108:/rhs/brick1/d5r11
Brick10: 10.70.44.109:/rhs/brick1/d5r21
Brick11: 10.70.44.110:/rhs/brick1/d6r11
Brick12: 10.70.44.111:/rhs/brick1/d6r21
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Bipin, can you confirm whether this is the same backtrace that was seen before the fix as well?

Comment 8 Niels de Vos 2015-09-15 10:17:36 UTC
I think Jiffin is looking into this kind of segfault (quota related?) in bug 1263084.

Comment 9 Bipin Kunal 2015-09-15 12:00:38 UTC
Saurabh,

I am not aware of any backtraces during the crash. 

The customer has not tested with quota enabled. He tested with a small volume and then created a file bigger than the volume.

Please have a look at the steps in BZ description.

Thanks,
Bipin Kunal

Comment 10 Saurabh 2015-09-16 06:55:52 UTC
Found the issue on the 3.1.1 build as well:
#0  0x00007f5763a83b22 in pub_glfs_pwritev (glfd=0x7f5708011f20, iovec=iovec@entry=0x7f57277fc7f0, iovcnt=iovcnt@entry=1, offset=2103709696, flags=0) at glfs-fops.c:936
936		__GLFS_ENTRY_VALIDATE_FD (glfd, invalid_fs);

(gdb) bt
#0  0x00007f5763a83b22 in pub_glfs_pwritev (glfd=0x7f5708011f20, iovec=iovec@entry=0x7f57277fc7f0, iovcnt=iovcnt@entry=1, offset=2103709696, flags=0) at glfs-fops.c:936
#1  0x00007f5763a83e7a in pub_glfs_pwrite (glfd=<optimized out>, buf=<optimized out>, count=<optimized out>, offset=<optimized out>, flags=<optimized out>) at glfs-fops.c:1051
#2  0x00007f5763ea8be0 in file_write () from /usr/lib64/ganesha/libfsalgluster.so
#3  0x00000000004d458e in cache_inode_rdwr_plus ()
#4  0x00000000004d53a9 in cache_inode_rdwr ()
#5  0x000000000045db41 in nfs3_write ()
#6  0x0000000000453a01 in nfs_rpc_execute ()
#7  0x00000000004545ad in worker_run ()
#8  0x000000000050afeb in fridgethr_start_routine ()
#9  0x00007f5765bb9df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f57656df1ad in clone () from /lib64/libc.so.6

Comment 11 Soumya Koduri 2015-09-18 09:19:07 UTC
Fix for the latest crash is available in nfs-ganesha-2.2.0-9.

Comment 12 Saurabh 2015-09-22 10:03:27 UTC
From the client:
# cd /mnt
[root@rhsauto019 mnt]# time dd if=/dev/urandom of=f.1 bs=1024 count=4194304
dd: error writing ‘f.1’: Input/output error
2299955+0 records in
2299954+0 records out
2355152896 bytes (2.4 GB) copied, 451.26 s, 5.2 MB/s


From the server:
# sleep 10; df -hk | grep rhs
/dev/mapper/vg_vdb-thinp1            2086400 2086380        20 100% /rhs/brick1

# time bash /usr/libexec/ganesha/ganesha-ha.sh --status
Online: [ nfs11 nfs12 nfs13 nfs15 ]

nfs11-cluster_ip-1 nfs11
nfs11-trigger_ip-1 nfs11
nfs12-cluster_ip-1 nfs12
nfs12-trigger_ip-1 nfs12
nfs13-cluster_ip-1 nfs13
nfs13-trigger_ip-1 nfs13
nfs15-cluster_ip-1 nfs15
nfs15-trigger_ip-1 nfs15

real	0m3.086s
user	0m0.678s
sys	0m0.235s

Comment 14 Divya 2015-09-29 06:20:46 UTC
Niels,

Please review and sign-off the edited doc text.

Comment 16 errata-xmlrpc 2015-10-05 07:24:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html