Description of problem:
Writing more data than an NFS volume will hold crashes all nodes in a Gluster volume.

Version-Release number of selected component (if applicable):

How reproducible:
Consistent

Steps to Reproduce:
1. Set up two VMs with Red Hat Gluster Storage.
2. Configure NFS-Ganesha.
3. Set up a test volume.
4. Fill the volume with a file that is too big for it: the test volume was 2 GB and I wrote a 2.5 GB file to the mounted directory using dd.

Actual results:
5. NFS-Ganesha crashes on the node from which the volume was initially mounted:

19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 22(CACHE_INODE_IO_ERROR) for entry 0x2ab57a0
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.

6. Even worse, after a few seconds the NFS-Ganesha process on the second node crashes as well:

19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 22(CACHE_INODE_IO_ERROR) for entry 0x2ab57a0
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.

Expected results:
An error message, but no crash.

Additional info:
From the customer: "Conclusion: By writing a file to a too small volume it's possible to crash the ENTIRE HA cluster. Quite bad :-("
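For anyone trying to reproduce this, a minimal sketch of the customer's scenario could look like the commands below. The volume name, node names, brick paths, and mount point are my own placeholders, not taken from the customer's environment:

# on the Gluster/NFS-Ganesha nodes (testvol, node names and brick paths are assumed placeholders)
gluster volume create testvol replica 2 gluster1.local:/bricks/testvol/b1 gluster2.local:/bricks/testvol/b1
gluster volume start testvol
gluster volume set testvol ganesha.enable on

# on the client: mount the export over NFS and write more data than the 2 GB volume can hold
mount -t nfs -o vers=3 gluster2.local:/testvol /mnt/testvol
dd if=/dev/zero of=/mnt/testvol/bigfile bs=1M count=2560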
Gluster version is glusterfs-3.7.1-11.el6rhs.x86_64
In order to test the fix I tried the following steps:
1. Create a volume of type 6x2.
2. Enable quota on the volume.
3. Set a quota limit of 2 GB.
4. Configure nfs-ganesha.
5. Mount the volume using vers=3.
6. Use dd to create a file of 3 GB.

Result: nfs-ganesha dumps core on only one node, with the following backtrace:

(gdb) bt
#0  0x00007f74c81c6b22 in pub_glfs_pwritev (glfd=0x7f74a832b930, iovec=iovec@entry=0x7f74c97f87f0, iovcnt=iovcnt@entry=1, offset=2352373760, flags=0) at glfs-fops.c:936
#1  0x00007f74c81c6e7a in pub_glfs_pwrite (glfd=<optimized out>, buf=<optimized out>, count=<optimized out>, offset=<optimized out>, flags=<optimized out>) at glfs-fops.c:1051
#2  0x00007f74c85ebbe0 in file_write () from /usr/lib64/ganesha/libfsalgluster.so
#3  0x00000000004d458e in cache_inode_rdwr_plus ()
#4  0x00000000004d53a9 in cache_inode_rdwr ()
#5  0x000000000045db41 in nfs3_write ()
#6  0x0000000000453a01 in nfs_rpc_execute ()
#7  0x00000000004545ad in worker_run ()
#8  0x000000000050afeb in fridgethr_start_routine ()
#9  0x00007f74d94f4df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f74d901a1ad in clone () from /lib64/libc.so.6
(gdb) f 0
#0  0x00007f74c81c6b22 in pub_glfs_pwritev (glfd=0x7f74a832b930, iovec=iovec@entry=0x7f74c97f87f0, iovcnt=iovcnt@entry=1, offset=2352373760, flags=0) at glfs-fops.c:936
936             __GLFS_ENTRY_VALIDATE_FD (glfd, invalid_fs);
(gdb) p * glfd
$1 = {openfds = {next = 0x0, prev = 0x7f74a000ce90}, fs = 0x7f74a8324f20, offset = 140139014803232, fd = 0x7f74a8324f20, entries = {next = 0x78, prev = 0x78}, next = 0x7800000001, readdirbuf = 0x10200000002000}
(gdb) p * glfd->fd
$2 = {pid = 0, flags = -1473070784, refcount = 32628, inode_list = {next = 0x1, prev = 0x7f74a832baa0}, inode = 0x7800000078, lock = 8192, _ctx = 0x0, xl_count = 0, lk_ctx = 0x0, anonymous = _gf_false}
(gdb) p * glfd->fd->inode
Cannot access memory at address 0x7800000078

# gluster volume info vol1

Volume Name: vol1
Type: Distributed-Replicate
Volume ID: 3176319c-c033-4d81-a1c2-e46d92a94e9c
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.44.108:/rhs/brick1/d1r11
Brick2: 10.70.44.109:/rhs/brick1/d1r21
Brick3: 10.70.44.110:/rhs/brick1/d2r11
Brick4: 10.70.44.111:/rhs/brick1/d2r21
Brick5: 10.70.44.108:/rhs/brick1/d3r11
Brick6: 10.70.44.109:/rhs/brick1/d3r21
Brick7: 10.70.44.110:/rhs/brick1/d4r11
Brick8: 10.70.44.111:/rhs/brick1/d4r21
Brick9: 10.70.44.108:/rhs/brick1/d5r11
Brick10: 10.70.44.109:/rhs/brick1/d5r21
Brick11: 10.70.44.110:/rhs/brick1/d6r11
Brick12: 10.70.44.111:/rhs/brick1/d6r21
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Bipin, can you confirm whether this is the same backtrace you saw before the fix as well?
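For reference, the quota part of this verification roughly maps to the commands below; the quota limit path, client mount point, and VIP are assumptions on my side, not copied from the test setup:

# on a Gluster node
gluster volume quota vol1 enable
gluster volume quota vol1 limit-usage / 2GB

# on the client (VIP and mount point assumed)
mount -t nfs -o vers=3 <VIP>:/vol1 /mnt
dd if=/dev/urandom of=/mnt/f.3g bs=1M count=3072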
I think Jiffin is looking into this kind of segfault (quota related?) in bug 1263084.
Saurabh, I am not aware of any backtraces during the crash. The customer has not tested with quota enabled. He tested with a small volume and then created a file bigger than the volume. Please have a look at the steps in the BZ description.

Thanks,
Bipin Kunal
Found the issue on the 3.1.1 build as well:

#0  0x00007f5763a83b22 in pub_glfs_pwritev (glfd=0x7f5708011f20, iovec=iovec@entry=0x7f57277fc7f0, iovcnt=iovcnt@entry=1, offset=2103709696, flags=0) at glfs-fops.c:936
936             __GLFS_ENTRY_VALIDATE_FD (glfd, invalid_fs);
(gdb) bt
#0  0x00007f5763a83b22 in pub_glfs_pwritev (glfd=0x7f5708011f20, iovec=iovec@entry=0x7f57277fc7f0, iovcnt=iovcnt@entry=1, offset=2103709696, flags=0) at glfs-fops.c:936
#1  0x00007f5763a83e7a in pub_glfs_pwrite (glfd=<optimized out>, buf=<optimized out>, count=<optimized out>, offset=<optimized out>, flags=<optimized out>) at glfs-fops.c:1051
#2  0x00007f5763ea8be0 in file_write () from /usr/lib64/ganesha/libfsalgluster.so
#3  0x00000000004d458e in cache_inode_rdwr_plus ()
#4  0x00000000004d53a9 in cache_inode_rdwr ()
#5  0x000000000045db41 in nfs3_write ()
#6  0x0000000000453a01 in nfs_rpc_execute ()
#7  0x00000000004545ad in worker_run ()
#8  0x000000000050afeb in fridgethr_start_routine ()
#9  0x00007f5765bb9df5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f57656df1ad in clone () from /lib64/libc.so.6
Fix for the latest crash available in nfs-ganesha-2.2.0-9
From the client:

# cd /mnt
[root@rhsauto019 mnt]# time dd if=/dev/urandom of=f.1 bs=1024 count=4194304
dd: error writing ‘f.1’: Input/output error
2299955+0 records in
2299954+0 records out
2355152896 bytes (2.4 GB) copied, 451.26 s, 5.2 MB/s

From the server:

# sleep 10; df -hk | grep rhs
/dev/mapper/vg_vdb-thinp1   2086400 2086380        20 100% /rhs/brick1

# time bash /usr/libexec/ganesha/ganesha-ha.sh --status
Online: [ nfs11 nfs12 nfs13 nfs15 ]
nfs11-cluster_ip-1 nfs11
nfs11-trigger_ip-1 nfs11
nfs12-cluster_ip-1 nfs12
nfs12-trigger_ip-1 nfs12
nfs13-cluster_ip-1 nfs13
nfs13-trigger_ip-1 nfs13
nfs15-cluster_ip-1 nfs15
nfs15-trigger_ip-1 nfs15

real    0m3.086s
user    0m0.678s
sys     0m0.235s
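As an additional sanity check, one could confirm that ganesha.nfsd is still running on every node after the failed write. This is only a sketch: the node names are taken from the HA status output above, and it assumes passwordless ssh between the nodes:

# loop over the HA cluster nodes and report whether ganesha.nfsd is still alive
for h in nfs11 nfs12 nfs13 nfs15; do
    ssh "$h" 'hostname; pgrep -l ganesha.nfsd || echo "ganesha.nfsd NOT running"'
done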
Niels, please review and sign off on the edited doc text.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-1845.html