Description of problem:
Writing more data than an NFS volume will hold crashes all nodes in a Gluster volume.

Version-Release number of selected component (if applicable):

How reproducible:
Consistent

Steps to Reproduce:
1. Set up two VMs with Red Hat Gluster Storage.
2. Configure NFS Ganesha.
3. Set up a test volume.
4. Fill the volume with a file that is too big for it. The test volume was 2 GB and I wrote a 2.5 GB file to the mounted directory using dd.

Actual results:
5. NFS Ganesha crashes on the node from which the volume was initially mounted:

19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 22(CACHE_INODE_IO_ERROR) for entry 0x2ab57a0
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.

6. Even worse, after a few seconds, the NFS Ganesha process on the second node also crashes:

19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 22(CACHE_INODE_IO_ERROR) for entry 0x2ab57a0
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.

Expected results:
An error message, but no crash.

Additional info:
From customer: "Conclusion: By writing a file to a too small volume it's possible to crash the ENTIRE HA cluster. Quite bad :-("
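For anyone re-running step 4 without dd, the write and the expected client-side behaviour (an error, not a hang or crash) can be sketched in Python. This is only an illustration: `fill_file` is a hypothetical helper name, not part of Gluster or NFS-Ganesha. Over NFS an out-of-space condition may only surface at fsync or close, so the sketch checks both.

```python
import errno
import os


def fill_file(path, total_bytes, chunk_size=1024 * 1024):
    """Write zero bytes to `path` until `total_bytes` are written or an error occurs.

    Returns (bytes_written, errno_or_None). Hypothetical helper for
    reproducing the report; it does not talk to Gluster or Ganesha directly.
    """
    chunk = b"\0" * chunk_size
    written = 0
    err = None
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            # os.write may write fewer bytes than requested; count what it reports.
            written += os.write(fd, chunk[:n])
        # Flush to the server so deferred write errors are reported here.
        os.fsync(fd)
    except OSError as exc:
        err = exc.errno
    finally:
        try:
            os.close(fd)
        except OSError as exc:
            # Keep the first error if one was already recorded.
            if err is None:
                err = exc.errno
    return written, err


def describe(err):
    """Return a short human-readable label for the errno returned by fill_file."""
    if err is None:
        return "ok"
    return "%s (%s)" % (errno.errorcode.get(err, str(err)), os.strerror(err))
```

Calling `fill_file` with a path on the mounted 2 GB test volume and `total_bytes` of about 2.5 GB would be expected to return a non-None errno (for example ENOSPC or EIO) once the volume is full, while the servers stay up.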
Verified this bug with the latest build. The steps followed and the observations are as below:

1. Create a cluster of 4 nodes.
2. Configure nfs-ganesha on the cluster.
3. Create a volume of size 4 GB and enable ganesha on the volume.
4. Mount the volume on the client:

[root@dhcp37-206 ~]# df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp37--206-root        27G  2.3G   25G   9% /
devtmpfs                                3.9G     0  3.9G   0% /dev
tmpfs                                   3.9G     0  3.9G   0% /dev/shm
tmpfs                                   3.9G  8.4M  3.9G   1% /run
tmpfs                                   3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1                               497M  124M  373M  25% /boot
tmpfs                                   783M     0  783M   0% /run/user/0
rhsqe-repo.lab.eng.blr.redhat.com:/opt  1.9T  353G  1.4T  20% /opt
10.70.36.217:/test                      4.0G  130M  3.9G   4% /mnt

5. Create a 5 GB file on the mount point. After some time, check the status of the nfs-ganesha service on all the nodes.

On the node the volume was mounted from, the messages below are seen in the ganesha service status. On the other nodes no such messages are seen, and no crash is observed on any node:

[root@dhcp37-180 brick0]# service nfs-ganesha status -l
Redirecting to /bin/systemctl status -l nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-04-06 07:20:16 IST; 7min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 13197 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 13706 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 13704 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 13705 (ganesha.nfsd)
   CGroup: /system.slice/nfs-ganesha.service
           └─13705 /usr/bin/ganesha.nfsd

Apr 06 07:21:55 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[13705]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume test exported at : '/'
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-8] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-9] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-12] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-7] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:26:51 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:54 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat

In /var/log/messages:

Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-8] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-9] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-12] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:53 dhcp37-141 ganesha.nfsd[13705]: [work-7] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:26:51 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:52 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:53 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:54 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat

In ganesha-gfapi.log:

The message "W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk] 0-test-client-2: remote operation failed [No space left on device]" repeated 57 times between [2016-04-06 01:55:53.061325] and [2016-04-06 01:55:54.203684]

As per discussion with Jiffin, these messages are expected.

Based on the above observations, marking this bug as Verified.
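When repeating this verification on several nodes, a quick tally of ganesha messages by component and severity makes it easier to confirm that only the expected NFS3 CRIT and DBUS WARN entries appear. The Python sketch below is an illustration only: the regular expression is inferred from the log lines quoted in this report, not from any NFS-Ganesha format specification, and `tally_ganesha_lines` is a hypothetical name.

```python
import re
from collections import Counter

# Matches the "[thread] function :COMPONENT :LEVEL :message" tail seen in
# the ganesha lines quoted in this report (format inferred, not official).
GANESHA_RE = re.compile(
    r"\[(?P<thread>[^\]]+)\]\s+(?P<func>\S+)\s+"
    r":(?P<component>\w+)\s+:(?P<level>\w+)\s+:(?P<message>.*)"
)


def tally_ganesha_lines(lines):
    """Count log lines per (component, level); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = GANESHA_RE.search(line)
        if match:
            counts[(match.group("component"), match.group("level"))] += 1
    return counts
```

Feeding it the nine /var/log/messages lines above should yield five ("NFS3", "CRIT") entries and four ("DBUS", "WARN") entries; any other key would be worth a closer look.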
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1288