Bug 1255470
| Summary: | crash when NFS Ganesha Volume is 100% full | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Harold Miller <hamiller> |
| Component: | nfs-ganesha | Assignee: | Jiffin <jthottan> |
| Status: | CLOSED ERRATA | QA Contact: | Shashank Raj <sraj> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | rhgs-3.1 | CC: | jthottan, nlevinki, rcyriac, rhinduja, sashinde, skoduri |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.1.3 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | nfs-ganesha-2.3.1-1 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-06-23 05:35:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1299184 | ||
Verified this bug with the latest build; the observations and steps followed are as below:
1. Create a cluster of 4 nodes.
2. Configure nfs-ganesha on the cluster.
3. Create a 4 GB volume and enable ganesha on it.
4. Mount the volume on the client:
[root@dhcp37-206 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp37--206-root 27G 2.3G 25G 9% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 8.4M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 497M 124M 373M 25% /boot
tmpfs 783M 0 783M 0% /run/user/0
rhsqe-repo.lab.eng.blr.redhat.com:/opt 1.9T 353G 1.4T 20% /opt
10.70.36.217:/test 4.0G 130M 3.9G 4% /mnt
5. Create a 5 GB file on the mount point (larger than the 4 GB volume).
After some time, check the status of the nfs-ganesha service on all nodes. The node hosting the mount logs the messages below in the service status; the other nodes show no such messages, and no crash is observed on any node:
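The steps above can be sketched as shell commands. The cluster-setup commands are kept as comments since they need a live Gluster cluster; the volume name (`test`), export, and sizes follow the report, while the server/brick names are placeholders. The `/dev/full` check at the end is an added local illustration of the ENOSPC error the client should receive instead of a server crash:

```shell
# Reproduction sketch (comments are illustrative; server1/server2 and brick
# paths are placeholders, volume name and export follow the report):
#   gluster volume create test server1:/bricks/brick0 server2:/bricks/brick0
#   gluster volume start test
#   gluster volume set test ganesha.enable on
#   mount -t nfs 10.70.36.217:/test /mnt
#   dd if=/dev/zero of=/mnt/bigfile bs=1M count=5120   # 5 GB into a 4 GB volume

# The expected client-side failure (ENOSPC, not a crash) can be previewed
# locally: every write to /dev/full fails with "No space left on device".
if ! dd if=/dev/zero of=/dev/full bs=1M count=1 2> dd_err.txt; then
    echo "dd failed as expected"
fi
grep -o 'No space left on device' dd_err.txt | head -n1
rm -f dd_err.txt
```

With the fix, the writer sees the error above while `ganesha.nfsd` stays up on all nodes.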
[root@dhcp37-180 brick0]# service nfs-ganesha status -l
Redirecting to /bin/systemctl status -l nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2016-04-06 07:20:16 IST; 7min ago
Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
Process: 13197 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
Process: 13706 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
Process: 13704 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
Main PID: 13705 (ganesha.nfsd)
CGroup: /system.slice/nfs-ganesha.service
└─13705 /usr/bin/ganesha.nfsd
Apr 06 07:21:55 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[13705]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume test exported at : '/'
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-8] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-9] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-12] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-7] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:26:51 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:54 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
in /var/log/messages:
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-8] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-9] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-12] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:53 dhcp37-141 ganesha.nfsd[13705]: [work-7] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:26:51 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:52 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:53 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:54 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
in ganesha-gfapi.log:
The message "W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk] 0-test-client-2: remote operation failed [No space left on device]" repeated 57 times between [2016-04-06 01:55:53.061325] and [2016-04-06 01:55:54.203684]
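The CRIT and WARN patterns above are what a quick log check would look for when verifying this behavior. A minimal sketch, with sample lines copied from the report and embedded so it is self-contained (in practice, point `LOG` at /var/log/messages on the affected node):

```shell
# Count nfs3_write I/O errors and missed dbus heartbeats in a ganesha log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Apr  6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr  6 07:26:51 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr  6 07:26:52 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
EOF
io_errors=$(grep -c 'CACHE_INODE_IO_ERROR' "$LOG")
missed_hb=$(grep -c 'Not sending heartbeat' "$LOG")
echo "io_errors=$io_errors missed_hb=$missed_hb"   # prints: io_errors=1 missed_hb=2
rm -f "$LOG"
```

After the fix, these messages may still appear while the volume is full, but the ganesha.nfsd process itself keeps running.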
As per discussion with Jiffin, these messages are expected.
Based on the above observations, marking this bug as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2016:1288
Description of problem: Writing more data than the NFS volume will hold crashes all nodes in a Gluster volume.

Version-Release number of selected component (if applicable):

How reproducible: consistent

Steps to Reproduce:
1. Set up two VMs with Red Hat Gluster Storage.
2. Configure NFS-Ganesha.
3. Set up a test volume.
4. Fill the volume with a file that is too big for it. The test volume was 2 GB and a 2.5 GB file was written to the mounted directory using dd.

Actual results:
5. NFS-Ganesha crashes on the node from which the volume was initially mounted:
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 22(CACHE_INODE_IO_ERROR) for entry 0x2ab57a0
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.
6. Even worse, after a few seconds the NFS-Ganesha process on the second node also crashes:
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 22(CACHE_INODE_IO_ERROR) for entry 0x2ab57a0
19/08/2015 20:41:10 : epoch 55d4ca38 : gluster2.local : ganesha.nfsd-5969[work-13] cache_inode_rdwr_plus :INODE :CRIT :Error closing file in cache_inode_rdwr: 22.

Expected results:
An error message, but no crash.

Additional info:
From customer - "Conclusion: By writing a file to a too small volume it's possible to crash the ENTIRE HA cluster. Quite bad :-("