Description of problem:
ganesha mount hangs when trying to create a file larger than the volume size.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1

How reproducible:
Twice

Steps to Reproduce:
1. Create a cluster of 4 nodes.
2. Configure nfs-ganesha on the cluster.
3. Create a volume with size 4GB and enable ganesha on the volume.

Volume Name: test
Type: Distribute
Volume ID: e1ea7ada-30b3-478d-b7d0-f76d79f063d9
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.180:/bricks/brick0/b0
Brick2: 10.70.37.158:/bricks/brick0/b0
Brick3: 10.70.37.127:/bricks/brick0/b0
Brick4: 10.70.37.174:/bricks/brick0/b0
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

4. Mount the volume on the client.

[root@dhcp37-206 ~]# df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp37--206-root        27G  2.3G   25G   9% /
devtmpfs                                3.9G     0  3.9G   0% /dev
tmpfs                                   3.9G     0  3.9G   0% /dev/shm
tmpfs                                   3.9G  8.4M  3.9G   1% /run
tmpfs                                   3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1                               497M  124M  373M  25% /boot
tmpfs                                   783M     0  783M   0% /run/user/0
rhsqe-repo.lab.eng.blr.redhat.com:/opt  1.9T  353G  1.4T  20% /opt
10.70.36.217:/test                      4.0G  130M  3.9G   4% /mnt

5. Create a file of size 5GB on the mount point.
After some time, observe the status of the nfs-ganesha service on the mounted node:

[root@dhcp37-180 brick0]# service nfs-ganesha status -l
Redirecting to /bin/systemctl status -l nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-04-06 07:20:16 IST; 7min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 13197 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 13706 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 13704 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 13705 (ganesha.nfsd)
   CGroup: /system.slice/nfs-ganesha.service
           └─13705 /usr/bin/ganesha.nfsd

Apr 06 07:21:55 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[13705]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume test exported at : '/'
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-8] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-9] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-12] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-7] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:26:51 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 06 07:26:54 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat

And the messages below in /var/log/messages:

Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-8] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-9] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-12] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:25:53 dhcp37-141 ganesha.nfsd[13705]: [work-7] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 6 07:26:51 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:52 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:53 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
Apr 6 07:26:54 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat

Messages in ganesha-gfapi.log:

The message "W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk] 0-test-client-2: remote operation failed [No space left on device]" repeated 57 times between [2016-04-06 01:55:53.061325] and [2016-04-06 01:55:54.203684]

6. Observe that the mount point stays hung.

Actual results:
The ganesha mount hangs when trying to create a file larger than the volume size.

Expected results:
The ganesha mount point should error out with an I/O failure message.

Additional info:
sosreports, packet trace and logs are placed under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1324489
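For reference, the expected (non-hanging) behavior can be sketched with a minimal dd-style write loop: once the backend runs out of space, the error should be surfaced to the caller, not retried forever. This is a simulation under stated assumptions; `FullBackend` and `copy_until_error` are hypothetical stand-ins for the NFS mount, not ganesha or kernel code.

```python
import errno

class FullBackend:
    """Hypothetical stand-in for a nearly-full volume: accepts writes
    until `capacity` bytes are used, then fails with ENOSPC."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def write(self, data):
        if self.used + len(data) > self.capacity:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.used += len(data)
        return len(data)

def copy_until_error(backend, total, bs):
    """dd-like loop: returns (bytes_written, errno or None).
    A correct client errors out here; it must not block retrying."""
    written = 0
    while written < total:
        chunk = b"\0" * min(bs, total - written)
        try:
            written += backend.write(chunk)
        except OSError as e:
            return written, e.errno   # surface the error, do not hang
    return written, None

backend = FullBackend(capacity=4 * 1024)                    # "4GB" volume, scaled down
written, err = copy_until_error(backend, 5 * 1024, bs=512)  # "5GB" file
print(written, err == errno.ENOSPC)                         # 4096 True
```

The reported bug is that the real v3 client never reaches the error branch: its writes get no reply, so it stays blocked instead.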
I tried to reproduce the issue on a 1x2 replica volume on RHEL 7. For NFS version 4, the client reported the error as expected. For version 3, the client hung for an hour and then reported the error. From the ganesha logs, it seems the client retries many times, and every attempt fails:

07/04/2016 13:01:55 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-5] cache_inode_rdwr :INODE :CRIT :Error closing file in cache_inode_rdwr: 20.
07/04/2016 13:01:57 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-4] file_close :FSAL :CRIT :Error : close returns with No space left on device
07/04/2016 13:01:57 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-4] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 20(CACHE_INODE_NO_SPACE_LEFT) for entry 0x1c57820
07/04/2016 13:01:57 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-4] cache_inode_rdwr :INODE :CRIT :Error closing file in cache_inode_rdwr: 20.
07/04/2016 13:02:03 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-11] file_close :FSAL :CRIT :Error : close returns with No space left on device
07/04/2016 13:02:03 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-11] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 20(CACHE_INODE_NO_SPACE_LEFT) for entry 0x1c57820
07/04/2016 13:02:03 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-11] cache_inode_rdwr :INODE :CRIT :Error closing file in cache_inode_rdwr: 20.
07/04/2016 13:02:05 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-7] file_close :FSAL :CRIT :Error : close returns with No space left on device
07/04/2016 13:02:05 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-7] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 20(CACHE_INODE_NO_SPACE_LEFT) for entry 0x1c57820
07/04/2016 13:02:05 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-7] cache_inode_rdwr :INODE :CRIT :Error closing file in cache_inode_rdwr: 20.
07/04/2016 13:02:05 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-1] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
07/04/2016 13:02:07 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
07/04/2016 13:02:08 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy. Not sending heartbeat
07/04/2016 13:02:11 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-7] file_close :FSAL :CRIT :Error : close returns with No space left on device

I didn't get a chance to look into the packet trace yet.
These are my findings. I created a single-brick volume of size 1GB:

Volume Name: vol
Type: Distribute
Volume ID: f4d91e5e-be63-41fc-be85-5aecb788332a
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: jiffin17:/mnt/bricks/b2/dir
Options Reconfigured:
nfs.disable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on

I mounted the volume with the v3 option and created a 2GB file using dd, which failed as expected:

[root@dhcp42-42 ~]# dd if=/dev/zero of=/mnt/b bs=20M count=100
dd: closing output file ‘/mnt/b’: No space left on device

Then I created another new file:

dd if=/dev/zero of=/mnt/c bs=20M count=100

It has been hanging for around 6 hours (so far). The NFS client sends a lot of write requests to the ganesha server; all of those requests are in a wait state (all 16 worker threads), and the client never gets a reply. 15 out of 16 threads are waiting just before cache_inode_close() in cache_inode_rdwr():

(gdb) bt
#0  0x00007f8b47e9dee0 in futex_wait (private=<optimized out>, expected=118, futex_word=0x7f8abc00bd8c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=118, futex_word=0x7f8abc00bd8c) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_rwlock_wrlock_slow (rwlock=0x7f8abc00bd80) at pthread_rwlock_wrlock.c:67
#3  0x00000000004e493f in cache_inode_rdwr (entry=0x7f8abc00bc40, io_direction=CACHE_INODE_WRITE, offset=145752064, io_size=1048576, bytes_moved=0x7f8b13f02ea8, buffer=0x7f8aac904230, eof=0x7f8b13f02ea7, sync=0x7f8b13f02ea6, info=0x0) at /root/nfs-ganesha/src/cache_inode/cache_inode_rdwr.c:226
#4  0x000000000045e6f2 in nfs3_write (arg=0x7f8aac903c50, req=0x7f8aac903a90, res=0x7f8ab4000b20) at /root/nfs-ganesha/src/Protocols/NFS/nfs3_write.c:230
#5  0x0000000000444567 in nfs_rpc_execute (reqdata=0x7f8aac903a60) at /root/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1288
#6  0x0000000000444ea5 in worker_run (ctx=0x1fa2eb0) at /root/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1548
#7  0x0000000000519f73 in fridgethr_start_routine (arg=0x1fa2eb0) at /root/nfs-ganesha/src/support/fridgethr.c:561
#8  0x00007f8b47e9960a in start_thread (arg=0x7f8b13f04700) at pthread_create.c:334
#9  0x00007f8b47566a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) f 3
#3  0x00000000004e493f in cache_inode_rdwr (entry=0x7f8abc00bc40, io_direction=CACHE_INODE_WRITE, offset=145752064, io_size=1048576, bytes_moved=0x7f8b13f02ea8, buffer=0x7f8aac904230, eof=0x7f8b13f02ea7, sync=0x7f8b13f02ea6, info=0x0) at /root/nfs-ganesha/src/cache_inode/cache_inode_rdwr.c:226
226             PTHREAD_RWLOCK_wrlock(&entry->content_lock);
(gdb) l
221
222                     LogCrit (COMPONENT_CACHE_INODE,
223                              "cache_inode_rdwr: CLOSING entry %p",
224                              entry);
225                     PTHREAD_RWLOCK_unlock(&entry->content_lock);
226                     PTHREAD_RWLOCK_wrlock(&entry->content_lock);
227
228                     cstatus =
229                         cache_inode_close(entry,
230                                           (CACHE_INODE_FLAG_REALLYCLOSE

The other thread waits on a syncop_writev() call:

(gdb) t 19
[Switching to thread 19 (Thread 0x7f8b16f0a700 (LWP 8588))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185     62:     movl    (%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f8b43cbb372 in syncop_writev (subvol=0x7f8b280163f0, fd=0x7f8ad0000f8c, vector=0x7f8b16f08870, count=1, offset=155189248, iobref=0x7f8acc004a50, flags=0, xdata_in=0x0, xdata_out=0x0) at syncop.c:1975
#2  0x00007f8b43f451eb in pub_glfs_pwritev (glfd=0x7f8ad0000eb0, iovec=0x7f8b16f08900, iovcnt=1, offset=155189248, flags=0) at glfs-fops.c:1124
#3  0x00007f8b43f453cc in pub_glfs_pwrite (glfd=0x7f8ad0000eb0, buf=0x7f8aad208a90, count=1048576, offset=155189248, flags=0) at glfs-fops.c:1194
#4  0x00007f8b44162c98 in file_write (obj_hdl=0x7f8abc00b5c8, seek_descriptor=155189248, buffer_size=1048576, buffer=0x7f8aad208a90, write_amount=0x7f8b16f08ea8, fsal_stable=0x7f8b16f089cf) at /root/nfs-ganesha/src/FSAL/FSAL_GLUSTER/handle.c:1129
#5  0x00000000004e45b2 in cache_inode_rdwr (entry=0x7f8abc00bc40, io_direction=CACHE_INODE_WRITE, offset=155189248, io_size=1048576, bytes_moved=0x7f8b16f08ea8, buffer=0x7f8aad208a90, eof=0x7f8b16f08ea7, sync=0x7f8b16f08ea6, info=0x0) at /root/nfs-ganesha/src/cache_inode/cache_inode_rdwr.c:173
#6  0x000000000045e6f2 in nfs3_write (arg=0x7f8aad208480, req=0x7f8aad2082c0, res=0x7f8acc00a7e0) at /root/nfs-ganesha/src/Protocols/NFS/nfs3_write.c:230
#7  0x0000000000444567 in nfs_rpc_execute (reqdata=0x7f8aad208290) at /root/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1288
#8  0x0000000000444ea5 in worker_run (ctx=0x1fa17d0) at /root/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1548
#9  0x0000000000519f73 in fridgethr_start_routine (arg=0x1fa17d0) at /root/nfs-ganesha/src/support/fridgethr.c:561
#10 0x00007f8b47e9960a in start_thread (arg=0x7f8b16f0a700) at pthread_create.c:334
#11 0x00007f8b47566a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

In the packet trace, one write request gets no reply from the glusterfs server (only the call is present, no reply). I took a statedump, but it didn't help much.

I am also a little confused about how nfs3_write() in ganesha returns its value. At the end of the function we can see:

. . .
res->res_write3.status = nfs3_Errno(cache_status);
nfs_SetWccData(NULL, entry, &res->res_write3.WRITE3res_u.resfail.file_wcc);
rc = NFS_REQ_OK;

 out:
        /* return references */
        if (entry)
                cache_inode_put(entry);

        server_stats_io_done(size, written_size,
                             (rc == NFS_REQ_OK) ? true : false, true);
        return rc;
}

ganesha stores the status of the write op in res->res_write3.status, but it generally returns NFS_REQ_OK even if the write failed. RFC 1813 says to store the status in res->res_write3.status, but I am still wondering whether it is right to send "OK". Adding to that, I can't find any errors in the write replies in the packet trace which I collected (although there are errors in the ganesha logs). For v4, the server always sends the status as the return value.
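To make the two return channels concrete, here is a small simulation (not ganesha code; the constants and function names are made up for illustration, loosely following the log messages above and the NFS3 error numbers from RFC 1813): the per-operation NFS3 status carries the failure, while the dispatcher-level return stays OK because the RPC itself was processed and a well-formed reply was produced.

```python
# Dispatcher-level result: the RPC was handled, regardless of the op's outcome.
NFS_REQ_OK = 0

# NFS3 status codes per RFC 1813.
NFS3_OK, NFS3ERR_IO, NFS3ERR_NOSPC = 0, 5, 28

# Hypothetical cache-layer statuses (20 matches the log line
# "returning 20(CACHE_INODE_NO_SPACE_LEFT)"; the others are arbitrary).
CACHE_OK, CACHE_IO_ERROR, CACHE_NO_SPACE_LEFT = 0, 16, 20

def cache_to_nfs3(cache_status):
    """Map a cache-layer status to an NFS3 status code."""
    return {
        CACHE_OK: NFS3_OK,
        CACHE_IO_ERROR: NFS3ERR_IO,
        CACHE_NO_SPACE_LEFT: NFS3ERR_NOSPC,
    }[cache_status]

def nfs3_write(cache_status):
    """Returns (rpc_result, nfs3_status): the error travels in the
    reply's status field; the dispatcher still reports NFS_REQ_OK."""
    return NFS_REQ_OK, cache_to_nfs3(cache_status)

rc, status = nfs3_write(CACHE_NO_SPACE_LEFT)
print(rc == NFS_REQ_OK, status == NFS3ERR_NOSPC)   # True True
```

Under this reading, a packet trace with no RPC-level errors would still be consistent with failed writes, provided the WRITE replies actually carry a non-OK NFS3 status; in the hang case the replies never go out at all, which is a separate problem.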
One more observation: restarting ganesha clears up the pending requests, and I/O resumes on the client.
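The pile-up described in the backtraces, and why a restart clears it, can be sketched as follows. This is a simulation of the assumed mechanics, not ganesha code: one writer holds a shared lock while blocked on a backend reply that never arrives, so every later worker that needs the lock queues behind it; killing the process is the only thing that releases them. (Python's stdlib has no rwlock, so a plain `threading.Lock` stands in for `entry->content_lock`.)

```python
import threading
import time

content_lock = threading.Lock()       # stands in for entry->content_lock
backend_replied = threading.Event()   # the write reply that never arrives

def stuck_writer():
    """Holds the lock while waiting on the backend, like the thread
    blocked inside syncop_writev() in the backtrace above."""
    with content_lock:
        backend_replied.wait()        # no reply -> lock is held forever

def worker(blocked):
    """Any later worker needing the lock queues behind the stuck writer.
    A timeout is used here only so the demo terminates; the real worker
    threads block indefinitely in PTHREAD_RWLOCK_wrlock()."""
    if content_lock.acquire(timeout=0.2):
        content_lock.release()
    else:
        blocked.append(1)

threading.Thread(target=stuck_writer, daemon=True).start()
time.sleep(0.1)                       # let the writer grab the lock first

blocked = []
workers = [threading.Thread(target=worker, args=(blocked,)) for _ in range(15)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(len(blocked))                   # all 15 later workers are stuck: 15
```

Once all worker threads are parked like this, no thread is left to reply to the client, which matches the observed hang and the "Health status is unhealthy" heartbeat warnings.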
Similar behavior is seen when we try to write a file larger than the set quota limit: the server-side logs show "remote operation failed: Disk quota exceeded", but the client remains hung.

Snippet from ganesha-gfapi.log:

[2016-04-15 02:50:03.081780] W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk] 0-newvolume-client-15: remote operation failed [Disk quota exceeded]
The message "W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk] 0-newvolume-client-15: remote operation failed [Disk quota exceeded]" repeated 2 times between [2016-04-15 02:50:03.081780] and [2016-04-15 02:50:03.111949]

[root@dhcp37-180 /]# gluster volume quota newvolume list
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded?  Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/dir                                         25.0GB  80%(20.0GB)   25.0GB     0Bytes                  Yes                   Yes
[root@dhcp37-180 /]#

Mounted with v4:

10.70.36.217:/newvolume on /mnt/nfs1 type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.70.37.206,local_lock=none,addr=10.70.36.217)
Jiffin, can we confirm this behavior?