Bug 1324489

Summary: ganesha mount hangs when trying to create a file larger than the volume size.
Product: Red Hat Gluster Storage
Reporter: Shashank Raj <sraj>
Component: nfs-ganesha
Assignee: Jiffin <jthottan>
Status: CLOSED WORKSFORME
QA Contact: Manisha Saini <msaini>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: jthottan, kkeithle, mzywusko, ndevos, nlevinki, rhinduja, skoduri
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-30 11:46:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Shashank Raj 2016-04-06 13:06:36 UTC
Description of problem:
ganesha mount hangs when trying to create a file larger than the volume size.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-1

How reproducible:
Twice

Steps to Reproduce:

1. Create a cluster of 4 nodes.
2. Configure nfs-ganesha on the cluster.
3. Create a 4 GB volume and enable ganesha on it.

Volume Name: test
Type: Distribute
Volume ID: e1ea7ada-30b3-478d-b7d0-f76d79f063d9
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.180:/bricks/brick0/b0
Brick2: 10.70.37.158:/bricks/brick0/b0
Brick3: 10.70.37.127:/bricks/brick0/b0
Brick4: 10.70.37.174:/bricks/brick0/b0
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
nfs.disable: on
performance.readdir-ahead: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

4. Mount the volume on the client

[root@dhcp37-206 ~]# df -h
Filesystem                              Size  Used Avail Use% Mounted on
/dev/mapper/rhel_dhcp37--206-root        27G  2.3G   25G   9% /
devtmpfs                                3.9G     0  3.9G   0% /dev
tmpfs                                   3.9G     0  3.9G   0% /dev/shm
tmpfs                                   3.9G  8.4M  3.9G   1% /run
tmpfs                                   3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/vda1                               497M  124M  373M  25% /boot
tmpfs                                   783M     0  783M   0% /run/user/0
rhsqe-repo.lab.eng.blr.redhat.com:/opt  1.9T  353G  1.4T  20% /opt
10.70.36.217:/test                      4.0G  130M  3.9G   4% /mnt

5. Create a 5 GB file on the mount point (larger than the 4 GB volume).

After some time, observe the status of the nfs-ganesha service on the mounted node:

[root@dhcp37-180 brick0]# service nfs-ganesha status -l
Redirecting to /bin/systemctl status  -l nfs-ganesha.service
● nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-04-06 07:20:16 IST; 7min ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
  Process: 13197 ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown (code=exited, status=0/SUCCESS)
  Process: 13706 ExecStartPost=/bin/bash -c prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE (code=exited, status=0/SUCCESS)
  Process: 13704 ExecStart=/bin/bash -c ${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH} (code=exited, status=0/SUCCESS)
 Main PID: 13705 (ganesha.nfsd)
   CGroup: /system.slice/nfs-ganesha.service
           └─13705 /usr/bin/ganesha.nfsd

Apr 06 07:21:55 dhcp37-180.lab.eng.blr.redhat.com nfs-ganesha[13705]: [dbus_heartbeat] glusterfs_create_export :FSAL :EVENT :Volume test exported at : '/'
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-8] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-9] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-12] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:25:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [work-7] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr 06 07:26:51 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
Apr 06 07:26:52 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
Apr 06 07:26:53 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
Apr 06 07:26:54 dhcp37-180.lab.eng.blr.redhat.com ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat


and the following messages in /var/log/messages:


Apr  6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-13] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr  6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-8] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr  6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-9] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr  6 07:25:52 dhcp37-141 ganesha.nfsd[13705]: [work-12] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
Apr  6 07:25:53 dhcp37-141 ganesha.nfsd[13705]: [work-7] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable


Apr  6 07:26:51 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
Apr  6 07:26:52 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
Apr  6 07:26:53 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
Apr  6 07:26:54 dhcp37-141 ganesha.nfsd[13705]: [dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat

messages in ganesha-gfapi.log:

The message "W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk] 0-test-client-2: remote operation failed [No space left on device]" repeated 57 times between [2016-04-06 01:55:53.061325] and [2016-04-06 01:55:54.203684]

6. Observe that the mount point stays hung.

Actual results:

ganesha mount hangs when trying to create a file larger than the volume size.

Expected results:

The ganesha mount point should error out with an I/O error message.

Additional info:

Comment 2 Shashank Raj 2016-04-06 13:15:21 UTC
sosreports, packet trace and logs are placed under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1324489

Comment 3 Jiffin 2016-04-08 17:33:28 UTC
I tried to reproduce the issue on a 1x2 replica volume on RHEL 7.

For NFSv4, the client reported the error as expected.

For NFSv3, the client hung for one hour and then reported the error. In the ganesha logs, it seems the client retried many times, and every attempt failed.
07/04/2016 13:01:55 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-5] cache_inode_rdwr :INODE :CRIT :Error closing file in cache_inode_rdwr: 20.
07/04/2016 13:01:57 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-4] file_close :FSAL :CRIT :Error : close returns with No space left on device
07/04/2016 13:01:57 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-4] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 20(CACHE_INODE_NO_SPACE_LEFT) for entry 0x1c57820
07/04/2016 13:01:57 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-4] cache_inode_rdwr :INODE :CRIT :Error closing file in cache_inode_rdwr: 20.
07/04/2016 13:02:03 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-11] file_close :FSAL :CRIT :Error : close returns with No space left on device
07/04/2016 13:02:03 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-11] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 20(CACHE_INODE_NO_SPACE_LEFT) for entry 0x1c57820
07/04/2016 13:02:03 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-11] cache_inode_rdwr :INODE :CRIT :Error closing file in cache_inode_rdwr: 20.
07/04/2016 13:02:05 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-7] file_close :FSAL :CRIT :Error : close returns with No space left on device
07/04/2016 13:02:05 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-7] cache_inode_close :INODE :CRIT :FSAL_close failed, returning 20(CACHE_INODE_NO_SPACE_LEFT) for entry 0x1c57820
07/04/2016 13:02:05 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-7] cache_inode_rdwr :INODE :CRIT :Error closing file in cache_inode_rdwr: 20.
07/04/2016 13:02:05 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-1] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_write converted to NFS3ERR_IO but was set non-retryable
07/04/2016 13:02:07 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
07/04/2016 13:02:08 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health status is unhealthy.  Not sending heartbeat
07/04/2016 13:02:11 : epoch 5705fe09 : dhcp43-58.lab.eng.blr.redhat.com : ganesha.nfsd-22034[work-7] file_close :FSAL :CRIT :Error : close returns with No space left on device



I haven't had a chance to look into the packet trace yet.

Comment 4 Jiffin 2016-04-12 10:47:58 UTC
These are my findings : 

I created a single-brick volume of size 1 GB:

Volume Name: vol
Type: Distribute
Volume ID: f4d91e5e-be63-41fc-be85-5aecb788332a
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: jiffin17:/mnt/bricks/b2/dir
Options Reconfigured:
nfs.disable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on

I mounted the volume using the v3 option and created a 2 GB file using dd; it failed as expected:

[root@dhcp42-42 ~]# dd if=/dev/zero of=/mnt/b bs=20M count=100
dd: closing output file ‘/mnt/b’: No space left on device

Then I created another new file:

dd if=/dev/zero of=/mnt/c bs=20M count=100

It has been hanging for around 6 hours (so far).

Here the NFS client sends a lot of write requests to the ganesha server; all those requests are in a wait state (all 16 worker threads), and the client never gets a reply.

15 out of 16 threads are in a wait state just before cache_inode_close() in cache_inode_rdwr():
(gdb) bt
#0  0x00007f8b47e9dee0 in futex_wait (private=<optimized out>, expected=118, futex_word=0x7f8abc00bd8c)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=<optimized out>, expected=118, futex_word=0x7f8abc00bd8c) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_rwlock_wrlock_slow (rwlock=0x7f8abc00bd80) at pthread_rwlock_wrlock.c:67
#3  0x00000000004e493f in cache_inode_rdwr (entry=0x7f8abc00bc40, io_direction=CACHE_INODE_WRITE, offset=145752064, io_size=1048576, 
    bytes_moved=0x7f8b13f02ea8, buffer=0x7f8aac904230, eof=0x7f8b13f02ea7, sync=0x7f8b13f02ea6, info=0x0)
    at /root/nfs-ganesha/src/cache_inode/cache_inode_rdwr.c:226
#4  0x000000000045e6f2 in nfs3_write (arg=0x7f8aac903c50, req=0x7f8aac903a90, res=0x7f8ab4000b20)
    at /root/nfs-ganesha/src/Protocols/NFS/nfs3_write.c:230
#5  0x0000000000444567 in nfs_rpc_execute (reqdata=0x7f8aac903a60) at /root/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1288
#6  0x0000000000444ea5 in worker_run (ctx=0x1fa2eb0) at /root/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1548
#7  0x0000000000519f73 in fridgethr_start_routine (arg=0x1fa2eb0) at /root/nfs-ganesha/src/support/fridgethr.c:561
#8  0x00007f8b47e9960a in start_thread (arg=0x7f8b13f04700) at pthread_create.c:334
#9  0x00007f8b47566a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) f 3
#3  0x00000000004e493f in cache_inode_rdwr (entry=0x7f8abc00bc40, io_direction=CACHE_INODE_WRITE, offset=145752064, io_size=1048576, 
    bytes_moved=0x7f8b13f02ea8, buffer=0x7f8aac904230, eof=0x7f8b13f02ea7, sync=0x7f8b13f02ea6, info=0x0)
    at /root/nfs-ganesha/src/cache_inode/cache_inode_rdwr.c:226
226				PTHREAD_RWLOCK_wrlock(&entry->content_lock);
(gdb) l
221	
222				LogCrit (COMPONENT_CACHE_INODE,
223					     "cache_inode_rdwr: CLOSING entry %p",
224					     entry);
225				PTHREAD_RWLOCK_unlock(&entry->content_lock);
226				PTHREAD_RWLOCK_wrlock(&entry->content_lock);
227	
228				cstatus =
229				    cache_inode_close(entry,
230						      (CACHE_INODE_FLAG_REALLYCLOSE 

 
The one other thread waits in a syncop_writev() call:
(gdb) t 19
[Switching to thread 19 (Thread 0x7f8b16f0a700 (LWP 8588))]

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f8b43cbb372 in syncop_writev (subvol=0x7f8b280163f0, fd=0x7f8ad0000f8c, vector=0x7f8b16f08870, count=1, offset=155189248, 
    iobref=0x7f8acc004a50, flags=0, xdata_in=0x0, xdata_out=0x0) at syncop.c:1975
#2  0x00007f8b43f451eb in pub_glfs_pwritev (glfd=0x7f8ad0000eb0, iovec=0x7f8b16f08900, iovcnt=1, offset=155189248, flags=0) at glfs-fops.c:1124
#3  0x00007f8b43f453cc in pub_glfs_pwrite (glfd=0x7f8ad0000eb0, buf=0x7f8aad208a90, count=1048576, offset=155189248, flags=0) at glfs-fops.c:1194
#4  0x00007f8b44162c98 in file_write (obj_hdl=0x7f8abc00b5c8, seek_descriptor=155189248, buffer_size=1048576, buffer=0x7f8aad208a90, 
    write_amount=0x7f8b16f08ea8, fsal_stable=0x7f8b16f089cf) at /root/nfs-ganesha/src/FSAL/FSAL_GLUSTER/handle.c:1129
#5  0x00000000004e45b2 in cache_inode_rdwr (entry=0x7f8abc00bc40, io_direction=CACHE_INODE_WRITE, offset=155189248, io_size=1048576, 
    bytes_moved=0x7f8b16f08ea8, buffer=0x7f8aad208a90, eof=0x7f8b16f08ea7, sync=0x7f8b16f08ea6, info=0x0)
    at /root/nfs-ganesha/src/cache_inode/cache_inode_rdwr.c:173
#6  0x000000000045e6f2 in nfs3_write (arg=0x7f8aad208480, req=0x7f8aad2082c0, res=0x7f8acc00a7e0)
    at /root/nfs-ganesha/src/Protocols/NFS/nfs3_write.c:230
#7  0x0000000000444567 in nfs_rpc_execute (reqdata=0x7f8aad208290) at /root/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1288
#8  0x0000000000444ea5 in worker_run (ctx=0x1fa17d0) at /root/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1548
#9  0x0000000000519f73 in fridgethr_start_routine (arg=0x1fa17d0) at /root/nfs-ganesha/src/support/fridgethr.c:561
#10 0x00007f8b47e9960a in start_thread (arg=0x7f8b16f0a700) at pthread_create.c:334
#11 0x00007f8b47566a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

In the packet trace, one write request never gets a reply from the glusterfs server (only the call is there, no reply).
I took a statedump, but it didn't help much.

I am also a little confused about the way nfs3_write() in ganesha returns its value.

At the end of the function, we can see:
.
.
        res->res_write3.status = nfs3_Errno(cache_status);

        nfs_SetWccData(NULL, entry,
                       &res->res_write3.WRITE3res_u.resfail.file_wcc);

        rc = NFS_REQ_OK;

 out:
        /* return references */
        if (entry)
                cache_inode_put(entry);

        server_stats_io_done(size, written_size,
                             (rc == NFS_REQ_OK) ? true : false,
                             true);
        return rc;
}

ganesha stores the status of the write op in res->res_write3.status, but it returns NFS_REQ_OK even when the write failed.
RFC 1813 says to store the status in res->res_write3.status, but I am still wondering whether it is right to send "OK" at the RPC level.
Adding to that, I can't find any errors in the write replies in the packet trace I collected (although there are errors in the ganesha logs).

For v4, it always sends the status as the return value.

Comment 5 Jiffin 2016-04-12 11:50:36 UTC
One more note: restarting ganesha clears up the pending requests, and I/O resumes on the client.

Comment 6 Shashank Raj 2016-04-15 13:48:24 UTC
Similar behavior is seen when we try to write a file larger than the set quota limit: the server-side logs show a "remote operation failed : Disk quota exceeded" error, but the client remains hung.

Snippet from ganesha-gfapi.log:

[2016-04-15 02:50:03.081780] W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk] 0-newvolume-client-15: remote operation failed [Disk quota exceeded]
The message "W [MSGID: 114031] [client-rpc-fops.c:907:client3_3_writev_cbk] 0-newvolume-client-15: remote operation failed [Disk quota exceeded]" repeated 2 times between [2016-04-15 02:50:03.081780] and [2016-04-15 02:50:03.111949]

[root@dhcp37-180 /]# gluster volume quota newvolume list
                  Path                   Hard-limit  Soft-limit      Used  Available  Soft-limit exceeded? Hard-limit exceeded?
-------------------------------------------------------------------------------------------------------------------------------
/dir                                      25.0GB     80%(20.0GB)   25.0GB  0Bytes             Yes                  Yes
[root@dhcp37-180 /]# 

mounted with v4:

10.70.36.217:/newvolume on /mnt/nfs1 type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.70.37.206,local_lock=none,addr=10.70.36.217)

Comment 7 Shashank Raj 2016-09-29 13:46:09 UTC
Jiffin,

Can we confirm this behavior?