
Bug 802243

Summary: NFS: mount point unresponsive when disks get full
Product: [Community] GlusterFS
Reporter: Sachidananda Urs <sac>
Component: nfs
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: unspecified
Docs Contact:
Priority: medium
Version: pre-release
CC: bugs, gluster-bugs, pkarampu, redhat.bugs, rgowdapp, tru, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 852573 (view as bug list)
Environment:
Last Closed: 2015-10-22 11:40:20 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Bug Depends On:
Bug Blocks: 852573

Description Sachidananda Urs 2012-03-12 04:07:48 EDT
Description of problem:

The mount point becomes unresponsive and processes operating on it go into the `D' (uninterruptible sleep) state when the bricks become full. It also takes a long time for operations to return to normal.


Version-Release number of selected component (if applicable):

3.3.0qa26

How reproducible:

Always

Steps to Reproduce:
1. On a 4-node distribute volume, write data until the disks become full. The process writing data fails with `No space left on device'.
2. Try operations like ls/ps/grep/rm; the processes go into the `D' state.
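Step 1's failure can be demonstrated without a real 4-node volume (a minimal sketch only; it uses Linux's /dev/full, which fails every write with ENOSPC, in place of an actually full brick):

```shell
#!/bin/sh
# Not the actual reproduction (that needs a 4-node distribute volume);
# this only shows the ENOSPC error the writer process hits once the
# bricks fill up. /dev/full is a Linux device that fails every write
# with "No space left on device".
if echo data > /dev/full 2> write.err; then
    echo "unexpected: write succeeded"
else
    cat write.err
fi
```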
  
Actual results:


Expected results:


Additional info:

[2012-03-10 09:17:31.452413] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 09:47:34.777302] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:00:46.921226] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:06:46.986000] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:31:59.259639] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now

Below are some DHT errors that might help; the full logs are attached.

=====

[2012-03-12 08:16:50.509023] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-1-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-03-12 08:16:50.509061] E [nfs3-helpers.c:3768:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:00000000-0000-0000-0000-000000000000>: Invalid argument
[2012-03-12 08:16:50.509096] E [nfs3.c:739:nfs3_getattr_resume] 0-nfs-nfsv3: Unable to resolve FH: nfs-test-1 : 00000000-0000-0000-0000-000000000000
[2012-03-12 08:16:50.509115] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: e2e0a1d1, GETATTR: NFS: 22(Invalid argument for operation), POSIX: 14(Bad address)
[2012-03-12 08:16:50.510457] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-1-dht: found anomalies in /user17/linux-3.2.9/Documentation/sound/alsa. holes=1 overlaps=0
[2012-03-12 08:16:50.510807] W [client3_1-fops.c:296:client3_1_mkdir_cbk] 0-nfs-test-1-client-1: remote operation failed: No such file or directory. Path: /user17/linux-3.2.9/Documentation/sound/alsa
[2012-03-12 08:16:50.510846] W [dht-selfheal.c:366:dht_selfheal_dir_mkdir_cbk] 0-nfs-test-1-dht: selfhealing directory /user17/linux-3.2.9/Documentation/sound/alsa failed: No such file or directory
[2012-03-12 08:16:50.600839] W [client3_1-fops.c:1189:client3_1_fstat_cbk] 0-nfs-test-1-client-1: remote operation failed: No such file or directory
Comment 1 Pranith Kumar K 2013-05-02 05:51:47 EDT
Followed the given steps and the mount is responsive. Could you please let me know if there is any other way to re-create this issue?

root - /mnt/vol 
15:16:45 :) ⚡ df -h
Filesystem          Size  Used Avail Use% Mounted on
rootfs               50G   17G   32G  35% /
devtmpfs            3.9G     0  3.9G   0% /dev
tmpfs               3.9G  960K  3.9G   1% /dev/shm
tmpfs               3.9G  2.2M  3.9G   1% /run
/dev/sda2            50G   17G   32G  35% /
tmpfs               3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs               3.9G     0  3.9G   0% /media
/dev/sda1           240G  211G   30G  88% /home
10.70.42.235:/vol/   30G   30G   64K 100% /mnt/vol <<---- is the mount

root - /mnt/vol 
15:16:50 :) ⚡ ls
1   11  13  15  17  19  20  22  24  26  28  3   4  6  8
10  12  14  16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:17:40 :) ⚡ ls
1   11  13  15  17  19  20  22  24  26  28  3   4  6  8
10  12  14  16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:18:31 :( ⚡ cp 12 13

root - /mnt/vol 
15:19:03 :) ⚡ cp 12 131
cp: closing `131': No space left on device

root - /mnt/vol 
15:19:17 :( ⚡ mkdir 123

root - /mnt/vol 
15:20:06 :) ⚡ 

root - /mnt/vol 
15:20:06 :) ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:22:25 :( ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:22:29 :) ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:23:11 :) ⚡ ps
  PID TTY          TIME CMD
20774 pts/1    00:00:00 sudo
20775 pts/1    00:00:00 su
20778 pts/1    00:00:00 bash
25801 pts/1    00:00:00 ps
Comment 2 Raghavendra G 2013-06-20 01:42:08 EDT
Seems like a duplicate of bug #851601. To confirm, could you please check the CPU usage?
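For reference, one way to capture what comment 2 asks for in a single snapshot (a hedged sketch; on an affected client the top CPU consumer would be the glusterfs process and the D-state list would include the stuck ls/cp/rm commands):

```shell
#!/bin/sh
# Snapshot of the top CPU consumers plus any processes stuck in
# uninterruptible ('D') sleep -- the combination this report describes.
ps -eo pid,pcpu,stat,comm --sort=-pcpu | head -n 6
ps -eo pid,stat,comm | awk '$2 ~ /^D/'
```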
Comment 3 Richard 2013-07-22 06:59:49 EDT
This seems to still be a problem with the latest production 3.4 release ;-(

When mounting gluster via FUSE and using cp to copy a 100 MB file, the cp command hangs once the volume is full. You can't kill it (even with kill -9), and CPU usage is high for gluster:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
 4259 root       20   0  377M  128M  2596 S  0.0 12.9  0:15.41 /usr/sbin/glusterfs --volfile-id=/md0 --volfile-server=127.0.0.1 /home/glusterfs
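To confirm where such a hung cp is blocked in the kernel, /proc can be inspected (a sketch only; $$ stands in for the hung cp's PID so the snippet is runnable as-is):

```shell
#!/bin/sh
# Inspect where a process is blocked. For a cp stuck on a full gluster
# volume the state would be 'D' and the wait channel would point into
# the FUSE/VFS write path. $$ (this shell) is used here only so the
# sketch runs anywhere; substitute the hung cp's actual PID.
pid=$$
awk '{print "state:", $3}' /proc/$pid/stat
cat /proc/$pid/wchan 2>/dev/null; echo
```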

I also have an issue with NFS freezing, which led me to test with FUSE as well.

Rich
Comment 4 Kaleb KEITHLEY 2015-10-22 11:40:20 EDT
The "pre-release" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.