Bug 802243 - NFS: mount point unresponsive when disks get full
Status: CLOSED EOL
Product: GlusterFS
Classification: Community
Component: nfs
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: bugs@gluster.org
Keywords: Triaged
Depends On:
Blocks: 852573
Reported: 2012-03-12 04:07 EDT by Sachidananda Urs
Modified: 2015-10-22 11:40 EDT
CC: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 852573
Environment:
Last Closed: 2015-10-22 11:40:20 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments: None
Description Sachidananda Urs 2012-03-12 04:07:48 EDT
Description of problem:

The mount point becomes unresponsive and processes on the mount point go into `D' state when the bricks become full. It takes a long time for operations to return to normal.
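
For context, `D' is the uninterruptible-sleep state, which usually means the process is blocked on I/O inside the kernel. A minimal sketch for spotting such hung processes; the mount path and the PID below are illustrative:

# List processes currently in 'D' (uninterruptible sleep) state;
# in this bug they are typically blocked on I/O to the full volume.
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

# Optionally dump the kernel stack of one hung process (PID illustrative,
# requires root) to confirm it is blocked in the NFS client / FUSE path.
cat /proc/12345/stack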


Version-Release number of selected component (if applicable):

3.3.0qa26

How reproducible:

Always

Steps to Reproduce:
1. On a 4-node distribute volume, write data until the disks become full. The process writing data fails with `No space left on device'.
2. Try operations such as ls/ps/grep/rm; the processes go into `D' state (see the sketch below).
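
A minimal reproduction sketch of the above steps; the volume name, brick paths, host names, and mount point are all illustrative:

# Create and start a plain 4-node distribute volume.
gluster volume create testvol node1:/bricks/b1 node2:/bricks/b2 \
    node3:/bricks/b3 node4:/bricks/b4
gluster volume start testvol

# Mount it via the Gluster NFS server (NFSv3 only).
mount -t nfs -o vers=3 node1:/testvol /mnt/vol

# Write until the bricks fill up; dd exits with `No space left on device'.
dd if=/dev/zero of=/mnt/vol/filler bs=1M

# With the volume full, simple operations hang and enter `D' state.
ls /mnt/vol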
  
Actual results:

The mount point hangs: processes accessing it go into `D' state and remain stuck for a long time.

Expected results:

Writes should fail cleanly with `No space left on device' while the mount point stays responsive.

Additional info:

[2012-03-10 09:17:31.452413] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 09:47:34.777302] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:00:46.921226] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:06:46.986000] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:31:59.259639] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now

Some of the DHT errors that might help are shown below. Please find the full logs attached.

=====

[2012-03-12 08:16:50.509023] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-1-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-03-12 08:16:50.509061] E [nfs3-helpers.c:3768:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:00000000-0000-0000-0000-000000000000>: Invalid argument
[2012-03-12 08:16:50.509096] E [nfs3.c:739:nfs3_getattr_resume] 0-nfs-nfsv3: Unable to resolve FH: nfs-test-1 : 00000000-0000-0000-0000-000000000000
[2012-03-12 08:16:50.509115] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: e2e0a1d1, GETATTR: NFS: 22(Invalid argument for operation), POSIX: 14(Bad address)
[2012-03-12 08:16:50.510457] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-1-dht: found anomalies in /user17/linux-3.2.9/Documentation/sound/alsa. holes=1 overlaps=0
[2012-03-12 08:16:50.510807] W [client3_1-fops.c:296:client3_1_mkdir_cbk] 0-nfs-test-1-client-1: remote operation failed: No such file or directory. Path: /user17/linux-3.2.9/Documentation/sound/alsa
[2012-03-12 08:16:50.510846] W [dht-selfheal.c:366:dht_selfheal_dir_mkdir_cbk] 0-nfs-test-1-dht: selfhealing directory /user17/linux-3.2.9/Documentation/sound/alsa failed: No such file or directory
[2012-03-12 08:16:50.600839] W [client3_1-fops.c:1189:client3_1_fstat_cbk] 0-nfs-test-1-client-1: remote operation failed: No such file or directory
Comment 1 Pranith Kumar K 2013-05-02 05:51:47 EDT
I followed the given steps and the mount remained responsive. Could you please let me know if there is any other way to re-create this issue?

root - /mnt/vol 
15:16:45 :) ⚡ df -h
Filesystem          Size  Used Avail Use% Mounted on
rootfs               50G   17G   32G  35% /
devtmpfs            3.9G     0  3.9G   0% /dev
tmpfs               3.9G  960K  3.9G   1% /dev/shm
tmpfs               3.9G  2.2M  3.9G   1% /run
/dev/sda2            50G   17G   32G  35% /
tmpfs               3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs               3.9G     0  3.9G   0% /media
/dev/sda1           240G  211G   30G  88% /home
10.70.42.235:/vol/   30G   30G   64K 100% /mnt/vol <<---- is the mount

root - /mnt/vol 
15:16:50 :) ⚡ ls
1   11  13  15  17  19  20  22  24  26  28  3   4  6  8
10  12  14  16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:17:40 :) ⚡ ls
1   11  13  15  17  19  20  22  24  26  28  3   4  6  8
10  12  14  16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:18:31 :( ⚡ cp 12 13

root - /mnt/vol 
15:19:03 :) ⚡ cp 12 131
cp: closing `131': No space left on device

root - /mnt/vol 
15:19:17 :( ⚡ mkdir 123

root - /mnt/vol 
15:20:06 :) ⚡ 

root - /mnt/vol 
15:20:06 :) ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:22:25 :( ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:22:29 :) ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:23:11 :) ⚡ ps
  PID TTY          TIME CMD
20774 pts/1    00:00:00 sudo
20775 pts/1    00:00:00 su
20778 pts/1    00:00:00 bash
25801 pts/1    00:00:00 ps
Comment 2 Raghavendra G 2013-06-20 01:42:08 EDT
This seems like a duplicate of bug #851601. To confirm, can you please check the CPU usage?
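
For reference, a quick way to sample per-process CPU usage of the Gluster daemons (assuming the standard process names glusterfs/glusterfsd):

# One-shot CPU snapshot of all Gluster processes
# (assumes at least one matching process is running).
top -b -n 1 -p "$(pgrep -d, -f gluster)"

# Or via ps, showing cumulative CPU share and elapsed time.
ps -o pid,pcpu,etime,comm -C glusterfs,glusterfsd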
Comment 3 Richard 2013-07-22 06:59:49 EDT
This seems to still be a problem with the latest production 3.4 release ;-(

When mounting gluster via FUSE and using cp to copy a 100 MB file, the cp command hangs when the volume is full. You can't kill the cp command (even with kill -9), and CPU usage is high for gluster:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
 4259 root       20   0  377M  128M  2596 S  0.0 12.9  0:15.41 /usr/sbin/glusterfs --volfile-id=/md0 --volfile-server=127.0.0.1 /home/glusterfs
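
For reference, a sketch of the FUSE-side reproduction implied above; the mount details are taken from the glusterfs command line shown, and bigfile stands in for the ~100 MB file:

# FUSE mount matching the process shown above.
mount -t glusterfs 127.0.0.1:/md0 /home/glusterfs

# Copying onto the already-full volume hangs here ...
cp bigfile /home/glusterfs/

# ... and from another shell the cp cannot be killed while in `D' state.
kill -9 "$(pgrep -x cp)"
ps -o pid,stat,comm -p "$(pgrep -x cp)"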

I also have an issue with NFS freezing, which led me to test with FUSE.

Rich
Comment 4 Kaleb KEITHLEY 2015-10-22 11:40:20 EDT
The "pre-release" version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
