Bug 802243

Summary: NFS: mount point unresponsive when disks get full
Product: [Community] GlusterFS
Reporter: Sachidananda Urs <sac>
Component: nfs
Assignee: bugs <bugs>
Status: CLOSED EOL
Severity: unspecified
Priority: medium
Version: pre-release
CC: bugs, gluster-bugs, jianwei1216, pasteur, pkarampu, redhat.bugs, rgowdapp, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2015-10-22 15:40:20 UTC
Bug Blocks: 852573    

Description Sachidananda Urs 2012-03-12 08:07:48 UTC
Description of problem:

The mount point becomes unresponsive, and processes on the mount point go into the `D' state, when the bricks become full. It then takes a long time for operations to return to normal.


Version-Release number of selected component (if applicable):

3.3.0qa26

How reproducible:

Always

Steps to Reproduce:
1. On a 4-node distribute volume, write data until the disks become full. The process writing data fails with `No space left on device'.
2. Try operations like ls/ps/grep/rm ...; the processes go into the `D' state.
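
To confirm step 2 without relying on ps output, a small script can list the processes that are currently in uninterruptible sleep. This is a minimal sketch, assuming a Linux host with /proc mounted; the function names `parse_stat_state` and `find_d_state` are made up for illustration and are not part of GlusterFS.

```python
import os


def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, _, right = stat_text.rpartition(")")
    pid_text, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_text), comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process whose state is 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or unreadable entry
        if state == "D":
            blocked.append((pid, comm))
    return blocked

# Example use on the NFS client after the bricks fill up:
#     for pid, comm in find_d_state():
#         print(pid, comm)
```

Running it a few times while the ls/rm commands are stuck should show whether the same PIDs stay in the `D' state.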
  
Actual results:


Expected results:


Additional info:

[2012-03-10 09:17:31.452413] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 09:47:34.777302] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:00:46.921226] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:06:46.986000] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:31:59.259639] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now

Some of the DHT errors that might help are below. Please find the full logs attached.

=====

[2012-03-12 08:16:50.509023] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-1-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-03-12 08:16:50.509061] E [nfs3-helpers.c:3768:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:00000000-0000-0000-0000-000000000000>: Invalid argument
[2012-03-12 08:16:50.509096] E [nfs3.c:739:nfs3_getattr_resume] 0-nfs-nfsv3: Unable to resolve FH: nfs-test-1 : 00000000-0000-0000-0000-000000000000
[2012-03-12 08:16:50.509115] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: e2e0a1d1, GETATTR: NFS: 22(Invalid argument for operation), POSIX: 14(Bad address)
[2012-03-12 08:16:50.510457] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-1-dht: found anomalies in /user17/linux-3.2.9/Documentation/sound/alsa. holes=1 overlaps=0
[2012-03-12 08:16:50.510807] W [client3_1-fops.c:296:client3_1_mkdir_cbk] 0-nfs-test-1-client-1: remote operation failed: No such file or directory. Path: /user17/linux-3.2.9/Documentation/sound/alsa
[2012-03-12 08:16:50.510846] W [dht-selfheal.c:366:dht_selfheal_dir_mkdir_cbk] 0-nfs-test-1-dht: selfhealing directory /user17/linux-3.2.9/Documentation/sound/alsa failed: No such file or directory
[2012-03-12 08:16:50.600839] W [client3_1-fops.c:1189:client3_1_fstat_cbk] 0-nfs-test-1-client-1: remote operation failed: No such file or directory
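
For longer logs, it can help to tally entries by severity and reporting function. The sketch below assumes the line layout visible in the excerpts here (timestamp, one-letter level, optional MSGID, then file:line:function); the regular expression and the names `parse_line` and `summarize` are illustrative and may need adjusting for other GlusterFS versions.

```python
import re
from collections import Counter

# Layout assumed from the excerpts in this report:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*\d+\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<rest>.*)$"
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def summarize(lines):
    """Count log entries per (level, function) pair, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec:
            counts[(rec["level"], rec["func"])] += 1
    return counts

# Example use on a saved log file (the path is a placeholder):
#     with open("nfs.log") as fh:
#         for key, n in summarize(fh).most_common():
#             print(n, *key)
```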

Comment 1 Pranith Kumar K 2013-05-02 09:51:47 UTC
I followed the given steps and the mount is responsive. Could you please let me know if there is any other way to reproduce this issue?

root - /mnt/vol 
15:16:45 :) ⚡ df -h
Filesystem          Size  Used Avail Use% Mounted on
rootfs               50G   17G   32G  35% /
devtmpfs            3.9G     0  3.9G   0% /dev
tmpfs               3.9G  960K  3.9G   1% /dev/shm
tmpfs               3.9G  2.2M  3.9G   1% /run
/dev/sda2            50G   17G   32G  35% /
tmpfs               3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs               3.9G     0  3.9G   0% /media
/dev/sda1           240G  211G   30G  88% /home
10.70.42.235:/vol/   30G   30G   64K 100% /mnt/vol <<---- is the mount

root - /mnt/vol 
15:16:50 :) ⚡ ls
1   11  13  15  17  19  20  22  24  26  28  3   4  6  8
10  12  14  16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:17:40 :) ⚡ ls
1   11  13  15  17  19  20  22  24  26  28  3   4  6  8
10  12  14  16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:18:31 :( ⚡ cp 12 13

root - /mnt/vol 
15:19:03 :) ⚡ cp 12 131
cp: closing `131': No space left on device

root - /mnt/vol 
15:19:17 :( ⚡ mkdir 123

root - /mnt/vol 
15:20:06 :) ⚡ 

root - /mnt/vol 
15:20:06 :) ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:22:25 :( ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:22:29 :) ⚡ ls
1   11  123  131  15  17  19  20  22  24  26  28  3   4  6  8  a
10  12  13   14   16  18  2   21  23  25  27  29  30  5  7  9

root - /mnt/vol 
15:23:11 :) ⚡ ps
  PID TTY          TIME CMD
20774 pts/1    00:00:00 sudo
20775 pts/1    00:00:00 su
20778 pts/1    00:00:00 bash
25801 pts/1    00:00:00 ps

Comment 2 Raghavendra G 2013-06-20 05:42:08 UTC
This seems to be a duplicate of bug #851601. To confirm, could you please check the CPU usage?
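
One way to take that measurement is to sample the process's CPU time from /proc twice and compare. This is a rough sketch, assuming Linux and that the PID of the gluster process is already known; the helper names are hypothetical, and top or htop would give the same information interactively.

```python
import os
import time


def cpu_ticks_from_stat(stat_text):
    """Return utime + stime (in clock ticks) from /proc/<pid>/stat contents."""
    # Fields after the closing ')' start at field 3 (state), so
    # utime (field 14) and stime (field 15) are at indexes 11 and 12.
    fields = stat_text.rpartition(")")[2].split()
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, hz):
    """Convert a tick delta over interval_s seconds into a CPU percentage."""
    return (ticks_after - ticks_before) / hz / interval_s * 100.0


def sample_cpu(pid, interval_s=1.0):
    """Measure CPU usage of one process over interval_s seconds."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    path = "/proc/%d/stat" % pid
    with open(path) as fh:
        before = cpu_ticks_from_stat(fh.read())
    time.sleep(interval_s)
    with open(path) as fh:
        after = cpu_ticks_from_stat(fh.read())
    return cpu_percent(before, after, interval_s, hz)

# Example use, with the PID of the glusterfs process substituted in:
#     print(sample_cpu(pid))
```

A value near 100 means the process is using roughly one full core during the sampling interval.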

Comment 3 Richard 2013-07-22 10:59:49 UTC
This seems to still be a problem with the latest production 3.4 release ;-(

With gluster mounted via FUSE, using cp to copy a 100MB file hangs when the volume is full. You can't kill the cp command (not even with kill -9), and the CPU usage is high for gluster:

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
 4259 root       20   0  377M  128M  2596 S  0.0 12.9  0:15.41 /usr/sbin/glusterfs --volfile-id=/md0 --volfile-server=127.0.0.1 /home/glusterfs

I also have an issue with NFS freezing, which led me to test with FUSE as well.

Rich

Comment 4 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
The pre-release version is ambiguous and is about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

Comment 5 Jianwei Zhang 2017-11-08 10:26:20 UTC
I encountered the same problem with glusterfs 3.7.11:


[2017-11-01 16:41:35.792653] E [dht-helper.c:1102:dht_subvol_get_hashed] (-->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7f20c3bdb74b] -->/usr/lib64/glusterfs/3.7xt.39/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7f20bb32a051] -->/usr/lib64/glusterfs/3.7xt.39/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1b8) [0x7f20bb2f23f8] ) 0-xtvol-dht: invalid argument: loc->parent [Invalid argument]
[2017-11-01 16:41:35.793214] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-4: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793381] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-7: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793359] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-6: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793402] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-0: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793485] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-10: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793288] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-3: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793466] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-9: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793552] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-14: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793576] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-11: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793485] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-5: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793640] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-13: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793293] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-2: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793687] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-15: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793520] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-8: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793724] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-1: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793737] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-16: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793781] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-12: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793832] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-17: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.794104] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-18: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.794081] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-19: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.794202] E [MSGID: 112198] [nfs3-helpers.c:3695:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:00000000-0000-0000-0000-000000000000>: Invalid argument [Invalid argument]
[2017-11-01 16:41:35.794262] E [MSGID: 112069] [nfs3.c:816:nfs3_getattr_resume] 0-nfs-nfsv3: Invalid argument: (10.1.50.253:943) xtvol : 00000000-0000-0000-0000-000000000000
[2017-11-01 16:41:35.794295] W [MSGID: 112199] [nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: <gfid:00000000-0000-0000-0000-000000000000> => (XID: 2e26ff2f, GETATTR: NFS: 22(Invalid argument for operation), POSIX: 14(Bad address)) [Resource temporarily unavailable]
[2017-11-01 16:41:37.890355] I [socket.c:1168:socket_event_poll_err] 0-socket.nfs-server: socket handler poll err notify RPC_TRANSPORT_DISCONNECT
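
To see at a glance how many client subvolumes reported the failed lookup, the log can be reduced to the set of translator names involved. This is a minimal sketch, assuming the "N-<volume>-client-<index>: remote operation failed" wording seen in the lines above; `failed_clients` is an illustrative name, not a GlusterFS tool.

```python
import re

# Matches e.g. "] 0-xtvol-client-4: remote operation failed"
CLIENT_RE = re.compile(
    r"\]\s+\d+-(?P<xlator>\S+-client-\d+):\s+remote operation failed"
)


def failed_clients(lines):
    """Return the distinct client translators that logged a failed remote op."""
    names = set()
    for line in lines:
        m = CLIENT_RE.search(line)
        if m:
            names.add(m.group("xlator"))
    # Sort by the numeric client index so client-10 follows client-9.
    return sorted(names, key=lambda n: int(n.rsplit("-", 1)[1]))

# Example use on a saved log file (the path is a placeholder):
#     with open("nfs.log") as fh:
#         print(failed_clients(fh))
```

If every client index of the volume appears in the result, the lookup failed on all subvolumes rather than on a single brick.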

Comment 6 Jianwei Zhang 2017-11-09 09:21:04 UTC
I have resolved this problem.