Description of problem:
The mount point becomes unresponsive and processes on the mount point go into `D' state when the bricks become full. It also takes a long time to bring operations back to normal.

Version-Release number of selected component (if applicable):
3.3.0qa26

How reproducible:
Always

Steps to Reproduce:
1. 4 Node distribute, write data till the disk becomes full. The process writing data will fail with `No space left on device'.
2. Try operations like ls/ps/grep/rm ... the processes go into `D' state

Actual results:

Expected results:

Additional info:
[2012-03-10 09:17:31.452413] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 09:47:34.777302] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:00:46.921226] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:06:46.986000] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now
[2012-03-10 10:31:59.259639] I [socket.c:1807:socket_event_handler] 0-transport: disconnecting now

Some of the DHT errors that might help. Please find attached logs.
=====
[2012-03-12 08:16:50.509023] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-1-dht: found anomalies in <gfid:00000000-0000-0000-0000-000000000000>. holes=1 overlaps=0
[2012-03-12 08:16:50.509061] E [nfs3-helpers.c:3768:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:00000000-0000-0000-0000-000000000000>: Invalid argument
[2012-03-12 08:16:50.509096] E [nfs3.c:739:nfs3_getattr_resume] 0-nfs-nfsv3: Unable to resolve FH: nfs-test-1 : 00000000-0000-0000-0000-000000000000
[2012-03-12 08:16:50.509115] W [nfs3-helpers.c:3392:nfs3_log_common_res] 0-nfs-nfsv3: XID: e2e0a1d1, GETATTR: NFS: 22(Invalid argument for operation), POSIX: 14(Bad address)
[2012-03-12 08:16:50.510457] I [dht-layout.c:600:dht_layout_normalize] 0-nfs-test-1-dht: found anomalies in /user17/linux-3.2.9/Documentation/sound/alsa. holes=1 overlaps=0
[2012-03-12 08:16:50.510807] W [client3_1-fops.c:296:client3_1_mkdir_cbk] 0-nfs-test-1-client-1: remote operation failed: No such file or directory. Path: /user17/linux-3.2.9/Documentation/sound/alsa
[2012-03-12 08:16:50.510846] W [dht-selfheal.c:366:dht_selfheal_dir_mkdir_cbk] 0-nfs-test-1-dht: selfhealing directory /user17/linux-3.2.9/Documentation/sound/alsa failed: No such file or directory
[2012-03-12 08:16:50.600839] W [client3_1-fops.c:1189:client3_1_fstat_cbk] 0-nfs-test-1-client-1: remote operation failed: No such file or directory
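
For anyone trying to reproduce step 1, the following is a minimal sketch of a writer that fills a directory until the write fails with `No space left on device'. It is not the tool used in the original report: the function name `fill_until_full`, the file name `fill.dat`, and the `max_bytes` safety cap are all hypothetical. It calls fsync after every chunk because a client that buffers writes may otherwise only report the error at close time.

```python
import errno
import os


def fill_until_full(directory, chunk_size=1 << 20, max_bytes=None):
    """Write zero-filled chunks into `directory` until ENOSPC or max_bytes.

    Returns (bytes_written, hit_enospc). `max_bytes` is a hypothetical
    safety cap so the sketch can be tried without filling a real disk.
    """
    path = os.path.join(directory, "fill.dat")  # hypothetical file name
    chunk = b"\0" * chunk_size
    written = 0
    hit_enospc = False
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while max_bytes is None or written < max_bytes:
            try:
                written += os.write(fd, chunk)
                os.fsync(fd)  # push data out so a full disk is reported here
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    hit_enospc = True
                    break
                raise
    finally:
        try:
            os.close(fd)
        except OSError as exc:
            # close() itself may be the call that reports a full disk
            if exc.errno != errno.ENOSPC:
                raise
            hit_enospc = True
    return written, hit_enospc
```

Calling `fill_until_full("/mnt/vol")` without `max_bytes` would keep writing until the volume is full; the mount path is whatever the test setup uses.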
Followed the given steps and the mount is responsive. Could you please let me know if there is any other way to re-create this issue?

root - /mnt/vol 15:16:45 :) ⚡ df -h
Filesystem          Size  Used Avail Use% Mounted on
rootfs               50G   17G   32G  35% /
devtmpfs            3.9G     0  3.9G   0% /dev
tmpfs               3.9G  960K  3.9G   1% /dev/shm
tmpfs               3.9G  2.2M  3.9G   1% /run
/dev/sda2            50G   17G   32G  35% /
tmpfs               3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs               3.9G     0  3.9G   0% /media
/dev/sda1           240G  211G   30G  88% /home
10.70.42.235:/vol/   30G   30G   64K 100% /mnt/vol   <<---- is the mount

root - /mnt/vol 15:16:50 :) ⚡ ls
1 11 13 15 17 19 20 22 24 26 28 3 4 6 8
10 12 14 16 18 2 21 23 25 27 29 30 5 7 9

root - /mnt/vol 15:17:40 :) ⚡ ls
1 11 13 15 17 19 20 22 24 26 28 3 4 6 8
10 12 14 16 18 2 21 23 25 27 29 30 5 7 9

root - /mnt/vol 15:18:31 :( ⚡ cp 12 13

root - /mnt/vol 15:19:03 :) ⚡ cp 12 131
cp: closing `131': No space left on device

root - /mnt/vol 15:19:17 :( ⚡ mkdir 123

root - /mnt/vol 15:20:06 :) ⚡

root - /mnt/vol 15:20:06 :) ⚡ ls
1 11 123 131 15 17 19 20 22 24 26 28 3 4 6 8 a
10 12 13 14 16 18 2 21 23 25 27 29 30 5 7 9

root - /mnt/vol 15:22:25 :( ⚡ ls
1 11 123 131 15 17 19 20 22 24 26 28 3 4 6 8 a
10 12 13 14 16 18 2 21 23 25 27 29 30 5 7 9

root - /mnt/vol 15:22:29 :) ⚡ ls
1 11 123 131 15 17 19 20 22 24 26 28 3 4 6 8 a
10 12 13 14 16 18 2 21 23 25 27 29 30 5 7 9

root - /mnt/vol 15:23:11 :) ⚡ ps
  PID TTY          TIME CMD
20774 pts/1    00:00:00 sudo
20775 pts/1    00:00:00 su
20778 pts/1    00:00:00 bash
25801 pts/1    00:00:00 ps
Seems like a duplicate of bug #851601. To confirm, can you please check the cpu usage?
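
One way to capture both the `D' state and the accumulated CPU time of the affected processes on Linux is to read /proc/<pid>/stat. The sketch below is only an illustration: the helper names `parse_proc_stat` and `blocked_processes` are hypothetical, and it assumes the stat layout described in proc(5), where the state is field 3 and utime/stime are fields 14 and 15 (in clock ticks).

```python
import os


def parse_proc_stat(text):
    """Parse the contents of /proc/<pid>/stat into a small dict.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' rather than on whitespace.
    """
    head, _, rest = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = rest.split()
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],        # field 3: R, S, D, Z, ...
        "utime": int(fields[11]),  # field 14: user-mode CPU ticks
        "stime": int(fields[12]),  # field 15: kernel-mode CPU ticks
    }


def blocked_processes(proc_root="/proc"):
    """Return the parsed stat of every process currently in state D."""
    blocked = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat"),
                      errors="replace") as handle:
                info = parse_proc_stat(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if info["state"] == "D":
            blocked.append(info)
    return blocked
```

Running `blocked_processes()` twice a few seconds apart and comparing `utime + stime` for the glusterfs PID would show whether it is burning CPU while the other processes are stuck.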
This still seems to be a problem with the latest production 3.4 release ;-(

When mounting gluster via FUSE and using cp to copy a 100MB file, the cp command hangs when the volume is full. You can't kill (or kill -9) the cp command, and the CPU usage is high for gluster:

  PID USER PRI NI VIRT RES  SHR  S CPU% MEM% TIME+   Command
 4259 root  20  0 377M 128M 2596 S  0.0 12.9 0:15.41 /usr/sbin/glusterfs --volfile-id=/md0 --volfile-server=127.0.0.1 /home/glusterfs

I also have an issue with NFS freezing, which led me to test with FUSE as well.

Rich
pre-release version is ambiguous and about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.
I encountered the same problem with glusterfs (3.7.11):

[2017-11-01 16:41:35.792653] E [dht-helper.c:1102:dht_subvol_get_hashed] (-->/lib64/libglusterfs.so.0(default_lookup+0x5b) [0x7f20c3bdb74b] -->/usr/lib64/glusterfs/3.7xt.39/xlator/cluster/distribute.so(dht_lookup+0xa81) [0x7f20bb32a051] -->/usr/lib64/glusterfs/3.7xt.39/xlator/cluster/distribute.so(dht_subvol_get_hashed+0x1b8) [0x7f20bb2f23f8] ) 0-xtvol-dht: invalid argument: loc->parent [Invalid argument]
[2017-11-01 16:41:35.793214] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-4: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793381] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-7: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793359] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-6: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793402] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-0: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793485] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-10: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793288] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-3: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793466] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-9: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793552] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-14: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793576] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-11: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793485] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-5: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793640] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-13: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793293] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-2: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793687] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-15: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793520] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-8: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793724] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-1: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793737] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-16: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793781] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-12: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.793832] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-17: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.794104] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-18: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.794081] W [MSGID: 114031] [client-rpc-fops.c:3015:client3_3_lookup_cbk] 0-xtvol-client-19: remote operation failed: Invalid argument, conn->connected is 1, client conf->connected is 1,Path: <gfid:00000000-0000-0000-0000-000000000000> (00000000-0000-0000-0000-000000000000) [Invalid argument]
[2017-11-01 16:41:35.794202] E [MSGID: 112198] [nfs3-helpers.c:3695:nfs3_fh_resolve_inode_lookup_cbk] 0-nfs-nfsv3: Lookup failed: <gfid:00000000-0000-0000-0000-000000000000>: Invalid argument [Invalid argument]
[2017-11-01 16:41:35.794262] E [MSGID: 112069] [nfs3.c:816:nfs3_getattr_resume] 0-nfs-nfsv3: Invalid argument: (10.1.50.253:943) xtvol : 00000000-0000-0000-0000-000000000000
[2017-11-01 16:41:35.794295] W [MSGID: 112199] [nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: <gfid:00000000-0000-0000-0000-000000000000> => (XID: 2e26ff2f, GETATTR: NFS: 22(Invalid argument for operation), POSIX: 14(Bad address)) [Resource temporarily unavailable]
[2017-11-01 16:41:37.890355] I [socket.c:1168:socket_event_poll_err] 0-socket.nfs-server: socket handler poll err notify RPC_TRANSPORT_DISCONNECT
I have resolved this problem