Description of problem:

Using glusterfs-3.3.1 (plus a few patches, see git://github.com/Supermathie/glusterfs.git branch release-3.3-oracle, latest commit is 3bebd52812906f9b48677779fa15f91341437cb8) my setup gets into a state where GlusterFS is no longer able to respond to RPCs.

Version-Release number of selected component (if applicable): 3.3.1+

How reproducible: 100%

Steps to Reproduce:
1. Run a heavy workload against gluster
2. NFS stops responding:

[2013-05-05 01:28:56.165044] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gv0-replicate-5: no active sinks for performing self-heal on file /fleming4/db0/ALTUS_data/users04.dbf
[2013-05-05 01:28:56.218572] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gv0-replicate-4: no active sinks for performing self-heal on file /fleming2/db0/ALTUS_data/users01.dbf
[2013-05-05 01:31:16.259641] W [socket.c:195:__socket_rwv] 0-socket.nfs-server: writev failed (Connection reset by peer)
[2013-05-05 01:31:16.315405] I [socket.c:1798:socket_event_handler] 0-transport: disconnecting now
[2013-05-05 01:31:16.421480] I [socket.c:2387:socket_submit_reply] 0-socket.nfs-server: not connected (priv->connected = 255)
[2013-05-05 01:31:16.421507] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3705786983x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-05 01:31:16.421528] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed
[2013-05-05 01:31:16.966684] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3705788008x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-05 01:31:16.966725] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed
[2013-05-05 01:31:16.999766] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3705795067x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-05 01:31:16.999794] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed
[2013-05-05 01:31:17.191748] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3705795452x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)

3. Restart NFS (or all of GlusterFS itself) and the same thing happens

Actual results:

[2013-05-05 01:31:16.421507] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3705786983x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-05 01:31:16.421528] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed

Expected results:

NFS works.

Additional info:

The NFS daemon also grows rather large - I suspect it's holding on to data for all of these outstanding RPCs:

  PID USER PRI NI  VIRT   RES  SHR S CPU% MEM%   TIME+  Command
30674 root  20  0 37.6G 37.3G 2288 R 99.0 29.6 5:29.88  /usr/local/glusterfs/sbin/glusterfs
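To line the memory growth up with the first "Reply submission failed" errors in nfs.log, a minimal sketch (Python) that samples the daemon's resident size from /proc once a second; the PID on the command line is whatever the glusterfs/nfs process is at the time (e.g. 30674 above), not anything GlusterFS-specific:

#!/usr/bin/env python
# Sample VmRSS of the glusterfs/nfs process and print timestamped values,
# so the memory climb can be correlated with the error onset in the logs.
# Helper written for this report, not part of GlusterFS.
import sys, time

def vmrss_kb(pid):
    # /proc/<pid>/status carries a "VmRSS: <n> kB" line on Linux.
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

pid = int(sys.argv[1])
while True:
    print("%s VmRSS=%d kB" % (time.strftime("%Y-%m-%d %H:%M:%S"), vmrss_kb(pid)))
    sys.stdout.flush()
    time.sleep(1)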
Same situation, with DEBUG on:

[2013-05-06 11:53:56.971002] D [mem-pool.c:457:mem_get] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs3_read_fd_resume+0x86) [0x7f9ea1a29e66] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs_fop_read+0x86) [0x7f9ea1a11996] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs_fop_local_init+0x1b) [0x7f9ea1a112bb]))) 0-mem-pool: Mem pool is full. Callocing mem
[2013-05-06 11:53:56.971052] D [mem-pool.c:457:mem_get] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs3_read_fd_resume+0x86) [0x7f9ea1a29e66] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs_fop_read+0xe1) [0x7f9ea1a119f1] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(mem_get0+0x1b) [0x7f9ea6b05feb]))) 0-mem-pool: Mem pool is full. Callocing mem
[2013-05-06 11:53:56.971094] D [mem-pool.c:457:mem_get] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs_fop_read+0x1cd) [0x7f9ea1a11add] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/debug/io-stats.so(io_stats_readv+0x75) [0x7f9ea1c5b4d5] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(mem_get0+0x1b) [0x7f9ea6b05feb]))) 0-mem-pool: Mem pool is full. Callocing mem
[2013-05-06 11:53:56.971132] D [mem-pool.c:457:mem_get] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/debug/io-stats.so(io_stats_readv+0x164) [0x7f9ea1c5b5c4] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/performance/write-behind.so(wb_readv+0x1a2) [0x7f9ea1e77cf2] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(mem_get0+0x1b) [0x7f9ea6b05feb]))) 0-mem-pool: Mem pool is full. Callocing mem
[2013-05-06 11:53:56.971188] D [mem-pool.c:457:mem_get] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(fop_readv_stub+0x45) [0x7f9ea6afa565] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(+0x312cd) [0x7f9ea6af72cd] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(mem_get0+0x1b) [0x7f9ea6b05feb]))) 0-mem-pool: Mem pool is full. Callocing mem
[2013-05-06 11:53:56.973938] D [nfs3-helpers.c:3421:nfs3_log_read_res] 0-nfs-nfsv3: XID: 73f5c6e8, READ: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, is_eof: 0, vector: count: 1, len: 8192
[2013-05-06 11:53:56.973970] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1945487080x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-06 11:53:56.973985] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed
[2013-05-06 11:53:56.974320] D [nfs3-helpers.c:3421:nfs3_log_read_res] 0-nfs-nfsv3: XID: 740b0400, READ: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, is_eof: 0, vector: count: 1, len: 8192
[2013-05-06 11:53:56.975805] D [nfs3-helpers.c:3421:nfs3_log_read_res] 0-nfs-nfsv3: XID: 740adb6e, READ: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, is_eof: 0, vector: count: 1, len: 8192
[2013-05-06 11:53:56.975830] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1946868590x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-06 11:53:56.975844] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed
[2013-05-06 11:53:56.975878] D [afr-lk-common.c:403:transaction_lk_op] 0-gv0-replicate-1: lk op is for a transaction
[2013-05-06 11:53:56.975962] D [mem-pool.c:457:mem_get] (-->/usr/local/glusterfs/lib/libgfrpc.so.0(rpcsvc_notify+0x93) [0x7f9ea68b8463] (-->/usr/local/glusterfs/lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0xee) [0x7f9ea68b814e] (-->/usr/local/glusterfs/lib/libgfrpc.so.0(rpcsvc_request_create+0x79) [0x7f9ea68b7d39]))) 0-mem-pool: Mem pool is full. Callocing mem
[2013-05-06 11:53:56.975990] D [nfs3-helpers.c:1757:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 7436fcc8, READ: args: FH: exportid 52b830b4-f6b0-4103-b90b-ce665dfbafec, gfid f4aca6a7-9210-435c-b2ca-0008f0650c39, offset: 422511616, count: 65536
[2013-05-06 11:53:56.976036] D [mem-pool.c:457:mem_get] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs3svc_read+0x92) [0x7f9ea1a29cf2] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs3_read+0x31e) [0x7f9ea1a29aae] (-->/usr/local/glusterfs/lib/glusterfs/3.3git/xlator/nfs/server.so(nfs3_call_state_init+0x45) [0x7f9ea1a25435]))) 0-mem-pool: Mem pool is full. Callocing mem
Trace output:

[2013-05-05 02:49:16.628001] D [mem-pool.c:457:mem_get] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(fop_writev_stub+0x52) [0x7f1c05dc7d22] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(+0x312cd) [0x7f1c05dbf2cd] (-->/usr/local/glusterfs/lib/libglusterfs.so.0(mem_get0+0x1b) [0x7f1c05dcdfeb]))) 0-mem-pool: Mem pool is full. Callocing mem
[2013-05-05 02:49:16.628045] D [nfs3-helpers.c:3441:nfs3_log_write_res] 0-nfs-nfsv3: XID: 72786bb1, WRITE: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, STABLE,wverf: 1367736515
[2013-05-05 02:49:16.628065] T [rpcsvc.c:1050:rpcsvc_submit_generic] 0-rpc-service: Tx message: 28
[2013-05-05 02:49:16.628079] T [rpcsvc.c:676:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 52, payload: 28, rpc hdr: 24
[2013-05-05 02:49:16.628107] T [rpcsvc.c:1087:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x1920494513x, Program: NFS3, ProgVers: 3, Proc: 7) to rpc-transport (socket.nfs-server)
[2013-05-05 02:49:16.628124] T [nfs3.c:447:nfs3_call_state_wipe] 0-nfs-nfsv3: fd 0x264f8dc ref: 321
[2013-05-05 02:49:16.628139] T [write-behind.c:456:wb_sync] 0-gv0-write-behind: no vectors are to besynced
[2013-05-05 02:49:16.628169] T [rpc-clnt.c:685:rpc_clnt_reply_init] 0-gv0-client-8: received rpc message (RPC XID: 0x2495x Program: GlusterFS 3.1, ProgVers: 330, Proc: 12) from rpc-transport (gv0-client-8)
[2013-05-05 02:49:16.629067] T [write-behind.c:456:wb_sync] 0-gv0-write-behind: no vectors are to besynced
[2013-05-05 02:49:16.629091] D [nfs3-helpers.c:3421:nfs3_log_read_res] 0-nfs-nfsv3: XID: 73f42f15, READ: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, is_eof: 0, vector: count: 1, len: 8192
[2013-05-05 02:49:16.629109] T [rpcsvc.c:1050:rpcsvc_submit_generic] 0-rpc-service: Tx message: 8296
[2013-05-05 02:49:16.629123] T [rpcsvc.c:676:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 8320, payload: 8296, rpc hdr: 24
[2013-05-05 02:49:16.629136] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1945382677x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-05 02:49:16.629150] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed
[2013-05-05 02:49:16.629162] T [nfs3.c:447:nfs3_call_state_wipe] 0-nfs-nfsv3: fd 0x264f7b0 ref: 44403
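The fd reference counts logged by nfs3_call_state_wipe (ref: 321 above, then ref: 44403 a millisecond later) suggest requests are piling up on individual fds. A minimal sketch (Python, assuming the TRACE log has been saved to a file passed on the command line) to extract the highest ref count seen per fd over the whole run:

#!/usr/bin/env python
# Scan a glusterfs TRACE log for nfs3_call_state_wipe lines like
#   ... [nfs3.c:447:nfs3_call_state_wipe] 0-nfs-nfsv3: fd 0x264f8dc ref: 321
# and report the maximum ref count observed for each fd.
# Analysis helper written for this report; the log path is an assumption.
import re, sys
from collections import defaultdict

pat = re.compile(r"nfs3_call_state_wipe\] \S+: fd (0x[0-9a-f]+) ref: (\d+)")
max_ref = defaultdict(int)

with open(sys.argv[1]) as log:
    for line in log:
        m = pat.search(line)
        if m:
            fd, ref = m.group(1), int(m.group(2))
            if ref > max_ref[fd]:
                max_ref[fd] = ref

for fd, ref in sorted(max_ref.items(), key=lambda kv: -kv[1])[:20]:
    print("%s  max ref %d" % (fd, ref))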
Looks like NFS is still responding to *some* requests:

[root@fearless1 ~]# tshark -n -i eth2 -R "rpc" -c 800
Running as user "root" and group "root". This could be dangerous.
Capturing on eth2
  0.144428 192.168.10.4 -> 192.168.10.1 NFS V3 NULL Call
  0.575769 192.168.10.1 -> 192.168.10.4 NFS V3 NULL Reply (Call In 3)
  1.076045 192.168.10.6 -> 192.168.10.1 NFS V3 NULL Call
  1.076527 192.168.10.3 -> 192.168.10.1 NFS V3 NULL Call
  1.138787 192.168.10.1 -> 192.168.10.4 NFS [RPC duplicate of #17]V3 NULL Reply (Call In 3)
  1.139004 192.168.10.6 -> 192.168.10.1 NFS V3 GETATTR Call, FH:0x821472b3
  1.226347 192.168.10.6 -> 192.168.10.1 NFS [TCP ZeroWindow] V3 READ Call, FH:0xb45a3c21 Offset:1593188352 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1593196544 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1802338304 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1593204736 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1593212928 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1593221120 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1593229312 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1593237504 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1593245696 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1800503296 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1593253888 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1917083648 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:2022531072 Len:8192 ; V3 READ Call, FH:0x10d045d8 Offset:4241022976 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1918623744 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1591746560 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1918279680 Len:8192 ; V3 NULL Call ; V3 READ Call, FH:0x10d045d8 Offset:4241055744 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590812672 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:2022137856 Len:8192 ; V3 READ Call, FH:0x10d045d8 Offset:4241063936 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1917304832 Len:8192 ; V3 READ Call, FH:0x10d045d8 Offset:4241072128 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1591279616 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1591754752 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:2022146048 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1591296000 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590870016 Len:8192 ; V3 READ Call, FH:0x99b731c7 Offset:121511936 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590181888 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1918525440 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590820864 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1917313024 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590190080 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1918763008 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1917100032 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590198272 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1591762944 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1918844928 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590206464 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1918533632 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590214656 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1918771200 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1591287808 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1920507904 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:2022555648 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590878208 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1917108224 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590222848 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1920516096 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1591304192 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1591984128 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1918853120 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1919655936 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590231040 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590370304 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590902784 Len:8192 ; V3 READ Call, FH:0x99b731c7 Offset:121610240 Len:8192 ; V3 READ Call, FH:0xb45a3c21 Offset:1590239232 Len:8192
  1.438041 192.168.10.6 -> 192.168.10.1 RPC [TCP ZeroWindow] Continuation
  1.681157 192.168.10.6 -> 192.168.10.1 NFS V3 WRITE Call, FH:0x86e03073 Offset:39110 Len:3030 UNSTABLE
  1.722757 192.168.10.1 -> 192.168.10.6 NFS V3 NULL Reply (Call In 43)
  1.722784 192.168.10.1 -> 192.168.10.3 NFS V3 NULL Reply (Call In 45)
  2.817842 192.168.10.1 -> 192.168.10.6 NFS V3 WRITE Reply (Call In 82) Len:3030 UNSTABLE
  2.817944 192.168.10.6 -> 192.168.10.1 NFS V3 WRITE Call, FH:0x7444543b Offset:31672 Len:3001 UNSTABLE
  3.318538 192.168.10.1 -> 192.168.10.6 NFS V3 GETATTR Reply (Call In 56) Regular File mode:0640 uid:500 gid:1000
  3.318613 192.168.10.6 -> 192.168.10.1 NFS V3 WRITE Call, FH:0xcbdf97b5 Offset:1618 Len:90 UNSTABLE
  3.850941 192.168.10.1 -> 192.168.10.6 NFS V3 WRITE Reply (Call In 111) Len:3001 UNSTABLE
I'm starting to suspect that Gluster is unable to properly handle and respond to RPC packets with a large number of calls bundled together. Notice the packet at t=1.226347 above - there are around 50 READ calls bundled into a single segment, none of which appear to get a response. Is it possible that constructing the reply/replies is overwhelming Gluster somehow?
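One way to check that suspicion against the whole capture is to count, per frame, how many RPC calls it carries and how many of those XIDs ever get a reply. A rough sketch (Python, assuming tshark is installed; "capture.pcap" is a placeholder filename, and the field names are those of the Wireshark RPC dissector):

#!/usr/bin/env python
# Correlate NFS calls and replies by RPC XID: for every frame that carries
# calls, report how many were bundled together and how many never received
# a reply anywhere in the capture.
import subprocess
from collections import defaultdict

out = subprocess.check_output([
    "tshark", "-r", "capture.pcap", "-R", "rpc",
    "-T", "fields", "-E", "occurrence=a",
    "-e", "frame.number", "-e", "rpc.msgtyp", "-e", "rpc.xid",
])

calls_in_frame = defaultdict(list)   # frame number -> XIDs of the calls it carries
replied = set()                      # XIDs that got a reply somewhere in the capture

for line in out.decode("utf-8", "replace").splitlines():
    parts = line.split("\t")
    if len(parts) < 3 or not parts[2]:
        continue
    frame, msgtypes, xids = parts[0], parts[1].split(","), parts[2].split(",")
    for mtype, xid in zip(msgtypes, xids):
        if mtype == "0":             # rpc.msgtyp 0 = Call, 1 = Reply
            calls_in_frame[frame].append(xid)
        else:
            replied.add(xid)

for frame, xids in sorted(calls_in_frame.items(), key=lambda kv: int(kv[0])):
    unanswered = [x for x in xids if x not in replied]
    if len(xids) > 1 or unanswered:
        print("frame %s: %d calls bundled, %d unanswered" %
              (frame, len(xids), len(unanswered)))

If the heavily bundled frames are exactly the ones whose XIDs never see a reply, that would support the theory above.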
GlusterFS is definitely getting into this state (not responding to bulk READ calls but still responding to other calls, glusterfs/nfs server at 99% CPU) *before* the errors start appearing in the log.
(there's probably going to be a TON of wrapped lines, so you can see the unwrapped version at http://paste.fedoraproject.org/11223/80643136/)

OK, after taking a much more serious look at this, I've got:
* a full packet capture of a situation that causes this problem (358MB compressed)
* logs from gluster, truncated before starting, at TRACE level (88MB compressed)
* a corefile from the gluster/nfs process generated near the end of the test (10GB, compressible to 133MB or so)

My configuration:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 52b830b4-f6b0-4103-b90b-ce665dfbafec
Status: Stopped
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: fearless1:/export/bricks/500117310007a6d8/glusterdata
Brick2: fearless2:/export/bricks/500117310007a674/glusterdata
Brick3: fearless1:/export/bricks/500117310007a714/glusterdata
Brick4: fearless2:/export/bricks/500117310007a684/glusterdata
Brick5: fearless1:/export/bricks/500117310007a7dc/glusterdata
Brick6: fearless2:/export/bricks/500117310007a694/glusterdata
Brick7: fearless1:/export/bricks/500117310007a7e4/glusterdata
Brick8: fearless2:/export/bricks/500117310007a720/glusterdata
Brick9: fearless1:/export/bricks/500117310007a7ec/glusterdata
Brick10: fearless2:/export/bricks/500117310007a74c/glusterdata
Brick11: fearless1:/export/bricks/500117310007a838/glusterdata
Brick12: fearless2:/export/bricks/500117310007a814/glusterdata
Brick13: fearless1:/export/bricks/500117310007a850/glusterdata
Brick14: fearless2:/export/bricks/500117310007a84c/glusterdata
Brick15: fearless1:/export/bricks/500117310007a858/glusterdata
Brick16: fearless2:/export/bricks/500117310007a8f8/glusterdata
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.nfs.io-threads: on
performance.client-io-threads: on
performance.io-cache: on
performance.write-behind-window-size: 4MB
performance.nfs.write-behind: on
nfs.ports-insecure: on
diagnostics.client-log-level: INFO

Notable "problems":

RPC context pointer, requested by JoeJulian:

(gdb) print *this
$22 = {ops = 0x7f05d50d2580, listener = 0x0, private = 0x27b6d90, xl_private = 0x0, xl = 0x261ffb0, mydata = 0x27a70d0,
  lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
    __size = '\000' <repeats 39 times>, __align = 0}, refcount = 2, ctx = 0x25dd010, options = 0x7f05d70ba3c8,
  name = 0x26215e0 "gv0-client-9", dnscache = 0x28fda50, buf = 0x0, init = 0x7f05d4ec5650 <init>, fini = 0x7f05d4ec6780 <fini>,
  reconfigure = 0x7f05d4ec56d0 <reconfigure>, notify = 0x7f05d83f2bd0 <rpc_clnt_notify>, notify_data = 0x0,
  peerinfo = {sockaddr = {ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, sockaddr_len = 16,
    identifier = "192.168.11.2:24037", '\000' <repeats 89 times>},
  myinfo = {sockaddr = {ss_family = 2, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, sockaddr_len = 16,
    identifier = "192.168.11.1:933", '\000' <repeats 91 times>},
  total_bytes_read = 134672272, total_bytes_write = 3678711, list = {next = 0x0, prev = 0x0}, bind_insecure = 0}

40 million outstanding frames at this point:

(gdb) print ((call_pool_t)this->ctx->pool)
$23 = {{all_frames = {next = 0x25f96c0, prev = 0x25f9620}, all_stacks = {next_call = 0x25f96c0, prev_call = 0x25f9620}},
  cnt = 39705952, lock = 0, frame_mem_pool = 0x0, stack_mem_pool = 0x0}

gluster/nfs gets into a state (very quickly) where it's spending all of its time spinning somewhere:

  PID USER    PRI NI  VIRT    RES  SHR S CPU% MEM%    TIME+  Command
26240 root     20  0 4287M  2062M 2736 R 1687  1.6 21:53.95  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26656 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:06.76  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26649 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:07.32  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26684 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:08.14  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26682 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:10.35  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26678 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:07.98  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26675 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:07.45  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26679 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:10.32  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26676 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:10.89  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26681 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:06.87  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26677 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:06.87  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26672 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:11.07  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26674 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:07.54  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26680 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:10.12  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26280 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:10.37  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26683 root     20  0 4287M  2062M 2736 R 99.0  1.6  1:08.90  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
26652 root     20  0 4287M  2062M 2736 R 98.0  1.6  1:09.92  /usr/local/glusterfs/sbin/glusterfs -s localhost --volfile-id gluster/nfs
 6580 michael  20  0  111M   3112 1232 R  3.0  0.0  6:56.91  htop
 1611 root     20  0     0      0    0 S  0.0  0.0  0:25.56  fio-wq/0
 1647 root     20  0     0      0    0 S  0.0  0.0  0:21.33  fioa-scan
26228 root     20  0 1386M  57112 2172 S  0.0  0.0  0:09.38  /usr/local/glusterfs/sbin/glusterfsd -s localhost --volfile-id gv0.fearles
26267 root     20  0 1386M  57112 2172 S  0.0  0.0  0:01.46  /usr/local/glusterfs/sbin/glusterfsd -s localhost --volfile-id gv0.fearles
26646 root     20  0 1386M  57112 2172 S  0.0  0.0  0:00.59  /usr/local/glusterfs/sbin/glusterfsd -s localhost --volfile-id gv0.fearles
26671 root     20  0 1386M  57112 2172 S  0.0  0.0  0:00.57  /usr/local/glusterfs/sbin/glusterfsd -s localhost --volfile-id gv0.fearles
26685 root     20  0 1386M  57112 2172 S  0.0  0.0  0:00.46  /usr/local/glusterfs/sbin/glusterfsd -s localhost --volfile-id gv0.fearles
26686 root     20  0 1386M  57112 2172 S  0.0  0.0  0:00.25  /usr/local/glusterfs/sbin/glusterfsd -s localhost --volfile-id gv0.fearles

yet the disks are idle:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.41    0.00    0.26    0.00    0.00   81.33

Device: rrqm/s wrqm/s   r/s   w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdf       0.00   0.00  0.00  0.67    0.00    6.17    18.50     0.00   2.00   2.00   0.13
sdh       0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sda       0.00   0.00  0.00  0.33    0.00    1.33     8.00     0.00   0.00   0.00   0.00
sdb       0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sdc       0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sdg       0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sde       0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sdd       0.00   0.00  0.00  0.67    0.00    6.17    18.50     0.00   2.00   2.00   0.13
sdi       0.00   0.00  0.00  0.33    0.00    1.33     8.00     0.00   0.00   0.00   0.00
dm-0      0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
dm-1      0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
fioa      0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
dm-2      0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00

and volume profile says that operations aren't being responded to:

NFS Server : localhost
----------------------
Cumulative Stats:
   Block Size:      2b+      8b+     16b+
 No. of Reads:        0        0        0
No. of Writes:        2        6       18

   Block Size:     32b+     64b+    128b+
 No. of Reads:        0        0        0
No. of Writes:       85       30       19

   Block Size:    256b+    512b+   1024b+
 No. of Reads:        0       23       18
No. of Writes:       81       28       54

   Block Size:   2048b+   4096b+   8192b+
 No. of Reads:        6       28   385676
No. of Writes:       21       16        7

   Block Size:  16384b+  32768b+  65536b+
 No. of Reads:      289      135    31300
No. of Writes:      235        0        0

 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us              1      FORGET
      0.00       0.00 us       0.00 us       0.00 us             81     RELEASE
      0.00     671.00 us     671.00 us     671.00 us              1      UNLINK
      0.00     151.88 us     139.00 us     166.00 us              8        OPEN
      0.00    1622.00 us    1622.00 us    1622.00 us              1      RENAME
      0.00     411.88 us     345.00 us     502.00 us              8       FSTAT
      0.00    1515.50 us     549.00 us    3835.00 us              8    READDIRP
      0.00     488.23 us     155.00 us    4467.00 us             30          LK
      0.00     832.00 us     211.00 us    3206.00 us            119     SETATTR
      0.00     167.32 us     115.00 us    5326.00 us            677      STATFS
      0.00   11469.73 us     464.00 us  672289.00 us             74      CREATE
      0.00    3335.09 us      60.00 us 3806186.00 us           1200      ACCESS
      0.00  130144.47 us     235.00 us 26674232.00 us           256      LOOKUP
      0.00  223278.53 us       5.00 us 5554879.00 us            602       WRITE
      0.00  582000.45 us      51.00 us 75297085.00 us           278       FLUSH
      0.01  129560.57 us      65.00 us 70316547.00 us          3446        STAT
     99.98 8454338.20 us      99.00 us 115469515.00 us        417475        READ

    Duration: 6021 seconds
   Data Read: 5221816212 bytes
Data Written: 4168883 bytes

glusterfs really is taking a long time to respond. Following rpc.xid == 0x8879128a through the log:

[2013-05-08 21:16:00.564817] T [rpcsvc.c:382:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x8879128a, Ver: 2, Program: 100003, ProgVers: 3, Proc: 6) from rpc-transport (socket.nfs-server)
[2013-05-08 21:16:00.564860] D [nfs3-helpers.c:1757:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 8879128a, READ: args: FH: exportid 52b830b4-f6b0-4103-b90b-ce665dfbafec, gfid 77c67b0a-57a1-4b56-a6dd-15899529facd, offset: 7456178176, count: 8192
[2013-05-08 21:16:03.161708] T [rpcsvc.c:382:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x8879128a, Ver: 2, Program: 100003, ProgVers: 3, Proc: 6) from rpc-transport (socket.nfs-server)
[2013-05-08 21:16:03.161784] D [nfs3-helpers.c:1757:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 8879128a, READ: args: FH: exportid 52b830b4-f6b0-4103-b90b-ce665dfbafec, gfid 77c67b0a-57a1-4b56-a6dd-15899529facd, offset: 7456178176, count: 8192
[2013-05-08 21:16:13.940617] T [rpcsvc.c:382:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x8879128a, Ver: 2, Program: 100003, ProgVers: 3, Proc: 6) from rpc-transport (socket.nfs-server)
[2013-05-08 21:16:13.940662] D [nfs3-helpers.c:1757:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 8879128a, READ: args: FH: exportid 52b830b4-f6b0-4103-b90b-ce665dfbafec, gfid 77c67b0a-57a1-4b56-a6dd-15899529facd, offset: 7456178176, count: 8192
[2013-05-08 21:16:41.028161] T [rpcsvc.c:382:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x8879128a, Ver: 2, Program: 100003, ProgVers: 3, Proc: 6) from rpc-transport (socket.nfs-server)
[2013-05-08 21:16:41.028205] D [nfs3-helpers.c:1757:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 8879128a, READ: args: FH: exportid 52b830b4-f6b0-4103-b90b-ce665dfbafec, gfid 77c67b0a-57a1-4b56-a6dd-15899529facd, offset: 7456178176, count: 8192
[2013-05-08 21:17:03.172976] D [nfs3-helpers.c:3421:nfs3_log_read_res] 0-nfs-nfsv3: XID: 8879128a, READ: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, is_eof: 0, vector: count: 1, len: 8192
[2013-05-08 21:17:03.173042] T [rpcsvc.c:1087:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x8879128a, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-08 21:17:40.996580] D [nfs3-helpers.c:3421:nfs3_log_read_res] 0-nfs-nfsv3: XID: 8879128a, READ: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, is_eof: 0, vector: count: 1, len: 8192
[2013-05-08 21:17:40.996628] T [rpcsvc.c:1087:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x8879128a, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-08 21:17:44.372915] T [rpcsvc.c:382:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x8879128a, Ver: 2, Program: 100003, ProgVers: 3, Proc: 6) from rpc-transport (socket.nfs-server)
[2013-05-08 21:17:44.372956] D [nfs3-helpers.c:1757:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 8879128a, READ: args: FH: exportid 52b830b4-f6b0-4103-b90b-ce665dfbafec, gfid 77c67b0a-57a1-4b56-a6dd-15899529facd, offset: 7456178176, count: 8192
[2013-05-08 21:18:12.439583] T [rpcsvc.c:382:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x8879128a, Ver: 2, Program: 100003, ProgVers: 3, Proc: 6) from rpc-transport (socket.nfs-server)
[2013-05-08 21:18:12.439628] D [nfs3-helpers.c:1757:nfs3_log_rw_call] 0-nfs-nfsv3: XID: 8879128a, READ: args: FH: exportid 52b830b4-f6b0-4103-b90b-ce665dfbafec, gfid 77c67b0a-57a1-4b56-a6dd-15899529facd, offset: 7456178176, count: 8192
[2013-05-08 21:19:29.949901] D [nfs3-helpers.c:3421:nfs3_log_read_res] 0-nfs-nfsv3: XID: 8879128a, READ: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, is_eof: 0, vector: count: 1, len: 8192
[2013-05-08 21:19:29.949953] T [rpcsvc.c:1087:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x8879128a, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
[2013-05-08 21:23:02.353017] D [nfs3-helpers.c:3421:nfs3_log_read_res] 0-nfs-nfsv3: XID: 8879128a, READ: NFS: 0(Call completed successfully.), POSIX: 0(Success), count: 8192, is_eof: 0, vector: count: 1, len: 8192
[2013-05-08 21:23:02.353066] T [rpcsvc.c:1087:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x8879128a, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)
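That per-XID view can be produced for the whole run rather than one hand-picked XID. A small sketch (Python, assuming the TRACE log shown above has been saved to a file and keeps the "[YYYY-MM-DD HH:MM:SS.ffffff]" timestamp format) that measures the gap between the first rpcsvc_request_create for an XID and the first submitted reply, and counts how many times the client had to resend it:

#!/usr/bin/env python
# For each RPC XID in a glusterfs TRACE log, record when the request was
# first received (rpcsvc_request_create) and when a reply was first
# submitted, then print the slowest XIDs along with how many times each
# one was re-received (client retransmissions).
# Analysis helper written for this report.
import re, sys
from datetime import datetime

ts_re   = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]")
recv_re = re.compile(r"rpcsvc_request_create\].*\(XID: (0x[0-9a-f]+)")
sent_re = re.compile(r"submitted reply for rpc-message \(XID: (0x[0-9a-f]+)")

def stamp(line):
    m = ts_re.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f") if m else None

first_recv, recv_count, latency = {}, {}, {}

with open(sys.argv[1]) as log:
    for line in log:
        t = stamp(line)
        if t is None:
            continue
        m = recv_re.search(line)
        if m:
            xid = m.group(1)
            recv_count[xid] = recv_count.get(xid, 0) + 1
            first_recv.setdefault(xid, t)
            continue
        m = sent_re.search(line)
        if m and m.group(1) in first_recv and m.group(1) not in latency:
            xid = m.group(1)
            latency[xid] = (t - first_recv[xid]).total_seconds()

for xid, secs in sorted(latency.items(), key=lambda kv: -kv[1])[:20]:
    print("%s  first-reply latency %.1fs  received %d time(s)" %
          (xid, secs, recv_count.get(xid, 1)))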
I have a coredump, binaries, pcap and gluster trace available for download... I don't mind passing it around the "gluster devs" but I don't want it public.
This situation also gets files on disk into a bad state. It looks like some writes made it *ONLY TO ONE REPLICA*. I have many files reported as needing healing, but gluster doesn't detect them as split-brain.

Looking at the files on disk:

/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc
FILE=/export/bricks/5*/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc

[root@fearless1 michael]# ls -al $FILE
-rw-r-----. 2 500 1000 8409 May  6 12:02 /export/bricks/500117310007a858/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc

[root@fearless2 michael]# ls -al $FILE
-rw-r-----. 2 500 1000 8409 May  6 12:02 /export/bricks/500117310007a8f8/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc

[root@fearless1 michael]# stat $FILE
  File: `/export/bricks/500117310007a858/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc'
  Size: 8409            Blocks: 32         IO Block: 4096   regular file
Device: 801h/2049d      Inode: 6543        Links: 2
Access: (0640/-rw-r-----)  Uid: (  500/ UNKNOWN)   Gid: ( 1000/ UNKNOWN)
Access: 2013-05-14 10:16:59.213714324 -0400
Modify: 2013-05-06 12:02:39.010328111 -0400
Change: 2013-05-09 13:22:13.544569304 -0400

[root@fearless2 michael]# stat $FILE
  File: `/export/bricks/500117310007a8f8/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc'
  Size: 8409            Blocks: 32         IO Block: 4096   regular file
Device: 861h/2145d      Inode: 6607        Links: 2
Access: (0640/-rw-r-----)  Uid: (  500/ UNKNOWN)   Gid: ( 1000/ UNKNOWN)
Access: 2013-05-14 10:16:59.216179634 -0400
Modify: 2013-05-06 12:02:39.014695431 -0400
Change: 2013-05-09 13:22:13.160553133 -0400

[root@fearless1 michael]# getfattr -m - -d -e hex $FILE
# file: export/bricks/500117310007a858/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-14=0x000000000000000000000000
trusted.afr.gv0-client-15=0x000000000000000000000000
trusted.gfid=0x0ec8adb2caa447e59b6e4b845bc414a2

[root@fearless2 michael]# getfattr -m - -d -e hex $FILE
# file: export/bricks/500117310007a8f8/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-14=0x000000000000000000000000
trusted.afr.gv0-client-15=0x000000000000000000000000
trusted.gfid=0x0ec8adb2caa447e59b6e4b845bc414a2

[root@fearless1 michael]# md5sum $FILE
b100f12f152f51a3b7594b4b1f465b77  /export/bricks/500117310007a858/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc

[root@fearless2 michael]# md5sum $FILE
ab8eeccb15a9b26faeb44fba73903d00  /export/bricks/500117310007a8f8/glusterdata/fleming4/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_ora_17473.trc

Ah! There's the difference.
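Since the afr changelog xattrs are all zero on both bricks, self-heal has nothing to go on and the divergence only shows up by comparing replica contents directly. A small sketch (Python) that walks a replica pair and flags files whose checksums differ; the brick roots below are just the pair from this example, and it assumes both brick filesystems are reachable from one host (otherwise generate per-brick checksum lists and diff them):

#!/usr/bin/env python
# Walk two bricks that form a replica pair and report files whose contents
# differ or that are missing on one side. Brick paths are placeholders.
import hashlib, os

BRICK_A = "/export/bricks/500117310007a858/glusterdata"   # e.g. fearless1 side
BRICK_B = "/export/bricks/500117310007a8f8/glusterdata"   # e.g. fearless2 side

def md5(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

for root, dirs, files in os.walk(BRICK_A):
    dirs[:] = [d for d in dirs if d != ".glusterfs"]       # skip gluster metadata
    for name in files:
        a = os.path.join(root, name)
        b = os.path.join(BRICK_B, os.path.relpath(a, BRICK_A))
        if not os.path.isfile(b):
            print("MISSING on B: %s" % b)
        elif md5(a) != md5(b):
            print("DIFFERS: %s" % os.path.relpath(a, BRICK_A))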
fearless1:
…
*** 2013-05-05 02:08:38.152
Recovery of Online Redo Log: Thread 1 Group 15 Seq 804 Reading mem 0
*** 2013-05-05 02:09:00.409
Recovery of Online Redo Log: Thread 1 Group 16 Seq 805 Reading mem 0
*** 2013-05-05 02:09:20.836
[1367734160836420] kgnfs_check_for_timedout_requests:5463: assert ch->pops_kgnfschnl at kgnfs.c:kgnfs_check_for_timedout_requests:5463
[1367735871359808] kgnfs_check_for_timedout_requests:5463: assert ch->pops_kgnfschnl at kgnfs.c:kgnfs_check_for_timedout_requests:5463
…

fearless2:
…
*** 2013-05-05 02:08:38.152
Recovery of Online Redo Log: Thread 1 Group 15 Seq 804 Reading mem 0
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
[1367735871359808] kgnfs_check_for_timedout_requests:5463: assert ch->pops_kgnfschnl at kgnfs.c:kgnfs_check_for_timedout_requests:5463
…

With another file (/fleming1/db0/ALTUS_config/diagnostics/diag/rdbms/altus/ALTUS/trace/ALTUS_p020_6803.trm), however, all the tests above show the files as identical. Yet according to 'volume heal gv0 info' it still requires healing, and a 'volume heal gv0 full' doesn't process it.
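To pin down exactly where two replica copies of a file diverge (e.g. the NUL-filled region above), a block-by-block comparison is enough. A sketch in Python; the two paths passed on the command line are placeholders for the brick copies of the same file:

#!/usr/bin/env python
# Compare two copies of the same file block by block and print the byte
# ranges where they differ, flagging ranges where one side is all NULs
# (i.e. a region that was never written on that replica).
import sys

BLOCK = 4096

def diff_ranges(path_a, path_b):
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a, b = fa.read(BLOCK), fb.read(BLOCK)
            if not a and not b:
                break
            if a != b:
                note = ""
                if a.strip(b"\0") == b"":
                    note = " (A is all NULs)"
                elif b.strip(b"\0") == b"":
                    note = " (B is all NULs)"
                print("differs at bytes %d-%d%s"
                      % (offset, offset + max(len(a), len(b)) - 1, note))
            offset += BLOCK

if __name__ == "__main__":
    # e.g. the two brick copies of ALTUS_ora_17473.trc from the comment above
    diff_ranges(sys.argv[1], sys.argv[2])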
We are experiencing the exact same issue. Gluster under high load (we use it to store virtual machine images) locks up. nfs.log reports the following errors:

[2013-05-20 19:53:04.491580] E [nfs3.c:627:nfs3svc_submit_vector_reply] 0-nfs-nfsv3: Reply submission failed
[2013-05-20 19:54:53.004965] I [socket.c:2378:socket_submit_reply] 0-socket.nfs-server: not connected (priv->connected = 255)
[2013-05-20 19:54:53.005037] E [rpcsvc.c:1080:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x3192027547x, Program: NFS3, ProgVers: 3, Proc: 6) to rpc-transport (socket.nfs-server)

The machine is a Dual Core Xeon X5560 with 96GB RAM and works fine under normal loads, but once CPU hits 90%+ we see the issue pop up. Interestingly, other gluster volumes continue to function normally whilst the high-load volume simply stalls.
The version that this bug has been reported against does not receive updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug. If there has been no update before 9 December 2014, this bug will be closed automatically.