Created attachment 1130004
client side log

Description of problem:
We are using GlusterFS 3.5.5. The server side is deployed on a 26-node cluster, and each node has one brick. The client side is a 32-node cluster (including the 26 server nodes) that runs distributed video transcoding. GlusterFS is the shared file system between the 32 servers, mounted with FUSE.

We found that when the workload is high, the client often hangs on file operations on the GlusterFS mount. The client log indicates that the client loses ping from the server, which leads to a large number of "Transport endpoint is not connected" messages in the log.

Version-Release number of selected component (if applicable):
3.5.5

How reproducible:
Dozens of times per hour

Steps to Reproduce:
1.
2.
3.

Actual results:
The client hangs on file operations.

Expected results:
All file operations run correctly, with no hang.

Additional info:
OS: debian 8.2
Kernel: 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux

TCP ping continues to work correctly during the hang period.

Our volume info:

Volume Name: hzsq_encode_02
Type: Distributed-Replicate
Volume ID: 653b554b-47aa-4f25-a102-7ac6858f41e1
Status: Started
Number of Bricks: 13 x 2 = 26
Transport-type: tcp
Bricks:
Brick1: hzsq-encode-33:/data/gfs-brk
Brick2: hzsq-encode-34:/data/gfs-brk
Brick3: hzsq-encode-41:/data/gfs-brk
Brick4: hzsq-encode-42:/data/gfs-brk
Brick5: hzsq-encode-43:/data/gfs-brk
Brick6: hzsq-encode-44:/data/gfs-brk
Brick7: hzsq-encode-45:/data/gfs-brk
Brick8: hzsq-encode-46:/data/gfs-brk
Brick9: hzsq-encode-47:/data/gfs-brk
Brick10: hzsq-encode-48:/data/gfs-brk
Brick11: hzsq-encode-49:/data/gfs-brk
Brick12: hzsq-encode-50:/data/gfs-brk
Brick13: hzsq-encode-51:/data/gfs-brk
Brick14: hzsq-encode-52:/data/gfs-brk
Brick15: hzsq-encode-53:/data/gfs-brk
Brick16: hzsq-encode-54:/data/gfs-brk
Brick17: hzsq-encode-55:/data/gfs-brk
Brick18: hzsq-encode-56:/data/gfs-brk
Brick19: hzsq-encode-57:/data/gfs-brk
Brick20: hzsq-encode-58:/data/gfs-brk
Brick21: hzsq-encode-59:/data/gfs-brk
Brick22: hzsq-encode-60:/data/gfs-brk
Brick23: hzsq-encode-61:/data/gfs-brk
Brick24: hzsq-encode-62:/data/gfs-brk
Brick25: hzsq-encode-63:/data/gfs-brk
Brick26: hzsq-encode-64:/data/gfs-brk
Options Reconfigured:
nfs.disable: On
performance.io-thread-count: 32
performance.cache-refresh-timeout: 1
performance.write-behind-window-size: 1MB
performance.cache-size: 128MB
performance.flush-behind: On
server.outstanding-rpc-limit: 0
performance.read-ahead: On
performance.io-cache: On
performance.quick-read: off
nfs.outstanding-rpc-limit: 0
network.ping-timeout: 20
server.statedump-path: /tmp
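
When matching disconnect messages in the client log to servers, it may help to know which bricks mirror each other. In a Distributed-Replicate volume, consecutive bricks in the listed order form a replica set, so with "13 x 2" the pairs are Brick1/Brick2, Brick3/Brick4, and so on. The following is a minimal sketch, not part of the original report, that derives those pairs from the volume info text. The function names `parse_bricks` and `replica_sets` are hypothetical helpers, and the embedded sample is abbreviated to the first four bricks.

```python
import re

# Abbreviated copy of the volume info from this report (Brick5..Brick26 omitted).
VOLUME_INFO = """\
Volume Name: hzsq_encode_02
Type: Distributed-Replicate
Number of Bricks: 13 x 2 = 26
Bricks:
Brick1: hzsq-encode-33:/data/gfs-brk
Brick2: hzsq-encode-34:/data/gfs-brk
Brick3: hzsq-encode-41:/data/gfs-brk
Brick4: hzsq-encode-42:/data/gfs-brk
"""


def parse_bricks(text):
    """Return the brick strings ordered by their BrickN index (hypothetical helper)."""
    found = re.findall(r"^Brick(\d+):\s*(\S+)$", text, flags=re.MULTILINE)
    return [brick for _, brick in sorted(found, key=lambda item: int(item[0]))]


def replica_sets(bricks, replica_count):
    """Group consecutive bricks into replica sets of size replica_count."""
    if replica_count < 1 or len(bricks) % replica_count != 0:
        raise ValueError("brick count is not a multiple of the replica count")
    return [
        bricks[i:i + replica_count]
        for i in range(0, len(bricks), replica_count)
    ]


if __name__ == "__main__":
    # Replica count 2 is taken from "Number of Bricks: 13 x 2 = 26".
    for index, pair in enumerate(replica_sets(parse_bricks(VOLUME_INFO), 2), start=1):
        print(f"replica set {index}: {' <-> '.join(pair)}")
```

Feeding the full brick list from the volume info above instead of the abbreviated sample should yield 13 replica sets.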
This bug is being closed because version 3.5 is marked End-Of-Life. There will be no further updates to this version. If you are still facing this issue in a more current release, please open a new bug against a version that still receives bugfixes.