Description of problem:
Running a one-liner recursive find piped through awk makes the Gluster client unstable.

Version-Release number of selected component (if applicable):
glusterfs 3.12.6-1 and 3.12.9-1 on CentOS 7.4

How reproducible:

Steps to Reproduce:
1. mount gluster:/volume /mnt/gluster
2. cd /mnt/gluster/FOLDER
3. find . -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }' (a simplified variant of this pipeline is sketched after this report)
4. The pipeline runs once or twice, but then we get the following errors:

ls: cannot access ./short/GF1/j1/xc_work/xc_make: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/test23.log: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/pbc.vlog: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/hdl.var: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/xc_top.xdb: Transport endpoint is not connected
ls: cannot access ./short/GF1/jl/xc_work/xc_top.ccf: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/pdb.ucdb: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/1cl-1cpu/.design: Transport endpoint is not connected

After that, any "ls" or "cd" on the mount gives the following:

ls
ls: cannot open directory .: Transport endpoint is not connected

Only unmounting and remounting the volume fixes the problem, until the find command is run again.

Actual results:
"Transport endpoint is not connected" errors as above; the mount stays unusable until remounted.

Expected results (file size histogram):
  1k:     79
  2k:      7
  4k:   1402
  8k:   3552
 16k:   3534
 32k:   3973
 64k:   1648
128k:   1313
256k:    684
512k:    485
  1M:    195
  2M:     79
  4M:     25
  8M:     20
 16M:     27
 32M:     13
 64M:      6
128M:      1
256M:      5

Additional info:
Gluster distributed-replicate volume (9x2) across 18 EC2 instances, with ample CPU, RAM, and EBS bandwidth. Typical workload: small files. The volume is tuned for small files:

performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.parallel-readdir: on
performance.cache-size: 1024MB
performance.io-thread-count: 32
cluster.lookup-optimize: on
network.inode-lru-limit: 90000
performance.stat-prefetch: on
client.event-threads: 6
server.event-threads: 6
features.cache-invalidation: on
features.cache-invalidation-timeout: 900
performance.cache-invalidation: on
performance.md-cache-timeout: 600
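For context, the one-liner builds a power-of-two file-size histogram: ls -l reports the size in field 5, the first awk buckets each size by int(log(size)/log(2)) (clamping everything under 1 KiB into the 2^10 bucket), sort -n orders the buckets, and the second awk renders each bucket boundary with a human-readable unit. The simplified variant below is an editorial sketch, not part of the original report: it assumes GNU find (present on CentOS 7) and uses -printf '%s\n' so that no ls processes hammer the mount, and it corrects the unit string to the conventional kMGTPEZY order:

find . -type f -printf '%s\n' \
  | awk '{ n = ($1 > 0) ? int(log($1)/log(2)) : 10;   # bucket = floor(log2(size))
           if (n < 10) n = 10;                        # clamp sizes under 1 KiB
           size[n]++ }
         END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' \
  | sort -n \
  | awk 'function human(x) { x[1] /= 1024; if (x[1] >= 1024) { x[2]++; human(x) } }
         { a[1] = $1; a[2] = 0; human(a)
           printf("%3d%s: %6d\n", a[1], substr("kMGTPEZY", a[2]+1, 1), $2) }'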
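"Transport endpoint is not connected" on every subsequent access is the classic symptom of the FUSE client process having died, which matches the crash suspected in the comments below. A minimal triage sketch, assuming the default client log location (the log file name is derived from the mount point, so a /mnt/gluster mount logs to mnt-gluster.log; adjust to your setup):

# Is the glusterfs client process for this mount still running?
ps ax | grep '[g]lusterfs.*/mnt/gluster'

# A crashed client leaves a backtrace in the mount's log file,
# typically introduced by a "signal received" line.
grep -A20 'signal received' /var/log/glusterfs/mnt-gluster.log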
Release 3.12 has been EOLed and this bug was still found in the NEW state, so the version is being moved to mainline for triage and appropriate action.
This looks like a crash in the client code. Since the problem no longer seems to occur on the latest versions, I am closing this bug. Feel free to reopen it if the problem persists.