Bug 1576411

Summary: When running a recursive find one-liner with awk, the Gluster client becomes unstable
Product: [Community] GlusterFS
Reporter: avishay.aton
Component: fuse
Assignee: Kotresh HR <khiremat>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: high
Version: mainline
CC: bugs, jahernan
Keywords: Performance
Hardware: x86_64
OS: Linux
Last Closed: 2020-01-27 10:36:30 UTC
Type: Bug

Description avishay.aton 2018-05-09 11:59:20 UTC
Description of problem: When running a recursive find one-liner with awk, the Gluster client becomes unstable.


Version-Release number of selected component (if applicable): 12.6.1 and 12.9.1 on CentOS 7.4



How reproducible:


Steps to Reproduce:
1. mount -t glusterfs gluster:/volume /mnt/gluster
2. cd /mnt/gluster/FOLDER
3. find . -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTPEZY",a[2]+1,1),$2) }'

4. The find completes once or twice, but then the following errors appear:


ls: cannot access ./short/GF1/j1/xc_work/xc_make: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/test23.log: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/pbc.vlog: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/hdl.var: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/xc_top.xdb: Transport endpoint is not connected
ls: cannot access ./short/GF1/jl/xc_work/xc_top.ccf: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/pdb.ucdb: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/1cl-1cpu/.design: Transport endpoint is not connected


After that, any "ls" or "cd" fails:
$ ls
ls: cannot open directory .: Transport endpoint is not connected


Only unmounting and remounting resolves the problem, and only until the find command is run again.
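For anyone retrying this on a newer release, a rough Python sketch of the reproduction loop may help. It walks the mount repeatedly, lstat-ing every non-directory entry (roughly the metadata load that "find ... | xargs -0 ls -l" generates), and stops at the first pass that hits ENOTCONN, the errno behind "Transport endpoint is not connected". The function names and the pass limit are hypothetical, not part of the original report, and the I/O pattern only approximates the shell pipeline, so it may or may not trigger the same client failure.

```python
import errno
import os


def scan_once(root: str) -> tuple[int, list[str]]:
    """One pass over root: lstat every non-directory entry, roughly what
    `find . -type f -print0 | xargs -0 ls -l` does.

    Returns (entries stat'ed successfully, paths that failed with ENOTCONN).
    """
    ok = 0
    disconnected: list[str] = []

    def on_walk_error(err: OSError) -> None:
        # os.walk calls this when listing a directory fails (the "ls ." case).
        if err.errno == errno.ENOTCONN:
            disconnected.append(err.filename or root)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
                ok += 1
            except OSError as err:
                # Record only ENOTCONN; other errors (e.g. a file deleted
                # mid-scan) are skipped, much as ls reports them and moves on.
                if err.errno == errno.ENOTCONN:
                    disconnected.append(path)
    return ok, disconnected


def reproduce(root: str, max_passes: int = 10) -> int:
    """Repeat scan_once until ENOTCONN appears.

    Returns the 1-based pass number that first hit ENOTCONN, or 0 if none did.
    """
    for attempt in range(1, max_passes + 1):
        ok, bad = scan_once(root)
        print(f"pass {attempt}: {ok} entries stat'ed, {len(bad)} ENOTCONN errors")
        if bad:
            return attempt
    return 0
```

Usage: reproduce("/mnt/gluster/FOLDER") on the FUSE mount; a non-zero return value means the client started returning ENOTCONN on that pass.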




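Actual results: "Transport endpoint is not connected" errors as shown above; the mount stays unusable until it is remounted.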


Expected results (file-size histogram):

  1k:     79
  2k:      7
  4k:   1402
  8k:   3552
 16k:   3534
 32k:   3973
 64k:   1648
128k:   1313
256k:    684
512k:    485
  1M:    195
  2M:     79
  4M:     25
  8M:     20
 16M:     27
 32M:     13
 64M:      6
128M:      1
256M:      5
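The histogram comes from two awk stages: the first puts each file into bucket floor(log2(size)), folding everything under 1 KiB into the 1k bucket; the second prints each bucket's lower bound with a k/M/G suffix. The Python sketch below mirrors that logic, which can be handy for cross-checking the numbers. The function names are hypothetical, and because it uses lstat instead of spawning ls -l, it may not load the FUSE client the same way the original pipeline does.

```python
import os
import stat


def file_sizes(root: str):
    """Yield st_size of every regular file under root (like find -type f)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                yield st.st_size


def size_histogram(sizes) -> dict[int, int]:
    """Count files per power-of-two bucket n = floor(log2(size)), with every
    bucket below 2**10 (1 KiB) folded into the 1 KiB bucket."""
    buckets: dict[int, int] = {}
    for size in sizes:
        # bit_length() - 1 is an exact integer floor(log2(size)); it is -1 for
        # empty files, which the clamp then moves into the 1 KiB bucket.
        n = max(size.bit_length() - 1, 10)
        buckets[n] = buckets.get(n, 0) + 1
    return dict(sorted(buckets.items()))


def human(nbytes: int) -> str:
    """Format a power-of-two byte count as e.g. '  1k', '256k', '  1M'."""
    value, unit = nbytes / 1024, 0
    while value >= 1024:
        value /= 1024
        unit += 1
    # Binary prefixes in order: kilo, mega, giga, tera, peta, exa, zetta, yotta.
    return f"{int(value):3d}{'kMGTPEZY'[unit]}"


def histogram_lines(sizes) -> list[str]:
    """Render the histogram in the same '%3d%s: %6d' layout as the awk output."""
    return [f"{human(2 ** n)}: {count:6d}" for n, count in size_histogram(sizes).items()]
```

Usage: print("\n".join(histogram_lines(file_sizes("/mnt/gluster/FOLDER")))) should print lines in the same layout as the expected results above.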



Additional info:
Gluster distributed-replicated volume (9 x 2) on 18 EC2 instances, with plenty of CPU, RAM and EBS bandwidth.
Typical workload: small files.
Volume options tuned for small files:

performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.parallel-readdir: on
performance.cache-size: 1024MB
performance.io-thread-count: 32
cluster.lookup-optimize: on
network.inode-lru-limit: 90000
performance.stat-prefetch: on
client.event-threads: 6
server.event-threads: 6
features.cache-invalidation: on
features.cache-invalidation-timeout: 900
performance.cache-invalidation: on
performance.md-cache-timeout: 600
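To recreate this tuning on a test volume, each option can be applied with the gluster CLI form "gluster volume set <VOLNAME> <KEY> <VALUE>". Below is a small Python helper, offered as a sketch, that turns "key: value" lines like the ones above into those commands. The helper names are hypothetical and "testvol" is a placeholder volume name; review the generated commands before running them against a real cluster.

```python
import shlex


def volume_set_commands(volname: str, options_text: str) -> list[list[str]]:
    """Turn 'key: value' lines into argument vectors for
    `gluster volume set <VOLNAME> <KEY> <VALUE>`."""
    commands = []
    for line in options_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        commands.append(["gluster", "volume", "set", volname, key.strip(), value.strip()])
    return commands


def as_shell(commands: list[list[str]]) -> str:
    """Quote each argument vector so it can be pasted into a shell."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

For example, as_shell(volume_set_commands("testvol", "performance.parallel-readdir: on")) yields "gluster volume set testvol performance.parallel-readdir on". Returning argument lists rather than strings also lets them be passed directly to subprocess.run without going through a shell.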

Comment 1 Shyamsundar 2018-10-23 14:55:30 UTC
Release 3.12 has been EOL'd and this bug was still in the NEW state, so the version is being moved to mainline to triage it and take appropriate action.

Comment 2 Xavi Hernandez 2020-01-27 10:36:30 UTC
This bug seems like a crash in client code. Since this problem doesn't seem to happen on latest versions, I'll close this bug. Feel free to reopen it if the problem persists.