Bug 1576411 - when running a one liner recursive find with awks, gluster client gets unstable
Summary: when running a one liner recursive find with awks, gluster client gets unstable
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: fuse
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-09 11:59 UTC by avishay.aton
Modified: 2020-01-27 10:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-27 10:36:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description avishay.aton 2018-05-09 11:59:20 UTC
Description of problem: When a one-liner that combines a recursive find with awk is run on a gluster mount, the gluster client becomes unstable.


Version-Release number of selected component (if applicable): 12.6.1 and 12.9.1 on CentOS 7.4



How reproducible:


Steps to Reproduce:
1. mount gluster:/volume /mnt/gluster
2. cd /mnt/gluster/FOLDER
3. find . -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTEPYZ",a[2]+1,1),$2) }'

4. The command completes once or twice, but then fails with the following errors:


ls: cannot access ./short/GF1/j1/xc_work/xc_make: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/test23.log: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/pbc.vlog: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/hdl.var: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/xc_top.xdb: Transport endpoint is not connected
ls: cannot access ./short/GF1/jl/xc_work/xc_top.ccf: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/xc_work/pdb.ucdb: Transport endpoint is not connected
ls: cannot access ./short/GF1/j1/1cl-1cpu/.design: Transport endpoint is not connected


After that, any "ls" or "cd" on the mount fails as follows:
 ls
ls: cannot open directory .: Transport endpoint is not connected


Only unmounting and remounting resolves the problem, until the find command is run again.
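The unmount/remount workaround can be sketched as a small script. This is only an illustration: the mount point and volume are taken from step 1, the helper names mount_is_healthy and remount_gluster are hypothetical, and the explicit -t glusterfs is an assumption, since the report shows a bare mount command.

```shell
#!/bin/sh
# Sketch of the workaround; values below come from step 1 of the report.
MOUNTPOINT=${MOUNTPOINT:-/mnt/gluster}
VOLUME=${VOLUME:-gluster:/volume}

# Once the FUSE client is gone, even a plain stat of the mount point
# fails with "Transport endpoint is not connected".
mount_is_healthy() {
    stat -- "$1" >/dev/null 2>&1
}

# Unmount and mount again. "-t glusterfs" is an assumption; the report
# only shows "mount gluster:/volume /mnt/gluster".
remount_gluster() {
    umount "$MOUNTPOINT"
    mount -t glusterfs "$VOLUME" "$MOUNTPOINT"
}

# Usage (as root):
#   mount_is_healthy "$MOUNTPOINT" || remount_gluster
```

Processes whose working directory is inside the dead mount may keep the unmount from succeeding, so it is safest to cd out of /mnt/gluster first.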




Actual results:


Expected results: (file size histogram)

  1k:     79
  2k:      7
  4k:   1402
  8k:   3552
 16k:   3534
 32k:   3973
 64k:   1648
128k:   1313
256k:    684
512k:    485
  1M:    195
  2M:     79
  4M:     25
  8M:     20
 16M:     27
 32M:     13
 64M:      6
128M:      1
256M:      5
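
For readability, the bucketing done by the step 3 one-liner can be written out as below. This is a sketch, not the reporter's command: it assumes GNU find (for -printf) and reads sizes directly from find instead of piping through xargs ls -l, so it issues a different pattern of lookups and may not trigger the failure. The function name size_histogram is hypothetical, and the unit string is written in ascending order (kMGTPEZY) rather than the order used in step 3.

```shell
#!/bin/sh
# Power-of-two file size histogram, same bucketing as the step 3 pipeline.
size_histogram() {
    find "${1:-.}" -type f -printf '%s\n' |
    awk '
        {
            # Bucket index is floor(log2(size)); empty files and anything
            # under 1 KiB are counted in the 1k bucket.
            n = ($1 > 0) ? int(log($1) / log(2)) : 10
            if (n < 10) n = 10
            count[n]++
        }
        END {
            for (i in count) printf "%d %d\n", 2^i, count[i]
        }' |
    sort -n |
    awk '
        {
            # Divide the bucket floor by 1024 until it fits the next unit.
            v = $1 / 1024; u = 1
            while (v >= 1024) { v /= 1024; u++ }
            printf "%3d%s: %6d\n", v, substr("kMGTPEZY", u, 1), $2
        }'
}

# Usage:
#   size_histogram /mnt/gluster/FOLDER
```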



Additional info:
Gluster distributed-replicated volume (9x2) on 18 EC2 instances, with plenty of CPU, RAM and EBS bandwidth.
Typical workload: small files.
Volume options, tuned for small files:

performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.parallel-readdir: on
performance.cache-size: 1024MB
performance.io-thread-count: 32
cluster.lookup-optimize: on
network.inode-lru-limit: 90000
performance.stat-prefetch: on
client.event-threads: 6
server.event-threads: 6
features.cache-invalidation: on
features.cache-invalidation-timeout: 900
performance.cache-invalidation: on
performance.md-cache-timeout: 600
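
To try the same settings on a test volume, the "key: value" list above can be turned into gluster volume set commands. The helper below is a hypothetical sketch: emit_volume_set is not part of gluster, the volume name "volume" is taken from step 1, and options.txt is an assumed file holding the list. It only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Read "key: value" lines on stdin and print one
# "gluster volume set <volume> <key> <value>" command per line.
emit_volume_set() {
    vol=$1
    while read -r key value; do
        # Skip blank lines.
        [ -n "$key" ] || continue
        # "${key%:}" strips the trailing colon from the option name.
        printf 'gluster volume set %s %s %s\n' "$vol" "${key%:}" "$value"
    done
}

# Usage:
#   emit_volume_set volume < options.txt        # review the commands
#   emit_volume_set volume < options.txt | sh   # apply them (as root)
```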

Comment 1 Shyamsundar 2018-10-23 14:55:30 UTC
Release 3.12 has been EOLd and this bug was still found to be in the NEW state, hence moving the version to mainline, to triage the same and take appropriate actions.

Comment 2 Xavi Hernandez 2020-01-27 10:36:30 UTC
This bug seems like a crash in client code. Since this problem doesn't seem to happen on latest versions, I'll close this bug. Feel free to reopen it if the problem persists.

