Bug 1594342
Summary: | Growth of client connections when writing a file fails due to insufficient free space on a brick; the file remains on the brick even after removal from the volume | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Martin Bukatovic <mbukatov> |
Component: | glusterfs | Assignee: | Amar Tumballi <atumball> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Bala Konda Reddy M <bmekala> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rhgs-3.4 | CC: | amukherj, ravishankar, rhinduja, rhs-bugs, sankarshan, shtripat, vbellur |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-09-10 07:51:40 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
Martin Bukatovic
2018-06-22 16:51:05 UTC
Created attachment 1453784 [details]
screenshot 1: Default dashboard with "Connection" chart
Created attachment 1453785 [details]
screenshot 2: Connection chart (when running over night)
Screenshot 2 notes: the growth stops there when the file is removed.

Created attachment 1453787 [details]
tarball with the output files from gluster commands
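The tarball above collects output from various gluster commands. For reference, here is a minimal sketch (my addition, not part of the original report) of the CLI calls that expose the per-brick client connections behind the "Connection" chart; the volume name and client IP are placeholders:

```
# Placeholder volume name (see the gluster volume info output in the next comment)
VOLNAME=volume_beta_arbiter_2_plus_1x2

# List the clients connected to each brick of the volume, including the
# per-brick "Clients connected" count and each client's address:port
gluster volume status "$VOLNAME" clients

# Rough count of entries for one suspect client address (placeholder IP);
# the grep is illustrative and may need adjusting to the exact output format
CLIENT_IP=192.0.2.10
gluster volume status "$VOLNAME" clients | grep -c "$CLIENT_IP"
```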
Thank you, Martin! I observe a lot of client connections coming in from 10.37.169.62. Is 10.37.169.62 a client machine? Would it be possible to attach sosreport from that machine?

10.37.169.62 is storage machine mbukatov-usm1-gl3, which seems to host the arbiter bricks for the volume:

```
[root@mbukatov-usm1-gl1 ~]# gluster volume info volume_beta_arbiter_2_plus_1x2 | grep gl3
Brick3: mbukatov-usm1-gl3:/mnt/brick_beta_arbiter_1/1 (arbiter)
Brick9: mbukatov-usm1-gl3:/mnt/brick_beta_arbiter_2/2 (arbiter)
Brick15: mbukatov-usm1-gl3:/mnt/brick_beta_arbiter_3/3 (arbiter)
```

Would it be possible to collect sosreport from the arbiter node? Thx!

Created attachment 1453838 [details]
screenshot 3: host dashboard for gl3 machine
While I'm waiting for the sosreport to finish, I noticed high memory usage
on that node, which is likely caused by lots of glfsheal processes:
```
# ps aux | grep glfsheal | wc -l
121
```
The memory spike clearly correlates with the failed upload of the big file,
as can be seen in screenshot 3 from the WA host dashboard (attached).
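A small sketch of how the glfsheal count and its memory footprint can be cross-checked (my addition; plain procps tools, not commands from the original report). Note that a `ps aux | grep ... | wc -l` pipeline can also match the grep process itself, so the count may be off by one:

```
# Count glfsheal processes without matching the pgrep invocation itself
pgrep -c -f glfsheal

# Sum the resident memory (RSS) of all glfsheal processes, reported in MiB
ps -C glfsheal -o rss= | awk '{ sum += $1 } END { printf "%.1f MiB\n", sum / 1024 }'
```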
@Ravi, can you please check and comment, as the number of glfsheal processes is going up in this case.

glfsheal is a binary that is launched when you run 'gluster volume heal $VOLNAME info'. It is basically a client process (AFR + protocol/client translator) that connects to the bricks of $VOLNAME, gets the list of files that need heal, prints it to stdout and then terminates (thus closing the connections it established with the bricks). Comment #8 shows 121 glfsheal processes running, which probably means we ran the heal-info command multiple (121) times and each time the process did not terminate successfully. If this is not a transient state, we would need to find out if and why 'gluster volume heal $VOLNAME info' is not completing successfully.

Note: I'm currently engaged in RHGS stop gap related work and can take a look at the setup next week (not clearing the need-info until then). In the meantime, if anyone wants to take a stab at this, you can run 'glfsheal $VOLNAME' and see if and why it is hung using gdb, looking at the log file (/var/log/glusterfs/glfsheal-$VOLNAME.log), etc.

Created attachment 1476204 [details]
screenshot 4: retrying with recent gluster build, with fix for heal-info hang

(In reply to Ravishankar N from comment #14)
> There was a recent fix for heal-info hang
> https://bugzilla.redhat.com/show_bug.cgi?id=1597654#c3 which should be
> available in the next downstream build of rhgs-3.4.0 (v3.12.2-14?). It might
> be worth seeing if the issue is reproducible with that build.

I retried the scenario during testing of related RHGS WA BZ 1594383 and I no longer see the problem, so it seems that this problem has been addressed by this recent fix of heal-info as well. See screenshot 4.

Version of gluster used:

```
[root@mbukatov-usm1-gl1 ~]# rpm -qa | grep gluster | sort
glusterfs-3.12.2-16.el7rhgs.x86_64
glusterfs-api-3.12.2-16.el7rhgs.x86_64
glusterfs-cli-3.12.2-16.el7rhgs.x86_64
glusterfs-client-xlators-3.12.2-16.el7rhgs.x86_64
glusterfs-events-3.12.2-16.el7rhgs.x86_64
glusterfs-fuse-3.12.2-16.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-16.el7rhgs.x86_64
glusterfs-libs-3.12.2-16.el7rhgs.x86_64
glusterfs-rdma-3.12.2-16.el7rhgs.x86_64
glusterfs-server-3.12.2-16.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.6.x86_64
python2-gluster-3.12.2-16.el7rhgs.x86_64
tendrl-gluster-integration-1.6.3-9.el7rhgs.noarch
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
```
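For future reference, a rough sketch of the triage steps suggested above for a hung heal-info; the volume name is a placeholder and the gdb invocation is illustrative, while the log path is the one given in the comment:

```
# Placeholder volume name
VOLNAME=volume_beta_arbiter_2_plus_1x2

# Run the heal-info helper directly and see whether it terminates on its own
glfsheal "$VOLNAME"

# Check its log for errors or disconnect messages
tail -n 100 "/var/log/glusterfs/glfsheal-${VOLNAME}.log"

# If a glfsheal process is stuck, attach gdb and capture all thread backtraces
gdb -batch -p "$(pgrep -n -f glfsheal)" -ex 'thread apply all bt'
```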
Additional details
==================

This is more a note about RHGS WA and data reported on Brick Dashboard for this use case. Nothing described here is a bug.

While the file is no longer stuck in the .glusterfs/unlink directory:

```
[root@mbukatov-usm1-gl1 ~]# find /mnt/brick_* -name unlink | xargs tree
/mnt/brick_beta_arbiter_1/1/.glusterfs/unlink
/mnt/brick_beta_arbiter_2/2/.glusterfs/unlink
/mnt/brick_beta_arbiter_3/3/.glusterfs/unlink
/mnt/brick_gama_disperse_1/1/.glusterfs/unlink
/mnt/brick_gama_disperse_2/2/.glusterfs/unlink

0 directories, 0 files
```

the underlying thin volume reports 99% utilization:

```
[root@mbukatov-usm1-gl1 ~]# lvs
  LV                   VG                   Attr       LSize  Pool                 Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_beta_arbiter_1    vg_beta_arbiter_1    Vwi-aot--- <9.95g pool_beta_arbiter_1          0.72
  pool_beta_arbiter_1  vg_beta_arbiter_1    twi-aot--- <9.95g                              0.72   0.11
  lv_beta_arbiter_2    vg_beta_arbiter_2    Vwi-aot--- <9.95g pool_beta_arbiter_2          0.74
  pool_beta_arbiter_2  vg_beta_arbiter_2    twi-aot--- <9.95g                              0.74   0.11
  lv_beta_arbiter_3    vg_beta_arbiter_3    Vwi-aot--- <9.95g pool_beta_arbiter_3         98.76
  pool_beta_arbiter_3  vg_beta_arbiter_3    twi-aot--- <9.95g                             98.76   2.43
  lv_gama_disperse_1   vg_gama_disperse_1   Vwi-aot--- <9.95g pool_gama_disperse_1         0.17
  pool_gama_disperse_1 vg_gama_disperse_1   twi-aot--- <9.95g                              0.17   0.09
  lv_gama_disperse_2   vg_gama_disperse_2   Vwi-aot--- 19.89g pool_gama_disperse_2         0.09
  pool_gama_disperse_2 vg_gama_disperse_2   twi-aot--- 19.89g                              0.09   0.06
```

But when I run `fstrim /mnt/brick_beta_arbiter_3`, the space is reclaimed:

```
[root@mbukatov-usm1-gl1 ~]# lvs
  LV                   VG                   Attr       LSize  Pool                 Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_beta_arbiter_1    vg_beta_arbiter_1    Vwi-aot--- <9.95g pool_beta_arbiter_1          0.72
  pool_beta_arbiter_1  vg_beta_arbiter_1    twi-aot--- <9.95g                              0.72   0.11
  lv_beta_arbiter_2    vg_beta_arbiter_2    Vwi-aot--- <9.95g pool_beta_arbiter_2          0.74
  pool_beta_arbiter_2  vg_beta_arbiter_2    twi-aot--- <9.95g                              0.74   0.11
  lv_beta_arbiter_3    vg_beta_arbiter_3    Vwi-aot--- <9.95g pool_beta_arbiter_3          0.70
  pool_beta_arbiter_3  vg_beta_arbiter_3    twi-aot--- <9.95g                              0.70   0.11
  lv_gama_disperse_1   vg_gama_disperse_1   Vwi-aot--- <9.95g pool_gama_disperse_1         0.17
  pool_gama_disperse_1 vg_gama_disperse_1   twi-aot--- <9.95g                              0.17   0.09
  lv_gama_disperse_2   vg_gama_disperse_2   Vwi-aot--- 19.89g pool_gama_disperse_2         0.09
  pool_gama_disperse_2 vg_gama_disperse_2   twi-aot--- 19.89g                              0.09   0.06
```

This is expected behavior, but I note it here as people could be concerned with RHGS WA reporting 99% utilization for the thin pool (panel LVM Thin Pool Data Usage).
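Since a thin pool only returns space once the filesystem discards the freed blocks, here is a hedged sketch (my addition) of checking and reclaiming the space across all brick mounts; the mount glob and the periodic-trim note are illustrative:

```
# Thin pool data/metadata usage before and after the trim
lvs -o lv_name,vg_name,data_percent,metadata_percent

# Discard unused blocks on every brick filesystem so the thin pool can reclaim them
for mnt in /mnt/brick_*; do
    fstrim -v "$mnt"
done
```

Depending on the deployment, running fstrim periodically (for example via the util-linux fstrim.timer, where available) or mounting the bricks with the discard option would keep the LVM Thin Pool Data Usage panel from showing this kind of stale high utilization.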