| Summary: | Thin pool monitoring shows false alerts on brick utilization from nagios. | | |
|---|---|---|---|
| Product: | Red Hat Gluster Storage | Reporter: | Triveni Rao <trao> |
| Component: | nagios-server-addons | Assignee: | Ramesh N <rnachimu> |
| Status: | CLOSED NOTABUG | QA Contact: | RHS-C QE <rhsc-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.1 | CC: | mpillai, rcyriac, rnachimu, sabose, sankarshan, sashinde |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1311837 (view as bug list) | Environment: | |
| Last Closed: | 2016-02-25 07:27:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Attachments: | | | |
Created attachment 1116898 [details]
screen shot 1
This is an issue with LVM. Blocks allocated to the thin pool are not freed after deleting files. As a result, even though the files were moved to a different brick by rebalance, the thin pool utilization is still shown as high in LVM. The customer can run fstrim to free the blocks. There is already an RFE in RHGS (bz#1100511) to document fstrim on brick mounts. Maybe we should document this as a known issue.

(In reply to Ramesh N from comment #2)
> This is an issue with LVM. Blocks allocated to the thin pool are not freed
> after deleting files.

Just to be clear -- that is the expected behaviour with thin provisioning.

> As a result, even though the files were moved to a different brick by
> rebalance, the thin pool utilization is still shown as high in LVM. The
> customer can run fstrim to free the blocks. There is already an RFE in RHGS
> (bz#1100511) to document fstrim on brick mounts. Maybe we should document
> this as a known issue.

The problem is that fstrim on the current RHEL 7.2 release is not working for thin LVs (bz#1284833), but it looks like it will get fixed in the next z-stream update.

(In reply to Manoj Pillai from comment #4)
> Just to be clear -- that is the expected behaviour with thin provisioning.
>
> The problem is that fstrim on the current RHEL 7.2 release is not working
> for thin LVs (bz#1284833), but it looks like it will get fixed in the next
> z-stream update.

Running fstrim is not yet documented in the RHGS guide, so when a customer receives an alert mail from nagios saying that thin pool utilization is high, they may be confused. The alert gives the wrong impression that the thin pool is heavily used even though it is not. We have to change the RHGS guide to ask the user to run fstrim, or at least add a doc text to this bug.

This is not a bug in nagios monitoring, but a bug in underlying components. RameshN, can you raise a doc bug and close this?

We can close this bug. bz#1100511 is already raised to document fstrim in the RHGS guide. I have raised bz#1311837 for the RHGS-C Admin guide to document the same.
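As a minimal sketch of the fstrim workaround discussed above (assuming the brick filesystem is mounted at /rhgs/brick2 and backed by the thin pool vg-brick2/pool-brick2, as in the report below; on RHEL 7.2, fstrim against thin LVs needs the fix tracked in bz#1284833):

# Check thin pool utilization as reported by LVM
lvs -o lv_name,data_percent,metadata_percent vg-brick2

# Discard unused filesystem blocks so the thin pool can reclaim them
fstrim -v /rhgs/brick2

# Re-check: Data% of pool-brick2 should drop once the discards complete
lvs -o lv_name,data_percent,metadata_percent vg-brick2

Mounting the brick with the discard option could reclaim space continuously instead, though the discussion above only covers running fstrim manually.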
Description of problem:
======================
Thin pool monitoring shows false alerts on brick utilization from nagios. LVM blocks allocated to the thin pool are not released; writes to the brick are still possible, but nagios shows that the thin pool is full.

Version-Release number of selected component (if applicable):
==============================================================
rhsc-3.1.2-0.70.el6.noarch
nagios-server-addons-0.2.3-1.el6rhs.noarch

How reproducible:
==================
Easily

Steps to Reproduce:
1. Create a single-brick distribute volume.
2. Mount it on a client and fill the brick.
3. Add one more brick and run rebalance.
4. Once rebalance has completed, check the brick utilization status.
5. The thin pool utilization is still reported as high.

Actual results:
================
Even though writes to the brick succeed, nagios still shows a "brick full" message.

Expected results:
=================
Monitoring should be accurate; no false alerts should be raised.

Additional info:
=================
[root@dhcp46-33 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root           18G  1.7G   16G  10% /
devtmpfs                       1.9G     0  1.9G   0% /dev
tmpfs                          1.9G     0  1.9G   0% /dev/shm
tmpfs                          1.9G  129M  1.8G   7% /run
tmpfs                          1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                      497M   89M  409M  18% /boot
tmpfs                          389M     0  389M   0% /run/user/0
/dev/mapper/vg--brick1-brick1   20G   20G  433M  98% /rhgs/brick1
/dev/mapper/vg--brick2-brick2   20G   18G  3.0G  86% /rhgs/brick2
/dev/mapper/vg--brick3-brick3   20G   33M   20G   1% /rhgs/brick3

[root@dhcp46-33 ~]# gluster v rebalance Distri status
     Node   Rebalanced-files        size     scanned    failures     skipped      status   run time in secs
---------   ----------------   ---------   ---------   ---------   ---------   ---------   ----------------
localhost                  2      19.5GB           3           0           0   completed             856.00
volume rebalance: Distri: success

[root@dhcp46-33 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root           18G  1.7G   16G  10% /
devtmpfs                       1.9G     0  1.9G   0% /dev
tmpfs                          1.9G     0  1.9G   0% /dev/shm
tmpfs                          1.9G  129M  1.8G   7% /run
tmpfs                          1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                      497M   89M  409M  18% /boot
tmpfs                          389M     0  389M   0% /run/user/0
/dev/mapper/vg--brick1-brick1   20G   34M   20G   1% /rhgs/brick1
/dev/mapper/vg--brick2-brick2   20G   20G  433M  98% /rhgs/brick2
/dev/mapper/vg--brick3-brick3   20G   33M   20G   1% /rhgs/brick3

[root@dhcp46-33 ~]# lvs
  LV          VG        Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root        rhgs      -wi-ao---- 17.47g
  swap        rhgs      -wi-ao----  2.00g
  brick1      vg-brick1 Vwi-aot--- 20.00g pool-brick1        97.75
  pool-brick1 vg-brick1 twi-aot--- 19.90g                    98.24  2.50
  brick2      vg-brick2 Vwi-aot--- 20.00g pool-brick2        97.82
  pool-brick2 vg-brick2 twi-aot--- 19.90g                    98.31  2.40
  brick3      vg-brick3 Vwi-aot--- 20.00g pool-brick3         0.07
  pool-brick3 vg-brick3 twi-aot--- 19.90g                     0.07  0.06

[root@dhcp46-33 ~]# lvs -a
  LV                  VG        Attr       LSize   Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root                rhgs      -wi-ao----  17.47g
  swap                rhgs      -wi-ao----   2.00g
  brick1              vg-brick1 Vwi-aot---  20.00g pool-brick1        97.75
  pool-brick1         vg-brick1 twi-aot---  19.90g                    98.24  2.50
  [pool-brick1_tdata] vg-brick1 Twi-ao----  19.90g
  [pool-brick1_tmeta] vg-brick1 ewi-ao---- 102.00m
  brick2              vg-brick2 Vwi-aot---  20.00g pool-brick2        97.82
  pool-brick2         vg-brick2 twi-aot---  19.90g                    98.31  2.40
  [pool-brick2_tdata] vg-brick2 Twi-ao----  19.90g
  [pool-brick2_tmeta] vg-brick2 ewi-ao---- 102.00m
  brick3              vg-brick3 Vwi-aot---  20.00g pool-brick3         0.07
  pool-brick3         vg-brick3 twi-aot---  19.90g                     0.07  0.06
  [pool-brick3_tdata] vg-brick3 Twi-ao----  19.90g
  [pool-brick3_tmeta] vg-brick3 ewi-ao---- 102.00m
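For reference, the reproduction steps above roughly translate into the following commands (a sketch only; the server name server1, the brick subdirectories, the mount point /mnt/distri, and the dd sizes are illustrative and not taken from the report):

# 1. Create and start a single-brick distribute volume
gluster volume create Distri server1:/rhgs/brick1/b1
gluster volume start Distri

# 2. Mount the volume on a client and fill the brick
mount -t glusterfs server1:/Distri /mnt/distri
dd if=/dev/zero of=/mnt/distri/file1 bs=1M count=10000
dd if=/dev/zero of=/mnt/distri/file2 bs=1M count=9500

# 3. Add a second brick and run rebalance
gluster volume add-brick Distri server1:/rhgs/brick2/b2
gluster volume rebalance Distri start
gluster volume rebalance Distri status

# 4. After rebalance completes, compare filesystem usage with thin pool usage
df -h /rhgs/brick1    # brick filesystem is now nearly empty
lvs vg-brick1         # Data% of pool-brick1 remains high until fstrim is run

The mismatch in the last step (low df usage but high thin pool Data%) is what triggers the nagios brick utilization alert described in this bug.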