Bug 1300653 - Thin pool monitoring shows false alerts on brick utilization from nagios.
Status: CLOSED NOTABUG
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: nagios-server-addons
3.1
x86_64 Linux
unspecified Severity medium
Assigned To: Ramesh N
RHS-C QE
Keywords: ZStream
Depends On:
Blocks:
Reported: 2016-01-21 06:17 EST by Triveni Rao
Modified: 2016-05-16 00:39 EDT (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1311837
Environment:
Last Closed: 2016-02-25 02:27:23 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
screen shot 1 (1.44 MB, image/png), 2016-01-21 06:17 EST, Triveni Rao
Description Triveni Rao 2016-01-21 06:17:20 EST
Description of problem:
======================
Thin pool monitoring shows false alerts on brick utilization from nagios.
LVM blocks allocated to the thin pool are not released after the data moves away, so writes to the brick are still possible, but Nagios reports that the thin pool is full.

Version-Release number of selected component (if applicable):
==============================================================
rhsc-3.1.2-0.70.el6.noarch
nagios-server-addons-0.2.3-1.el6rhs.noarch


How reproducible:
==================
Easily.


Steps to Reproduce:
1. Create a single-brick distribute volume.
2. Mount it on a client and fill the brick.
3. Add one more brick and run rebalance.
4. Once rebalance completes, check the brick utilization status.
5. Note that thin pool utilization is still reported as high.
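The steps above could be sketched as follows. The volume name, hostname, brick paths, and file size are illustrative, loosely taken from the terminal output below, not the exact commands used:

```shell
# 1. Create and start a single-brick distribute volume (names are examples).
gluster volume create Distri dhcp46-33:/rhgs/brick1
gluster volume start Distri

# 2. Mount it on a client and fill the brick.
mount -t glusterfs dhcp46-33:/Distri /mnt/distri
dd if=/dev/zero of=/mnt/distri/bigfile bs=1M count=19000

# 3-4. Add a second brick and rebalance; files migrate off brick1.
gluster volume add-brick Distri dhcp46-33:/rhgs/brick2
gluster volume rebalance Distri start
gluster volume rebalance Distri status

# 5. Compare: df on brick1 drops back to near-empty, but lvs still shows
# pool-brick1 Data% close to 100, which is what triggers the Nagios alert.
df -h /rhgs/brick1
lvs
```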


Actual results:
================
Even though writes to the brick succeed, Nagios still shows a brick-full message.

Expected results:
=================
Monitoring should be accurate; no false alerts should be raised.

Additional info:
=================

[root@dhcp46-33 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root           18G  1.7G   16G  10% /
devtmpfs                       1.9G     0  1.9G   0% /dev
tmpfs                          1.9G     0  1.9G   0% /dev/shm
tmpfs                          1.9G  129M  1.8G   7% /run
tmpfs                          1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                      497M   89M  409M  18% /boot
tmpfs                          389M     0  389M   0% /run/user/0
/dev/mapper/vg--brick1-brick1   20G   20G  433M  98% /rhgs/brick1
/dev/mapper/vg--brick2-brick2   20G   18G  3.0G  86% /rhgs/brick2
/dev/mapper/vg--brick3-brick3   20G   33M   20G   1% /rhgs/brick3
[root@dhcp46-33 ~]# gluster v rebalance Distri status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                2        19.5GB             3             0             0            completed             856.00
volume rebalance: Distri: success
[root@dhcp46-33 ~]# 
[root@dhcp46-33 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root           18G  1.7G   16G  10% /
devtmpfs                       1.9G     0  1.9G   0% /dev
tmpfs                          1.9G     0  1.9G   0% /dev/shm
tmpfs                          1.9G  129M  1.8G   7% /run
tmpfs                          1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                      497M   89M  409M  18% /boot
tmpfs                          389M     0  389M   0% /run/user/0
/dev/mapper/vg--brick1-brick1   20G   34M   20G   1% /rhgs/brick1
/dev/mapper/vg--brick2-brick2   20G   20G  433M  98% /rhgs/brick2
/dev/mapper/vg--brick3-brick3   20G   33M   20G   1% /rhgs/brick3
[root@dhcp46-33 ~]# lvs
  LV          VG        Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root        rhgs      -wi-ao---- 17.47g                                                           
  swap        rhgs      -wi-ao----  2.00g                                                           
  brick1      vg-brick1 Vwi-aot--- 20.00g pool-brick1        97.75                                  
  pool-brick1 vg-brick1 twi-aot--- 19.90g                    98.24  2.50                            
  brick2      vg-brick2 Vwi-aot--- 20.00g pool-brick2        97.82                                  
  pool-brick2 vg-brick2 twi-aot--- 19.90g                    98.31  2.40                            
  brick3      vg-brick3 Vwi-aot--- 20.00g pool-brick3        0.07                                   
  pool-brick3 vg-brick3 twi-aot--- 19.90g                    0.07   0.06                            
[root@dhcp46-33 ~]# lvs -a
  LV                  VG        Attr       LSize   Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root                rhgs      -wi-ao----  17.47g                                                           
  swap                rhgs      -wi-ao----   2.00g                                                           
  brick1              vg-brick1 Vwi-aot---  20.00g pool-brick1        97.75                                  
  pool-brick1         vg-brick1 twi-aot---  19.90g                    98.24  2.50                            
  [pool-brick1_tdata] vg-brick1 Twi-ao----  19.90g                                                           
  [pool-brick1_tmeta] vg-brick1 ewi-ao---- 102.00m                                                           
  brick2              vg-brick2 Vwi-aot---  20.00g pool-brick2        97.82                                  
  pool-brick2         vg-brick2 twi-aot---  19.90g                    98.31  2.40                            
  [pool-brick2_tdata] vg-brick2 Twi-ao----  19.90g                                                           
  [pool-brick2_tmeta] vg-brick2 ewi-ao---- 102.00m                                                           
  brick3              vg-brick3 Vwi-aot---  20.00g pool-brick3        0.07                                   
  pool-brick3         vg-brick3 twi-aot---  19.90g                    0.07   0.06                            
  [pool-brick3_tdata] vg-brick3 Twi-ao----  19.90g                                                           
  [pool-brick3_tmeta] vg-brick3 ewi-ao---- 102.00m                                                           
[root@dhcp46-33 ~]# 
[root@dhcp46-33 ~]# 
[root@dhcp46-33 ~]# 
[root@dhcp46-33 ~]# lvs
  LV          VG        Attr       LSize  Pool        Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root        rhgs      -wi-ao---- 17.47g                                                           
  swap        rhgs      -wi-ao----  2.00g                                                           
  brick1      vg-brick1 Vwi-aot--- 20.00g pool-brick1        97.75                                  
  pool-brick1 vg-brick1 twi-aot--- 19.90g                    98.24  2.50                            
  brick2      vg-brick2 Vwi-aot--- 20.00g pool-brick2        97.82                                  
  pool-brick2 vg-brick2 twi-aot--- 19.90g                    98.31  2.40                            
  brick3      vg-brick3 Vwi-aot--- 20.00g pool-brick3        0.07                                   
  pool-brick3 vg-brick3 twi-aot--- 19.90g                    0.07   0.06                            
[root@dhcp46-33 ~]# 
[root@dhcp46-33 ~]# 
[root@dhcp46-33 ~]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/rhgs-root           18G  1.7G   16G  10% /
devtmpfs                       1.9G     0  1.9G   0% /dev
tmpfs                          1.9G     0  1.9G   0% /dev/shm
tmpfs                          1.9G  129M  1.8G   7% /run
tmpfs                          1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/vda1                      497M   89M  409M  18% /boot
tmpfs                          389M     0  389M   0% /run/user/0
/dev/mapper/vg--brick1-brick1   20G   34M   20G   1% /rhgs/brick1
/dev/mapper/vg--brick2-brick2   20G   20G  433M  98% /rhgs/brick2
/dev/mapper/vg--brick3-brick3   20G   33M   20G   1% /rhgs/brick3
Comment 1 Triveni Rao 2016-01-21 06:17 EST
Created attachment 1116898 [details]
screen shot 1
Comment 2 Ramesh N 2016-01-21 06:36:28 EST
This is an issue with LVM. Blocks allocated to the thin pool are not freed after a file is deleted. As a result, even though rebalance moved the files to a different brick, thin pool utilization remains high in LVM. The customer can run fstrim to free the blocks. There is already an RFE in RHGS (bz#1100511) to document fstrim on brick mounts. Maybe we should document this as a known issue.
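A minimal sketch of the fstrim workaround mentioned above, assuming the brick filesystem is mounted at /rhgs/brick1 on the thin LV vg-brick1/brick1 (names taken from the df/lvs output in the description; illustrative only, not a tested procedure):

```shell
# Before: df shows the filesystem nearly empty after rebalance, while lvs
# still shows pool-brick1 Data% near 100 because the thin pool never learned
# the deleted blocks are free.
df -h /rhgs/brick1
lvs vg-brick1

# Discard unused filesystem blocks back to the thin pool (-v prints how
# many bytes were trimmed). Requires discard support down the stack; see
# bz#1284833 for the RHEL 7.2 fstrim-on-thin-LV issue noted below.
fstrim -v /rhgs/brick1

# After: pool-brick1's Data% should drop to match actual filesystem usage,
# and the Nagios thin pool alert should clear on the next check.
lvs vg-brick1
```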
Comment 4 Manoj Pillai 2016-01-22 03:26:54 EST
(In reply to Ramesh N from comment #2)
> This is an issue with LVM. Blocks allocated to Thinpool is not freed after
> deleting the file. 

Just to be clear -- that is the expected behaviour with thin-p.

> As a result though the files got moved to different brick
> by rebalance thin pool utilization shown high in LVM. Customer can run
> fstrim to free the blocks. There is already an RFS in RHGS (bz#1100511) to
> document fstrim on brick mounts. May we should document this as a known
> issue.

The problem is that fstrim on the current RHEL 7.2 release does not work for thin LVs (bz #1284833). But it looks like that will be fixed in the next z-stream update.
Comment 5 Ramesh N 2016-01-22 03:51:41 EST
(In reply to Manoj Pillai from comment #4)
> (In reply to Ramesh N from comment #2)
> > This is an issue with LVM. Blocks allocated to Thinpool is not freed after
> > deleting the file. 
> 
> Just to be clear -- that is the expected behaviour with thin-p.
> 
> > As a result though the files got moved to different brick
> > by rebalance thin pool utilization shown high in LVM. Customer can run
> > fstrim to free the blocks. There is already an RFS in RHGS (bz#1100511) to
> > document fstrim on brick mounts. May we should document this as a known
> > issue.
> 
> The problem is that fstrim on the current RHEL 7.2 release is not working
> for thin LVs. bz #1284833. But looks like it will get fixed in the next
> z-stream update.

Running fstrim is not yet documented in the RHS guide. So when a customer receives an alert mail from Nagios saying thin pool utilization is high, they may be confused: it gives the wrong impression that the thin pool is heavily used when it is not.

We have to change the RHS guide to ask the user to run fstrim, or at least add doc text to this bug.
Comment 6 Sahina Bose 2016-02-24 01:15:35 EST
This is not a bug in Nagios monitoring, but in the underlying components.
RameshN, can you raise a doc bug and close this?
Comment 8 Ramesh N 2016-02-25 02:27:23 EST
We can close this bug. bz#1100511 is already raised to document fstrim in the RHGS guide. I have raised bz#1311837 for the RHGS-C Admin guide to document the same.
