Bug 1284174 - TRIM/discards not functional on thin volume
Summary: TRIM/discards not functional on thin volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 22
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Mike Snitzer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-11-21 13:34 UTC by Mike Gerber
Modified: 2016-03-27 14:33 UTC
CC: 19 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-07 14:31:33 UTC
Type: Bug
Embargoed:


Attachments
passing log from /discard_with_fstrim_passdown_false/ with XFS (9.15 KB, text/plain)
2015-11-24 19:51 UTC, Mike Snitzer
failing log from /discard_with_fstrim_passdown_false/ with ext4 (11.30 KB, text/plain)
2015-11-24 19:52 UTC, Mike Snitzer


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1284833 1 None None None 2021-09-06 15:00:26 UTC

Internal Links: 1284833

Description Mike Gerber 2015-11-21 13:34:44 UTC
Description of problem:

I noticed that TRIM/discards are no longer functional on my thin volume:


# df -h /home
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg--waschsauger-home   69G   39G   30G  57% /home

# mount | grep /home
/dev/mapper/vg--waschsauger-home on /home type ext4 (rw,relatime,seclabel,stripe=16,data=ordered)

# fstrim -v /home
/home: 158.8 MiB (166445056 bytes) trimmed


# lvs -o lv_name,pool_lv,discards,data_percent
  LV          Pool   Discards Data% 
  docker-pool        passdown 30.54 
  home        pool00 passdown 90.52 
  mir_radio   pool00 passdown 84.66 
  mutationen  pool00 passdown 68.01 
  pool00             passdown 92.21 
  root        pool00 passdown 59.03 
  srv_docker  pool00 passdown 21.89 
  swap    

Notice that /home is only 57% full, yet after the fstrim the freed space is not returned to the thin pool (the thin volume stays at 90.52% Data).

fstrim -av also produces no output.

Any ideas?

Version-Release number of selected component (if applicable):

# uname -r
4.2.6-200.fc22.x86_64
# rpm -q lvm2
lvm2-2.02.116-3.fc22.x86_64



How reproducible:

Use a thin pool on F22, delete files on a thin volume, call fstrim on the thin volume, and compare data_percent in the lvs output with the df output. Also, fstrim -av does not list all filesystems as expected.

Actual results:

data_percent stays at 90%.

Expected results:

data_percent should go down to 57%.

Additional info:

I tested this on another F22 system: Same problem. On F23 it works as expected.

Comment 1 Mike Gerber 2015-11-21 19:48:20 UTC
I dug a little deeper: fstrim -av ignores the filesystems because discard_granularity is zero, e.g.:

# cat /sys/devices/virtual/block/dm-*/queue/discard_granularity
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0


Also, I updated the second F22 system to F23 and have the same problem there, using 4.2.6-300.fc23.x86_64.
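
A bare column of zeros doesn't say which dm device is which. A small helper (my sketch, not from the report; the sysfs root is parameterized purely so it can be exercised on a fake tree) pairs each dm-* with its dm name and granularity:

```shell
# List each dm device with its dm name and discard_granularity.
list_dm_discard() {
  sysblock=${1:-/sys/block}    # default to the real sysfs tree
  for q in "$sysblock"/dm-*/queue/discard_granularity; do
    [ -e "$q" ] || continue    # glob matched nothing
    d=${q%/queue/discard_granularity}
    printf '%s %s %s\n' "${d##*/}" "$(cat "$d/dm/name" 2>/dev/null || echo '?')" "$(cat "$q")"
  done
}
list_dm_discard   # on the affected kernels, every granularity printed is 0
```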

Comment 2 Mike Gerber 2015-11-22 20:31:34 UTC
I did some bisecting over the weekend, between 64291f7 (v4.2, bad) and b953c0d (v4.1, good):


34fbcf6257eb3f39a5b78a4f51b40f881b82033b is the first bad commit
commit 34fbcf6257eb3f39a5b78a4f51b40f881b82033b
Author: Joe Thornber <ejt>                                                                    
Date:   Thu Apr 16 12:58:35 2015 +0100
                                                                  
    dm thin: range discard support

However, with the bad commits from this one up to and including 64291f7 (v4.2), fstrim -av still does something (it processes the filesystems); the discards just aren't acted on by the thin pool.

Tested on the second system that I updated to F23.

Comment 3 Zdenek Kabelac 2015-11-23 07:46:53 UTC
Yep - the issue has already been noticed.

As a 'hotfix', could you try a 4.3 kernel (e.g. from koji)?

Comment 4 Mike Gerber 2015-11-23 08:17:56 UTC
Testing using 4.3.0 from koji:

# fstrim -av
# uname -r
4.3.0-1.fc24.x86_64

Still broken after an explicit fstrim -v /srv/vms:


# fstrim -v /srv/vms
/srv/vms: 77 GiB (82648084480 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   27G   77G  26% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 34.12 


Haven't tested 4.4.0-rc yet.

Comment 5 Mike Gerber 2015-11-23 08:34:01 UTC
Same problem using:

# uname -r
4.4.0-0.rc1.git3.1.fc24.x86_64

Comment 6 Zdenek Kabelac 2015-11-23 08:45:08 UTC
Hmm - are you sure 'trim' doesn't work?

While I'm aware it shows '0' in places where it shouldn't, the trim itself seemed to work.

So please: 'consume a couple of GB' in the filesystem & sync.
Watch the pool fullness.
Remove the files and run 'fstrim'.
The pool usage should have dropped.

Is this working for you with the 4.3 kernel?

Comment 7 Mike Gerber 2015-11-23 09:13:25 UTC
Yes, I'm sure :) Exact sequence of commands using 4.3:

~ # uname -r
4.3.0-1.fc24.x86_64
~ # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.12
~ # cd /srv/vms/
vms # mkdir TEST-DELETE-ME
vms # cp -r centos4/ TEST-DELETE-ME/
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   28G   76G  27% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.16
vms # rm -rf TEST-DELETE-ME/
vms # sync
vms # fstrim -av
vms # fstrim -v /srv/vms
/srv/vms: 77 GiB (82648084480 bytes) trimmed
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.12

Comment 8 Zdenek Kabelac 2015-11-23 09:18:15 UTC
Looking at your results:

Before 'cp -r', pool usage is 34.12%.

After the copy: 34.16% (a 0.04% difference).

After 'rm -r' & fstrim it goes back to 34.12%.

So IMHO TRIM does work in your case.
(And I assume with 4.2 this will not happen - the value will not drop.)

What else do you expect here?

Comment 9 Mike Gerber 2015-11-23 09:24:24 UTC
No, it should drop down to approximately 26% - as it does with 4.1.

Comment 10 Mike Gerber 2015-11-23 09:34:12 UTC
To make it clearer:

vms # cp -r centos4/ debian-* TEST-DELETE-ME/
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   30G   74G  29% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.13
vms # mkdir TEST-DELETE-ME-2
vms # cp -r centos4/ debian-* TEST-DELETE-ME-2/
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   34G   70G  33% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 36.05
vms # rm -rf TEST-DELETE-ME*
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 36.05
vms # fstrim -v /srv/vms
/srv/vms: 7.5 GiB (8030244864 bytes) trimmed
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 35.86

Comment 11 Zdenek Kabelac 2015-11-23 09:37:00 UTC
I think you are missing the 'filesystem' layout here.

If you start at 34.12%, you can't expect it to drop to 26%.

What 'fstrim' does is find 'empty' areas in the filesystem and send 'trim' there. It then depends heavily on how the alignment of the filesystem's meta/data matches the thin-pool chunk size (block size). When it has finished, it reports the total amount of trimmed space - but this does not need to match what the thin-pool has trimmed.

The thin-pool does not support any 'partial' trimming (and you have not actually shown yet which chunk size/block size your thin pool is using - see lvs -o+chunksize).

Example of the problem: if you use a 4M chunk size and the filesystem trims only about 3MB of a single chunk, the thin-pool is still not able to show any space reduction - simply because the chunk has not been 'fully' released, so it still appears as allocated.

So while 'fstrim' on a 4.2 kernel should report an ioctl error, on 4.3 it's working: the thin-pool does its work and 'drops' data usage.
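
The 4M-chunk example above can be sketched numerically (a toy calculation of mine, not from the report): a discard only releases the chunks it covers end to end.

```shell
# Toy illustration of the partial-trim point above: a 3MB trim inside a
# 4M chunk releases nothing, because no chunk is covered completely.
chunk=$((4 * 1024 * 1024))   # hypothetical thin-pool chunk size: 4M
start=0                      # discard offset within the device (bytes)
len=$((3 * 1024 * 1024))     # discard length: 3MB
first=$(( (start + chunk - 1) / chunk ))   # first chunk fully inside the range
last=$(( (start + len) / chunk ))          # first chunk boundary past the range
if [ "$last" -gt "$first" ]; then freed=$(( last - first )); else freed=0; fi
echo "whole chunks released: $freed"
```

With a 64k chunk (as this pool uses, per the lvs output below) the same 3MB trim would release 48 whole chunks, which is why smaller chunk sizes reclaim space more readily.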

Comment 12 Mike Gerber 2015-11-23 09:53:41 UTC
# lvs -o+chunksize
  LV             VG         Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk  
  docker-pool    vg_jelinek twi-a-t---  19.41g                 4.15   0.08                             512.00k
  lv_fedora_root vg_jelinek -wi-ao----  12.00g                                                              0 
  lv_halde       vg_jelinek Vwi-aotz-- 571.00g thinpool        95.96                                        0 
  lv_home        vg_jelinek Vwi-aotz-- 120.00g thinpool        88.50                                        0 
  lv_moody       vg_jelinek Vwi-a-tz--  20.00g thinpool        42.22                                        0 
  lv_srv_vms     vg_jelinek Vwi-aotz-- 105.00g thinpool        35.86                                        0 
  lv_swap        vg_jelinek -wi-ao----   8.00g                                                              0 
  thinpool       vg_jelinek twi-aotz-- 861.05g                 81.32  62.91                             64.00k

Comment 13 Zdenek Kabelac 2015-11-23 09:58:41 UTC
So IMHO discard 'does' something - but it's nowhere near as effective as it's supposed to be, and it's most likely related to your bisected 'range support' commit.

So I'm leaving this to Joe for further comments (I assume he is already working on this issue).

Comment 14 Mike Gerber 2015-11-23 10:09:43 UTC
The thin LV started at 34% because TRIM already didn't work in an earlier try. Also, 4.1 *does* TRIM it down to approximately 26% (approximately the Use% of the fs).

To test this further, I filled the fs with some data so that the LV is 50% full:

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   39G   65G  37% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 50.13 

Then I delete the data again and TRIM:

# rm -rf TEST-DELETE-ME*
# sync
# fstrim -v /srv/vms
/srv/vms: 17.8 GiB (19052351488 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   27G   77G  26% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 49.29 

So it's at 49.29% now. Can this still be an alignment issue?
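
Back-of-the-envelope (my arithmetic on the numbers above, not from the report): if every trimmed byte had freed pool space, the 17.8 GiB trim should have dropped the 105 GiB LV's Data% by nearly 17 percentage points, not the observed 0.84:

```shell
trimmed=19052351488                     # bytes fstrim reported trimming
lvsize=$(( 105 * 1024 * 1024 * 1024 ))  # LSize of lv_srv_vms: 105 GiB
expected=$(( trimmed * 100 / lvsize ))  # expected Data% drop in points (truncated)
echo "expected drop: ~${expected} points; observed: 50.13 -> 49.29"
```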

Comment 15 Mike Snitzer 2015-11-23 17:36:07 UTC
(In reply to Zdenek Kabelac from comment #3)
> Yep - the issue has been already noticed.
> 
> As a 'hotfix' - could you try to use 4.3 kernel (e.g. from koji)?

The fix made it into v4.2 via this commit http://git.kernel.org/linus/aa0cd28d057fd4

(In reply to Mike Gerber from comment #1)
> I dug a little deeper: fstrim -av ignores the filesystems as
> discard_granularity is zero, e.g.:
> 
> # cat /sys/devices/virtual/block/dm-*/queue/discard_granularity
> 0
...
> 0

Did you configure lvm to disable discards, e.g.?:
 lvchange --discards ignore <pool>

There is no other way that discard_granularity would be 0 for the thin-pool and thin devices.

Comment 16 Mike Snitzer 2015-11-23 17:40:20 UTC
(In reply to Zdenek Kabelac from comment #11)

> So while  'fstrim' on 4.2 kernel should report an ioctl error, on 4.3 it's
> working and thin-pool does it's work and 'drop' data usage.

What error do you think the ioctl should report?  Partial thinp block discards won't _ever_ error.

The comments in this BZ speak to confusion about how discards work.

Comment 17 Mike Snitzer 2015-11-23 17:47:20 UTC
Mike Gerber, please provide the output of: lsblk -D

Comment 18 Mike Gerber 2015-11-23 17:56:08 UTC
(In reply to Mike Snitzer from comment #15)
> Did you configure lvm to disable discards, e.g.?:
>  lvchange --discards ignore <pool>
> 
> There is no other way that discard_granularity would be 0 for the thin-pool
> and thin devices.

No, it's "nopassdown" because the underlying PV (a LUKS volume on an MD RAID1) does not support TRIM. So, as posted before (see the "Discards" column):

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   27G   77G  26% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 49.29 
# cat /sys/devices/virtual/block/dm-*/queue/discard_granularity
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
# uname -r
4.3.0-1.fc24.x86_64
# lsblk -D
NAME                                            DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                                                    0        0B       0B         0
├─sda1                                                 0        0B       0B         0
│ └─md0                                                0        0B       0B         0
├─sda2                                                 0        0B       0B         0
└─sda5                                                 0        0B       0B         0
  └─md3                                                0        0B       0B         0
    └─luks-438be9cc-ddef-48b2-bf01-88958f0d74db        0        0B       0B         0
      ├─vg_jelinek-lv_fedora_root                      0        0B       0B         0
      ├─vg_jelinek-lv_swap                             0        0B       0B         0
      ├─vg_jelinek-thinpool_tmeta                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0        0B       0B         0
      │   ├─vg_jelinek-lv_srv_vms                      0        0B       0B         0
      │   ├─vg_jelinek-lv_home                         0        0B       0B         0
      │   └─vg_jelinek-lv_halde                        0        0B       0B         0
      ├─vg_jelinek-thinpool_tdata                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0        0B       0B         0
      │   ├─vg_jelinek-lv_srv_vms                      0        0B       0B         0
      │   ├─vg_jelinek-lv_home                         0        0B       0B         0
      │   └─vg_jelinek-lv_halde                        0        0B       0B         0
      ├─vg_jelinek-docker--pool_tmeta                  0        0B       0B         0
      │ └─vg_jelinek-docker--pool                      0        0B       0B         0
      └─vg_jelinek-docker--pool_tdata                  0        0B       0B         0
        └─vg_jelinek-docker--pool                      0        0B       0B         0
sdb                                                    0        0B       0B         0
├─sdb1                                                 0        0B       0B         0
│ └─md0                                                0        0B       0B         0
├─sdb2                                                 0        0B       0B         0
└─sdb5                                                 0        0B       0B         0
  └─md3                                                0        0B       0B         0
    └─luks-438be9cc-ddef-48b2-bf01-88958f0d74db        0        0B       0B         0
      ├─vg_jelinek-lv_fedora_root                      0        0B       0B         0
      ├─vg_jelinek-lv_swap                             0        0B       0B         0
      ├─vg_jelinek-thinpool_tmeta                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0        0B       0B         0
      │   ├─vg_jelinek-lv_srv_vms                      0        0B       0B         0
      │   ├─vg_jelinek-lv_home                         0        0B       0B         0
      │   └─vg_jelinek-lv_halde                        0        0B       0B         0
      ├─vg_jelinek-thinpool_tdata                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0        0B       0B         0
      │   ├─vg_jelinek-lv_srv_vms                      0        0B       0B         0
      │   ├─vg_jelinek-lv_home                         0        0B       0B         0
      │   └─vg_jelinek-lv_halde                        0        0B       0B         0
      ├─vg_jelinek-docker--pool_tmeta                  0        0B       0B         0
      │ └─vg_jelinek-docker--pool                      0        0B       0B         0
      └─vg_jelinek-docker--pool_tdata                  0        0B       0B         0
        └─vg_jelinek-docker--pool                      0        0B       0B         0

Comment 19 Mike Gerber 2015-11-23 18:01:35 UTC
(In reply to Mike Snitzer from comment #16)
> The comments in this BZ speak to confusion about how discards work.

I am not sure I am the one who is confused: an fstrim on the same FS using 4.1.0 *does* reduce the thin volume's "Data%" to approximately the "Use%" from df, just as I expect. An fstrim using a kernel at or after the bisected commit (mentioned above) does not. Unless the expected behaviour changed, it's a bug.

Comment 20 Mike Snitzer 2015-11-23 18:07:00 UTC
comment#18 shows that discards simply aren't enabled on _any_ of your devices.  That obviously isn't what 'no_discard_passdown' should do.

Please provide the output of: dmsetup table

(In reply to Mike Gerber from comment #19)
> (In reply to Mike Snitzer from comment #16)
> > The comments in this BZ speak to confusion about how discards work.
> 
> I am not sure if I am confused: an fstrim on the same FS using 4.1.0 *does*
> reduce thin "Data%" to approximately "Use%" (df), just as I expect. A fstrim
> using a kernel after and including the bisected commit (mentioned above)
> does not. Unless the expected behaviour changed, it's a bug.

I was saying Zdenek was confused in places.

I agree we need to get to the bottom of your issue.

Odd that fstrim says it discarded blocks even though the underlying device is _not_ advertising that it actually supports discards.

Comment 21 Mike Gerber 2015-11-23 18:17:36 UTC
(In reply to Mike Snitzer from comment #20)
> comment#18 shows that discards simply aren't enabled on _any_ of your
> devices.  That obviously isn't what 'no_discard_passdown' should do.
> 
> Please provide the output of: dmsetup table

# dmsetup table
vg_jelinek-lv_fedora_root: 0 25165824 linear 253:0 1827742080
vg_jelinek-lv_srv_vms: 0 220200960 thin 253:7 3
vg_jelinek-lv_home: 0 251658240 thin 253:7 4
vg_jelinek-docker--pool: 0 40706048 thin-pool 253:13 253:14 1024 0 2 skip_block_zeroing no_discard_passdown 
vg_jelinek-thinpool: 0 1805762560 linear 253:7 0
vg_jelinek-lv_swap: 0 16777216 linear 253:0 1652556160
vg_jelinek-docker--pool_tdata: 0 40706048 linear 253:0 1505755520
vg_jelinek-lv_halde: 0 1197473792 thin 253:7 5
vg_jelinek-docker--pool_tmeta: 0 1957888 linear 253:0 1806770560
vg_jelinek-lv_moody: 0 41943040 thin 253:7 2
luks-438be9cc-ddef-48b2-bf01-88958f0d74db: 0 1952492696 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 9:3 2056
vg_jelinek-thinpool-tpool: 0 1805762560 thin-pool 253:3 253:4 128 0 1 no_discard_passdown 
vg_jelinek-thinpool_tdata: 0 104857600 linear 253:0 1547698560
vg_jelinek-thinpool_tdata: 104857600 40894464 linear 253:0 1048960
vg_jelinek-thinpool_tdata: 145752064 16818176 linear 253:0 1935671680
vg_jelinek-thinpool_tdata: 162570240 209715200 linear 253:0 1296040320
vg_jelinek-thinpool_tdata: 372285440 41943040 linear 253:0 1893728640
vg_jelinek-thinpool_tdata: 414228480 16777216 linear 253:0 230687104
vg_jelinek-thinpool_tdata: 431005696 188743680 linear 253:0 41943424
vg_jelinek-thinpool_tdata: 619749376 1048576000 linear 253:0 247464320
vg_jelinek-thinpool_tdata: 1668325376 137437184 linear 253:0 1669333376
vg_jelinek-thinpool_tmeta: 0 1048576 linear 253:0 384
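
As a quick cross-check (my sketch; field layout per the kernel's dm thin-provisioning documentation), the data block size field of a thin-pool table line is given in 512-byte sectors, so the '128' in the vg_jelinek-thinpool-tpool line matches the 64.00k chunk that lvs reports:

```shell
sectors=128                    # data_block_size field from the thin-pool table line
bytes=$(( sectors * 512 ))     # device-mapper sectors are 512 bytes
echo "chunk size: $(( bytes / 1024 ))k"
```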


> > I am not sure if I am confused: an fstrim on the same FS using 4.1.0 *does*
> > reduce thin "Data%" to approximately "Use%" (df), just as I expect. A fstrim
> > using a kernel after and including the bisected commit (mentioned above)
> > does not. Unless the expected behaviour changed, it's a bug.
> I was saying Zdenek was confused in places.

Ok, just making sure :)

Comment 23 Mike Snitzer 2015-11-23 19:23:10 UTC
(In reply to Mike Snitzer from comment #22)
> Mike, please try this fix:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/
> commit/?h=dm-4.4&id=313b68157b94366d056b8704f2b4c3e1d0d8fe9e

Strike that, I have to revise this patch - it is close but not quite right.
I'll have a corrected fix shortly.

Comment 24 Mike Gerber 2015-11-23 20:12:33 UTC
Side note: your commit message also explains why the two systems with LUKS on something without discard capabilities are "bad" and the third system, with LUKS on an SSD, is "good".

Comment 26 Mike Gerber 2015-11-24 07:36:44 UTC
(In reply to Mike Snitzer from comment #25)
> Please try this:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/
> commit/?h=dm-4.4&id=0fcb04d59351f790efb8da18edefd6ab4d9bbf3b

It looks a little better: fstrim -av works on the fs again, but the thin LV is not trimmed down as expected (Use% 30 vs. Data% 51.02). (The first fstrim -av on the fs did trim >70G, not shown here.) I'll test with more data, i.e. filling past 51.02%, later.

# uname -r
4.4.0-rc1-DEBUGTHIN+

# fstrim -av
/srv/vms: 0 B (0 bytes) trimmed
/home: 0 B (0 bytes) trimmed
/halde: 0 B (0 bytes) trimmed

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   31G   73G  30% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 51.02 

# lsblk -D
NAME                                            DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                                                    0        0B       0B         0
├─sda1                                                 0        0B       0B         0
│ └─md0                                                0        0B       0B         0
├─sda2                                                 0        0B       0B         0
└─sda5                                                 0        0B       0B         0
  └─md3                                                0        0B       0B         0
    └─luks-438be9cc-ddef-48b2-bf01-88958f0d74db        0        0B       0B         0
      ├─vg_jelinek-lv_fedora_root                      0        0B       0B         0
      ├─vg_jelinek-lv_swap                             0        0B       0B         0
      ├─vg_jelinek-thinpool_tmeta                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0       64K      16G         0
      │   ├─vg_jelinek-lv_srv_vms                      0       64K      16G         0
      │   ├─vg_jelinek-lv_home                         0       64K      16G         0
      │   └─vg_jelinek-lv_halde                        0       64K      16G         0
      ├─vg_jelinek-thinpool_tdata                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0       64K      16G         0
      │   ├─vg_jelinek-lv_srv_vms                      0       64K      16G         0
      │   ├─vg_jelinek-lv_home                         0       64K      16G         0
      │   └─vg_jelinek-lv_halde                        0       64K      16G         0
      ├─vg_jelinek-docker--pool_tmeta                  0        0B       0B         0
      │ └─vg_jelinek-docker--pool                      0        0B       0B         0
      └─vg_jelinek-docker--pool_tdata                  0        0B       0B         0
        └─vg_jelinek-docker--pool                      0        0B       0B         0
sdb                                                    0        0B       0B         0
├─sdb1                                                 0        0B       0B         0
│ └─md0                                                0        0B       0B         0
├─sdb2                                                 0        0B       0B         0
└─sdb5                                                 0        0B       0B         0
  └─md3                                                0        0B       0B         0
    └─luks-438be9cc-ddef-48b2-bf01-88958f0d74db        0        0B       0B         0
      ├─vg_jelinek-lv_fedora_root                      0        0B       0B         0
      ├─vg_jelinek-lv_swap                             0        0B       0B         0
      ├─vg_jelinek-thinpool_tmeta                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0       64K      16G         0
      │   ├─vg_jelinek-lv_srv_vms                      0       64K      16G         0
      │   ├─vg_jelinek-lv_home                         0       64K      16G         0
      │   └─vg_jelinek-lv_halde                        0       64K      16G         0
      ├─vg_jelinek-thinpool_tdata                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0       64K      16G         0
      │   ├─vg_jelinek-lv_srv_vms                      0       64K      16G         0
      │   ├─vg_jelinek-lv_home                         0       64K      16G         0
      │   └─vg_jelinek-lv_halde                        0       64K      16G         0
      ├─vg_jelinek-docker--pool_tmeta                  0        0B       0B         0
      │ └─vg_jelinek-docker--pool                      0        0B       0B         0
      └─vg_jelinek-docker--pool_tdata                  0        0B       0B         0
        └─vg_jelinek-docker--pool                      0        0B       0B         0

# lvs -o+chunksize,discards
  LV             VG         Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk   Discards  
  docker-pool    vg_jelinek twi-a-t---  19.41g                 4.15   0.08                             512.00k nopassdown
  lv_fedora_root vg_jelinek -wi-ao----  12.00g                                                              0            
  lv_halde       vg_jelinek Vwi-aotz-- 571.00g thinpool        95.96                                        0  nopassdown
  lv_home        vg_jelinek Vwi-aotz-- 120.00g thinpool        88.52                                        0  nopassdown
  lv_moody       vg_jelinek Vwi-a-tz--  20.00g thinpool        42.22                                        0  nopassdown
  lv_srv_vms     vg_jelinek Vwi-aotz-- 105.00g thinpool        51.02                                        0  nopassdown
  lv_swap        vg_jelinek -wi-ao----   8.00g                                                              0            
  thinpool       vg_jelinek twi-aotz-- 861.05g                 83.17  64.44                             64.00k nopassdown

Comment 27 Mike Gerber 2015-11-24 07:48:21 UTC
lrwxrwxrwx. 1 root root       8 Nov 23 23:36 vg_jelinek-lv_srv_vms -> ../dm-10

==> /sys/devices/virtual/block/dm-10/queue/discard_granularity <==
65536

Comment 28 Mike Gerber 2015-11-24 10:54:53 UTC
#  df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   48G   56G  47% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 54.70 
# du -sh TEST-DELETE-ME*
3.5G	TEST-DELETE-ME
3.5G	TEST-DELETE-ME-2
3.5G	TEST-DELETE-ME-3
3.5G	TEST-DELETE-ME-4
3.5G	TEST-DELETE-ME-5
# rm -rf TEST-DELETE-ME*
# sync
# fstrim -av
/srv/vms: 19.6 GiB (21076836352 bytes) trimmed
/home: 79.8 MiB (83632128 bytes) trimmed
/halde: 0 B (0 bytes) trimmed
# sync
#  df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   31G   73G  30% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 53.47 
# uname -r
4.4.0-rc1-DEBUGTHIN+

Comment 29 Mike Gerber 2015-11-24 12:32:00 UTC
To reproduce:

* Minimal install of F23 Server using netinstall, in a VM
  * Manual partitioning: put root on Thin provisioning
* Note df -h and lvs
* Fill / FS with some data
* Delete data again
* fstrim -av (does nothing)
* fstrim -v / (does something)
* Look at df -h and lvs

Comment 30 Mike Snitzer 2015-11-24 19:47:25 UTC
We already have pretty extensive testing in place with device-mapper-test-suite.
(Need to make sure we add explicit coverage for underlying data device that doesn't support discards though...)
But in the following test I've made sure to use a data device that does _not_ support discards (and 'no_discard_passdown' is used to match the config used by reporter of this BZ).

I've modified the device-mapper-test-suite's /discard_with_fstrim_passdown_false/ test to be (ruby code):

  def discard_with_fstrim_passdown(passdown)
    dir = "./mnt1"
    @size = gig(4)
    file_size = @size / 20
    files = (0..9).reduce([]) {|memo, obj| memo << "file_#{obj}"}

    with_standard_pool(@size, :error_if_no_space => true, :discard_passdown => passdown) do |pool|
      with_new_thin(pool, @size * 2, 0) do |thin|
        ProcessControl.run("lsblk -D #{thin}") # record the discard limits

        fs = FS::file_system(:xfs, thin)
        fs.format
        fs.with_mount(dir, :discard => false) do
          s = PoolStatus.new(pool) # record "before" data usage in log
          data_before = s.used_data_blocks

          Dir.chdir(dir) do
            files.each do |f|
              ProcessControl.run("dd if=/dev/zero of=#{f} bs=1M count=#{file_size / meg(1)} oflag=direct")
            end

            PoolStatus.new(pool) # record "during" data usage in log

            files.each do |f|
              ProcessControl.run("rm #{f}")
            end
          end

          ProcessControl.run("fstrim -v #{dir}")

          s = PoolStatus.new(pool)
          # verify that all data blocks were recovered via fstrim (assumes FS alignment allows this)
          s.used_data_blocks.should == data_before
          s.options[:mode].should == :read_write
        end
      end
    end
  end

I'll attach test logs from running this test against a 4.4-rc1 kernel with the patch from comment #25 applied.  The logs show that fstrim against XFS works, whereas fstrim against ext4 does _not_. (We may need to clone this BZ to chase why ext4 isn't working as we'd hope; it could be an alignment issue where every allocated block somehow gets some ext4 metadata sprinkled into it. Cc'ing Eric Sandeen.)

Comment 31 Mike Snitzer 2015-11-24 19:51:46 UTC
Created attachment 1098331 [details]
passing log from /discard_with_fstrim_passdown_false/ with XFS

Comment 32 Mike Snitzer 2015-11-24 19:52:30 UTC
Created attachment 1098332 [details]
failing log from /discard_with_fstrim_passdown_false/ with ext4

Comment 33 Mike Gerber 2015-11-24 19:56:25 UTC
I am also using ext4.

Comment 34 Mike Gerber 2015-11-25 12:41:07 UTC
I do not know much about ext4 internals, but here is evidence against the "metadata sprinkling" theory: I booted 4.0.4 from F22 and ran fstrim against the same FS as above -- Data% is now approximately at the Use% of the FS.

Note the change in the lvs output.

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   31G   73G  30% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 53.51 
# fstrim -av
/srv/vms: 72.3 GiB (77631696896 bytes) trimmed
/halde: 23.1 GiB (24801894400 bytes) trimmed
/home: 13.6 GiB (14594449408 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   31G   73G  30% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 31.26 
# uname -r
4.0.4-301.fc22.x86_64

Comment 35 Mike Snitzer 2015-12-03 16:38:40 UTC
Joe has fixed various issues with the range discard support in thinp.  But even before these fixes, it became clear that your ext4 test is missing an important step: unlike with XFS, you _must_ "sync" after deleting the files (before running fstrim).

But please feel free to try this kernel:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.5

Specifically these patches:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.5&id=993ceab91986e2e737ce9a3e23bebc8cce649240
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.5&id=a3bb7274526aa3146adbcdc9103cb0e67584b1be

(the first patch being the most critical)
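
The sync requirement above can be checked with a few commands. This is a minimal sketch under assumptions: the mount point /mnt/thin is hypothetical (substitute a mounted ext4 thin volume), and it needs root; without those it dry-runs.

```shell
# Sketch of the ext4 sync-before-fstrim behavior described in this comment.
if [ "$(id -u)" -eq 0 ] && mountpoint -q /mnt/thin 2>/dev/null; then
  dd if=/dev/zero of=/mnt/thin/junk bs=1M count=512 oflag=direct
  rm /mnt/thin/junk
  fstrim -v /mnt/thin   # without sync, ext4 may report little or nothing trimmed
  sync                  # flush the deallocations to disk first
  fstrim -v /mnt/thin   # now the freed extents can actually be discarded
  result="ran"
else
  result="skipped (needs root and an ext4 thin volume at the hypothetical /mnt/thin)"
fi
echo "sync-check: $result"
```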

Comment 36 Mike Gerber 2015-12-03 21:05:13 UTC
Looks good!

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   49G   55G  47% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 52.03 
# du -sch TEST-DELETE-ME*
3.5G	TEST-DELETE-ME
3.5G	TEST-DELETE-ME-2
3.5G	TEST-DELETE-ME-3
3.5G	TEST-DELETE-ME-4
3.5G	TEST-DELETE-ME-5
18G	total
# sync
# rm -rf TEST-DELETE-ME*
# sync
# fstrim -av
/srv/vms: 72.1 GiB (77385101312 bytes) trimmed
/halde: 23.1 GiB (24801894400 bytes) trimmed
/home: 14.9 GiB (15980175360 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms 
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms  104G   32G   72G  31% /srv/vms
  LV         Discards   Data% 
  lv_srv_vms nopassdown 31.58 

Using dm-4.5 (66acf19a).

Comment 37 Mike Gerber 2015-12-10 05:51:42 UTC
Tested on a second system using dm-4.5; it also looks good there.

Comment 38 Mike Snitzer 2015-12-10 16:59:54 UTC
This fix will be sent to Linus for 4.4-rc5 inclusion tomorrow:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.4&id=993ceab91986e2e737ce9a3e23bebc8cce649240

Comment 39 Mike Gerber 2016-02-04 16:25:20 UTC
FYI

Using 4.3.4-200.fc22.x86_64 on F22:

* fstrim -av still does nothing
* fstrim /, fstrim /home frees space in the pool

Comment 40 Mike Gerber 2016-03-05 18:24:24 UTC
Fixed on F22 using 4.4.3-201.fc22.x86_64. Thanks!
