Description of problem:

I noticed that TRIM/discard is no longer functional on my thin volume:

# df -h /home
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg--waschsauger-home   69G   39G   30G  57% /home
# mount | grep /home
/dev/mapper/vg--waschsauger-home on /home type ext4 (rw,relatime,seclabel,stripe=16,data=ordered)
# fstrim -v /home
/home: 158.8 MiB (166445056 bytes) trimmed
# lvs -o lv_name,pool_lv,discards,data_percent
  LV          Pool   Discards Data%
  docker-pool        passdown 30.54
  home        pool00 passdown 90.52
  mir_radio   pool00 passdown 84.66
  mutationen  pool00 passdown 68.01
  pool00             passdown 92.21
  root        pool00 passdown 59.03
  srv_docker  pool00 passdown 21.89
  swap

Notice that /home is only 57% full, yet the freed space is not passed down to the thin volume (90.5% Data%) after the fstrim. fstrim -av also produces no output. Any ideas?

Version-Release number of selected component (if applicable):

# uname -r
4.2.6-200.fc22.x86_64
# rpm -q lvm2
lvm2-2.02.116-3.fc22.x86_64

How reproducible:

Use a thin pool on F22, delete files on a thin volume, call fstrim on the thin volume and compare data_percent in the lvs output vs. the df output. Also, fstrim -av does not process all filesystems as expected.

Actual results:

data_percent stays at 90%.

Expected results:

data_percent should go down to approximately 57%.

Additional info:

I tested this on another F22 system: same problem. On F23 it works as expected.
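Condensed, the reproduction looks like this (a sketch; the mount point and VG/LV names are the ones from this report, the fill size is arbitrary):

# Sketch of the reproduction; adjust mount point, VG/LV names and size.
FS=/home
LV=vg-waschsauger/home        # thin LV shown above

df -h "$FS"                   # note Use%
dd if=/dev/urandom of="$FS/fill" bs=1M count=2048 && sync
rm "$FS/fill" && sync
fstrim -v "$FS"               # prints the amount trimmed
lvs -o lv_name,pool_lv,discards,data_percent "$LV"   # Data% should drop toward Use%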
I dug a little deeper: fstrim -av ignores the filesystems because discard_granularity is zero, e.g.:

# cat /sys/devices/virtual/block/dm-*/queue/discard_granularity
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

Also, I updated the second F22 system to F23 and have the same problem there, using 4.2.6-300.fc23.x86_64.
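To attribute those zeros to specific devices, the dm name can be read from sysfs next to each value (a sketch using the standard dm sysfs attributes):

for q in /sys/devices/virtual/block/dm-*/queue/discard_granularity; do
    d=${q%/queue/discard_granularity}          # .../dm-N
    printf '%s (%s): %s\n' "${d##*/}" "$(cat "$d/dm/name")" "$(cat "$q")"
done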
I did some bisecting over the weekend, between 64291f7 v4.2 (bad) and b953c0d v4.1 (good):

34fbcf6257eb3f39a5b78a4f51b40f881b82033b is the first bad commit
commit 34fbcf6257eb3f39a5b78a4f51b40f881b82033b
Author: Joe Thornber <ejt>
Date:   Thu Apr 16 12:58:35 2015 +0100

    dm thin: range discard support

However, with the bad commits between and including this and 64291f7 v4.2, fstrim -av still does something (it processes the filesystems); the discards just aren't processed by the thin pool. Tested on the second system, the one I updated to F23.
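For reference, the bisect itself is ordinary git; each round means building and booting the candidate kernel and re-running the fstrim test (a sketch):

git bisect start
git bisect bad  64291f7     # v4.2: Data% no longer drops after fstrim
git bisect good b953c0d     # v4.1: fstrim reclaims pool space
# build + boot the suggested commit, run the fstrim test, then:
git bisect good             # or 'git bisect bad', until the first bad commit is found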
Yep - the issue has already been noticed.

As a 'hotfix', could you try a 4.3 kernel (e.g. from koji)?
Testing using 4.3.0 from koji:

# fstrim -av
# uname -r
4.3.0-1.fc24.x86_64

Still broken after an explicit fstrim -v /srv/vms:

# fstrim -v /srv/vms
/srv/vms: 77 GiB (82648084480 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.12

Haven't tested 4.4.0-rc yet.
Same problem using:

# uname -r
4.4.0-0.rc1.git3.1.fc24.x86_64
Hmm - are you sure 'trim' doesn't work? While I'm aware it shows '0' in places where it shouldn't, the trim itself seemed to work.

So please: 'consume a couple of GB' in the filesystem & sync, and watch the pool fullness. Then remove the file and run 'fstrim' - the pool usage should drop.

Is this working for you with the 4.3 kernel?
Yes, I'm sure :)

Exact sequence of commands using 4.3:

~ # uname -r
4.3.0-1.fc24.x86_64
~ # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.12
~ # cd /srv/vms/
vms # mkdir TEST-DELETE-ME
vms # cp -r centos4/ TEST-DELETE-ME/
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   28G   76G  27% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.16
vms # rm -rf TEST-DELETE-ME/
vms # sync
vms # fstrim -av
vms # fstrim -v /srv/vms
/srv/vms: 77 GiB (82648084480 bytes) trimmed
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.12
Looking at your results:

Before 'cp -r': pool usage 34.12%
After the copy: 34.16% (a 0.04% difference)
After 'rm -r' & fstrim: back to 34.12%

So IMHO TRIM does work in your case. (And I assume with 4.2 this will not happen - the value will not drop back down.)

What else do you expect here?
No, it should drop down to approximately 26% - as it does with 4.1.
To make it clearer:

vms # cp -r centos4/ debian-* TEST-DELETE-ME/
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   30G   74G  29% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 34.13
vms # mkdir TEST-DELETE-ME-2
vms # cp -r centos4/ debian-* TEST-DELETE-ME-2/
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   34G   70G  33% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 36.05
vms # rm -rf TEST-DELETE-ME*
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 36.05
vms # fstrim -v /srv/vms
/srv/vms: 7.5 GiB (8030244864 bytes) trimmed
vms # sync
vms # df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 35.86
I think you're missing the 'filesystem' layout here. If you start at 34.12%, you can't expect it to drop to 26%.

What 'fstrim' does is find 'empty' areas in the filesystem and send 'trim' requests there - the outcome then depends heavily on the alignment between filesystem meta/data and the thin-pool chunk size (block size). When it's finished, fstrim reports the total amount of trimmed space - but this does not need to match what the thin-pool has actually trimmed. The thin-pool does not support any 'partial' trimming (and you have not actually shown yet which chunk size your thin-pool is using - lvs -o+chunksize).

Example of the problem: if you use a 4M chunk size and the filesystem trims just about 3MB of a single chunk, the thin-pool is still not able to show any space reduction - simply because the chunk has not been 'fully' released, so it still appears as allocated.

So while 'fstrim' on a 4.2 kernel should report an ioctl error, on 4.3 it's working: the thin-pool does its work and 'drops' data usage.
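A worked version of that rounding: only chunks completely covered by a discard can be freed, so a trim that straddles chunk boundaries may reclaim nothing (shell arithmetic, hypothetical numbers):

chunk=$((4 * 1024 * 1024))    # 4 MiB pool chunk size
start=$((1 * 1024 * 1024))    # discard begins 1 MiB into a chunk
len=$((3 * 1024 * 1024))      # filesystem trims 3 MiB

first=$(( (start + chunk - 1) / chunk ))   # first chunk fully inside the range
last=$(( (start + len) / chunk ))          # first chunk past the range
echo "chunks reclaimed: $(( last > first ? last - first : 0 ))"
# -> 0: fstrim still reports 3 MiB trimmed, but the pool frees no chunk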
# lvs -o+chunksize
  LV             VG         Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk
  docker-pool    vg_jelinek twi-a-t---  19.41g                   4.15   0.08                           512.00k
  lv_fedora_root vg_jelinek -wi-ao----  12.00g                                                              0
  lv_halde       vg_jelinek Vwi-aotz-- 571.00g thinpool        95.96                                        0
  lv_home        vg_jelinek Vwi-aotz-- 120.00g thinpool        88.50                                        0
  lv_moody       vg_jelinek Vwi-a-tz--  20.00g thinpool        42.22                                        0
  lv_srv_vms     vg_jelinek Vwi-aotz-- 105.00g thinpool        35.86                                        0
  lv_swap        vg_jelinek -wi-ao----   8.00g                                                              0
  thinpool       vg_jelinek twi-aotz-- 861.05g                  81.32  62.91                            64.00k
So IMHO discard 'does' something - but it's nowhere near as effective as it's supposed to be - and it's most likely related to your bisected 'range support' commit. So I'm leaving this to Joe for further comments (I assume he is already working on this issue).
The thin LV started at 34% because TRIM already didn't work in an earlier try. Also, 4.1 *does* TRIM it down to approximately 26% (approximately the %full of the fs).

To test this further, I filled the fs with some data such that the LV is 50% full:

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   39G   65G  37% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 50.13

Then I deleted the data again and TRIMmed:

# rm -rf TEST-DELETE-ME*
# sync
# fstrim -v /srv/vms
/srv/vms: 17.8 GiB (19052351488 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 49.29

So it only dropped to 49.29%. Can this still be an alignment issue?
(In reply to Zdenek Kabelac from comment #3)
> Yep - the issue has already been noticed.
>
> As a 'hotfix', could you try a 4.3 kernel (e.g. from koji)?

The fix made it into v4.2 via this commit:
http://git.kernel.org/linus/aa0cd28d057fd4

(In reply to Mike Gerber from comment #1)
> I dug a little deeper: fstrim -av ignores the filesystems because
> discard_granularity is zero, e.g.:
>
> # cat /sys/devices/virtual/block/dm-*/queue/discard_granularity
> 0
...
> 0

Did you configure lvm to disable discards, e.g.?:
lvchange --discards ignore <pool>

There is no other way that discard_granularity would be 0 for the thin-pool and thin devices.
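For the record, the pool's discards mode is visible with plain lvs (the same column this report has been showing all along), and it is what lvchange --discards controls:

# Show the discards mode for every LV in the VG; 'ignore' on the pool
# is the only LVM setting that should zero out discard_granularity.
lvs -o lv_name,discards vg_jelinek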
(In reply to Zdenek Kabelac from comment #11)
> So while 'fstrim' on a 4.2 kernel should report an ioctl error, on 4.3
> it's working: the thin-pool does its work and 'drops' data usage.

What error do you think the ioctl should report? Partial thinp block discards won't _ever_ error.

The comments in this BZ speak to confusion about how discards work.
Mike Gerber, please provide the output of: lsblk -D
(In reply to Mike Snitzer from comment #15)
> Did you configure lvm to disable discards, e.g.?:
> lvchange --discards ignore <pool>
>
> There is no other way that discard_granularity would be 0 for the thin-pool
> and thin devices.

No, it's "nopassdown", as the underlying PV (a LUKS volume on an MD RAID1) does not TRIM. So, as posted before (see the "Discards" column):

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   27G   77G  26% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 49.29

# cat /sys/devices/virtual/block/dm-*/queue/discard_granularity
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

# uname -r
4.3.0-1.fc24.x86_64

# lsblk -D
NAME                                            DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                                                    0        0B       0B         0
├─sda1                                                 0        0B       0B         0
│ └─md0                                                0        0B       0B         0
├─sda2                                                 0        0B       0B         0
└─sda5                                                 0        0B       0B         0
  └─md3                                                0        0B       0B         0
    └─luks-438be9cc-ddef-48b2-bf01-88958f0d74db        0        0B       0B         0
      ├─vg_jelinek-lv_fedora_root                      0        0B       0B         0
      ├─vg_jelinek-lv_swap                             0        0B       0B         0
      ├─vg_jelinek-thinpool_tmeta                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0        0B       0B         0
      │   ├─vg_jelinek-lv_srv_vms                      0        0B       0B         0
      │   ├─vg_jelinek-lv_home                         0        0B       0B         0
      │   └─vg_jelinek-lv_halde                        0        0B       0B         0
      ├─vg_jelinek-thinpool_tdata                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0        0B       0B         0
      │   ├─vg_jelinek-lv_srv_vms                      0        0B       0B         0
      │   ├─vg_jelinek-lv_home                         0        0B       0B         0
      │   └─vg_jelinek-lv_halde                        0        0B       0B         0
      ├─vg_jelinek-docker--pool_tmeta                  0        0B       0B         0
      │ └─vg_jelinek-docker--pool                      0        0B       0B         0
      └─vg_jelinek-docker--pool_tdata                  0        0B       0B         0
        └─vg_jelinek-docker--pool                      0        0B       0B         0
sdb                                                    0        0B       0B         0
├─sdb1                                                 0        0B       0B         0
│ └─md0                                                0        0B       0B         0
├─sdb2                                                 0        0B       0B         0
└─sdb5                                                 0        0B       0B         0
  └─md3                                                0        0B       0B         0
    └─luks-438be9cc-ddef-48b2-bf01-88958f0d74db        0        0B       0B         0
      ├─vg_jelinek-lv_fedora_root                      0        0B       0B         0
      ├─vg_jelinek-lv_swap                             0        0B       0B         0
      ├─vg_jelinek-thinpool_tmeta                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0        0B       0B         0
      │   ├─vg_jelinek-lv_srv_vms                      0        0B       0B         0
      │   ├─vg_jelinek-lv_home                         0        0B       0B         0
      │   └─vg_jelinek-lv_halde                        0        0B       0B         0
      ├─vg_jelinek-thinpool_tdata                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0        0B       0B         0
      │   ├─vg_jelinek-lv_srv_vms                      0        0B       0B         0
      │   ├─vg_jelinek-lv_home                         0        0B       0B         0
      │   └─vg_jelinek-lv_halde                        0        0B       0B         0
      ├─vg_jelinek-docker--pool_tmeta                  0        0B       0B         0
      │ └─vg_jelinek-docker--pool                      0        0B       0B         0
      └─vg_jelinek-docker--pool_tdata                  0        0B       0B         0
        └─vg_jelinek-docker--pool                      0        0B       0B         0
(In reply to Mike Snitzer from comment #16)
> The comments in this BZ speak to confusion about how discards work.

I am not sure I am confused: an fstrim on the same FS using 4.1.0 *does* reduce the thin "Data%" to approximately the "Use%" (df), just as I expect. An fstrim using a kernel at or after the bisected commit (mentioned above) does not. Unless the expected behaviour changed, it's a bug.
comment#18 shows that discards simply aren't enabled on _any_ of your devices. That obviously isn't what 'no_discard_passdown' should do.

Please provide the output of:
dmsetup table

(In reply to Mike Gerber from comment #19)
> (In reply to Mike Snitzer from comment #16)
> > The comments in this BZ speak to confusion about how discards work.
>
> I am not sure I am confused: an fstrim on the same FS using 4.1.0 *does*
> reduce the thin "Data%" to approximately the "Use%" (df), just as I expect.
> An fstrim using a kernel at or after the bisected commit (mentioned above)
> does not. Unless the expected behaviour changed, it's a bug.

I was saying Zdenek was confused in places. I agree we need to get to the bottom of your issue. It is odd that fstrim says it discarded blocks, yet the underlying device is _not_ advertising that it actually supports discards.
(In reply to Mike Snitzer from comment #20)
> comment#18 shows that discards simply aren't enabled on _any_ of your
> devices. That obviously isn't what 'no_discard_passdown' should do.
>
> Please provide the output of:
> dmsetup table

# dmsetup table
vg_jelinek-lv_fedora_root: 0 25165824 linear 253:0 1827742080
vg_jelinek-lv_srv_vms: 0 220200960 thin 253:7 3
vg_jelinek-lv_home: 0 251658240 thin 253:7 4
vg_jelinek-docker--pool: 0 40706048 thin-pool 253:13 253:14 1024 0 2 skip_block_zeroing no_discard_passdown
vg_jelinek-thinpool: 0 1805762560 linear 253:7 0
vg_jelinek-lv_swap: 0 16777216 linear 253:0 1652556160
vg_jelinek-docker--pool_tdata: 0 40706048 linear 253:0 1505755520
vg_jelinek-lv_halde: 0 1197473792 thin 253:7 5
vg_jelinek-docker--pool_tmeta: 0 1957888 linear 253:0 1806770560
vg_jelinek-lv_moody: 0 41943040 thin 253:7 2
luks-438be9cc-ddef-48b2-bf01-88958f0d74db: 0 1952492696 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 9:3 2056
vg_jelinek-thinpool-tpool: 0 1805762560 thin-pool 253:3 253:4 128 0 1 no_discard_passdown
vg_jelinek-thinpool_tdata: 0 104857600 linear 253:0 1547698560
vg_jelinek-thinpool_tdata: 104857600 40894464 linear 253:0 1048960
vg_jelinek-thinpool_tdata: 145752064 16818176 linear 253:0 1935671680
vg_jelinek-thinpool_tdata: 162570240 209715200 linear 253:0 1296040320
vg_jelinek-thinpool_tdata: 372285440 41943040 linear 253:0 1893728640
vg_jelinek-thinpool_tdata: 414228480 16777216 linear 253:0 230687104
vg_jelinek-thinpool_tdata: 431005696 188743680 linear 253:0 41943424
vg_jelinek-thinpool_tdata: 619749376 1048576000 linear 253:0 247464320
vg_jelinek-thinpool_tdata: 1668325376 137437184 linear 253:0 1669333376
vg_jelinek-thinpool_tmeta: 0 1048576 linear 253:0 384

> > I am not sure I am confused: an fstrim on the same FS using 4.1.0 *does*
> > reduce the thin "Data%" to approximately the "Use%" (df), just as I expect.
> > An fstrim using a kernel at or after the bisected commit (mentioned above)
> > does not. Unless the expected behaviour changed, it's a bug.
>
> I was saying Zdenek was confused in places.

Ok, just making sure :)
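For readers decoding the thin-pool line above (per the kernel's thin-provisioning documentation, the target arguments are: metadata dev, data dev, data block size in sectors, low water mark, number of feature args, features):

# vg_jelinek-thinpool-tpool: 0 1805762560 thin-pool 253:3 253:4 128 0 1 no_discard_passdown
#   0 1805762560            logical start / length in 512-byte sectors
#   thin-pool 253:3 253:4   target, metadata dev (tmeta), data dev (tdata)
#   128                     data block (chunk) size in sectors = 64 KiB
#   0                       low water mark (in blocks)
#   1 no_discard_passdown   one feature argument: don't pass discards down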
Mike, please try this fix:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.4&id=313b68157b94366d056b8704f2b4c3e1d0d8fe9e
(In reply to Mike Snitzer from comment #22)
> Mike, please try this fix:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.4&id=313b68157b94366d056b8704f2b4c3e1d0d8fe9e

Strike that, I have to revise this patch... it is close but not quite right. Will have a corrected fix shortly.
Side note: your commit message also explains why the two systems with LUKS on something without discard capabilities are "bad" and the third system with LUKS on an SSD is "good".
Please try this:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.4&id=0fcb04d59351f790efb8da18edefd6ab4d9bbf3b

Thanks!
(In reply to Mike Snitzer from comment #25)
> Please try this:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.4&id=0fcb04d59351f790efb8da18edefd6ab4d9bbf3b

It looks a little bit better: fstrim -av does work on the fs again, but the thin LV is not trimmed down as expected (Use% 30% vs. Data% 51.02%). (The first fstrim -av on the fs did trim >70G, not shown here.) I'll test with more data, i.e. filling above the 51.02%, later.

# uname -r
4.4.0-rc1-DEBUGTHIN+
# fstrim -av
/srv/vms: 0 B (0 bytes) trimmed
/home: 0 B (0 bytes) trimmed
/halde: 0 B (0 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   31G   73G  30% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 51.02

# lsblk -D
NAME                                            DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                                                    0        0B       0B         0
├─sda1                                                 0        0B       0B         0
│ └─md0                                                0        0B       0B         0
├─sda2                                                 0        0B       0B         0
└─sda5                                                 0        0B       0B         0
  └─md3                                                0        0B       0B         0
    └─luks-438be9cc-ddef-48b2-bf01-88958f0d74db        0        0B       0B         0
      ├─vg_jelinek-lv_fedora_root                      0        0B       0B         0
      ├─vg_jelinek-lv_swap                             0        0B       0B         0
      ├─vg_jelinek-thinpool_tmeta                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0       64K      16G         0
      │   ├─vg_jelinek-lv_srv_vms                      0       64K      16G         0
      │   ├─vg_jelinek-lv_home                         0       64K      16G         0
      │   └─vg_jelinek-lv_halde                        0       64K      16G         0
      ├─vg_jelinek-thinpool_tdata                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0       64K      16G         0
      │   ├─vg_jelinek-lv_srv_vms                      0       64K      16G         0
      │   ├─vg_jelinek-lv_home                         0       64K      16G         0
      │   └─vg_jelinek-lv_halde                        0       64K      16G         0
      ├─vg_jelinek-docker--pool_tmeta                  0        0B       0B         0
      │ └─vg_jelinek-docker--pool                      0        0B       0B         0
      └─vg_jelinek-docker--pool_tdata                  0        0B       0B         0
        └─vg_jelinek-docker--pool                      0        0B       0B         0
sdb                                                    0        0B       0B         0
├─sdb1                                                 0        0B       0B         0
│ └─md0                                                0        0B       0B         0
├─sdb2                                                 0        0B       0B         0
└─sdb5                                                 0        0B       0B         0
  └─md3                                                0        0B       0B         0
    └─luks-438be9cc-ddef-48b2-bf01-88958f0d74db        0        0B       0B         0
      ├─vg_jelinek-lv_fedora_root                      0        0B       0B         0
      ├─vg_jelinek-lv_swap                             0        0B       0B         0
      ├─vg_jelinek-thinpool_tmeta                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0       64K      16G         0
      │   ├─vg_jelinek-lv_srv_vms                      0       64K      16G         0
      │   ├─vg_jelinek-lv_home                         0       64K      16G         0
      │   └─vg_jelinek-lv_halde                        0       64K      16G         0
      ├─vg_jelinek-thinpool_tdata                      0        0B       0B         0
      │ └─vg_jelinek-thinpool-tpool                    0        0B       0B         0
      │   ├─vg_jelinek-thinpool                        0        0B       0B         0
      │   ├─vg_jelinek-lv_moody                        0       64K      16G         0
      │   ├─vg_jelinek-lv_srv_vms                      0       64K      16G         0
      │   ├─vg_jelinek-lv_home                         0       64K      16G         0
      │   └─vg_jelinek-lv_halde                        0       64K      16G         0
      ├─vg_jelinek-docker--pool_tmeta                  0        0B       0B         0
      │ └─vg_jelinek-docker--pool                      0        0B       0B         0
      └─vg_jelinek-docker--pool_tdata                  0        0B       0B         0
        └─vg_jelinek-docker--pool                      0        0B       0B         0

# lvs -o+chunksize,discards
  LV             VG         Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert Chunk   Discards
  docker-pool    vg_jelinek twi-a-t---  19.41g                   4.15   0.08                           512.00k nopassdown
  lv_fedora_root vg_jelinek -wi-ao----  12.00g                                                               0
  lv_halde       vg_jelinek Vwi-aotz-- 571.00g thinpool        95.96                                         0 nopassdown
  lv_home        vg_jelinek Vwi-aotz-- 120.00g thinpool        88.52                                         0 nopassdown
  lv_moody       vg_jelinek Vwi-a-tz--  20.00g thinpool        42.22                                         0 nopassdown
  lv_srv_vms     vg_jelinek Vwi-aotz-- 105.00g thinpool        51.02                                         0 nopassdown
  lv_swap        vg_jelinek -wi-ao----   8.00g                                                               0
  thinpool       vg_jelinek twi-aotz-- 861.05g                  83.17  64.44                            64.00k nopassdown
lrwxrwxrwx. 1 root root 8 Nov 23 23:36 vg_jelinek-lv_srv_vms -> ../dm-10

==> /sys/devices/virtual/block/dm-10/queue/discard_granularity <==
65536
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   48G   56G  47% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 54.70
# du -sh TEST-DELETE-ME*
3.5G TEST-DELETE-ME
3.5G TEST-DELETE-ME-2
3.5G TEST-DELETE-ME-3
3.5G TEST-DELETE-ME-4
3.5G TEST-DELETE-ME-5
# rm -rf TEST-DELETE-ME*
# sync
# fstrim -av
/srv/vms: 19.6 GiB (21076836352 bytes) trimmed
/home: 79.8 MiB (83632128 bytes) trimmed
/halde: 0 B (0 bytes) trimmed
# sync
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   31G   73G  30% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 53.47
# uname -r
4.4.0-rc1-DEBUGTHIN+
To reproduce (see the sketch after this list):

* Minimal install of F23 Server using netinstall, in a VM
* Manual partitioning: put root on thin provisioning
* Note df -h and lvs
* Fill the / FS with some data
* Delete the data again
* fstrim -av (does nothing)
* fstrim -v / (does something)
* Look at df -h and lvs
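As commands, the in-VM check is roughly (a sketch; the fill size is arbitrary and the installer-created VG name may differ):

df -h / ; lvs                   # note Use% and Data%
dd if=/dev/urandom of=/root/fill bs=1M count=1024 && sync
rm /root/fill && sync
fstrim -av                      # buggy kernel: no output at all
fstrim -v /                     # still reports bytes trimmed...
df -h / ; lvs                   # ...but Data% does not drop on the buggy kernel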
We already have pretty extensive testing in place with device-mapper-test-suite. (We need to make sure we add explicit coverage for an underlying data device that doesn't support discards, though...)

But in the following test I've made sure to use a data device that does _not_ support discards (and 'no_discard_passdown' is used to match the config used by the reporter of this BZ).

I've modified the device-mapper-test-suite's /discard_with_fstrim_passdown_false/ test to be (ruby code):

def discard_with_fstrim_passdown(passdown)
  dir = "./mnt1"
  @size = gig(4)
  file_size = @size / 20
  files = (0..9).reduce([]) {|memo, obj| memo << "file_#{obj}"}

  with_standard_pool(@size, :error_if_no_space => true, :discard_passdown => passdown) do |pool|
    with_new_thin(pool, @size * 2, 0) do |thin|
      ProcessControl.run("lsblk -D #{thin}") # record the discard limits

      fs = FS::file_system(:xfs, thin)
      fs.format
      fs.with_mount(dir, :discard => false) do
        s = PoolStatus.new(pool)
        # record "before" data usage in log
        data_before = s.used_data_blocks

        Dir.chdir(dir) do
          files.each do |f|
            ProcessControl.run("dd if=/dev/zero of=#{f} bs=1M count=#{file_size / meg(1)} oflag=direct")
          end

          PoolStatus.new(pool) # record "during" data usage in log

          files.each do |f|
            ProcessControl.run("rm #{f}")
          end
        end

        ProcessControl.run("fstrim -v #{dir}")

        s = PoolStatus.new(pool)
        # verify that all data blocks were recovered via fstrim (assumes FS alignment allows this)
        s.used_data_blocks.should == data_before
        s.options[:mode].should == :read_write
      end
    end
  end
end

I'll attach test logs from running this test against a 4.4-rc1 kernel with the patch from comment#25 applied. The logs show that fstrim against XFS works, whereas fstrim against ext4 does _not_. (We may need to clone this BZ to chase why ext4 isn't working like we'd hope... could be an alignment issue where every allocated block somehow gets some ext4 metadata sprinkled in it? cc'ing Eric Sandeen.)
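(For anyone wanting to run it: device-mapper-test-suite tests are driven by its dmtest wrapper, something like the line below - the exact suite and test-name pattern here are assumptions on my part, check the suite's README.)

dmtest run --suite thin-provisioning -n /discard_with_fstrim/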
Created attachment 1098331 [details] passing log from /discard_with_fstrim_passdown_false/ with XFS
Created attachment 1098332 [details] failing log from /discard_with_fstrim_passdown_false/ with ext4
I am also using ext4.
I do not know much about ext4 internals, but here is evidence against the "metadata sprinkling" theory: I booted 4.0.4 from F22 and ran fstrim against the same FS as above -- Data% is now at approximately the Use% of the FS. Note the change in the lvs output.

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   31G   73G  30% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 53.51
# fstrim -av
/srv/vms: 72.3 GiB (77631696896 bytes) trimmed
/halde: 23.1 GiB (24801894400 bytes) trimmed
/home: 13.6 GiB (14594449408 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   31G   73G  30% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 31.26
# uname -r
4.0.4-301.fc22.x86_64
Joe has fixed various issues with the range discard support in thinp. But even before these fixes it became clear that your ext4 test was missing an important step: unlike with XFS, you _must_ "sync" after deleting the files (before running fstrim).

But please feel free to try this kernel:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.5

Specifically these patches:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.5&id=993ceab91986e2e737ce9a3e23bebc8cce649240
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.5&id=a3bb7274526aa3146adbcdc9103cb0e67584b1be

(the first patch being the most critical)
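Distilled, the corrected ext4 test sequence is (names from this report; the sync between rm and fstrim is the step that was missing):

rm -rf TEST-DELETE-ME*
sync                    # ext4: deallocations must be committed before fstrim
fstrim -v /srv/vms
lvs -o lv_name,data_percent vg_jelinek/lv_srv_vms   # Data% should now drop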
Looks good!

# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   49G   55G  47% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 52.03
# du -sch TEST-DELETE-ME*
3.5G TEST-DELETE-ME
3.5G TEST-DELETE-ME-2
3.5G TEST-DELETE-ME-3
3.5G TEST-DELETE-ME-4
3.5G TEST-DELETE-ME-5
18G  total
# sync
# rm -rf TEST-DELETE-ME*
# sync
# fstrim -av
/srv/vms: 72.1 GiB (77385101312 bytes) trimmed
/halde: 23.1 GiB (24801894400 bytes) trimmed
/home: 14.9 GiB (15980175360 bytes) trimmed
# df -h /srv/vms/; lvs -o lv_name,discards,data_percent /dev/vg_jelinek/lv_srv_vms
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_jelinek-lv_srv_vms 104G   32G   72G  31% /srv/vms
  LV         Discards   Data%
  lv_srv_vms nopassdown 31.58

Using dm-4.5 (66acf19a).
Tested on a second system using dm-4.5; it also looks good there.
This fix will be sent to Linus for 4.4-rc5 inclusion tomorrow:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.4&id=993ceab91986e2e737ce9a3e23bebc8cce649240
FYI, using 4.3.4-200.fc22.x86_64 on F22:

* fstrim -av still does nothing
* fstrim /, fstrim /home frees space in the pool
Fixed on F22 using 4.4.3-201.fc22.x86_64. Thanks!