Bug 998861 - [lvm-thinp] discards passdown does not work as promised
Summary: [lvm-thinp] discards passdown does not work as promised
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1003484
 
Reported: 2013-08-20 09:04 UTC by Xiaowei Li
Modified: 2023-03-08 07:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1003484
Environment:
Last Closed: 2015-04-10 23:42:10 UTC
Target Upstream Version:
Embargoed:


Attachments
testlog.txt (7.21 KB, text/plain)
2013-08-20 09:07 UTC, Xiaowei Li
blkparse of dm-3 (197.18 KB, text/plain)
2013-08-20 13:25 UTC, Zdenek Kabelac
Traces for underlying virtual iscsi debug device (241.30 KB, text/plain)
2013-08-20 13:27 UTC, Zdenek Kabelac

Description Xiaowei Li 2013-08-20 09:04:15 UTC
Description of problem:


Version-Release number of selected component (if applicable):
lvm2-2.02.99-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Use scsi_debug to create a disk that supports discard
# modprobe scsi-debug dev_size_mb=256 lbpu=1 lbpws10=1

2. Create a VG, thin pool, and thin LV on the device, then create a filesystem on the thin LV
# lvcreate -V200m -l99%FREE -T tsvg/pool -n lv1 --discards passdown

3. Create files on the filesystem, then remove them
4. Use fstrim to issue discards
5. Check the data percentage of the thin pool and the thin LV

Actual results:
The data percentage does not change after issuing fstrim. Discards do not work as promised.


Expected results:


Additional info:

Comment 1 Xiaowei Li 2013-08-20 09:07:47 UTC
Created attachment 788388 [details]
testlog.txt

Comment 2 Xiaowei Li 2013-08-20 09:08:17 UTC
please refer to testlog.txt for details.

Comment 4 Zdenek Kabelac 2013-08-20 12:00:19 UTC
Hmm I've reproduced this with upstream kernel 3.10 as well.

Interestingly, it sometimes helps to deactivate and reactivate the device and run fstrim again after mounting. I suspect a kernel issue here.

Comment 5 Zdenek Kabelac 2013-08-20 13:25:29 UTC
Created attachment 788500 [details]
blkparse  of  dm-3

Traces from the moment when discard/trim does not work even though it should.
(The FS on the device had just been mounted, and the fstrim run reported this:)

# fstrim -v /mnt
/mnt: 91.3 MiB (95748096 bytes) trimmed




# dmsetup info -c
Name             Maj Min Stat Open Targ Event  UUID                                                                      
tvg-lv1          253   3 L--w    1    1      0 LVM-jLl3GpQlQ9bkcw199FQftRK5Y8ZALARdQfGaAfRb4Mbr8tPh2E92zYYFakgy6wCH      
tvg-pool-tpool   253   2 L--w    1    1      0 LVM-jLl3GpQlQ9bkcw199FQftRK5Y8ZALARderPCxP9HM7FNLn1uyqH4ZLAGbIMXzNDJ-tpool
tvg-pool_tdata   253   1 L--w    1    1      0 LVM-jLl3GpQlQ9bkcw199FQftRK5Y8ZALARd6j3CKjTUj3avruy5xN1oY0mg6m2CbYyE      
tvg-pool_tmeta   253   0 L--w    1    1      0 LVM-jLl3GpQlQ9bkcw199FQftRK5Y8ZALARduuOL1keIbQLYHGsKgaviC1w0dAj6h71S      


# dmsetup table
tvg-lv1: 0 204800 thin 253:2 1
tvg-pool-tpool: 0 106496 thin-pool 253:0 253:1 128 0 0 
tvg-pool_tdata: 0 106496 linear 8:16 10240
tvg-pool_tmeta: 0 8192 linear 8:16 116736

# dmsetup status
tvg-lv1: 0 204800 thin 57216 147583
tvg-pool-tpool: 0 106496 thin-pool 1 13/1024 447/832 - rw discard_passdown
tvg-pool_tdata: 0 106496 linear 
tvg-pool_tmeta: 0 8192 linear
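
The status lines above can be cross-checked by hand. The following is a minimal Python sketch, not part of any LVM tool. It assumes the status layout documented for the device-mapper thin targets (thin: mapped sectors, then highest mapped sector; thin-pool: transaction id, used/total metadata blocks, used/total data blocks) and the 128-sector data block size shown in the `dmsetup table` output. The function names are illustrative.

```python
# Two status lines copied from the `dmsetup status` output above.
THIN = "tvg-lv1: 0 204800 thin 57216 147583"
POOL = "tvg-pool-tpool: 0 106496 thin-pool 1 13/1024 447/832 - rw discard_passdown"

BLOCK_SECTORS = 128  # thin-pool data block size from `dmsetup table` (64 KiB)


def parse_status(line):
    """Split one `dmsetup status` line into (name, length, target, args)."""
    name, rest = line.split(":", 1)
    fields = rest.split()
    # fields[0] is the start sector, fields[1] the length, fields[2] the target
    return name, int(fields[1]), fields[2], fields[3:]


def thin_mapped_percent(line):
    """Percentage of the thin device's sectors that are currently mapped."""
    _, length, target, args = parse_status(line)
    if target != "thin":
        raise ValueError("not a thin target line")
    return 100.0 * int(args[0]) / length


def pool_data_percent(line):
    """Percentage of the pool's data blocks that are currently in use."""
    _, _, target, args = parse_status(line)
    if target != "thin-pool":
        raise ValueError("not a thin-pool target line")
    used, total = (int(x) for x in args[2].split("/"))
    return 100.0 * used / total


print(f"{thin_mapped_percent(THIN):.2f}")  # 27.94
print(f"{pool_data_percent(POOL):.2f}")    # 53.73
print(57216 // BLOCK_SECTORS)              # 447
```

The 57216 mapped sectors of the thin device correspond to exactly 447 blocks of 128 sectors, the same number the pool reports as used data blocks. Both sides therefore agree that the blocks are still mapped after the fstrim run.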

Comment 6 Zdenek Kabelac 2013-08-20 13:27:07 UTC
Created attachment 788502 [details]
Traces for underlying virtual iscsi debug device

Trace taken at the same time for the SCSI device /dev/sdb, which is used as the PV for my tests (and holds both the data and metadata devices).

Comment 7 Zdenek Kabelac 2013-08-20 13:36:32 UTC
lvs for devices after fstrim:

Non working case:

  LV    VG   Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  lv1   tvg  Vwi-aotz-- 100,00m pool         27,94                          
  pool  tvg  twi---tz--  52,00m              53,73                          


Working case:

  lv1   tvg  Vwi-aotz-- 100,00m pool          8,00
  pool  tvg  twi---tz--  52,00m              15,38

Comment 8 Mike Snitzer 2013-08-20 13:54:33 UTC
discards are flowing through the thin device (trace from comment#5), to the thin-pool, and down to the underlying device (trace from comment#6).

So if no free space is accumulating in the thin device (and backing thin-pool) then this may be a bio-prison issue, or other accounting bug.  Could be the entire block isn't considered discarded so no blocks get released.

Comment 9 Mike Snitzer 2013-08-20 14:04:24 UTC
I'm wondering if this is an alignment issue, e.g.: the fs _always_ has something in a thinp block.  As such thinp cannot discard the block.

The thin-pool blocksize is 64K.  Which FS is being used?

Does behavior change if you switch from using ext4 to xfs (or vice versa)?

Comment 10 Xiaowei Li 2013-08-21 03:15:47 UTC
(In reply to Mike Snitzer from comment #9)
> I'm wondering if this is an alignment issue, e.g.: the fs _always_ has
> something in a thinp block.  As such thinp cannot discard the block.
> 
> The thin-pool blocksize is 64K.  Which FS is being used?
> 
> Does behavior change if you switch from using ext4 to xfs (or vice versa)?

ext4 was used and the same behavior when using xfs.

Comment 11 Mike Snitzer 2013-08-21 15:19:45 UTC
(In reply to Xiaowei Li from comment #10)
> (In reply to Mike Snitzer from comment #9)
> > I'm wondering if this is an alignment issue, e.g.: the fs _always_ has
> > something in a thinp block.  As such thinp cannot discard the block.
> > 
> > The thin-pool blocksize is 64K.  Which FS is being used?
> > 
> > Does behavior change if you switch from using ext4 to xfs (or vice versa)?
> 
> ext4 was used and the same behavior when using xfs.

OK, but my point stands: ext4 and xfs could still be using portions of the block for filesystem metadata.  Thinp will only pass down discards to the underlying storage if the entire thinp block is no longer used at all.
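
To illustrate the alignment point, here is a minimal Python sketch of the arithmetic only; it is not the dm-thin implementation. It assumes the 64K block size mentioned in comment #9 (128 sectors of 512 bytes), and the helper name `fully_covered_blocks` is made up for this example. Given a discard range in sectors, it returns the thinp blocks that lie entirely inside that range.

```python
BLOCK_SECTORS = 128  # 64 KiB thin-pool block, in 512-byte sectors


def fully_covered_blocks(start_sector, nr_sectors, block_sectors=BLOCK_SECTORS):
    """Return the block indices that lie entirely inside the discard range.

    A block that is only partly covered still holds live data, so it is
    left out of the result.
    """
    end_sector = start_sector + nr_sectors
    first = -(-start_sector // block_sectors)  # round start up to a block boundary
    last = end_sector // block_sectors         # round end down (exclusive)
    return list(range(first, max(first, last)))


# An aligned discard of two blocks covers both of them.
print(fully_covered_blocks(0, 256))    # [0, 1]
# A block-sized discard that straddles a boundary covers no whole block.
print(fully_covered_blocks(64, 128))   # []
# A larger unaligned discard covers only the blocks in the middle.
print(fully_covered_blocks(100, 300))  # [1, 2]
```

The second case is the situation described above: the filesystem discards a full 64K of space, but because part of each touched block is still in use, no block can be released.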

There could be a thinp bug lurking here.

But it'd be interesting to see what happens if more care is taken to inform the filesystem about the underlying device's geometry; does the filesystem then allow for cleaner separation of filesystem data and metadata areas?

(Cc'ing Eric, Dave and Carlos.)

Comment 12 Eric Sandeen 2013-08-21 22:38:02 UTC
ext4 might be a little "better" about constraining some metadata to certain areas, just as a function of its fixed inode table space, vs. xfs's dynamically allocated inodes.  Things would have to be fairly carefully sized accordingly; I have to remind myself how stripe units affect block group sizes.


There are also tools to look at all actual free space in the fs.

For ext4, 'dumpe2fs $DEVICE' will show (pretty verbosely) every free block.

For xfs, 'xfs_db -c "freesp -d" $DEVICE' will show free ranges as well.

-Eric

Comment 13 Mike Snitzer 2013-09-03 13:35:15 UTC
I really don't think there is a bug in the dm-thin kernel code (or in lvm2).  I think the FS is still using the thinp blocks, so thinp cannot discard them.

Please collect:
1) a thin_dump before issuing the fstrim.
2) a thin_dump after issuing the fstrim.

It would also be useful to collect a blktrace of:
1) the thinp device to verify that the discards are being processed properly
2) the thin-pool's data volume to collect/see discards get passed down
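
One way to compare the two dumps is to count the mapped blocks per thin device before and after the fstrim. The Python sketch below assumes the XML layout emitted by thin_dump (`device` elements containing `single_mapping` and `range_mapping` children, the latter with a `length` attribute). The sample documents are invented for illustration and are not taken from this bug.

```python
import xml.etree.ElementTree as ET

# Invented sample dumps; real ones would come from thin_dump on the metadata device.
BEFORE = """<superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="832">
  <device dev_id="1" mapped_blocks="6" transaction="0" creation_time="0" snap_time="0">
    <range_mapping origin_begin="0" data_begin="0" length="5" time="0"/>
    <single_mapping origin_block="9" data_block="5" time="0"/>
  </device>
</superblock>"""

AFTER = """<superblock uuid="" time="0" transaction="1" data_block_size="128" nr_data_blocks="832">
  <device dev_id="1" mapped_blocks="3" transaction="0" creation_time="0" snap_time="0">
    <range_mapping origin_begin="0" data_begin="0" length="2" time="0"/>
    <single_mapping origin_block="9" data_block="5" time="0"/>
  </device>
</superblock>"""


def mapped_blocks_per_device(dump_xml):
    """Count the mapped blocks of every thin device in one dump."""
    counts = {}
    for device in ET.fromstring(dump_xml).iter("device"):
        total = 0
        for mapping in device:
            if mapping.tag == "single_mapping":
                total += 1
            elif mapping.tag == "range_mapping":
                total += int(mapping.get("length"))
        counts[device.get("dev_id")] = total
    return counts


def released_blocks(before_xml, after_xml):
    """Blocks unmapped between the two dumps, keyed by thin device id."""
    before = mapped_blocks_per_device(before_xml)
    after = mapped_blocks_per_device(after_xml)
    return {dev: count - after.get(dev, 0) for dev, count in before.items()}


print(released_blocks(BEFORE, AFTER))  # {'1': 3}
```

If the result is zero for the thin device even though fstrim reports trimmed bytes, the discards did not unmap any whole thinp block.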

Comment 18 Mike Snitzer 2013-11-19 17:55:07 UTC
comment#14 (fstrim doesn't work with XFS) vs comment#17 (fstrim works with btrfs) is really a question for the XFS developers.

Could it be that XFS needs an online discard flag set via mount, even though fstrim is being used for issuing async trims, whereas btrfs doesn't?

esandeen, can you help and/or enlist the help of either Dave or Carlos?

Comment 19 Eric Sandeen 2013-11-19 19:57:01 UTC
No, xfs shouldn't need -o discard for fstrim to function.  Not sure what's going on, but we'll look.

Comment 21 RHEL Program Management 2014-03-22 06:41:20 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

