Bug 1628378 - dm-cache does not pass discard I/Os to origin storage device
Summary: dm-cache does not pass discard I/Os to origin storage device
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mike Snitzer
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1577173
Reported: 2018-09-12 20:45 UTC by David Jeffery
Modified: 2019-08-06 12:11 UTC (History)

Fixed In Version: kernel-3.10.0-1017.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-06 12:10:21 UTC
Target Upstream Version:


Links
System                             ID              Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  3613881         None      None    None     2018-09-14 15:40:29 UTC
Red Hat Product Errata             RHSA-2019:2029  None      None    None     2019-08-06 12:11:01 UTC

Description David Jeffery 2018-09-12 20:45:36 UTC
A customer had a dm-cache device set up as a cache for a thin-provisioned rbd device.  The dm-cache device held a mounted xfs filesystem, and the customer ran fstrim periodically to discard the filesystem's unused blocks.  However, the customer noticed that the space used by the rbd device never went down, regardless of how much data was removed from the xfs filesystem or how fstrim was run.

On a local test system, it was also seen that dm-cache does not pass discards through the dm-cache device.  blktrace of the dm-cache device would show discard I/Os sent by xfs to the dm-cache device when running fstrim.  But blktrace attached to the origin device underlying the dm-cache device would never see any discards.

In its process_discard_bio function, dm-cache examines a discard bio and records the block regions it discards in dm-cache's metadata.  It then completes the bio without sending any discard bio to its origin or cache-providing device.  Consequently, the origin device of a dm-cache device always behaves as if it were thick-provisioned: no discard ever reaches it, so it can never release blocks that are no longer used.


Version-Release number of selected component (if applicable):
kernel-3.10.0-862.el7

How reproducible:
Reproducible with discard-supporting storage.

Steps to Reproduce:
1.  Create dm-cache device on top of storage which supports discard.
2.  mkfs the dm-cache device as xfs and mount the xfs filesystem
3.  Have blktrace listen to the origin storage device.
4.  Run fstrim against the xfs filesystem.
5.  The blktrace output will show no discard I/Os sent to the origin device even as fstrim reports sending discards
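
The check in step 5 can be scripted.  The following is a minimal sketch, assuming blkparse's default text output (action in the sixth whitespace-separated field, RWBS flags in the seventh).  The function name count_queued_discards is hypothetical and not part of blktrace:

```shell
# Count queued discard requests in blkparse text output read from stdin.
# Assumes the default blkparse line layout:
#   maj,min cpu seq timestamp pid action rwbs sector + nsectors [process]
# count_queued_discards is a hypothetical helper name, not a blktrace tool.
count_queued_discards() {
    # Action "Q" = queued; a "D" in the RWBS field marks a discard.
    awk '$6 == "Q" && $7 ~ /D/ { n++ } END { print n + 0 }'
}
```

Piping the blkparse output of the origin device's trace through count_queued_discards should print 0 on an affected kernel, while the same check on the dm-cache device's trace prints a non-zero count.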

Actual results:
Discard does not occur on thin storage under dm-cache device

Expected results:
Discard should release unused storage even with dm-cache in use.

Additional info:
Upstream has the same behavior.

Comment 2 John Pittman 2018-09-13 13:29:13 UTC
Verifying on virtual system.  I/O not being passed down.  Reproduction shown below:

  LV              VG  Attr       LSize   Pool   Origin         Data%  Meta%  Move Log Cpy%Sync Convert
  [data]          vg1 Cwi---C--- 128.00m                       13.57  0.05            0.00            
  [data_cdata]    vg1 Cwi-ao---- 128.00m                                                              
  [data_cmeta]    vg1 ewi-ao---- 128.00m                                                              
  [lvol0_pmspare] vg1 ewi------- 128.00m                                                              
  origin          vg1 Cwi-aoC--- 128.00m [data] [origin_corig] 13.57  0.05            0.00            
  [origin_corig]  vg1 owi-aoC--- 128.00m                                 

/dev/mapper/vg1-origin on /root/cache type ext4 (rw,relatime,stripe=64,data=ordered)

[root@localhost ~]# dd if=/dev/zero of=cache/testfile bs=1M count=50 conv=fsync
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.150666 s, 348 MB/s

[root@localhost ~]# rm cache/testfile 
rm: remove regular file ‘cache/testfile’? y

[root@localhost ~]# blktrace -d /dev/mapper/vg1-origin &
[1] 1817
[root@localhost ~]# blktrace -d /dev/sdf &
[1] 1857

[root@localhost ~]# fstrim -v cache/
cache/: 55.8 MiB (58456064 bytes) trimmed

[root@localhost ~]# kill $(pidof blktrace)

[root@localhost ~]# blkparse dm-4.blktrace.0
Input file dm-4.blktrace.0 added
Input file dm-4.blktrace.1 added
253,4    0        1     0.000000000  1820  Q   D 8802 + 7584 [fstrim]
253,4    0        2     0.000026766    30  C   D 8802 + 7584 [0]
253,4    0        3     0.000039741  1820  Q   D 16902 + 15868 [fstrim]
253,4    0        4     0.000061083    30  C   D 16902 + 15868 [0]
253,4    0        5     0.000070234  1820  Q   D 32770 + 16384 [fstrim]
253,4    0        6     0.000091269    30  C   D 32770 + 16384 [0]
253,4    0        7     0.000100088  1820  Q   D 49670 + 15868 [fstrim]
253,4    0        8     0.000120463    30  C   D 49670 + 15868 [0]
253,4    0        9     0.000128744  1820  Q   D 65538 + 16384 [fstrim]
253,4    0       10     0.000149603    30  C   D 65538 + 16384 [0]
253,4    0       11     0.000158058  1820  Q   D 82438 + 15868 [fstrim]
253,4    0       12     0.000178476    30  C   D 82438 + 15868 [0]
253,4    0       13     0.000185858  1820  Q   D 106498 + 8192 [fstrim]
253,4    0       14     0.000198713    30  C   D 106498 + 8192 [0]
253,4    0       15     0.000206421  1820  Q   D 115206 + 15868 [fstrim]
253,4    0       16     0.000226850    30  C   D 115206 + 15868 [0]
253,4    0       17     0.000234823  1820  Q   D 131074 + 16384 [fstrim]
253,4    0       18     0.000255777    30  C   D 131074 + 16384 [0]
253,4    0       19     0.000263875  1820  Q   D 147974 + 15868 [fstrim]
253,4    0       20     0.000284202    30  C   D 147974 + 15868 [0]
253,4    0       21     0.000328166  1820  Q   D 163842 + 16384 [fstrim]
253,4    0       22     0.000369097    30  C   D 163842 + 16384 [0]
253,4    0       23     0.000377653  1820  Q   D 180226 + 16384 [fstrim]
253,4    0       24     0.000398561    30  C   D 180226 + 16384 [0]
253,4    0       25     0.000433754  1820  Q   D 196610 + 16384 [fstrim]
253,4    0       26     0.000454838    30  C   D 196610 + 16384 [0]
253,4    0       27     0.000462392  1820  Q   D 212994 + 16384 [fstrim]
253,4    0       28     0.000483178    30  C   D 212994 + 16384 [0]
253,4    0       29     0.000501903  1820  Q  RM 548 + 2 [fstrim]
253,4    0       30     0.001638755     0  C  RM 548 + 2 [0]
253,4    0       31     0.001673387  1820  Q   D 229378 + 16384 [fstrim]
253,4    0       32     0.001701940    30  C   D 229378 + 16384 [0]
253,4    0       33     0.001714154  1820  Q   D 245762 + 16382 [fstrim]
253,4    0       34     0.001735589    30  C   D 245762 + 16382 [0]
CPU0 (dm-4):
 Reads Queued:           1,        1KiB	 Writes Queued:          16,  121,285KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        1,        1KiB	 Writes Completed:       16,  121,285KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             0
 IO unplugs:             0        	 Timer unplugs:           0
Throughput (R/W): 1,000KiB/s / 121,285,000KiB/s
Events (dm-4): 34 entries
Skips: 0 forward (0 -   0.0%)


[root@localhost ~]# blkparse sdf.blktrace.0
Input file sdf.blktrace.1 added
Throughput (R/W): 0KiB/s / 0KiB/s
Events (sdf): 0 entries
Skips: 0 forward (0 -   0.0%)

Comment 3 Zdenek Kabelac 2018-09-13 13:53:55 UTC
Just a side comment: this is a known limitation at the moment (so technically it is not a bug, just missing support, which makes this more or less an RFE).

For this reason, lvm2 provides caching of the thin-pool data LV rather than of the thin LV itself, so the thin LV provides the trimming capability while caching is handled at the data chunk level.  This solution also has its limits, however, as lvm2 currently does not provide automatic extension of a cached data LV.

Comment 10 Bruno Meneguele 2019-03-08 20:48:47 UTC
Patch(es) committed on kernel-3.10.0-1017.el7

Comment 13 Roman Bednář 2019-07-04 10:04:47 UTC
Marking verified; discards are now properly passed down to the device underlying dm-cache.


BEFORE FIX:

kernel-3.10.0-957.21.3.el7.x86_64

# lvs -a -o lv_name,lv_attr,segtype
  LV              Attr       Type
  root            -wi-ao---- linear
  swap            -wi-ao---- linear
  [CPOOL]         Cwi---C--- cache-pool
  [CPOOL_cdata]   Cwi-ao---- linear
  [CPOOL_cmeta]   ewi-ao---- linear
  [lvol0_pmspare] ewi------- linear
  origin          Cwi-aoC--- cache
  [origin_corig]  owi-aoC--- linear

# mount | grep cache
/dev/mapper/vg-origin on /mnt/cache type ext4 (rw,relatime,seclabel,stripe=64,data=ordered)

# dd if=/dev/zero of=/mnt/cache/testfile bs=1M count=50 conv=fsync
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.140838 s, 372 MB/s

# rm /mnt/cache/testfile
rm: remove regular file ‘/mnt/cache/testfile’? y

# blktrace -d /dev/mapper/vg-origin &
[1] 11018

# blktrace -d /dev/sdf &
[2] 11022

# fstrim -v /mnt/cache
/mnt/cache: 91.3 MiB (95748096 bytes) trimmed

# kill $(pidof blktrace)

# blkparse dm-2.blktrace.0
Input file dm-2.blktrace.0 added
253,2    0        1     0.000000000 11142  Q   D 7020 + 9366 [fstrim]
253,2    0        2     0.000014697 11085  C   D 7020 + 9366 [0]
253,2    0        3     0.000019863 11142  Q   D 16902 + 15868 [fstrim]
253,2    0        4     0.000023668 11085  C   D 16902 + 15868 [0]
253,2    0        5     0.000026978 11142  Q   D 32770 + 16384 [fstrim]
253,2    0        6     0.000030786 11085  C   D 32770 + 16384 [0]
253,2    0        7     0.000033640 11142  Q   D 49670 + 15868 [fstrim]
253,2    0        8     0.000037293 11085  C   D 49670 + 15868 [0]
253,2    0        9     0.000041016 11142  Q   D 131074 + 16384 [fstrim]
253,2    0       10     0.000044729 11085  C   D 131074 + 16384 [0]
253,2    0       11     0.000047490 11142  Q   D 147974 + 15868 [fstrim]
253,2    0       12     0.000050968 11085  C   D 147974 + 15868 [0]
253,2    0       13     0.000053827 11142  Q   D 163842 + 16384 [fstrim]
253,2    0       14     0.000057346 11085  C   D 163842 + 16384 [0]
253,2    0       15     0.000059789 11142  Q   D 180226 + 16384 [fstrim]
253,2    0       16     0.000063332 11085  C   D 180226 + 16384 [0]
253,2    0       17     0.000066372 11142  Q   D 196610 + 8190 [fstrim]
253,2    0       18     0.000068941 11085  C   D 196610 + 8190 [0]
CPU0 (dm-2):
 Reads Queued:           0,        0KiB	 Writes Queued:           9,   65,348KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        0,        0KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        9,   65,348KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             0
 IO unplugs:             0        	 Timer unplugs:           0

Throughput (R/W): 0KiB/s / 0KiB/s
Events (dm-2): 18 entries
Skips: 0 forward (0 -   0.0%)

# blkparse sdf.blktrace.0
Input file sdf.blktrace.0 added

Throughput (R/W): 0KiB/s / 0KiB/s
Events (sdf): 0 entries.  <<<<<
Skips: 0 forward (0 -   0.0%)

==============================

AFTER FIX:

kernel-3.10.0-1059.el7.x86_64

# lvs -a -o lv_name,lv_attr,segtype
  LV              Attr       Type
  root            -wi-ao---- linear
  swap            -wi-ao---- linear
  [CPOOL]         Cwi---C--- cache-pool
  [CPOOL_cdata]   Cwi-ao---- linear
  [CPOOL_cmeta]   ewi-ao---- linear
  [lvol0_pmspare] ewi------- linear
  origin          Cwi-aoC--- cache
  [origin_corig]  owi-aoC--- linear

# mount | grep cache
/dev/mapper/vg-origin on /mnt/cache type ext4 (rw,relatime,seclabel,stripe=64,data=ordered)

# dd if=/dev/zero of=/mnt/cache/testfile bs=1M count=50 conv=fsync
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.140838 s, 372 MB/s

# rm /mnt/cache/testfile
rm: remove regular file ‘/mnt/cache/testfile’? y

# blktrace -d /dev/mapper/vg-origin &
[1] 8564

# blktrace -d /dev/sdk &
[2] 8569

# fstrim -v /mnt/cache
/mnt/cache: 91.3 MiB (95748096 bytes) trimmed

# kill $(pidof blktrace)

# blkparse sdk.blktrace.0
Input file sdk.blktrace.0 added
  8,161  0        1     0.000000000  8615  A   D 9068 + 9366 <- (253,5) 7020
  8,160  0        2     0.000000340  8615  A   D 9131 + 9366 <- (8,161) 9068
  8,160  0        3     0.000001480  8615  Q   D 9131 + 9366 [kworker/0:0]
  8,160  0        4     0.000005986  8615  G   D 9131 + 9366 [kworker/0:0]
  8,160  0        5     0.000007231  8615  I   D 9131 + 9366 [kworker/0:0]
  8,160  0        6     0.000009061  8615  D   D 9131 + 9366 [kworker/0:0]
  8,160  0        7     0.000934638  8615  C   D 9131 + 9366 [0]
  8,161  0        8     0.001004929  8615  A   D 18950 + 15868 <- (253,5) 16902
  8,160  0        9     0.001005170  8615  A   D 19013 + 15868 <- (8,161) 18950
  8,160  0       10     0.001005518  8615  Q   D 19013 + 15868 [kworker/0:0]
  8,160  0       11     0.001006438  8615  G   D 19013 + 15868 [kworker/0:0]
  8,160  0       12     0.001006770  8615  I   D 19013 + 15868 [kworker/0:0]
  8,160  0       13     0.001007333  8615  D   D 19013 + 15868 [kworker/0:0]
  8,160  0       14     0.002278403  8615  C   D 19013 + 15868 [0]
  8,161  0       15     0.002295502  8615  A   D 34818 + 16384 <- (253,5) 32770
  8,160  0       16     0.002295730  8615  A   D 34881 + 16384 <- (8,161) 34818
  8,160  0       17     0.002296027  8615  Q   D 34881 + 16384 [kworker/0:0]
  8,160  0       18     0.002296705  8615  G   D 34881 + 16384 [kworker/0:0]
  8,160  0       19     0.002297025  8615  I   D 34881 + 16384 [kworker/0:0]
  8,160  0       20     0.002297483  8615  D   D 34881 + 16384 [kworker/0:0]
  8,160  0       21     0.003591576  8615  C   D 34881 + 16384 [0]
  8,161  0       22     0.003620191  8615  A   D 51718 + 15868 <- (253,5) 49670
  8,160  0       23     0.003620388  8615  A   D 51781 + 15868 <- (8,161) 51718
  8,160  0       24     0.003620697  8615  Q   D 51781 + 15868 [kworker/0:0]
  8,160  0       25     0.003621442  8615  G   D 51781 + 15868 [kworker/0:0]
  8,160  0       26     0.003621720  8615  I   D 51781 + 15868 [kworker/0:0]
  8,160  0       27     0.003622228  8615  D   D 51781 + 15868 [kworker/0:0]
  8,160  0       28     0.004859502  8615  C   D 51781 + 15868 [0]
  8,161  0       29     0.004875050  8615  A   D 133122 + 16384 <- (253,5) 131074
  8,160  0       30     0.004875249  8615  A   D 133185 + 16384 <- (8,161) 133122
  8,160  0       31     0.004875519  8615  Q   D 133185 + 16384 [kworker/0:0]
  8,160  0       32     0.004876210  8615  G   D 133185 + 16384 [kworker/0:0]
  8,160  0       33     0.004876466  8615  I   D 133185 + 16384 [kworker/0:0]
  8,160  0       34     0.004877001  8615  D   D 133185 + 16384 [kworker/0:0]
  8,160  0       35     0.006858941  8615  C   D 133185 + 16384 [0]
  8,161  0       36     0.006873758  8615  A   D 150022 + 15868 <- (253,5) 147974
  8,160  0       37     0.006873984  8615  A   D 150085 + 15868 <- (8,161) 150022
  8,160  0       38     0.006874288  8615  Q   D 150085 + 15868 [kworker/0:0]
  8,160  0       39     0.006874962  8615  G   D 150085 + 15868 [kworker/0:0]
  8,160  0       40     0.006875238  8615  I   D 150085 + 15868 [kworker/0:0]
  8,160  0       41     0.006875750  8615  D   D 150085 + 15868 [kworker/0:0]
  8,160  0       42     0.008149716  8615  C   D 150085 + 15868 [0]
  8,161  0       43     0.008163752  8615  A   D 165890 + 16384 <- (253,5) 163842
  8,160  0       44     0.008163974  8615  A   D 165953 + 16384 <- (8,161) 165890
  8,160  0       45     0.008164271  8615  Q   D 165953 + 16384 [kworker/0:0]
  8,160  0       46     0.008164912  8615  G   D 165953 + 16384 [kworker/0:0]
  8,160  0       47     0.008165174  8615  I   D 165953 + 16384 [kworker/0:0]
  8,160  0       48     0.008165634  8615  D   D 165953 + 16384 [kworker/0:0]
  8,160  0       49     0.009443429  8615  C   D 165953 + 16384 [0]
  8,161  0       50     0.009456659  8615  A   D 182274 + 16384 <- (253,5) 180226
  8,160  0       51     0.009456842  8615  A   D 182337 + 16384 <- (8,161) 182274
  8,160  0       52     0.009457138  8615  Q   D 182337 + 16384 [kworker/0:0]
  8,160  0       53     0.009457772  8615  G   D 182337 + 16384 [kworker/0:0]
  8,160  0       54     0.009458028  8615  I   D 182337 + 16384 [kworker/0:0]
  8,160  0       55     0.009458511  8615  D   D 182337 + 16384 [kworker/0:0]
  8,160  0       56     0.010718506  8615  C   D 182337 + 16384 [0]
  8,161  0       57     0.010731518  8615  A   D 198658 + 8190 <- (253,5) 196610
  8,160  0       58     0.010731728  8615  A   D 198721 + 8190 <- (8,161) 198658
  8,160  0       59     0.010732023  8615  Q   D 198721 + 8190 [kworker/0:0]
  8,160  0       60     0.010732678  8615  G   D 198721 + 8190 [kworker/0:0]
  8,160  0       61     0.010732946  8615  I   D 198721 + 8190 [kworker/0:0]
  8,160  0       62     0.010733406  8615  D   D 198721 + 8190 [kworker/0:0]
  8,160  0       63     0.011705810     0  C   D 198721 + 8190 [0]
CPU0 (sdk):
 Reads Queued:           0,        0KiB	 Writes Queued:           9,   65,348KiB
 Read Dispatches:        0,        0KiB	 Write Dispatches:        9,   65,348KiB
 Reads Requeued:         0		 Writes Requeued:         0
 Reads Completed:        0,        0KiB	 Writes Completed:        9,   65,348KiB
 Read Merges:            0,        0KiB	 Write Merges:            0,        0KiB
 Read depth:             0        	 Write depth:             1
 IO unplugs:             0        	 Timer unplugs:           0

Throughput (R/W): 0KiB/s / 5,940,727KiB/s
Events (sdk): 63 entries.  <<<<<
Skips: 0 forward (0 -   0.0%)
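
As a cross-check, the sizes of the queued discards can be summed and compared with the "Writes Queued" figure in the summary.  A minimal sketch, assuming blkparse's default text output and 512-byte sectors.  The function name discard_kib is hypothetical:

```shell
# Sum the sizes of queued discards in blkparse text output (stdin) and
# print the total in KiB.  Assumes the default blkparse line layout:
#   maj,min cpu seq timestamp pid action rwbs sector + nsectors [process]
# discard_kib is a hypothetical helper name, not a blktrace tool.
discard_kib() {
    # Field 10 is the request length in 512-byte sectors; 2 sectors = 1 KiB.
    awk '$6 == "Q" && $7 ~ /D/ { sectors += $10 } END { print sectors / 2 }'
}
```

Applied to the sdk trace above (blkparse sdk.blktrace.0 | discard_kib), the nine queued discards add up to 130696 sectors, that is 65348 KiB, which matches the 65,348KiB reported under Writes Queued.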

Comment 15 errata-xmlrpc 2019-08-06 12:10:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2029

