Bug 1563697 - Triming on full pool can trigger 'dm_pool_alloc_data_block' failed: error = -28
Summary: Triming on full pool can trigger 'dm_pool_alloc_data_block' failed: error = -28
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.5
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Mike Snitzer
QA Contact: Jakub Krysl
URL:
Whiteboard:
Depends On:
Blocks: 1609283
TreeView+ depends on / blocked
 
Reported: 2018-04-04 13:53 UTC by Zdenek Kabelac
Modified: 2018-10-30 08:55 UTC (History)
13 users (show)

Fixed In Version: kernel-3.10.0-921.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1609283 (view as bug list)
Environment:
Last Closed: 2018-10-30 08:54:47 UTC
Target Upstream Version:


Attachments (Terms of Use)
Reproducer for lvm2 test suite (1.19 KB, application/x-shellscript)
2018-04-04 13:53 UTC, Zdenek Kabelac
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1564186 0 urgent CLOSED XFS may block endlessly in xlog_wait() on IO error 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2018:3083 0 None None None 2018-10-30 08:55:48 UTC

Internal Links: 1564186

Description Zdenek Kabelac 2018-04-04 13:53:38 UTC
Created attachment 1417263 [details]
Reproducer for lvm2 test suite

Description of problem:

As reported on  'dm-devel' mailing list by: dennisyang at qnap com

https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html

Recently we have came across an issue that dm-thin pool will be
switched to READ_ONLY mode because dm_pool_alloc_data_block() returns
-ENOSPC. AFAIK, this should not happen since alloc_data_block() will
check if there is any free space (and commit metadata if it first
reports no free space) before it allocates pool block. In addition,
total virtual space of all thin volumes is smaller than the pool
physical space in my testing environment which makes pool impossible
to run out of space.

This issue could be easily reproduced by the following steps.

1) Create a thin pool and a slightly smaller thin volume
> sudo dmsetup create meta --table "0 40000000 linear /dev/sdf 0"
> sudo dmsetup create data --table "0 10240000 linear /dev/md125 0"
> sudo dd if=/dev/zero of=/dev/mapper/meta bs=1M count=1
> sudo dmsetup create pool --table "0 10240000 thin-pool /dev/mapper/meta /dev/mapper/data 1024 0 2 skip_block_zeroing error_if_no_space"
> sudo dmsetup message pool 0 "create_thin 0"
> sudo dmsetup create thin --table "0 10238976 thin /dev/mapper/pool 0"

2) Make a filesystem and mount it
> sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/thin
> sudo mount /dev/mapper/thin /mnt

3) Write a file to mount point until it takes all the space
> sudo dd if=/dev/zero of=/mnt/zero.img bs=1M

4) Remove this file and trim mount point
> sudo rm /mnt/zero.img
> sudo fstrim /mnt

Repeat step 3 and 4 multiple times and the pool will be switched to
READ_ONLY mode and need_checks flag will be set. Kernel message shows
the following messages.
[ 3952.723937] device-mapper: thin: 252:2: metadata operation
'dm_pool_alloc_data_block' failed: error = -28
[ 3952.723940] device-mapper: thin: 252:2: aborting current metadata transaction
[ 3952.725860] device-mapper: thin: 252:2: switching pool to read-only mode

This root cause of this issue is that dm-thin will first remove
mapping and increase corresponding blocks' reference count to prevent
them from being reused before DISCARD bios get processed by the
underlying layers. However. increasing blocks' reference count could
also increase the nr_allocated_this_transaction in struct sm_disk
which makes smd->old_ll.nr_allocated +
smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
In this case, alloc_data_block() will never commit metadata to reset
the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
always return an underflow value.

--

As simple reproducer script shows - it may by easily hit also by
lvm2 thin-pool of 10MB size (using 4k extents to allow fine tuning of sizes)

--
Reproducer revealed another bug in XFS reachable this way (while ext4 properly fails).

Version-Release number of selected component (if applicable):
4.16

How reproducible:
Follow step in list reproduce - or use lvm2 test suite script to hit.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Joe Thornber 2018-06-13 15:59:40 UTC
Reproduced with the following trivial test case:

https://github.com/jthornber/thin-provisioning-tools/commit/570e11b2da6ee55bc755f5211cf80c4f942b4644

Comment 4 Mike Snitzer 2018-06-21 15:47:23 UTC
(In reply to Joe Thornber from comment #3)
> Reproduced with the following trivial test case:
> 
> https://github.com/jthornber/thin-provisioning-tools/commit/
> 570e11b2da6ee55bc755f5211cf80c4f942b4644

This test is just a basic discard test, but the discard ioctl code in this testsuite isn't working like it should.  If I switch away from using your small ft-lib provided C function to just calling out to the blkdiscard utility then everything works.

I chased this down because your test is _so_ basic (it just discards a full thin device) and I knew we had coverage for that in the device-mapper-test-suite.  SO I then launched into reconciling "why does Joe's new thin-provisioning-tools test not work?"

diff --git a/functional-tests/device-mapper/dm-tests.scm b/functional-tests/device-mapper/dm-tests.scm
index 0ca42c7..46778b9 100644
--- a/functional-tests/device-mapper/dm-tests.scm
+++ b/functional-tests/device-mapper/dm-tests.scm
@@ -677,9 +679,14 @@
                       (kilo 64)
                       (thin-pool-options error-if-no-space skip-block-zeroing))
        (with-new-thin (thin pool 0 data-size)
+         (assert-pool-used-data pool (kilo 64) (sectors 0))
          (zero-dev thin)
-         (discard (dm-device-path thin) 0 (to-sectors data-size))
-         (assert-pool-used-data pool (kilo 64) (sectors 0))))))
+         (assert-pool-used-data pool (kilo 64) (meg 128))
+         ;;(discard (dm-device-path thin) 0 (to-sectors data-size))
+        (discard (dm-device-path thin))
+        (info (dsp "raw status: ") (dsp (get-status pool)))
+         (assert-pool-used-data pool (kilo 64) (sectors 0))
+       ))))

   )

diff --git a/functional-tests/device-mapper/ioctl.scm b/functional-tests/device-mapper/ioctl.scm
index 5edb889..4fccbbf 100644
--- a/functional-tests/device-mapper/ioctl.scm
+++ b/functional-tests/device-mapper/ioctl.scm
@@ -50,6 +50,7 @@
   (import (chezscheme)
           (disk-units)
           (fmt fmt)
+         (process)
           (logging)
           (srfi s8 receive)
           (utils))
@@ -395,9 +396,17 @@
                (sectors result)
                (fail (fmt #f "get-dev-size failed: " r))))))

-  (define (discard dev sb se)
-    (define c-discard
-      (foreign-procedure "discard" (string unsigned-64 unsigned-64) int))
-    (unless (zero? (c-discard dev sb se))
-            (fail (fmt #f "discard ioctl failed"))))
+  (define (blkdiscard-cmd . args)
+    (build-command-line (cons "blkdiscard" args)))
+
+  (define discard
+    (case-lambda
+     ((dev)
+      (run-ok (blkdiscard-cmd dev)))
+
+     ((dev sb se)
+      (define c-discard
+       (foreign-procedure "discard" (string unsigned-64 unsigned-64) int))
+      (unless (zero? (c-discard dev sb se))
+             (fail (fmt #f "discard ioctl failed")))) ))
 )

Anyway, all said, I'll now switch to reproducing in terms of the lvm2 reproducer Zdenek attached via comment#0.

Comment 5 Mike Snitzer 2018-06-21 17:33:07 UTC
(In reply to Mike Snitzer from comment #4)
> 
> Anyway, all said, I'll now switch to reproducing in terms of the lvm2
> reproducer Zdenek attached via comment#0.

The lvm2 testsuite based reproducer attached via comment#0 works fine for me with an upstream 4.17 kernel.

(In reply to Zdenek Kabelac from comment #0)
> Created attachment 1417263 [details]
> Reproducer for lvm2 test suite
> 
> Description of problem:
> 
> As reported on  'dm-devel' mailing list by: dennisyang at qnap com
> 
> https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html
...
> As simple reproducer script shows - it may by easily hit also by
> lvm2 thin-pool of 10MB size (using 4k extents to allow fine tuning of sizes)
> 
> --
> Reproducer revealed another bug in XFS reachable this way (while ext4
> properly fails).
> 
> Version-Release number of selected component (if applicable):
> 4.16

I'll try to reproduce with an upstream 4.16 now.

Comment 6 Mike Snitzer 2018-06-21 19:52:35 UTC
(In reply to Mike Snitzer from comment #5)
>
> I'll try to reproduce with an upstream 4.16 now.

reproduced on upstream 4.16 (but again: not on 4.17):

# make check_local T=thin-corruption.sh
make: Circular lib/mke2fs.conf <- lib/mke2fs.conf dependency dropped.
ln -s -f  lib/mke2fs.conf
touch .tests-stamp
VERBOSE=0 ./lib/runner \
        --testdir . --outdir results \
        --flavours ndev-vanilla --only thin-corruption.sh --skip @
running 1 tests
###      running: [ndev-vanilla] shell/thin-corruption.sh 0Jun 21 15:49:52 thegoat kernel: device-mapper: uevent: version 1.0.3
Jun 21 15:49:52 thegoat kernel: device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
Jun 21 15:49:52 thegoat kernel: loop: module loaded
Jun 21 15:49:52 thegoat kernel: EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
Jun 21 15:49:52 thegoat kernel: device-mapper: thin: 253:4: metadata operation 'dm_pool_alloc_data_block' failed: error = -28
Jun 21 15:49:52 thegoat kernel: device-mapper: thin: 253:4: aborting current metadata transaction
Jun 21 15:49:52 thegoat kernel: device-mapper: thin: 253:4: switching pool to read-only mode
Jun 21 15:49:52 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:323: I/O error 10 writing to inode 12 (offset 0 size 0 starting block 8384)
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8374
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8375
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8376
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8377
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8378
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8379
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8380
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8381
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8382
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on device dm-6, logical block 8383
Jun 21 15:49:52 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:323: I/O error 10 writing to inode 12 (offset 0 size 0 starting block 8448)
Jun 21 15:49:52 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:323: I/O error 10 writing to inode 12 (offset 0 size 0 starting block 8390)
Jun 21 15:49:52 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:323: I/O error 10 writing to inode 12 (offset 0 size 0 starting block 8394)
Jun 21 15:49:52 thegoat kernel: JBD2: Detected IO errors while flushing file data on dm-6-8
Jun 21 15:49:52 thegoat kernel: Aborting journal on device dm-6-8.
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on dev dm-6, logical block 97, lost sync page write
Jun 21 15:49:52 thegoat kernel: JBD2: Error -5 detected when updating journal superblock for dm-6-8.
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on dev dm-6, logical block 1, lost sync page write
Jun 21 15:49:52 thegoat kernel: EXT4-fs error (device dm-6): ext4_journal_check_start:61: Detected aborted journal
Jun 21 15:49:52 thegoat kernel: EXT4-fs (dm-6): Remounting filesystem read-only
Jun 21 15:49:52 thegoat kernel: EXT4-fs (dm-6): previous I/O error to superblock detected
Jun 21 15:49:52 thegoat kernel: Buffer I/O error on dev dm-6, logical block 1, lost sync page write

Comment 7 Mike Snitzer 2018-06-21 20:09:15 UTC
It also fails with latest RHEL7:

# make check_local T=thin-corruption.sh
make: Circular lib/mke2fs.conf <- lib/mke2fs.conf dependency dropped.
ln -s -f  lib/mke2fs.conf
touch .tests-stamp
VERBOSE=0 ./lib/runner \
        --testdir . --outdir results \
        --flavours ndev-vanilla --only thin-corruption.sh --skip @
running 1 tests
###      running: [ndev-vanilla] shell/thin-corruption.sh 0Jun 21 16:06:29 thegoat kernel: device-mapper: uevent: version 1.0.3
Jun 21 16:06:29 thegoat kernel: device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
###      running: [ndev-vanilla] shell/thin-corruption.sh 1Jun 21 16:06:29 thegoat kernel: loop: module loaded
Jun 21 16:06:29 thegoat kernel: EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
Jun 21 16:06:29 thegoat kernel: device-mapper: thin: 253:4: metadata operation 'dm_pool_alloc_data_block' failed: error = -28
Jun 21 16:06:29 thegoat kernel: device-mapper: thin: 253:4: aborting current metadata transaction
Jun 21 16:06:29 thegoat kernel: device-mapper: thin: 253:4: switching pool to read-only mode
Jun 21 16:06:29 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 0 size 0 starting block 8374)
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8374
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8375
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8376
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8377
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8378
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8379
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8380
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8381
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8382
Jun 21 16:06:29 thegoat kernel: Buffer I/O error on device dm-6, logical block 8383
Jun 21 16:06:29 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 0 size 0 starting block 8384)
Jun 21 16:06:29 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 0 size 0 starting block 8401)
Jun 21 16:06:29 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 0 size 0 starting block 8390)
Jun 21 16:06:29 thegoat kernel: EXT4-fs warning (device dm-6): ext4_end_bio:316: I/O error -5 writing to inode 12 (offset 0 size 0 starting block 8394)
Jun 21 16:06:29 thegoat kernel: JBD2: Detected IO errors while flushing file data on dm-6-8
Jun 21 16:06:29 thegoat kernel: Aborting journal on device dm-6-8.
Jun 21 16:06:29 thegoat kernel: EXT4-fs error (device dm-6): ext4_journal_check_start:56: Detected aborted journal
Jun 21 16:06:29 thegoat kernel: EXT4-fs (dm-6): Remounting filesystem read-only

Comment 8 Mike Snitzer 2018-06-22 20:33:16 UTC
(In reply to Mike Snitzer from comment #6)
> (In reply to Mike Snitzer from comment #5)
> >
> > I'll try to reproduce with an upstream 4.16 now.
> 
> reproduced on upstream 4.16 (but again: not on 4.17):

Things just got COMPLETELY WEIRD.. if I:

git checkout v4.16
git show v4.17-rc1 | patch -p1
<rebuild and boot this fake "4.17-rc1" that is really "4.16 final">

THE TEST PASSES! HUH!?

Implies something (e.g. fstrim utility?) is somehow altering its behavior based on linux kernel version...

I think my work week is trying to tell me it is time to step away from the keyboard.

Comment 13 Bruno Meneguele 2018-07-13 14:57:44 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 15 Bruno Meneguele 2018-07-16 12:57:34 UTC
Patch(es) available on kernel-3.10.0-921.el7

Comment 17 Jakub Krysl 2018-07-26 08:31:06 UTC
# uname -r
3.10.0-915.el7.x86_64

# for i in `seq 1 10`; do sudo dd if=/dev/zero of=/mnt/zero.img bs=1M; sudo rm /mnt/zero.img; sudo fstrim /mnt; done
dd: error writing ‘/mnt/zero.img’: No space left on device
4758+0 records in
4757+0 records out
4988571648 bytes (5.0 GB) copied, 26.139 s, 191 MB/s
dd: error writing ‘/mnt/zero.img’: Read-only file system
1800+0 records in
1799+0 records out
1886416896 bytes (1.9 GB) copied, 1.75103 s, 1.1 GB/s
rm: cannot remove ‘/mnt/zero.img’: Read-only file system
dd: failed to open ‘/mnt/zero.img’: Read-only file system

/var/log/messages:
[115781.116286] device-mapper: thin: 253:5: metadata operation 'dm_pool_alloc_data_block' failed: error = -28 
[115781.165238] device-mapper: thin: 253:5: aborting current metadata transaction 
[115781.199053] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 664192) 
[115781.258357] Buffer I/O error on device dm-6, logical block 664192 
[115781.286075] Buffer I/O error on device dm-6, logical block 664193 
[115781.314083] Buffer I/O error on device dm-6, logical block 664194 
[115781.341839] Buffer I/O error on device dm-6, logical block 664195 
[115781.369566] Buffer I/O error on device dm-6, logical block 664196 
[115781.397308] Buffer I/O error on device dm-6, logical block 664197 
[115781.425094] Buffer I/O error on device dm-6, logical block 664198 
[115781.452865] Buffer I/O error on device dm-6, logical block 664199 
[115781.480630] Buffer I/O error on device dm-6, logical block 664200 
[115781.508388] Buffer I/O error on device dm-6, logical block 664201 
[115781.537104] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 664320) 
[115781.596607] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 664448) 
[115781.659299] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 664576) 
[115781.720679] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 664704) 
[115781.780186] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 664832) 
[115781.840023] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 664960) 
[115781.901269] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 665088) 
[115781.960396] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 665216) 
[115782.019785] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -5 writing to inode 12 (offset 25165824 size 8388608 starting block 665344) 
[115782.291488] JBD2: Detected IO errors while flushing file data on dm-6-8 
[115782.321777] Aborting journal on device dm-6-8. 
[115782.380863] EXT4-fs error (device dm-6): ext4_journal_check_start:56: [115782.380881] EXT4-fs error (device dm-6): ext4_journal_check_start:56: Detected aborted journal 
[115782.380884] EXT4-fs (dm-6): Remounting filesystem read-only 
[115782.409444] EXT4-fs (dm-6): ext4_writepages: jbd2_start: 6376 pages, ino 12; err -30 



# uname -r
3.10.0-921.el7.x86_64

# for i in `seq 1 10`; do sudo dd if=/dev/zero of=/mnt/zero.img bs=1M; sudo rm /mnt/zero.img; sudo fstrim /mnt; done
dd: error writing ‘/mnt/zero.img’: No space left on device
4758+0 records in
4757+0 records out
4988571648 bytes (5.0 GB) copied, 22.8242 s, 219 MB/s
dd: error writing ‘/mnt/zero.img’: No space left on device
4758+0 records in
4757+0 records out
4988571648 bytes (5.0 GB) copied, 22.9005 s, 218 MB/s

/var/log/messages:
[54859.528420] EXT4-fs warning (device dm-6): ext4_end_bio:302: I/O error -28 writing to inode 12 (offset 10813440 size 1572864 starting block 821120)                                                             
[54859.595469] buffer_io_error: 374 callbacks suppressed                                                 
[54859.619448] Buffer I/O error on device dm-6, logical block 821120                                     
[54859.648199] Buffer I/O error on device dm-6, logical block 821121                                     
[54859.677066] Buffer I/O error on device dm-6, logical block 821122                                     
[54859.705949] Buffer I/O error on device dm-6, logical block 821123                                     
[54859.733717] Buffer I/O error on device dm-6, logical block 821124                                     
[54859.761893] Buffer I/O error on device dm-6, logical block 821125                                     
[54859.790557] Buffer I/O error on device dm-6, logical block 821126                                     
[54859.819081] Buffer I/O error on device dm-6, logical block 821127                                     
[54859.847289] Buffer I/O error on device dm-6, logical block 821128                                     
[54859.874752] Buffer I/O error on device dm-6, logical block 821129                                     
[54860.039418] JBD2: Detected IO errors while flushing file data on dm-6-8                               
[54902.591345] JBD2: Detected IO errors while flushing file data on dm-6-8                               
[54902.626899] JBD2: Detected IO errors while flushing file data on dm-6-8 


So dm-thin no longer switches to read-only, but there is still something bad going on with EXT4. The underlying device is much larger than thin, so I tested repeating steps 3 and 4 with linear target and could not hit these messages. Testing on thin with XFS showed only those 'buffer I/O' messages.

These messages do not happen in the first few runs, only after few iterations. Mike, can you please have a look? I am inclined to set this to verified and open a new one, but I want to be sure. Thanks

Comment 18 Mike Snitzer 2018-07-26 16:18:38 UTC
(In reply to Jakub Krysl from comment #17)

> So dm-thin no longer switches to read-only, but there is still something bad
> going on with EXT4. The underlying device is much larger than thin, so I
> tested repeating steps 3 and 4 with linear target and could not hit these
> messages. Testing on thin with XFS showed only those 'buffer I/O' messages.
> 
> These messages do not happen in the first few runs, only after few
> iterations. Mike, can you please have a look? I am inclined to set this to
> verified and open a new one, but I want to be sure. Thanks

I think it possible ext4 is much worse (than xfs) at avoiding fragmentation (in the sense that a file's blocks are being mapped unaligned to the underlying thin-pool blocksize).  As a result, after a few iterations the blocks allocated from thin-pool are shifted such that discarding the space associated with an ext4 file (via fstrim) doesn't actually cause dm-thinp to reclaim space.  DM thinp needs the discard_granularity of the discards issued to it to be >= the thin-pool's blocksize.

Anyway, that theory aside, an observation from the messages you provided:

The dm thin-pool isn't ever entering OODS (out of data space) mode.  Yet ext4 is getting -28 (-ENOSPC).  That is bizarre.  Maybe you just trimmed those dm-thin-pool messages that speak to the pool being out of space?

And if my theory is correct, the discards aren't actually freeing any space from the thin-pool (again due to poor/misaligned layout of the ext4 file's data blocks relative to the thin-pool's blocks) so ext4 finally getting -28 would make sense.

And yes, this is definitely a different issue and should be tracked using another BZ.  Please feel free to clone this BZ accordingly.

Comment 19 Mike Snitzer 2018-07-26 16:20:38 UTC
FYI, in case others are interested, here is the final commit that went upstream and that was backported to RHEL7 to fix this BZ:
http://git.kernel.org/linus/a685557fbbc3122ed11e8ad3fa63a11ebc5de8c3

Comment 20 Jakub Krysl 2018-07-27 13:07:46 UTC
Thanks Mike, created ticket for the I/O error issue, BZ 1609283

As said before, this is fixed so setting to verified.

Comment 22 errata-xmlrpc 2018-10-30 08:54:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3083


Note You need to log in before you can comment on or make changes to this bug.