Bug 1517911 - VDO lets user 'overfill' it leading to I/O error
Summary: VDO lets user 'overfill' it leading to I/O error
Keywords:
Status: CLOSED DUPLICATE of bug 1657152
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kmod-kvdo
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Bryan Gurney
QA Contact: Jakub Krysl
URL:
Whiteboard:
Depends On: 1519304 1519307 1519330 1519377 1534088 1548461 1657152
Blocks:
 
Reported: 2017-11-27 16:58 UTC by Jakub Krysl
Modified: 2022-03-13 14:33 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-11 14:16:34 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Knowledge Base (Article) 3966841: "Managing Thin Provisioning with Virtual Data Optimizer", last updated 2019-03-27 17:55:17 UTC

Internal Links: 1647562

Description Jakub Krysl 2017-11-27 16:58:17 UTC
Description of problem:
When the last slab of VDO is filled, I/O errors appear:
[48394.697872] Buffer I/O error on dev dm-3, logical block 120879528, lost async page write 
[48394.740443] Buffer I/O error on dev dm-3, logical block 120879530, lost async page write 
[48394.779716] Buffer I/O error on dev dm-3, logical block 120879532, lost async page write 
[48394.818291] Buffer I/O error on dev dm-3, logical block 120879534, lost async page write 
[48394.855832] Buffer I/O error on dev dm-3, logical block 120879536, lost async page write 
[48394.894432] Buffer I/O error on dev dm-3, logical block 120879538, lost async page write 
[48394.931657] Buffer I/O error on dev dm-3, logical block 120879540, lost async page write 
[48394.970347] Buffer I/O error on dev dm-3, logical block 120879542, lost async page write 
[48395.008421] Buffer I/O error on dev dm-3, logical block 120879544, lost async page write 
[48395.045762] Buffer I/O error on dev dm-3, logical block 120879546, lost async page write 

I tried LVM, a partition (fdisk), iSCSI, FCoE and a local disk as the underlying volume for VDO. I tried a clean VDO and one mounted with a filesystem (xfs and ext4). I tried changing the slab size at creation to verify it is the last slab. I always got the I/O errors when it started filling the last slab.

The filling method was fio (fio --rw='write' --name='fio_test' --filename='vdo/random_fio' --ioengine='libaio' --iodepth=4) or dd (dd if=/dev/urandom of=vdo/random bs=4K status=progress).

The size of the underlying storage was 10G, 13.5G, 20G, 40G (all LUNs) and 466G (2 identical local disks, tested both: fio and dd, fs and no fs).

I started with deduplication and compression enabled but later disabled it to fill the VDO more easily.

I guess there are some issues with how VDO responds when the slabs get filled. Here is an example on a 13.5G LUN:
# lvcreate -L 13.5G -n lv rhel_storageqe-74

# vdo create --name vdo --device /dev/mapper/rhel_storageqe--74-lv --compression disabled --deduplication disabled

# vdostats --human-readable
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo          13.5G      3.5G     10.0G  25%           N/A

# mkfs.xfs -K /dev/mapper/vdo
meta-data=/dev/mapper/vdo        isize=512    agcount=4, agsize=884736 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=3538944, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount /dev/mapper/vdo vdo

# vdostats --human-readable

Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo          13.5G      3.5G     10.0G  25%           79%

# df -h
Filesystem                           Size  Used Avail Use% Mounted on
--snip--
/dev/mapper/vdo                       14G   33M   14G   1% /root/vdo

# lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                           8:0    0 465.8G  0 disk
├─sda1                        8:1    0     1G  0 part /boot
└─sda2                        8:2    0 464.8G  0 part
  ├─rhel_storageqe--74-root 253:0    0    50G  0 lvm  /
  ├─rhel_storageqe--74-swap 253:1    0   7.8G  0 lvm  [SWAP]
  └─rhel_storageqe--74-lv   253:2    0  13.5G  0 lvm
    └─vdo                   253:3    0  13.5G  0 dm   /root/vdo

# dd if=/dev/urandom of=vdo/random bs=4K status=progress
14450319360 bytes (14 GB) copied, 252.900933 s, 57.1 MB/s
dd: error writing ‘vdo/random’: No space left on device
3528132+0 records in
3528131+0 records out
14451224576 bytes (14 GB) copied, 252.911 s, 57.1 MB/s

# vdo status | grep slab       
        slab count: 5                               
        slab journal blocked count: 0               
        slab journal blocks written: 2009           
        slab journal disk full count: 0             
        slab journal flush count: 259               
        slab journal tail busy count: 0             
        slab summary blocks written: 2001           
        slabs opened: 5                             
        slabs reopened: 288418 

# df -h
Filesystem                           Size  Used Avail Use% Mounted on
--snip--
/dev/mapper/vdo                       14G   14G   20K 100% /root/vdo
# vdostats --human-readable
Device                    Size      Used Available Use% Space saving%
/dev/mapper/vdo          13.5G     13.5G      0.0B 100%            0%

As you can see, df -h reports 14G used (13.5G rounded up) out of the 14G that was available at the beginning, but the VDO volume should only hold 10G of data: 5 slabs of 2G each, with 2.5G taken by the index (default size) and 1G left unallocated (not enough for another 2G slab). This matches 'vdostats', which showed 3.5G used and 10.0G available right after creation.
But dd copied over 13.5G when it should have hit a "No space left on device" error at 10G.
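Spelled out, the expected layout of the 13.5G device is:

   13.5G  total
 -  2.5G  UDS index (default size)
 = 11.0G  left for slabs
 floor(11.0G / 2G per slab) = 5 slabs = 10G writeable, 1G unallocated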

Version-Release number of selected component (if applicable):
vdo-6.1.0.55-10
kmod-kvdo-6.1.0.55-9

How reproducible:
100%

Steps to Reproduce:
1. create vdo
2. fill it

Actual results:
filling over the allowed size of VDO, I/O errors

Expected results:
filled only to logical size, no I/O errors

Additional info:

Comment 2 Jakub Krysl 2017-11-27 18:25:30 UTC
For comparison I tested the same on pure LVM. I used the configuration from the example, just with the VDO removed:

# lsblk
NAME                        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                           8:0    0 465.8G  0 disk
├─sda1                        8:1    0     1G  0 part /boot
└─sda2                        8:2    0 464.8G  0 part
  ├─rhel_storageqe--74-root 253:0    0    50G  0 lvm  /
  ├─rhel_storageqe--74-swap 253:1    0   7.8G  0 lvm  [SWAP]
  └─rhel_storageqe--74-lv   253:2    0  13.5G  0 lvm

# mkfs.xfs -K /dev/mapper/rhel_storageqe--74-lv
meta-data=/dev/mapper/rhel_storageqe--74-lv isize=512    agcount=4, agsize=884736 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=3538944, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

# mount /dev/mapper/rhel_storageqe--74-lv lv                                        

# dd if=/dev/urandom of=lv/random bs=4K status=progress                                                                                                                                      
14450769920 bytes (14 GB) copied, 126.667281 s, 114 MB/s
dd: error writing ‘lv/random’: No space left on device
3528144+0 records in
3528143+0 records out
14451273728 bytes (14 GB) copied, 126.744 s, 114 MB/s


No I/O errors anywhere, nothing in server console at all.

Comment 3 Bryan Gurney 2017-11-27 18:31:28 UTC
For the test on the logical volume, are there any errors in /var/log/messages at the time of the "No space left on device" error?  The key question: was the error passed from the file system driver (XFS), or from LVM (device-mapper)?

Comment 4 Sweet Tea Dorminy 2017-11-27 19:27:33 UTC
I believe the issue here might be using dd without oflag=direct, through a filesystem. Without direct I/O, dd writes to the page cache, which only occasionally pushes writes down to VDO.

Thus dd could have written, say, 9.5G that actually reached VDO, and then issued 4G of writes that went to the page cache and not to VDO... later on, when the page cache requested that those writes actually happen, some of them failed because VDO was out of space, and the next time dd tried to write it got the out-of-space error.

Can you attempt to recreate this with the oflag=direct option passed to dd?

Comment 5 Bryan Gurney 2017-11-27 19:34:32 UTC
The first thing to note here is that the writes being performed are buffered writes to a file in a file system.  This means that the data writes are being buffered in memory prior to being written to disk (or in this case, to the VDO volume).

I recreated this test on a virtual machine with a 40 GB virtual disk /dev/sdb, and created a VDO volume named "vdo1".  I had a transfer rate of about 100 MB/sec during the dd execution, and it finished with the following output:

[root@rhel75test ~]# date; dd if=/dev/urandom of=/mnt/testmount/randomfile.bin bs=4K status=progress; date
Mon Nov 27 13:52:05 EST 2017
40810266624 bytes (41 GB) copied, 408.045466 s, 100 MB/s
dd: error writing ‘/mnt/testmount/randomfile.bin’: Input/output error
9984355+0 records in
9984354+0 records out
40895913984 bytes (41 GB) copied, 408.942 s, 100 MB/s
Mon Nov 27 13:58:54 EST 2017

Note the end time of the dd command; that corresponds with an error from XFS indicating that the device returned error code 28 (ENOSPC, "No space left on device"):

Nov 27 13:58:54 rhel75test kernel: XFS (dm-2): metadata I/O error: block 0x2800110 ("xlog_iodone") error 28 numblks 64
Nov 27 13:58:54 rhel75test kernel: XFS (dm-2): xfs_do_force_shutdown(0x2) called from line 1200 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffc064f1f0
Nov 27 13:58:54 rhel75test kernel: XFS (dm-2): Log I/O Error Detected.  Shutting down filesystem
Nov 27 13:58:54 rhel75test kernel: XFS (dm-2): Please umount the filesystem and rectify the problem(s)
Nov 27 13:58:54 rhel75test kernel: XFS (dm-2): metadata I/O error: block 0x2800108 ("xlog_iodone") error 28 numblks 64
Nov 27 13:58:54 rhel75test kernel: XFS (dm-2): xfs_do_force_shutdown(0x2) called from line 1200 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffc064f1f0

However, note that the first buffered I/O errors appeared 19 seconds earlier:

Nov 27 13:58:35 rhel75test kernel: buffer_io_error: 17317 callbacks suppressed
Nov 27 13:58:35 rhel75test kernel: Buffer I/O error on dev dm-2, logical block 5210152, lost async page write

...because this was the time that the VDO volume ran out of physical space.

Can you try the same test with direct I/O?

For dd, add the switch "oflag=direct"
For fio, add the parameter "direct=1"
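
For example, the fill commands from the original report would become:

# dd if=/dev/urandom of=vdo/random oflag=direct bs=4K status=progress
# fio --rw=write --name=fio_test --filename=vdo/random_fio --ioengine=libaio --iodepth=4 --direct=1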

Comment 6 Bryan Gurney 2017-11-27 19:48:04 UTC
One other test to try: can you perform the buffered I/O endless dd write to a thin logical volume (i.e.: "lvcreate -V ..." after creating a thin-pool)?  Create the thin LV with the same size, of 13.5G.

Note that a standard LVM logical volume is a thick-provisioned device, while a VDO volume is a thin-provisioned device.  It would be interesting to test it against thin-pool, another thin-provisioned device.

For VDO, there's one other metadata category that occupies space as data is written: the block map is allocated dynamically as data is written, at a ratio of 0.00123:1. This means that the 13.5 GB of data would require about 19 MB of space for the block map. Under normal circumstances this is a very small amount, but it's important to note for a fill test like this.
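
(As a rough sanity check on that ratio: 13.5 GB x 0.00123 is about 17 MB, so the ~19 MB figure is the expected order of magnitude; the exact value depends on GB vs. GiB and rounding.)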

Comment 7 Jakub Krysl 2017-11-28 10:08:41 UTC
Bryan, this is how it looks when writing directly, without a filesystem, to a new VDO:

# vdo create --name vdo --device /dev/mapper/rhel_storageqe--74-lv --compression disabled --deduplication disabled
Creating VDO vdo
Starting VDO vdo
VDO instance 4 volume is ready at /dev/mapper/vdo
# dd if=/dev/urandom of=/dev/mapper/vdo bs=4K status=progress
14435155968 bytes (14 GB) copied, 248.344869 s, 58.1 MB/s
dd: error writing ‘/dev/mapper/vdo’: No space left on device
3538945+0 records in
3538944+0 records out
14495514624 bytes (14 GB) copied, 250.668 s, 57.8 MB/s

And /var/log/messages (note: these appear after 12G-13G have been written to the 10G, i.e. 5-slab, VDO):
Nov 28 08:47:37 localhost kernel: buffer_io_error: 285113 callbacks suppressed
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616268, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616269, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616270, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616271, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616272, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616273, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616274, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616275, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616276, lost async page write
Nov 28 08:47:37 localhost kernel: Buffer I/O error on dev dm-3, logical block 2616277, lost async page write
Nov 28 08:47:45 localhost kernel: buffer_io_error: 625707 callbacks suppressed
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241985, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241986, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241987, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241988, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241989, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241990, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241991, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241992, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241993, lost async page write
Nov 28 08:47:45 localhost kernel: Buffer I/O error on dev dm-3, logical block 3241994, lost async page write

In the original report I already tested this both without a filesystem and with xfs or ext4. Here there is no filesystem at all and the core issue persists. With a filesystem there are a few more fs-specific errors, as the filesystem itself fails because of this.



Testing with oflag=direct is practically impossible, as it would take way too long (50x-80x longer than non-direct):
# dd if=/dev/urandom oflag=direct of=/dev/mapper/vdo bs=4K status=progress
2912256 bytes (2.9 MB) copied, 18.270155 s, 159 kB/s
Filesystem makes no difference here:
# dd if=/dev/urandom oflag=direct of=vdo/random bs=4K status=progress
7421952 bytes (7.4 MB) copied, 46.748874 s, 159 kB/s
Any idea how to speed this up? (with fio it is 120kB/s)
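(Presumably a larger block size, e.g. bs=1M instead of bs=4K, would speed direct I/O up considerably, since each 4K direct write is a synchronous round trip; bs=4K is kept here for comparability with the earlier runs.)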



And finally I tested a thin provisioned 13.5G LV (here with a 20G pool, but tested before with 13.5G with the same result):
# lvcreate -T -L 20G -n pool rhel_storageqe-74
  Using default stripesize 64.00 KiB.
  Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data.
  Logical volume "pool" created.
# lvcreate -T -V 13.5G -n lv rhel_storageqe-74/pool
  Using default stripesize 64.00 KiB.
  Logical volume "lv" created.
# dd if=/dev/urandom of=/dev/mapper/rhel_storageqe--74-lv bs=4K status=progress
14438154240 bytes (14 GB) copied, 189.413043 s, 76.2 MB/s
dd: error writing ‘/dev/mapper/rhel_storageqe--74-lv’: No space left on device
3538945+0 records in
3538944+0 records out
14495514624 bytes (14 GB) copied, 218.921 s, 66.2 MB/s

No errors in /var/log/messages, only single message when creating thin:
[70505.659801] device-mapper: thin: Data device (dm-3) discard unsupported: Disabling discard passdown.


Also Bryan, I should *not* be able to write 13.5G of data to a VDO created on 13.5G of backing storage. With 2G slabs I should be able to write 10G tops, because the VDO can fit only 5 slabs there, as 2.5G is taken by the index at its default size. Right after 10G is written I should get a "No space left on device" error, which is *not* happening with VDO.

Comment 8 Jakub Krysl 2017-11-29 13:47:37 UTC
I managed to provision a faster disk to test this; here are the results. I reduced the size of the 5.5TB physical disk with LVM to get usable test times. I chose 6GB as the size, because the index will take 2.5GB, which leaves 3.5GB for the VDO; with a slab size of 2GB, only a single slab fits there, leaving the rest unallocated. So in this case the user should be able to write 2GB of data, as that is the size of a single slab.
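
In other words:

   6.0G  total
 - 2.5G  UDS index (default size)
 = 3.5G  left for slabs
 floor(3.5G / 2G per slab) = 1 slab = 2G writeable, 1.5G unallocated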

First I tested thick provisioned LV to be able to compare behavior with vdo:
# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
--snip--
sdc                       8:32   0  5.5T  0 disk
└─vg-lv                 253:2    0    6G  0 lvm

# dd if=/dev/urandom of=/dev/mapper/vg-lv bs=4k status=progress
5009145856 bytes (5.0 GB) copied, 2.000002 s, 2.5 GB/s
dd: error writing ‘/dev/mapper/vg-lv’: No space left on device
1572865+0 records in
1572864+0 records out
6442450944 bytes (6.4 GB) copied, 33.7821 s, 191 MB/s

This disk is much faster: 2.5GB/s. Because of this, the 'No space left on device' showed up with the progress counter at 5GB, because dd could not fit another 2.5GB of buffered data in there. This led to reduced overall speed and to copying 6442450944 bytes, which is exactly 6GB of data. So I tried direct I/O:

# dd if=/dev/urandom oflag=direct of=/dev/mapper/vg-lv bs=4k status=progress
6409650176 bytes (6.4 GB) copied, 112.010316 s, 42.8 MB/s
dd: error writing ‘/dev/mapper/vg-lv’: No space left on device
1572865+0 records in
1572864+0 records out
6442450944 bytes (6.4 GB) copied, 112.648 s, 42.8 MB/s

The results are exactly the same, but given the lower speed (42.8MB/s) the 'no space left' was triggered much later. Next I tested VDO:

# vdo create --name vdo --device /dev/mapper/vg-lv --compression disabled --deduplication disabled
# lsblk
NAME                    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
--snip--
sdc                       8:32   0  5.5T  0 disk
└─vg-lv                 253:2    0    6G  0 lvm
  └─vdo                 253:3    0    6G  0 dm

# dd if=/dev/urandom of=/dev/mapper/vdo bs=4k status=progress
6339649536 bytes (6.3 GB) copied, 35.000409 s, 181 MB/s
dd: error writing ‘/dev/mapper/vdo’: No space left on device
1572865+0 records in
1572864+0 records out
6442450944 bytes (6.4 GB) copied, 36.8855 s, 175 MB/s

/var/log/messages:
[ 6190.532994] Buffer I/O error on dev dm-3, logical block 0, lost async page write
--snip--
[ 6190.599577] Buffer I/O error on dev dm-3, logical block 9, lost async page write
[ 6195.812940] buffer_io_error: 13302 callbacks suppressed
[ 6195.818169] Buffer I/O error on dev dm-3, logical block 13312, lost async page write
--snip--
[ 6195.887855] Buffer I/O error on dev dm-3, logical block 13321, lost async page write
[ 6200.826847] buffer_io_error: 294429 callbacks suppressed
[ 6200.832165] Buffer I/O error on dev dm-3, logical block 307751, lost async page write
--snip--
[ 6200.902778] Buffer I/O error on dev dm-3, logical block 307760, lost async page write
[ 6209.253350] buffer_io_error: 202191 callbacks suppressed
[ 6209.258684] Buffer I/O error on dev dm-3, logical block 509952, lost async page write
--snip--
[ 6209.329127] Buffer I/O error on dev dm-3, logical block 509961, lost async page write
[ 6215.941376] buffer_io_error: 581622 callbacks suppressed
[ 6215.946690] Buffer I/O error on dev dm-3, logical block 1091585, lost async page write
--snip--
[ 6216.018202] Buffer I/O error on dev dm-3, logical block 1091594, lost async page write
[ 6221.083221] buffer_io_error: 286407 callbacks suppressed
[ 6221.088541] Buffer I/O error on dev dm-3, logical block 1378002, lost async page write
--snip--
[ 6221.159739] Buffer I/O error on dev dm-3, logical block 1378011, lost async page write
[ 6222.403541] VFS: Dirty inode writeback failed for block device dm-3 (err=-5).

Note: The first I/O errors appeared almost immediately, while still writing to the first slab, around 1GB of data written. Given the higher speed I managed to test direct I/O here:

# dd if=/dev/urandom oflag=direct of=/dev/mapper/vdo bs=4k status=progress
2139389952 bytes (2.1 GB) copied, 358.143349 s, 6.0 MB/s
dd: error writing ‘/dev/mapper/vdo’: Input/output error
523110+0 records in
523109+0 records out
2142654464 bytes (2.1 GB) copied, 358.582 s, 6.0 MB/s

This finally appears almost correct (it wrote only to the single 2GB slab), and there were no I/O errors in /var/log/messages. Yet the breaking point is still 'I/O error' instead of 'no space left'.
I got a bit curious and tested with /dev/zero; the result is quite different:
# dd if=/dev/zero oflag=direct of=/dev/mapper/vdo bs=4k status=progress
6440042496 bytes (6.4 GB) copied, 801.311495 s, 8.0 MB/s
dd: error writing ‘/dev/mapper/vdo’: No space left on device
1572865+0 records in
1572864+0 records out
6442450944 bytes (6.4 GB) copied, 801.523 s, 8.0 MB/s
Finally 'no space left' and no I/O errors, but the total size is wrong again. I have no idea where those 0s got written; the VDO appears fine at this point. So I checked with hexdump:
# hexdump -C /dev/mapper/vdo -n 6g
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
180000000
0x180000000 = 6442450944 dec = 6GB of 0s, yet I should not be able to read 6GB of 'data' from VDO. (Note: hexdumping the underlying volume shows everything is OK.)
So I dd'd /dev/urandom with direct I/O again to check the real data written:

# hexdump -C /dev/mapper/vdo | tail -n 4
7f657ff0  62 4b 0f 9a 07 bd f5 d9  53 a4 ff 36 b0 31 d2 b8  |bK......S..6.1..|
7f658000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
180000000
It appears that really just 2G exactly was written with direct I/O. So I zeroed the whole VDO again and checked how much data gets written there without direct I/O:

# dd if=/dev/urandom of=/dev/mapper/vdo bs=4k status=progress
6288510976 bytes (6.3 GB) copied, 34.000351 s, 185 MB/s
dd: error writing ‘/dev/mapper/vdo’: No space left on device
1572865+0 records in
1572864+0 records out
6442450944 bytes (6.4 GB) copied, 62.7784 s, 103 MB/s
(/var/log/messages I/O errors of course)
# hexdump -C /dev/mapper/vdo | tail -n 4
7f657ff0  f5 40 ce a4 2f 73 8b 46  6d 61 91 42 21 a5 a6 5f  |.@../s.Fma.B!.._| 
7f658000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
180000000

So this proves the actual data written to the VDO is not larger than 'slab size' * 'number of slabs'.


But here are the issues:
1) Difference between logical/physical size and actual usable size of VDO:
# vdo status | grep Physical
    Physical size: 6G
# vdo status | grep Logical
    Logical size: 6G
Yet as proven, the actual usable size of the VDO is 2G:
# vdo status | grep slab
        slab count: 1
# vdo status | grep Slab
    Slab size: 2G

2) The write stream is not stopped when there is no actual space left to write to, only when it goes over the 'physical' size. So where does the data go? To check this I have to test a filesystem first:

# mkfs.xfs /dev/mapper/vdo
# mount /dev/mapper/vdo vdo
# hexdump -C /dev/mapper/vdo | tail -n 4
120006030  00 00 00 03 53 34 21 b8  00 00 00 00 00 00 00 00  |....S4!.........|
120006040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
180000000
0x120006040 = 4831862848 dec ~= 4.5GB

It appears the creation of the filesystem managed to write some bytes to parts of the VDO normally inaccessible via direct I/O. To test this further, I first dd'd with direct I/O to a file in the mounted directory:


# dd if=/dev/urandom oflag=direct of=vdo/direct bs=4k status=progress
2133049344 bytes (2.1 GB) copied, 359.150537 s, 5.9 MB/s
dd: error writing ‘vdo/direct’: Input/output error
522555+0 records in
522554+0 records out
2140381184 bytes (2.1 GB) copied, 360.112 s, 5.9 MB/s

/var/log/messages after a few seconds:
[12878.030516] XFS (dm-3): metadata I/O error: block 0x6000f8 ("xlog_iodone") error 28 numblks 64        
[12878.039201] XFS (dm-3): Log I/O Error Detected.  Shutting down filesystem                             
[12878.045990] XFS (dm-3): Please umount the filesystem and rectify the problem(s) 

And now the filesystem is inaccessible.

To test this in a more user-like way, I created a 1G file and copied it 6 times (one by one) to the mounted filesystem:

# dd if=/dev/urandom of=1G.file bs=4k count=262144 status=progress
1067139072 bytes (1.1 GB) copied, 6.000082 s, 178 MB/s
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 6.03731 s, 178 MB/s
# ll
-rw-r--r--.  1 root root 1073741824 Nov 29 12:44 1G.file
# cp 1G.file vdo/1
# cp 1G.file vdo/2
# cp 1G.file vdo/3
# cp 1G.file vdo/4
# cp 1G.file vdo/5
# cp 1G.file vdo/6
cp: error writing ‘vdo/6’: No space left on device
cp: failed to extend ‘vdo/6’: No space left on device

(/var/log/messages I/O errors again)
I waited a few seconds between each file.

# dd if=/dev/urandom oflag=direct of=vdo/direct bs=4k status=progress
2133049344 bytes (2.1 GB) copied, 359.150537 s, 5.9 MB/s
dd: error writing ‘vdo/direct’: Input/output error
522555+0 records in
522554+0 records out
2140381184 bytes (2.1 GB) copied, 360.112 s, 5.9 MB/s
and /var/log/messages:
[12878.030516] XFS (dm-3): metadata I/O error: block 0x6000f8 ("xlog_iodone") error 28 numblks 64        
[12878.039201] XFS (dm-3): Log I/O Error Detected.  Shutting down filesystem                             
[12878.045990] XFS (dm-3): Please umount the filesystem and rectify the problem(s) 

As tested above, I should not be able to fit more than 2G in that directory, yet it appears I did (at least for a few seconds):

[root@storageqe-24 ~]# ll vdo
total 6227404
-rw-r--r--. 1 root root 1073741824 Nov 29 13:38 1
-rw-r--r--. 1 root root 1073741824 Nov 29 13:38 2
-rw-r--r--. 1 root root 1073741824 Nov 29 13:38 3
-rw-r--r--. 1 root root 1073741824 Nov 29 13:38 4
-rw-r--r--. 1 root root 1073741824 Nov 29 13:38 5
-rw-r--r--. 1 root root 1008152576 Nov 29 13:38 6
# df -h
Filesystem                                       Size  Used Avail Use% Mounted on
/dev/mapper/vdo                                  6.0G  6.0G  4.2M 100% /root/vdo

And after the system settled again (all the errors had appeared):
# ll vdo
ls: cannot access vdo: Input/output error
# ll
ls: cannot access vdo: Input/output error
total 1049568
-rw-r--r--.  1 root root 1073741824 Nov 29 12:44 1G.file  
--snip--
d??????????  ? ?    ?             ?            ? vdo

Unmounting the directory and mounting it again produces this:
# mount /dev/mapper/vdo vdo
mount: mount /dev/mapper/vdo on /root/vdo failed: No space left on device

/var/log/messages: 
[13967.228227] XFS (dm-3): I/O Error Detected. Shutting down filesystem                                  
[13967.234581] XFS (dm-3): Please umount the filesystem and rectify the problem(s)                       
[13967.241893] XFS (dm-3): metadata I/O error: block 0x600058 ("xlog_bwrite") error 28 numblks 8200      
[13967.250862] XFS (dm-3): failed to locate log tail                                                     
[13967.255561] XFS (dm-3): log mount/recovery failed: error -28                                          
[13967.261259] XFS (dm-3): log mount failed  

So now I am unable to access the data on VDO.

So:
3) I am able to fill the VDO to slab_size*slab_count with direct I/O almost without issues, but doing the same through an existing filesystem leads to the filesystem crashing.

4) Copying some files to the mounted VDO appears fine for a few moments and then the filesystem crashes. This means I lose the part of the write stream that went over the 'slab_size*slab_count' size.

5) At the end I am unable to access the data on the VDO easily. Note: I have not tried to verify the data there or dd it somewhere else.


Sorry for the long post; I tried to be as clear as possible.

Comment 9 Sweet Tea Dorminy 2017-11-29 19:59:51 UTC
Hi Jakub!

Thank you for your very detailed comment. I think I understand what you're doing better every time.

>This finally appears almost correct (wrote only to the single 2GB slab) plus there were no I/O errors in /var/log/messages. Yet still the breaking point is 'I/O error' instead of 'no space left'.

I believe this is a Linux limitation --- I believe kernel processes (such as filesystems) can get ENOSPC or EIO from a write, but userspace processes (such as dd) can only get EIO, which gets reported as 'Input/output error'. If you made a dm-thin device with more logical space than the pool has room for, and used dd to fill it up, I expect you would get EIO there also.

> I got a bit curious and tested with /dev/zero, the result is quite different:
Yes. Even with deduplication turned off, VDO treats writes of zeros specially, using no space to store them. As a result, dd will successfully write to the entire VDO logical space, without being constrained by the 2G slab.
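
(You can see this directly in vdostats: after writing a stretch of zeros, the "Used" column should stay essentially at its baseline; only the block map overhead may grow slightly. A quick sketch using the same tools as above:

# dd if=/dev/zero of=/dev/mapper/vdo oflag=direct bs=4K count=262144
# vdostats --human-readable
)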

>Finally 'no space left'
I believe this is the result of trying to seek off the end of the device.

>1) Difference between logical/physical size and actual usable size of VDO:
Logical size is intended to reflect the amount of space that VDO makes addressable: here that's 6G, but it could just as well be 6.5T on this system.
Physical size is intended to reflect the size of the device under VDO.
The statistic you might be wanting is "physical blocks" - "data blocks used" - "overhead blocks"; that's the number of free 4K blocks. "overhead blocks" will grow slightly as logical addresses are used; you can get around this by writing /dev/zero to the entire VDO device, which uses every logical address; 'overhead blocks' won't change after that (unless you do a growLogical or a growPhysical).
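
(A sketch of reading those counters; the exact field names are assumed to match this version's vdostats --verbose output:

# vdostats --verbose /dev/mapper/vdo | grep -E 'physical blocks|data blocks used|overhead blocks'

and then free 4K blocks = physical blocks - data blocks used - overhead blocks.)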

>2) Because the write stream is not stopped when there is no actual space to write to but when it writes over the 'physical' size, where do the data go?
If it's going through the page cache, it will be lost at some point after it's failed to be written. If it's written directly, it's gone and you get an error.

> It appears the creation of filesystem managed to write some bytes to parts of VDO normally inaccessible by direct I/O. 
Filesystems try to spread out files and metadata across the entire device. dd was using VDO logical addresses from 0 to 2G; the filesystem is using a different 2G of addresses.

># dd if=/dev/urandom oflag=direct of=vdo/direct bs=4k status=progress
> 2133049344 bytes (2.1 GB) copied, 359.150537 s, 5.9 MB/s

This number of bytes, despite dd displaying it as 2.1 GB, is 2034.23 MB, which is slightly less than 2G.

>To test this a bit more user-like, I created 1G file and copied it 6* (1 by 1) to mounted filesystem:
The page cache again. The filesystem wrote all the information required to say 'I'm going to write these 1G files at these logical addresses' but the data is still in the page cache, not on disk. The filesystem said there was no space on the last file, because it uses some space for metadata and it didn't have 1G of logical addresses left for the last file. After that, at some point the page cache decided to actually write the data, VDO ran out of space to store it and started returning errors, and the filesystem stopped operating because it received errors.

You can get around this by running 'sync' after each 'cp', which tells the page cache to start writing everything within. However, sync does not wait for those writes to finish, so it's still possible to get IO errors / no space errors after sync has finished.
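
For example:

# cp 1G.file vdo/1 && sync
# cp 1G.file vdo/2 && sync
...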

>So now I am unable to access the data on VDO.
Yes. Filesystems are not yet thin-provisioning-aware, so they do not react well to their device running out of space. This is a mailing list thread about improving that [https://www.spinics.net/lists/linux-xfs/msg12216.html] but no progress has happened. 

> 3) I am able to fill the vdo to slab_size*slab_count with direct I/O almost without issues, but doing the same with existing filesystem leads to the filesystem crashing.
Yes, filesystems need some space too, so it's hard to know exactly how much you can write to a filesystem atop a VDO.

> 4) Copying some files over to mounted vdo appear fine for few moments and than the filesystem crashes. Which means I loose the part of write stream that went over the 'slab_size*slab_count' size.
Yes, it's the page cache again.

> 5) At the end I am unable to access the data on VDO easily. 
Yes.

Comment 10 Jakub Krysl 2017-11-30 10:57:31 UTC
Hi Sweet Tea,

Thanks for answering some of my concerns, but I still feel like we are not on the same page.

###############################
TL/DR: Skip to the end summary.
###############################

I did some further testing with a 6G thin LV on a 2G thin pool to get an environment similar to the 6G VDO. When creating such a thin provisioned LV, there are clear warnings in the terminal:
Note: I recreated the LV between each dd to provide stable results.

# lvcreate -n pool -T -L 2G vg                                                       
  Using default stripesize 64.00 KiB.
  Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data.
  Logical volume "pool" created.
# lvcreate -n lv -T -V 6G vg/pool                                                    
  Using default stripesize 64.00 KiB.
  WARNING: Sum of all thin volume sizes (6.00 GiB) exceeds the size of thin pool vg/pool (2.00 GiB).
  WARNING: You have not turned on protection against thin pools running out of space.
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
  Logical volume "lv" created.

Direct I/O leads to I/O error on the last write:
# dd if=/dev/urandom of=/dev/mapper/vg-lv bs=4K oflag=direct status=progress
2145849344 bytes (2.1 GB) copied, 277.308546 s, 7.7 MB/s
dd: error writing ‘/dev/mapper/vg-lv’: Input/output error
524289+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 337.773 s, 6.4 MB/s
and no I/O errors in /var/log/messages:
[81469.069613] device-mapper: thin: 253:4: reached low water mark for data device: sending event.
This behaviour is exactly the same as VDO.

Cached I/O results are again almost the same as with VDO, except VDO gives more I/O errors:
# dd if=/dev/zero of=/dev/mapper/vg-lv bs=4K status=progress
5181358080 bytes (5.2 GB) copied, 2.000001 s, 2.6 GB/s
dd: error writing ‘/dev/mapper/vg-lv’: No space left on device
1572865+0 records in
1572864+0 records out
6442450944 bytes (6.4 GB) copied, 74.2104 s, 86.8 MB/s
/var/log/messages:
[86091.588933] buffer_io_error: 371685 callbacks suppressed
[86091.594249] Buffer I/O error on dev dm-6, logical block 524288, lost async page write
--snip--
[86091.664726] Buffer I/O error on dev dm-6, logical block 524297, lost async page write

# dd if=/dev/urandom of=/dev/mapper/vg-lv bs=4K status=progress
6396739584 bytes (6.4 GB) copied, 32.000295 s, 200 MB/s
dd: error writing ‘/dev/mapper/vg-lv’: No space left on device
1572865+0 records in
1572864+0 records out
6442450944 bytes (6.4 GB) copied, 81.9572 s, 78.6 MB/s
/var/log/messages:
[86324.811831] buffer_io_error: 1048566 callbacks suppressed
[86324.817227] Buffer I/O error on dev dm-6, logical block 524288, lost async page write
--snip--
[86324.887643] Buffer I/O error on dev dm-6, logical block 524297, lost async page write

And now with filesystem:
# mkfs.xfs -K /dev/mapper/vg-lv
# hexdump -C /dev/mapper/vg-lv | tail -n 4
150006030  00 00 00 07 3e 9c e0 a8  00 00 00 00 00 00 00 00  |....>...........|
150006040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
180000000
So the filesystem wrote some data all around the LV, same as with VDO.

# mount /dev/mapper/vg-lv lv
# dd if=/dev/urandom of=lv/direct oflag=direct bs=4K status=progress
2134970368 bytes (2.1 GB) copied, 390.701373 s, 5.5 MB/s
dd: error writing ‘lv/direct’: Input/output error
521553+0 records in
521552+0 records out
2136276992 bytes (2.1 GB) copied, 451.014 s, 4.7 MB/s
And nothing in /var/log/messages. This is not the same as VDO: at this point the filesystem on VDO had crashed and become unusable, and the user data became inaccessible.

# dd if=/dev/urandom of=lv/normal bs=4K status=progress
6383173632 bytes (6.4 GB) copied, 100.772492 s, 63.3 MB/s
dd: error writing ‘lv/direct’: No space left on device
1562008+0 records in
1562007+0 records out
6397980672 bytes (6.4 GB) copied, 100.855 s, 63.4 MB/s
/var/log/messages:
[87430.594163] buffer_io_error: 1048566 callbacks suppressed
[87430.599570] Buffer I/O error on dev dm-6, logical block 632960, lost async page write
--snip--
[87430.670056] Buffer I/O error on dev dm-6, logical block 632969, lost async page write
Again no filesystem crash.

# cp 1G.file lv/1
# cp 1G.file lv/2
# cp 1G.file lv/3
# cp 1G.file lv/4
# cp 1G.file lv/5
# cp 1G.file lv/6
cp: error writing ‘lv/6’: No space left on device
cp: failed to extend ‘lv/6’: No space left on device
# ll lv
total 6232412
-rw-r--r--. 1 root root 1073741824 Nov 30 10:10 1
-rw-r--r--. 1 root root 1073741824 Nov 30 10:11 2
-rw-r--r--. 1 root root 1073741824 Nov 30 10:11 3
-rw-r--r--. 1 root root 1073741824 Nov 30 10:11 4
-rw-r--r--. 1 root root 1073741824 Nov 30 10:11 5
-rw-r--r--. 1 root root 1013280768 Nov 30 10:11 6
# df -h
Filesystem                                       Size  Used Avail Use% Mounted on
--snip--
/dev/mapper/vg-lv                                6.0G  6.0G  7.3M 100% /root/lv
/var/log/messages:
# [87649.603853] buffer_io_error: 1060157 callbacks suppressed
[87649.609252] Buffer I/O error on dev dm-6, logical block 369680, lost async page write
--snip
[87649.679688] Buffer I/O error on dev dm-6, logical block 369689, lost async page write
Again no filesystem crash.

###############################
TL/DR: Summary starts here.
###############################

So comparing the 6G thin LV on a 2G thin pool with the 6G VDO (1 x 2G slab), here is the summary:

1) The write stream is lost in both scenarios, so as long as this is documented (and thus expected by the user), this is fine.

2) The filesystem does not crash on the thin LV but crashes on VDO, which is very bad. This means the user is not able to get to their data. This should not happen and should be fixed.

3) There should be a clear warning when creating a 'thinly provisioned' VDO.

4) A default VDO should not be thinly provisioned, as this was my main source of confusion and I expect a common user to behave in a similar way. So change the default logical size to slab_size * slab_count to reflect the real writeable storage size. This way even common tools like df will reflect this size, so there will be much less confusion. Then the user can choose to increase the logical size however they wish.

5) The VDO physical size is another source of confusion. At the beginning I expected this to be the real writeable disk space the user can put data on. So maybe switching the logic there and setting it to slab_size * slab_count, to reflect the real user-writeable size, is the way to go.
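
(For reference, the logical size can already be set explicitly at creation time; a sketch, assuming the --vdoLogicalSize option of the vdo tool behaves as documented:

# vdo create --name vdo --device /dev/mapper/vg-lv --vdoLogicalSize 2G --compression disabled --deduplication disabled

Points 4 and 5 are essentially about making a slab_size * slab_count value like this the default instead of requiring the user to compute it.)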

Each of these points should have its own BZ to make it easier to discuss/fix/verify. So when we come to an agreement on which of these points are feasible, I will create the BZs for them, with this one depending on them to provide the link. Number 2 is definitely a priority and possibly blocker material too, as it is about user data.

Comment 11 Jakub Krysl 2017-11-30 14:57:46 UTC
As per discussion I will create BZs to cover each point and link them here, so we can continue on each point separately.

Comment 14 Magnus Glantz 2018-11-27 12:23:19 UTC
I also hit this on RHEL 8 Beta, with:
vdo-6.2.0.239-8.el8.x86_64
kmod-kvdo-6.2.0.239-34.el8.x86_64

Comment 15 Magnus Glantz 2018-11-27 12:31:44 UTC
From my system, when filling up a VDO device (on LVM, or directly on a raw device, i.e. a virtual disk of a VM):

Nov 27 05:35:26 htb-eight kernel: Buffer I/O error on dev dm-2, logical block 0, lost async page write
Nov 27 05:35:51 htb-eight kernel: Buffer I/O error on dev dm-2, logical block 0, lost async page write

Comment 16 Magnus Glantz 2018-12-05 13:49:46 UTC
This problem also exists in RHEL8 Beta Snapshot 1 (latest at time of writing)

Comment 19 Bryan Gurney 2019-03-12 18:17:12 UTC
The KCS article "Managing Thin Provisioning with Virtual Data Optimizer" has been published at https://access.redhat.com/articles/3966841

Comment 21 Bryan Gurney 2019-04-11 14:16:34 UTC

*** This bug has been marked as a duplicate of bug 1657152 ***

