Bug 1846036

Summary: LVMVDO volume does not reclaim disk space, eventually becomes read-only & fsck reports filesystem error
Product: Red Hat Enterprise Linux 8
Reporter: Petr Beranek <pberanek>
Component: lvm2
lvm2 sub component: Other
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
CC: agk, heinzm, jbrassow, msnitzer, pasik, prajnoha, zkabelac
Version: 8.2
Flags: pm-rhel: mirror+
Target Milestone: rc
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2020-06-16 13:59:33 UTC

Description Petr Beranek 2020-06-10 16:01:21 UTC
Description of problem:
When writing to and deleting data from an LVMVDO volume, the volume does not
reclaim the space freed by the deletion. When checking volume space
utilization via `lvs', the "Data%" value never decreases (although e.g. the `df'
command does reflect the deletion and reports the expected values). Eventually,
when the "Data%" value of `vdo_pool' reaches 100.00, the LVMVDO volume ends up
in read-only mode and `fsck' reports a filesystem error.


Version-Release number of selected component (if applicable):
lvm2-2.03.08-3.el8.x86_64 (RHEL 8.2.1)
lvm2-2.03.09-2.el8.x86_64 (RHEL 8.3.0)
(other versions were not tested)


How reproducible:
always (see steps below)


Steps to Reproduce:
prerequisites: 3 disks, 5GB each
1. pvcreate /dev/sd{a,b,c}
2. vgcreate vdovg /dev/sd{a,b,c}
3. lvcreate --type vdo -n vdolv -L 10G -V 20G vdovg/vdo_pool
4. mkfs.ext4 -E nodiscard /dev/vdovg/vdolv
5. mkdir /mnt/vdolv
6. mount /dev/vdovg/vdolv /mnt/vdolv/
*  check logical volumes stats using `lvs' and `df -h' 
7. dd if=/dev/urandom of=/mnt/vdolv/urandom_data.bin bs=1G count=5 iflag=fullblock
   # the overall result is exactly the same if you copy the random
   # data using `cp' instead of writing it directly with the `dd' command
*  check logical volumes stats using `lvs' and `df -h'
8. rm /mnt/vdolv/urandom_data.bin
*  check logical volumes stats using `lvs' and `df -h'
9. repeat step #7


Actual results:
dd: error writing '/mnt/vdolv/urandom_data.bin': Read-only file system
2+0 records in
1+0 records out 
1384009728 bytes (1.4 GB, 1.3 GiB) copied, 41.5512 s, 33.3 MB/s

[root@virt-175 ~]# lvs 
  LV       VG            Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root     rhel_virt-175 -wi-ao----  <6.20g
  swap     rhel_virt-175 -wi-ao---- 820.00m
  vdo_pool vdovg         dwi-------  10.00g                 100.00
  vdolv    vdovg         vwi-aov---  20.00g vdo_pool        29.94

[root@virt-175 ~]# df -h
Filesystem                       Size  Used Avail Use% Mounted on
devtmpfs                         991M     0  991M   0% /dev
tmpfs                           1011M     0 1011M   0% /dev/shm
tmpfs                           1011M   22M  989M   3% /run
tmpfs                           1011M     0 1011M   0% /sys/fs/cgroup
/dev/mapper/rhel_virt--175-root  6.2G  3.1G  3.2G  49% /
/dev/vda1                       1014M  194M  821M  20% /boot
tmpfs                            202M     0  202M   0% /run/user/0
/dev/mapper/vdovg-vdolv           20G  1.4G   18G   8% /mnt/vdolv

[root@virt-175 ~]# umount /mnt/vdolv/
[root@virt-175 ~]# fsck.ext4 /dev/vdovg/vdolv
e2fsck 1.45.4 (23-Sep-2019)
/dev/vdovg/vdolv: recovering journal
Superblock needs_recovery flag is clear, but journal has data.
Run journal anyway<y>? yes
fsck.ext4: unable to set superblock flags on /dev/vdovg/vdolv


/dev/vdovg/vdolv: ********** WARNING: Filesystem still has errors **********


Expected results:
All data (5GB) has been written. `vdolv' volume is available for further use.

Comment 1 Zdenek Kabelac 2020-06-16 13:58:54 UTC
Getting to step 8: since the filesystem is mounted without immediate discard (and this is usually the recommended way), after 'rm' it is the user's responsibility to initiate trimming to release the fs blocks - by running fstrim as step 8.
(But fstrim is quite a SLOW operation with VDO volumes compared to Thin volumes.)
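
As an illustration, a minimal sketch of the two usual options (using the volume and mount point names from the reproducer above; not verified on this particular system):

    # Option A: trim on demand after deleting data (the recommended pattern)
    fstrim -v /mnt/vdolv

    # Option B: mount with immediate discard so deleted blocks are returned automatically
    # (convenient, but continuous discards can be slow on VDO)
    mount -o discard /dev/vdovg/vdolv /mnt/vdolv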

So the report looks like a misusage of VDO volumes - but let's add a few more comments:

The primary goal *IS* to avoid hitting the out-of-space state. Once the user reaches a 'full' pool (applies to both Thin & VDO), the user has to deal with the consequences.  The most usable 'recovery' scenario is to extend the pool to accommodate more of the user's data.
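
A minimal sketch of such an extension, assuming there is (or is added) free space in the VG - the extra disk /dev/sdd is purely hypothetical:

    vgextend vdovg /dev/sdd          # only needed if the VG has no free extents left (hypothetical disk)
    lvextend -L +5G vdovg/vdo_pool   # grow the VDO pool so it can accept more data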

Once the pool is out of space, there is no easy way to 'repair' e.g. a filesystem located on such a device,
as there are no free blocks left for the filesystem fsck operation to write (with VDO the situation is even worse,
since even an overwrite of an already owned block may require a few new blocks in the pool).

Unlike filesystems like btrfs/zfs, the combination of ext4 & a provisioned device works with two different entities.

So to avoid the out-of-space state above, the user should enable/use autoextension of the VDO pool device when more data is written to it than is currently available.
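
For reference, a sketch of the lvm.conf settings meant to drive this autoextension (assuming dmeventd monitoring is enabled; the values are only examples - check lvm.conf(5) of the installed lvm2 version):

    # /etc/lvm/lvm.conf
    activation {
            # autoextend the VDO pool once it is 70% full, by 20% of its current size
            vdo_pool_autoextend_threshold = 70
            vdo_pool_autoextend_percent = 20
    }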

The user cannot treat out-of-space on a VDO device as a case 'similar' to an out-of-space filesystem - these 2 cases are very different!

When the filesystem 'exhausts' ALL blocks of the provisioned device (be it Thin or VDO), it may not be able to further update even its own metadata - the user basically reaches a 'dead end': unmounting (when possible) is required, and before fsck *new space* has to be added to the pool, so the 'repair' can proceed and acquire new empty blocks in the pool.

Once the filesystem is 'repaired/fixed', the user can run e.g. 'fstrim' to reclaim free blocks and return them back to the pool.
(With a VDO volume, fstrim can be a very lengthy/slow operation for a big VDO pool!)
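
Put together, a sketch of the recovery sequence described above (names taken from the reproducer; the +5G size is an arbitrary example):

    umount /mnt/vdolv                  # if the volume is still mounted
    lvextend -L +5G vdovg/vdo_pool     # give the pool free blocks before attempting any repair
    fsck.ext4 -f /dev/vdovg/vdolv      # the repair now has space to write into
    mount /dev/vdovg/vdolv /mnt/vdolv
    fstrim -v /mnt/vdolv               # return unused fs blocks to the pool (can be slow on VDO)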

Comment 3 Petr Beranek 2020-06-19 13:18:04 UTC
Thank you, Zdenek, for the clarification. This important detail, that the user is
responsible for discarding unused fs blocks, is missing from the current lvmvdo(7)
manpage and it doesn't seem to be obvious from the product itself. Our
users/customers should be explicitly warned about this, or better, we should
also provide recommendations on how to deal with it.

This risk may be mitigated by proper volume monitoring/autoextension, but
in any case we should not expect that all our users/customers always have
sufficient VDO expertise.

Therefore I have opened a bug (https://bugzilla.redhat.com/show_bug.cgi?id=1849009)
related to the current lvmvdo documentation.