Bug 1119839 - [RFE] LVM Thin: Enable use of error_if_no_space
Summary: [RFE] LVM Thin: Enable use of error_if_no_space
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 7.1
Assignee: Zdenek Kabelac
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 1059771 (view as bug list)
Depends On:
Blocks: 1044717 1059771 1119323
 
Reported: 2014-07-15 15:45 UTC by Jonathan Earl Brassow
Modified: 2015-03-05 13:09 UTC (History)
12 users

Fixed In Version: lvm2-2.02.114-5.el7
Doc Type: Enhancement
Doc Text:
lvm2 should support returning instant errors when a thin pool gets out of space. In some cases, the user does not want to resize the thin pool and thus does not want to use the default queue policy when the pool gets full (which now times out in 60 seconds). A new lvcreate and lvchange option, --errorwhenfull {y|n}, has been implemented to control the thin pool's behavior.
Clone Of:
Environment:
Last Closed: 2015-03-05 13:09:21 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:0513 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2015-03-05 16:14:41 UTC

Description Jonathan Earl Brassow 2014-07-15 15:45:33 UTC
Some users would rather have thin volumes give errors instead of blocking when they run out of space.  This capability exists in the kernel.  Can we add a tunable to LVM for this?

Comment 3 Jonathan Earl Brassow 2014-10-29 23:34:36 UTC
*** Bug 1059771 has been marked as a duplicate of this bug. ***

Comment 4 Alasdair Kergon 2014-12-02 00:01:44 UTC
Add THIN_FEATURE_ERROR_WHEN_FULL and set it if supported by the kernel target version.
Add it to global/thin_disabled_features (example.conf) as "error_when_full".

Pass the error_if_no_space feature arg on the target line if available and requested.
Add error_when_full to the segment metadata.
Add --errorwhenfull to lvcreate/lvchange to store it on disk.
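A rough sketch of how the two knobs from the plan above fit together, for anyone reading along. This assumes a system with device-mapper present; the version string shown is illustrative, and the lvm.conf fragment mirrors the thin_disabled_features setting named in this comment:

```shell
# Check whether the running dm thin-pool target advertises a version
# new enough to carry the error_if_no_space feature flag:
dmsetup targets | grep thin-pool
# (prints something like "thin-pool        v1.14.0" - version varies by kernel)

# Even on a capable kernel, the feature can be opted out of globally
# by listing it in lvm.conf, per this comment's plan:
#
#   global {
#       thin_disabled_features = [ "error_when_full" ]
#   }
```

The per-pool --errorwhenfull switch then only takes effect when the kernel supports the feature and it has not been disabled here.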

Comment 5 Zdenek Kabelac 2014-12-16 09:11:27 UTC
I assume we want a similar usage style here as with cache.

So, since for cache pools we have --cachepolicy & --cachesettings,
we may use --thinpolicy and --thinsettings.
(Or maybe rather a [thinpool] prefix?)

Then the command itself could even handle --policy and --settings and deduce the proper prefix from context?

lvcreate --policy=[wait|error] ?

And the settings for 'wait' could include a timeout, so dmeventd could switch the pool to 'error' itself in case there is no action and the resize of meta/data fails.

The kernel currently has a built-in 30s timer, so the user would need to reinsert the module with different settings if he wants a longer timeout.

For the 'error' policy there are likely no settings?

Comment 6 Zdenek Kabelac 2015-01-14 14:29:40 UTC
Initial implementation went upstream with patch:

https://www.redhat.com/archives/lvm-devel/2015-January/msg00022.html

(not yet with support for lvchange - will follow up)

Comment 7 Zdenek Kabelac 2015-01-14 14:33:55 UTC
Commands supported for now:

lvcreate --errorwhenfull y  -T -Lsize  vgname/poolname

lvs -o+lv_error_when_full,lv_health_status

The 9th lvs attr character shows F (failed), D (out of data), M (metadata read only), or X (unknown).

Comment 9 Peter Rajnoha 2015-01-21 10:40:13 UTC
(In reply to Zdenek Kabelac from comment #7)
> For now supported commands:
> 
> lvcreate --errorwhenfull y  -T -Lsize  vgname/poolname
> 
> lvs -o+lv_error_when_full,lv_health_status

We changed that to lv_when_full with values "error", "queue" or "" (blank for undefined, e.g. if the LV is not a thin pool):

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=7bcb3fb02d6aacc566871326c0d01c331497a5b2

$ lvs -o+lv_when_full vg/pool vg/pool1 vg/linear_lv
  LV        VG   Attr       LSize Data%  Meta%  WhenFull       
  linear_lv vg   -wi-a----- 4.00m                              
  pool      vg   twi-aotz-- 4.00m 0.00   0.98   queue          
  pool1     vg   twi-aotzD- 4.00m 100.00 0.98   error

For -S|--select these synonyms are recognized:
  "error" -> "error when full", "error if no space"
  "queue" -> "queue when full", "queue if no space"
       "" -> "undefined"

This will appear in today's new build.

Comment 10 Corey Marthaler 2015-01-27 00:15:06 UTC
This appears to work properly for thin volumes; marking verified in the latest rpms.


3.10.0-225.el7.x86_64
lvm2-2.02.115-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
lvm2-libs-2.02.115-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
lvm2-cluster-2.02.115-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-1.02.93-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-libs-1.02.93-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-event-1.02.93-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-event-libs-1.02.93-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015
device-mapper-persistent-data-0.4.1-2.el7    BUILT: Wed Nov 12 12:39:46 CST 2014
cmirror-2.02.115-2.el7    BUILT: Thu Jan 22 06:09:14 CST 2015



[root@host-116 ~]# lvcreate --errorwhenfull y  -T -L100M  vg/POOL1
  Logical volume "POOL1" created.
[root@host-116 ~]# lvcreate --errorwhenfull n  -T -L100M  vg/POOL2
  Logical volume "POOL2" created.
[root@host-116 ~]# lvcreate --virtualsize 500M --thinpool vg/POOL1 -n virt_1
  Logical volume "virt_1" created.
[root@host-116 ~]# lvcreate --virtualsize 500M --thinpool vg/POOL2 -n virt_2
  Logical volume "virt_2" created.

[root@host-116 ~]# lvs -a -o +lv_health_status,lv_when_full
  LV              VG  Attr       LSize   Pool  Data%  Meta% Health WhenFull       
  POOL1           vg  twi-aotz-- 100.00m       0.00   0.98         error
  [POOL1_tdata]   vg  Twi-ao---- 100.00m 
  [POOL1_tmeta]   vg  ewi-ao----   4.00m
  POOL2           vg  twi-aotz-- 100.00m       0.00   0.98         queue
  [POOL2_tdata]   vg  Twi-ao---- 100.00m
  [POOL2_tmeta]   vg  ewi-ao----   4.00m
  [lvol0_pmspare] vg  ewi-------   4.00m
  virt_1          vg  Vwi-a-tz-- 500.00m POOL1 0.00
  virt_2          vg  Vwi-a-tz-- 500.00m POOL2 0.00

# FIRST ERROR:
[root@host-116 ~]# dd if=/dev/zero of=/dev/vg/virt_1 bs=1M count=600
dd: error writing ‘/dev/vg/virt_1’: No space left on device
501+0 records in
500+0 records out
524288000 bytes (524 MB) copied, 1.7768 s, 295 MB/s

Jan 26 18:04:50 host-116 kernel: device-mapper: thin: 253:4: reached low water mark for data device: sending event.
Jan 26 18:04:50 host-116 lvm[15225]: Thin vg-POOL1-tpool is now 100% full.
Jan 26 18:04:50 host-116 kernel: device-mapper: thin: 253:4: switching pool to out-of-data-space mode
Jan 26 18:04:50 host-116 kernel: Buffer I/O error on device dm-9, logical block 25600
Jan 26 18:04:50 host-116 kernel: lost page write due to I/O error on dm-9

[root@host-116 ~]# lvs -a -o +lv_health_status,lv_when_full
  LV              VG  Attr       LSize   Pool  Data%  Meta% Health       WhenFull       
  POOL1           vg  twi-aotzD- 100.00m       100.00 2.15  out_of_data  error
  [POOL1_tdata]   vg  Twi-ao---- 100.00m
  [POOL1_tmeta]   vg  ewi-ao----   4.00m
  POOL2           vg  twi-aotz-- 100.00m       0.00   0.98               queue
  [POOL2_tdata]   vg  Twi-ao---- 100.00m
  [POOL2_tmeta]   vg  ewi-ao----   4.00m
  [lvol0_pmspare] vg  ewi-------   4.00m
  virt_1          vg  Vwi-a-tz-- 500.00m POOL1 20.00
  virt_2          vg  Vwi-a-tz-- 500.00m POOL2 0.00

# SECOND QUEUE:
[root@host-116 ~]# dd if=/dev/zero of=/dev/vg/virt_2 bs=1M count=600
[ HANGS (as expected) ]

Jan 26 18:07:34 host-116 kernel: device-mapper: thin: 253:7: reached low water mark for data device: sending event.
Jan 26 18:07:34 host-116 lvm[15225]: Thin vg-POOL2-tpool is now 100% full.
Jan 26 18:07:34 host-116 kernel: device-mapper: thin: 253:7: switching pool to out-of-data-space mode

[root@host-116 ~]# lvs -a -o +lv_health_status,lv_when_full
  LV              VG  Attr       LSize   Pool  Data%  Meta% Health       WhenFull       
  POOL1           vg  twi-aotzD- 100.00m       100.00 2.15  out_of_data  error
  [POOL1_tdata]   vg  Twi-ao---- 100.00m
  [POOL1_tmeta]   vg  ewi-ao----   4.00m
  POOL2           vg  twi-aotzD- 100.00m       100.00 2.15  out_of_data  queue
  [POOL2_tdata]   vg  Twi-ao---- 100.00m
  [POOL2_tmeta]   vg  ewi-ao----   4.00m
  [lvol0_pmspare] vg  ewi-------   4.00m
  virt_1          vg  Vwi-a-tz-- 500.00m POOL1 20.00
  virt_2          vg  Vwi-aotz-- 500.00m POOL2 20.00

Comment 11 Nenad Peric 2015-01-27 07:46:15 UTC
Just some additional notes in case someone encounters this situation and is not sure what to do from that point on.

A user should expect that at this point (100% full) the thin pool is effectively broken/corrupted, and that it has to be manually fixed and its size increased.
It _cannot_ be resized while it is in the errored-out state.

The way to do so is to deactivate the pool, then activate it again (which will initiate an internal thin_check), then do an lvconvert --repair, and finally resize the pool so it is no longer 100% full.
A note: some data stored on thin LVs in this thin pool may be missing or corrupted due to the overfilling of the pool, so any FS on those LVs should undergo deep file-system checks afterwards as well (though it is not certain that even that guarantees data consistency).

Another way is to reboot :) but an lvconvert --repair + resize still has to be executed after the machine boots.


In short, you should never allow the thin pool to get to a state of being full; set (i.e. reduce) the auto-extend threshold in lvm.conf (thin_pool_autoextend_threshold) to something lower than 100%, depending on your pool size.
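The recovery sequence described above might look roughly like the following. The pool/VG names and the extend size are illustrative (taken from the test session earlier in this bug); treat this as a sketch of the described procedure, not a verified runbook:

```shell
# 1. Deactivate the full pool (its thin LVs must be deactivated too):
lvchange -an vg/POOL1

# 2. Reactivate it - activation triggers the internal thin_check:
lvchange -ay vg/POOL1

# 3. Repair the pool metadata:
lvconvert --repair vg/POOL1

# 4. Grow the pool so it is no longer 100% full:
lvextend -L +100M vg/POOL1

# 5. Deep-check any filesystem on the thin LVs that were hit
#    (per the note above, even this may not guarantee consistency):
fsck -f /dev/vg/virt_1

# And to avoid a repeat, lower the auto-extend threshold in lvm.conf:
#   activation {
#       thin_pool_autoextend_threshold = 70
#   }
```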

Comment 13 errata-xmlrpc 2015-03-05 13:09:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0513.html

