Bug 1119839
Summary: | [RFE] LVM Thin: Enable use of error_if_no_space | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Jonathan Earl Brassow <jbrassow> |
Component: | lvm2 | Assignee: | Zdenek Kabelac <zkabelac> |
lvm2 sub component: | Thin Provisioning | QA Contact: | Cluster QE <mspqa-list> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | high | CC: | agk, cmarthal, dsulliva, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, thornber, tlavigne, zkabelac |
Version: | 7.0 | Keywords: | FutureFeature, Triaged |
Target Milestone: | rc | ||
Target Release: | 7.1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | lvm2-2.02.114-5.el7 | Doc Type: | Enhancement |
Doc Text: |
lvm2 now supports returning an immediate error when a thin pool runs out of space.
In some cases the user does not want to resize the thin pool and therefore does not want the default queue policy when the pool gets full (which now times out after 60 seconds).
A new lvcreate and lvchange option, --errorwhenfull {y|n}, has been implemented to control the thin pool's behavior. (A brief usage sketch follows the summary table below.)
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2015-03-05 13:09:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1044717, 1059771, 1119323 |
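For orientation, a minimal usage sketch of the option described in the Doc Text above. The volume group, pool names, and sizes here are illustrative and not taken from this report; the flags themselves (--errorwhenfull, the lv_when_full field) are the ones the report introduces.

    # Create a thin pool that returns errors immediately once it runs out of space
    lvcreate --errorwhenfull y -T -L 1G vg/pool_err

    # Or keep the default queueing behaviour and switch it later with lvchange
    lvcreate --errorwhenfull n -T -L 1G vg/pool_queue
    lvchange --errorwhenfull y vg/pool_queue

    # Check the configured behaviour
    lvs -o name,lv_when_full vg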
Description (Jonathan Earl Brassow, 2014-07-15 15:45:33 UTC)
*** Bug 1059771 has been marked as a duplicate of this bug. ***

Plan:

- Add THIN_FEATURE_ERROR_WHEN_FULL and set it if supported by the kernel target version.
- Add it to global/thin_disabled_features (example.conf) as "error_when_full".
- Set the error_if_no_space feature argument on the target line if available and requested.
- Add error_when_full to the segment metadata.
- Add --errorwhenfull to lvcreate/lvchange and store it on disk.

I assume we want a similar usage style here as with cache. With cache pools we have --cachepolicy & --cachesettings, so we may use --thinpolicy and --thinsettings (or maybe a [thinpool] prefix?). The command itself could then even handle --policy and --settings and deduce the proper prefix from context:

    lvcreate --policy=[wait|error] ?

Settings for 'wait' could include a timeout, so dmeventd could itself switch the pool to error mode if no action is taken and the resize of metadata/data fails. The kernel currently has a built-in 30s timer, so the user would need to reinsert the module with different settings for a longer timeout. For the 'error' policy there are likely no settings.

Initial implementation went upstream with this patch (not yet with support for lvchange - that will follow):
https://www.redhat.com/archives/lvm-devel/2015-January/msg00022.html

For now supported commands:

    lvcreate --errorwhenfull y -T -Lsize vgname/poolname
    lvs -o+lv_error_when_full,lv_health_status

lvs attr 9 shows F (failed), D (out of data), M (metadata read only), X (unknown).

(In reply to Zdenek Kabelac from comment #7)
> For now supported commands:
>
> lvcreate --errorwhenfull y -T -Lsize vgname/poolname
>
> lvs -o+lv_error_when_full,lv_health_status

We changed that to lv_when_full with values "error", "queue" or "" (blank for undefined, i.e. if the LV is not a thin pool):
https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=7bcb3fb02d6aacc566871326c0d01c331497a5b2

    $ lvs -o+lv_when_full vg/pool vg/pool1 vg/linear_lv
      LV        VG Attr       LSize Data%  Meta% WhenFull
      linear_lv vg -wi-a----- 4.00m
      pool      vg twi-aotz-- 4.00m   0.00  0.98 queue
      pool1     vg twi-aotzD- 4.00m 100.00  0.98 error

For -S|--select these synonyms are recognized:

    "error" -> "error when full", "error if no space"
    "queue" -> "queue when full", "queue if no space"
    ""      -> "undefined"

This will appear in today's new build.
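As a side note, a minimal selection sketch, not taken from this report: assuming a build that already reports lv_when_full, the field and the synonyms listed above should be usable with -S|--select roughly like this (exact selection syntax may differ by version):

    # List thin pools configured to error out when full
    lvs -a -o lv_name,lv_attr,lv_when_full -S 'lv_when_full=error'

    # The documented synonym form should match the same pools
    lvs -a -o lv_name,lv_when_full -S 'lv_when_full="error when full"'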
This appears to work properly for thin volumes, marking verified in the latest rpms.

    3.10.0-225.el7.x86_64
    lvm2-2.02.115-2.el7                        BUILT: Thu Jan 22 06:09:14 CST 2015
    lvm2-libs-2.02.115-2.el7                   BUILT: Thu Jan 22 06:09:14 CST 2015
    lvm2-cluster-2.02.115-2.el7                BUILT: Thu Jan 22 06:09:14 CST 2015
    device-mapper-1.02.93-2.el7                BUILT: Thu Jan 22 06:09:14 CST 2015
    device-mapper-libs-1.02.93-2.el7           BUILT: Thu Jan 22 06:09:14 CST 2015
    device-mapper-event-1.02.93-2.el7          BUILT: Thu Jan 22 06:09:14 CST 2015
    device-mapper-event-libs-1.02.93-2.el7     BUILT: Thu Jan 22 06:09:14 CST 2015
    device-mapper-persistent-data-0.4.1-2.el7  BUILT: Wed Nov 12 12:39:46 CST 2014
    cmirror-2.02.115-2.el7                     BUILT: Thu Jan 22 06:09:14 CST 2015

    [root@host-116 ~]# lvcreate --errorwhenfull y -T -L100M vg/POOL1
      Logical volume "POOL1" created.
    [root@host-116 ~]# lvcreate --errorwhenfull n -T -L100M vg/POOL2
      Logical volume "POOL2" created.
    [root@host-116 ~]# lvcreate --virtualsize 500M --thinpool vg/POOL1 -n virt_1
      Logical volume "virt_1" created.
    [root@host-116 ~]# lvcreate --virtualsize 500M --thinpool vg/POOL2 -n virt_2
      Logical volume "virt_2" created.

    [root@host-116 ~]# lvs -a -o +lv_health_status,lv_when_full
      LV              VG Attr       LSize   Pool  Data%  Meta% Health      WhenFull
      POOL1           vg twi-aotz-- 100.00m         0.00  0.98             error
      [POOL1_tdata]   vg Twi-ao---- 100.00m
      [POOL1_tmeta]   vg ewi-ao----   4.00m
      POOL2           vg twi-aotz-- 100.00m         0.00  0.98             queue
      [POOL2_tdata]   vg Twi-ao---- 100.00m
      [POOL2_tmeta]   vg ewi-ao----   4.00m
      [lvol0_pmspare] vg ewi-------   4.00m
      virt_1          vg Vwi-a-tz-- 500.00m POOL1   0.00
      virt_2          vg Vwi-a-tz-- 500.00m POOL2   0.00

    # FIRST ERROR:
    [root@host-116 ~]# dd if=/dev/zero of=/dev/vg/virt_1 bs=1M count=600
    dd: error writing ‘/dev/vg/virt_1’: No space left on device
    501+0 records in
    500+0 records out
    524288000 bytes (524 MB) copied, 1.7768 s, 295 MB/s

    Jan 26 18:04:50 host-116 kernel: device-mapper: thin: 253:4: reached low water mark for data device: sending event.
    Jan 26 18:04:50 host-116 lvm[15225]: Thin vg-POOL1-tpool is now 100% full.
    Jan 26 18:04:50 host-116 kernel: device-mapper: thin: 253:4: switching pool to out-of-data-space mode
    Jan 26 18:04:50 host-116 kernel: Buffer I/O error on device dm-9, logical block 25600
    Jan 26 18:04:50 host-116 kernel: lost page write due to I/O error on dm-9

    [root@host-116 ~]# lvs -a -o +lv_health_status,lv_when_full
      LV              VG Attr       LSize   Pool  Data%  Meta% Health      WhenFull
      POOL1           vg twi-aotzD- 100.00m       100.00  2.15 out_of_data error
      [POOL1_tdata]   vg Twi-ao---- 100.00m
      [POOL1_tmeta]   vg ewi-ao----   4.00m
      POOL2           vg twi-aotz-- 100.00m         0.00  0.98             queue
      [POOL2_tdata]   vg Twi-ao---- 100.00m
      [POOL2_tmeta]   vg ewi-ao----   4.00m
      [lvol0_pmspare] vg ewi-------   4.00m
      virt_1          vg Vwi-a-tz-- 500.00m POOL1  20.00
      virt_2          vg Vwi-a-tz-- 500.00m POOL2   0.00

    # SECOND QUEUE:
    [root@host-116 ~]# dd if=/dev/zero of=/dev/vg/virt_2 bs=1M count=600
    [ HANG (as expected) ]

    Jan 26 18:07:34 host-116 kernel: device-mapper: thin: 253:7: reached low water mark for data device: sending event.
    Jan 26 18:07:34 host-116 lvm[15225]: Thin vg-POOL2-tpool is now 100% full.
    Jan 26 18:07:34 host-116 kernel: device-mapper: thin: 253:7: switching pool to out-of-data-space mode

    [root@host-116 ~]# lvs -a -o +lv_health_status,lv_when_full
      LV              VG Attr       LSize   Pool  Data%  Meta% Health      WhenFull
      POOL1           vg twi-aotzD- 100.00m       100.00  2.15 out_of_data error
      [POOL1_tdata]   vg Twi-ao---- 100.00m
      [POOL1_tmeta]   vg ewi-ao----   4.00m
      POOL2           vg twi-aotzD- 100.00m       100.00  2.15 out_of_data queue
      [POOL2_tdata]   vg Twi-ao---- 100.00m
      [POOL2_tmeta]   vg ewi-ao----   4.00m
      [lvol0_pmspare] vg ewi-------   4.00m
      virt_1          vg Vwi-a-tz-- 500.00m POOL1  20.00
      virt_2          vg Vwi-aotz-- 500.00m POOL2  20.00

Just some additional notes if someone encounters this situation and is not sure what to do from that point on.

A user should expect that at this point (100% full) the thin pool is effectively broken/corrupted and should be manually fixed and its size increased. It _cannot_ be resized while it is in the errored-out state. The way to do so is to deactivate the pool, then activate it again (which will initiate an internal thin_check), then do an lvconvert --repair, and finally resize the pool so it is no longer 100% full.

A note: some data stored on thin LVs in this thin pool may be missing or corrupted due to overfilling of the thin pool, so any filesystem on those LVs should undergo deep file system checks afterwards as well (though it is not certain that this guarantees data consistency).

Another way is to reboot :) but an lvconvert --repair and a resize still have to be executed after the machine boots.

In short, you should never allow the thin pool to reach a full state, and you should set (i.e. reduce) the auto-resize threshold in lvm.conf (thin_pool_autoextend_threshold) to anything other than 100%, depending on your pool size.
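A rough sketch of the recovery and prevention steps described in the note above, reusing the pool name from the test session for illustration. This follows the order given in the note; exact requirements (for example whether the pool must be inactive for repair) can vary by version, so treat it as an outline rather than a verified procedure.

    # Deactivate and reactivate the pool; activation triggers an internal thin_check
    lvchange -an vg/POOL1
    lvchange -ay vg/POOL1

    # Repair the pool metadata
    lvconvert --repair vg/POOL1

    # Grow the pool so it is no longer 100% full (size illustrative, assumes free space in the VG)
    lvextend -L +200M vg/POOL1

    # Then deep-check any filesystems on the thin LVs, e.g.
    fsck /dev/vg/virt_1

To avoid getting there in the first place, the autoextend settings in the activation section of lvm.conf can be lowered so dmeventd grows the pool before it fills up (values here are illustrative):

    activation {
        thin_pool_autoextend_threshold = 70
        thin_pool_autoextend_percent = 20
    }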
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0513.html