Bug 948001
| Field | Value |
|---|---|
| Summary | lvm thin pool will hang system when full |
| Product | Red Hat Enterprise Linux 6 |
| Component | lvm2 |
| Version | 6.4 |
| Status | CLOSED WONTFIX |
| Severity | medium |
| Priority | medium |
| Reporter | Milos Vyletel <milos.vyletel> |
| Assignee | Zdenek Kabelac <zkabelac> |
| QA Contact | Cluster QE <mspqa-list> |
| CC | agk, dsulliva, dustymabe, dwysocha, heinzm, jbrassow, jkulesa, msnitzer, nperic, prajnoha, prockai, rmarti, thornber, zkabelac |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Doc Type | Bug Fix |
| Clones | 1059771 (view as bug list) |
| Bug Blocks | 994246, 1056252 |
| Last Closed | 2014-04-30 13:47:06 UTC |
| Attachments | 888625: 3e1a0699095803e53072699a4a1485af7744601d upstream commit (patch) |
Description (Milos Vyletel, 2013-04-03 18:24:11 UTC)
There is another way: add/extend a new PV in the VG and resize the thin pool volume to a bigger size (a rough sketch of that recovery path is shown below), but yes, we want to improve policy handling. Thanks.

I had plenty of space available in the VG; I could have created or resized the pool at a bigger size, but the point is that even though I started at a small size (128M) with a virtual size of 12G, the system hang is not cool. Here's how I got to this point. We have an in-house GUI that talks to libvirt and creates guests. When creating a guest we had been using the capacity XML tag to pass the size to libvirt, with the allocation XML tag set to 0, because the LV pool in libvirt did not support sparse volumes. RHEL 6.4, however, adds sparse LV support, so by using an allocation of 0 we tell libvirt to create a sparse volume. This is obviously something we will fix in our code, but I went ahead and started to test it. The default snapshot-based sparse LVs are useless: once they are filled to 100%, the snapshot is, by design, invalidated and all data are gone. Since we create the snapshot at 32M (1 extent) in size, we fill it up very quickly, because dmeventd is not able to keep up with the autoextend. That led me to thin volumes, where we see the same behaviour (dmeventd not able to resize fast enough), and once we fill the thin volume the system hangs.

Seems you need to be more conservative with the amount of space you add to the thin pool. I understand your concern, but you're not using the thin-pool device as designed. What exactly are you saying you want to happen? I understand that "system hang is not cool", but it isn't a system hang; it is a hang of IO being issued to the thin pool. Now, if this leads to a system-wide deadlock due to interdependent writeback needed to free memory in the VM (as in the memory-management VM ;) then yes, that certainly isn't cool. Again, what would be the ideal response you're looking for? Do you just want the thin pool's metadata to transition to read-only mode? That means writes will fail with -EIO.

Ideally I would like to see behavior similar (or identical) to regular LVs. While we're running out of space and the currently allocated size is <= the virtual size, I would expect delayed IO or EBUSY until the pool is enlarged. When we hit the virtual size boundary, I would expect an ENOSPC error while the LV stays mounted read-write, so that one can clean up when necessary.

Agreed with Milos. If we encounter a situation where we inadvertently run out of space, I can see "hanging" being an answer until the pool is expanded; however, most folks would expect an ENOSPC error to kick back. Indeed, it's hard to unwedge the situation, since the LVM tools also lock up when they stat the thin volume that's hung. Once you run a pool out of space, you are SOL. That's not Enterprise Linux, sorry :-)
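For reference, the manual recovery path mentioned at the top of this thread (grow the VG, then grow the pool) would look roughly like the following. This is a minimal sketch, not taken from this report; the device and the vg/thinpool names are hypothetical, and exact option support varies by lvm2 version.

```
# Hypothetical names: /dev/sdb1, vg, thinpool (not from this report)

# Add a new physical volume to the volume group backing the pool
pvcreate /dev/sdb1
vgextend vg /dev/sdb1

# Grow the thin pool's data area so queued IO can complete
lvextend -L +10G vg/thinpool

# If the pool's metadata is also close to full, grow it too
lvextend --poolmetadatasize +128M vg/thinpool
```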
I was excited about thin volumes until I discovered this. Pretty much unusable until this is fixed.
For example, here is an strace of the lvs command:
```
ioctl(3, DM_TABLE_STATUS, 0x1813fa0) = 0
stat("/dev/vg-local-test01/thin", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 5), ...}) = 0
open("/dev/vg-local-test01/thin", O_RDONLY|O_DIRECT|O_NOATIME
[hang]
```
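The autoextend behaviour that dmeventd reportedly could not keep up with is controlled by the thin pool monitoring settings in lvm.conf. The snippet below is a generic illustration with example values, not the reporter's configuration; monitoring must also be enabled for dmeventd to act on these thresholds.

```
# /etc/lvm/lvm.conf -- example values only, not from this report
activation {
    # dmeventd must be monitoring the pool for autoextend to trigger
    monitoring = 1
    # Begin extending once the pool's data usage crosses 70%...
    thin_pool_autoextend_threshold = 70
    # ...growing the pool by 20% of its current size each time
    thin_pool_autoextend_percent = 20
}
```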
Hi, I've noticed that this BZ was moved to needinfo. I'm not sure whether I should provide any additional information, but I've taken a look at the code and found an upstream commit (3e1a0699095803e53072699a4a1485af7744601d) that seems to enhance error handling in this particular case. I hope I'll find some time to test it in the coming days. I'm attaching the patch.

Created attachment 888625 [details]
3e1a0699095803e53072699a4a1485af7744601d upstream commit
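As an aside not stated in this report: newer lvm2 releases expose a per-pool choice between queueing and erroring writes when the pool's data space is exhausted, which addresses the queue-forever behaviour discussed above. Availability on RHEL 6 is not guaranteed, and the vg/thinpool names below are hypothetical.

```
# Show whether the pool queues or errors IO when full (field: lv_when_full)
lvs -o lv_name,lv_when_full vg/thinpool

# Fail writes immediately with an IO error instead of queueing them
lvchange --errorwhenfull y vg/thinpool
```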
Quality Engineering Management has reviewed and declined this request. You may appeal this decision by reopening this request.